This group is for all folks running the original IBM VM/370 Release 6 operating system, or later versions such as VMTCE (Community Edition), on Hercules. Like the other early IBM operating systems, this version has always been in the public domain and so can be freely distributed. The base version as supplied by IBM lacks many facilities. IBM solved this by providing additional extension products, which were licensed and so are not available. There are, however, many user enhancements available which can be installed. In addition, to get users up and running quickly, updated "releases" of VM/370 that include the most popular updates are available for download, so novices can start to learn VM without having to delve into the system internals. It is intended that this wiki will provide information on the base release and these updates.
The available versions are here :-
Re: memset help
On 4/14/20 11:32 PM, Peter Coghlan wrote:
> I wonder could this have been the COBOL compiler abusing MVCL instructions

I was told back in the 1980s that for performance reasons MUSIC moved 4096 bytes of data via a series of MVC commands in place of one MVCL.

--
Drew Derbyshire
"All right, Mr. DeMille, I'm ready for my close-up." -- "Sunset Blvd."
Re: memset help
On Wed, Apr 15, 2020 at 08:02 AM, adriansutherland67 wrote:
> it takes me about 5 mins to write a single line of S/370 assembler not counting debugging!

I do indeed loathe assembler - especially as I use the "infinite number of monkeys" method - however I have managed to detect the stack running out of space. Big day!

A
Re: memset help
Adrian,

I made a coding error in all three of the MEMSET texts that I sent. In all of them I have this assembler instruction to declare register 15 as the code base register:

         USING 15,*          USE R15 AS BASE REG

Wrong order of the operands. Please change this to the following in all three versions:

         USING *,15          USE R15 AS BASE REG

The syntax is "USING BASEADDR,BASEREG[,BASEREG]*".

Apologies for the mistake.

Peter
Re: memset help
With my pedant's hat on, it actually generates "normal" Assembler that is assembled by the XF assembler on VM or MVS. That is why we get "normal" object files which can be loaded with the VM loader or the OS Linkage Editor. I have tried feeding the assembler into Assembler G with poor results. I haven't tried Assembler H...

... it's pretty easy to produce a GCC that compiles 370 code on Windows/Linux. After all, that's how I built the first GCCCMS. Getting all the assembler to CMS to compile it was the fun part.

Dave
Re: memset help
On Wed, 2020-04-15 at 13:24 +0100, Steven Fosdick wrote:
Yes. I did accomplish this and it is documented with scripts in the SATK. It used GNU as as the assembler for stand-alone, aka bare-metal, coding on Hercules. After literally years of work on that, it just did not work well enough and I changed to a new toolset that is part of the project.

However, the key difference between GCC on VM and other operating systems supported by Hercules and GCC as used on Linux is the output format. GCC typically generates ELF object module files. The GCC on VM generates mainframe object modules. Huge difference, and a fundamental reason this GCC is used with the operating systems that run on Hercules.

Harold Grovesteen
Re: memset help
On Wed, 2020-04-15 at 00:02 -0700, adriansutherland67 wrote:
I remind everyone working with GCC on VM/370 that it is a port of the old i370 architecture version of GCC. The modern optimizations are likely to be quite limited. This GCC is not the same version of GCC that is used today. The GCC group had decided to remove i370 from the product because of its lack of use or development. It was rescued and modified to run on the various mainframe operating systems.

Whether this is a consideration or not, there are other versions of this GCC that are essentially the same. The GCCs that run on other operating systems are built from the same source code. I do not know enough of the inner workings to know where the operating-system-dependent code exists (which is different for each) and what is common.

If it is important that this altered version of GCC on VM/370 is source compatible, care needs to be taken as to what is altered.

Just a heads up, but I think this should be understood.

Harold Grovesteen
Re: memset help
Hi Steven,

no, not that I know of. But it is good to remember, because assumptions that it will end during a scheduler slice can be false. That is, in z/OS, where I, a long time ago, came across errors based on that assumption, some my own. I don't know about VM.

René.
Re: memset help
On Tue, 14 Apr 2020 at 23:14, rvjansen@... <rvjansen@...> wrote:
Is that part of the performance issue? I mean indirectly, of course. I don't really know the 370 architecture but I have come across a similar move instruction, LDIR on the Z80, that is rather slow. That's a relative term because it's still faster than writing the loop yourself.

In the case of LDIR there is a non-repeating version (LDI) which loads the value whose address is in register HL, stores it to the address in register DE, increments HL and DE and decrements BC. The repeating version works by adding a final step of testing BC and, if that is not zero, it forces the program counter back to the address of the LDIR instruction. That means the LDIR instruction is now re-fetched, re-decoded and re-executed and the process repeats for each byte moved until BC becomes zero. It is interruptible and the interrupt is serviced just before the LDIR instruction is re-fetched, so it would push the values for BC, DE, HL from halfway through the move, service the interrupt, then pop those values and carry on where it left off.

Back to memset on 370, it's great to have an efficient implementation in the library but having the compiler inline it would make it even faster. Apart from removing the call overhead, the compiler may know the length already, i.e. it may be a constant expression, and can thus avoid tests and having two loops, one for 256 bytes and one for the remainder. It should also know if the data are aligned.

I know gcc can and does do this, at least on X86 - here's an example, first the C:

    #include <string.h>

    extern void do_something(char *x);

    int main(int argc, char *argv[])
    {
        char x[45];
        memset(x, 0, sizeof(x));
        do_something(x);
        return 0;
    }

I have deliberately declared an external function to receive the string so the compiler does not detect dead code and remove it altogether.

Here's the result of compiling to assembler:

            .file   "memstst.c"
            .text
            .section        .text.startup,"ax",@progbits
            .p2align 4
            .globl  main
            .type   main, @function
    main:
    .LFB0:
            .cfi_startproc
            subq    $72, %rsp
            .cfi_def_cfa_offset 80
            pxor    %xmm0, %xmm0
            movq    %fs:40, %rax
            movq    %rax, 56(%rsp)
            xorl    %eax, %eax
            movq    %rsp, %rdi
            movl    $0, 40(%rsp)
            movq    $0, 32(%rsp)
            movb    $0, 44(%rsp)
            movaps  %xmm0, (%rsp)
            movaps  %xmm0, 16(%rsp)
            call    do_something@PLT
            movq    56(%rsp), %rax
            xorq    %fs:40, %rax
            jne     .L5
            xorl    %eax, %eax
            addq    $72, %rsp
            .cfi_remember_state
            .cfi_def_cfa_offset 8
            ret
    .L5:
            .cfi_restore_state
            call    __stack_chk_fail@PLT
            .cfi_endproc
    .LFE0:
            .size   main, .-main
            .ident  "GCC: (Arch Linux 9.3.0-1) 9.3.0"
            .section        .note.GNU-stack,"",@progbits

Note the absence of a call to memset. So the core of this is using a zeroed 16-byte-wide register:

            movaps  %xmm0, (%rsp)
            movaps  %xmm0, 16(%rsp)

for the lower part of the space and a selection of other lengths for the remainder:

            movl    $0, 40(%rsp)
            movq    $0, 32(%rsp)
            movb    $0, 44(%rsp)

I did wonder about the possibility of setting up gcc as a cross-compiler but that doesn't seem trivial to do.

Steve.
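For readers who have not seen the "two loops" shape mentioned above, here is a rough C sketch of it: one loop for whole 256-byte blocks and one final step for the remainder. This is only an illustration under stated assumptions. The name chunked_memset and the fill-buffer approach are made up for this note, and it is not the MEMSET assembler routine posted in this thread. The 256-byte block size mirrors the maximum length of a single S/370 MVC.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration only; not the MEMSET routine from the thread. */
enum { BLOCK = 256 };   /* a single S/370 MVC moves at most 256 bytes */

void *chunked_memset(void *dst, int value, size_t len)
{
    unsigned char *p = dst;
    unsigned char fill[BLOCK];

    assert(dst != NULL || len == 0);

    /* Prepare one block's worth of the fill byte. */
    for (size_t i = 0; i < BLOCK; i++)
        fill[i] = (unsigned char)value;

    /* Loop 1: whole 256-byte blocks. */
    while (len >= BLOCK) {
        memcpy(p, fill, BLOCK);
        p += BLOCK;
        len -= BLOCK;
    }

    /* Step 2: the remainder, 0 to 255 bytes. */
    if (len > 0)
        memcpy(p, fill, len);

    return dst;
}
```

When the length is a compile-time constant, as in the char x[45] example, a compiler that inlines the operation can drop the block loop and the length tests entirely, which is the point being made above.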
Re: memset help
Translate - like the rexx function but then in assembler.

Anecdote: I once got a performance problem on my desk. It did a character translation, in C. It looped through a string, and replaced character by character. I noticed it had one flaw: it did not stop when it found the right character, but went on to 255 - for every character. (True story!) Next day I replaced it with an assembler version with two tables and a translate instruction. The application flew. (This was on a 486, with XLATB, but it is the same thing. These things are hard for a compiler to figure out.)

(This is also why we need drivers for e.g. SSL to use cryptographic assist/acceleration instructions - the compiler won't do that for you - that would be a bit like Clippy telling you "ah, I see you are using a bubblesort, let me replace that by a blockset search".)

René.
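The table idea in the anecdote can be sketched in C as follows. This is a minimal sketch, not the code from the anecdote: the function names build_table and translate are invented here, and it uses a single 256-entry table where the original apparently used two. The inner loop does one table lookup per byte, which is roughly the work a TR instruction does over a whole field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: start from the identity mapping, then map
   from[i] to to[i] for each of the n pairs supplied. */
void build_table(unsigned char table[256],
                 const unsigned char *from,
                 const unsigned char *to,
                 size_t n)
{
    for (int c = 0; c < 256; c++)
        table[c] = (unsigned char)c;
    for (size_t i = 0; i < n; i++)
        table[from[i]] = to[i];
}

/* Translate a buffer in place: one lookup per byte, no searching. */
void translate(unsigned char *buf, size_t len,
               const unsigned char table[256])
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        buf[i] = table[buf[i]];
}
```

The slow version described above searched up to 256 candidates for every input byte; the table version does a constant amount of work per byte regardless of which character it is.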
Re: memset help
Peter,

When I first started work, I worked in a small UK insurance company. We had a Honeywell H3200, which basically ran IBM 1401 code, but with "Improved io"… We had one small program that was very slow. It was COBOL. We replaced lots of conditional PERFORMs with "ALTER" statements. It ran about 100 times faster but was totally understandable. I doubt any change on Hercules will yield such a performance increase.

As Michael Jackson said:

"I am on a world tour. My tour is in pursuit of exceptional beer. That's why they call me the Beer Hunter."

That's a pursuit I try to emulate to the best of my ability…

… that was, by the way, Michael the Beer Hunter.

… another Michael Jackson said, in "Principles of Program Design", there are two rules for optimization…

I think we are still at Rule 1. You can't optimize something that doesn't work… for you to run "Strobe" it had to work…

Dave

p.s. I believe that some folks think Michael Jackson the singer and dancer was most famous. Possibly this is true, but I still prefer the works of the two above.
Re: memset help
It has long been the case that compilers produce faster code, because human coders kept choosing storage-to-storage instructions where the register-based versions were faster.

But in cases like TR and TRT the assembler programmer always wins.

René.
Re: memset help
All interesting ... and I will try out each candidate and report back. It will be tested only on Hercules so in one sense not a fair test. On the other hand we could argue that Hercules is S/370 done well ... meaning I agree with a comment that if a CPU manufacturer defines a bulk move command they should implement it fast!

One thing: modern compilers generally produce faster code than a human assembler programmer. This is because of both front-end optimisations (e.g. avoiding reassigning or recalculating values that have not changed) and back-end optimisations based on loop unrolling, instruction reordering for CPU pipelines, etc. This is why LLVM is becoming the one toolchain to rule them all ... even IBM works with them to ensure mainframe CPU internals are fully leveraged.

A
Re: memset help
I wonder could this have been the COBOL compiler abusing MVCL instructions in situations where they were not the appropriate instructions to use?

Perhaps instructions such as MVCL would be expected to be "hot spots" because they can deliver a relatively large amount of work for a single instruction? Or is it that implementations of this instruction were sometimes poorer than they ought to be and they were really not delivering bang for buck?

Regards,
Peter Coghlan.
Re: memset help
Dave,

I'm acutely aware of that IBM advice, but in the last two decades I have also been involved in multiple rounds of "MIPS-saving" projects when management wanted application teams to "do more with less" (i.e. increase performance / reduce batch execution times without buying more/bigger hardware).

The most effective solutions in those projects were finding the "CPU hot spots" (the Strobe product was always particularly effective for such efforts), and more times than not the worst "hot spots" turned out to be MVCL and sometimes CLCL instructions in compiler-generated COBOL code, and the second worst "hot spots" were the COBOL INITIALIZE verb for large heterogeneous structures used inside of a loop, or at every invocation of a subroutine.

Finding ways to hoist long-length moves/compares and INITIALIZE verbs out of business processing loops usually yielded the best/largest reductions of CPU and elapsed time used. Second-best solutions were complicated and usually application-specific adjustments to business processing rules (along the lines of "the fastest I/O or business process is the one not done").

But I digress from the subject at hand. You and Harold are right here: for Hercules, fewer instructions will yield better performance, so if replacement of the C version of MEMSET would dramatically improve performance for C programs then the MVCL solution will probably work best under Hercules.

Peter
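As a rough illustration of the "hoisting" idea in the message above, here is the same pattern transposed to C, with memset standing in for the COBOL INITIALIZE verb. Everything in it is hypothetical: the record layout, the field names and both function names are invented for this note, and the hoist is only valid under the stated assumption that the loop body never writes to the large scratch area.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record layout, for illustration only. */
struct record {
    char key[16];
    long amount;
    char scratch[4096];   /* large area this loop never writes to */
};

/* Before: the whole record (over 4 KB) is cleared on every pass. */
long total_naive(const long *amounts, size_t n)
{
    struct record rec;
    long total = 0;

    assert(amounts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        memset(&rec, 0, sizeof rec);     /* long clear inside the loop */
        rec.amount = amounts[i];
        total += rec.amount + rec.scratch[0];
    }
    return total;
}

/* After: clear once, then set only the field the loop body changes. */
long total_hoisted(const long *amounts, size_t n)
{
    struct record rec;
    long total = 0;

    assert(amounts != NULL || n == 0);
    memset(&rec, 0, sizeof rec);         /* hoisted out of the loop */
    for (size_t i = 0; i < n; i++) {
        rec.amount = amounts[i];         /* only field touched per pass */
        total += rec.amount + rec.scratch[0];
    }
    return total;
}
```

Both versions return the same totals; the second simply avoids one long clear per iteration. Whether that matters in practice depends on the compiler and the machine, which is exactly what the profiling described above was used to find out.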
Re: memset help
Peter,

I tend not to worry about performance, but anyway I believe that IBM's current advice is don't try, and instruction timings disappeared from the manuals yonks ago… For example, early 9370s were especially bad on non-aligned instructions, but modern hardware doesn't give a jot. I suppose it might be nice to try it on the P390 but I'm still trying to re-write a web site…

Dave
Re: memset help
Because I did not remember that MVCL was available at the 370 architecture level (and failed to go look it up) and because MVCL has mostly been quite slow at the real-iron hardware level.

Of course, Hercules might do MVCL relatively better than the real iron.

MEMSETCL.txt using MVCL attached.

Peter

> -----Original Message-----
> From: [email protected] <[email protected]> On Behalf Of Peter Coghlan
> Sent: Tuesday, April 14, 2020 3:33 AM
> To: [email protected]
> Subject: Re: [h390-vm] memset help
>
> <Snipped>
>
> Why don't you suggest using an MVCL instruction?
>
> Regards,
> Peter Coghlan.
Re: memset help
On 15 Apr 2020, at 00:07, Bob Polmanter <wably@...> wrote: