On Tue, 2020-04-14 at 08:32 +0100, Peter Coghlan wrote:
As an academic exercise, two assembler alternatives are attached as
text files.

MEMSET.txt uses STC/MVC to set 256 bytes at a time and then STC and a
variable-length MVC to set the remainder of less than 256 bytes, with
optimizations not to set anything unneeded if N is zero or an integer
multiple of 256.

MEMSET16.txt uses 2 more registers than MEMSET.txt but replaces the
length=255 MVC instruction with multiple STore operations (loop 16
times, storing 4 bytes at a time 4 times for each loop around). MVC was
notoriously slow on some real-iron IBM hardware models.

Not sure which technique would have been faster on any real 360-era
iron, but there could be differences in the Hercules MVC vs STore
operations that may make the STore solution faster (or not).

These are untested, so I could have errors in my coding. Corrections and
improvements welcome.
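
For readers without the attachments, the MEMSET.txt approach described
above might look roughly like the following C model. This is only a
sketch and not the attached assembler: the names memset_chunked and
propagate are hypothetical, and it assumes each 256-byte chunk is built
from one stored byte (STC) followed by an overlapping MVC that
propagates that byte through the next 255 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Models an MVC whose source is dst and whose target is dst + 1.
   Bytes are moved strictly left to right, one at a time, so the
   byte at dst[0] is propagated into dst[1] .. dst[len]. */
static void propagate(unsigned char *dst, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i + 1] = dst[i];
}

/* Hypothetical name: a C model of the chunked fill, not the real code. */
void *memset_chunked(void *s, int c, size_t n)
{
    unsigned char *p = s;
    assert(p != NULL || n == 0);

    /* Full chunks: one stored byte plus 255 propagated bytes = 256. */
    while (n >= 256) {
        p[0] = (unsigned char)c;   /* the STC */
        propagate(p, 255);         /* the fixed-length MVC */
        p += 256;
        n -= 256;
    }

    /* Remainder of 1..255 bytes. Nothing is stored when n is zero,
       which covers both N == 0 and N an exact multiple of 256. */
    if (n > 0) {
        p[0] = (unsigned char)c;   /* the STC */
        propagate(p, n - 1);       /* the variable-length MVC */
    }
    return s;
}
```

Every pass around the while loop here corresponds to several guest
instructions that an emulator has to fetch and decode one by one.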
Why don't you suggest using an MVCL instruction?
Regards,
Peter Coghlan.
Regardless of the various performance characteristics that may have
existed on real 370-era hardware, VM/370 runs today on Hercules. So
what makes sense? The fewer instructions the better. I would
recommend use of the MVCL instruction as well.
Remember that when using C on Hercules one compiles the C to machine
instructions that are in turn interpreted by another C program,
Hercules.
This is not to say that any part of GCC should or should not
incorporate assembler, nor am I saying that performance should not be
tested.
I am saying that, in general, the fewer instructions the better. And we
all know that this is particularly true for loops.
MVCL allows the loops to be embedded within the interpreter,
eliminating them as part of the GCC implementation.
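
To illustrate the point about the loop living inside the interpreter,
here is a rough C model of the fill case of MVCL. It is a simplified
sketch and not Hercules source: struct mvcl_operands, emulate_mvcl and
memset_via_mvcl are hypothetical names, and the destructive-overlap
check, condition code, register updates and interruptibility of the
real instruction are all left out. It relies on the architected
behaviour that when the source length is shorter than the destination
length, the rest of the destination is filled with the padding byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of the MVCL register operands. */
struct mvcl_operands {
    unsigned char *dst;        /* R1: destination address */
    size_t dst_len;            /* R1+1: destination length (24 bits) */
    const unsigned char *src;  /* R2: source address */
    size_t src_len;            /* R2+1: source length (24 bits) */
    unsigned char pad;         /* R2+1, high byte: padding byte */
};

/* One guest instruction: the copy and fill loops run as host code. */
static void emulate_mvcl(const struct mvcl_operands *op)
{
    size_t ncopy = op->src_len < op->dst_len ? op->src_len : op->dst_len;
    size_t nfill = op->dst_len - ncopy;

    if (ncopy > 0)
        memmove(op->dst, op->src, ncopy);
    if (nfill > 0)
        memset(op->dst + ncopy, op->pad, nfill);
}

/* memset as a single MVCL: zero source length, pad byte = fill byte. */
void *memset_via_mvcl(void *s, int c, size_t n)
{
    /* The S/370 MVCL length field is 24 bits wide. */
    assert(n <= 0xFFFFFFu);

    struct mvcl_operands op = { s, n, NULL, 0, (unsigned char)c };
    emulate_mvcl(&op);
    return s;
}
```

In this model the guest program issues one instruction however large n
is, and the per-byte work happens in whatever the host's memset does.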
Harold Grovesteen