Re: memset help


 

As an academic exercise, two assembler alternatives attached as text files.

MEMSET.txt uses STC/MVC to set 256 bytes at a time, then STC and a variable-length MVC to set the remainder (fewer than 256 bytes). It is optimized to skip any unneeded work when N is zero or an exact multiple of 256.
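For anyone less familiar with the STC/MVC idiom, here is a rough C model of the strategy. It is only a sketch of the idea, not a translation of the attachment: memset_mvc_model and propagate are made-up names, and the loop in propagate stands in for an overlapping MVC, which behaves as if it moved bytes left to right one at a time, so the byte stored by STC is repeated through the field.

```c
#include <stddef.h>

/* Hypothetical helper: models an overlapping MVC. Copying p[0..len-2]
   onto p[1..len-1] one byte at a time, left to right, repeats p[0]
   through the whole field. */
static void propagate(unsigned char *p, size_t len)
{
    size_t i;

    for (i = 1; i < len; i++)
        p[i] = p[i - 1];
}

/* Hypothetical name: a model of the block-plus-remainder strategy. */
void *memset_mvc_model(void *s, int c, size_t n)
{
    unsigned char *p = (unsigned char *)s;
    size_t blocks = n / 256;    /* number of full 256-byte blocks */
    size_t rest = n % 256;      /* trailing bytes, 0..255 */

    while (blocks-- > 0)
    {
        p[0] = (unsigned char)c;    /* STC: store the fill byte once */
        propagate(p, 256);          /* MVC: spread it over 255 more bytes */
        p += 256;
    }
    if (rest != 0)                  /* skipped when n is 0 or a multiple of 256 */
    {
        p[0] = (unsigned char)c;
        propagate(p, rest);         /* variable-length move for the tail */
    }
    return s;
}
```

Calling memset_mvc_model(buf, 0x5A, 513) would run two full blocks and then a one-byte tail.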

MEMSET16.txt uses two more registers than MEMSET.txt but replaces the length=255 MVC instruction with multiple Store operations: a loop of 16 iterations, each storing 4 bytes four times, so 256 bytes per pass. MVC was notoriously slow on some real-iron IBM hardware models.

Not sure which technique would have been faster on any real 360-era iron, but there could be differences in the Hercules MVC vs STore operations that may make the STore solution faster (or not).

These are untested, so I could have errors in my coding. Corrections and improvements welcome.

A C solution doing the same sort of thing as MEMSET16.txt, using casts to int and taking advantage of the C compiler's optimization and code generation, could be even faster. Something along these lines, doing 16 bytes at a time (could obviously do 32 or 64 each loop as well, but 16 gives you the picture):

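void *memset(void *s, int c, size_t n)
{
    size_t x;
    unsigned int b = (unsigned char)c;
    /* Replicate the fill byte into all four bytes of a word. */
    unsigned int cccc = b | (b << 8) | (b << 16) | (b << 24);

    /* Store 16 bytes per pass while at least 16 bytes remain.
       Assumes a 4-byte int and a target that tolerates unaligned
       word stores. */
    for (x = 0; n - x >= 16; x += 16)
    {
        *((unsigned int *)((char *)s + x     )) = cccc;
        *((unsigned int *)((char *)s + x +  4)) = cccc;
        *((unsigned int *)((char *)s + x +  8)) = cccc;
        *((unsigned int *)((char *)s + x + 12)) = cccc;
    }
    /* Set the remaining 0..15 bytes one at a time. */
    for (; x < n; x++)
    {
        *((char *)s + x) = (unsigned char)c;
    }
    return (s);
}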

HTH

Peter

From: [email protected] <[email protected]> On Behalf Of adriansutherland67
Sent: Monday, April 13, 2020 1:33 PM
To: [email protected]
Subject: [h390-vm] memset help

Folks

Currently GCCLIB has memset() as

void *memset(void *s, int c, size_t n)
{
    size_t x = 0;

    for (x = 0; x < n; x++)
    {
        *((char *)s + x) = (unsigned char)c;
    }
    return (s);
}

Slow ...

Is anyone willing to do an optimised S/370 assembler version?

If people want a competition, I am happy to benchmark: needs to win both for small memory areas and for large memory areas :-)

Thanks!

Adrian
