
Re: memset help


 

Question:
What should be used to move or clear large blocks of data?

Answer:

The z/Architecture provides several ways to move or clear a large block of storage:

  1. One MVCL instruction

  2. Loops of MVCs to move data

  3. Loops of MVC <Len>,<Addr>+1,<Addr> or XC <Len>,<Addr>,<Addr> to pad/clear an area
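
The MVC and XC loops in items 2 and 3 can be modeled in portable C. This is only a rough sketch of the architected byte-by-byte, left-to-right behavior, not of how any machine executes the instructions, and it says nothing about performance. The names `mvc_model`, `xc_model`, `pad_with_mvc_loop` and `clear_with_xc_loop` are made up for illustration. The 256-byte limit per move reflects the maximum length of a single MVC or XC.

```c
#include <assert.h>
#include <stddef.h>

/* Model of MVC: bytes move one at a time, left to right, so an
   overlapping destination sees bytes that were already stored. */
static void mvc_model(unsigned char *dst, const unsigned char *src, size_t len)
{
    assert(len >= 1 && len <= 256);     /* one MVC moves 1..256 bytes */
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i];
}

/* Model of XC: each destination byte is XORed with the source byte.
   With dst == src every byte becomes zero. */
static void xc_model(unsigned char *dst, const unsigned char *src, size_t len)
{
    assert(len >= 1 && len <= 256);     /* one XC handles 1..256 bytes */
    for (size_t i = 0; i < len; i++)
        dst[i] ^= src[i];
}

/* Pad area[0..n-1]: store the pad byte once, then propagate it with
   overlapping moves, like MVC <Len>,<Addr>+1,<Addr>. */
void pad_with_mvc_loop(unsigned char *area, size_t n, unsigned char pad)
{
    if (n == 0)
        return;
    area[0] = pad;
    size_t done = 1;
    while (done < n) {
        size_t chunk = n - done;
        if (chunk > 256)
            chunk = 256;
        /* destination is exactly one byte above the source */
        mvc_model(area + done, area + done - 1, chunk);
        done += chunk;
    }
}

/* Clear area[0..n-1] in pieces, like XC <Len>,<Addr>,<Addr>. */
void clear_with_xc_loop(unsigned char *area, size_t n)
{
    while (n > 0) {
        size_t chunk = n > 256 ? 256 : n;
        xc_model(area, area, chunk);
        area += chunk;
        n -= chunk;
    }
}
```

Calling `pad_with_mvc_loop(buf, 1000, 0x40)` fills all 1000 bytes with X'40'; `clear_with_xc_loop(buf, 1000)` then zeroes them.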

As discussed on page 31, titled "MOVE LONG instructions":

  - MVCL is implemented through millicode routines
  - Millicode is a firmware layer in the form of vertical microcode
      • Incurs some overhead in startup, boundary/exception checking, and ending
  - MVCL function implemented using loops of MVCs or XCs
  - Millicode has access to special near-memory engines that can do page-aligned move and page-aligned padding
      • Can be faster than dragging cache lines through the cache hierarchy
      • However, the destination data will NOT be in the local cache

As such, the answer is "it depends"; no single answer fits all situations. There are many factors to consider:

  - Will the target be needed in the local cache soon?
      • Then moving/padding "locally" with MVCs or XCs will be better
  - Is the source in the local cache?
      • Then moving/padding "locally" with MVCs or XCs may be better
  - How much data is being processed?
      • The more data you have to process, the more you may benefit from MVCL, because of the special hardware engines used by millicode
  - Experimentation is, therefore, highly advised
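
One way to experiment from C is a small timing harness like the sketch below. Everything in it is an assumption made for illustration: the names `clear_whole`, `clear_chunked` and `time_clear` are invented, and whether `memset` ends up as MVCL, as MVC/XC loops, or as something else depends entirely on the compiler and C library, so check the generated code before drawing conclusions. The `consume` flag reads the buffer back after each clear, which is the "will the target be needed in the local cache soon?" case. The 256-byte read stride assumes a 256-byte cache line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef void (*clear_fn)(unsigned char *buf, size_t n);

/* One library call for the whole area. */
void clear_whole(unsigned char *buf, size_t n)
{
    memset(buf, 0, n);
}

/* 256-byte pieces, mirroring the length limit of a single MVC or XC. */
void clear_chunked(unsigned char *buf, size_t n)
{
    while (n > 0) {
        size_t chunk = n > 256 ? 256 : n;
        memset(buf, 0, chunk);
        buf += chunk;
        n -= chunk;
    }
}

/* Returns average CPU seconds per repetition. With consume != 0 the
   buffer is read back right after each clear, so the cost of fetching
   a target that is not in the local cache is part of the measurement. */
double time_clear(clear_fn fn, unsigned char *buf, size_t n,
                  int reps, int consume)
{
    assert(fn != NULL && n > 0 && reps > 0);
    volatile unsigned long sink = 0;    /* keeps the reads from being dropped */
    clock_t start = clock();
    for (int r = 0; r < reps; r++) {
        buf[0] = 0xFF;                  /* dirty the buffer before each clear */
        fn(buf, n);
        if (consume)
            for (size_t i = 0; i < n; i += 256)
                sink += buf[i];
    }
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC / reps;
}
```

Comparing `time_clear(clear_whole, buf, n, reps, 1)` with `time_clear(clear_chunked, buf, n, reps, 1)` over a range of sizes, and again with `consume` set to 0, gives a first impression of where the crossover lies on a given machine.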


On Wed, Apr 15, 2020 at 6:00 PM Tony Harminc <tharminc@...> wrote:
On Wed, 15 Apr 2020 at 17:47, Joe Monk <joemonk64@...> wrote:

> No doubt ... but MVCL/E are still millicode instructions. MVC is a hardware instruction.

Little is that simple these days...

> MOVE characters (MVC)
> - If <= 16 bytes, it is cracked into separate load and store µops
> - If > 16 bytes, it is handled by hardware sequencing logic inside the LSU
> - If the destination address is 1 byte higher than the source address
>   (and they overlap), it is special-cased in hardware as a 1-byte
>   storage padding function (with faster handling)
> - If the destination address is 8 bytes higher than the source address
>   (and they overlap), it is special-cased in hardware as an 8-byte
>   storage padding function (with faster handling)
> - For other kinds of address overlap, it will be forced into millicode
>   to be handled a byte at a time

> MOVE LONG
> * A special engine is built per CP chip for aligned copying or padding
>   functions at a page granularity
>   - The page-aligned copying or padding is done "near memory", instead
>     of through caches, if
>     * Not executed inside a transaction
>     * Padding character specified is neither X'B1' nor X'B8'
>     * A preceding NIAI instruction does not indicate that the storage
>       data will be used subsequently
>     * The operands must not have an access exception
>     * Length >= 4K bytes
>     * For moves: source and destination addresses are both 4K-byte aligned
>     * For padding: destination address is 4K-byte aligned
>   - Otherwise, the move process will operate through the caches (L1, L2, ...)
>   - Note that the evaluation is revised every unit-of-op
>   - For padding, even if the starting address is not aligned, millicode
>     pads in cache to the first 4K-byte boundary, then uses the "near
>     memory" pad engine for the next aligned 4K-byte pages until the
>     remaining length is less than 4K bytes. After that, padding is done
>     in cache again
> * Near-memory engine usage is best when the amount of data involved is
>   large and the target memory is not to be immediately consumed in
>   subsequent processes
>   - Since the special engine is shared within a CP chip, contention
>     among processors is possible
>   - Such contention is handled transparently by millicode and
>     additional delay may be observed
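
The conditions listed above can be written down as a predicate, which may help when deciding whether a particular MVCL is likely to reach the near-memory engine. This is just a restatement of the list in C, not a description of what millicode actually does. The struct `mvcl_request`, its fields, and the functions `near_memory_eligible` and `plan_padding` are hypothetical names. `plan_padding` sketches the head/pages/tail split described for padding and assumes all the other conditions already hold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE 4096u

/* Hypothetical description of one unit of MVCL work. */
struct mvcl_request {
    uintptr_t dst, src;       /* operand addresses (src ignored for padding) */
    size_t len;               /* remaining length in bytes */
    bool is_padding;          /* true: pad only, false: move */
    unsigned char pad;        /* padding character */
    bool in_transaction;      /* executed inside a transaction */
    bool niai_reuse_hint;     /* preceding NIAI says the data will be used */
    bool access_exception;    /* an operand would take an access exception */
};

/* True when every listed condition for "near memory" processing holds. */
bool near_memory_eligible(const struct mvcl_request *r)
{
    if (r->in_transaction || r->niai_reuse_hint || r->access_exception)
        return false;
    if (r->pad == 0xB1 || r->pad == 0xB8)
        return false;
    if (r->len < PAGE)
        return false;
    if (r->dst % PAGE != 0)
        return false;
    if (!r->is_padding && r->src % PAGE != 0)
        return false;
    return true;
}

struct pad_plan {
    size_t head_in_cache;     /* bytes up to the first 4K boundary */
    size_t near_memory_bytes; /* whole aligned 4K pages */
    size_t tail_in_cache;     /* leftover of less than 4K */
};

/* Split a padding request into in-cache head, near-memory pages and
   in-cache tail, following the description for unaligned padding. */
struct pad_plan plan_padding(uintptr_t dst, size_t len)
{
    struct pad_plan p = { 0, 0, 0 };
    size_t head = (size_t)((PAGE - dst % PAGE) % PAGE);
    if (head > len)
        head = len;
    size_t rest = len - head;
    p.head_in_cache = head;
    p.near_memory_bytes = rest - rest % PAGE;
    p.tail_in_cache = rest % PAGE;
    return p;
}
```

For a pad of three pages (12288 bytes) starting 100 bytes past a page boundary, `plan_padding` yields 3996 bytes in cache, 8192 bytes for the near-memory engine, and a 100-byte tail in cache.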



Tony H.


