memmove() is a very, very fast overlap-safe memory copy, but
since it does not save certain registers it uses, calling it from
GCC can be tricky.  GCC does a fabulous job of grinding all it
can out of the available registers; MSC doesn't make very good
use of registers, er, I mean keeps a lot of registers for scratch
use.  I've never seen 386 MSC use dx except as a side effect of
multiply.  I guess that's what somebody (Gates?) meant by saying
that "tuning" one compiler to meet many needs is adequate.  Enough
philosophy, though.

I am using -fcall-used-{ax,bx,cx,dx}, which tells GCC to treat
those registers as clobbered by function calls.  In at least one
case (ecufkey.c display_keyset()), rigorous optimization and
really righteous register usage caused a call to strlen() to
screw up, since cx is not preserved by strlen.

strlen:         push    edi
strlen+0x1:     mov     edi,[esp+0x8]
strlen+0x5:     xor     eax,eax               <-- goodbye ax 
strlen+0x7:     mov     ecx,0xffffffff        <-- goodbye cx
strlen+0xc:     repne   scasb
strlen+0xe:     inc     ecx     
strlen+0xf:     mov     eax,ecx 
strlen+0x11:    not     eax
strlen+0x13:    pop     edi
strlen+0x14:    ret
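
For anyone who doesn't think in string instructions, the listing
above can be modeled roughly in C as below.  This is only a sketch
of the arithmetic (count down from 0xffffffff, add one, invert);
the name strlen_model and its local variable names are made up for
illustration, and it assumes a string shorter than 2^32 - 1 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of the strlen listing; not SCO's source. */
size_t strlen_model(const char *s)
{
    uint32_t ecx = 0xffffffffu;   /* mov ecx,0xffffffff */
    const char *edi = s;          /* mov edi,[esp+0x8]  */

    /* repne scasb: compare each byte with al (0), decrementing
       ecx once per byte examined, including the terminating 0 */
    do {
        ecx--;
    } while (*edi++ != 0);

    ecx++;                        /* inc ecx            */
    return (size_t)(uint32_t)~ecx; /* mov eax,ecx; not eax */
}
```

The point for GCC callers is that the count lives in cx the whole
time, so whatever the caller kept there is gone on return.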

memmove:        push    ebp
memmove+0x1:    mov     ebp,esp
memmove+0x3:    mov     edx,edi               <-- goodbye dx (move rather than push)
memmove+0x5:    mov     ebx,esi               <-- goodbye bx (move rather than push)
memmove+0x7:    mov     esi,[ebp+0xc]
memmove+0xa:    mov     edi,[ebp+0x8]
memmove+0xd:    mov     eax,edi               <-- goodbye ax 
memmove+0xf:    mov     ecx,[ebp+0x10]        <-- goodbye cx (OK w/MSC)
memmove+0x12:   jcxz    memmove+0x43
memmove+0x14:   cmp     edi,esi
memmove+0x16:   jbe     memmove+0x2e
memmove+0x18:   mov     eax,esi
memmove+0x1a:   add     eax,ecx
memmove+0x1c:   cmp     edi,eax
memmove+0x1e:   jae     memmove+0x2e
memmove+0x20:   mov     eax,edi
memmove+0x22:   add     esi,ecx
memmove+0x24:   add     edi,ecx
memmove+0x26:   dec     esi
memmove+0x27:   dec     edi
memmove+0x28:   std
memmove+0x29:   rep     movsb
memmove+0x2b:   cld
memmove+0x2c:   jmp     near memmove+0x43
memmove+0x2e:   mov     eax,edi
memmove+0x30:   test    Byte Ptr 0x1f:0x1,al
memmove+0x36:   je      memmove+0x3a
memmove+0x38:   movsb
memmove+0x39:   dec     ecx
memmove+0x3a:   shr     ecx,1
memmove+0x3c:   rep     movsw
memmove+0x3f:   adc     ecx,ecx
memmove+0x41:   rep     movsb
memmove+0x43:   mov     esi,ebx
memmove+0x45:   mov     edi,edx
memmove+0x47:   pop     ebp
memmove+0x48:   ret
memmove+0x49:   nop
memmove+0x4a:   nop
memmove+0x4b:   nop
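
The direction test in the listing above (the cmp/jbe and cmp/jae
pairs) can be sketched in C as follows.  This is a simplified
model, not the library source: the name memmove_model is made up,
and the forward path here copies byte by byte, where the real
routine copies one byte to align and then moves words with
rep movsw.  The register clobbering has no C equivalent and is not
modeled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of the memmove listing's overlap handling. */
void *memmove_model(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    if (n == 0)                        /* jcxz: nothing to copy */
        return dest;

    if ((uintptr_t)d > (uintptr_t)s &&
        (uintptr_t)d < (uintptr_t)s + n) {
        /* dest starts inside the source region: copy backward
           from the last byte (the std / rep movsb path) */
        d += n;
        s += n;
        while (n--)
            *--d = *--s;
    } else {
        /* no harmful overlap: copy forward */
        while (n--)
            *d++ = *s++;
    }
    return dest;                       /* eax = original dest */
}
```

Copying backward only when dest lands inside the source region is
what makes the routine safe for overlapping buffers in either
direction.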

---------------------------------------------------------------------

The memmove in this directory is written in x86 assembler
and is courtesy of Chip Salzenberg with some help from
Roger Cornelius.  I hacked out the .asm versions.

Chip Salzenberg:
> SCO's memmove() function in the 3.2v2 development system libc.a
> library has an insidious bug: it trashes the EBX register.  This
> register is used to hold register variables.  I suspect the bug crept
> in due to a simple-minded translation of a '286 routine, because on
> the '286, BX need not be preserved.
> 
> The fix is to replace memmove.o in /lib/libc.a with the version
> included below.  Note that if you use profiling, you must also put a
> profiling version of memmove() in /usr/lib/libp/libc.a.
> 
> To assemble the non-profiling version:
> 
>     as -m -o memmove.o memmove.s

(How strange that this bug has gone unnoticed for so long...)

Roger Cornelius <rac@sherpa.UUCP>:
> The following will build the profiling memmove.o correctly:
> 
> m4 profile.s memmove.s > memmove_p.s    # order is important!
> as -o memmove_p.o memmove_p.s
> 
> Note also that manually running memmove.s through m4 (instead of
> using as -m) before assembling will also save 100 or so bytes in the
> .o file for the non-profiling version.
