*nix performance

Sat Oct 22 02:49:19 AEST 1988

This is a reply to the points Brian Cuthrie made in response to my
reply to some questions about DMA.

We seem to have a different background - I'm more software and you seem
to be more hardware-oriented.  I yield (in general) to your hardware
knowledge so perhaps you can educate me out of some ideas I have picked
up over the years.

Points in order:
   1.  I have seen 80386s advertised at 25 MHz, so I don't know why
you underscore that with a question mark.  Since they are all non-IBM
clones, I have presumed they use the basic AT motherboard with the
same DMA chip from 1981, which, I have been told by someone who designs
and builds PC I/O boards for a living, has never been built to run
faster than 5 MHz because the design isn't worth continuing.
   2.  My understanding of why I, as a software engineer, use DMA is
that it makes I/O a non-blocking process.  E.g., I want to dump some
stuff to tape, but I don't want to halt my program to wait for it to
complete.  So I set up the DMA to some memory-mapped I/O board and
then go on with other processing.  The DMA process takes EVERY OTHER
CYCLE (which the cpu often can't use) while the cpu continues with
other work.  When the I/O board's buffer is full, it refuses further
DMA transfers (or else I've set up the DMA to transfer only N bytes,
depending on the architecture).  The DMA chip then generates an
interrupt informing me that the transfer is complete, and I either
restart it if I want more transfers or do something based on my
knowledge that my I/O is done.  In practice, one program usually has
to block on I/O anyway, so that is a good time for the OS (like UNIX)
to suspend it and give the cpu to another program while the DMA runs
on EVERY OTHER CYCLE.  The way one does this is usually based on
interrupts from the DMA or I/O board.
   3.  My understanding of the reason DMA isn't used much on the PC is
that getting pass-through mode going with the PC's DMA requires it to
do one read, transfer the byte to its internal write buffer, and then
write it to memory.  That is 2 bus cycles per byte, which is why I said
the string moves were as fast: the 8088 cpu has an internal look-ahead
(prefetch) queue of several bytes and so can overlap reads and writes
somewhat.  Since the 8088's bus pins are multiplexed, this may not
mean much in practice.
   4.  We have some confusion about the word re-entrancy.  My
definition of it applies only to programs, not to individual hardware
instructions.  Here's an example of what I mean.  A utility (say, a
terminal I/O routine) is used by a lot of other programs.  Rather than
make a copy of it for each user, or block all users but one from using
it at a time, the utility is made re-entrant.  That is, program A enters
it and writes a byte to a terminal, and while it is waiting for the byte
to go out the port, the OS allows program B to enter the utility.
Obviously, B can't be allowed to trash A's registers, I/O address,
etc., so the utility (or OS) saves the state of A in some buffer, allows
B to run long enough to do something useful (based on time or I/O),
and then saves B's registers, etc., and restores A's registers, etc., to
allow A to continue sending or receiving.  Interrupts figure in all
this because this re-entrancy is usually (but not necessarily)
provided at intervals signaled by a clock interrupt or an I/O
interrupt.  DOS was not written to provide re-entrancy, so to provide
it oneself, one must save the current segment registers in a very
precise and careful way, reset the stack and other segment registers
in an equally careful way (the DOS stack will only work for you 95% of
the time), and then reverse the process upon exiting.  The first time
I did this, it took me many late-night weeks, but after the first time
it takes a day at worst.
   5.  The 8086 came out at about the same time as the 68000.  Both
companies had the same technologies to work with, but Motorola chose a
linear address space and general-purpose address and data registers.
Programs written for the 68000 can run on a 68030 without change and
without impinging on a program designed for the 68030.  Running an 8086
program on an 80286 (which does not have a linear address space)
requires major effort, and you can only run one 8086 program alongside
80286 programs running in protected mode.  Motorola had to implement
some hardware instructions as software traps for a little while, but a
software engineer didn't have to worry about it, and life was good and
stayed good.  The 80286 was Intel's idea of what to do next, which
shows that they hadn't picked up on the linear address space and
general-purpose registers until later.  There's usually a trade-off in
micro-code between time and space, so they might have accepted more
cycles in exchange for a cleaner instruction set.
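The DMA sequence in point 2 can be made concrete with a minimal Python sketch. It is a toy model built on stated assumptions and does not describe any real controller. The class DmaChannel and the function run_until_interrupt are hypothetical names. The "interrupt" is only a flag. The channel is assumed to get exactly every other bus cycle, and the cpu gets the rest.

```python
from dataclasses import dataclass, field


@dataclass
class DmaChannel:
    """Hypothetical model of one DMA channel (not any real chip's registers)."""
    source: bytes
    board: bytearray = field(default_factory=bytearray)  # the I/O board's buffer
    remaining: int = 0          # bytes still to move (the "N bytes")
    irq_pending: bool = False   # stands in for the completion interrupt

    def start(self) -> None:
        self.board.clear()
        self.remaining = len(self.source)
        self.irq_pending = self.remaining == 0

    def tick(self, cycle: int) -> None:
        """Offer one bus cycle; in this model DMA only takes the even ones."""
        if self.remaining == 0 or cycle % 2:
            return
        self.board.append(self.source[len(self.board)])
        self.remaining -= 1
        if self.remaining == 0:
            self.irq_pending = True  # transfer complete: raise the "interrupt"


def run_until_interrupt(ch: DmaChannel) -> int:
    """Run the bus until the DMA 'interrupt' fires; return cpu work done meanwhile."""
    ch.start()
    cycle = cpu_work = 0
    while not ch.irq_pending:
        ch.tick(cycle)
        if cycle % 2:
            cpu_work += 1  # the cpu keeps the odd cycles for its own processing
        cycle += 1
    return cpu_work


ch = DmaChannel(source=b"TAPEDATA")
work = run_until_interrupt(ch)
print(bytes(ch.board), work)  # b'TAPEDATA' 7
```

With an 8-byte buffer, the channel finishes on cycle 14. In the meantime, the cpu gets 7 odd cycles for its own work. The program never waits on the transfer; it only checks the flag, which is the non-blocking behaviour point 2 is after.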
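A small sketch can count the two-cycle cost described in point 3. It assumes each byte is read into a one-byte holding register and then written back out to memory. The names dma_mem_to_mem and transfer_rate are hypothetical. The figure of 1,000,000 bus cycles per second in the usage lines is an arbitrary round number, not a measured PC figure.

```python
def dma_mem_to_mem(memory: bytearray, src: int, dst: int, count: int) -> int:
    """Copy `count` bytes via a one-byte holding register; return bus cycles used."""
    cycles = 0
    for i in range(count):
        temp = memory[src + i]   # bus cycle 1: read the byte into the holding register
        cycles += 1
        memory[dst + i] = temp   # bus cycle 2: write it back out to memory
        cycles += 1
    return cycles


def transfer_rate(bus_cycles_per_second: int, cycles_per_byte: int = 2) -> int:
    """Bytes per second when every byte costs `cycles_per_byte` bus cycles."""
    return bus_cycles_per_second // cycles_per_byte


mem = bytearray(b"ABCD") + bytearray(4)   # 4 source bytes, 4 empty destination bytes
cycles = dma_mem_to_mem(mem, 0, 4, 4)
print(mem, cycles)                        # bytearray(b'ABCDABCD') 8
print(transfer_rate(1_000_000))           # 500000
```

Whatever the bus rate, a read-then-write transfer delivers at most half as many bytes as there are bus cycles. That is the ceiling a string move only has to match, not beat.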
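The re-entrant utility in point 4 can be sketched in Python with one substitution. Instead of saving and restoring registers by hand, each call's state lives in its own generator frame, which Python suspends and resumes for us. The names send_line and round_robin are hypothetical. The round-robin loop is a simple stand-in for switching driven by a clock or I/O interrupt; it is not how DOS or UNIX actually schedules.

```python
from collections import deque
from typing import Iterator


def send_line(text: str, port: list[str]) -> Iterator[None]:
    """Hypothetical re-entrant terminal routine.

    Its working state (the position in `text`, the target port) lives in this
    call's own frame rather than in shared globals, so a second caller can
    enter the same code while the first is suspended.
    """
    for ch in text:
        port.append(ch)  # write one byte to this caller's terminal
        yield            # "waiting for the byte to go out": a switch may happen here


def round_robin(tasks: dict[str, Iterator[None]]) -> list[str]:
    """Stand-in for interrupt-driven switching: resume each task one step at a time."""
    ready = deque(tasks.items())
    order = []
    while ready:
        name, task = ready.popleft()
        try:
            next(task)                   # resume this task's saved frame for one step
            order.append(name)
            ready.append((name, task))   # park it and give the next task a turn
        except StopIteration:
            pass                         # this caller has left the utility
    return order


tty_a: list[str] = []
tty_b: list[str] = []
order = round_robin({"A": send_line("hi", tty_a), "B": send_line("yo!", tty_b)})
print("".join(tty_a), "".join(tty_b), order)  # hi yo! ['A', 'B', 'A', 'B', 'B']
```

A and B are interleaved inside the same routine, yet each terminal receives only its own bytes, because neither call touches the other's state.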
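The addressing contrast in point 5 can be illustrated with a short sketch of 8086 real-mode address formation. In real mode, the physical address is the segment times 16 plus a 16-bit offset, wrapping at 1 MB. The sketch shows why a segmented space is harder to program against than a linear one. Several segment:offset pairs name the same byte, and no single offset reaches past 64 KB. The names phys_8086 and far_ptr are illustrative. The sketch models real mode only, not the 80286's protected mode.

```python
def phys_8086(segment: int, offset: int) -> int:
    """8086 real mode: 20-bit physical address = segment * 16 + offset."""
    assert 0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF  # both are 16-bit values
    return ((segment << 4) + offset) & 0xFFFFF  # only 20 address lines: wraps at 1 MB


def far_ptr(linear: int) -> tuple[int, int]:
    """Split a 20-bit address into a normalized segment:offset pair (offset < 16)."""
    return linear >> 4, linear & 0xF


# Two different segment:offset pairs alias the same physical byte.
print(hex(phys_8086(0x1234, 0x0005)), hex(phys_8086(0x1000, 0x2345)))  # 0x12345 0x12345
print(far_ptr(0x12345) == (0x1234, 0x5))                                 # True
```

On a linear machine, the equivalent of phys_8086 is the identity function. Pointer comparison and array indexing need none of this segment bookkeeping.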

Regards,
Michael Goldman