Asynchronous I/O under UNIX

Larry McVoy lm at snafu.Sun.COM
Fri Dec 29 18:35:46 AEST 1989


peterson at crash.cts.com (John Peterson) writes:
>   My colleagues and I have worked out a rough sketch of a way of doing
>asynchronous I/O. One would fork off a copy of your process, the child
>would 'nap' until an I/O request came from the parent. Upon receipt of
>an I/O request, the child goes off and issues a synchronous I/O request
>like one ordinarily does, and then sets a flag of some sort when the I/O
>has completed. The data to be moved would be stored in memory accessible
>to the parent and child processes, probably using System V shared memory.

Yeah, this will work.  A couple things to note:

(1) This is a bad idea for writes, especially under SunOS 4.x.  See
    (2), (3), (4) below.

    It's a great idea for reads.  Especially if you do it right.  I would
    keep a pool of processes around - i.e., don't do a fork per read,
    do a fork iff you haven't got someone hanging around (forks are not
    cheap, contrary to popular opinion).  Also, let read-ahead work for
    you.  Oh, yeah, do yourself a favor and valloc() your buffers rather
    than allocating space off the stack.  It won't help you now, but
    I'm looking at ways of making I/O go fast and one game I can play
    will only work if you give me a page-aligned buffer.  And use mmap()
    if you can.  It's much nicer than System V shm and it's in 5.4.

(2) Writes are already async, especially so on SunOS 4.x.  I think the
    amount outstanding is limited by segmap, which is around 4 MB.  On
    buffer cache Unixes, you'll be limited to the size of the buffer
    cache (no kidding), which is fairly small, around 10-20% of memory.

(3) Having lots of outstanding writes doesn't buy you very much.  In fact,
    it can really lead to weird behavior.  Everyone should know that (on
    simple controllers, at least) writes go through disk sort, including
    synchronous writes (NFS is a heavy user of sync writes).  Given
    that you go through disk sort, you won't ever get to starvation (i.e.,
    a buffer will always get written out eventually), but you can get to
    something I call being very hungry.  Suppose you have a disk queue that
    starts out with requests for cyl 0 and 100.  Then suppose you do a
    series of writes onto cyls >= 0 but < 100.  The buffer waiting for cyl
    100 will wait until all of those I/Os (that came in after it did)
    complete.  That buffer waiting for 100 is in the "hungry" state.

    Fortunately, this doesn't happen very often.  Traces I've taken indicate
    that disk requests (due to the BSD fs) are nicely grouped.  You have to
    have lots of busy processes doing unrelated I/O to get into this state.
    I suspect the proposed async I/O scheme could hit this problem.
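
To make the "hungry" case in (3) concrete, here is a toy model in C of a
one-way elevator sort.  It is only a sketch under stated assumptions:
hungry_wait(), MAXQ, and the pacing of one new arrival per completed
request are all made up for illustration, and this is not the kernel's
disk sort code.  The queue starts with requests for cyl 0 and `target`;
later arrivals that land at or ahead of the head get serviced in the
current sweep, ahead of the older request for `target`.

```c
#include <assert.h>
#include <stddef.h>

#define MAXQ 64   /* arbitrary queue capacity for the toy model */

/* Returns how many requests are serviced before the request for `target`.
 * Cylinders must be non-negative; arrivals[] holds later requests in
 * arrival order, one arriving after each completed request. */
int hungry_wait(int target, const int *arrivals, size_t n_arrivals)
{
    int queue[MAXQ];
    size_t qlen = 0, next = 0, tgt;
    int head = 0, serviced = 0;

    assert(target >= 0 && n_arrivals + 2 <= MAXQ);
    queue[qlen++] = 0;
    tgt = qlen;                     /* index of the request we are watching */
    queue[qlen++] = target;

    for (;;) {
        /* Pick the lowest cylinder at or ahead of the head. */
        size_t best = qlen;
        for (size_t i = 0; i < qlen; i++)
            if (queue[i] >= head && (best == qlen || queue[i] < queue[best]))
                best = i;
        if (best == qlen) {         /* nothing ahead: start a new sweep */
            head = 0;
            continue;
        }
        if (best == tgt)
            return serviced;

        head = queue[best];
        queue[best] = queue[--qlen];   /* remove the serviced request */
        if (tgt == qlen)               /* watched request filled the hole */
            tgt = best;
        serviced++;

        if (next < n_arrivals)         /* one new request arrives */
            queue[qlen++] = arrivals[next++];
    }
}
```

With arrivals at cyls 10, 20, 30, 40, 50 and target 100, the request for
cyl 100 is serviced after 6 others: cyl 0 plus all five requests that
arrived after it.  An arrival behind the head (say cyl 5 when the head is
at 50) waits for the next sweep and does not delay the target.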

(4) Those outstanding writes cost memory.  You have to grab the user's data
    before saying "I'm done".  SunOS 4.x claims this is a feature: "Our
    writes finish faster than your writes, especially for big ones" seems
    to be the party line.  Well, for what I do this is a waste of memory,
    so I run a hacked version of ufs that limits outstanding writes
    (mail me if you have src and want to try this - it's trivial to
    implement and tunable.  I'd be interested in outside comments).

(5) Reads could work really well.
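
For the read side, a minimal sketch of one helper process might look like
the following.  Everything named here (struct reader, struct read_req,
reader_start(), reader_submit(), reader_wait(), reader_stop()) is
hypothetical, and it makes some assumptions: the file is already open
before the fork, pipes stand in for the "flag of some sort", and the
shared buffer comes from an anonymous shared mmap() (MAP_ANONYMOUS is
widely available but not in every Unix), which is page aligned as a side
effect.  A pool would be an array of these, forked once and reused.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS and pread() on glibc */
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct read_req { off_t off; size_t len; };   /* hypothetical request record */

struct reader {
    pid_t  pid;       /* helper process */
    int    req_fd;    /* parent writes requests here */
    int    done_fd;   /* parent reads completion results here */
    char  *buf;       /* page-aligned buffer shared with the helper */
    size_t buf_len;
};

/* Helper: sleep in read() until a request arrives, do a plain blocking
 * pread() into the shared buffer, then report the byte count. */
static void helper_loop(int fd, int req_fd, int done_fd,
                        char *buf, size_t buf_len)
{
    struct read_req rq;

    while (read(req_fd, &rq, sizeof rq) == (ssize_t)sizeof rq) {
        size_t len = rq.len < buf_len ? rq.len : buf_len;
        ssize_t n = pread(fd, buf, len, rq.off);
        if (write(done_fd, &n, sizeof n) != (ssize_t)sizeof n)
            break;
    }
    _exit(EXIT_SUCCESS);
}

/* Fork one helper for an already-open fd.  Returns 0 on success, -1 on error. */
int reader_start(struct reader *r, int fd, size_t buf_len)
{
    int req[2], done[2];

    assert(r != NULL && buf_len > 0);
    r->buf = mmap(NULL, buf_len, PROT_READ | PROT_WRITE,
                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (r->buf == MAP_FAILED)
        return -1;
    if (pipe(req) < 0)
        goto fail_map;
    if (pipe(done) < 0)
        goto fail_req;
    if ((r->pid = fork()) < 0)
        goto fail_done;
    if (r->pid == 0) {                      /* child: never returns */
        close(req[1]);
        close(done[0]);
        helper_loop(fd, req[0], done[1], r->buf, buf_len);
    }
    close(req[0]);
    close(done[1]);
    r->req_fd = req[1];
    r->done_fd = done[0];
    r->buf_len = buf_len;
    return 0;

fail_done:
    close(done[0]); close(done[1]);
fail_req:
    close(req[0]); close(req[1]);
fail_map:
    munmap(r->buf, buf_len);
    return -1;
}

/* Queue a read; returns immediately.  One outstanding request per helper. */
int reader_submit(struct reader *r, off_t off, size_t len)
{
    struct read_req rq = { off, len };
    return write(r->req_fd, &rq, sizeof rq) == (ssize_t)sizeof rq ? 0 : -1;
}

/* Block until the helper finishes; returns pread()'s result. */
ssize_t reader_wait(struct reader *r)
{
    ssize_t n;
    return read(r->done_fd, &n, sizeof n) == (ssize_t)sizeof n ? n : -1;
}

/* Closing the request pipe gives the helper EOF, which ends its loop. */
void reader_stop(struct reader *r)
{
    close(r->req_fd);
    close(r->done_fd);
    waitpid(r->pid, NULL, 0);
    munmap(r->buf, r->buf_len);
}
```

The parent calls reader_submit(), goes off and computes, and calls
reader_wait() when it actually needs the data in r->buf.  Since there is
only one buffer per helper, a second submit before the wait would
overwrite the first result; that is where the pool comes in.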

	 What I say is my opinion.  I am not paid to speak for Sun.

Larry McVoy, Sun Microsystems                          ...!sun!lm or lm at sun.com



More information about the Comp.unix.wizards mailing list