O_SYNC and filesystem updating

rcodi at yabbie.UUCP
Thu Feb 12 13:50:16 AEST 1987


In article <12946 at sun.uucp>, guy%gorodish at Sun.COM (Guy Harris) writes:
> No.  The only thing it does is guarantee 1) that all writes to the
> data blocks of the file are done synchronously, 2) all writes of
> indirect blocks of the file are done synchronously, and 3) the inode
> is updated synchronously after every write.

This O_SYNC feature sounds like humongous overkill, and not
very well thought out.  I'll bet it's there because it could be added
to most kernels in less than half an hour (basically by changing
calls to bdwrite() into calls to bwrite() if O_SYNC is set).

It is not sensible to ensure that *every* write is synchronously updated
to disk.  Doing so would incur enormous disk overhead.  Picture
doing a write of 2 bytes here and then 3 bytes there -- both in the same
disk block -- it requires 2 disk writes to do it.  If it wasn't important
that the 2 bytes be synchronously written before the 3 bytes, then why
go to the effort to do it?  If it was important, then fine, we must live
with it.  The thing is that most of the time it *isn't* necessary.
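
To make the two-write case concrete, here is a minimal sketch in C.
The function name, the file path argument and the offset of 100 are
made up for illustration; the only assumption about the kernel is the
behaviour Guy describes above, i.e. that each write() on an O_SYNC
descriptor is pushed to disk before the call returns.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write 2 bytes, then 3 bytes, into the same disk
 * block of a file opened with O_SYNC.  Returns 0 on success, -1 on error.
 */
int two_small_sync_writes(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);

    if (fd < 0)
        return -1;

    /* First write: 2 bytes at offset 0.  With O_SYNC the block (and
       the inode) go to disk before write() returns. */
    if (write(fd, "ab", 2) != 2)
        goto fail;

    /* Second write: 3 bytes at offset 100, which falls in the same
       block for any block size of 512 bytes or more.  The same block
       is written to disk a second time. */
    if (lseek(fd, 100L, SEEK_SET) == (off_t)-1)
        goto fail;
    if (write(fd, "xyz", 3) != 3)
        goto fail;

    return close(fd);

fail:
    close(fd);
    return -1;
}
```

Calling two_small_sync_writes("demo.tmp") leaves a 103-byte file
behind, and on a kernel that behaves as described it should have cost
two synchronous writes of one block to get there.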

For an application that requires the disk image to be correct only
at "certain times", 4BSD's fsync() call is much more sensible, and
only incurs overhead when you call it -- you can do an unlimited
number of writes to the buffer cache between fsync() calls, and
your application will fly during that time.  The data is physically
written to disk only when update does a sync(), when the system
needs the buffers for something else, or when the user calls
fsync().
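
A rough sketch of the fsync() style, for comparison.  The function
name, the record contents and the record count are invented for the
example; the point is only that the writes go through the buffer
cache and a single fsync() marks the moment at which the disk image
has to be correct.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write nrecords small records through the buffer
 * cache, then force them to disk with one fsync().
 * Returns 0 on success, -1 on error.
 */
int write_batch_then_fsync(const char *path, int nrecords)
{
    static const char record[] = "record\n";      /* 7 bytes of data */
    const size_t len = sizeof record - 1;         /* skip the NUL */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    int i;

    if (fd < 0)
        return -1;

    /* No O_SYNC here, so these writes need not wait for the disk. */
    for (i = 0; i < nrecords; i++) {
        if (write(fd, record, len) != (ssize_t)len)
            goto fail;
    }

    /* The one point where we insist that the data be on disk. */
    if (fsync(fd) != 0)
        goto fail;

    return close(fd);

fail:
    close(fd);
    return -1;
}
```

With nrecords set to 1000 this makes 1000 write() calls but asks for
the disk only once, whereas the same loop on an O_SYNC descriptor
would wait for the disk on every call.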

I suppose that you could open the file twice with SVR[23], once
with O_SYNC and once without, to achieve a similar effect, but then you
have all sorts of problems if you use a buffered package such as stdio
to do I/O on them, unless you use the O_SYNC fd for just flushing
blocks (even that won't work properly unless you rewrite all the
blocks you wrote with the non-O_SYNC fd).

Ian D.


