Recovery possible from signal-aborted write(2)?

Wed Sep 26 14:23:22 AEST 1984

> Using systems 3 or 5, is there any way of deterministically restarting
> output to a tty that has been interrupted by a signal?
> 
> Scenario:
> 	Screen managing type program generating lots of escape sequences
> 	also is subject to receiving several signals a minute.  The
> 	program wants to buffer its output for efficiency, so it does
> 	write(2)s of say 50-100 characters at a time.  When it receives
> 	a signal, however, the write returns a -1, rather than indicating
> 	the number of characters actually output.  This makes it very
> 	hard to guarantee screen integrity while buffering output.
> 
> One obvious solution is to forgo buffering.  Any other suggestions?

You could disable interrupts, but that's all you can do in System III or
System V.  4.1BSD, 4.2BSD, and several other systems have added a "hold"
action to "signal" (yes yes, I know, 4.2 actually replaced the whole
signal mechanism) like the "ignore" action, except that it "holds" any
signals that come in while that action is on rather than discarding them.
Then, when the action is changed back to "catch", the signal will come
through.  Thus, you just "hold" the interrupts while the screen is
being painted.
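
A minimal sketch of the "hold" idea in C follows.  It is only an
illustration under stated assumptions: it uses the later POSIX
sigprocmask() call as a stand-in for the "hold" action described above,
not the 4.1BSD or 4.2BSD interfaces themselves, and the names
on_interrupt, catch_interrupts and paint_held are made up for the
example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Count of SIGINTs seen by the handler. */
static volatile sig_atomic_t interrupts_seen = 0;

static void on_interrupt(int signo)
{
    (void)signo;
    interrupts_seen++;
}

/* Hypothetical setup: catch SIGINT with on_interrupt. */
static int catch_interrupts(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_interrupt;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Hypothetical helper: write one buffered screen update with SIGINT held. */
static int paint_held(int fd, const char *buf, size_t len)
{
    sigset_t hold, previous;
    ssize_t n;

    assert(buf != NULL);
    sigemptyset(&hold);
    sigaddset(&hold, SIGINT);

    /* "Hold": a SIGINT arriving from here on stays pending, not discarded. */
    if (sigprocmask(SIG_BLOCK, &hold, &previous) < 0)
        return -1;

    n = write(fd, buf, len);

    /* Back to the old state: a pending SIGINT is delivered now, unless
     * the caller's own mask was already holding it. */
    if (sigprocmask(SIG_SETMASK, &previous, NULL) < 0)
        return -1;

    return (n == (ssize_t)len) ? 0 : -1;
}
```

The write can no longer be cut short by SIGINT, at the price of the
handler running only after the update is out.  Other signals would need
adding to the held set in the same way.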

However, if you actually want to catch the signals while the write is
occurring, no common UNIX I know of lets you do this.

The problem here is that signals were originally intended as "traps";
they indicated that some "error" had occurred and that the program should
stop what it's doing and quit (the user hitting their interrupt key is
considered an "error" of this sort).  Then they were shanghaied into
service as software interrupts; unfortunately, they don't work well
as software interrupts.  For one thing, you can't continue an interrupted
system call; 4.xBSD will *restart* an interrupted system call that never
got started, but if the "write", say, had already written some data it
says "the hell with it" and just aborts it.  For another, you can't defer
them inside a critical region (except with the aforementioned "hold"
mechanism).  And, of course, when a signal comes in, the signal action
is reset to the default, which usually blows the process away; this means
that if a second signal comes in before the handler has re-installed
itself, the process will simply (and mysteriously) die.  (For fun, if
your interrupt key is a quickly
repeating key on your terminal, try holding it down while at the shell
level; unless you're on 4.2 or some other system that doesn't do this
reset, you stand a good chance of getting logged out as the shell gets
blown away by a SIGINT before it gets a chance to reset the signal handler.)
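
The reset-to-default behaviour can be seen in a small C sketch.  It is
an illustration only: it assumes the later POSIX sigaction() interface,
whose SA_RESETHAND flag asks for the same one-shot behaviour, and the
names catcher, catch_once and is_default are invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Number of times the handler has run. */
static volatile sig_atomic_t caught = 0;

static void catcher(int signo)
{
    (void)signo;
    caught++;
}

/* Install catcher so that the action reverts to the default on delivery. */
static int catch_once(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = catcher;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;   /* one-shot: back to SIG_DFL when caught */
    return sigaction(signo, &sa, NULL);
}

/* Return 1 if the action for signo is the default, 0 if not, -1 on error. */
static int is_default(int signo)
{
    struct sigaction cur;

    assert(signo > 0);
    if (sigaction(signo, NULL, &cur) < 0)
        return -1;
    return cur.sa_handler == SIG_DFL;
}
```

After catch_once(SIGUSR1) and one delivery of SIGUSR1, is_default(SIGUSR1)
reports the default action again, so a second SIGUSR1 arriving before the
program calls catch_once() a second time would get the default treatment
and kill the process.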

Our office automation system has already run into this problem; if the
user hits their interrupt key while the screen's being painted, they can
lose.  So it is a real problem; the "hold" mechanism will do OK for us
(we only get interrupts from the keyboard, so deferring them while the
screen is painted is no problem) but it may not work for everybody.

Having the "write" return the number of characters actually written out
seems good offhand.  The same rule for "read" has a snag, however: if an
interrupted "read" had moved 0 bytes, its return value of 0 could be
mistaken for an EOF.  Perhaps this one would have to be special-cased.
None of the other "slow" system calls have this problem, as far as I
know.
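
If "write" did behave that way, the caller could pick up where the
interrupted call left off.  A minimal sketch in C, assuming an
interrupted write reports the bytes it did transfer and fails with EINTR
only when it transferred none (the behaviour wished for above, not what
System III or System V does); write_all is a made-up name:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep calling write(2) until all len bytes are out.
 * Returns 0 on success, -1 on an error other than an interruption.
 */
static int write_all(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    assert(buf != NULL);
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* nothing went out; retry the same span */
            return -1;          /* real error */
        }
        done += (size_t)n;      /* resume just past what was written */
    }
    return 0;
}
```

With this, a buffered run of escape sequences is either written out in
full, in order, or the caller gets a real error, so screen integrity no
longer depends on the write going uninterrupted.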

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy