A possible network bug in Sun unix?

Thomas Narten narten at purdue.EDU
Fri Dec 19 01:01:07 AEST 1986


This may be a feature of Sun UNIX, but is probably not restricted to it.
It is caused by two problems:

1) Unix has a keepalive option on sockets that times out (breaks)
connections if the peer in the connection goes away. For TCP, "going
away" is defined as not having received any packets from the peer in X
amount of time. Rlogind uses this option (a sketch follows this list).

2) Sun diskless machines reboot much more quickly than normal Unix
machines, because they don't have large disks for fsck to churn away
on. In particular, they are back up and running before old connections
have timed out due to (1). 
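For reference, (1) is just the SO_KEEPALIVE socket option; something
like the following is all a daemon such as rlogind has to do to turn it
on (the function itself is mine, for illustration only):

	#include <sys/types.h>
	#include <sys/socket.h>
	#include <stdio.h>

	/*
	 * Enable keepalives on an already-connected socket s, so the
	 * kernel will probe (and eventually break) the connection if
	 * the peer silently goes away.
	 */
	int
	enable_keepalive(s)
		int s;
	{
		int on = 1;

		if (setsockopt(s, SOL_SOCKET, SO_KEEPALIVE,
		    (char *)&on, sizeof (on)) < 0) {
			perror("setsockopt (SO_KEEPALIVE)");
			return (-1);
		}
		return (0);
	}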

(1) is implemented by running a timer that expires whenever no packets
have been exchanged for a certain period of time. When the timer
expires, TCP sends a one-byte data segment that is outside of its send
window (i.e., it already has an ACK for that sequence number). The peer
TCP, on receiving the segment, notes that it already has the data and
sends back an ACK for the sequence number that it expects to see. The
client TCP gets that ACK and resets its timer, noting that the
connection is still alive. The connection eventually breaks if no ACKs
are received.
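In outline, the timer side looks something like this. It is only a
sketch of the behavior described above, not the actual 4.2/4.3 timer
code; keepalive_expired(), send_keepalive_probe(), and MAX_KEEP_IDLE
are made up for illustration:

	/*
	 * Sketch of what happens when the keepalive timer for a
	 * connection (tcpcb *tp) fires.
	 */
	void
	keepalive_expired(tp)
		struct tcpcb *tp;
	{
		if (tp->t_idle >= MAX_KEEP_IDLE) {
			/* No ACKs for too long: break the connection. */
			tcp_drop(tp, ETIMEDOUT);
			return;
		}
		/*
		 * Probe the peer with a segment whose sequence number
		 * has already been ACKed, i.e. outside our send window.
		 * A live peer already has that data, so it simply sends
		 * back an ACK of the sequence number it expects next,
		 * which (on the receiving side) re-arms this timer.
		 */
		send_keepalive_probe(tp);
		tp->t_timer[TCPT_KEEP] = TCPTV_KEEP;
	}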

This works just fine as long as both TCPs are still there, or if one
end of the TCP connection goes away in the sense that the host is
unreachable. On the other hand, if one machine crashes and reboots
quickly, the following occurs:

The client TCP sends a keepalive packet, which the peer TCP receives.
Now, however, there is no protocol control block for that connection,
so the peer TCP sends back a RESET. The client TCP receives the
packet, updates its keepalive timer (hmm... I got a packet, the
connection must still be fine), then checks the sequence numbers that
were ACKed. The ACK is outside of its receive window and there was no
data sent in the segment, so TCP drops the packet, ignoring the RESET.
(This follows the TCP spec.)
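The order of events on the receiving side is roughly the following.
Again, this is just a sketch of the logic described above, not verbatim
kernel source; in_receive_window() is a made-up helper, and ti stands
for the incoming segment header as in the patch below:

	/*
	 * Any arriving segment refreshes the keepalive timer --
	 * including the RST we are about to throw away.
	 */
	tp->t_timer[TCPT_KEEP] = TCPTV_KEEP;

	/*
	 * Acceptability test: per the description above, the RST
	 * answering the keepalive probe falls outside the receive
	 * window and carries no data, so the whole segment is
	 * dropped here, before the RST bit is ever looked at.
	 */
	if (!in_receive_window(tp, ti) && ti->ti_len == 0)
		goto drop;

	if (tiflags & TH_RST)	/* never reached for the probe's RST */
		goto close_and_drop;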

>Since the other end of the rlogin will stick around until some I/O
>forces it to recognize the connection is broken (we just cat'ed to the
>pty on the remote system and it closed),

This results from the RESET being ignored, since it is not within the
receive window. If you force the TCP to send real data, the ACK that
gets returned will be within the receive window, and the RESET causes
the connection to break.

One workaround is to change the line in tcp_input(...):
	tp->t_timer[TCPT_KEEP] = TCPTV_KEEP;
to something like:
	if ((tiflags&TH_RST) == 0)
		tp->t_timer[TCPT_KEEP] = TCPTV_KEEP;
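In context, the change just refuses to treat an incoming RST as proof
that the peer is alive; the surrounding code ends up looking roughly
like this (the neighboring lines are approximate, not verbatim source):

	/*
	 * Segment received on connection.  Reset idle time, but
	 * refresh the keepalive timer only for non-RST segments,
	 * so a RST from a rebooted peer cannot keep a dead
	 * connection alive indefinitely.
	 */
	tp->t_idle = 0;
	if ((tiflags&TH_RST) == 0)
		tp->t_timer[TCPT_KEEP] = TCPTV_KEEP;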

This will cause the connection to eventually time out. Both 4.2 and 4.3
BSD suffer from this problem.

>1) You rlogin from your sun workstation (Sun-3/50 in this case) to another
>   system on the network.
>2) Your sun workstation crashes.
>3) After rebooting you try to rlogin to the same other system again and
>   you can't even after multiple tries.

I tried to duplicate your behavior on our Sun machines running NFS3.2,
trying to connect to 4.2, 4.3, and NFS3.0 machines. I don't have a 3.0
machine handy that I can crash at will. I would rlogin to host A,
reboot the workstation, and rlogin to A again. Each time, I was able
to rlogin successfully. Each connection used the same port numbers.
Note that under normal conditions, the following packet exchange takes
place:

A				B
send SYN, SEQ=n, ACK=0		(still thinks the old connection
				is established)
				gets SYN, sends back ACK=m, SEQ=o
gets ACK, notices sequence
number is not what it
expects & replies with:
RESET, SEQ=m, ACK=0
				gets RESET, drops connection, and
				sends back RESET, ACK=m, SEQ=o

At this point the "old" rlogin has gone away, and the next SYN will
cause the connection to become established properly. 

I suppose that things could break if the sequence number chosen by A
was the same as B was expecting, but that would be an awful
coincidence. It is the case, however, that when a machine reboots, it
starts with an initial sequence number of 0. If your machine crashes
several times in quick succession, it is possible that the sequence
numbers on the peer connection could also be very low. Still, I find
it hard to believe that this is the cause of the problem.

Do you have any way of determining what sequence numbers are involved
in the connections, or what sort of packets are floating around for the
connection in question?

Thomas Narten
narten at purdue.EDU or {ihnp4, allegra}!purdue!narten
