Purge 4.2BSD TCP tasks in FIN_WAIT_2

Wed Apr 17 12:24:45 AEST 1985

      It is legitimate for a TCP connection to be in FIN_WAIT_2 forever! 
There is no idle traffic on a TCP connection.  When a connection is 
in the ESTABLISHED state at both ends, and nothing is going on (as with
an idle TELNET connection) no packets are exchanged.  So if one end goes
down in this situation, the other end never hears about it unless some
traffic is generated on the connection.  Jon Postel, the ARPANET protocol 
czar in the days before the Defense Communications Agency took over the
responsibility for the standards, has taken the position that
no idle traffic should be generated in TCP and that TCP should have no
idle timeout; if a host can't handle a large number of half-dead connections,
that's tough.  He thinks that idle traffic, if any, should be generated
in the applications layer.
      FIN_WAIT_2 is the state your end is in when it has closed (sent its FIN),
the other end has acknowledged that FIN, but the other end has not yet closed.  
Closing in TCP is separate for read and write; you can close your write pipe and
continue to read in some implementations.  (I don't know if 4.2BSD is one
of these.)  One way to get into this situation legitimately is to start
a long job remotely via TELNET, then close the TELNET connection at your end.
This indicates that you want to receive data from your remote job and log out
when the remote job finally completes.  In this situation, you can 
still receive output but cannot send any more data.  If the remote
machine crashes while you are in this situation, you will be hung in 
FIN_WAIT_2 forever.
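For readers who want to see a half-close in action, here is a minimal sketch using today's Python socket module over loopback. It is a modern API, not a claim about what 4.2BSD does. The client sends a command, closes its write side with shutdown(SHUT_WR), and keeps reading output until the server's FIN arrives. Between those two points the client's end sits in FIN_WAIT_2. The names half_close_demo and remote_job are hypothetical.

```python
import socket
import threading

def half_close_demo() -> bytes:
    """Client closes its write side, then keeps reading the server's output."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def remote_job() -> None:
        conn, _ = listener.accept()
        with conn:
            # Drain input until the client's FIN arrives (recv returns b"").
            while conn.recv(1024):
                pass
            # The client can no longer send, but it can still receive output.
            conn.sendall(b"job output\n")
        # Leaving the with-block closes conn, which sends our FIN.

    server = threading.Thread(target=remote_job)
    server.start()

    client = socket.create_connection(("127.0.0.1", port))
    with client:
        client.sendall(b"start long job\n")
        client.shutdown(socket.SHUT_WR)   # send FIN: our write side is closed
        received = b""
        while True:
            chunk = client.recv(1024)
            if not chunk:                  # server's FIN: the other end closed
                break
            received += chunk
    server.join()
    listener.close()
    return received

print(half_close_demo())
```

If the server host crashed after acknowledging the client's FIN but before sending its own, the client's read loop would block indefinitely, which is exactly the hang described above.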
     This is a tough one.  In the hung-in-ESTABLISHED case, a local attempt
to send anything will start the retransmission mechanism, which will detect
the failure within a minute or two.  But if you are hung in FIN_WAIT_2,
there is nothing your end can do to probe the connection.  You've closed,
so you are forbidden to send any data; you have to sit there and wait for
a FIN that may never come.  (Your end can abort the connection, of course,
but the application has to do that; TCP isn't entitled to do so.)
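A toy state machine makes the trap concrete. This is a simplified sketch of the active-close path loosely following the RFC 793 state diagram; the simultaneous-close path through CLOSING and all timers are deliberately left out, and the event names (app_close, ack_of_fin, peer_fin, app_abort, tick) are hypothetical. The point it shows: once in FIN_WAIT_2, only a FIN from the peer or an abort by the application changes the state; the passage of time does nothing.

```python
from enum import Enum, auto

class State(Enum):
    ESTABLISHED = auto()
    FIN_WAIT_1 = auto()
    FIN_WAIT_2 = auto()
    TIME_WAIT = auto()
    CLOSED = auto()

# Simplified active-close transitions; CLOSING and timers are omitted.
TRANSITIONS = {
    (State.ESTABLISHED, "app_close"): State.FIN_WAIT_1,  # we send our FIN
    (State.FIN_WAIT_1, "ack_of_fin"): State.FIN_WAIT_2,  # peer ACKs our FIN
    (State.FIN_WAIT_2, "peer_fin"): State.TIME_WAIT,     # peer finally closes
}

def step(state: State, event: str) -> State:
    if event == "app_abort":
        return State.CLOSED  # only the application may abort the connection
    # Any other event (e.g. a clock "tick") leaves the state unchanged.
    return TRANSITIONS.get((state, event), state)

def run(events) -> State:
    state = State.ESTABLISHED
    for event in events:
        state = step(state, event)
    return state

# Peer crashes right after ACKing our FIN: ticks alone never get us out.
print(run(["app_close", "ack_of_fin"] + ["tick"] * 1000).name)  # FIN_WAIT_2
print(run(["app_close", "ack_of_fin", "peer_fin"]).name)        # TIME_WAIT
print(run(["app_close", "ack_of_fin", "app_abort"]).name)       # CLOSED
```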
     The typical work-around here is to close idle TCP connections after
some huge timeout, such as one day.  Whatever timeout is chosen must be longer
than the longest legitimate idle connection; if it were, for example, ten
minutes, idle TELNET connections would be logged out after ten minutes.  Given
the present standards, it's hard to do better than this.  4.2BSD is probably
doing something like this. 
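The huge-timeout work-around amounts to a periodic sweep over the connection table. Below is a minimal sketch, assuming each connection records the time of its last segment sent or received; Conn, purge_idle, and IDLE_LIMIT are hypothetical names, and the one-day limit is just the example figure above. It only reports which connections to abort; it says nothing about how 4.2BSD or any real stack does it.

```python
from dataclasses import dataclass

IDLE_LIMIT = 24 * 60 * 60  # one day in seconds, the example timeout from the text

@dataclass
class Conn:
    name: str
    last_activity: float  # seconds; time of the last segment sent or received

def purge_idle(conns, now, limit=IDLE_LIMIT):
    """Return (kept, aborted): connections idle longer than `limit` are aborted."""
    kept, aborted = [], []
    for c in conns:
        if now - c.last_activity > limit:
            aborted.append(c)  # idle too long: the caller would abort these
        else:
            kept.append(c)
    return kept, aborted

now = 200_000.0
conns = [
    Conn("idle-telnet", now - 10 * 60),        # idle 10 minutes: legitimate
    Conn("hung-fin-wait-2", now - 25 * 3600),  # idle 25 hours: peer presumably gone
]
kept, aborted = purge_idle(conns, now)
print([c.name for c in kept], [c.name for c in aborted])
```

The trade-off is the one stated above: a ten-minute limit would also sweep away the idle TELNET session, so the limit has to exceed any legitimate idle period.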
     We have a one-hour idle timeout in our implementation (based on 3COM's
UNET) and send an empty ACK every 4 minutes to keep things alive.  But this
solution won't work generally unless everybody sends an empty ACK every
few minutes.  Still, it's a clean work-around.
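Why this only works if everybody does it can be seen in a small clock simulation. This is a sketch under stated assumptions, not the UNET code: it assumes the local idle timer is reset only by segments received from the peer, and that a cooperating peer sends its own empty ACK every 4 minutes. The names simulate, KEEPALIVE_INTERVAL, and IDLE_TIMEOUT are hypothetical. Our own keepalives go out either way, but they cannot stop our timer from firing against a peer that never sends any.

```python
KEEPALIVE_INTERVAL = 4 * 60  # send an empty ACK every 4 minutes
IDLE_TIMEOUT = 60 * 60       # abort after one hour with nothing received

def simulate(peer_sends_keepalives, duration, tick=60):
    """Step a clock in `tick`-second increments.

    Returns (aborted_at, acks_sent): the time in seconds at which the idle
    timeout fired (None if it never did) and how many empty ACKs we sent.
    """
    last_sent = 0
    last_received = 0
    acks_sent = 0
    for now in range(tick, duration + 1, tick):
        if peer_sends_keepalives and now % KEEPALIVE_INTERVAL == 0:
            last_received = now            # the peer's empty ACK arrives
        if now - last_received > IDLE_TIMEOUT:
            return now, acks_sent          # idle too long: abort
        if now - last_sent >= KEEPALIVE_INTERVAL:
            last_sent = now                # our own empty ACK goes out
            acks_sent += 1
    return None, acks_sent

print(simulate(True, 3 * 3600))   # (None, 45)
print(simulate(False, 3 * 3600))  # (3660, 15)
```

A silent peer that is crashed and one that is alive but simply never sends keepalives look identical here; both are cut off just after the hour.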

				John Nagle
				Ford Aerospace and Communications Corp.