RFS bug fixes #3

Todd Brunhoff toddb at tekcrl.UUCP
Tue Feb 18 12:38:40 AEST 1986


4 bug fixes follow.  The most important is one I did out of sheer
frustration:  when a server host goes down, the client kernel marks the
connection as closing and refuses to start up another connection until
all the processes which have used that host have exited.  I now think
that this is ridiculous requirement.  And from the hassles I have
received from users and those you have told me you get from your users,
I surmise that I am not alone in ridiculing my own implementation.

The fix simply entailed not waiting for processes using the connection
to exit.  The result is that any process that has open file descriptors
or a working directory on the remote host, suddenly loses them... but
then, they were lost anyway.  Most important, though, the connection will
restart (if the other host is up).  Two files changed in the kernel,
remote/rmt_io.c and remote/rmt_generic.c.  First, here are the changes
to remote/rmt_io.c:

***************
*** 11,17
   * may be copied, modified or used in any way, without fee, provided this
   * notice remains an unaltered part of the software.
   *
!  * $Header: rmt_io.c,v 2.3 86/02/04 10:26:43 toddb Exp $
   *
   * $Log:	rmt_io.c,v $
   * Revision 2.3  86/02/04  10:26:43  toddb

- --- 11,17 -----
   * may be copied, modified or used in any way, without fee, provided this
   * notice remains an unaltered part of the software.
   *
!  * $Header: rmt_io.c,v 2.4 86/02/17 17:43:21 toddb Exp $
   *
   * $Log:	rmt_io.c,v $
   * Revision 2.4  86/02/17  17:43:21  toddb
***************
*** 14,19
   * $Header: rmt_io.c,v 2.3 86/02/04 10:26:43 toddb Exp $
   *
   * $Log:	rmt_io.c,v $
   * Revision 2.3  86/02/04  10:26:43  toddb
   * Added a missing rmt_unuse() call when it is too soon to call a host
   * that has failed a connection.  Contributed by Dennis Rockwell @ csnet.

- --- 14,25 -----
   * $Header: rmt_io.c,v 2.4 86/02/17 17:43:21 toddb Exp $
   *
   * $Log:	rmt_io.c,v $
+  * Revision 2.4  86/02/17  17:43:21  toddb
+  * If the host is closing, remote_getconnection() goes right ahead and
+  * closes and then tries to reopen.  This provides a pretty good recover
+  * mechanism, except for the fact that current working directories are
+  * lost and open files are lost.
+  * 
   * Revision 2.3  86/02/04  10:26:43  toddb
   * Added a missing rmt_unuse() call when it is too soon to call a host
   * that has failed a connection.  Contributed by Dennis Rockwell @ csnet.
***************
*** 88,93
  		return(rp->r_openerr);
  	}
  	/*
  	 * We may already have a connection
  	 */
  	else if (rp->r_sock)

- --- 94,104 -----
  		return(rp->r_openerr);
  	}
  	/*
+ 	 * If it is closing, do it here and get it over with.
+ 	 */
+ 	else if (rp->r_close)
+ 		rmt_closehost(rp);
+ 	/*
  	 * We may already have a connection
  	 */
  	else if (rp->r_sock)
***************
*** 103,108
  		rp->r_failed = FALSE;
  	}
  	rp->r_opening = TRUE;
  
  	/*
  	 * pseudo loop to avoid labels...

- --- 114,120 -----
  		rp->r_failed = FALSE;
  	}
  	rp->r_opening = TRUE;
+ 	rp->r_close = FALSE;
  
  	/*
  	 * pseudo loop to avoid labels...
***************
*** 205,211
  	/*
  	 * get a connection.
  	 */
! 	if ((so = rp->r_sock) == NULL)
  		if (error = remote_getconnection(system)) {
  			(void) m_freem(*m);
  			return(error);

- --- 217,223 -----
  	/*
  	 * get a connection.
  	 */
! 	if ((so = rp->r_sock) == NULL || rp->r_close)
  		if (error = remote_getconnection(system)) {
  			(void) m_freem(*m);
  			return(error);

Changes to remote/rmt_generic.c:
***************
*** 11,17
   * may be copied, modified or used in any way, without fee, provided this
   * notice remains an unaltered part of the software.
   *
!  * $Header: rmt_generic.c,v 2.3 86/02/17 14:36:37 toddb Exp $
   *
   * $Log:	rmt_generic.c,v $
   * Revision 2.3  86/02/17  14:36:37  toddb

- --- 11,17 -----
   * may be copied, modified or used in any way, without fee, provided this
   * notice remains an unaltered part of the software.
   *
!  * $Header: rmt_generic.c,v 2.4 86/02/17 17:46:07 toddb Exp $
   *
   * $Log:	rmt_generic.c,v $
   * Revision 2.4  86/02/17  17:46:07  toddb
***************
*** 14,19
   * $Header: rmt_generic.c,v 2.3 86/02/17 14:36:37 toddb Exp $
   *
   * $Log:	rmt_generic.c,v $
   * Revision 2.3  86/02/17  14:36:37  toddb
   * Fix so that the kernel properly reacts to nonexistant hosts
   * on the generic mount point: sockargs() was being called in the

- --- 14,23 -----
   * $Header: rmt_generic.c,v 2.4 86/02/17 17:46:07 toddb Exp $
   *
   * $Log:	rmt_generic.c,v $
+  * Revision 2.4  86/02/17  17:46:07  toddb
+  * We no longer stop processing in isremote() if the connection is in
+  * the middle of closing.
+  * 
   * Revision 2.3  86/02/17  14:36:37  toddb
   * Fix so that the kernel properly reacts to nonexistant hosts
   * on the generic mount point: sockargs() was being called in the
***************
*** 441,448
  	}
  
  	/*
! 	 * We have a remote mount point.  However, it may be in the midst of
! 	 * shutting down from some error.  This mount point may also be
  	 * a "generic" mount point, in which case, we must figure out
  	 * the host.
  	 */

- --- 445,451 -----
  	}
  
  	/*
! 	 * We have a remote mount point.   This mount point may be
  	 * a "generic" mount point, in which case, we must figure out
  	 * the host.
  	 */
***************
*** 452,462
  			debug8("isremote: can't map path %s\n", path);
  			return(TRUE);
  		}
- - 	}
- - 	if (rp->r_close) {
- - 		debug8("isremote: host %s is closing\n", rp->r_mntpath);
- - 		u.u_error = ENOREMOTEFS;
- - 		return(TRUE);
  	}
  	u.u_error = EISREMOTE;
  	remote_sysindex = sysnum;

- --- 455,460 -----
  			debug8("isremote: can't map path %s\n", path);
  			return(TRUE);
  		}
  	}
  	u.u_error = EISREMOTE;
  	remote_sysindex = sysnum;

Dennis Rockwell @ csnet found one of the counting bugs in the RFS
bookkeepping.  I had forgotten a call to rmt_unuse() if it is not time
for a retry.  Here are the changes to the kernel file remote/rmt_io.c:

***************
*** 11,17
   * may be copied, modified or used in any way, without fee, provided this
   * notice remains an unaltered part of the software.
   *
!  * $Header: rmt_io.c,v 2.2 86/01/21 11:05:41 toddb Exp $
   *
   * $Log:	rmt_io.c,v $
   * Revision 2.2  86/01/21  11:05:41  toddb

- --- 11,17 -----
   * may be copied, modified or used in any way, without fee, provided this
   * notice remains an unaltered part of the software.
   *
!  * $Header: rmt_io.c,v 2.3 86/02/04 10:26:43 toddb Exp $
   *
   * $Log:	rmt_io.c,v $
   * Revision 2.3  86/02/04  10:26:43  toddb
***************
*** 14,19
   * $Header: rmt_io.c,v 2.2 86/01/21 11:05:41 toddb Exp $
   *
   * $Log:	rmt_io.c,v $
   * Revision 2.2  86/01/21  11:05:41  toddb
   * Missing argument to sleep() in rmt_getconnection().
   * 

- --- 14,23 -----
   * $Header: rmt_io.c,v 2.3 86/02/04 10:26:43 toddb Exp $
   *
   * $Log:	rmt_io.c,v $
+  * Revision 2.3  86/02/04  10:26:43  toddb
+  * Added a missing rmt_unuse() call when it is too soon to call a host
+  * that has failed a connection.  Contributed by Dennis Rockwell @ csnet.
+  * 
   * Revision 2.2  86/01/21  11:05:41  toddb
   * Missing argument to sleep() in rmt_getconnection().
   * 
***************
*** 92,98
  	 * If we have failed previously, it may be time to try again.
  	 */
  	else if (rp->r_failed) {
! 		if (rp->r_age > time.tv_sec)
  			return(rp->r_openerr);
  		rp->r_failed = FALSE;
  	}

- --- 96,103 -----
  	 * If we have failed previously, it may be time to try again.
  	 */
  	else if (rp->r_failed) {
! 		if (rp->r_age > time.tv_sec) {
! 			rmt_unuse(rp, system);
  			return(rp->r_openerr);
  		}
  		rp->r_failed = FALSE;
***************
*** 94,99
  	else if (rp->r_failed) {
  		if (rp->r_age > time.tv_sec)
  			return(rp->r_openerr);
  		rp->r_failed = FALSE;
  	}
  	rp->r_opening = TRUE;

- --- 99,105 -----
  		if (rp->r_age > time.tv_sec) {
  			rmt_unuse(rp, system);
  			return(rp->r_openerr);
+ 		}
  		rp->r_failed = FALSE;
  	}
  	rp->r_opening = TRUE;


Bill Sommerfeld found a bug in the kernel half of the name server.  If
some invalid host was implied by the pathname /net/host, the server properly
returned a null internet address, but the kernel still called sockargs()
to allocate space for it.  Here are the changes to the kernel
file remote/rmt_generic.c (my appologies, but you will probably have to do
these context diffs by hand, because they contain the code for #ifdef BSD4_3
which my installation script removes).

***************
*** 11,17
   * may be copied, modified or used in any way, without fee, provided this
   * notice remains an unaltered part of the software.
   *
!  * $Header: rmt_generic.c,v 2.2 86/01/02 13:52:32 toddb Exp $
   *
   * $Log:	rmt_generic.c,v $
   * Revision 2.2  86/01/02  13:52:32  toddb

- --- 11,17 -----
   * may be copied, modified or used in any way, without fee, provided this
   * notice remains an unaltered part of the software.
   *
!  * $Header: rmt_generic.c,v 2.3 86/02/17 14:36:37 toddb Exp $
   *
   * $Log:	rmt_generic.c,v $
   * Revision 2.3  86/02/17  14:36:37  toddb
***************
*** 14,19
   * $Header: rmt_generic.c,v 2.2 86/01/02 13:52:32 toddb Exp $
   *
   * $Log:	rmt_generic.c,v $
   * Revision 2.2  86/01/02  13:52:32  toddb
   * Ifdef'ed calls to sockargs() for the differences in 4.2 vs. 4.3.
   * 

- --- 14,25 -----
   * $Header: rmt_generic.c,v 2.3 86/02/17 14:36:37 toddb Exp $
   *
   * $Log:	rmt_generic.c,v $
+  * Revision 2.3  86/02/17  14:36:37  toddb
+  * Fix so that the kernel properly reacts to nonexistant hosts
+  * on the generic mount point: sockargs() was being called in the
+  * remotename() system call (case NM_NAMEIS) even if uap->name was null.
+  * Bill Sommerfeld (wesommer at athena.mit.edu).
+  * 
   * Revision 2.2  86/01/02  13:52:32  toddb
   * Ifdef'ed calls to sockargs() for the differences in 4.2 vs. 4.3.
   * 
***************
*** 328,333
  			m_free(m); /* free extra mbuf */
  			m = remote_ns.rn_name = NULL;
  		}
  #ifdef BSD4_3
  		error = sockargs(&m, uap->name, uap->namelen, MT_SONAME);
  #else BSD4_3

- --- 334,340 -----
  			m_free(m); /* free extra mbuf */
  			m = remote_ns.rn_name = NULL;
  		}
+ 		if (uap->name) {
  #ifdef BSD4_3
  		    error = sockargs(&m, uap->name, uap->namelen, MT_SONAME);
  #else BSD4_3
***************
*** 329,335
  			m = remote_ns.rn_name = NULL;
  		}
  #ifdef BSD4_3
! 		error = sockargs(&m, uap->name, uap->namelen, MT_SONAME);
  #else BSD4_3
  		error = sockargs(&m, uap->name, uap->namelen);
  #endif BSD4_3

- --- 336,342 -----
  		}
  		if (uap->name) {
  #ifdef BSD4_3
! 		    error = sockargs(&m, uap->name, uap->namelen, MT_SONAME);
  #else BSD4_3
  		    error = sockargs(&m, uap->name, uap->namelen);
  #endif BSD4_3
***************
*** 331,337
  #ifdef BSD4_3
  		error = sockargs(&m, uap->name, uap->namelen, MT_SONAME);
  #else BSD4_3
! 		error = sockargs(&m, uap->name, uap->namelen);
  #endif BSD4_3
  		if (error == 0) {
  			cp = mtod(m, char *) + m->m_len;

- --- 338,344 -----
  #ifdef BSD4_3
  		    error = sockargs(&m, uap->name, uap->namelen, MT_SONAME);
  #else BSD4_3
! 		    error = sockargs(&m, uap->name, uap->namelen);
  #endif BSD4_3
  		    if (error == 0) {
  			cp = mtod(m, char *) + m->m_len;
***************
*** 333,339
  #else BSD4_3
  		error = sockargs(&m, uap->name, uap->namelen);
  #endif BSD4_3
! 		if (error == 0) {
  			cp = mtod(m, char *) + m->m_len;
  			if (error = copyin((caddr_t)uap->path, (caddr_t)cp,
  			    MIN(R_MNTPATHLEN, MLEN - m->m_len))) {

- --- 340,346 -----
  #else BSD4_3
  		    error = sockargs(&m, uap->name, uap->namelen);
  #endif BSD4_3
! 		    if (error == 0) {
  			cp = mtod(m, char *) + m->m_len;
  			if (error = copyin((caddr_t)uap->path, (caddr_t)cp,
  			    MIN(R_MNTPATHLEN, MLEN - m->m_len))) {
***************
*** 341,346
  				m = NULL;
  			}
  			debug13("nmsrv: %s@%x\n", cp, m);
  		}
  		remote_ns.rn_name = m;
  		wakeup(&remote_ns.rn_name);

- --- 348,354 -----
  				m = NULL;
  			}
  			debug13("nmsrv: %s@%x\n", cp, m);
+ 		    }
  		}
  		remote_ns.rn_name = m;
  		wakeup(&remote_ns.rn_name);

Joe Othmer may have found the bug that caused a command like

	ls /host1/host2/file

to hang.  The sentry server (which does only nameserver functions for
"/net/...", and listens for connections) had the proper call to the
system call "remoteoff(NULL)" so that remote access is denied for the
server.  This sets the SNOREMOTE bit in the proc structure p_flag.
Unfortunately, the gateway server (forked by the sentry server) and
any of the other servers (forked by the gateway server), get this
bit cleared by a fork().  The effect was that the server was actually
allowing the hops to happen.  Here are the changes so that any server is
sure to have the SNOREMOTE bit set.  The changes go in the server
file remote/fileserver.c:

***************
*** 12,17
   * notice remains an unaltered part of the software.
   *
   * $Log:	fileserver.c,v $
   * Revision 2.0  85/12/07  18:21:14  toddb
   * First public release.
   * 

- --- 12,22 -----
   * notice remains an unaltered part of the software.
   *
   * $Log:	fileserver.c,v $
+  * Revision 2.1  86/02/17  15:11:03  toddb
+  * The SNOREMOTE bit is cleared in the children of the server, so a call
+  * to remoteoff(NULL) was added to server() and to become_server().
+  * Joe Othmer (vax135!jo).
+  * 
   * Revision 2.0  85/12/07  18:21:14  toddb
   * First public release.
   * 
***************
*** 16,22
   * First public release.
   * 
   */
! static char	*rcsid = "$Header: fileserver.c,v 2.0 85/12/07 18:21:14 toddb Rel $";
  #include <errno.h>
  #include <stdio.h>
  #include <ctype.h>

- --- 21,27 -----
   * First public release.
   * 
   */
! static char	*rcsid = "$Header: fileserver.c,v 2.1 86/02/17 15:11:03 toddb Exp $";
  #include <errno.h>
  #include <stdio.h>
  #include <ctype.h>
***************
*** 177,182
  	/*
  	 * Ok, now be a server!
  	 */
  	for(;;)
  	{
  		if (i_am_gateway && ! i_have_control)

- --- 182,188 -----
  	/*
  	 * Ok, now be a server!
  	 */
+ 	remoteoff(NULL);
  	for(;;)
  	{
  		if (i_am_gateway && ! i_have_control)
***************
*** 342,347
  	 * our pid, etc.  Also, try to dup stderr so that flock will work.
  	 * If we can't do it, we are in big trouble.
  	 */
  	current_ppid = current_pid;
  	current_pid = getpid();
  	if (i_am_gateway)

- --- 348,354 -----
  	 * our pid, etc.  Also, try to dup stderr so that flock will work.
  	 * If we can't do it, we are in big trouble.
  	 */
+ 	remoteoff(NULL);
  	current_ppid = current_pid;
  	current_pid = getpid();
  	if (i_am_gateway)


------- End of Blind-Carbon-Copy



More information about the Comp.sources.bugs mailing list