longjmp botches in sendmail on 4.3+NFS

Jeffrey I. Schiller jis at mit-trillian.MIT.EDU
Thu Dec 18 14:09:26 AEST 1986


	The problem is caused by the two nested setjmps. Basically
what happens is that smtpinit() sets up a timer to go off after five
minutes (if it doesn't get a greeting). It then calls reply(), which
ultimately calls sfgets(). sfgets() sets up its own timer (usually 2
hours) to go off if no data is received (i.e. you are in a collect and
no data comes in for 2 hours). The code in sfgets() does a setjmp,
sets a timer (which will do a longjmp) and does the read. If the read
completes, the timer is removed. HOWEVER, if the 5 minute timer goes
off in smtpinit(), the stack frame of sfgets() is abandoned with its
timer still active.
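
A minimal sketch of how the inner timer gets left armed. This is not
sendmail's code: outer_init(), inner_read() and the inner_timer_armed
flag are hypothetical stand-ins for smtpinit(), sfgets() and the timer
event queue, and the outer timeout is simulated by a direct longjmp
rather than by a real alarm. The stale timer is only detected here,
never fired, since firing it would jump into a dead stack frame.

```c
#include <setjmp.h>

/* Hypothetical stand-in for the event queue: is the inner timer armed? */
static int inner_timer_armed;

/* Context saved by the outer caller (models the setjmp in smtpinit). */
static jmp_buf outer_ctx;

/* Models sfgets(): arm a timer, do the "read", disarm on success. */
static int inner_read(int outer_timer_fires)
{
    jmp_buf inner_ctx;

    if (setjmp(inner_ctx) != 0)
        return -1;              /* where the inner timeout would land */

    inner_timer_armed = 1;      /* arm the long (2 hour) timer */

    if (outer_timer_fires)
        longjmp(outer_ctx, 1);  /* outer timeout: this frame is abandoned */

    inner_timer_armed = 0;      /* read completed: timer removed */
    return 0;
}

/* Models smtpinit(): returns 0 on success, 1 if the outer timeout fired. */
static int outer_init(int outer_timer_fires)
{
    if (setjmp(outer_ctx) != 0)
        return 1;               /* outer timeout path; nobody disarms the inner timer */

    inner_read(outer_timer_fires);
    return 0;
}

/* Nonzero if a timer is still armed after its owner has gone away. */
static int stale_timer_pending(void)
{
    return inner_timer_armed;
}
```

Calling outer_init(0) leaves nothing pending, while outer_init(1)
returns with stale_timer_pending() nonzero: the timer now refers to a
jmp_buf whose frame is gone.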

	Now if the same sendmail process is still around when that
timer goes off (i.e. two hours later), which will typically only
happen on LARGE mailing lists, the timer does a longjmp into a stack
frame that no longer exists and you get a longjmp botch.

	I found this bug a few weeks ago (with a mailing list of about
250 recipients). I fixed it by changing the code in smtpinit() to NOT
SET A TIMER, but to instead change the value of "ReadTimeout" (which
is the global variable that sfgets() uses to determine how long to
wait) to 5 minutes and then restore it later. Here is the comment in
my code:

	/*
	**  Get the greeting message.
	**	This should appear spontaneously.  Give it five minutes to
	**	happen.
	**
	**  JIS: We change the global variable ReadTimeout to be 5
	**      minutes. This variable is used by the low-level routine
	**      sfgets to determine how long to wait for input.
	**      When we get our greeting we return ReadTimeout to its
	**      previous state. IMPORTANT: The older code I replaced
	**      used a separate timeout (via a setjmp and longjmp)
	**      this LOSES REAL BIG if the 5 minute timeout goes off
	**      for then sfgets gets its stack unwound and leaves
	**      a lingering event that will eventually cause a longjmp
	**      to some ancient stack history, sendmail then dies horribly.
	**      This usually happens only when dealing with large mailing
	**      lists ("xpert" in this case > 200 recipients), which is
	**      the LAST place you want to dump core, for then the queue
	**      files are out of date and LOTS of people get a duplicate
	**      copy of the message that was in progress.
	*/
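
The change described in that comment could look roughly like the
sketch below. It is an illustration, not the actual patch: the type
and default value of ReadTimeout are assumed, and reply_stub() and
get_greeting() are hypothetical names standing in for reply() and the
relevant part of smtpinit().

```c
#include <time.h>

/* Assumed stand-in for sendmail's global; type and 2 hour default are guesses. */
static time_t ReadTimeout = 2 * 60 * 60;

/* Records the timeout the low-level reader would have used. */
static time_t timeout_seen_by_reader;

/* Hypothetical stub for reply(): the real one ends up in sfgets(),
** which consults ReadTimeout.  Here we only record the value. */
static int reply_stub(void)
{
    timeout_seen_by_reader = ReadTimeout;
    return 220;                 /* pretend we got an SMTP greeting */
}

/* Wait for the greeting using a single timer: shorten ReadTimeout,
** call the reader, then put the old value back. */
static int get_greeting(void)
{
    time_t saved = ReadTimeout;
    int code;

    ReadTimeout = 5 * 60;       /* five minutes for the greeting */
    code = reply_stub();
    ReadTimeout = saved;        /* restore for later reads */
    return code;
}
```

Only one timer (the one inside the reader) is ever armed, so there is
no outer longjmp that could abandon the reader's frame.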

	Btw, another unrelated bug, discovered just yesterday, is that
if you have a LARGE number of recipients at one destination (like
wiscvm or seismo) then syslog() may get called with a line longer
than 1024 characters... and blamo! core dump. This bug is really
in the syslog(3) routine, not in sendmail itself.
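
Until syslog(3) itself is fixed, one caller-side workaround is to
truncate the line before logging it. A minimal sketch, assuming a
conservative limit: LOG_LINE_MAX and clamp_log_line() are hypothetical
names, and 512 is an arbitrary bound chosen to stay well under the
1024 characters mentioned above (the real safe length depends on
syslog's internal buffer and whatever prefix it adds).

```c
#include <stddef.h>
#include <string.h>

/* Assumed conservative bound, including the terminating NUL. */
#define LOG_LINE_MAX 512

/* Copy msg into dst, truncating so that at most dstsize-1 characters
** plus a NUL are stored.  Returns the number of characters kept. */
static size_t clamp_log_line(char *dst, size_t dstsize, const char *msg)
{
    size_t n;

    if (dstsize == 0)
        return 0;               /* no room even for the NUL */

    n = strlen(msg);
    if (n >= dstsize)
        n = dstsize - 1;        /* truncate overlong lines */

    memcpy(dst, msg, n);
    dst[n] = '\0';
    return n;
}
```

The clamped buffer would then be passed on as
syslog(LOG_INFO, "%s", buf) instead of handing over the full
recipient list.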

			-Jeff


