sendmail abort while running the queue

dplatt at teknowledge-vaxc.UUCP
Fri Feb 20 06:50:13 AEST 1987


I'm running into a strange sendmail abort and haven't been able to pin
it down... can anybody give me a hint?

The situation is as follows: I'm on a Sun 3/52 workstation, running
SunOS 3.2.  My sendmail daemon has been invoked with the "-bd -q15m"
options.  Occasionally, the queue-running daemon aborts (the forked
child, not the parent).

The conditions appear to be the following:

1) I've sent a message to a host that is down or unreachable.

2) sendmail has made repeated tries to deliver the message.

3) The system has been up (without a reboot) for at least a day.

4) The abort typically occurs between midnight and 8 AM.

The symptoms appear to be:

a) sendmail aborts quietly and dumps core; it doesn't write a message
   to the system log or to the console.

b) the "d" (data) file of the undelivered mailgram remains in the
   /usr/spool/mqueue directory.

c) The "l" (lock) file is apparently being left in the mqueue directory,
   as the next queue run generates an "id: locked" message in the
   syslog.  This happens only once, though... the queue run 30 minutes
   after the abort does not report "id: locked", so it appears that
   somebody is deleting the lock file.

d) the "q" (control) file is being deleted at some point, although I'm
   not sure when;  it's gone when I come to work.
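
For anyone wanting to check a queue directory for the same leftovers, here
is a minimal sketch of a Bourne-style shell function. It assumes the queue
files are named df<id>, qf<id> and lf<id> (the "d", "q" and "l" files
described above); the function name qcheck is hypothetical, and it needs a
shell that supports functions.

```shell
# qcheck [queue-directory]
# Report data files that have no matching control file, and any lock
# file still present for the same queue ID.
# Assumption: queue files are named df<id>, qf<id>, lf<id>.
qcheck() {
    qdir=${1:-/usr/spool/mqueue}
    for df in "$qdir"/df*; do
        [ -f "$df" ] || continue        # the glob matched nothing
        id=`basename "$df" | sed 's/^df//'`
        if [ ! -f "$qdir/qf$id" ]; then
            echo "orphaned data file: df$id"
        fi
        if [ -f "$qdir/lf$id" ]; then
            echo "lock file present: lf$id"
        fi
    done
}
```

Running "qcheck /usr/spool/mqueue" right after an abort, and again after
each of the next two queue runs, should show when the lock and control
files actually disappear.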

My sendmail.cf is derived from the "sendmail.cf.subsidiary" file that
came with SunOS 3.2, with a couple of mods:

-  I use the "or10m" option to cause SMTP connections to time out if the
   foreign host doesn't respond within 10 minutes.

-  I have two mailers ("ether" and "localether") which are defined with
   the P=[IPC], A=IPC options.  Ruleset 0 selects the "localether"
   mailer for outbound mail being sent to hosts that don't have a domain
   specification (i.e. are on our local Ethernet), and "ether" for hosts
   with a domain spec.  The "localether" mailer delivers mail directly;
   the "ether" mailer passes the mail to our local Internet relay
   host for delivery, and hacks the "From:" address to include the relay
   host's name rather than the sending Sun's name (which isn't
   registered on the Internet).

-  I have a "frozen" sendmail.fc, derived from the sendmail.cf after the
   last set of changes was made.
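
To make the modifications above concrete, here is a rough sketch of what
the corresponding sendmail.cf lines might look like. Only the mailer names,
P=[IPC], A=IPC and the 10-minute read timeout come from the description
above; the F= flags and the S=/R= ruleset numbers are placeholders, and
the ruleset 0 rules that choose between the two mailers are omitted. It is
not a copy of the actual file.

```
# Read timeout: give up on an SMTP peer that stays silent for 10 minutes
Or10m

# Direct delivery to hosts on the local Ethernet
# (F=, S= and R= values are illustrative assumptions)
Mlocalether, P=[IPC], F=msDFMuCX, S=11, R=21, A=IPC $h

# Delivery through the relay host; S=12 stands in for a hypothetical
# ruleset that rewrites the sender to carry the relay host's name
Mether,      P=[IPC], F=msDFMuCX, S=12, R=21, A=IPC $h
```

Because a frozen configuration is in use, any change to sendmail.cf takes
effect only after the freeze file is rebuilt (sendmail -bz); otherwise the
daemon keeps loading the old sendmail.fc.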

Any ideas what might be going on here?  I've seen some symptoms in the
past that lead me to suspect that the SunOS 3.2 sendmail may begin to
suffer from "bit decay" after the system has been up for a prolonged
period of time (strange aborts, curable only by a reboot... killing
all copies of sendmail and restarting the daemon does NOT cure the
problem... sticky-pages damaged, perhaps?).  Anybody else seen these
symptoms, or have a cure or a diagnosis procedure?

As a possible workaround, I've removed the "-q15m" from the daemon
invocation in /etc/rc.local, and have added a queue-running command in
crontab.  It'll be interesting to see if the problem goes away!
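
A sketch of the workaround, for reference. The /usr/lib/sendmail path, the
15-minute schedule (chosen to match the old -q15m interval) and the crontab
layout (five time fields followed by the command, with no user field) are
assumptions, not the literal lines from this system.

```
# /etc/rc.local: start the SMTP listener only, with no -q interval
/usr/lib/sendmail -bd

# crontab: run the queue once every 15 minutes
0,15,30,45 * * * * /usr/lib/sendmail -q
```

With this arrangement each queue run is a fresh process started by cron
rather than a child forked from the long-lived daemon, which should help
show whether the aborts are tied to the daemon's age.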

                Dave Platt
Internet:       dplatt at teknowledge-vaxc.arpa
Usenet:         {hplabs|sun|ucbvax}!dplatt%teknowledge-vaxc.arpa
Voice:          (415) 424-0500
USnail:         Teknowledge, Inc.
                1850 Embarcadero Road
                Palo Alto, CA  94303
