Sys V/386/3.2 UNIX system getting hung (?)

Sam Vause vause at cs-col.Columbia.NCR.COM
Fri Apr 7 23:45:50 AEST 1989


In article <6226 at homxc.ATT.COM> mrb1 at homxc.ATT.COM (M.BAKER) writes:
>	We have an AT&T 6386E system running UNIX SysV/3.2.
>	While running our application, it has been observed to
>	'hang'.  Specifically, the application stops in the
>	middle of things.  More importantly, all the terminal I/O
>	stops.......including the system console.  You can't log
>	in on a free getty.  Anything you
>	type gets echoed back to the screen, but nothing gets 
>	done with it...

Well, it's possible that the clist increment mentioned later
in the original posting is actually *hurting* the situation, rather
than helping.

My experience indicates that this symptom can arise from a variety of
situations, but personal observation leads me to believe that the
kernel logical address space is being exhausted.

Perhaps the best method of identifying the actual problem (in the
absence of a memory dump) is to use the crash(1m) command on the
running kernel to examine the status of the System Page Table Map.

Although I am not personally familiar with the way this command executes
on other machines, I have used it during kernel debug enough to give you
a general idea of what to expect:

	# crash <CR>
	> stat
	sysname: UnixV
	nodename: cs-col
	release: 020001
	version: config
	machine: 68020
	time of crash: Fri Apr  7 09:11:05 1989
	age of system: 21 day, 23 hr., 
	> map sptmap
	sptmap
	address  size
	00000000    97
	00001f99    71
	2 segments, 168 units
	>od maxspace
	00e67e18: 00100000

I've included this example from my machine (NCR TOWER 32/600) for your
reference.  For this system, there are only two segments and a total of
168 units (each unit is one 2K click) of System Page Table (SPT) space
left.  The first segment is reserved for the actual kernel code itself,
and is not generally available to the user.  The second segment (and any
following ones) is available to user processes (but not until the fork(2)
system call returns...).
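
If you want to total up the map without doing it by eye, something like
the following rough Python sketch would do.  It assumes the "map sptmap"
output looks like the sample above (an 8-digit hex address followed by a
decimal size on each segment line); the function name parse_sptmap is my
own invention, and other machines may well print the map differently.

```python
import re

# Sample text copied from the crash session shown above.
SAMPLE = """\
sptmap
address  size
00000000    97
00001f99    71
2 segments, 168 units
"""

def parse_sptmap(text):
    """Return (address, size) pairs found in `map sptmap` output.

    Assumes each segment line is an 8-digit hex address followed by a
    decimal unit count, as in the sample; header and summary lines
    do not match and are skipped.
    """
    segments = []
    for line in text.splitlines():
        m = re.fullmatch(r"\s*([0-9a-fA-F]{8})\s+(\d+)\s*", line)
        if m:
            segments.append((int(m.group(1), 16), int(m.group(2))))
    return segments

segments = parse_sptmap(SAMPLE)
total_units = sum(size for _, size in segments)
print(len(segments), "segments,", total_units, "units")
```

The total it computes should agree with the summary line that crash
itself prints at the end of the map.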

Since the MAXSPACE kernel configuration parameter is 0x100000, each active
process will dynamically sptalloc() 4K of kernel SPT space.  (Your mileage
may vary...)  For this machine, each 1MB (0x100000) increase to the
MAXSPACE parameter will also place an additional 2K burden on each
process's SPT requirements.
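
As a rough illustration of that relationship, here is a small Python
sketch.  It simply encodes the two figures quoted above for this machine
(4K per process at a 1MB MAXSPACE, plus 2K for each additional 1MB); the
function name is made up, and the straight-line extrapolation is an
assumption that may not hold on your system.

```python
MB = 0x100000  # 1MB, the MAXSPACE value on the machine described above

def spt_per_process_kb(maxspace):
    """Estimated per-process SPT cost in K for a given MAXSPACE.

    Assumption: 4K at 1MB, growing by 2K per additional 1MB, as quoted
    for this particular machine.  Values that are not whole multiples
    of 1MB are rejected because nothing is known about them.
    """
    if maxspace < MB or maxspace % MB != 0:
        raise ValueError("sketch only covers whole multiples of 1MB")
    extra_mb = maxspace // MB - 1
    return 4 + 2 * extra_mb

for maxspace in (0x100000, 0x200000, 0x300000):
    print(hex(maxspace), spt_per_process_kb(maxspace), "K per process")
```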

For this machine, I can realistically create only 35 additional processes
(71 clicks * 2K / 4K).
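
The same arithmetic, written out as a small Python sketch in case you
want to plug in your own numbers.  The function name and the default
values (2K per click, 4K per process) are taken from the figures for this
machine and are assumptions anywhere else.

```python
def additional_processes(free_clicks, click_kb=2, per_process_kb=4):
    """How many more processes fit in the free SPT space.

    free_clicks     units free in the user-available segment(s)
    click_kb        size of one click in K (2K on this machine)
    per_process_kb  SPT space each process allocates (4K here)
    Partial processes do not count, so the result is rounded down.
    """
    return (free_clicks * click_kb) // per_process_kb

# 71 clicks * 2K / 4K = 35.5, which rounds down to 35
print(additional_processes(71))
```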

What this all means is that systems where SPT space is tight will exhibit
the symptoms you've described:  character echo at the terminal is okay,
but no processes appear to be in execution.  System degradation appears
to occur slowly, rather than "all at once".  Generally, no error messages
are written to the console.  In these cases crash(1m) generally shows the
SPT space in fewer than 4 segments, with a *total* number of units less
than 120.
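
That rule of thumb can be written down as a quick check.  This is only a
sketch of my own observation, not anything the kernel enforces; the
function name, the thresholds as defaults, and the second set of sizes
below are hypothetical.

```python
def looks_spt_starved(segment_sizes, max_segments=4, max_units=120):
    """True if the map matches the pattern seen on hung systems:
    fewer than max_segments segments AND fewer than max_units units
    in total.  Thresholds are rules of thumb, not hard limits.
    """
    return (len(segment_sizes) < max_segments
            and sum(segment_sizes) < max_units)

# Sizes from the sptmap shown earlier: 2 segments, 168 units in total.
print(looks_spt_starved([97, 71]))

# Hypothetical tighter map: 2 segments, 109 units in total.
print(looks_spt_starved([97, 12]))
```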

The cure?  Well, if possible, increase your Kernel Address Space size.  If
there is not already a configuration parameter for this purpose, your only
alternatives are to reduce the number of buffers and clists, in order to
furnish more kernel logical address space for SPT usage, and delete any
kernel features and drivers you do not need.  Failing this, you get to buy
another machine...

Perhaps this is not your actual situation, but it sure sounds *PAINFULLY*
similar to situations that I've recently encountered....

+------------------------------------------------------------------+
|Sam Vause, NCR Corporation, Customer Services - TOWER Support	   |
|3325 Platt Springs Road, West Columbia, SC 29169 (803) 791-6953   |
|                                vause at cs-col.Columbia.NCR.COM     |
|			 ...!uunet!ncrlnk!ncrcae!cs-col!vause	   |
|		...!ucbvax!sdcsvax!ncr-sd!ncrcae!cs-col!vause      |
+------------------------------------------------------------------+


