Controlling high load on 4.2 unix

Sat Oct 27 21:36:00 AEST 1984

In response to the high number of requests for the load control system
I described about one week ago, I am going to post it to net.sources
sometime next week. Enclosed below is a rough draft of the manual page
for the load control server. 

	Keith Muller
	UCSD Academic Computer Center
	ucbvax!sdcsvax!sdcc3!muller
---------------------------------------------------------------------
	.TH LDD 8 "25 May 1984"
	.UC 4
	.ad
	.SH NAME
	ldd \- load system server (daemon)
	.SH SYNOPSIS
	.B /etc/ldd
	[ -L load ] [ -M max_time ] [ -T alarm ]
	.SH DESCRIPTION
	.TP
	.B \-L
	changes the load average that ldd attempts to maintain to
	.I load
	instead of the default (usually 10).
	.TP
	.B \-M
	changes the maximum time (in seconds) that a job can be queued to
	.I max_time
	seconds instead of the default (usually 14400 seconds or 4 hours).
	.TP
	.B \-T
	changes the time (in seconds) that the
	.I ldd
	server waits between load average checks to 
	.I alarm
	seconds instead of the default (usually 60 seconds).
	.PP
	.I Ldd
	is the load control server (daemon) and is normally invoked
	at boot time from the
	.IR rc (8)
	file.
	The
	.I ldd
	server attempts to maintain the system load average (number
	of 
	.I runnable
	processes) below a preset value so interactive programs like
	.IR vi (1)
	remain responsive.
	When the system load average (the 1 minute average shown by
	.IR uptime (1))
	is above the preset limit,
	.I ldd
	will "block" specific \f2cpu intensive\f1 processes from running and place
	them in a queue.
	These blocked jobs are not \f2runnable\f1 and therefore do not 
	contribute to the system load. When the load average drops below the preset
	limit, 
	.I ldd
	will remove jobs from the queue and tell them to continue
	execution.
	The system administrator determines which programs are
	considered \f2cpu intensive\f1 and places control of their execution under the
	.I ldd
	server.
	.PP
	A front end
	.I client
	program replaces each of the programs to be controlled by the
	.I ldd
	server.
	Each time a user requests execution of a controlled program, the
	.I client
	enters the "request state",
	sends a "request to run" datagram to the server and waits for a response. The
	waiting client is "blocked" waiting for the response from the
	.I ldd
	server.
	If the 
	.I client
	determines that the 
	.I ldd
	server is not running, the requested 
	program is executed as if there were no load control system.
	A process will not block if the 
	.I ldd
	server is not running.
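	.PP
	The fallback described above can be sketched as follows.
	This is a minimal illustration in Python, not the posted implementation.
	The manual page does not say how the client detects a missing server;
	the sketch assumes that a failed send to the server's datagram socket
	is the signal.
	The function names and the "wait" and "run_now" results are hypothetical.
	.nf

```python
import socket


def server_is_reachable(sock_path, payload=b"request"):
    """Return True if a datagram could be delivered to the server socket.

    Assumption: a failed send (socket file missing, or nothing bound to
    it) means the server is not running.
    """
    client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        client.sendto(payload, sock_path)
        return True
    except OSError:
        # Covers both "no such file" and "connection refused".
        return False
    finally:
        client.close()


def choose_action(sock_path):
    """Wait for the server if it is reachable, else run the program at once."""
    return "wait" if server_is_reachable(sock_path) else "run_now"
```

	.fi
	A client built this way never blocks when the socket is absent or stale,
	which matches the behavior stated above.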
	.PP
	The
	.I ldd
	server can send one of four different messages to the client.
	A "queued message" indicates that the client has
	been entered into the queue and should wait.
	A "poll message" indicates that the request should be resent (the server
	did not receive it).
	A "terminate message" indicates that this request cannot be honored
	and the client should exit abnormally.
	A "run message" indicates the requested program should be run.
	.PP
	If the client does not receive an answer to a request after a certain
	period of time has elapsed (usually 90 seconds), the request is resent.
	If no response is obtained from the server after the request has
	been resent a preset number of times,
	the requested program
	is executed. This prevents the process from blocking forever
	if
	.I ldd
	fails to respond to the requests (due to a failure).
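	.PP
	A rough sketch of this resend loop, written in Python for illustration only.
	The two callables stand in for the real datagram send and receive.
	The default of three tries is an assumption, since the manual page
	gives no figure for the preset count.
	.nf

```python
def request_with_retry(send_request, wait_for_reply, timeout=90.0, max_tries=3):
    """Send a request, resending on timeout; give up and run after max_tries.

    send_request():          sends one "request to run" datagram.
    wait_for_reply(timeout): returns the server's message, or None if
                             nothing arrived within `timeout` seconds.
    max_tries is a hypothetical value, not taken from the source.
    """
    for _ in range(max_tries):
        send_request()
        reply = wait_for_reply(timeout)
        if reply is not None:
            return reply
    # No answer at all: run the program rather than block forever.
    return "run"
```

	.fi
	Returning "run" when every attempt times out is what keeps a dead
	server from stalling the user's job.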
	.PP
	After receiving the "queued message" the client enters the "queued state"
	and waits for another command
	from the server (usually getting the run command).
	If the user does not have the environment variable "LOAD" set to "quiet",
	the status string "queued" will be printed on stderr.
	If no further commands
	are received after a preset time has elapsed (usually 15 minutes),
	the client re-enters the "request state" and sends the request
	to the server again.
	This assures that the server has not terminated or
	failed since the time the client was queued.
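	.PP
	The client states and server messages described so far can be summarized
	as a small transition table.
	The sketch below is in Python and is only one possible reading of the text.
	The state and event names are hypothetical labels, and leaving the state
	unchanged on an unlisted pair is an assumption.
	.nf

```python
import os
import sys

# (current state, event) -> next state.
# Events are the four server messages plus a local "timeout".
TRANSITIONS = {
    ("request", "queued"): "queued",
    ("request", "run"): "run",
    ("request", "terminate"): "exit",
    ("request", "poll"): "request",     # resend the request
    ("request", "timeout"): "request",  # resend the request
    ("queued", "run"): "run",
    ("queued", "terminate"): "exit",
    ("queued", "poll"): "request",      # a new server asked us to re-register
    ("queued", "timeout"): "request",   # check that the server is still alive
}


def next_state(state, event):
    """Return the client's next state; unlisted pairs leave it unchanged."""
    return TRANSITIONS.get((state, event), state)


def announce_queued(environ=None, stream=None):
    """Print "queued" on stderr unless LOAD is set to "quiet"."""
    environ = os.environ if environ is None else environ
    stream = sys.stderr if stream is None else stream
    if environ.get("LOAD") != "quiet":
        print("queued", file=stream)
        return True
    return False
```

	.fi
	The count of resends before the client gives up is not modeled here;
	it would sit alongside the "timeout" event in the request state.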
	.PP
	The
	.I ldd
	server logs all recoverable and unrecoverable errors in a logfile. Advisory
	locks are used to prevent more than one executing server at a time.
	When the
	.I ldd
	server first begins execution, it scans the spool directory for clients that
	might have been queued from a previous
	.I ldd
	server and sends them a "poll message".
	Waiting
	.I clients
	will resend their "request to run" message to the new
	.I ldd
	server, and re-enter the "request state".
	The
	.I ldd
	server will rebuild the queue of waiting tasks 
	ordered by the time each client began execution.
	This allows the
	.I ldd
	server to terminate and be re-started without
	loss or blockage of any waiting clients.
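	.PP
	A hedged sketch of the two startup steps, in Python.
	It assumes the advisory lock is a flock-style lock on the lock file and
	that each re-sent request carries the client's start time.
	Neither detail is spelled out above, and the function names are hypothetical.
	.nf

```python
import fcntl
import os


def acquire_server_lock(lock_path):
    """Return the open, locked lock file, or None if a server holds the lock.

    Assumption: an exclusive, non-blocking flock on the lock file.
    """
    lock_file = open(lock_path, "a+")
    try:
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        lock_file.close()
        return None
    # We own the lock: replace any stale contents with our pid.
    lock_file.seek(0)
    lock_file.truncate()
    print(os.getpid(), file=lock_file)
    lock_file.flush()
    return lock_file


def rebuild_queue(requests):
    """Order (client_id, start_time) pairs by start time, oldest first."""
    return sorted(requests, key=lambda request: request[1])
```

	.fi
	The lock is held for as long as the returned file stays open, so the
	server would keep it open until it exits.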
	.PP
	When the server receives a "request to run",
	it has to determine if the job should run immediately, or be queued.
	If the queue is not empty, the request is added to the queue,
	and the client is sent a "queued message" to indicate that
	it has been placed in the queue.
	If the queue is empty, 
	the server checks the current load average, and
	if it is below the limit,
	the client is sent a "run message".
	Otherwise the server queues the request, sends the client a "queued message",
	and starts the interval timer.
	The interval timer is bound to a handler that checks the system load at a
	fixed interval (usually every 60 seconds).
	If the handler finds the current load average is below the limit,
	jobs are removed from the queue and sent a "run message".
	The number of jobs
	sent "run messages" depends on how much the current load average has
	dropped below the limit.
	If the queue becomes empty the handler 
	will shut off the interval timer (as it is no longer needed).
	If the handler finds the load average is above the limit, it checks
	how long the oldest process has been waiting to run.
	If that time is greater than a preset limit (usually 4 hours) the job is 
	removed from the queue and told
	to run regardless of the load.
	This prevents jobs from being blocked forever due to load averages that
	remain above the limit for long periods of time.
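	.PP
	The queueing decisions above can be sketched as a small class.
	This is an illustration in Python under stated assumptions, not the
	server's actual code.
	The text says only that the number of jobs released depends on how far
	the load has dropped; releasing one job per whole unit of headroom,
	with a minimum of one, is an assumption made here.
	The class and method names are hypothetical.
	.nf

```python
import math
from collections import deque


class LoadQueue:
    """Toy model of the server's run-or-queue decision and timer handler."""

    def __init__(self, limit=10.0, max_wait=14400.0):
        self.limit = limit        # load average to stay below
        self.max_wait = max_wait  # longest time a job may stay queued
        self.queue = deque()      # (client_id, enqueue_time), oldest first
        self.timer_on = False

    def handle_request(self, client_id, now, load):
        """Return the message to send to the client: "run" or "queued"."""
        if not self.queue and load < self.limit:
            return "run"
        self.queue.append((client_id, now))
        self.timer_on = True
        return "queued"

    def on_timer(self, now, load):
        """Return the clients to be sent a "run message" on this tick."""
        released = []
        if load < self.limit:
            # Assumed policy: one job per whole unit of headroom, at least one.
            count = max(1, math.floor(self.limit - load))
            while self.queue and len(released) < count:
                released.append(self.queue.popleft()[0])
        elif self.queue and now - self.queue[0][1] > self.max_wait:
            # Load still high, but the oldest job has waited too long.
            released.append(self.queue.popleft()[0])
        if not self.queue:
            self.timer_on = False
        return released
```

	.fi
	In a real server the load argument would come from the 1 minute load
	average, and on_timer would be driven by the interval timer.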
	.PP
	Commands can be sent to the server by the
	.IR ldc (8)
	control program. These commands can manipulate the queue and change the
	values of the various preset limits used by the server.
	.SH FILES
	.nf
	.ta \w'/usr/spool/ldd/cntrlsock           'u
	/usr/spool/ldd	ldd spool directory
	/usr/spool/ldd/msgsock	name of server datagram socket
	/usr/spool/ldd/cntrlsock	name of server socket for control messages
	/usr/spool/ldd/list		list of queued jobs (not always up to date)
	/usr/spool/ldd/lock	lock file (contains pid of server)
	/usr/spool/ldd/errors	log file of server errors
	.fi
	.SH "SEE ALSO"
	ldc(8),
	ldq(1),
	ldrm(1).