Wanted: New process scheduler with per-uid time slices

Keith Muller muller at sdcc3.UUCP
Wed Oct 10 05:50:47 AEST 1984


The same problem exists here at UCSD, with large Computer Science classes
beating the life out of any kind of machine thrown at them. The biggest
problem with running machines for student use has been that the students are
very competitive and think nothing of running as many compiles in the
background as they feel they need. The problem is that unix will attempt
to time slice among all the runnable programs at any given moment. As the
number of runnable jobs increases, the amount of virtual memory in use
increases, causing the system to page and then eventually swap (for 4.2
systems). The more the system pages, the less of the cpu is available for
running user processes. This increasing rate of system overhead needed to
support the larger number of runnable jobs slowly erodes the throughput of
the system, to the point where no real work is being done other than
supporting the context switches and their subsequent page faults. The actual
point at which the throughput begins to rapidly erode depends on the job mix
at that time, but you can roughly relate it to the load average.

However, all is not as bad as it seems. It turns out that you can group
the job mix into two kinds of jobs: background jobs and interactive jobs.
I use the term background jobs to classify those jobs that do not require
user input after the job has started. This means that background jobs include
compiles, nroff, etc. Interactive jobs are things like vi. If you look at
how the two different kinds of jobs affect the load on the machine, you
quickly realize that the long term load average is mainly due to the
background jobs soaking up all the cpu cycles. Interactive jobs like vi are
for the most part (as averaged over time) blocked on i/o. What you want to do
is limit the number of runnable background jobs at any given time, to prevent
the virtual memory system from thrashing. This also means that when an
interactive job becomes runnable it will quickly get its required time
slices. This quick response to interactive jobs is really the single factor
a user will use to classify a system as fast or slow, and it seems to far
outweigh the effects of queueing background class jobs.

The solution used on the UCSD Academic Computer Center machines (VAXes, Suns,
and Pyramids, all running 4.2) is a mechanism called load control.
Load control consists of a server and client "front end" programs. Each
background class job is replaced by a client front end, with the actual
binary hidden in a place that users are prevented from accessing.
When the user invokes a background class job he is really running the front
end client program. The client sends a request-to-run packet to the server.
The server can respond with either a run message or a queue message; which
one it sends depends on the current load situation. If the load average
of the machine is above a certain point, the job is queued. When the client
gets a queue message, it blocks waiting for a run message. Because it is
blocked, it does not contribute to the load, so in effect the load on the
machine is limited to approximately the threshold point set in the server
for queueing jobs. While the server is queueing, it checks the load at
specific time intervals to see if the load has dropped below the threshold.
If the load is below the threshold, the server starts sending run messages
to queued background jobs. (The actual number of run messages sent depends
on the current load average and the load average queueing point.) When the
client gets a run message, it execs the protected binary it replaced (group
access restrictions protect the actual binary files).
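
Roughly, a client front end could look something like the sketch below. The
socket path, the one-byte 'R'/'Q' protocol, and the hidden binary path are
all made up here for illustration; the actual protocol and file layout we
use differ and are not described in this posting.

/*
 * Sketch of a load-control client front end, assuming a UNIX-domain
 * stream socket at a hypothetical path and a trivial one-byte protocol
 * ('R' = run, 'Q' = queued).
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

#define SERVER_PATH "/etc/loadctl.sock"    /* hypothetical rendezvous point */
#define REAL_BINARY "/usr/local/hidden/cc" /* hidden binary this wrapper fronts */

int main(int argc, char *argv[])
{
	int s;
	char reply;
	struct sockaddr_un sun;

	if ((s = socket(AF_UNIX, SOCK_STREAM, 0)) < 0) {
		perror("socket");
		exit(1);
	}
	memset(&sun, 0, sizeof sun);
	sun.sun_family = AF_UNIX;
	strncpy(sun.sun_path, SERVER_PATH, sizeof sun.sun_path - 1);
	if (connect(s, (struct sockaddr *)&sun, sizeof sun) < 0) {
		perror("connect");
		exit(1);
	}
	/* The request-to-run packet is a single byte in this sketch. */
	if (write(s, "R", 1) != 1) {
		perror("write");
		exit(1);
	}
	/*
	 * Block until the server says "run".  While blocked in read()
	 * the process is sleeping on i/o, so it is not a runnable job
	 * and adds nothing to the load average.
	 */
	do {
		if (read(s, &reply, 1) != 1) {
			perror("read");
			exit(1);
		}
	} while (reply != 'R');
	close(s);
	/* Replace ourselves with the protected binary. */
	execv(REAL_BINARY, argv);
	perror("execv");
	exit(1);
}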
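
The server side's dispatch step could be sketched as below. The threshold,
check interval, and queue layout are again illustrative, and the sketch uses
the getloadavg(3) call for brevity; on a 4.2 system the server would instead
dig the load average out of kernel memory via /dev/kmem.

#include <stdlib.h>
#include <unistd.h>

#define THRESHOLD 10.0  /* queue new jobs above this load (illustrative) */
#define INTERVAL  15    /* seconds between load checks (illustrative)    */
#define MAXQ      256

static int queue[MAXQ]; /* fds of blocked clients, FIFO order */
static int qhead, qtail;

/* Called when a request-to-run arrives while the load is too high. */
void enqueue_client(int fd)
{
	queue[qtail++ % MAXQ] = fd;
}

/* Run at each interval: release queued jobs while the load is low. */
void dispatch(void)
{
	double load;
	int nrun, fd;

	if (getloadavg(&load, 1) < 1)
		return;
	/*
	 * Send roughly enough run messages to bring the load back up
	 * to the threshold; the farther below threshold we are, the
	 * more queued jobs get released.
	 */
	nrun = (int)(THRESHOLD - load);
	while (nrun-- > 0 && qhead != qtail) {
		fd = queue[qhead++ % MAXQ];
		write(fd, "R", 1);  /* run message unblocks the client */
		close(fd);
	}
}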

The load control mechanism is quite successful. Before load control, the
system's load would swing from long periods of very high load to periods
where the machine went idle. One machine, for example, had 52 users with
48 vi's and 25 C compiles running at once, creating a load of 28 before
load control. After load control, with the same mix of jobs, the load
was reduced to around 10 (the threshold for queueing). The average time a
job was queued was around 90 seconds. (The machine was a small Pyramid 90x;
a large 780 with the same job mix had an average queue time of around
400 seconds, so the Pyramid clearly has several times the throughput.)
What load control effectively does is clip the high load peaks
and fill in the low load "valleys" with the queued background class
jobs. This has increased throughput tremendously and prevents the mess
created by long periods of very high load averages.

The load control mechanism is actually much more sophisticated than described
above, but the basic idea is the same. The real advantage of this approach is
that kernel based approaches cannot easily distinguish between a vi and a
compile, causing interactive jobs to become unusable. If users cannot
get their vi's to complete, they do not get off the system (they never get
to input the file they want to compile, nroff, etc.).

	Keith Muller
	Academic Computer Center
	University of California, San Diego


