Killer Micro Question

Stephen Uitti <suitti@ima.isc.com>
Thu Nov 15 07:42:16 AEST 1990


In article <16364@s.ms.uky.edu> randy@ms.uky.edu (Randy Appleton) writes:
>I have been wondering how hard it would be to set up several 
>of the new fast workstations as one big Mainframe.  For instance,
>imagine some SPARCstations/DECstations set up in a row, and called
>compute servers.  Each one could handle several users editing/compiling/
>debugging on glass TTY's, or maybe one user running X.

Harvard did this with slower workstations, a few years ago.  The
first step was to set up several machines to look like each other
as much as possible.  NFS was used.
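
The "look like each other" part mostly means that every machine
mounts the same filesystems at the same places, so a user sees the
same file tree no matter where they land.  A rough sketch of the
kind of fstab entries involved (hostnames and mount points are made
up, and the exact syntax varies between SunOS, Ultrix, and friends):

    # client /etc/fstab fragment, hypothetical names
    fileserv:/usr/local       /usr/local       nfs  ro,bg,hard  0 0
    fileserv:/home            /home            nfs  rw,bg,hard  0 0
    fileserv:/usr/spool/mail  /usr/spool/mail  nfs  rw,bg,hard  0 0

plus enough symbolic links to paper over whatever is left.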

>But how does each user, who is about to log in, know which machine to
>log into?  He ought to log into the one with the lowest load average, yet
>without logging on cannot determine which one that is.

These machines were accessed via serial terminals.  The tty lines
came in on a terminal switch.  The initial arrangement was to have
a "class" connect you to a random machine.  Each machine would have
several users at any given time, and the loads tended to be pretty
similar.

Dan's "queued" utility was used to provide some load balancing.
This system allows you to arrange for individual programs to be
executed elsewhere dependent on load average, or whatever.  This
package has been posted to source groups, and probably is
available via ftp.
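
The general idea is easy to sketch, independent of the queued
package itself (which does much more).  Something like the wrapper
below decides, from the load average, whether to run a command
locally or ship it to another machine with rsh.  The hostname and
threshold are made up, and it assumes a getloadavg()-style call;
older systems dig the numbers out of /dev/kmem instead.

    /* offload.c - run a command locally, or on ALTHOST if we're busy.
     * Hypothetical sketch; this is not the queued package.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define ALTHOST   "server2"   /* made-up hostname */
    #define THRESHOLD 3.0         /* ship the job out above this load */

    int main(int argc, char **argv)
    {
        double load[3];
        char cmd[1024];
        int i;

        if (argc < 2) {
            fprintf(stderr, "usage: %s command [args]\n", argv[0]);
            exit(1);
        }
        /* If the load is low, or unreadable, just run it here. */
        if (getloadavg(load, 3) < 1 || load[0] < THRESHOLD) {
            execvp(argv[1], argv + 1);
            perror(argv[1]);
            exit(1);
        }
        /* Busy: hand the command line to rsh on the other host.
         * Ignores shell quoting, and assumes both machines see the
         * same files - which is what the NFS setup is for. */
        cmd[0] = '\0';
        for (i = 1; i < argc; i++) {
            if (strlen(cmd) + strlen(argv[i]) + 2 > sizeof(cmd)) {
                fprintf(stderr, "command too long\n");
                exit(1);
            }
            strcat(cmd, argv[i]);
            strcat(cmd, " ");
        }
        execlp("rsh", "rsh", ALTHOST, cmd, (char *)0);
        perror("rsh");
        exit(1);
    }

The real package also queues jobs and chooses among several hosts;
this only shows the shape of the decision.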

However, even this doesn't distribute the load of user keystrokes
causing interrupts and context switches, especially in editors.
There was some talk of making more sophisticated use of the
terminal switch.  There was a control port to the switch.  If
this were connected to a host somewhere, then one could arrange
for the next connection to be made to a particular machine.

>What would be nice is to have some software running at each machine, maybe
>inside of rlogind or maybe not, that would take a login request, and check
>to see if the current machine is the least loaded machine.  If so, accept
>the login, else re-route it to that machine.

It could be done in rlogin - check the rwho (ruptime) database.
If you have sources, this isn't that hard.
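
A hedged sketch of what that check might look like, reading the
files rwhod leaves in its spool directory.  The struct comes from
<protocols/rwhod.h>; the directory is /usr/spool/rwho on older
systems and /var/rwho on newer ones, and the load averages in the
file are integers scaled by 100.  A real rlogind hack would also
skip hosts whose reports are stale and worry about byte order.

    /* leastloaded.c - print the least loaded host known to rwhod. */
    #include <stdio.h>
    #include <stddef.h>
    #include <string.h>
    #include <sys/types.h>
    #include <dirent.h>
    #include <protocols/rwhod.h>

    #define RWHODIR "/var/rwho"   /* or /usr/spool/rwho */

    int main()
    {
        DIR *dp;
        struct dirent *d;
        char best[32];
        long bestload = -1;

        best[0] = '\0';
        if ((dp = opendir(RWHODIR)) == NULL) {
            perror(RWHODIR);
            return 1;
        }
        while ((d = readdir(dp)) != NULL) {
            char path[512];
            FILE *f;
            struct whod wd;

            if (strncmp(d->d_name, "whod.", 5) != 0)
                continue;
            sprintf(path, "%s/%s", RWHODIR, d->d_name);
            if ((f = fopen(path, "r")) == NULL)
                continue;
            /* Only the fixed header matters, not the per-tty entries. */
            if (fread((char *)&wd, 1, sizeof(wd), f)
                >= offsetof(struct whod, wd_we)) {
                long l = wd.wd_loadav[0];   /* 1-minute load * 100 */
                if (bestload < 0 || l < bestload) {
                    bestload = l;
                    strncpy(best, wd.wd_hostname, sizeof(best) - 1);
                    best[sizeof(best) - 1] = '\0';
                }
            }
            fclose(f);
        }
        closedir(dp);
        if (bestload < 0) {
            fprintf(stderr, "no whod files in %s\n", RWHODIR);
            return 1;
        }
        printf("%s %ld.%02ld\n", best, bestload / 100, bestload % 100);
        return 0;
    }

A modified rlogind could run this logic and, if some other host
looks much less loaded, refuse or redirect the connection.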

>It would not be good to have each packet come in over the company ethernet, 
>and then get sent back over the ethernet to the correct machine.  That would 
>slow down the machine doing the re-sending, cause un-needed delays in the
>turn-around, and clog the ethernet.

Packet forwarding isn't that bad.  The overhead is low compared
to, say, NFS.

>Also, all of these machines should mount the same files (over NFS or
>some such thing), so as to preserve the illusion that this is one big
>computer, and not just many small ones.  But still, it would be best 
>to keep such packets off the company net.

An ethernet bridge can help here.  Isolate the hog.

                +--------+
A --------------+ bridge +------------- B
                +--------+

Hosts on either side of the bridge can talk to each other as if
they were on the same wire.  Packets on the A side bound for hosts
on the A side aren't forwarded to the B side (and the same is true
in the other direction).  Broadcast packets are always forwarded.

>One solution that directs logins to the least loaded machine, and keeps
>the network traffic shown to the outside world down, is this one:
>
>   Company Ethernet
>--------------------------+------------------------------
>                          |         
>                          |  ---------------
>                          ---| Login Server|----   ----------
>                             ---------------   |  | Server1 |
>                                    |          |-------------
>                                 -------       |   ----------
>                                 | Disk|       |---| Server2 |
>                                 ------        |   ----------
>                                               |        .
>                                               |        .
>                                               |   ----------
>                                               |--| Server N|
>                                                  ----------
>
>The idea is that as each person logs into the Login Server, their login 
>shell is actually a process that looks for the least loaded Server, and 
>rlogin's them to there.  This should distribute the load evenly 
>(well, semi-evenly) on all the servers.  Also, the login server could have 
>all the disks, and the others would mount them, so that no matter what node
>you got (which ought to be invisible) you saw the same files.

What I don't understand is how you intend to implement this.
Is the Login Server a general-purpose system, or something that
just serves terminals?  You can get terminal-server boxes that
hook straight into the ethernet and do the latter.
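
Reading the proposal charitably: the Login Server is an ordinary
Unix box, and each user's entry in its /etc/passwd names a tiny
login "shell" that picks a compute server and execs rlogin.  A
sketch, assuming a helper like the leastloaded program above sits
at a made-up path and prints one hostname:

    /* loginshell.c - hypothetical login shell for the Login Server.
     * Asks a helper for the least loaded compute server, then
     * replaces itself with an rlogin to that machine.
     */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define PICKER   "/usr/local/bin/leastloaded"  /* made-up path */
    #define FALLBACK "server1"                     /* made-up hostname */

    int main()
    {
        FILE *p;
        char host[64];
        char *cp;

        strcpy(host, FALLBACK);
        if ((p = popen(PICKER, "r")) != NULL) {
            if (fgets(host, sizeof(host), p) == NULL)
                strcpy(host, FALLBACK);
            pclose(p);
        }
        /* The helper prints "hostname load"; keep just the name. */
        if ((cp = strpbrk(host, " \t\n")) != NULL)
            *cp = '\0';
        /* exec, so no extra shell is left sitting on the Login Server */
        execlp("rlogin", "rlogin", host, (char *)0);
        perror("rlogin");
        return 1;
    }

As noted above, every keystroke then makes an extra hop through the
Login Server; the forwarding overhead is low, but it is there.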

>The advantage is that this setup should be able to deliver a fair 
>number of MIPS to a large number of users at very low cost.  Ten SPARC
>servers results in a 100 MIPS machine (give or take) and at University
>pricing, is only about $30,000 (plus disks).  Compare that to the price
>of a comparably sized IBM or DEC!

Plus disks?  Without any I/O of any kind, 100 MIPS is of very
little use.  For $3K a node, you can get a 486 box of similar
per-node performance.  For an extra $500, maybe they'll give you
one without RAM.  Seriously, adding up the costs of the naked CPU
chips gets you nowhere.  Add up whole-system costs and compare
system features.  For example, one server plus three diskless
nodes may be slower and more expensive than three diskful nodes,
even though the nodes themselves are identical in both setups.
The diskful nodes may end up with slightly more or less disk
for users.

>So my question is, do you think this would work?  How well do you
>think this would work?  Do you think the network delays would be
>excessive?

Sure, it would work.  The key issue with network delays is really
human factors.  A delay of one second is painful.  Ten delays of a
tenth of a second each may not be.  A longer delay can be tolerated
if it is more consistent.

The Harvard approach has several advantages:

1. It is incremental.
   They bought the hardware a little at a time, set each system
   up, and put it into production.  As the load got worse, they
   could add new systems.  The whole thing started with one CPU.

2. The load balancing has pretty low overhead.

3. The load balancing system was incrementally implemented.
   There was very little delay getting it going.

4. No computers were special-cased beyond what data was on their
   disks and what peripherals happened to be attached.

5. When network loads became high, a bridge was installed.

6. Redundancy of hardware and software allowed tuning the
   reliability of the overall system.

Some of the disadvantages:

1. The system becomes more complicated as more systems are
   added.  I've never seen so many symbolic links.

2. It is highly dependent on NFS.  NFS does not preserve UNIX
   filesystem semantics well (a small example follows this list).
   The NFS code available three years ago caused machines to
   crash daily.  This has since improved.

3. NFS has noticeable latency and noticeable additional overhead.
   Local disk is far faster and more consistent.  Neither the
   efficiency nor the speed is as high as you'd like.

4. It is easy to create single points of failure that aren't
   apparent until they bite you.  Obviously, if /usr/bin exists
   on only one system and that system goes down, everything is
   pretty useless.
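
The example promised in point 2: the standard UNIX trick of
unlinking a scratch file while keeping it open.  On a local
filesystem the data stays readable until the last close; a
stateless NFS server has no idea the file is open, so the client
fakes it by renaming the file to ".nfsXXXX", which leaves turds
behind after crashes and breaks outright if another client removes
the file.

    /* scratch.c - unlink-while-open, the classic semantic test case. */
    #include <stdio.h>
    #include <unistd.h>
    #include <fcntl.h>

    int main()
    {
        char buf[64];
        int n;
        int fd = open("scratchfile", O_RDWR | O_CREAT | O_TRUNC, 0600);

        if (fd < 0) {
            perror("open");
            return 1;
        }
        unlink("scratchfile");          /* name gone; locally, data is not */
        write(fd, "still here\n", 11);
        lseek(fd, 0L, SEEK_SET);
        if ((n = read(fd, buf, sizeof(buf))) > 0)
            fwrite(buf, 1, n, stdout);  /* works locally; NFS emulates it */
        close(fd);                      /* space actually freed here */
        return 0;
    }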

One system I've used that seems to solve most of these problems
is the Sequent.  The symmetric multiprocessor gives you fine-grained
dynamic load balancing and incremental expansion.  You also get the
ability to run a single application quickly, if it happens to be
easily parallelizable.  The administrative overhead is low and does
not grow with the number of processors.  There seems to be no
penalty for shared disk (no NFS is required).  Its biggest
disadvantage is that the minimum cost is higher.

Naturally, the best solution is the one the office here is using:
everyone gets their own hardware.  If a system goes down, almost
everyone else continues as if nothing is wrong.  System response
is consistent.  If you need data from some other host, it is
available via the network - NFS or whatever.  Since they are 386
and 486 systems running our OS and software, DOS stuff runs too.
Thus, I'd recommend getting lots and lots of 386 or 486 systems,
with lots of RAM, disk, tape units, network cards, big screens,
and of course, INTERactive UNIX.

Stephen Uitti
suitti@ima.isc.com
Interactive Systems, Cambridge, MA 02138-5302
--
"We Americans want peace, and it is now evident that we must be prepared to
demand it.  For other peoples have wanted peace, and the peace they
received was the peace of death." - the Most Rev. Francis J. Spellman,
Archbishop of New York.  22 September, 1940


