Celerity evaluation

hammond at petrus.UUCP
Tue Apr 23 22:42:56 AEST 1985


I have run a fair number of simple benchmarks on a Celerity C1200,
Pyramid 90x, Vax 780, and Vax 785 to compare the performance of the CPUs.
The machines all had optional floating point accelerators; the Pyramid
also had a data cache option.  The basic results:

For double precision floating point in C (using register double variables,
which the 4.2 BSD and Pyramid compilers appear to treat as plain double
variables), I can confirm that the Celerity C1200 appears to be 2 times as
fast as an 11/780 w/FPA.  That makes it the fastest at floating point of the
4 types tested.
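
The post does not include the benchmark source, so the following is only a
minimal sketch of the kind of register-double loop described above. The
names fp_kernel and fp_kernel_cpu_seconds, the arithmetic inside the loop,
and the use of ANSI clock() are all assumptions made for illustration, not
the code that was actually run.

```c
#include <time.h>

/* Hypothetical kernel: one multiply and two adds per pass, all held in
   register double variables.  Not the benchmark from the post. */
double fp_kernel(long iterations)
{
    register double x = 1.0;
    register double sum = 0.0;
    register long i;

    for (i = 0; i < iterations; i++) {
        sum += x * 0.5;   /* multiply, then accumulate */
        x += 1.0;         /* advance to the next term */
    }
    /* For moderate counts this is iterations * (iterations + 1) / 4. */
    return sum;
}

/* CPU seconds consumed by one run of the kernel, measured with clock(). */
double fp_kernel_cpu_seconds(long iterations)
{
    volatile double sink;   /* keeps the compiler from discarding the loop */
    clock_t start = clock();

    sink = fp_kernel(iterations);
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling fp_kernel_cpu_seconds with the same iteration count on each machine
and dividing the results gives a speed ratio of the kind quoted above.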

At least on the trivial integer benchmarks we tested, I can also say that
the basic CPU for integer arithmetic appears to be about 3 times as fast as
an 11/780, or roughly the same as a Pyramid 90x.

Disk Performance: Although my trivial benchmarks took almost the same amount
of CPU time on the Celerity (using their new, faster cc) as on the Pyramid,
they took 3 times as long in real time.  Our Pyramid has Eagles; the Celerity
had the slower 120Mb disks.  I don't know how much improvement an Eagle
would make.
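
One way to see the gap between CPU time and real time is to record both
around the same piece of work. The sketch below assumes a system with
getrusage() and gettimeofday(); the function names and the scratch-file
workload are hypothetical stand-ins, not the benchmarks used here.

```c
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <sys/resource.h>

static double tv_to_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Hypothetical workload: write about 1 MB to a scratch file. */
void write_scratch(void)
{
    char buf[1024];
    int i;
    FILE *fp = tmpfile();

    if (fp == NULL)
        return;
    memset(buf, 'x', sizeof buf);
    for (i = 0; i < 1024; i++)
        fwrite(buf, 1, sizeof buf, fp);
    fclose(fp);
}

/* Run work() once; report CPU (user + sys) and elapsed seconds.
   Returns 0 on success, -1 if a timing call fails. */
int cpu_vs_real(void (*work)(void), double *cpu, double *real)
{
    struct rusage r0, r1;
    struct timeval t0, t1;

    if (getrusage(RUSAGE_SELF, &r0) != 0 || gettimeofday(&t0, NULL) != 0)
        return -1;
    work();
    if (gettimeofday(&t1, NULL) != 0 || getrusage(RUSAGE_SELF, &r1) != 0)
        return -1;

    *cpu = (tv_to_seconds(r1.ru_utime) - tv_to_seconds(r0.ru_utime))
         + (tv_to_seconds(r1.ru_stime) - tv_to_seconds(r0.ru_stime));
    *real = tv_to_seconds(t1) - tv_to_seconds(t0);
    return 0;
}
```

If two machines report similar cpu values but very different real values for
the same work, the difference is time spent waiting, for example on the disk.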

Flies in the ointment: The Celerity is a Fortran machine.  It has a stack
register array (I'd call it a cache, but caches in my view empty/fill
automagically and this doesn't) of 16 levels.  If your code makes procedure
calls which nest to a depth greater than 16, then the OS has to copy the
registers to main memory.  This is VERY expensive in CPU time.
Our test of Ackermann's function died after CPU times of 6.3 user, 107.5 sys
(to do all those copies of the stack registers).  It died because of a
second flaw: by default the stack can only grow to 128K (about 1024 calls
deep).  You can (at compile time) tell the system to allocate
more stack space.  I have not yet received an explanation of why they made
this change from standard BSD behaviour.  If there is a good reason, we could
probably live with it, since few procedures (other than Ackermann's) nest
all that deep.  However, the stack register array filling/unfilling is a
more immediate concern, since it is quite expensive in CPU resources and
it does happen.  We noted that the C compiler rolled up a fair amount of
system time (several times that of a Pyramid 90x), probably for stack growth.
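
For reference, Ackermann's function is short to write but nests very deeply,
which is why it stresses a 16-level register stack so badly. The sketch
below is the standard two-argument definition; the depth counters and the
name ackermann_max_depth are additions for illustration, and the arguments
used in the test above are not given in the post.

```c
/* Nesting bookkeeping; these counters are an addition for illustration. */
static long cur_depth = 0;
static long max_depth = 0;

/* Standard two-argument Ackermann function, recording how deep the
   calls nest.  Only small arguments are practical. */
unsigned long ackermann(unsigned long m, unsigned long n)
{
    unsigned long result;

    if (++cur_depth > max_depth)
        max_depth = cur_depth;

    if (m == 0)
        result = n + 1;
    else if (n == 0)
        result = ackermann(m - 1, 1);
    else
        result = ackermann(m - 1, ackermann(m, n - 1));

    cur_depth--;
    return result;
}

/* Deepest nesting seen so far across all calls to ackermann(). */
long ackermann_max_depth(void)
{
    return max_depth;
}
```

Even ackermann(3, 3), which returns 61, nests well past 16 calls, so on a
machine with a 16-level register array it forces register copies to memory.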

Another problem we noted was that the system calls we tried measuring
(some of those common to Sys V and 4.2 BSD) were on average 20% slower
than on an 11/780, despite the CPU being (by our tests) 3 times faster.
We are still trying to find out what is going on.  My suspicion
is the loading/unloading of the stack register set for context saves.
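
The post does not say which system calls were timed or how. A minimal sketch
of one common approach is to issue a cheap call many times and read the user
and system CPU time from getrusage() before and after. The choice of
getppid() and the name time_getppid are assumptions for illustration only.

```c
#include <unistd.h>
#include <sys/time.h>
#include <sys/resource.h>

static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Issue `calls` getppid() calls and store the user and system CPU
   seconds they consumed.  Returns 0 on success, -1 if getrusage() fails. */
int time_getppid(long calls, double *user, double *sys)
{
    struct rusage before, after;
    long i;

    if (getrusage(RUSAGE_SELF, &before) != 0)
        return -1;
    for (i = 0; i < calls; i++)
        (void)getppid();          /* cheap call; result is not needed */
    if (getrusage(RUSAGE_SELF, &after) != 0)
        return -1;

    *user = tv_seconds(after.ru_utime) - tv_seconds(before.ru_utime);
    *sys  = tv_seconds(after.ru_stime) - tv_seconds(before.ru_stime);
    return 0;
}
```

Dividing the system time by the call count gives a rough per-call cost that
can be compared across machines, with the loop overhead showing up mostly in
the user time.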

If Celerity makes the stack growth less painful, it will be a
very interesting machine for number crunching.

Rich Hammond	{allegra | decvax | ucbvax} bellcore!hammond



More information about the Comp.unix.wizards mailing list