Crash a RISC machine from user-mode code:

Wed Aug 15 11:34:40 AEST 1990

In article <1826 at mountn.dec.com> akhiani at ricks.enet.dec.com
	(Homayoon Akhiani) forwards the following from (gjc):

>                          Reports have arrived that all of these machines
>can be crashed using CRASHME.C:
>IBM RT, MIPS, DECSTATION 5000, SPARC.
> 
>On the two CISC architectures tried, VAX/VMS and SUN-3, the program
>either completed or exited with a core or register dump, as expected.

[Peter da Silva has since crashed a 386, and Dominic Dunlop's A/UX
68020 got somewhat confused.  Peter: what happened to the V.2 68000?]

I ran this program continuously with different seed values on a (new)
MIS-2 and an (old) 9805.  They've been running overnight, and they're
still running, merrily catching bus errors.  Both machines use
Pyramid's proprietary RISC CPU, both run OSx.
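
For anyone wondering what "merrily catching bus errors" looks like in
practice, here is a minimal sketch of the recovery loop.  It is NOT the
crashme source: the names survive_faults and on_fault are made up for
illustration, and raise(SIGBUS) stands in for the jump into random
bytes, so the sketch itself is harmless.  It assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

/* Where the handler returns control to (hypothetical name). */
static sigjmp_buf recover_point;

/* Signal handler: abandon the faulting attempt and resume the loop. */
static void on_fault(int sig)
{
    (void)sig;
    siglongjmp(recover_point, 1);
}

/* Provoke `attempts` bus errors and return how many were survived.
   Returns -1 if the handler could not be installed. */
int survive_faults(int attempts)
{
    struct sigaction sa;
    volatile int caught = 0;
    volatile int i;

    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;

    for (i = 0; i < attempts; i++) {
        /* Save the signal mask too, so SIGBUS is unblocked again
           after each siglongjmp out of the handler. */
        if (sigsetjmp(recover_point, 1) == 0)
            raise(SIGBUS);   /* stand-in for executing random bytes */
        else
            caught++;
    }
    return caught;
}
```

On a machine that is behaving itself, survive_faults(n) returns n: every
fault comes back as a signal and the loop carries on.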

>Some background/motivation. My experience with microcode programming
>taught me that some sequences of MICROINSTRUCTIONS could wedge or jam
>the hardware in such a way that recovery was impossible without
>a reboot of some kind. The RISC architectures have some of the same
>properties of MICROCODE in that certain instruction sequences have
>UNDEFINED behavior. Now one of the great costs in a CISC machine is
>usually the trouble the designers go through to make sure that
>every instruction returns the MACHINE to a KNOWN STATE. That way
>the behavior of every instruction can be well defined, tested, and
>documented, individually verified and tested, and by simple induction
>be valid for arbitrary SEQUENCES of instructions. (In general).

Looks like some SWEEPING generalisations CREPT in there.  Damn CAPS
lock KEY keeps getting STUCK ... there, that's better.

>Engineers of RISC machines don't bother to do this, which is one of
>the reasons they are CHEAPER (the hardware, not the engineers).

A(nother) sweeping generalisation if ever I saw one.  Of course, that
doesn't mean it's untrue.  I always thought they (the processors) were
cheaper because they were simpler.  If, in fact, they were cheaper at
all.  ECL SPARC, anyone?

>The problem of proving that an arbitrary sequence of instructions "N"
>long will not crash the machine is much more costly if N > 1.
>(To say the least, if you know anything about mathematical logic).
>If there are M instructions (and M is probably around 1 BILLION)
>then there may be about M^N cases to check. And what is N? 
>For a classic CISC machine a price is paid to make N = 1, or
>at least small. But for a RISC machine, might N be 10 or more?

Seems to me that this argument is completely the wrong way around.  Say
you've got a machine that executes fixed-length (say 16-bit)
instructions on fixed (in this case, even-byte) boundaries.  If you can
prove that each of these instructions takes any valid machine state to
another valid state (and that odd-byte instruction fetches trap), then
it follows by induction that the machine behaves properly for all
possible instruction sequences.
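
To make the induction concrete, here is a sketch on a toy machine.
Everything about it is invented for illustration (the 64-byte address
space, the 4-bit accumulator, the opcode layout, the trap-to-address-0
convention); it is not a model of any real CPU.  The point is only that
the base case is checked one instruction at a time, from every valid
state, rather than over sequences.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_BYTES 64u              /* toy address space (assumption) */

struct state { uint8_t pc; uint8_t acc; };

/* "Known state": pc in range and even, accumulator fits in 4 bits. */
bool state_ok(struct state s)
{
    return s.pc < MEM_BYTES && (s.pc % 2u) == 0 && s.acc < 16u;
}

/* Execute one 16-bit instruction.  Top two bits select the opcode,
   low six bits are the operand.  Anything illegal traps to address 0. */
struct state step(struct state s, uint16_t insn)
{
    unsigned op  = insn >> 14;
    unsigned arg = insn & 0x3Fu;
    struct state t = s;

    t.pc = (uint8_t)((s.pc + 2u) % MEM_BYTES);
    switch (op) {
    case 0: t.acc = (uint8_t)((s.acc + arg) & 0x0Fu); break;  /* ADD  */
    case 1: t.acc = (uint8_t)(arg & 0x0Fu);           break;  /* LOAD */
    case 2: t.pc  = (arg % 2u) ? 0 : (uint8_t)arg;    break;  /* JUMP */
    default: t.pc = 0;                                break;  /* trap */
    }
    return t;
}

/* Base case: count (valid state, instruction) pairs that leave the
   valid set.  Zero means every single instruction is safe. */
unsigned long count_violations(void)
{
    unsigned long bad = 0;
    for (unsigned pc = 0; pc < MEM_BYTES; pc += 2)
        for (unsigned acc = 0; acc < 16; acc++)
            for (unsigned long insn = 0; insn <= 0xFFFFul; insn++) {
                struct state s = { (uint8_t)pc, (uint8_t)acc };
                if (!state_ok(step(s, (uint16_t)insn)))
                    bad++;
            }
    return bad;
}

/* Inductive step in action: run any program from the reset state and
   report whether every intermediate state stayed valid. */
bool run_stays_ok(const uint16_t *prog, size_t n)
{
    struct state s = { 0, 0 };
    for (size_t i = 0; i < n; i++) {
        s = step(s, prog[i]);
        if (!state_ok(s))
            return false;
    }
    return true;
}
```

The exhaustive loop visits 32 * 16 * 65536 = 33,554,432 cases, which is
the number of valid states times the number of instructions, and has
nothing to do with N.  Once count_violations() comes back zero,
run_stays_ok() cannot return false for any program of any length.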

If, on the other hand, you have a machine that executes variable-length
instructions, you get into the combinatorial explosion that the
original poster is concerned about.  You have to prove that the machine
behaves properly for all operand lengths, for any and all instruction
prefixes, and (since you can now jump into the middle of an
instruction) for execution of partial instructions.  This is a much
more difficult job than handling the fixed-length instruction case.
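
A small sketch of the "jump into the middle of an instruction" problem,
using a made-up encoding in which the top two bits of the first byte
give the instruction length (1 to 4 bytes).  The encoding and the names
insn_length and count_insns are assumptions for illustration only; no
real instruction set is being described.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding: top two bits of the first byte give a
   length of 1..4 bytes. */
size_t insn_length(uint8_t first_byte)
{
    return (size_t)(first_byte >> 6) + 1u;
}

/* Count the complete instructions decoded when execution enters the
   byte stream at offset `start`.  Decoding stops at the end of the
   buffer or at an instruction that would run off the end. */
size_t count_insns(const uint8_t *code, size_t n, size_t start)
{
    size_t count = 0;
    size_t pos = start;

    while (pos < n) {
        size_t len = insn_length(code[pos]);
        if (len > n - pos)
            break;              /* truncated instruction */
        pos += len;
        count++;
    }
    return count;
}
```

Take the six bytes C0 00 00 00 00 00.  Entered at offset 0 they decode
as one 4-byte instruction followed by two 1-byte ones, so
count_insns(code, 6, 0) is 3.  Entered at offset 1, the same memory is
five 1-byte instructions.  Every byte offset is a distinct program that
has to be shown safe, whereas the fixed-length machine only has to
reject the odd addresses.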

Consider the guys at Intel who, just when they think they've validated
their microinstruction sequencer, have to deal with segment override
and bus lock prefixes.  [Well, presumably it's not sprung on them as
such a surprise, but you get the point.]

Guess which end of this spectrum RISC machines lie at.

[If anyone wants to discuss this further, I'd suggest comp.arch ...
there hasn't been a good RISC/CISC row in there for weeks now.  I've
changed the header to send followups to that group.]

>Anyway, no need to make too big a deal about this. Probably all the
>vendors can fix things in software alone ...

Maybe.  If the hardware works up to some minimum level, then yes.

>                                      ... and certainly CISC chips
>with bugs in them have been shipped in the past too.

But surely never a VAX :-)

Oh, BTW, I'm not trying to grind a RISC axe here.  Seems to me that
this problem is pretty much independent of RISC/CISC, with the slight
exception that a RISC chip _ought_ to be easier to characterise and
validate than a CISC chip.  Whether that gets done properly with either
(or indeed any) architecture is a rather different matter.

Regards, Mike.

moliver at pyramid.com
{allegra,decwrl,hplabs,munnari,sun,utai,uunet}!pyramid!moliver