Crash a RISC machine from user-mode code:

Sat Aug 11 16:43:56 AEST 1990

NOTE: -LONG- POSTING, look at the summary at the bottom first if you don't
want to read a single long posting on crashing systems!!  I'd boil it down,
but I've already spent too much access $$ just composing the thing, so I
apologize to everyone for the length and to those who knew me as a software
tech writer long ago (I've always been an overly verbose engineer :-)!
--------

Hmm, this discussion was at first very interesting to me but seems to have
gotten off the track I was hoping for...let me explain:

1) As I recall, the original posting talked about somebody wondering if the
   new RISC machines were bullet-proof in user mode (essentially, based on
   their wording -- something about "register paths" and such), and
   proposed running a program that jumped to random data.  The result of
   such a program is the execution of random defined AND UNDEFINED
   instructions.

2) This kind of program should ONLY be run under so-called "user-mode"
   protection, i.e. under operating systems like UNIX, OS/2 (I think), VMS,
   A/UX, and so on, and only on CPUs where those systems offer (and have
   enabled) memory protection, fault catching, preemptive scheduling, and
   such like.  Thus it is NOT USEFUL to run the program on systems like
   IBM PCs running DOS, XENIX (I think), or Macintoshes running Apple's non-
   UNIX OS.  (Maybe A/UX fits in this category too?)  Why?  Because no matter
   what it does (short of reducing oil prices), any hand-written program could
   have done the same -- including (caution!) erasing your hard disk!

   Doubt I've lost anyone so far....

3) There's no doubt that jumping to random junk produces no useful productive
   work in the normal sense; nobody is suggesting this is a good way to use
   any kind of computer.  BUT, by running random junk, one may increase the
   likelihood of discovering a "hole" in the system (hardware or kernel,
   usually) compared to running regular code generated by a compiler or even
   regular assembly code written by users.  It may even have a better chance
   than examining the instruction set architecture and trying to purposely
   write code that breaks the machine.

4) If such a program does anything that any normal user mode program may
   conceivably do, then it should not be considered worth noting.  This is
   especially the case (even for weird things like deleting files) if the
   program is run after some other useful program has been run and still has
   parts of it sitting around in memory; the random program could easily jump
   to it.  Other things included in this "not interesting behavior", IMHO:

      a) Putting the process into an infinite loop (but the system as a whole
         still works to the same extent it would if one actually ran a
         hand-coded infinite loop).

      b) Spewing junk to the terminal screen, or hanging for input from the
         terminal.

      c) Signaling conditions caught by the OS.

      d) Logging out, playing with files, network connections, or other things
         like that.

      e) Thrashing the swapper or pager (again, assuming any user program can
         do it).
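
A minimal sketch of how a test harness might sort the boring outcomes in
(a) through (c) above, assuming a POSIX system with Python 3 and assuming the
random-jumping program is a separate executable.  The name `classify` and the
five-second timeout are hypothetical, chosen only for illustration.  Note that
if the harness is still alive to report anything at all, the system as a whole
survived, which is exactly the "not interesting" case.

```python
import subprocess
import sys


def classify(cmd, timeout_s=5.0):
    """Run cmd as a child process and describe how it ended.

    Returns ("timeout", None), ("signal", signo) or ("exit", status).
    All three are things any ordinary user-mode program can do.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,    # never hang waiting on the terminal
            stdout=subprocess.DEVNULL,   # discard any junk it spews
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # Infinite loop or blocked child; subprocess.run has already killed it.
        return ("timeout", None)
    if proc.returncode < 0:
        # On POSIX a negative return code means "terminated by this signal",
        # i.e. the OS caught the condition and cleaned up.
        return ("signal", -proc.returncode)
    return ("exit", proc.returncode)


if __name__ == "__main__":
    # Stand-in child: a Python one-liner that simply exits with status 3.
    print(classify([sys.executable, "-c", "import sys; sys.exit(3)"]))
```

Anything this harness can report is, by definition, boring.  The interesting
cases are the ones where the harness (or the whole machine) never gets the
chance to report.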

5) However, if the random program manages to do things clearly out of the
   accepted realm of "user program", and assuming it (and thus the user
   or "wetware") cannot invoke "superuser" or some other "give me direct access
   to the kernel" function, such as "poke the kernel's memory" or "write
   to raw disk sector", then one may conclude that either the operating system
   in control has a security hole, or perhaps the hardware itself has a
   security hole.  THIS IS MUCH LIKE HOW ROBERT MORRIS'S PROGRAM TRASHED THE
   NET: he knew that passing over-long input to a network daemon (fingerd,
   for one) would escape normal defensive programming (since there wasn't any
   in that particular case), and allow his program to insinuate part of
   itself (data/instructions) into the daemon's memory and then be executed
   as part of the daemon, not as his own user program.

   I highlight this issue because it IS important: if your operating system
   provides a "hole" through which any user (who can write and execute raw
   machine code, even if only via BASIC POKE instructions, but certainly via
   use of assembler/loader) can do something not normally allowed in user
   mode, then your operating system has a security hole.  (I'm not talking
   about the non-user-mode systems like Mac OS, PC/MS-DOS; I mean Unix, VMS,
   PRIMOS, and so on.)  Very likely, the hole can be found and fixed (though
   the fix is painful if the "bug" is really a convenient "entry point" for
   utilities needing special features; I've dealt with fixing this kind of
   thing many times, usually involving timesharing systems' batch and printer
   queue utilities).

   But if the problem is that the underlying CPU allows a user mode program
   to somehow circumvent documented user mode protection, then the problem
   cannot be fixed without either switching to another kind of CPU (not easy;
   porting is a problem) or preventing users from writing machine code (the
   acceptable answer if you are providing only pure end-user services; for
   example, the Prodigy on-line service allows no programming, so conceptually
   could be implemented entirely on Apple IIs without having any architectural
   exposure from a security perspective -- of course, performance is another
   issue :-).

   IF a system is "hackable" from the hardware perspective, the manufacturer
   of that CPU had better find and fix the problem fast, and perhaps even provide
   inexpensive replacements to their customers.  Otherwise their machine
   becomes a "target" of evil hackers, and administrators will learn to avoid
   any system based on their CPU especially when it comes to attaching such
   a system to any network or putting any sensitive data on it.

   SAMPLE WAYS TO TELL if your system has a "hole" like this, based on the
   behavior of the "random-jumping" program:

      a) Running the program crashes the entire system, but there is no known
         way of so doing with a hand-written user program.  (The culprit may
         be the OS or the CPU, but check the OS carefully first.)

      b) The program manages to rewrite part of the (protected) kernel
         without crashing the system or calling any "may I write the kernel"
         function.  (Likely to be a CPU bug.)

      c) The program somehow causes Iraq to unilaterally disarm.

   Again, if the random program does something you cannot imagine ANY user
   mode program doing (not just a correct or "well-written" one), then you
   might well be looking at a security hole.  ("Security" meaning either
   a user can access things he/she shouldn't be able to, or is able to
   trash or crash things he/she shouldn't have access to, like the CPU
   itself.)

6) If you think the random-jumping program has exposed a hole in your system,
   be it RISC or CISC, first determine (by reading the documentation or
   asking an expert on your configuration of CPU and OS) whether your system
   even ATTEMPTS to catch all possible user-mode violations.  If your CPU
   allows, for example, I/O instructions in user mode, then although it
   wouldn't fit MY definition of "user mode", it would mean a user mode
   program could do almost anything (including rewriting a swapping/paging
   kernel or other kernel-mode programs right out from under themselves by
   rewriting their disk images), so the random-jumping program would simply
   be something to avoid running any more!

   But, if you are running, say, under VAX/VMS, or on a 68030 running a
   memory-protecting UNIX, or some such thing, and the random-jumper does
   something out of bounds, then perhaps you've discovered one or more "holes"
   in the system.  If you can reproduce the problem reliably (if the program
   always creates the same random data each time, for example), then you might
   be able to step through it and find the actual instruction or instruction
   sequence that causes the crash.  (HINT: if it takes long in terms of
   instruction steps, and your system provides a user-mode n-stepper, find
   a large value of "n" to step the program that results in a crash, then use
   a binary search technique to lower "n" until you have a value that falls
   just short of the crash.)
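
The binary search in the hint can be sketched as follows.  This is only an
illustration under stated assumptions: `crashes(n)` is a hypothetical stand-in
for "restart the program, step it n instructions with your n-stepper, and see
whether the system went down", and the run is assumed to be reproducible, so
that every n below some threshold is safe and every n at or above it crashes.

```python
from typing import Callable


def last_safe_step(crashes: Callable[[int], bool], n_crash: int) -> int:
    """Return the largest n for which stepping n instructions does not crash.

    Assumes crashes(0) is False, crashes(n_crash) is True, and that
    crashes(n) switches from False to True exactly once as n grows.
    """
    lo, hi = 0, n_crash              # invariant: lo is safe, hi crashes
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes(mid):
            hi = mid                 # the culprit is at or before step mid
        else:
            lo = mid                 # still fine after mid steps
    return lo


if __name__ == "__main__":
    # Hypothetical stepper: pretend the 12345th instruction is the culprit.
    probes = []

    def fake_stepper(n: int) -> bool:
        probes.append(n)
        return n >= 12345

    n = last_safe_step(fake_stepper, 1_000_000)
    print(n, "steps are safe; found with", len(probes), "trial runs")
```

Each probe costs a full re-run (and, if it crashes, a reboot), so halving the
range each time matters: a million-step run needs about twenty trials rather
than a million.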

   Once you've narrowed down the problem to a few instructions, if they're
   user mode and don't involve a kernel call, you might have a true-blue
   CPU bug: document the problem and discuss with another expert on that CPU
   (especially, try and reproduce on other chips in case yours just has a
   local flaw, and on other slightly different models of the same CPU, e.g.
   a 486 if the failure is on a 386, a 68030 if on a 68040, etc), then if
   you still think it's a hardware problem, report it to the manufacturer.

   However, it's likely that a supposed CPU problem is really an OS problem
   if the offending instructions cause a valid trap to the OS that the OS
   mishandles or fails to handle, so make sure the offending instructions
   aren't trapping to kernel mode at all.

   If you find the problem's in the OS, for example a call to an OS function
   with absurd arguments that don't get "noticed" until it's too late, then
   (again, after checking with experts and other copies and different versions
   of the OS) let the OS writer know.  And, depending on your own sense of
   ethics, perhaps let everyone else (via a newsgroup) know as well, so they
   can plan their own defenses if they are using that OS.

   (I wouldn't personally recommend advertising a CPU hole; if you're wrong
   about somebody's OS, it's fairly easy for them to prove it, assuming you've
   narrowed the problem down adequately, and in any case people can actually
   defend their systems fairly rapidly via patches, but if you're wrong about
   someone's CPU, they can't show everyone the schematics to prove it and
   you may have tarnished the manufacturer's image permanently, and meanwhile
   there isn't much most people can do about it quickly.  Wait for the
   manufacturer to verify/refute the problem and take their own steps, IMHO.)

7) Remember that even if you're the ONLY USER of a system with a "hole", you've
   still got a security problem unless you're also the ONLY PROGRAMMER of
   every new (or recent) program running in user mode on your system.  If
   someone else knows of a hole in a particular OS/CPU combination, they might
   use that knowledge to write a trojan horse program that pretends to be one
   thing but, when it detects a system for which it carries "kernel access"
   code exploiting a security hole, does bad things like attaching viruses
   to other programs or erasing disks.  (IMHO, the best protection against
   this kind of situation is to only use "free software" that comes with source
   code and never use the binaries, but always do the rebuilds yourself, and
   only after inspecting the source code via a quick perusal: it is much
   harder to hide a code missile in source code than in binary code.  Be
   suspicious of any data tables without adequate explanation, especially if
   they can get "jumped" to.  Unfortunately, scanning assembler code can
   be much harder than scanning HLL code like C, Pascal, or (best of all due
   to lack of pointers and such) Fortran and Cobol.)

I know this has been a long posting, but I've tried to explain what I think
are the important issues about a random-jumping program.  Again, please don't
get excited if such a program goes into an infinite loop, or signals
conditions that your OS catches -- any user program can do those things.
DON'T run such a program on ANY system that doesn't offer full user mode
protections like memory, I/O, scheduling, and others; you might just end up
trashing your hard disk or some such thing.  Finally, if you DO run the
program on an "interesting" (i.e. protecting) system and it produces
"interesting" (i.e. not-normally-allowed-in-user-mode) results, PLEASE look
into it further and, if possible, involve an expert -- you may have taken
a step towards preventing the next major virus or trojan horse infiltration!
(I mean, if YOU can find the problem, so can someone else who wants publicity
for being a mediocre, obnoxious hacker!)

James Craig Burley, Software Craftsperson    burley at world.std.com