Self-modifying code

Jan Christiaan van Winkel jc at atcmp.nl
Fri Oct 12 21:33:59 AEST 1990


From article <829 at neccan.oz>, by peter at neccan.oz (Peter Miller):
> 1. On a Z80 I wrote some code which used a NMI (non-maskable interrupt).
This reminds me of the code used in the 8080 BASIC interpreter by Microsoft.
They had several entries into the error routine. The error routine expected
an error number in register b. Now what they had done was:
ld hl,<some number>       ; registerpair hl gets the value <some number>
ld hl,<some other number>
ld hl,<some other number>
and so on.
The 16 bit numbers themselves were actually instructions:
ld b,errorcode

By jumping into the middle of one of the ld hl,... instructions, they would
load the error code into b and then execute the remaining dummy ld hl,...
instructions. Those would not clobber the value in b, even though the other
ld b,xxx instructions were just a byte away.
Although this is not self-modifying code, it is 'shifting the bits a bit and
interpreting the result'. Very clever.
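
The overlapping-instruction trick can be sketched with a toy decoder in C.
This is only an illustration, not Microsoft's actual code: the opcode values
are the Z80 encodings for ld b,n (06h), ld hl,nn (21h) and halt (76h), while
the error codes 1 to 3, the table error_entries and the function run_from are
made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Z80 encodings of the three instructions the toy decoder understands. */
enum { OP_LD_B_N = 0x06, OP_LD_HL_NN = 0x21, OP_HALT = 0x76 };

/* Hypothetical error-entry table: the 16-bit operand of every ld hl,nn
   is itself a two-byte ld b,n instruction. */
static const uint8_t error_entries[] = {
    OP_LD_HL_NN, OP_LD_B_N, 1,   /* entry point for error 1 is offset 1 */
    OP_LD_HL_NN, OP_LD_B_N, 2,   /* entry point for error 2 is offset 4 */
    OP_LD_HL_NN, OP_LD_B_N, 3,   /* entry point for error 3 is offset 7 */
    OP_HALT                      /* stands in for the common error routine */
};

/* Decode from offset pc until halt; return register b, or -1 if the
   decoder meets an unknown opcode or runs off the end of the table. */
static int run_from(size_t pc)
{
    const size_t n = sizeof error_entries;
    int b = 0;
    unsigned hl = 0;

    while (pc < n) {
        switch (error_entries[pc]) {
        case OP_LD_B_N:                 /* two bytes: opcode, value */
            if (pc + 1 >= n) return -1;
            b = error_entries[pc + 1];
            pc += 2;
            break;
        case OP_LD_HL_NN:               /* three bytes: opcode, low, high */
            if (pc + 2 >= n) return -1;
            hl = error_entries[pc + 1] | (error_entries[pc + 2] << 8);
            pc += 3;
            break;
        case OP_HALT:
            (void)hl;                   /* hl is a dummy; only b matters */
            return b;
        default:
            return -1;
        }
    }
    return -1;
}
```

Calling run_from(1), run_from(4) or run_from(7) enters in the middle of a
ld hl,nn and leaves 1, 2 or 3 in b; every later ld b,n is swallowed as the
operand of a ld hl,nn. Calling run_from(0) decodes only ld hl,nn instructions
and leaves b at 0.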

> 4. At some point, I realized that using a compiler is rather like
>    self-modifying code.  The compiler, itself a binary data file, chews on a
>    text file and makes a binary data file.  When we run the program we just
>    compiled, we are asking the OS to load a binary data file and leap into it.
Hmmm. I think you should read Ken Thompson's Turing Award lecture. He
discussed the possibility of getting code into a C compiler without having it
in the source. The trick is illustrated with the addition of a new escaped
character like \n. In the lexical analyzer there is some code like this:
case '\\': switch(getnewchar()) {
   case 'n': return '\n';
   case 'a': return '\007';    /* the newly added character */
			       /* my name's Bond, James Bond :-) */
   .
   .

Now compile the compiler, and you'll have a new compiler that recognizes '\a'.
Now edit the source code to look like this:
    case 'a': return '\a';

This is possible because the edited source will be compiled with the compiler
that already knows about '\a'. The result is a C compiler that knows that '\a'
is in reality '\007', but nowhere in the source of the C compiler is that
knowledge stored. It is inherited from the previous generation of the C
compiler.
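
The two generations can be put side by side in a small sketch. The function
names escape_gen1 and escape_gen2 are invented for the illustration, and the
code assumes it is built with a compiler that already understands '\a' and
uses ASCII, which is exactly the inherited knowledge being discussed.

```c
/* Generation 1: the value of the new escape is spelled out as a number,
   so an older compiler that has never heard of '\a' can still build it. */
static int escape_gen1(int c)
{
    switch (c) {
    case 'n': return '\n';
    case 'a': return '\007';    /* BEL, octal 007 in ASCII */
    default:  return c;
    }
}

/* Generation 2: the number is gone from the source. The meaning of '\a'
   now comes from whatever compiler is used to build this function. */
static int escape_gen2(int c)
{
    switch (c) {
    case 'n': return '\n';
    case 'a': return '\a';
    default:  return c;
    }
}
```

On such a compiler escape_gen1('a') and escape_gen2('a') return the same
value, although only the first version says what that value is.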

JC
-- 
___  __  ____________________________________________________________________
   |/  \   Jan Christiaan van Winkel      Tel: +31 80 566880  jc at atcmp.nl
   |       AT Computing   P.O. Box 1428   6501 BK Nijmegen    The Netherlands
__/ \__/ ____________________________________________________________________



More information about the Comp.lang.c mailing list