Self-modifying code

Leo de Wit leo at philmds.UUCP
Sun Jul 17 04:13:25 AEST 1988


In article <752 at cernvax.UUCP> hjm at cernvax.UUCP () writes:
>I have been mulling over the idea of self-modifying code (SMC) for a while and
>I've come to the conclusion that there is no good definition of SMC.
>
>For example, if treating code as data is the definition, then does passing a
>procedure as a parameter in PASCAL, or a pointer to a function in C count?
>Probably not.

True. A pointer is data.

>OK, what about a jump table.  Consider an array of pointers to functions in C.
>Does changing the pointers count as SMC?  Again, I don't think so.

Again true. Note that the system (O.S., processor) may check whether
the new pointer value represents a valid address, or a valid entry
point. Whether this is (always) desirable is an entirely different
question. E.g. some architectures will protest if you try to jump to
an odd address. PASCAL will probably leave you less room to cheat
than C.
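
To make the function-pointer case concrete, here is a minimal sketch of a
table of function pointers whose entries get reassigned at run time. The
names (add_one, twice, dispatch, retarget) are made up for illustration and
come from neither article. C does not validate the target address for you,
so the sketch does its own bounds and NULL checks with assert().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handlers; the names are illustrative only. */
static int add_one(int x) { return x + 1; }
static int twice(int x)   { return x * 2; }

/* A mutable table of function pointers: this is data, not code. */
static int (*handlers[2])(int) = { add_one, twice };

/* Call the handler stored in a slot. The checks are ours; the
   language does not verify that the pointer is a valid entry point. */
int dispatch(size_t slot, int arg)
{
    assert(slot < sizeof handlers / sizeof handlers[0]);
    assert(handlers[slot] != NULL);
    return handlers[slot](arg);
}

/* Retarget a slot. Only a pointer value changes; no instruction
   is overwritten anywhere. */
void retarget(size_t slot, int (*fn)(int))
{
    assert(slot < sizeof handlers / sizeof handlers[0]);
    handlers[slot] = fn;
}
```

After retarget(0, twice), dispatch(0, 5) calls twice() instead of add_one(),
yet the code of both functions is untouched.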

>So, changing a pointer by assigning to it is not SMC, but putting a new jump
>instruction in (e.g. jmp #somewhere_else) in place of an existing instruction
>is SMC.  Does one level of indirection really make that much difference?

Yes, I think it does make a difference. Maybe not always, but there are
cases where you can't get away with this: think of re-entrant code, or
shared text segments. Now that I'm thinking of recursion, what about
putting the code on the stack 8-) ? No worry about re-entrancy, and the C
program model becomes more complete (we already have global, static and
automatic data, and global and static functions, and now there are
automatic functions...).

>Of course, if you want to be really horrid in C, you can try something like
>this:
>
>char codearray[] = { 0x03, /* one byte instruction for something */
>		     0xc9  /* one byte return instruction */
>		   };

This must be (or have been) a Z80 freak! At least I recognize the 0xc9
as RET.  On a Z80 you could do other horrible things too, since
instruction sizes vary from 1 to 4 bytes; by carefully picking your
instructions you could use the same code twice (using an off-by-one
entry point), with an entirely different result. Good (?) old days, when
memory was more expensive than programming effort....

>and then 'call' this function using a cast to turn the pointer codearray into
>a pointer to a function.  (Who needs #asm anyway!)  Then you can modify the
>code as much as you want.  This _is_ SMC without a doubt, because you can
>overwrite code.  So, I propose a very weak definition for SMC as code that
>writes over other code.

Note that not all implementations will allow initialized arrays to be
altered.  I remember a problem we had last year on VMS while passing an
initialized char array to mktemp() (or whatever its name is); the
program crashed with a stack dump because mktemp tried to write into the
read-only array (yes, VMS put it in read-only memory!).
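
A rough sketch of the defensive approach, under a few assumptions:
fill_template() is a made-up stand-in for mktemp() that only mimics its
in-place write (it creates no file names), and fill_copy() is likewise
hypothetical. The idea is to copy the template into a buffer you know is
writable before anything scribbles on it. In standard C, writing into a
string literal is undefined behaviour in any case; the copy also sidesteps
systems that place initialized data in read-only storage.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for mktemp(): overwrites the trailing run of
   'X' characters in place and returns how many were replaced. */
size_t fill_template(char *tmpl, char fill)
{
    size_t len = strlen(tmpl);
    size_t n = 0;

    while (n < len && tmpl[len - 1 - n] == 'X') {
        tmpl[len - 1 - n] = fill;   /* writes into the caller's buffer */
        n++;
    }
    return n;
}

/* Copy a possibly read-only template into a caller-supplied writable
   buffer first, then fill the copy. */
size_t fill_copy(const char *src, char *buf, size_t bufsize, char fill)
{
    assert(strlen(src) < bufsize);  /* room for the terminating NUL */
    strcpy(buf, src);
    return fill_template(buf, fill);
}
```

With char buf[32], the call fill_copy("/tmp/fooXXXXXX", buf, sizeof buf, 'a')
leaves "/tmp/fooaaaaaa" in buf and never writes to the literal itself.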

>As a final note, why is it 'clean' to alter a jump table and 'unclean' to alter
>an inline constant (e.g. jmp @offset(r0) uses a value in memory as the address
>but mov (pc)+,#1234 which loads an immediate does so too)?  Why the subtle
>difference?  Any thoughts on the subject?

Try putting your code in ROM and see what happens. Just one example.
Besides, I think jump tables are generally not altered; pointers are.
The jump tables could well be in the text segment, for instance.
A jump table is not an array of function pointers.
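
One way to picture the split between a fixed table and an alterable pointer,
as a sketch only: the table below is declared const, so an implementation is
free to put it in read-only storage (or ROM), and the single mutable object
is a pointer that selects an entry. All names here (op_table, current_op,
select_op, run_op) are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operations; the names are illustrative only. */
static int op_inc(int x) { return x + 1; }
static int op_neg(int x) { return -x; }

/* Fixed table: const, so it may live in read-only storage. */
static int (*const op_table[])(int) = { op_inc, op_neg };

/* The only thing that is ever altered: one pointer in writable data. */
static int (*current_op)(int) = op_inc;

/* Select an entry from the fixed table; returns 0 on success and
   -1 if the index is out of range (current_op is then unchanged). */
int select_op(size_t index)
{
    if (index >= sizeof op_table / sizeof op_table[0])
        return -1;
    current_op = op_table[index];
    return 0;
}

/* Call through the currently selected pointer. */
int run_op(int x)
{
    assert(current_op != NULL);
    return current_op(x);
}
```

Nothing in op_table is ever written, so this arrangement would still work
with the table burned into ROM.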

>	Hubert Matthews
>
>(I don't consider LISP or PROLOG programs that create code on the fly to be
>SMC.  Does anyone disagree?)

No, unless the code is buggy; such code on a fly could well be
Self Multiplying 8-).

  Leo.


