When it is amoral... (Re: When is a cast not a cast?)

Steve Summit scs at sloth.pika.mit.edu
Thu May 11 13:39:44 AEST 1989


I think everybody has figured out by now why adding two pointers
doesn't work or make sense, but I'll throw in another
perspective just for good measure.

I think Blair's original confusion stemmed from wanting to treat
a pointer as an actual memory address.  It's true that pointers
are represented on many if not most machines by actual memory
addresses, and that pointers are generated with the "&" operator
which is named "address of", and that thinking about machine
addresses is often a helpful way to think about pointers; but as
has been amply pointed out, a pointer is properly a higher-level
language construct which is removed from, and insulates the
programmer from the details of, the implementation.

The one time I wanted to add pointers was when writing a dynamic
linker.  It seems reasonable, at first, to use pointers to
describe the addresses (locations) of the symbols within an
object module being read in.  (I'm already groping though; I
talked about pointers as addresses because that's how they're
usually implemented; now I'm turning around and trying to
implement an address as a pointer as if they were the same.)

Additionally, I may well have a pointer (call it "base") to the
spot in memory into which the object module is being dynamically
read.

One of the things a linker must do is relocation.  Suppose an
object module defines a symbol x, and that the symbol's
address/location is 4, relative to the beginning of the object
module.  (That is, the object module essentially defines a frame
of reference that assumes that the module begins at
location/address 0.)  If I am using a pointer as my generic
address type, I might read the "4" out of the object module's
symbol table and cram it (via any suitable means) into a variable
(call it "loc") of pointer type.

Once the object module is read in, the actual address/location in
memory of the symbol x will be base + loc (x was at "loc"
relative to the start of the module, which is being read in at
address/location "base").

But if I declare both base and loc as pointers, the compiler
complains when I try to compute base + loc: C defines pointer
plus integer and pointer minus pointer, but not pointer plus
pointer.  (In fact, that is how I did attempt to write it, at
first; and in figuring out why it couldn't work, I gained a deeper
understanding of the relationship, and differences, between
pointers and addresses, which is what I am trying to impart here.)

The problem is that, in writing a linker, I have dropped
completely beneath the machine-independent high-level abstract
model which the language provides.

What I eventually did was to represent addresses/locations as
unsigned integers, no longer attempting to disguise the fact that
I was, in fact, dealing with actual machine addresses, which are
(for a flat-address machine) honest-to-God numbers, not
pointers.
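
As a rough sketch of that approach in present-day C: the type
uintptr_t from <stdint.h> (an assumption on my part here; it was
standardized well after this was written, and is optional even
then) is an unsigned integer wide enough to hold a pointer value.
The names addr_t, relocate and addr_to_ptr are hypothetical.

```c
#include <stdint.h>

/* Hypothetical address type for a flat-address machine: just an
 * unsigned integer wide enough to hold a pointer. */
typedef uintptr_t addr_t;

/* Relocation is then ordinary integer arithmetic. */
addr_t relocate(addr_t base, addr_t loc)
{
    return base + loc;
}

/* Converting back to a pointer is implementation-defined, which
 * is exactly the nonportable step a linker has to take. */
void *addr_to_ptr(addr_t a)
{
    return (void *)a;
}
```

With the module read in at 0x1000 and x at offset 4,
relocate(0x1000, 4) is 0x1004, and nobody has to pretend that
either operand is a pointer until the very last step.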

On a machine without a flat address space, the appropriate type
for a machine address may be some semi-complicated structure, and
computing base + loc might require a subroutine call (which C++
could hide for me...) to a routine which knew about the memory
model of the machine in use.
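
For illustration only, here is what such a structure and routine
might look like, assuming an 8086-style real-mode segment:offset
scheme (my choice of example; the memory model and the names
seg_addr, seg_addr_add and seg_addr_linear are all assumptions).

```c
#include <stdint.h>

/* Hypothetical address type: linear address = segment * 16 + offset. */
struct seg_addr {
    uint16_t segment;
    uint16_t offset;
};

/* Compute base + loc.  If the offset overflows 16 bits, carry the
 * overflow into the segment: 0x10000 bytes is 0x1000 paragraphs
 * of 16 bytes each. */
struct seg_addr seg_addr_add(struct seg_addr base, uint16_t loc)
{
    uint32_t sum = (uint32_t)base.offset + loc;
    struct seg_addr result;

    result.segment = (uint16_t)(base.segment + (sum >> 16) * 0x1000u);
    result.offset  = (uint16_t)(sum & 0xFFFFu);
    return result;
}

/* Flatten to the linear address the hardware would form. */
uint32_t seg_addr_linear(struct seg_addr a)
{
    return ((uint32_t)a.segment << 4) + a.offset;
}
```

The caller still thinks in terms of base + loc; the knowledge of
how addresses are actually put together lives in one routine.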

Obviously, such code is unportable, but code which gets real
close to the machine (assemblers, linkers, debuggers, kernels)
does tend to have its nonportable aspects.  (Don't lose heart,
though; they can also have their portable aspects.)

The moral is, if you want (and have good reason) to talk about
actual machine addresses, don't beat around the bush with
pointers.  Use integers or structures or whatever you have to
use to accurately describe the machine addresses you're actually
using.

                                            Steve Summit
                                            scs at adam.pika.mit.edu


