OK, so why _does_ ld resolve text against data?

Richard A. O'Keefe ok at goanna.cs.rmit.oz.au
Thu Aug 23 14:03:24 AEST 1990


In article <930 at eplunix.UUCP>, das at eplunix.UUCP (David Steffens) writes:
> Now my question is, why does the linker silently resolve
> [ a ] function reference to [ a ] global variable
> without even a whisper of a warning? ...

To start with, there are operating systems where this kind of thing
cannot happen (love that B6700 MCP...).  I was *appalled* the first
time I had a subroutine call link with a common block.  So this is
a deliberate choice.

The UNIX linkers allow a function reference to be resolved by a global
variable because they have no way at all of telling one from another.
The title of this thread refers to "text" and "data", but a read-only
array may well be in "text" space.  Consider the "-R" flag on BSD UNIX
compilers and the "const" keyword in ANSI C.

(There _is_ symbol table information available in COFF format, but the
linker can't use it because it isn't always there.  Galling, no?)

If you use Simula 67, Ada, Modula-2, or recent versions of C++, their
language support systems keep around enough information to ensure that
this kind of mistake _is_ detected.  cfront 2.whatever-it-is kludges
it by frobbing the names.  Ugh.  But it works.  So you might consider
switching to C++.
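
To make the name-frobbing idea concrete, here is a minimal sketch in C.
It assumes a made-up encoding: the "__F" and "__D" suffixes, the enum, and
the names frob_name() and would_link() are all invented for illustration
and are not cfront's actual scheme.  The point is only that once the kind
of symbol is folded into its link name, a linker that does nothing but
compare strings can no longer match a function reference to a data
definition.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical symbol kinds; not taken from any real compiler. */
enum sym_kind { SYM_FUNCTION, SYM_DATA };

/*
 * Build a link-level name that records what kind of thing the symbol is.
 * Returns 0 on success, -1 if the result does not fit in `out`.
 */
int frob_name(enum sym_kind kind, const char *name, char *out, size_t outsz)
{
    const char *suffix = (kind == SYM_FUNCTION) ? "__F" : "__D";
    int n = snprintf(out, outsz, "%s%s", name, suffix);

    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}

/* Stand-in for the linker's only test: do the two names match exactly? */
int would_link(const char *reference, const char *definition)
{
    return strcmp(reference, definition) == 0;
}
```

With this encoding a call to index() asks for "index__F", while
"int index;" defines "index__D", so the reference stays unresolved and
the linker complains instead of silently binding the two.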

> Nevertheless, the linker _is_ blameworthy because it will _also_ happily
> use the address of one of my global variables to resolve a function call
> embedded in a library routine for which I have no lintable source, e.g.

> int index;
> main()
> {
> 	/* lots of code, none of which uses index() */
> 	vendor_library_routine(); /* which, unknown to me, uses index() */
> }

The ANSI C committee thought about this; that's precisely the "namespace
pollution" issue they were concerned about.  Unfortunately, all that gives
you is assurance that the C runtime library doesn't pollute your namespace;
no guarantees about anything else.

Myself, I don't see data/function collisions as being any worse than
function/function collisions.  There is a certain UNIX variant that I
sometimes use which provides a dynamic loading library routine, which
swipes an _amazing_ number of useful and obvious names; if I get one
of those by accident, or if my routine interferes with it, it doesn't
improve matters that at least it was a function I collided with.  If
anything, it makes the mistake _harder_ to find.

> The chances of a name collision of this sort rises exponentially
> with every new UNIX release.

Get an ANSI-compliant compiler and the chance of accidental collision
*with the C run-time library* drops to 0.  But use a vendor-supplied
function which is in neither ANSI C nor POSIX, and I'm afraid you're right.

The ultimate problem is that C assumes a _single_ global "extern" namespace.
Just like Fortran and Pascal, in fact.

Using COFF format, it wouldn't be too hard to produce a "packaging tool"
which took description files
	import <id>, ...
	export <id>, ...
	source <file>, ...
and did a "ld -r" to hook the files together into one library with names
mangled so that only the imports and exports were left untouched (it would
have to know about the local Pascal, C, and Fortran run-time libraries, so
that their names were preserved, but providing for that wouldn't be hard).
I've thought about doing this, but the trouble is that I keep running into
machines which are COFF "with extensions" or modifications.  It would also
be comparatively straightforward to look at the symbol table information
left by the "-g" option, except that (a) too many compilers won't give you
symbol table *and* optimisation at the same time and (b) the symbol table
information is not as portable as it might be either.
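
The renaming rule at the heart of such a packaging tool might look
something like the sketch below.  It covers only the decision about what
each global name becomes; reading the description file and rewriting the
object file's symbol table are left out.  The function names, the "keep"
list, and the "pkg__name" convention are assumptions made for the
example, not part of any existing tool.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `name` appears among the `count` entries of `list`. */
static int in_list(const char *name, const char *const *list, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (strcmp(name, list[i]) == 0)
            return 1;
    return 0;
}

/*
 * Decide the link name for one global symbol of package `pkg`.
 * Names in `keep` (imports, exports, run-time library names) are left
 * untouched; every other name is given a package prefix so that it
 * cannot collide with anything outside the package.
 * Returns 0 on success, -1 if the result does not fit in `out`.
 */
int package_name(const char *pkg, const char *name,
                 const char *const *keep, size_t nkeep,
                 char *out, size_t outsz)
{
    int n;

    if (in_list(name, keep, nkeep))
        n = snprintf(out, outsz, "%s", name);
    else
        n = snprintf(out, outsz, "%s__%s", pkg, name);

    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Applied to a vendor library packaged as, say, "vlib", a private helper
called index would become "vlib__index" and could no longer be captured
by a user's "int index;", while the exported vendor_library_routine
would keep its name.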
-- 
The taxonomy of Pleistocene equids is in a state of confusion.


