length of external names

Paul Schauble Schauble at MIT-MULTICS.ARPA
Sun Jan 6 17:21:17 AEST 1985


I'm not sure that I should post this to the net, but I can't resist.

Henry Spencer, who seems to be one of the chief exponents of short
external names, just posted a convincing explanation of the need to not
break existing linkers.  I understand why and the issues involved.  I
even mostly agree.  In a previous incarnation I worked on COBOL and PL/1
for a manufacturer that had the same problem:  a language that required
long names and a linker that only handled short ones.

The solution that was used, and worked, was to have the COMPILER use the
external "name" to store a hashed value.  During the recent net
discussion I posted a description of this technique and some analysis of
the chance and cost of collisions.

This is done entirely in the compiler, and has no effect on the linker.

I have not seen any reasonable statement of why this would not be
workable.  The only objection that I can recall was that having to look
up the name translation during debugging was extra work.  True, but
consider...Would you rather have the extra work on the few occasions
that you need to look up a symbol on the load map, or on the many more
frequent occasions that you are dealing with C source and have to guess
what "dtfmdu" or something means?  You know which way I will vote.

More recent discussion prompts me to post a small modification of the
technique.  Several people have pointed out the desirability of a
language feature that would have the internal and external names of a
global item be different, e.g.

          extern int date_and_time() entry "SYS$TIME";
          extern int memory_size entry "CSYS$MEMSIZ";

I like this, other languages have it, it's useful, and it would have
saved me having to write a number of assembler routines whose only
purpose was to change names.

It also allows me to suggest a modification of the hashing technique.
Note that this only applies to systems with deficient linkers.

If the declaration contains an entry clause, use that as the external
name.

Otherwise, if the item name is short enough, use the item name.

Otherwise, hash the item name and use the result as the external name.

This allows programming using the full names, and using the entry clause
for those cases where you really care what the external name is, or in
the rare cases when the hash causes a duplication of external names.
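
A minimal sketch of the three rules above, as a compiler might apply
them when emitting a symbol.  Everything named here is an assumption
made for illustration: the 6-character limit (EXT_MAX), the function
names, and the choice of FNV-1a as the hash.  It is not the hash used
in the COBOL and PL/1 compilers mentioned earlier.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define EXT_MAX 6  /* assumed linker limit on significant characters */

/* FNV-1a, 32-bit: a stand-in hash chosen for this sketch only. */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Encode a hash as EXT_MAX characters: a leading upper-case letter,
   then upper-case letters or digits, so the result is still a legal
   symbol for a single-case linker.  Hash bits that do not fit in
   EXT_MAX characters are discarded. */
static void encode_hash(uint32_t h, char out[EXT_MAX + 1])
{
    static const char alnum[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    int i;

    out[0] = alnum[h % 26];
    h /= 26;
    for (i = 1; i < EXT_MAX; i++) {
        out[i] = alnum[h % 36];
        h /= 36;
    }
    out[EXT_MAX] = '\0';
}

/* Choose the external name for a global item.
   entry is the text of the entry clause, or NULL if there is none. */
void external_name(const char *item, const char *entry,
                   char *out, size_t outsz)
{
    if (entry != NULL) {
        /* Rule 1: an explicit entry clause always wins. */
        snprintf(out, outsz, "%s", entry);
    } else if (strlen(item) <= EXT_MAX) {
        /* Rule 2: short names pass through unchanged. */
        snprintf(out, outsz, "%s", item);
    } else {
        /* Rule 3: long names are replaced by their hash. */
        char buf[EXT_MAX + 1];
        encode_hash(fnv1a(item), buf);
        snprintf(out, outsz, "%s", buf);
    }
}
```

With this sketch, date_and_time with an entry clause of "SYS$TIME"
comes out as SYS$TIME, a short name such as main is left alone, and
date_and_time without an entry clause becomes a 6-character hashed
name that is the same in every compilation unit.  A real compiler
would also write the long-name-to-hash mapping into the listing or
load map so the translation can be looked up while debugging.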

----------------------------------------------------------------------

Now, my questions:

   To the standards committee people:

1.  Suppose that the standard required longer names and suggested
    hashing as an implementation technique.  That would force
    manufacturers to update either linker or compiler to meet the
    standard.  Is this politically possible?

2.  In some other areas, I am told, the standard described a relatively
    high level language, rather than the minimum of implementations.
    This will prevent some present compilers from meeting the standard.
    Why should it pick the minimum here?

3.  How can I get a copy of the draft standard?

4.  Is this an adequate method of getting comments and questions to the
    committee? If not, what is a useful channel?

   To the net at large:

1.  What are specific objections to the hashing technique?

2.  Are there any machines where it won't work, and why?


Please copy me on any answers.  Service from the list has been erratic
lately.

          Thanks for all the fish...

          Paul
          Schauble at MIT-Multics.ARPA


