length of external names

William D. Ricker wdr at faron.UUCP
Fri Jan 11 04:44:17 AEST 1985


>From: Paul Schauble <Schauble at MIT-MULTICS.ARPA>

>.
>:
>The solution that was used, and worked, was to have the COMPILER use the
>external "name" to store a hashed value.  During the recent net
>discussion I posted a description of this technique and some analysis of
>the chance and cost of collisions.

>This is done entirely in the compiler, and has no effect on the linker.

>.
>:
>More recent discussion prompts me to post a small modification of the
>technique.  Several people have pointed out the desirability of a
>language feature that would have the internal and external names of a
>global item be different,  ...

>If the declaration contains an entry clause, use that as an external
>name.
>Otherwise, if the item name is short enough, use the item name.
>Otherwise, hash the item name and use the result as the external name.

-----------

One interpretative language I'm familiar with uses a similar hashing
scheme.  (This ties in with the suggestion of 7chars & length, as was
used in PL/I.) The length, initial three characters, and a hash-code of
1-31 character identifiers where used as the internal name.  In the
special case of length=4, the hash-code is the fourth character, also
compressed.

In reality, the length and 4 characters are compressed into 4 bytes.
This is possible due to the limited character set for identifiers.  The
interpreter unpacked the length, initials and hash when the structure
was displayed (in debugging or listing what routines were loaded).  It
even altered the format to distinguish visually between

	4 frob

	7 fro b

to empashize that "frob" hashes itself, "4frob", but "frobble" hashes to
"7frob".

I'm not sure what the character set was, nor bit assignments.  (I could
look it up at home if anyone cares.) It might have been 5 bits for the
length (1-31) and 6 bits for the compressed initials and hash--but it
wasn't the SIXBIT character set.  For some reason, I think the number
of bits for the 3-char compression (perfect hash) was not divisible by
three, though, and the 4th char/hash was compressed separately.  17
bits would represent 128k combinations, which would represent 3
characters from a 50-character font; 16 bits suffices for 3 alphamerics
(40-character font: [A-Z0-9@$#_]).


-- 

  William Ricker
  wdr at MITRE-Bedford.ARPA					(MIL)
  wdr at faron.UUCP						(UUCP)
  decvax!genrad!linus!faron!wdr					(UUCP)
 {allegra,ihnp4,utzoo,philabs,uw-beaver}!linus!faron!wdr	(UUCP)

Opinions are my own and not necessarily anyone elses.  Likewise the "facts".



More information about the Comp.lang.c mailing list