Is your system polluted?

Piercarlo Grandi pcg at aber-cs.UUCP
Sat Dec 23 04:34:04 AEST 1989


In article <8912211630.aa04575 at ICS.UCI.EDU> rfg at ICS.UCI.EDU writes:
    
    As part of the work I'm doing on protoize/unprotoize, I decided that it would
    be a good idea to be able to find out (for any given system) what the
    names of all of the functions declared in system include files are.
    I wrote the following script to do part of the job.
    
    The results that I got from running this script on one system are very
    saddening.  It appears that (for some systems at least) there is an awful
    lot of pollution of various name spaces contained in the system include
    files.  Specifically, there are lots of clashes of names where one name
    is used for two (or more) different things in two (or more) different
    include files.  This means that you may/will get errors if particular
    pairs of include files are included into the same base file. :-(

Actually things are even worse than Ron Guilmette says. Not only do a lot
of second-rate hackers put duplicate names in system headers, but they do
the following things as well:

	1) internal kernel entities are declared in headers for application
	use. A very bad offender here is System V.3.2; some BSD versions
	at least make an attempt to bracket these within #ifdef KERNEL
	#endif (which is still unsatisfactory).

	2) a more generic problem is that a lot of user level packages
	also declare in their headers entities that are only used
	internally by the package.

	3) even worse, a lot of libraries leave their internal entities
	external instead of declaring them static. This is very dangerous,
	because you may unwittingly use the same name in your program, and
	then all hell breaks loose. A particularly bad offender is curses.
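
Point 3 can be made concrete with a small sketch. The file below stands in
for one source file of a library; every name in it (lib_init, lib_row,
cursor_row, row_cache) is invented for illustration and is not taken from
curses or any real library.

```c
/* lib.c: one source file of a hypothetical library (all names invented). */

/* Exported on purpose: these form the client interface. */
int lib_init(void);
int lib_row(void);

/* BAD: internal state left with external linkage. An application that
 * defines its own global called cursor_row will clash with this one, or
 * silently share it, when the program is linked. */
int cursor_row = 0;

/* GOOD: internal state with internal linkage. The name never reaches
 * the linker, so an application may reuse it freely. */
static int row_cache = -1;

int lib_init(void)
{
    cursor_row = 3;
    row_cache = cursor_row;
    return 0;
}

int lib_row(void)
{
    return row_cache;
}
```

Only cursor_row is visible outside the file; whether a duplicate definition
in the application produces a link error or a silently shared variable
depends on the linker, which is exactly why it is dangerous.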

In C++ this is less troublesome as you can stuff things within the walls of
a class, and their scope will then be local to it. Except for typedefs,
unfortunately, but at least C++ 2.0 allows encapsulation of enums (and class
names, but that is virtually unavoidable).

In C, where we don't have a proper modularization facility, the following
guidelines ought to be followed:

	1) All global entities declared by a module should start with a well
	advertised module prefix, including #defines, procedures, variables,
	enums, structs, typedefs,... This has already been partially done with
	existing libraries, e.g. for prefixes 'str', 'f' (stdio), 'w'
	(curses), but usually in a half baked way. As a solution it is not
	complete, in that you may then have clashes of prefixes, but at
	least the problem becomes an order of magnitude less severe. In C++
	this is done by putting as much as possible within class boundaries.

	2) File names should also start with the module's prefix, both
	headers and sources. Such names can be either of the form
	<prefix><suffix>.h (e.g. StreamIn.h, StreamOut.h, StreamRw.h) or
	<prefix>/<suffix>.h (e.g. Inet/Udp.h, Inet/Tcp.h, ...), depending
	usually on their number (or the length of the name under System V).

	3) Published headers should contain only the client interface of a
	module. Actually, for sophisticated modules, the client interface
	should be split into several headers, each containing only a subset
	of entities likely to be used together. Eschew all-inclusive header
	files (e.g. "builtin.h" in libg++).

	4) The internal interfaces of a module should be in a separate set
	of headers that is not published.  For example, my tree library has
	two headers, "Tree.h" and "Tree/Own.h", and the latter contains the
	declarations of utility entities used by the other sources in the
	library, and is not published. Splitting the header is better than
	bracketing with #ifdef KERNEL #endif.

	5) Under Unix, published headers ought to be in /usr/include if they
	are for modules implemented at the user level, /usr/include/sys if
	they are for kernel level modules. Internal interfaces ought not to
	be in either; they ought to be in /usr/sys/h or the directory that
	holds the module sources, e.g. /usr/src/lib/libc. If there are
	multiple headers, according to rule 2),

	6) All file-global entities internal to a module should be declared
	static. If they cannot be, because the module is split into several
	source files, then adherence to rule 1 is absolutely essential.
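
Rules 1, 3, 4 and 6 can be sketched together as follows. The three parts
that would normally be separate files ("Tree.h", "Tree/Own.h" and one
library source) are shown in a single listing so that it compiles on its
own. Only the two header names come from the text above; the function and
type names (TreeInsert, TreeContains, TreeFree, TreeOwnNewNode, TreeNode,
compare_keys) are assumptions made up for the illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* ---- Tree.h: published client interface (rules 1 and 3) ---- */
typedef struct TreeNode TreeNode;            /* opaque to clients */
TreeNode *TreeInsert(TreeNode *root, int key);
int       TreeContains(const TreeNode *root, int key);
void      TreeFree(TreeNode *root);

/* ---- Tree/Own.h: internal interface, not published (rule 4) ---- */
struct TreeNode {
    int       key;
    TreeNode *left, *right;
};
/* Shared by several library sources, so it cannot be static;
 * it must therefore carry the module prefix (rule 6). */
TreeNode *TreeOwnNewNode(int key);

/* ---- Tree.c: one source file of the library ---- */

/* Used only in this file, so it is static and needs no prefix. */
static int compare_keys(int a, int b)
{
    return (a > b) - (a < b);
}

TreeNode *TreeOwnNewNode(int key)
{
    TreeNode *n = malloc(sizeof *n);
    if (n != NULL) {
        n->key = key;
        n->left = n->right = NULL;
    }
    return n;
}

/* Insert key; returns the (possibly new) root. Duplicates are ignored. */
TreeNode *TreeInsert(TreeNode *root, int key)
{
    int c;
    if (root == NULL)
        return TreeOwnNewNode(key);          /* NULL if allocation fails */
    c = compare_keys(key, root->key);
    if (c < 0)
        root->left = TreeInsert(root->left, key);
    else if (c > 0)
        root->right = TreeInsert(root->right, key);
    return root;
}

/* Returns 1 if key is present, 0 otherwise. */
int TreeContains(const TreeNode *root, int key)
{
    while (root != NULL) {
        int c = compare_keys(key, root->key);
        if (c == 0)
            return 1;
        root = (c < 0) ? root->left : root->right;
    }
    return 0;
}

void TreeFree(TreeNode *root)
{
    if (root != NULL) {
        TreeFree(root->left);
        TreeFree(root->right);
        free(root);
    }
}
```

A client that includes only "Tree.h" sees three prefixed functions and an
opaque type; the node layout and TreeOwnNewNode stay out of its name space
at compile time, and compare_keys stays out of it at link time as well.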

Naturally all these rules are palliatives; what we should really have, and
what, given C, C++, Unix and other similar operating systems, we will not
have, is a tree of symbol tables. The best way to get this is to have an
object store, as in RSRE Flex or Cambridge CAP, or some Lisp machines or
systems, but this is wishful thinking... Second best would be something like
Multics, as usual.
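
For the sake of illustration only, here is one possible reading of "a tree
of symbol tables": each table has a parent, and a lookup walks outward
from the current table to the root. This is a toy sketch and has nothing
to do with how Flex, CAP or Multics actually work; the names Scope,
ScopeDefine, ScopeLookup and SCOPE_MAX_SYMS are all invented.

```c
#include <stddef.h>
#include <string.h>

#define SCOPE_MAX_SYMS 8                 /* arbitrary limit for the sketch */

typedef struct Scope Scope;
struct Scope {
    const Scope *parent;                 /* NULL at the root of the tree */
    size_t       count;
    const char  *names[SCOPE_MAX_SYMS];
    int          values[SCOPE_MAX_SYMS];
};

/* Bind name in this scope only. Returns 0 on success, -1 if it is full. */
int ScopeDefine(Scope *s, const char *name, int value)
{
    if (s->count >= SCOPE_MAX_SYMS)
        return -1;
    s->names[s->count] = name;
    s->values[s->count] = value;
    s->count++;
    return 0;
}

/* Look name up in s, then in each enclosing scope in turn.
 * Returns a pointer to the bound value, or NULL if it is not found. */
const int *ScopeLookup(const Scope *s, const char *name)
{
    for (; s != NULL; s = s->parent) {
        size_t i;
        for (i = 0; i < s->count; i++)
            if (strcmp(s->names[i], name) == 0)
                return &s->values[i];
    }
    return NULL;
}
```

With such a structure two sibling tables can each bind the same name
without clashing, because neither is ever searched on behalf of the other;
only names bound in a common ancestor are shared.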
-- 
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk at nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg at cs.aber.ac.uk


