Library partitioning (was: Re: free (NULL))

Thu May 31 16:38:10 AEST 1990

In article <3078 at goanna.cs.rmit.oz.au> ok at goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
>In article <2574 at skye.ed.ac.uk>, richard at aiai.ed.ac.uk (Richard Tobin) writes:
>The problem I've often had is that I have a data structure containing
>some pointers, and *I* have filled some of them in (using strdup())
>and the caller preset some of them to defaults.  Now I'm going to change
>one of them.  Should I free it?  If *I* allocated it, certainly, there
>isn't any other copy of the pointer.  If the *caller* allocated it...

It is an axiom of data abstraction methodologies that only
certain routines associated with a data structure should
directly modify or access any of the fields in it.  This means
that you don't need to worry whether the caller coded

	structp->namefield = "rosebud";

or

	structp->namefield = strdup("rosebud");

because you gave him a routine so he could

	setname(structp, "rosebud");

and within your implementation of setname() you can enforce
whatever allocation scheme you require.
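
As a minimal sketch of one such scheme (not the only possible one), setname() can keep a private copy of the string, so that the structure only ever holds NULL or memory the package itself allocated.  The int return value, the const qualifier, and the body of structalloc() are assumptions made for illustration, and the code relies on an ANSI C library in which free(NULL) is harmless.

```c
#include <stdlib.h>
#include <string.h>

struct whatever
	{
	char *namefield;	/* always NULL or malloc'ed by setname() */
	};

/* Hypothetical allocator: starts with no name at all. */
struct whatever *structalloc(void)
{
struct whatever *structp = malloc(sizeof *structp);

if(structp != NULL)
	structp->namefield = NULL;
return structp;
}

/* Stores a private copy of name.  Returns 0 on success, */
/* -1 if memory could not be allocated. */
int setname(struct whatever *structp, const char *name)
{
char *copy = malloc(strlen(name) + 1);

if(copy == NULL)
	return -1;		/* the old name is left untouched */

strcpy(copy, name);
free(structp->namefield);	/* ours to free; free(NULL) is a no-op */
structp->namefield = copy;
return 0;
}
```

With this scheme the question "should I free it?" always has the same answer inside the package, whether the caller passed a string literal, a local buffer, or the result of strdup().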

When I am writing a seriously modularized package, the structure
declarations in the header files are surrounded by #ifdefs which
make the declarations visible only to the routines which
implement the package.  For the above example, the file would
look something like

	#ifdef STRUCT_INTERNALS

	struct whatever
		{
		char *namefield;
		/* other fields... */
		};

	#endif

	extern struct whatever *structalloc(void);
	extern void setname(struct whatever *, char *);

Any calling program (unless it cheats and #defines STRUCT_INTERNALS)
sees only extern declarations and maybe #definitions of flag
values, but does not end up knowing the "shape" of the structure.
C allows functions to pass around pointers to undefined
structures, and this programming style makes good use of that
feature.  (Unfortunately, lint -h complains about "struct
whatever never defined".)
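
The following sketch squeezes both sides into one file to show that the caller's code compiles while the structure is still undefined.  The accessor getname() and the function caller() are hypothetical names added for illustration; in a real package the second half would live in a separate source file.

```c
#include <stdlib.h>

/* What the caller sees: the tag is declared, its members are not. */
struct whatever;
extern struct whatever *structalloc(void);
extern void setname(struct whatever *, char *);
extern char *getname(struct whatever *);	/* hypothetical accessor */

/* Caller code: legal, because it only passes the pointer around. */
char *caller(struct whatever *structp)
{
setname(structp, "rosebud");
/* writing structp->namefield here would not compile */
return getname(structp);
}

/* What the implementation sees. */
struct whatever
	{
	char *namefield;
	};

struct whatever *structalloc(void)
{
struct whatever *structp = malloc(sizeof *structp);

if(structp != NULL)
	structp->namefield = NULL;
return structp;
}

void setname(struct whatever *structp, char *name)
{
structp->namefield = name;	/* allocation policy omitted in this sketch */
}

char *getname(struct whatever *structp)
{
return structp->namefield;
}
```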

In article <224 at taumet.COM> steve at taumet.UUCP (Stephen Clamage) writes:
>This is easy to handle (safely and portably) in C++ (which is not
>precisely the question you asked).
>In the constructor for the data structure...
>The destructor,
>and each member function which modifies the structure, checks...
>to see whether to free the data...
>Forever after, only
>the member functions are used to modify the data...

Steve is saying the same thing, although the techniques are by no
means limited to languages like C++.  C++'s programming style
(like object-oriented programming in general) merely encourages
the use of short little "member access" functions.  C++ also
allows inlining them in case you are worried about function call
overhead.

In article <3102 at goanna.cs.rmit.oz.au> ok at goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
>C++ is another language.  I simply do not have the option of using it...

As mentioned, you don't have to.  Just use the techniques it
would have encouraged, such as good data hiding.

In article <3103 at goanna.cs.rmit.oz.au> ok at goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
>In article <1739 at necisa.ho.necisa.oz>, boyd at necisa.ho.necisa.oz (Boyd Roberts) writes:
>> Well, once you've coded yourself into a corner all bets are off.  Choose
>> a better algorithm, one that has all the pointers in a free-able state.
>Whether something is freeable is NOT a property of the algorithm which has
>to make the immediate decision.  It is a property of the program that USES
>the algorithm.  If you're writing a library function, you simply haven't
>any control over the code that uses your function.

The problem arises when you are writing a program which tries to use
a badly-written library which you have no control over.  When you
are the one writing the library, you are finally in a position to
set things up correctly.  You must arrange to give the calling
program sufficient control, but not by "giving away the farm;"
many details must be reserved to the library implementation so
that it can be changed or improved later without breaking things.

Member access functions, even for seemingly trivial operations,
are an important part of a successful interface.  The differences
between having the caller say

	structp->somefield = 1;

and

	setfield(structp, 1);

are vast, and not immediately obvious.  By using the second form,
it is possible to

	link the calling program against a new version of the
	library without recompiling, even if the structure layout
	has changed

	change the library in such a way that some other value
	must be changed whenever somefield's value changes

	enforce access rights on the field, disallowing changes
	or checking changes for validity
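
A sketch of the last point, assuming a hypothetical readonly flag and assuming that negative values are meaningless for this field (neither assumption comes from any particular library):

```c
struct whatever
	{
	int somefield;
	int readonly;	/* hypothetical flag: nonzero forbids changes */
	};

/* Returns 0 on success, -1 if the change is refused. */
int setfield(struct whatever *structp, int value)
{
if(structp->readonly)
	return -1;	/* access rights: no changes allowed */
if(value < 0)
	return -1;	/* validity: assumed rule, negatives rejected */

structp->somefield = value;
return 0;
}
```

A caller who assigns to structp->somefield directly bypasses both checks, which is exactly what hiding the structure prevents.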

For example, suppose that one of the fields in the structure is
an averaging interval.  One day you decide you're spending too
much time dividing by the number of samples, and you want to
replace it with a right shift.  You can change the function to
set the averaging interval from

	setaveraginginterval(structp, avgint)
	struct whatever *structp;
	int avgint;
	{
	structp->avgint = avgint;
	}

to

	setaveraginginterval(structp, avgint)
	struct whatever *structp;
	int avgint;
	{
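	if(avgint <= 0 || (avgint & (avgint - 1)) != 0)
		return -1;	/* complain: not a power of two */

	structp->avgint = avgint;
	structp->log2avgint = 0;
	while((1 << structp->log2avgint) < avgint)
		structp->log2avgint++;
	return 0;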
	}

and instead of later saying

	average = total / p->avgint;

you can say

	average = total >> p->log2avgint;

(Note the implication that a function might want to be declared
as returning int, rather than void, even if initially it always
returns successfully, so that later you can add cases which
complain and/or return a failure code, such as when the requested
averaging interval isn't a power of two.)
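
Put together in ANSI C, the idea might look like the sketch below.  The helper average() and the -1/0 return convention are assumptions for illustration.  Note also that the shift matches the division only when total is non-negative; right-shifting a negative int is implementation-defined.

```c
struct whatever
	{
	int avgint;	/* averaging interval, always a power of two */
	int log2avgint;	/* kept in step with avgint by the setter */
	};

/* Returns 0 on success, -1 if avgint is not a power of two. */
int setaveraginginterval(struct whatever *structp, int avgint)
{
int shift = 0;

if(avgint <= 0 || (avgint & (avgint - 1)) != 0)
	return -1;	/* rejected: the old settings are kept */

while((1 << shift) < avgint)
	shift++;

structp->avgint = avgint;
structp->log2avgint = shift;
return 0;
}

/* Hypothetical helper; assumes total >= 0. */
int average(struct whatever *structp, int total)
{
return total >> structp->log2avgint;
}
```

A caller that checked the return value from the beginning, back when the function could not fail, needs no changes when the power-of-two restriction appears.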

Someone will inevitably complain that the function call overhead
implied by all these little access functions is intolerable.  You
can avoid function calls by using a compiler that supports
inlining, but that does remove the relink-without-recompile
advantage.  Unless the function is called so often that the call
overhead is truly a factor, it's MUCH better to have it a
bona-fide function -- the existence of a "hook" at which
arbitrary code can fire whenever a structure field is modified
(or even examined!) is frequently invaluable.
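
As a small sketch of such a hook, suppose you later want to know how often a field is read and written.  The counters nreads and nwrites and the function getfield() are hypothetical additions; the point is only that the calling program does not change.

```c
struct whatever
	{
	int somefield;
	long nreads;	/* hypothetical bookkeeping, added later */
	long nwrites;
	};

void setfield(struct whatever *structp, int value)
{
structp->nwrites++;	/* the hook: any code at all could go here */
structp->somefield = value;
}

int getfield(struct whatever *structp)
{
structp->nreads++;	/* fires even when the field is only examined */
return structp->somefield;
}
```

The same spot could just as easily hold a debugging trace, a lock, or a cache invalidation.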

(I'm sorry that the example above centers on an efficiency hack,
since normally I eschew them.  The example is artificial, and is
merely meant to illustrate why you might need to keep two fields
in synch, which you couldn't depend on your caller to do.  In
fact, having to keep them in synch at all is undesirable.  Keep
those silly little function calls and long divisions in there,
unless it is conclusively demonstrated that replacing them with
in-line code and/or right shifts will have tangible and useful
benefits.)

Good library design is a fascinating subject, and one worth
careful thought and study.

                                            Steve Summit
                                            scs at adam.mit.edu