dynamic arrays (was: Re: lint on V.3)

Tue Sep 5 15:31:10 AEST 1989

[Crossposted to comp.lang.c because it's now a programming topic.]

In article <282 at 6sigma.UUCP> blm at 6sigma.UUCP (Brian Matthews) writes:
>...I agree
>with your comment that they shouldn't have hard-coded limits on anything.
>...you may be able to get your vendor to
>recompile with larger limits, but you probably won't be able to get
>them to do the job right and replace the arrays with mallocs.

Brian is probably right (many people think that dynamic
allocation is hard), but I'd like to show how easy it can be.
(It's beyond me, though, why a production compiler would be using
arrays for symbol tables in the first place, rather than trees or
hash tables.)

Suppose you have a fixed-size array

	#define NELEM 100
	char *array[NELEM];
	int nents = 0;

into which elements are inserted as follows:

	extern char *strdup();

	addentry(newent)
	char *newent;
	{
	if(nents >= NELEM)
		fatalerror("table overflow");

	array[nents++] = strdup(newent);
	}

This is simple, straightforward, and correct; and drives
people nuts because no matter how big you #define NELEM,
one day somebody tries to exceed it.

Look how easy it is to grow the array dynamically:

	int nalloc = 0;
	char **array = NULL;	/* simulates char *array[nalloc] */
	int nents = 0;

	extern char *realloc();					/* note 3 */

	addentry(newent)
	char *newent;
	{
	if(nents >= nalloc) {
		nalloc += 10;					/* note 1 */
		array = (char **)realloc((char *)array,		/* note 2 */
						nalloc * sizeof(char *));
		if(array == NULL)
			fatalerror("out of memory");
		}
	array[nents++] = strdup(newent);
	}

Also, in any files where the array is referenced, you have to
change

	extern char *array[];
to
	extern char **array;

and any other references to NELEM (there shouldn't be any)
must be changed to nalloc.

The nice thing is that no actual references to the array need to
be changed, because the reference array[i] acts the same [4]
whether array is a char *[] or a char **.

Changing fixed-sized arrays to dynamic can therefore actually be
quite simple.  Merely remove the last level of brackets in the
array declaration, add a leading asterisk, and initialize the
resulting pointer to NULL.  Add another variable which keeps
track of how many entries have been allocated (as opposed to how
many are being used).  When the number used would exceed the
number allocated, grow the array using realloc.  (If code and
data space are more important than speed, the auxiliary variable
and the chunking behavior can be omitted and the array always
kept at the exact size required.)

                                                Steve Summit

Footnotes:

1. Many people feel that the allocation of the array should be
   grown by multiplicative factors, not additive:

	nalloc *= 2;

   The drawback here is that you may allocate much more than you
   need, and an intermediate allocation may fail unnecessarily,
   so the code should probably be changed so it "backs off" and
   tries again.  The additive approach, although it could be
   O(n**2) in data movement for some realloc implementations, has
   proved to be efficient enough for my uses.  (The additive
   approach is not expensive in data movement if the underlying
   malloc implementation is already doing power-of-2 chunking, as
   the 4bsd malloc does.)

2. For pre-ANSI realloc implementations, the code should be
   changed to

	if(array == NULL)
		array = (char **)malloc(nalloc * sizeof(char *));
	else	array = (char **)realloc((char *)array,
						nalloc * sizeof(char *));

   since realloc was not formerly guaranteed to handle NULL
   pointers.  (This example illustrates perfectly why the new
   behavior is useful.)

3. Under ANSI C, realloc is more correctly declared

	extern void *realloc();

4. The statement that "the reference array[i] acts the same
   whether array is a char *[] or a char **" should NOT be
   construed as meaning that pointers are the same as arrays.
   They are not, and the compiler will generate different code
   for array[i] depending on how array is declared; but the
   effect from the user's (author's of the code) point of view is
   the same.

5. The thrust of the examples above may be slightly obscured by
   the fact that the array being simulated is already an array of
   pointers.  Some may be unnecessarily bewildered by the double
   indirection in

	char **array;

   If each occurrence of the sequence "char *" is replaced by
   "int ", "double ", or "struct something " (with the exception
   of the declaration of realloc() and the cast of its first
   argument), and the new array element filled in with something
   other than the return from strdup, the intent may be clearer.

6. The above code is for illustrative purposes and may contain
   typoes.  (It also doesn't use function prototypes.)  Code just
   like it is in practically every program I write and works fine.