Followup: Variable-length string at end of structure

Sam Kendall kendall at wjh12.UUCP
Thu Jul 5 03:00:15 AEST 1984


The response to my news item "Variable-length string at end of
structure" has been awe-inspiring.  Thanks to those who replied:
wjh12!bb, hscvax!sasaki, oddjob!sean, wateng!padpowell, rlgvax!guy,
hsi!stevens, eagle!msf, browngr!jnp, nbires!rcd, whuxle!mp,
fortune!crane, rlgvax!jack, sdccs6!ix269, lzmi!psc, gatech!jeff,
petsd!joe, scc!ted, arizona!whm, sdchema!jwp, bbncca!keesan,
mordor!jdb.

For those who missed it, here is my original request:

> I am wondering how many programs use the following construct, or
> something similar:
>
> 	struct a {
> 		...
> 		char varlen_string[1];
> 	} a_struct;
> 	...
> 	p = (struct a *) malloc(sizeof (struct a) + strlen(a_string));
> 	... /* fill in structure members */
> 	(void) strcpy(p->varlen_string, a_string);
>
> That is, malloc'ing space for a fixed-length structure plus a
> variable-length string, and referencing the string using the last member
> of the structure.
>
> The Rand Editor and its derivatives do this, and Martin Minow's cpp does
> it; has anyone seen other programs that do?  I'd be interested to know
> how many programs do this, and exactly what type the last member of the
> structure is (i.e.  is it char [1]?  char?  Something else?) I need to
> know in order to put some kludge in our runtime checker to handle it;
> currently it is flagged as an error.

From Jeff Lee (gatech!jeff) comes, finally, a short name for this
construction: "open-ended structure".  It would make a good subject line
for further news items on this topic.

The most important point in the responses was that open-ended structures
are used a good deal.  A few felt uneasy about them ("Guilty as charged,
your honor." --Brent Byer (wjh12!bb)).  But the following programs and
libraries use it, according to letters: malloc(3), awk(1), UNSW Prolog
Interpreter, Multiplan, BBN's cc(1), APL\11, msg*(2) (message-passing
system calls in System V); and plenty of people said they use it in
proprietary software, or just use it a lot.  The most common use is for
symbol tables where names can get large.  Interestingly, Berkeley's cc
with flexnames does not use open-ended structures, since it has a hash
table with fixed-size entries.

Another point is that strings of things other than chars can use this
mechanism.  Several use it for arrays of substructures.
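
A minimal sketch of the same mechanism with a non-char element type.
The names struct point, struct polygon, and polygon_alloc are made up
for illustration and come from none of the programs mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct point {
	int x, y;
};

struct polygon {
	int npoints;
	struct point pts[1];	/* really npoints elements */
};

/* Allocate a polygon with room for npoints points in one malloc. */
struct polygon *polygon_alloc(int npoints)
{
	struct polygon *p;

	assert(npoints >= 1);
	/* One element is already counted in sizeof (struct polygon). */
	p = (struct polygon *) malloc(sizeof (struct polygon)
	    + (size_t) (npoints - 1) * sizeof (struct point));
	if (p != NULL)
		p->npoints = npoints;
	return p;
}
```

A caller would index p->pts[0] through p->pts[npoints - 1] and release
the whole thing with a single free(p).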

There were several types mentioned for the open-ended last member.  The
most common is array of 1 element, for whatever element type is
appropriate, usually char.  Array of 0 elements, or empty brackets,
which is the same thing on most compilers, was also mentioned.  The
problem with arrays of 0 in structures is that the Portable C Compiler
dislikes them to the point of giving a fatal error; thus any use of them
is highly nonportable.  (An array of 0 elements makes no sense if you
consult the reference manual, since when you use an array you get "a
pointer to the first object in the array".  Empty brackets used with
storage definitions are not discussed, and so are also questionable.)
Morris Keesan (bbncca!keesan) objects to the use of plain "char" type
for the last member, and I agree that array[1] is clearer; the only piece
of code I have heard of that uses "char" is a version of the Rand Editor.
Incidentally, my runtime checker now understands that array[1] at the
end of a structure in dynamic storage means an open-ended structure.

One person claimed,

> This [empty brackets] seems more straightforward to me than "char name[1]",
> since 
>     a) it makes it clear that the array is really variable length, and
>     b) you can "malloc(sizeof(SYMBOL) + strlen(s))" instead of having to
>        remember "malloc(sizeof(SYMBOL) + strlen(s) - 1)" [where SYMBOL
>        is the structure type].

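I agree that empty brackets are clearer, but unfortunately they are not
portable.  Point (b) isn't right: strlen doesn't count the null byte, so
each of those expressions, used with its own declaration, comes up one
byte short unless structure padding happens to cover it.  With
"char name[1]" the one declared element holds the null byte, so
sizeof(SYMBOL) + strlen(s) is the right amount.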
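
The arithmetic can be checked with a small sketch.  SYMBOL's real
layout was not given, so the structure below (a hypothetical int member
followed by char name[1]) and the helper names symbol_size and
symbol_new are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout: the quoted letter did not show SYMBOL's members. */
struct symbol {
	int value;
	char name[1];		/* really strlen(name) + 1 bytes */
};

/*
 * For s = "abc", strlen(s) is 3 and the string needs 4 bytes.
 * sizeof (struct symbol) already counts one byte of name, which
 * holds the null byte, so the remaining 3 bytes are strlen(s).
 * Subtracting 1, as in the quoted letter, would leave room for
 * only 3 of the 4 bytes (structure padding aside).
 */
size_t symbol_size(const char *s)
{
	assert(s != NULL);
	return sizeof (struct symbol) + strlen(s);
}

/* Allocate and fill a symbol in a single malloc. */
struct symbol *symbol_new(const char *s, int value)
{
	struct symbol *p;

	p = (struct symbol *) malloc(symbol_size(s));
	if (p == NULL)
		return NULL;
	p->value = value;
	(void) strcpy(p->name, s);
	return p;
}
```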

Joe Pato (browngr!jnp) defines a constant VARYING (of value 1) to use as
the array bound in such cases.  This is nice, although a person looking
at the code would not know why
	sizeof (struct_type) + strlen(string)
would be the correct amount of storage to allocate, unless he keeps in
mind the value of VARYING, which he should not have to.  To make the
storage allocation as high-level as the declaration, you need another
macro:
	#define VARYSIZE(struct_type, nelem, elsize) \
		(sizeof (struct_type) + ((nelem) - 1) * (elsize))
which, for strings, would be called as
	VARYSIZE(struct_type, strlen(string)+1, sizeof (char))
Well, you might want to have another macro for the case of strings,
since the general macro is, ah, bulky to use.  Perhaps it is easier to
forget the whole thing and stay low-level, the way God intended.  In any
case, the use of this construct is high-level--you can think of the
storage as part of the structure, even though it isn't.
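
One possible shape for that string macro, as a sketch only.  VARYING
and VARYSIZE are as given above; VARYSTRSIZE, struct entry, and
entry_new are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define VARYING 1
#define VARYSIZE(struct_type, nelem, elsize) \
	(sizeof (struct_type) + ((nelem) - 1) * (elsize))

/* Hypothetical convenience macro for the string case. */
#define VARYSTRSIZE(struct_type, string) \
	VARYSIZE(struct_type, strlen(string) + 1, sizeof (char))

struct entry {			/* hypothetical example structure */
	int count;
	char name[VARYING];
};

/* Allocate an entry with room for name and its null byte. */
struct entry *entry_new(const char *name)
{
	struct entry *p;

	assert(name != NULL);
	p = (struct entry *) malloc(VARYSTRSIZE(struct entry, name));
	if (p == NULL)
		return NULL;
	p->count = 0;
	(void) strcpy(p->name, name);
	return p;
}
```

With this the allocation reads at the same level as the declaration,
and the reader never needs to know that VARYING is 1.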

Some people objected to open-ended structures, making two points: (1)
they are nonportable; and (2) it is just as easy to do it the other way,
putting a pointer into the structure and allocating the variable-length
data separately.  However, the array[1] form is portable, and it really
can be harder (and less clear, as Marty Sasaki (hscvax!sasaki) pointed
out) to have to allocate a second area of memory and do an additional
indirection to access the variable-length element.  It is debatable, but
I think when someone is reading a large program, it won't take him/her
that long to figure out what this construct does, even if it is not
commented and he/she hasn't seen it before.
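
For comparison, here is a sketch of both approaches side by side.  All
names (sym_ptr, sym_open, and the helper functions) are invented for
illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* "The other way": a pointer member and a second allocation. */
struct sym_ptr {
	int value;
	char *name;
};

struct sym_ptr *sym_ptr_new(const char *s, int value)
{
	struct sym_ptr *p;

	assert(s != NULL);
	p = (struct sym_ptr *) malloc(sizeof (struct sym_ptr));
	if (p == NULL)
		return NULL;
	p->name = (char *) malloc(strlen(s) + 1);
	if (p->name == NULL) {	/* second failure path to handle */
		free(p);
		return NULL;
	}
	(void) strcpy(p->name, s);
	p->value = value;
	return p;
}

void sym_ptr_free(struct sym_ptr *p)
{
	if (p != NULL) {
		free(p->name);	/* two areas, so two frees */
		free(p);
	}
}

/* Open-ended structure: one allocation, no extra indirection. */
struct sym_open {
	int value;
	char name[1];
};

struct sym_open *sym_open_new(const char *s, int value)
{
	struct sym_open *p;

	assert(s != NULL);
	p = (struct sym_open *) malloc(sizeof (struct sym_open) + strlen(s));
	if (p == NULL)
		return NULL;
	(void) strcpy(p->name, s);
	p->value = value;
	return p;		/* released with a plain free(p) */
}
```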

Here, finally, are two interesting comments:

> We've done that frequently here.  Unfortunately, you can't declare something
> like
> 
> 	struct header_plus_stuff {
> 		...declarations...
> 		int	length;
> 		char	stuff[length];
> 	}
> 
> and have "sizeof()", etc. work.  It would be a major change, and twice as
> major if the variable-length stuff weren't at the end of the structure.
> PL/I does it, but PL/I is a bigger language.  So we're stuck with that trick.
> 
>	Guy Harris
>	{seismo,ihnp4,allegra}!rlgvax!guy

> I did something like this for a DBMS I was working on.  What I really
> wanted was this:
> 	struct foo {
> 		short	num_items;
> 		short	total_length;
> 		short	offsets[ num_items ];
> 		char	data[ total_length ];
> 	}
> that is, a bunch of (null terminated) strings stored in a data space,
> and a table of offsets into it.  What I wrote was:
> 	struct foo {
> 		short	num_items;
> 		short	total_length;
> 		short	offsets[ 1 ];
> 	}
> and some macros to figure out where the beginning of the data was.
> 	-Paul S R Chisholm, AT&T-IS, {lznv,lzmi,lzwi}!psc
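
The macros themselves were not included in the letter.  The following
is only a guess at what they might look like; FOO_DATA, FOO_ITEM,
FOO_SIZE, and foo_build are hypothetical names, and the layout assumes
the string data begins immediately after the last offsets entry.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct foo {
	short	num_items;
	short	total_length;
	short	offsets[ 1 ];	/* really num_items entries; data follows */
};

/* Start of the string data: just past offsets[num_items - 1]. */
#define FOO_DATA(p)	((char *) ((p)->offsets + (p)->num_items))
/* The i'th null-terminated string. */
#define FOO_ITEM(p, i)	(FOO_DATA(p) + (p)->offsets[i])
/* Bytes needed for n offsets plus len bytes of string data. */
#define FOO_SIZE(n, len) \
	(sizeof (struct foo) + ((n) - 1) * sizeof (short) + (len))

/* Pack n strings into one malloc'ed block. */
struct foo *foo_build(const char **strings, int n)
{
	struct foo *p;
	size_t total = 0;
	short off = 0;
	int i;

	assert(strings != NULL && n >= 1 && n <= SHRT_MAX);
	for (i = 0; i < n; i++)
		total += strlen(strings[i]) + 1;
	assert(total <= SHRT_MAX);	/* offsets are shorts */

	p = (struct foo *) malloc(FOO_SIZE(n, total));
	if (p == NULL)
		return NULL;
	p->num_items = (short) n;
	p->total_length = (short) total;
	for (i = 0; i < n; i++) {
		p->offsets[i] = off;
		(void) strcpy(FOO_DATA(p) + off, strings[i]);
		off = (short) (off + strlen(strings[i]) + 1);
	}
	return p;
}
```

FOO_ITEM(p, i) then yields the i'th string, and one free(p) releases
the header, the offset table, and the data together.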


	Sam Kendall	  {allegra,ihnp4,ima,amd}!wjh12!kendall
	Delft Consulting Corp.	    decvax!genrad!wjh12!kendall


