Why does C hate 2d arrays?

Fri May 25 13:53:32 AEST 1990

In article <12722:May2501:41:2690 at stealth.acf.nyu.edu>,
 brnstnd at stealth.acf.nyu.edu writes:
> I've tried to shorten this article by replacing flames with the phrase
> 'Nuff said.

That's more incendiary than most flames.

> The language gcc compiles is a C-*like* language. 'Nuff said.
No, *not* enough said.  It does *not* handle all the cases which were
being discussed.  

> > It looks rather silly to be dogmatic about a language that doesn't
> > exist yet:
> The language gcc compiles certainly does exist. 'Nuff said.

No, *not* enough said.  The language gcc compiles does not meet the
criteria of this discussion.  However, I did over-state the case; C++
does exist.

> > > I find it very difficult to believe that a compliant Ada implementation
> > > can work with any other language.
> > Believe it.

> Name a validated Ada compiler that can produce stdio routines usable
> from C on the same machine. 'Nuff said.

Oh, _now_ I see!  An implementation of Ada can work "with ANY OTHER language"
only if it can be used to implement stdio!  Why didn't I think of that?
I thought that "working with any other language" meant that applications
programmers could call either language from the other, passing a useful
variety of data in either direction, even if not absolutely everything.
By this criterion, it would of course be impossible to mix Fortran and C
(can't implement stdio in Fortran) or Pascal and C (can't implement stdio
in Pascal) and those Lisp vendors who think their language can call C in
a useful sense are deluding themselves (hard to implement stdio in Lisp).
It doesn't matter if an Ada program can interface to an X Window System
implementation in C without difficulty or if C code can call back to Ada,
if you can't implement stdio you just aren't "working with" C.

Indeed, there is a sense in which it is impossible for *C* to "work with"
C itself according to this criterion.  The stdio function names are
reserved by the standard, and a portable program may not redefine them.
(Ok, as macros or statics is ok, but then who else can call them?)

> > > > In the presence of dynamic arrays, [whether sizeof X is legal in a
> > > > static expression is] no longer a simple syntactic test.
        ^^^^^^
> > > Say what? The same is true with or without dynamic arrays.
> > False.  sizeof is currently ALWAYS legal in a static expression.
                                                  ^^^^^^
> Rather than repeatedly insisting that you're right, why don't you give
> an example of a sizeof that can't be used in a dynamic expression with
					         ^^^^^^^
> the obvious C extensions. Repetition does not replace logic. 'Nuff said.

Sir, please have the goodness to read what I wrote.  I wrote
	[whether sizeof X is legal in a ***STATIC*** expression...]
I don't know of any instances of sizeof that can't be used in DYNAMIC
expressions, and I never said that there were any.  Misreading does not
replace logic!  The posting I am following up now quoted in the next
few lines just such an example of a sizeof which would not be legal in
a *STATIC* expression.  I ***DID*** give an example of what I was
talking about.  brnstnd QUOTED it.

You might reasonably argue that it isn't important or that the semantic
check required isn't that hard to do.  And to be perfectly fair, there
is already a semantic check required which I had overlooked:
	static char *x = &y;
is legal only if y is static or extern, and if you're going to keep that
much symbol table around in a checker, you might as well keep track of
which arrays are statically dimensioned and which aren't.

> Your example doesn't show the point you're trying to make, so why should
> anyone be expected to see it?

Another reader of this newsgroup _did_ see the point, which was that as
declared the initialiser wasn't legal.

> > > > 	void dynamic_own_array(int m, int n)
> > > > 		static double a[m][n];
> 
> Sorry, missed the ``static'' here. Without the static, assuming m and n
> are declared as const int parameters, this is fine, as I said.

Yes, but again, why assume something contrary to what I explicitly
declared?  The point is that there was once a rather nice programming
languge in which *precisely* this construct was legal, defined, meaningful,
and sometimes useful.  Look up "dynamic own arrays" in a good survey of
programming languages some time.

> Real computers have floating-point operations. 'Nuff said.

Oh dear.  That means that the Sun-3/50 in my office is not a real computer.
I must be imagining it.  Every Macintosh I have ever used must have been
just a toy.  (The PC I used last year, _that_ was a toy, I grant you.)

> > > > With specific reference to fwrite(), there are at least three fairly
> > > > obvious possible complications which I'll not trouble you with.
> > > How about naming them?  I see no problem.
> > It shows.
> Again you attempt to replace logic with rhetorical bullshit. I doubt you
> can state your ``three fairly obvious possible complications,'' because
> they don't exist. You've had two chances; think you can be explicit on
> your third? 'Nuff said.

Of course I can state them.  But I'm afraid the examination runs the
other way.  If you don't know what they are without my telling you,
you haven't understood the problem.  If someone _else_ would like to
know what the problems are, ask, and I'll hand out the model answers.

> > Here's another question about dynamic arrays.  Consider
>   [ void f(int n) { double a[n]; int m = g(n); double b[m]; ... } ]

> Better stated:

>   void f(const int n)
>   { double a[n]; int m; m = g(n); { const int x = m; { double b[x] ...

That's not "better stated".  That's **RE**stated, as in CHANGED.
There were two significant features about the way I formulated the
question:  m was *NOT* const, and the declarations were in one group.
If brnstnd means that he expects that his restatement is how a compiler
would implement such declarations, fine.  Algol 68 and Ada both allow
a series of declarations like this, and at least some Algol 68
implementations used to handle it with one stack frame.

With reference to m:  there is one language (fortunately dying) in which
the equivalent of
	int (*a)[m];
"declare a as pointer to array-m-of-int" used the value of 'm' which
was current whenever 'a' was _used_, not when it was _declared_.
In fact, proposals that array parameters should be handled as

	f(... int m, int n, elt_t a[m][n] ...)

are in essence suggesting exactly the same thing, that the shape of
an array should not be a property of the array value itself but of
the value of some other variable entirely at a later time.  So there
is a serious decision to be made: if 'm' is changed, does that affect
the apparent size of the array?

An interface I suggested for C once would have looked like

	f(... elt_t a[int m][int n] ...)

This has a two-fold point:  the programmer would not have to explicitly
pass the array bounds (so that _using_ such a function is simpler) and
it would be impossible to get the bounds wrong.  That interface would,
alas, not have allowed a programmer to pass an 'extern' 2-d array where
the size of the first dimension was unknown, so I'm afraid it just won't
fly.  C is not Pascal, thank goodness.

> (The inner braces are for clarity and to handle a different view of
> initializers.) This is perfectly fine. Upon f entry, n is in the frame.
> a and m are either in the frame or allocated immediately. x is either in
> the frame upon entry, allocated immediately after entry, or allocated
> upon entry to the second block. Finally, b is allocated after x

Yes, but *where* is b[] allocated?  In the absence of even gcc's
half-baked dynamic arrays (if you can't have dynamic arrays in records
-- as you can in Algol 68, Simula 67, Ada, &c -- you still haven't got
full dynamic arrays), every variable is at an offset from one end of the
stack frame, which offset is known at compile time.  Indeed, in many C
compilers, internal blocks are a notational fiction, entering and leaving
a block involves no stack adjusting.  (So jumping out of a block is easy,
and the justly notorious switch()ing into a block is not an allocation
problem.)  The simplest answer is to make dynamic arrays pointers, so
that

    void f(int n) 
	{ 
	    double a[n];
	    int m = g(n);
	    double b[m];
	    ...
	}

is treated as the equivalent of

    void f(int n)
	{
	    double * const a = alloca(n * sizeof *a);
	    int m = g(n);
	    double * const b = alloca(m * sizeof *b);
	    ...
	}

Now, I can see how to make this work on a PR1ME, because although
you are not in general going to be able to fit a bunch of arrays in
a stack frame there, there is an instruction for extending the frame.
The new block it returns may be in another segment entirely, but the
instruction that returns from a procedure will delink and reclaim 
these stack extensions.  However, a 'goto' leading out of a block
gets complicated.  Not impossible.  Just complicated.  It has to
deallocate all the local dynamic arrays.  I haven't got gcc at the
moment.  What happens if I do

	int i;
	for (i = 1; i <= 10000; i++) {
	    {
		double a[i];
		int j;
		for (j = 0; j < i; j++)
		    if (rand()%16 == 0) goto bogeyman; 
		    else a[j] = j*2+1;
		foo(a);
	    }
bogeyman:   { ... }
	}

Will a[] be deallocated by the 'goto'?  It should be.	    

What does gcc do with
	if (i >= 0) switch (i) {
	    double a[i+1];
	    case  0: ...a[i-1]...; break;
	    case  1: ...a[i-2]...; break;
	    default: ...a[i/2]...; break;
	}

If the dimension of a[] here were a static expression with a sufficiently
large value, this would be legal and meaningful C.

What does gcc do with
	int m = f();
	int (*a)[m] = malloc(sizeof a);

In case it isn't obvious, I respect C a lot.  It's a beautiful example of
well chosen tradeoffs.  C was not intended for the kinds of applications
that Fortran 90 has been designed for (Fortran 90 has dynamic arrays...).
The point that I have been making is that if you really want to do a
thorough job of supporting multi-dimensional dynamic arrays, there are
lots of details to get right.  My argument is _support_ for C in more or
less its present form.  Better not to offer a feature than to pretend to
have it and get it wrong.  We _don't_ want C to turn into something like
SQL.

-- 
"A 7th class of programs, correct in every way, is believed to exist by a
few computer scientists.  However, no example could be found to include here."