Improving C
Laws at SRI-AI.ARPA
Fri Feb 24 09:29:55 AEST 1984
From: Ken Laws <Laws at SRI-AI.ARPA>
The recent messages about strncpy() illustrate the need for string
commands in addition to the character vector commands offered in C
and UNIX. Character manipulation combined with malloc() can be made
to do whatever you want, but the semantics can be confusing. I find
it absurd that there is not even a standard library of dynamic string
routines supplied with UNIX. I have written such a package myself,
and I am sure many others have also. String routines are easy to
write in C, which may be why they are always hacked inline, but why
must we all reinvent such wheels? A separate string package could
be made reasonably efficient and could include extras such as a
length field (making it possible to embed nulls in a string) or a
current position pointer (making a string into a virtual disk file).
The following is a list of other suggestions I have for improving
C and the C environment:
The C language is reasonably clean, but it could be improved.
(Maybe the next version should be named D?) In particular,
I would like:
Dynamic strings that are distinct from character vectors. A
string should be represented by its address as is now done for
arrays. String routines should return copied substrings, etc.
A concatenation routine is particularly needed. (We have provided
one on our testbed, but without garbage collection such things are
a little dangerous.)
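The concatenation and substring routines asked for above might look like the following sketch. The names str_concat and str_substr are hypothetical. Both return freshly allocated copies, which illustrates the danger mentioned: without garbage collection the caller must remember to free() every result.

```c
#include <stdlib.h>
#include <string.h>

/* Return a new string holding a followed by b, or NULL if out of memory.
   The caller owns the result and must free() it. */
char *str_concat(const char *a, const char *b) {
    size_t la = strlen(a), lb = strlen(b);
    char *r = malloc(la + lb + 1);
    if (r == NULL)
        return NULL;
    memcpy(r, a, la);
    memcpy(r + la, b, lb + 1);   /* also copies b's terminating NUL */
    return r;
}

/* Return a new copy of at most n characters of s starting at index start.
   Returns NULL if start is past the end of s or if out of memory. */
char *str_substr(const char *s, size_t start, size_t n) {
    size_t ls = strlen(s);
    if (start > ls)
        return NULL;
    if (n > ls - start)
        n = ls - start;          /* clip to the end of s */
    char *r = malloc(n + 1);
    if (r == NULL)
        return NULL;
    memcpy(r, s + start, n);
    r[n] = '\0';
    return r;
}
```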
Dynamic matrices that are addressable using multidimensional subscripts.
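Lacking such matrices, a common workaround is an array of row pointers into one data block, so that the result can be subscripted as m[i][j]. The sketch below assumes double elements. matrix_new and matrix_free are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate a zeroed rows x cols matrix usable as m[i][j].
   Returns NULL for an empty or overflowing size, or if out of memory. */
double **matrix_new(size_t rows, size_t cols) {
    if (rows == 0 || cols == 0 || cols > SIZE_MAX / rows)
        return NULL;
    double **m = malloc(rows * sizeof *m);       /* row pointers */
    if (m == NULL)
        return NULL;
    double *data = calloc(rows * cols, sizeof *data);   /* all elements */
    if (data == NULL) {
        free(m);
        return NULL;
    }
    for (size_t i = 0; i < rows; i++)
        m[i] = data + i * cols;                  /* row i begins here */
    return m;
}

/* Free the data block (owned through row 0) and the pointer array. */
void matrix_free(double **m) {
    if (m != NULL) {
        free(m[0]);
        free(m);
    }
}
```

Keeping the data contiguous means only two allocations per matrix, at the cost of one extra pointer per row.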
Lists. Defining a list as a char ** works, but its initializer must be
written for a char *[] array. This could be fixed in the compiler.
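The sketch below shows both sides of this point. A NULL-terminated list is initialized as a char *[] array and then passed as a char **. The C99 compound literal, which postdates this message, lets a char ** be initialized in one step. The list names and list_length are illustrative.

```c
#include <stddef.h>

/* A NULL-terminated list written as an array of pointers (char *[]);
   when passed to a function the array decays to char **. */
static char *colors[] = { "red", "green", "blue", NULL };

/* C99 compound literal: an unnamed char *[] whose first element's
   address initializes a char ** directly. */
static char **sizes = (char *[]){ "small", "large", NULL };

/* Count the entries in a NULL-terminated list. */
size_t list_length(char **list) {
    size_t n = 0;
    while (list[n] != NULL)
        n++;
    return n;
}
```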
Classes, as implemented in the "class" preprocessor from Bell Labs.
Begin(name) and end(name) delimiters as part of the language. Our
SRI testbed macros do not check for matching names, and cannot be
used for top-level brackets because ctags does not expand the macros
and gets confused. The cb program also fails to recognize brackets
hidden by macros.
True nested procedures in addition to the current nested blocks.
At present it is difficult to make certain variables global to
a main subroutine and its "servants", yet not global to everyone.
This also makes it difficult to convert code from other languages
that do have this capability.
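The usual approximation in C is to place the main routine and its servants in one source file and declare the shared variables static at file scope. Other files cannot see those variables. The sketch below assumes a hypothetical parser file. Unlike true nested procedures, it provides one shared copy of each variable rather than one per activation.

```c
/* Hypothetical "parser.c": shared_depth is visible to parse() and its
   servants in this file, but static hides it from all other files. */
static int shared_depth = 0;

static void enter_block(void) { shared_depth++; }   /* servant */
static void leave_block(void) { shared_depth--; }   /* servant */

/* Public entry point: the only name this file exports.
   Returns the maximum nesting depth of {} in text. */
int parse(const char *text) {
    int max_depth = 0;
    shared_depth = 0;
    for (; *text != '\0'; text++) {
        if (*text == '{') {
            enter_block();
            if (shared_depth > max_depth)
                max_depth = shared_depth;
        } else if (*text == '}') {
            leave_block();
        }
    }
    return max_depth;
}
```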
Variables declared outside functions should be private (static) by
default. A "global" or "public" keyword should be required to make
them available externally.
A "proc" or similar keyword used in function headers so that
they can be easily distinguished from variable declarations.
A "forward" or "extern" keyword could be required to distinguish
headers without bodies. This would simplify the job of cc, cb,
ctags, and other programs that analyze C source files.
It should be possible to use an enumeration code (e.g., NONE) with
different values in different enumerations. Macro names must
necessarily override enumeration names, so it is probably an error
to have the same codes for both. Some type of package or union
specification is needed for enums.
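All enumeration constants share one name space in C, so two enums cannot both declare NONE. The common workaround is to prefix each constant with its type name. The enums below are illustrative only.

```c
/* Prefixing enumerators with the type name avoids the clash that
   enum color { NONE, ... }; enum shape { NONE, ... }; would cause
   (NONE would be redeclared). */
enum color { COLOR_NONE, COLOR_RED, COLOR_GREEN };
enum shape { SHAPE_NONE = 10, SHAPE_CIRCLE, SHAPE_SQUARE };
```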
An nargs() function to return the number of arguments passed
to a routine. Such a function exists in Berkeley
UNIX but is not documented. [The Berkeley routine actually
returns the number of words in the argument list, which can
differ from the number of arguments.]
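Without nargs(), a portable variadic routine has to be told how many arguments follow. The sketch below uses the ANSI <stdarg.h> interface with an explicit count. It also shows a C99 macro trick, SUM_INTS, which computes that count from the size of a compound literal. Both names are hypothetical, and the trick only works when every argument converts to int.

```c
#include <stdarg.h>

/* The caller passes the number of int arguments that follow. */
int sum_ints(int count, ...) {
    va_list ap;
    int total = 0;
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

/* C99: count the arguments at compile time. sizeof does not evaluate
   the compound literal, so each argument is evaluated only once. */
#define SUM_INTS(...) \
    sum_ints((int)(sizeof((int[]){__VA_ARGS__}) / sizeof(int)), __VA_ARGS__)
```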
Macros that can handle a variable number of arguments. At present
it is impossible to extract some of the arguments for various
purposes and then pass the rest (however many) on to printf.
It is also impossible to replace "return" with a macro because
it may or may not have an argument.
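C99, which postdates this message, added variadic macros with __VA_ARGS__. The sketch below consumes the first argument itself and passes the rest, however many, on to printf. LOG is an illustrative name. Its value is printf's return value for the caller's part of the output.

```c
#include <stdio.h>

/* Print a bracketed tag, then forward the remaining arguments
   (a format string and its values) unchanged to printf. */
#define LOG(tag, ...) \
    (printf("[%s] ", (tag)), printf(__VA_ARGS__))
```

For example, LOG("init", "%d items\n", 3) prints "[init] 3 items".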
An OMITTED argument code of some type that can be used to test
whether an argument to a function or macro was omitted by typing
successive commas or providing too few arguments. This might be
coupled to a default mechanism, but the user could easily write
his own defaults if the OMITTED code were implemented.
Some type of entry and exit hooks that can be used for debug tracing,
timing instrumentation, etc. It is currently awkward to intercept
return statements because they accept a variable number of arguments
(one argument or none, but not an empty argument list).
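Under C99 a single variadic macro can stand in for both forms of return. TRACE_RETURN() expands to "return ;" and TRACE_RETURN(x) to "return x;". This gives a crude exit hook for tracing. The macro name and the sample functions are hypothetical, and __func__ is the C99 predefined function name.

```c
#include <stdio.h>

/* Exit hook: log the function being left, then return with whatever
   (possibly empty) value expression was supplied. */
#define TRACE_RETURN(...) \
    do { \
        fprintf(stderr, "leaving %s\n", __func__); \
        return __VA_ARGS__; \
    } while (0)

int square(int x) {
    TRACE_RETURN(x * x);     /* expands to: return x * x; */
}

void reset(int *p) {
    *p = 0;
    TRACE_RETURN();          /* expands to: return ; */
}
```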
The assignment operator should have been := instead of =. Use of
= instead of == in conditionals is a common source of error.
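The sketch below shows the error. In C, = assigns and yields the assigned value. The buggy test is therefore always false and also overwrites its operand. The function names are illustrative. Modern compilers can warn about this pattern, for example GCC's -Wparentheses, which -Wall enables.

```c
/* Intended: report whether *x is zero. */
int is_zero_buggy(int *x) {
    if (*x = 0)          /* assigns 0, so the test is always false */
        return 1;
    return 0;
}

int is_zero(int *x) {
    return *x == 0;      /* the intended comparison */
}
```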
I particularly object to the statement in the manual that "Expressions
involving a commutative and associative operator (*, +, &, |, ^) may be
rearranged arbitrarily, even in the presence of parentheses; ...".
This is inexcusable in a modern language.
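Regrouping matters in practice because floating-point addition is not associative. The sketch below assumes IEEE 754 double precision. At 1e16 the spacing between adjacent doubles is 2, so adding 1.0 to -1e16 is lost to rounding. The volatile temporaries force each partial sum to be rounded to double.

```c
/* (a + b) + c */
double sum_left(double a, double b, double c) {
    volatile double t = a + b;
    return t + c;
}

/* a + (b + c) */
double sum_right(double a, double b, double c) {
    volatile double t = b + c;   /* -1e16 + 1.0 rounds back to -1e16 */
    return a + t;
}
```

With a = 1e16, b = -1e16 and c = 1.0, the left grouping gives 1.0 and the right grouping gives 0.0.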
I am also unhappy about the number of machine-dependent results
that C permits. (E.g., overflow and divide check, rounding of
negative numbers, mod (%) on negative numbers, sign extension on
chars, sign fill on right shift, direction of bits accessed by
bit fields.)
It should be possible to put spaces before a # command for the
compiler. Also, it should be documented that spaces are legal
after the #.
Use of escaped linefeeds in a macro confuses the compiler: its
diagnostic messages do not count the continuations as lines, but
vi does count them. (This has been fixed in Berkeley 4.2.)
Fclose should be called automatically when a program terminates
abnormally. (It is already called for normal terminations.)
It is very difficult to find some bugs when buffers are not dumped.
If the program runs for a long time, it is convenient to pipe
its output into a log file instead of tying up a terminal. If the
log file is not flushed, however, this is not only unproductive;
it is misleading.
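One partial workaround is to make the log stream line-buffered when it is opened. On typical implementations each complete line is then handed to the operating system as it is written, so an abnormal termination loses at most the current partial line. The sketch below uses the standard setvbuf(), which must be called before any other operation on the stream. open_log is a hypothetical helper. For redirected standard output, the same call would be setvbuf(stdout, NULL, _IOLBF, 0) at the start of main.

```c
#include <stdio.h>

/* Open a log file whose output is flushed at each newline. */
FILE *open_log(const char *path) {
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return NULL;
    if (setvbuf(fp, NULL, _IOLBF, BUFSIZ) != 0) {   /* before any I/O */
        fclose(fp);
        return NULL;
    }
    return fp;
}
```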
We just found another bug where setting array[4] in something
declared "int array[4]" overwrote a pointer in a distant piece
of code. C ought to offer a run-time subscript checking facility,
and certainly should have caught this compile-time error.
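A minimal run-time subscript check can be built from assert(), assuming the array's declaration is in scope so that sizeof yields its element count. CHECKED and COUNT_OF are illustrative names. The index expression is evaluated twice, so it must not have side effects. Defining NDEBUG removes the check. For the constant case, some modern compilers can flag out-of-range subscripts in some configurations, for example GCC's -Warray-bounds when optimization is enabled.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in a true array (not a pointer). */
#define COUNT_OF(a)    (sizeof (a) / sizeof (a)[0])

/* a[i], aborting via assert() if i is out of range. A negative i
   becomes a huge size_t and is also caught. */
#define CHECKED(a, i)  ((a)[(assert((size_t)(i) < COUNT_OF(a)), (i))])
```

With int array[4], CHECKED(array, 3) = 7 is accepted, while CHECKED(array, 4) = 7 stops the program with a diagnostic instead of overwriting a neighbouring variable.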
(Hardware speed and storage are becoming less of a consideration
every year. Programming ease and software reliability should be
dominant.)
The compiler should warn about statements like "x+1;" since they
can have no side effects or other useful purpose. Most likely
the statement is intended to be "x+=1;".
The expression "(cast) (flag == 0) ? 0 : 1" applies the
cast to the boolean test rather than to the output of the
conditional expression. I would much rather see the syntax
"(cast) ifv (flag == 0) thenv 0 elsev 1" where the cast
applies to the final value. [I have implemented the ifv/thenv/elsev
macros, but there is no way to put hidden parentheses around the
entire construct unless one adds a special terminator (e.g., "fi").]
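A function-like macro can supply those hidden parentheses, because the closing parenthesis of the call acts as the terminator. The sketch below uses a hypothetical COND macro and uses sizeof to show which operand the cast reaches. Without the macro the conditional yields an int. With it, the result is a double.

```c
#include <stddef.h>

/* Parenthesizes the whole conditional so an outer cast applies to it. */
#define COND(test, a, b)   ((test) ? (a) : (b))

/* The cast binds to (flag == 0); the conditional's type is int. */
size_t size_unparenthesized(int flag) {
    return sizeof((double) (flag == 0) ? 0 : 1);
}

/* The cast applies to the value of the whole conditional. */
size_t size_with_cond(int flag) {
    return sizeof((double) COND(flag == 0, 0, 1));
}
```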
The expression (A,B) returns the value of B. There needs to
be a similar syntax for those cases where the value of A is
desired, and A must be executed first. In particular, suppose
that we are writing a macro noteerr() which is supposed to
evaluate its argument and take some action based on a global
return code, then return the result of the initial evaluation
as its value. For example, suppose we want to pass some functional
value, func(), to a subroutine, subr(), and that we want to wrap
the evaluation of func() in an error handler:
subr(...,noteerr(func()),...);
There is currently no way to do this and have the whole noteerr()
macro return an object of the same type and value as func() when
func() may be of arbitrary type. It could be done if there were
an (A,B)-like syntax that returned the value of A.
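Standard C still has no such operator. GCC and Clang, however, provide two non-standard extensions, statement expressions and __typeof__, that together make a type-generic noteerr() possible. The sketch below relies on those extensions. The global last_err, handle_err() and the sample function half() are hypothetical stand-ins for a real error convention. The saved-value name must not collide with names used in the argument.

```c
#include <stdio.h>

int last_err = 0;   /* hypothetical global return code */

/* Hypothetical error action: report and clear the global code. */
static void handle_err(void) {
    if (last_err != 0) {
        fprintf(stderr, "error %d\n", last_err);
        last_err = 0;
    }
}

/* GNU C: evaluate expr once, run the handler, then yield the saved
   value with expr's own type (the "value of A" form). */
#define noteerr(expr) __extension__ ({          \
    __typeof__(expr) noteerr_val_ = (expr);     \
    handle_err();                               \
    noteerr_val_; })

/* Hypothetical wrapped function: flags negative input as an error. */
static double half(double x) {
    if (x < 0)
        last_err = 1;
    return x / 2;
}
```

Used as subr(..., noteerr(func()), ...), the argument keeps func()'s type and value while the handler still runs first.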
The compiler should accept string continuations of the form
printf("Beginning of string"
" continuation of string.");
The SAIL/MAINSAIL dynamic string concatenation syntax is
even more flexible, but even this primitive convention would be
adequate for compile-time concatenation.
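ANSI C later adopted this convention. Adjacent string literals are joined into one literal at compile time, which also allows a macro that expands to a literal to be spliced into a longer string. The names below are illustrative.

```c
/* Two adjacent literals become one array at compile time. */
static const char msg[] = "Beginning of string"
                          " continuation of string.";

#define PROGRAM "demo"   /* illustrative macro expanding to a literal */
static const char banner[] = PROGRAM " version " "1.0";
```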
Every enum should have a validity checking routine, e.g.
valid(...). This would permit one to identify illegal
values without converting everything to ints. Note that
valid enums are not necessarily sequential, so that the
test can be complicated. This checking cannot be done at
compile time, so it may be necessary for the user to provide
the checking routines; a pity.
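A user-supplied checker of this kind might look like the sketch below. It uses an illustrative enum whose values are not sequential. Listing every enumerator in a switch keeps the test correct even when there are gaps between the values.

```c
#include <stdbool.h>

/* Illustrative enum with non-sequential values. */
enum perm { PERM_EXEC = 1, PERM_WRITE = 2, PERM_READ = 4, PERM_ALL = 7 };

/* True only for values that name an enumerator of enum perm. */
bool perm_valid(int v) {
    switch (v) {
    case PERM_EXEC:
    case PERM_WRITE:
    case PERM_READ:
    case PERM_ALL:
        return true;
    default:
        return false;
    }
}
```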
I just got caught again closing a comment with \* instead of
*/. The compiler just ate everything up to the end of the next comment.
I see no reason why C can't allow nested comments and also
check for proper balance of comment delimiters.
-------
More information about the Comp.unix
mailing list