Improving C

Laws at SRI-AI.ARPA
Fri Feb 24 09:29:55 AEST 1984


From:  Ken Laws <Laws at SRI-AI.ARPA>

The recent messages about strncpy() illustrate the need for string
commands in addition to the character vector commands offered in C
and UNIX.  Character manipulation combined with malloc() can be made
to do whatever you want, but the semantics can be confusing.  I find
it absurd that there is not even a standard library of dynamic string
routines supplied with UNIX.  I have written such a package myself,
and I am sure many others have also.  String routines are easy to
write in C, which may be why they are always hacked inline, but why
must we all reinvent such wheels?  A separate string package could
be made reasonably efficient and could include extras such as a
length field (making it possible to embed nulls in a string) or a
current position pointer (making a string into a virtual disk file).
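
A package of the kind described above might look like the following minimal sketch. It is written in later (C99) C rather than the C of 1984. The type dstr and the functions dstr_init, dstr_cat, dstr_getc, and dstr_free are hypothetical names, not an existing library. Because C has no garbage collection, every string must still be freed by hand.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string.  The explicit length lets the data contain
   '\0' bytes, and pos is a read position, so the string can be consumed
   like a small in-memory file. */
typedef struct {
    char  *data;   /* bytes, not necessarily '\0'-terminated */
    size_t len;    /* number of bytes in use */
    size_t pos;    /* next byte to read, 0 <= pos <= len */
} dstr;

/* Initialize s with a private copy of the n bytes at src.
   Returns 0 on success, -1 if memory is exhausted. */
static int dstr_init(dstr *s, const char *src, size_t n)
{
    assert(s != NULL && (src != NULL || n == 0));
    s->data = malloc(n > 0 ? n : 1);   /* malloc(0) may return NULL */
    if (s->data == NULL)
        return -1;
    if (n > 0)
        memcpy(s->data, src, n);
    s->len = n;
    s->pos = 0;
    return 0;
}

/* Append a copy of src to dst (dst == src is allowed).
   Returns 0 on success; on failure dst is left unchanged.
   Overflow of the combined length is not checked in this sketch. */
static int dstr_cat(dstr *dst, const dstr *src)
{
    size_t n = src->len;
    size_t total = dst->len + n;
    char *p = realloc(dst->data, total > 0 ? total : 1);

    if (p == NULL)
        return -1;
    dst->data = p;          /* if dst == src, src->data is now p as well */
    if (n > 0)
        memcpy(p + dst->len, src->data, n);   /* ranges do not overlap */
    dst->len = total;
    return 0;
}

/* Return the next byte as an unsigned char value, or -1 at the end,
   in the manner of getc(). */
static int dstr_getc(dstr *s)
{
    return s->pos < s->len ? (unsigned char) s->data[s->pos++] : -1;
}

/* Release the storage and leave s as an empty string. */
static void dstr_free(dstr *s)
{
    free(s->data);
    s->data = NULL;
    s->len = s->pos = 0;
}
```

Because the length is stored explicitly, dstr_init(&s, "ab\0c", 4) keeps all four bytes, and dstr_getc walks through them the way getc() walks through a file.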

The following is a list of other suggestions I have for improving
C and the C environment:


  The C language is reasonably clean, but it could be improved.
  (Maybe the next version should be named D?)  In particular,
  I would like:

    Dynamic strings that are distinct from character vectors.  A
    string should be represented by its address as is now done for
    arrays.  String routines should return copied substrings, etc.
    A concatenation routine is particularly needed.  (We have provided
    one on our testbed, but without garbage collection such things are
    a little dangerous.)

    Dynamic matrices that are addressable using multidimensional subscripts.

    Lists.  Definition of a list as a char ** works, but it must be
    initialized as a (char *)[].  This could be fixed in the compiler.

    Classes, as implemented in the "class" preprocessor from Bell Labs.

    Begin(name) and end(name) delimiters as part of the language.  Our
    SRI testbed macros do not check for matching names, and cannot be
    used for top-level brackets because ctags does not expand the macros
    and gets confused.  The cb program also fails to recognize brackets
    hidden by macros.

    True nested procedures in addition to the current nested blocks.
    At present it is difficult to make certain variables global to
    a main subroutine and its "servants", yet not global to everyone.
    This also makes it difficult to convert code from other languages
    that do have this capability.

    Variables declared outside functions should be private (static) by
    default.  A "global" or "public" keyword should be required to make
    them available externally.

    A "proc" or similar keyword used in function headers so that
    they can be easily distinguished from variable declarations.
    A "forward" or "extern" keyword could be required to distinguish
    headers without bodies.  This would simplify the job of cc, cb,
    ctags, and other programs that analyze C source files.

    It should be possible to use an enumeration code (e.g., NONE) with
    different values in different enumerations.  Macro names must
    necessarily override enumeration names, so it is probably an error
    to have the same codes for both.  Some type of package or union
    specification is needed for enums.

    An nargs() function to return the number of arguments passed
    to a routine.  Such a function exists in the Berkeley
    UNIX, but is not documented.  [The Berkeley routine actually
    returns the number of words in the argument list, which can
    differ from the number of arguments.]

    Macros that can handle a variable number of arguments.  At present
    it is impossible to extract some of the arguments for various
    purposes and then pass the rest (however many) on to printf.
    It is also impossible to replace "return" with a macro because
    it may or may not have an argument.

    An OMITTED argument code of some type that can be used to test
    whether an argument to a function or macro was omitted by typing
    successive commas or providing too few arguments.  This might be
    coupled to a default mechanism, but the user could easily write
    his own defaults if the OMITTED code were implemented.

    Some type of entry and exit hooks that can be used for debug tracing,
    timing instrumentation, etc.  It is currently awkward to intercept
    return statements because they accept a variable number of arguments
    (one argument or none, but not an empty argument list).

    The assignment operator should have been := instead of =.  Use of
    = instead of == in conditionals is a common source of error.

    I particularly object to the statement in the manual that "Expressions
    involving a commutative and associative operator (*, +, &, |, ^) may be
    rearranged arbitrarily, even in the presence of parentheses; ...".
    This is inexcusable in a modern language.

    I am also unhappy about the number of machine-dependent results
    that C permits.  (E.g., overflow and divide check, rounding of
    negative numbers, mod (%) on negative numbers, sign extension on
    chars, sign fill on right shift, direction of bits accessed by
    bit fields.)

    It should be possible to put spaces before a # command for the
    compiler.  Also, it should be documented that spaces are legal
    after the #.

    Use of escaped linefeeds in a macro confuses the compiler:  its
    diagnostic messages do not count the continuations as lines, but
    vi does count them.  (This has been fixed in Berkeley 4.2.)

    Fclose should be called automatically when a program terminates
    abnormally.  (It is already called for normal terminations.)
    It is very difficult to find some bugs when buffers are not dumped.
    If the program runs for a long time, it is convenient to pipe
    its output into a log file instead of tying up a terminal.  If the
    log file is not flushed, however, this is not only unproductive
    but misleading.
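
  For the dynamic matrices requested above, one common workaround is to allocate an array of row pointers into a single block of elements, so that ordinary m[i][j] subscripts work. The following sketch assumes C99. The names matrix_new and matrix_free are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate a rows x cols matrix of doubles that can be indexed as m[i][j].
   One block holds the row pointers and a second holds all the elements,
   so two free() calls release everything.  Returns NULL on failure.
   Overflow of rows * sizeof *m is not checked in this sketch. */
static double **matrix_new(size_t rows, size_t cols)
{
    assert(rows > 0 && cols > 0);
    double **m = malloc(rows * sizeof *m);
    if (m == NULL)
        return NULL;
    m[0] = calloc(rows * cols, sizeof **m);
    if (m[0] == NULL) {
        free(m);
        return NULL;
    }
    for (size_t i = 1; i < rows; i++)
        m[i] = m[0] + i * cols;     /* row i starts i*cols elements in */
    return m;
}

/* Free a matrix created by matrix_new (NULL is ignored). */
static void matrix_free(double **m)
{
    if (m != NULL) {
        free(m[0]);
        free(m);
    }
}
```

  C99 also allows a pointer to a variable-length row, double (*m)[cols] = malloc(rows * sizeof *m), which avoids the pointer array. VLA support became optional in C11, however.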
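
  Later C addressed part of the variable-arguments request. The 1989 standard's <stdarg.h> lets a function forward its remaining arguments to the vprintf family, and C99 added "..." and __VA_ARGS__ to macros. The following minimal sketch assumes both features. It extracts a tag and then passes the rest, however many there are, on to a printf-style formatter. The names tagged_format and TAGGED are hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Write "[tag] " into buf, then format the remaining arguments after it.
   Returns the length the full output would have had (as snprintf does),
   or a negative value on an encoding error. */
static int tagged_format(char *buf, size_t size, const char *tag,
                         const char *fmt, ...)
{
    va_list ap;
    int n, m;

    assert(buf != NULL && size > 0);
    n = snprintf(buf, size, "[%s] ", tag);
    if (n < 0 || (size_t) n >= size)
        return n;               /* error, or no room left for the message */
    va_start(ap, fmt);
    m = vsnprintf(buf + n, size - (size_t) n, fmt, ap);
    va_end(ap);
    return m < 0 ? m : n + m;
}

/* Variadic macro: takes the buffer and tag, and passes the format and
   everything after it through unchanged.  buf must be a true array so
   that sizeof (buf) gives its size. */
#define TAGGED(buf, tag, ...) \
    tagged_format((buf), sizeof (buf), (tag), __VA_ARGS__)
```

  Neither feature reports how many arguments were passed, so nargs() still has no standard equivalent. A format string or a sentinel argument remains the usual convention.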


  We just found another bug where setting array[4] in something 
  declared "int array[4]" overwrote a pointer in a distant piece
  of code.  C ought to offer a run-time subscript checking facility,
  and certainly should have caught this compile-time error.
  (Hardware speed and storage are becoming less of a consideration
  every year.  Programming ease and software reliability should be
  dominant.)
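
  Without compiler support, a run-time subscript check can be approximated with a macro. The following sketch assumes C99. ARRAY_LEN, check_index, and AT are hypothetical names. The check works only on true arrays, not on pointers passed as parameters, because in that case sizeof gives the size of the pointer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Number of elements in a true array (not a pointer). */
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

/* Abort with a diagnostic unless i is a valid subscript for an array of
   len elements.  A negative int converts to a huge size_t and is caught. */
static size_t check_index(size_t i, size_t len, const char *file, int line)
{
    if (i >= len) {
        fprintf(stderr, "%s:%d: subscript %zu out of range [0, %zu)\n",
                file, line, i, len);
        abort();
    }
    return i;
}

/* AT(a, i) behaves like a[i], including as an assignment target, but
   checks i at run time first. */
#define AT(a, i) ((a)[check_index((i), ARRAY_LEN(a), __FILE__, __LINE__)])
```

  With this macro, AT(array, 4) on an array declared "int array[4]" aborts with a file and line number instead of silently overwriting a distant pointer.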


  The compiler should warn about statements like "x+1;" since they
  can have no side effects or other useful purpose.  Most likely
  the statement is intended to be "x+=1;".


  The expression "(cast) (flag == 0) ? 0 : 1" applies the
  cast to the boolean test rather than to the output of the
  conditional expression.  I would much rather see the syntax
  "(cast) ifv (flag == 0) thenv 0 elsev 1" where the cast
  applies to the final value.  [I have implemented the ifv/thenv/elsev
  macros, but there is no way to put hidden parentheses around the
  entire construct unless one adds a special terminator (e.g., "fi").]
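
  The precedence problem can be demonstrated at compile time. The following sketch assumes C11 (for static_assert) and uses a hypothetical function-like macro, IFV, in place of the ifv/thenv/elsev macros. Because all three parts are macro arguments, the expansion can supply its own outer parentheses.

```c
#include <assert.h>

/* Hypothetical macro: the expansion is fully parenthesized, so a cast
   written in front of IFV(...) applies to the final value. */
#define IFV(cond, a, b) ((cond) ? (a) : (b))

/* A cast binds more tightly than ?:, so only (0 == 0) is converted to
   char; the conditional's result has the type of 0 and 1, which is int. */
static_assert(sizeof ((char) (0 == 0) ? 0 : 1) == sizeof (int),
              "the cast applies to the test");

/* With explicit or macro-supplied parentheses, the cast applies to the
   result of the whole conditional. */
static_assert(sizeof ((char) ((0 == 0) ? 0 : 1)) == 1,
              "the cast applies to the value");
static_assert(sizeof ((char) IFV(0 == 0, 0, 1)) == 1,
              "the cast applies to the value");
```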


  The expression (A,B) returns the value of B.  There needs to
  be a similar syntax for those cases where the value of A is
  desired, and A must be executed first.  In particular, suppose
  that we are writing a macro noteerr() which is supposed to
  evaluate its argument and take some action based on a global
  return code, then return the result of the initial evaluation
  as its value.  For example, suppose we want to pass some functional
  value, func(), to a subroutine, subr(), and that we want to wrap
  the evaluation of func() in an error handler:

      subr(...,noteerr(func()),...);

  There is currently no way to do this and have the whole noteerr()
  macro return an object of the same type and value as func() when
  func() may be of arbitrary type.  It could be done if there were
  a variant of the (A,B) syntax that returned the value of A.
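
  Standard C still has no such operator. C11's _Generic does offer a partial workaround. The value is passed through a function, whose argument is fully evaluated before the body runs, and one function is selected for each supported type. This is a sketch under those assumptions. The names errcode, errors_noted, note_current_error, and the noteerr_* helpers are hypothetical. Only the listed types are handled, not arbitrary ones.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical global return code, set by routines such as func(). */
int errcode = 0;
/* Hypothetical count of errors noticed, standing in for "some action". */
int errors_noted = 0;

/* Act on the global return code, then reset it. */
static void note_current_error(void)
{
    if (errcode != 0) {
        errors_noted++;
        fprintf(stderr, "noteerr: error code %d\n", errcode);
        errcode = 0;
    }
}

/* The argument ("A") is evaluated before the call, the check runs
   afterwards, and A is returned unchanged with its own type. */
static int noteerr_int(int value)          { note_current_error(); return value; }
static double noteerr_double(double value) { note_current_error(); return value; }
static char *noteerr_str(char *value)      { note_current_error(); return value; }

/* Select the helper that matches the argument's type; an argument of any
   other type is a compile-time error. */
#define noteerr(x) _Generic((x),   \
        int:    noteerr_int,       \
        double: noteerr_double,    \
        char *: noteerr_str)(x)
```

  Using a function rather than a comma expression with a shared temporary variable also means that two noteerr() calls in the same argument list cannot make unsequenced modifications to that temporary, which would be undefined behavior.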


  The compiler should accept string continuations of the form

      printf("Beginning of string"
	  " continuation of string.");

  The SAIL/MAINSAIL dynamic string concatenation syntax is
  even more flexible, but even this primitive convention would be
  adequate for compile-time concatenation.
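
  The 1989 ANSI C standard adopted exactly this convention: adjacent string literals are concatenated during translation. The following minimal sketch assumes C11 for static_assert, and msg is a hypothetical name. It checks that the two pieces became a single array with one terminating null.

```c
#include <assert.h>

/* Two adjacent literals form one string at compile time. */
static const char msg[] = "Beginning of string"
                          " continuation of string.";

/* One literal means one terminating '\0', not two. */
static_assert(sizeof msg ==
              sizeof "Beginning of string continuation of string.",
              "the literals were concatenated");
```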


  Every enum should have a validity checking routine, e.g.
  valid(...).  This would permit one to identify illegal
  values without converting everything to ints.  Note that
  valid enums are not necessarily sequential, so that the
  test can be complicated.  This checking cannot be done at
  compile time, so it may be necessary for the user to provide
  the checking routines; a pity.
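
  Short of compiler support, the burden of writing such routines by hand can be reduced with the "X macro" preprocessor idiom. This idiom is substituted here for illustration; the valid(...) routine above remains hypothetical. The list of codes is written once and expanded into both the enum declaration and the case labels of the check, so non-sequential values are handled directly. The following sketch assumes C99. The names COLOR_LIST, AS_ENUMERATOR, AS_CASE, enum color, and color_valid are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enumeration with non-sequential codes, listed once. */
#define COLOR_LIST(X) \
    X(RED, 1)         \
    X(GREEN, 2)       \
    X(BLUE, 4)

/* Expand the list into the enum declaration... */
#define AS_ENUMERATOR(name, value) name = value,
enum color { COLOR_LIST(AS_ENUMERATOR) };

/* ...and into the case labels of the validity check, so the two cannot
   drift apart when a code is added or renumbered. */
#define AS_CASE(name, value) case name:

/* Return true if v is one of the declared enum color codes. */
static bool color_valid(int v)
{
    switch (v) {
    COLOR_LIST(AS_CASE)
        return true;
    default:
        return false;
    }
}
```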


  I just got caught again closing a comment with \* instead of
  */.  The compiler just ate everything up to the next comment.
  I see no reason why C can't allow nested comments and also
  check for proper balance of comment delimiters.
-------


