short circuit evaluation

Gregory Smith greg at utcsri.UUCP
Tue Feb 24 11:52:57 AEST 1987


In article <4700004 at uiucdcsm> mccaugh at uiucdcsm.UUCP writes:
>
> This is not submitted as a criticism to the foregoing erudite discussion of
> side-effects, but is rather an innocent question about C in particular: why
> does C refuse to abide by the associativity/precedence rules for expression-
> evaluation that even BASIC guarantees? I can well understand "optimization"
> as an excuse but can easily imagine cases where normally-evaluated expressions
> can crash a system when "optimized" for eavaluation without the programmer's
> express consent. Isn't it a little arbitrary for C to mnaipulate the parts
> of an expression to its satisfaction (or whim)? In particular, this renders
> the formal verification of C-code impractical.

C does abide by associativity and precedence rules, which are laid
out in the appendix of K&R. A-B+C means (A-B)+C and not A-(B+C).
A*B+C/D means (A*B)+(C/D), and not A*((B+C)/D).

What you are complaining about is sequence of evaluation. If I write
a=b*c+d*e, do I care which multiply is done first? The reason you have
not seen anybody complaining about this in other languages, is that
these other languages do not have side-effects in expressions (at least
not to the same extent). If I write a=foo()+bar(), I may indeed care
whether foo() is called before bar(), in which case I *won't write
that*.

The problem has arisen through the notation we use in writing
expressions. An expression has a tree structure, but it is written in a
linear left to right fashion. Associativity and precedence rules serve
merely to define how an expression tree is extracted from its linear
form. Unfortunately, the linear form imposes an artificial ordering
on an expression. The '+' operator is perfectly commutative, so that A+B
and B+A are the same. Unfortunately, you have to write either A or B first.
Even an operator like / has the same problem: if there were an 'under'
operator \ you could have your choice of A/B or B\A; the justifiable lack
of a \ operator forces you to write A before B even though they are at
equal levels in the expression tree.

The philosophy of the C language is that these limitations of the linear
notation should not be allowed to restrict code generation. In an
expression where two subexpressions X and Y must be added together,
it may be vastly more efficient to evaluate Y before X. The programmer
is not able to determine this, and is nevertheless forced to write one
before the other.

Putting it another way, a tree which represents an expression may
be converted to linear form ( traversed ) in many ways. Some ways
will be more efficient for evaluation, and that is hopefully what
the compiler will generate. Some ways may be more useful for human
reading, and that is what the programmer will write ( E.g. in a divide,
the human always writes the dividend before the divisor, due only to
lack of the aforementioned '\' operator ). As long as the same tree is
involved in both cases, what's wrong with that?

It is worth noting at this point that the parentheses '(' and ')' serve
only in the process of converting a linear expression to a tree. Thus
(a+b)*c is different from a+b*c but a+(b+c) can be treated the same as
(a+b)+c. Simply allow your tree to contain a node which forms the sum of
its three, equally ranked, children, and throw in the fact that
redundant ()'s grow in C like mushrooms. Then a+(b+c) should be treated
as a+b+c or SUM(a,b,c). ( I know, there are overflow considerations).

If the expression tree contains no side-effects, it will make no
difference in which order it is evaluated (just like in BASIC).
However, if one branch of the tree changes the value of X and another
branch uses the value of X, the result will depend on which branch
is done first. Some would say , "the compiler should detect such cases
and then use the order of the original expression". However, it
is *impossible* to detect all of these cases at compile time.
So the programmer is simply warned of the cases where expressions
may be rearranged, and avoids expressions which work differently
depending on how they are arranged.

Note that these cases (where results are undefined) are *well defined*
by the language, so you know what to avoid.

[ I know this is oversimplification and doesn't deal with delaying
of side-effects, or with sequence points, etc.. but I wanted to explain
this stuff to those who just can't fathom what all the fuss is about and
why the damn compilers don't just do what they're told ]

-- 
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
Have vAX, will hack...



More information about the Comp.lang.c mailing list