Optimization and language level

Stavros Macrakis macrakis at harvard.ARPA
Thu Dec 20 10:30:11 AEST 1984


> > > FORTRAN is so crippled in...express[ing] what's "really going on" in its
> > > problem domain that a decent...compiler has to do a lot of optimization.
> >  Compilers that optimize code can break programs regardless of language....
> Not quite fair to compilers.  If optimizing breaks programs which conform
> to the language definition, the optimizer is broken.  If optimizing breaks

What does it mean to 'break' a program....  It's taken several decades to
realize that clean semantics in a language is usually more important than easy
or straightforward compilation.  But then, perhaps I shouldn't say 'realize' as
it appears that many partisans of a (dare I mention it?)  well-known PDP-7
operating system still cling tenaciously to its systems programming language's
virtual lack of implementation-independent semantics....  In any case, Fortran
and C are relics of those long-forgotten and dark ages.  Not to say that C
doesn't have some strong points in applications where one would otherwise be
forced to resort to an even lower-level assembler (really, now, is C much
higher level than assembly language?).

So we come to optimization.  If we see the language as simply a way of
expressing individual machine operations, then indeed optimization is
problematical.  In the extreme case, one might be writing a hardware diagnostic
where it is important that "if (1+1=2) then x=1" retain the arithmetic and
comparison operators rather than just set x to 1.

If, on the other hand, the language is conceived of as a formal language with a
given abstract semantics, then we'd like to minimize the pragmatic
considerations which enter into its processing (you can't go too far, though,
or any program which takes no inputs, e.g. one that calculates pi to 10000
places, could be 'executed' entirely at 'compile' time).

The tough engineering decisions come in the language design, where the designer
must (usually) trade off precise and clean semantics for efficient processing.
Efficiency might be lost to runtime checking, or it might be lost by imposing
standard results where they would be expensive to implement on some
machines--consider, e.g., fully specifying floating-point arithmetic.  In
principle, it is just as clean to leave certain kinds of results unspecified,
but in practice, many purposely undefined results are exploited, carelessly or
intentionally, by programmers (e.g. Fortran does not require that parameters
be passed by reference, but most Fortran programmers seem to assume they are).

Ada has been criticized for dealing with this phenomenon by distinguishing
'illegal' (must be detected by the processor) from 'erroneous' (might not be)
usages.  In fact, Ada is only acknowledging existing practice while clarifying
its definition.

Optimization must not change the meaning of the program.  This seems pretty
basic.  The dispute comes in the word 'meaning'.  Is the 'meaning' of '1+1' the
same as the meaning of '2'?  Is the 'meaning' of 'x+x' the same as that of
'2*x'?  This all depends on your language.  If your language and compiler have
both been well engineered (and documented!), the language will let you express
your intended meaning conveniently, and the compiler will correctly translate
that meaning into good code.  If not, good luck!

Conclusion:
  Don't blame the optimizer, blame the language!

	Stavros Macrakis
	Harvard



More information about the Comp.unix.wizards mailing list