Understanding the Bourne Shell (was Re: Finding the last arg)

Mon Jan 14 06:00:07 AEST 1991

In article <1991Jan11.035416.18772 at NCoast.ORG> allbery at ncoast.ORG (Brandon S. Allbery KB8JRR) writes:
>As quoted from <1033 at mwtech.UUCP> by martin at mwtech.UUCP (Martin Weitzel):
>+---------------
>| But when thinking how to smoothen [the shell syntax by using] fewer rules,
>| we often do not recognize all the consequences that this would have.
>+---------------
>
>There is one other problem.  I daresay it would be possible to make Bourne
>shell syntax a bit more "regular" by using a yacc grammar.  THIS WON'T WORK!
>At least, not without making the shell much less useful --- yacc (or other
>parser generators) grammars are not designed for interaction.

My observations differ a little here. It is true that using a parser
generator like yacc sometimes makes one less conscious of the actual
parsing algorithm, which may have to look at the next token to decide
which rule should be reduced (and hence which action should be executed).

But you can also write yacc-able grammars that can be parsed without look
ahead! (The actions are generally a bit more complex then: in most cases
you have to build the parse tree explicitly as a data structure rather
than simply relying on yyparse's value stack.)
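To make the "no look ahead, explicit tree" idea a bit more concrete, here
is a minimal sketch in Python. It is a hand-written push parser, not yacc
output, and the toy grammar as well as the names PushParser, feed and
complete are hypothetical. The caller hands over one token at a time, the
tree is kept in explicit data structures, and a command is reported as
complete on the very token that ends it, so the parser never waits for
input the user has not typed yet.

```python
class PushParser:
    """Toy push parser (hypothetical): tokens are fed one at a time and
    the parser never asks for a token it has not been given.

    Toy grammar, far simpler than the real shell:
        cmd := WORD+ ";" | "if" cmd+ "then" cmd+ "fi"
    """

    def __init__(self):
        self.open = []      # unfinished "if" nodes, innermost last
        self.words = []     # words of the simple command being collected
        self.complete = []  # finished top-level commands, ready to run

    def _finish(self, node):
        # Attach a finished node to the innermost open "if", or hand it
        # over as a complete top-level command.
        if self.open:
            self.open[-1]["parts"][-1].append(node)
        else:
            self.complete.append(node)

    def feed(self, tok):
        at_start = not self.words  # keywords are only recognised here
        if at_start and tok == "if":
            self.open.append({"kind": "if", "parts": [[]]})
        elif at_start and tok == "then" and self.open:
            self.open[-1]["parts"].append([])  # start the then-part
        elif at_start and tok == "fi" and self.open:
            self._finish(self.open.pop())      # "fi" closes the node
        elif tok == ";":
            if self.words:
                self._finish({"kind": "simple", "words": self.words})
                self.words = []
        else:
            self.words.append(tok)
```

Feeding `if true ; then echo hi ; fi' token by token leaves `complete'
empty until `fi' arrives; only then does the finished tree appear. The
sketch does not reject malformed input such as a missing `then'.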

But the conclusion that parser generator grammars are not designed for
interaction resembles the `goto considered harmful' discussion: you
cannot say that C programs are generally less structured just because
the language contains a `goto' statement. Whether a program looks
structured or more like spaghetti code depends largely on how `goto' is
typically used throughout it. Of course, if C had no `goto' at all, even
the old-time BASIC hackers would be forced to find other ways to express
control flow. To that extent I see some truth in Brandon's statement:
parser generators make it easy to write grammars which do not fit well
into an interactive environment.

>In order to
>do interaction *well*, the shell needs to be able to have at least some idea
>of what is going on *without* having read an entire complex command (read
>"if/while/for/case/etc.").  I've tried writing a yacc grammar that does this
>kind of thing in a graceful manner; I ended up using context-sensitive hacks,
>which I dislike in otherwise simple parsers.

Again, `context-sensitive hacks' are not a bad thing a priori (maybe they
are if they are real `hacks', but I think Brandon meant that he fed
some information from the syntax analysis back to the lexer). There are
two different situations. In the first, you design a completely new
syntax for a new language. In this case I would not recommend coupling
parser and scanner, because such a syntax becomes more difficult to learn
for users of the new language (things have different meanings in
different contexts).

On the other hand, if you need to parse an existing language that the
user already knows (e.g. some natural language or a sub-language
thereof), feedback from syntax analysis to lexical analysis helps a lot,
as long as it matches what the user already expects.
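The Bourne Shell itself is an example of such feedback: a word like `if'
or `fi' is a reserved word only in command position, so `echo fi' simply
prints "fi". A minimal sketch of that idea, assuming a hypothetical toy
lexer (classify and RESERVED are illustrative names, and only a subset of
the shell's reserved words is handled): the "feedback" is reduced to a
single flag, at_command_start, which a real parser would supply.

```python
import re

# Reserved words recognised only in command position (a subset; "case",
# "esac", "for", "in" etc. are left out of this toy).
RESERVED = {"if", "then", "else", "elif", "fi", "while", "until", "do", "done"}


def classify(line):
    """Split a line into (kind, text) tokens; hypothetical toy lexer."""
    tokens = []
    # In a real shell this flag would be fed back by the parser; here the
    # lexer maintains it itself to keep the sketch short.
    at_command_start = True
    for tok in re.findall(r";|[^\s;]+", line):
        if tok == ";":
            tokens.append(("SEMI", tok))
            at_command_start = True
        elif at_command_start and tok in RESERVED:
            tokens.append(("RESERVED", tok))
            # After a reserved word a new command (or condition) begins.
            at_command_start = True
        else:
            tokens.append(("WORD", tok))
            at_command_start = False
    return tokens
```

For `if echo then; then echo fi; fi' the first `then' and the first `fi'
come out as ordinary words, the other two as reserved words, which agrees
with how sh reads that line.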

Finding a yacc-able grammar for the Bourne Shell is a mixed case: a
long-time shell user expects all the things in it that a newcomer might
consider irregularities. (I don't dare decide which of them really are
irregularities, as I belong rather to the former group, but at least I
know that most of them, e.g. the implied double quotes around the word
after the `=' in an assignment and around the word between `case' and
`in', save some keystrokes, though they are very non-intuitive for
newcomers.)
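The effect of those implied double quotes can be modelled in a few
lines. This is a toy model, not how any shell is implemented: expand() is
a hypothetical helper that handles only a word consisting of a single
$NAME, approximates field splitting with the default IFS by str.split(),
and ignores pathname expansion. With b set to `x  y', the assignment
`a=$b' stores the value unchanged, whereas `echo $b' passes two
arguments.

```python
def expand(word, env, implied_quotes):
    """Toy expansion of a lone $NAME word (hypothetical helper).

    implied_quotes=True models the word after "=" in an assignment or
    between "case" and "in": the value is used as if double-quoted.
    Otherwise the value is split into fields; str.split() stands in for
    splitting on the default IFS. Pathname expansion is ignored.
    """
    if not word.startswith("$"):
        return [word]
    value = env.get(word[1:], "")
    return [value] if implied_quotes else value.split()
```

Note that an empty unquoted expansion yields no field at all, while the
quoted form yields one empty field; this is one of the spots where the
implied quotes matter.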

>This is also why csh is not
>actually like C --- C can depend on the parser collecting statements for it,
>but csh is primarily designed for interactive use and therefore must be able
>to keep track of what's going on incrementally.

Here I can second Brandon's statement and will even work it out a bit
more: one of the major problems arises if the syntax allows an
if-statement with an optional else-part and no closing keyword, as in C
(the Bourne Shell avoids this because its if-statement is closed by
`fi'). The user (of course) expects the if-statement to be executed as
soon as it has been typed completely. But the parsing algorithm must
read the next token to see whether an `else' follows. The user "knows"
what he or she will do next, but the shell cannot read the user's mind.
That sort of thing must be taken care of when designing an interactive
language. Simply adopting the syntax of a non-interactive language for
an interactive language is bound to fail here.
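The look-ahead problem can be demonstrated with two toy recursive-descent
parsers in Python. All names (Input, take, peek, parse_c_if, parse_sh_if)
and both grammars are hypothetical simplifications. Input plays the role
of a user typing one token at a time and counts how many tokens have been
read. The C-like grammar has an optional else and no closing keyword, so
after the then-part the parser must read one more token before it can
decide that the statement is finished; the sh-like grammar ends with
`fi', so it stops exactly at the end of the statement.

```python
class Input:
    """Simulates a user typing tokens one at a time (hypothetical)."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self.read = 0        # tokens the user has had to type so far
        self._peeked = None

    def peek(self):
        # Looking at the next token forces the user to type it.
        if self._peeked is None:
            if self.read < len(self._tokens):
                self._peeked = self._tokens[self.read]
                self.read += 1
            else:
                self._peeked = "<EOF>"
        return self._peeked

    def take(self):
        tok = self.peek()
        self._peeked = None
        return tok


def expect(inp, want):
    got = inp.take()
    if got != want:
        raise SyntaxError(f"expected {want!r}, got {got!r}")


def parse_c_if(inp):
    # stmt := "if" COND stmt [ "else" stmt ] | WORD ";"
    tok = inp.take()
    if tok == "if":
        cond = inp.take()
        then_part = parse_c_if(inp)
        else_part = None
        if inp.peek() == "else":  # must read a token beyond the then-part
            inp.take()
            else_part = parse_c_if(inp)
        return ("if", cond, then_part, else_part)
    expect(inp, ";")
    return ("cmd", tok)


def parse_sh_if(inp):
    # stmt := "if" COND "then" stmt [ "else" stmt ] "fi" | WORD ";"
    tok = inp.take()
    if tok == "if":
        cond = inp.take()
        expect(inp, "then")
        then_part = parse_sh_if(inp)
        else_part = None
        if inp.peek() == "else":  # in valid input this is "else" or "fi",
            inp.take()            # both still part of this statement
            else_part = parse_sh_if(inp)
        expect(inp, "fi")         # the closing keyword ends the statement
        return ("if", cond, then_part, else_part)
    expect(inp, ";")
    return ("cmd", tok)
```

For `if c x ;' followed later by `y ;', parse_c_if returns only after
reading `y', the fifth token, although the statement has just four; for
`if c then x ; fi' followed by `y ;', parse_sh_if stops after the sixth
token, `fi', which is the last token of the statement.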

To summarize: IMHO it is not the parser generators that complicate
things, but inappropriate design of an interactive language.
(Especially to Brandon: do your experiences stem from trying to derive a
yacc-able grammar for the Bourne Shell or for the C shell?)

BTW: I've redirected followups to comp.lang.misc, since the topic is
drifting away from the focus of comp.unix.shell.
-- 
Martin Weitzel, email: martin at mwtech.UUCP, voice: 49-(0)6151-6 56 83