A new shell. Any takers?

Byron Rakitzis byron at archone.tamu.edu
Tue Jan 8 04:40:14 AEST 1991


Over the last several months I have spent my time hacking up a new
shell. It's still in the early testing stage, and not all the features
have been hacked in yet (here documents still need to go in). I'm
offering this 'introduction' as a piece of bait to the net. I'm using
this shell full time now (as my login shell, that is) but I'm hoping
that it will catch on as a useful script language. I have left a copy
of the compressed distribution in archone:~ftp/pub/rc, for anyone who
cares to examine the source code, and hey, maybe even compile it and
run it on their machine. The code is portable, but it assumes ANSI.

This posting of mine is a little premature, but I'm at the stage where
comments (and bug reports) would be most welcome. The version of rc I
use changes *daily*, so I have not even bothered to assign version
numbers. The filename you will see incorporates the date; this is the
best way I have of archiving old rc versions at the moment. No man page
exists at the moment, although I am working on one. However, I'm hoping
that knowlegable shell users will read my sketch below and say "aha!
this could be useful".

I've compiled rc on sun3, sun4, sgi, dec mips, dec vax, next. It
assumes ANSI, so you'll have to use gcc or some protype-grokking
compiler, like the mips compiler. I have supplied a makefile for
the above architectures.

		AN INTRODUCTION TO RC
		    Byron Rakitzis

rc is the AT&T plan 9 shell. I have taken a copy of the published
manual pages (UNIX research system, 10th ed.) and developed my own
public implementation from the description of rc in these documents.
What follows is a short introduction to the features of rc, in order to
underscore the differences between this shell and the standard sh and
csh.

rc is a shell similar in spirit to sh: on my sun4, rc compiles to a 64K
stripped, statically linked executable; this is even smaller than the
Bourne shell. rc's power lies not in the fact that it has a large
number of features (it does not) but in that it is small, fast and
predictable. What features it does have are very general and powerful.
rc's parsing is performed by a yacc-generated parser, so the precise
syntax of rc is no mystery to its users, as in the case of sh and csh.

VARIABLES:

The area in which rc differs most (apart from syntactically) from the
other shells is in the way variables are implemented. Internally,
variables are linked lists of words. Thus:

a=(this is a list of words)

assignes a 6-element list to $a. The parentheses are necessary for
grouping, but they are stripped during parsing; this definition of
a is no different from:

a=(this (is a) (list) of words)

Quoting in rc is performed using a single quoting character: '
There is no backslash-quoting, and no double quoting. To type
a special character, include it in quotes. To type a ' inside
quotes, type it twice:

; echo 'How''s it going?'
How's it going?

Backslash is special only at the end of the line, for traditional
backslash-continuation. So, type:

; tex foo \bye

not

; tex foo \\bye

A quoted string is treated as a single word in rc. Thus:

; a=(this is a list of words)
; echo $#a
6
; a='this is a list of words??'
; echo $#a
1

Lists (and the variables that represent them) may be concatenated with
the '^' operator. Concatenation behaves according to these rules:
if two lists have the same number of elements, then they are joined
pairwise. Otherwise if one of the lists has one element, the concatenation
is distributive. Any other combination is an error:

; echo (one two three)^.c
one.c two.c three.c
; echo (one two three)^(.a .b .c)
one.a two.b three.c

Sometimes the ^ operation can be gotten for free, for example, by juxtaposing
two variables together, or by juxtaposing a variable with a word:

; opts=(O g c)
; files=(alloca malloc talloc)
; cc -$opts $files.c

results in the execution of the command

cc -O -g -c alloca.c malloc.c talloc.c

An uninitialized variable has the value of the null list, (). Note that
this is different from the *string* ''. The first is represented internally
by a null pointer, the second by a pointer to a null string. So:

; a=()
; echo $#a
0
; a=''
; echo $#a
1

Variables are exported into the environment by default; in fact, there
is no way to create a variable private to rc. This obviates the need
for "setenv" and "export" keywords and makes variable handling simpler
and cleaner.

Variables may be subscripted also: $foo(2) refers to the second element
of foo.  Subscripts are lists of words, so $foo(2 2 2 1 1 1) refers to
a six element list made up of three foo(2)'s and three foo(1)'s.

FUNCTIONS:

rc also offers shell functions, to replace csh aliases and sh shell
functions.  rc functions take the form:

fn foo { definition }

where definition is a sequence of rc commands. $* is set to the
argument list of the function for the duration of the command. As an
example, a function I usually set up is:

; fn l { ls -FC $* }

Shell functions are better than aliases in a number of ways: for one,
the definition may be an arbitrary rc script; parsing is performed up
to the closing brace. The definition need not fit all on one line. The
definition of the shell function is also exported into the environment.
Thus functions (aliases) can be preserved in subshells without the
costly re-interpreting of a .cshrc-like file.

Functions may be deleted by supplying no definition:

; fn l

deletes the above definition.

CONTROL FLOW:

rc's syntax is very clean and simple. Grouping is performed with
bracing, and a brace-delemited command will always work where a single
command suffices.

if (command)
	command
if not
	command

while (command)
	command

switch (word) {
	case pattern
		command
	case pattern
		command
	...
}

! command

command && command

command || command

Let's start with if: if behaves in the usual way, except when it comes
to the treatment of else. The problem is this: rc parses no differently
when it's reading a file or when it's reading a terminal. Therefore,
there is no way after reading

if (grep foo bar) {
	echo zoo
	echo goo
	foo
}

to tell if there is going to be an "else" keyword after the '}'. The
solution used is to have a special "if not" command which looks back at
the last if to see if it succeeded or not. If it did not succeed, then
the body of the if not command is executed. This solution is clumsy,
but in practise very usable. It is an error to use "if not" where no
preceeding if exists.

while works like C's while:

	while (test) command

or
	while (test)
		command

or
	while (test) {
		command
		...
	}

switch behaves as follows: the word in parentheses is looked for in the
patterns specified in each of the case statements. The first match that
succeeds will cause the body of that case statement to be executed;
after that the switch statement terminates. There is no falling-through
case statements as in C:

switch ($foo) {
	case goo
		one
	case f* g*
		two
	case *
		three
}

In the above example, if $foo has the value 'goo', 'one' is executed.
If $foo begins with an 'f' or a 'g', then 'two' is executed, otherwise
'three' is executed.

A note: if $foo has more than one element, then it is a sufficient
condition for matching purposes if one of the elements matches the
pattern. Thus, 'goo' matches the list '(foo goo zoo)'.

! negates the exit status of its argument. Useful in if and while
statements.  Like the C operator. || and && behave like || and && in
sh:

	foo && bar

executes bar if and only if foo is successful.

	foo || bar

executes bar if and only if foo fails.

JOB CONTROL:

As in sh, job control is primitive. Terminating a command with & starts
it asynchronously.  The pid of the job is printed the value of this job
is assigned to $apid.

Some changes over sh: & and ; are terminators in rc syntax, so:

; long command; echo finished &

does not have the desired effect; in order to do this, type:

; {long command; echo finished} &

in the first case, 'long command' is performed before 'echo finished &'
is interpreted (the semicolon terminates the command).

Jobs may be placed in subshells using the @ operator:

; @{cd ..; echo this command is being performed in ..; foo}

performs the command in a subshell. Note that @ is almost never used in
rc, since most of the time a set of braces will implicitly define a
subshell, for example, in the command:
	tar fc - foo | {cd /elsewhere; tar fx -}

REDIRECTIONS AND PIPES, ETC.:

redirections and pipes are as usual (< and >, << and >>) but some
features have been added/changed:

foo |[2] bar

has the effect of piping file descriptor 2 (stderr) from foo to bar.
Similarly:

foo > bar >[2] zar

has the effect of placing standard out in bar and standard error in
zar.

File descriptors can be copied also:

echo 'this is appearing on standard error' >[1=2]

and pipes may pipe arbitrary file descriptors on both sides:

foo |[8=12] bar

has the obvious effect.

Backquote is a unary operator in rc:

echo `date

echoes the output of date. To execute a complex command (longer than
one word), include it in braces:

echo `{cat /etc/motd}

$ifs is used to split the output of ` into words. $ifs defaults to
space-tab-newline.  Thus:

`{echo one two three}

returns a three-element list if $ifs is set to its default value.

One advantage of the fact that backquote is unary is that quoting of
backquotes inside backquotes is no longer necessary:

ls `{foo | grep `{bar zar}}

RC VARIABLES

here are the variables rc uses to guide its execution:

$prompt holds the prompt to print. $prompt(1) and $prompt(2) are the rc
eqivalent of PS1 and PS2 in sh.

$home is the home directory of the user. This is aliased to $HOME, so
assigning a value to one directly assigns the value to the other.

$path is a list of path elements to search. it defaults to (. /bin
/usr/bin) or something similar (/usr/ucb is there on berkeley
machines). This value is aliased to PATH so typing in
	path=(. /bin)
will have the effect of assigning
	PATH=.:/bin
as well.

$cdpath is a list of directories to search in order to change
directories.

$pid is the pid of the shell.

$apid is the pid of the last job started with &.

$ifs defines the field splitting characters for backquote
substitution.

$status is the exit status of the last command executed. This is used
to guide the execution of rc in if, while, && and || as well. If the
last command ended with a signal, $status is set to the lowercase
unix-header-file name for that signal. For example, a segmentation
violation will assign "sigsegv" to $status. A core dump will append
"+core" to $status. In addition, in a pipeline $status is set to a list
of the exit statuses of all the pipeline components:

	; ls | wc
	50	50	365
	; echo $status
	0 0

$history defines a file for rc to append each command before executing
it. This is the ATT-sanctioned way of doing history.  It has its
merits; you can run grep or other unix commands on your history file,
for example. Eventually rc will be bundled with some standalone
"history" programs to facilitate full history.

BUILTINS

rc has the following builtins:

echo: performs a basic echo; echo -n is supported.

cd: changes direcrory to the argument supplied. If the path cd takes is
found by searching cdpath, then this path is printed.

umask: prints and sets the file creation mask in octal.

exec: replaces rc with the specified command.

exit: does the obvious. exit with an argument exits with that exit
status

shift: deletes the specified number of words from $*. Defaults to 1.

builtin: executes the specified builtin. Useful if a builtin has been
redefined as a function. For example:

	fn cd {
		builtin cd $* && prompt=`newprompt
	}

wait: waits for the specified pid, otherwise for all child processes
belonging to rc.

whatis: prints the definition of the supplied arguments. This can be
the definition of a variable, a function, or the pathname of an
executable. Supercedes 'which' in utility.

eval: re-interprets its arguments as a space-separated list of words.
For example:
	a='$b'
	eval $a '=' 1

has the effect of assigning '1' to $b.


.: like sh .; the filename supplied is interpreted in the current
shell.

~: ~ is not a builtin, but it is built in to rc; it is a keyword which
replaces the some of the functionality of /bin/test. It works as
follows:

	~ subject pattern [patterns ... ]

This sets $status to true if and only if the subject is matched by one
of the patterns. For example, the idiom for checking to see if an rc
variable is set is:

	if (! ~ $#foo 0)

which may be read 'if the count of $foo's argument does not match '0',
then..' Note that rc does not have any datatypes other than the list;
$#foo supplies the word count of $foo as a string.

SIGNALS:

rc can be told to catch certain signals and to invoke a shell function
on them. This is accomplished by writing a shell function by the name
of that signal in lower case. A typical use might be:

fn sigint sigterm sigquit {
	echo 'interrupted; cleaning up and exiting' >[1=2]
	rm /tmp/foo.$pid.*
	exit 1
}

Setting a handler to the value {} causes that signal to be ignored.
Deleting the definition of a signal causes the handler to return to its
default value. This scheme has an advantage over sh trap in that syntax
errors are caught at parse time, not execution time.

GLOBBING AND RELATED ISSUES:

rc's globbing mechanism is a little different than usual. For example:

	; a='*'
	; echo $a
	*

For a metacharacter to work (*, ? and [ are the usual shell
metacharacters) then it must appear literally and unquoted. This is a
consequence of the fact that rc input is scanned only once, at the very
beginning. All subsequent interpretation (for example, in backquotes or
variable substitutions) comes from the parse tree. In order to get rc
to explicitly re-scan its arguments, use eval; that's what it's there
for.

A note on metacharacters: [a-z] denotes the usual character class
matching, but because '^' is the concatenation operator in rc,
character classes must be negated with a different character. ~ is
used, so

	echo [~a-z]

matches all files of one character not consisting of a lowercase
letter.

OPTIONS

-l	source $home/.rcrc before normal execution. Having argv[0][0] set
	to - also causes .rcrc to be sourced.

-e	exit on nonzero exit status.

-i	start an interactive shell; print $prompt(1) before every command
	is read, and don't exit on interrupts SIGQUIT and SIGINT

-v	echo input on file descriptor 2

-x	echo commands before they are executed

-d	do not catch SIGQUIT for debugging purposes, so that SIGQUIT will
	cause a core dump.

-c	use the following string as the command(s) to execute. Use the
	remaining arguments to set $*.



More information about the Comp.unix.shell mailing list