hat/coat - .h file/object dependency analysis tools - 1 of 2

Bob McQueer bobm at rtech.UUCP
Sun Mar 27 04:34:40 AEST 1988


Be sure to pick up a third article containing a utility library, too.

Read the man pages for details.  For the problem discussed
in comp.sources.wanted, try:

hat -sde <hdr files> -q <.c files>

{amdahl, sun, mtxinu, hoptoad, cpsc6a}!rtech!bobm

----------------
#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create:
#	coat.1
#	hat.1
#	coat.tpl
#	parse.y
#	scan.l
#	Makefile
# This archive created: Thu Mar 24 17:21:23 1988
export PATH; PATH=/bin:/usr/bin:$PATH
echo shar: "extracting 'coat.1'" '(925 characters)'
if test -f 'coat.1'
then
	echo shar: "will not over-write existing file 'coat.1'"
else
cat << \SHAR_EOF > 'coat.1'
.TH COAT LOCAL 3/1/87
.UC
.SH NAME
coat - "C" object analysis tool
.SH SYNOPSIS
.B coat
[-s[desmu]] [-r[desmu]] [-v<num>] [-z] <files>
.SH DESCRIPTION
.I Coat
produces a topologically sorted dependency list / symbol cross
reference for a group of objects or libraries, assuming the convention
that the "real" symbol name has an underscore prepended for the linker.
All it actually does is massage the output from
.I nm(1)
to pass it into the analyzer program of
.I hat,
producing a similar listing.  See the
.I hat
manual page for details.
.SH OPTIONS
The options shown are the same as for
.I hat,
except that the number on the -v option should not be negative.
See the
.I hat
manual page for details.
.sp
The order in the absence of
.I -z
will be that referring files are sorted ahead of defining files, i.e.
the order wanted for
.I ld
lists.
.SH "SEE ALSO"
.I hat(local), nm(1)
.SH AUTHOR
Robert L. McQueer, bobm at rtech.
SHAR_EOF
fi
echo shar: "extracting 'hat.1'" '(14793 characters)'
if test -f 'hat.1'
then
	echo shar: "will not over-write existing file 'hat.1'"
else
cat << \SHAR_EOF > 'hat.1'
.TH HAT LOCAL 3/1/87
.UC
.SH NAME
hat - header analysis tool
.SH SYNOPSIS
.B hat
[-s[desmu]] [-r[desmu]] [-z] [-q] [-i] [-x] [-v<n>] [-f<sym>] [-c[-.]<sym>] [-a] [-p[-]<sym>] [<cppopt>] <files>
.SH DESCRIPTION
.I Hat
is a tool for analyzing #define and typedef statements and structure /
union / enum definitions and references in header
files and determining their dependencies.  It produces five sections
of information:
.sp
First, a list of the files together with the files they depend directly upon,
in topological sort order.  Each dependency also includes the first
symbol that caused the dependency.  See next paragraph concerning sort
order and cyclical references.
.sp
Second, an expanded dependency list.  For each file, this shows the expanded
list that results from descending the dependency tree.  If there are
cyclical references, this section lists the cycles.  Cycles will have
been broken arbitrarily in determining the topological sort order.
One cycle will be shown for each time a dependency is "ignored" to
allow the topological sort to proceed.
.sp
Third, a symbol cross-reference listing of defines and references.
.sp
Fourth, a listing of multiply defined symbols.
.sp
Fifth, a listing of undefined symbols.
.sp
.I Hat
handles preprocessor conditional compilation constructs by the simple
expedient of invoking
.I /lib/cpp
explicitly.  What it does is go through the files, throwing
out everything except preprocessor syntax, and adding special lines
indicating references and definitions.  The result is piped through
.I /lib/cpp
before being analyzed so that #ifdef's affecting what will
be defined or referenced will be properly treated by having
.I /lib/cpp
remove whichever of the special lines have been conditionally compiled out.
.sp
Many of the options are aimed at letting
.I hat
control the #ifdef's, if desired, and will probably be unused in
most cases.  For "reasonably" ifdef'ed files (refraining from #ifdef'ing
alternate versions of partial syntax), you should be able to simply let
.I /lib/cpp
do the work, as intended.
.sp
Normally, #include lines are not passed through, although this can be
overridden.  It should be overridden only if the #include's in the
header files affect how conditional compilations work (a questionable
arrangement, in the author's opinion).  At any rate, the expanded text
resulting from the #include is skipped during the analysis.  Note that
you could consider
.I hat
a tool for telling you what nested #include's are necessary in the
first place, should you be in the camp that supports using them.
.SH OPTIONS
All options with attached strings require that the string be part of
the argument, e.g. "-v4", and not "-v 4".  The reason for this
is that
.I hat
passes all unrecognized option arguments on to
.I /lib/cpp,
and it obviously couldn't know whether a given argument should include
a following string or not without building in
.I /lib/cpp's
argument syntax.  Instead, we insist on joining the argument to its
option consistently, and allowing any special options the local
.I /lib/cpp
has to be used as long as they don't conflict with
.I hat
options.
.I Hat
stays away from upper case options.
.sp
The
.I -s
and
.I -r
options allow printing of only certain sections of the output.  The attached
desmu characters indicate the dependency list, expanded dependency list,
symbol cross reference, multiple definition and undefined sections
respectively.  The
.I -s
option specifies which sections to print, while the
.I -r
option specifies printing of all sections except those given.  If multiple
specifications are given, only the last is effective, and these options
will have the same effect wherever placed in the argument list.
.sp
The
.I -z
option reverses the sense of the topological sort.  Normally, the order
presented is defining file before referring file, which is the order
you would want for #include lines.  Using
.I -z
causes the order to be referring file followed by defining file.  This
option is mainly for use of the analyzer with alternate input - for
instance,
.I coat
uses this option to present libraries in link order.
.sp
The
.I -q
option specifies that symbols in unrecognized syntax within the files are
to be treated as references.
.I Hat
normally only recognizes #define's, typedef's, externs (which
are ignored except for what appears to be the type declaration),
array dimension expressions, and struct / union / enum definitions.
Everything else is normally ignored since the syntax isn't
understood.  Using this option will cause every symbol not a keyword or
part of understood syntax to be treated as a reference, and for instance
may be used to generate references from normal .c code, at the expense
of generating many undefined symbols.
.sp
The
.I -i
option specifies that #include lines are to be passed on to
.I /lib/cpp
as discussed above.  If none of the #include's affect conditional
compilation, the only effect of this option is to make
.I /lib/cpp
do more work and pass more lines of output to the analysis routines.
.sp
The
.I -c
option causes
.I hat
to go ahead and treat specific #if[n]def's.  Normally, stuff on both
sides of an ifdef is parsed, allowing
.I /lib/cpp
to resolve the results.  If you specify -c<symbol>,
.I hat
will act as if that symbol were defined for #if[n]def's.  -c-<symbol>
makes it specifically undefined, taking the other leg of conditional
constructs.  -c.<symbol> causes the normal interpretation, i.e. both
sides of the conditional expression will be parsed.  This option is mostly
intended to allow you to resolve cases where the normal parsing causes
syntax errors, e.g.:
.sp
.in +5
.nf
 #ifdef YUCK
 struct onething {
 #else
 struct another {
 #endif
.fi
.in -5
.sp
The
.I -a
option makes
.I hat
treat ALL #if[n]def's, STRICTLY on the basis of command line flags.  #if
sections will still be unconditionally parsed, as well as
their #else clauses.  In this case, -c.<symbol>
specifications will be ineffective.
.sp
It is generally preferable to allow
.I /lib/cpp
to resolve things.  In the vast majority of cases, syntax errors from constructs
such as the one given above cause no problems, or a missed symbol or two
at most.
.sp
The
.I -f
option (forget) causes a symbol to be ignored for analytical purposes.
Neither definitions of, nor references to, this symbol will show up.
.sp
The
.I -p
option causes #define lines for the symbol to be inserted into the
input for
.I /lib/cpp
before each file.  -p-<symbol> causes #undef lines to be inserted.
.sp
The
.I -i, -f, -c, -a, -p
and
.I -q
options may be intermingled with the files to control these features
on a file-by-file basis.  They actually act as toggles, the
second invocation turning the feature off again, the third turning
it on again, and so on.  In the case of
.I -f,
the second invocation allows the symbol to be considered again.
Repeated uses of
.I -p, -c
override the previous disposition of the symbol.
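.sp
For example, a command like
.sp
.in +5
.nf
 hat generic.h -q prog1.c prog2.c
.fi
.in -5
.sp
(filenames illustrative) turns
.I -q
on only for the two .c files, leaving generic.h parsed normally.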
.sp
The
.I -x
option suppresses use of
.I /lib/cpp.
This is appropriate if none of the header files contain any
conditional compilation constructs, allowing one less process
to be spawned.
It can also be used to remedy problems arising from too much
input (most likely too many #define's) for
.I /lib/cpp
to handle.  The parser output will simply be fed directly
into the analyzer.  If this is used, and the files DO contain
conditional sections, the result will be that all sections
will be analyzed, however #ifdef'ed, unless explicitly
suppressed with the other options.  This may be useful as
some kind of "worst-case" dependency independent of conditional
compilation.  This option will have the same effect anywhere in
the argument list, and makes any use of
.I -i, -p
options and all unrecognized options irrelevant.
.sp
The
.I -v
option specifies a numeric level for tracing.  Positive numbers
indicate trace levels (1-5) for the parsing of the files.  Negative
numbers indicate levels for the analysis (-1 through -4).
.I -v
is equivalent to
.I -v1.
Nobody but somebody debugging the program will likely be interested
in trace levels with absolute value > 1.  The default is level 0 -
no tracing for either parsing or analysis.
.sp
As mentioned before, all unrecognized options are passed on to
.I /lib/cpp.
The most common one used will probably be -D options to drive
definitions for conditional compilation.
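.sp
For example, a command like
.sp
.in +5
.nf
 hat -sd -DUNIX *.h
.fi
.in -5
.sp
(with UNIX as an illustrative symbol) prints only the dependency list
section, with UNIX defined during the
.I /lib/cpp
pass.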
.SH "ANALYZER SYNTAX"
The analyzer part of the program is actually a separate entity that can
be used to process files, references, and definitions from any source, not
just the
.I hat
parser program.  If you run "hat -v1 -x ..." it will print a pipeline
it is executing, which consists of the parser being fed into the analyzer
program.  The names and locations of these programs are configurable locally,
so you will either have to do this or find the local installation to figure
out where the analyzer is, and what it is named.
.sp
The
.I coat
command produces this sort of analysis of references in objects or libraries by
using the output of
.I nm
massaged appropriately (via
.I sed),
and fed into the analyzer.  It is actually just
a short shell script, and will probably provide a good example.
.sp
The analyzer program reads standard input, and simply ignores lines not
beginning with "@".  The syntax is very simple:
.sp
@=<filename> - to specify a new file.
.sp
@!<symbol> - current file defines a symbol.
.sp
@?<symbol> - current file references a symbol.
.sp
@<, and @> may be used to bracket stuff which should be ignored.  Inside
a @<, the only significant lines will be @< (an error), and @> (close section).
.sp
The <filename> or <symbol> may optionally have white space and quotes
around them.  Note that the quotes are only treated as whitespace
characters - there is no mechanism to include whitespace characters in
the symbol or filename.  Use of the quotes is simply a mechanism to
prevent expansion by
.I /lib/cpp,
which is also the reason for use of non-alphanumerics in the
rest of the syntax.
.sp
For instance, if you input a @= and @! for each named node, followed by
a @? for each arc starting at the given node, the analyzer could be used
to generate a topological sort of a general directed graph.  It is the
author's belief that the cycles generated are a fundamental set, also,
although he won't swear to it without further analysis.
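.sp
For instance, given two hypothetical files, the input
.sp
.in +5
.nf
 @="defs.h"
 @!"BUFSZ"
 @="main.c"
 @?"BUFSZ"
.fi
.in -5
.sp
tells the analyzer that defs.h defines the symbol BUFSZ and that main.c
references it, making main.c dependent upon defs.h.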
.SH DIAGNOSTICS
The only file- and line-number-oriented diagnostics come from the parser, and are
self-explanatory, except for "syntax error".  If the latter happens,
parsing resumes at the next point where the parser can make sense of the
file, and the error may not affect much of anything.
.sp
Errors in the analyzer input, which should not be creatable from the
parser, cause fatal error messages.
.SH BUGS
The parser is in no way, shape, or form intended to check proper
C syntax.
Since it is only looking for certain constructs, it accepts
anything else as irrelevant stuff and ignores it.  Even within the
constructs it is looking for, it is only interested in certain
expected pieces of the syntax, and will actually accept all sorts
of meaningless trash ("register static auto long unsigned short
short double" gets taken to be a reasonable
type declaration, for instance - as far as this analysis is
concerned, that is no different than saying "int".  Or
"typedef +++% int += bar;" is taken as a perfectly rational typedef,
since characters not needed to distinguish what the tool is looking
for are treated as simple white space).
.sp
On a more objectionable level, the syntax which it tries to recognize is
a mixture of preprocessor syntax and C language proper.  Interactions
between the two which result in perfectly compilable C may make
.I hat
see syntax errors, or fool it into a wrong interpretation of a symbol
as a definition or a reference.  The author got the grammar in shape
by testing it on /usr/include, /usr/include/sys, and some large
local header directories until it only got a few syntax errors.
Specifically, it gets none on the local /usr/include, and one on the
local /usr/include/sys, which could have been remedied by an appropriate
use of -c.  The causes in local files were questionable constructs:
.in +5
.sp
Placement of the datatype portion of a typedef in a header
file to be included in front of the actual names being defined
in the source file.
.sp
Use of a macro to provide a portion of the syntax for a typedef.
.in -5
.sp
Making it really bulletproof would involve essentially doing all the
work of
.I /lib/cpp
while simultaneously realizing that macro expansions are really
references to the macro name, and so on.  Then the analysis could
come out different with different file orders - exactly what the
tool is trying to figure out in the first place.
As it stands, it generally does pretty well,
occasionally getting a syntax error or misinterpreting something.
The stuff that one "normally" places in header files works pretty well.
.sp
For a large number of files (or a number of large files), you may be
forced to use the -x option because you will blow up
.I /lib/cpp.
Suspect this if you get some message about "too
many define's", or some such thing.  The author found that the Pyramid
version quit with an error at around 3000 #define's.
.sp
Line numbers on diagnostics coming from
.I /lib/cpp
may not match up to the original file because of accumulated "@" lines
exceeding the number of newlines which were in the original file at
that point (the program tries to "soak up" the additional lines every
time it comes to some suppressible newlines).  The
.I /lib/cpp
on some machines may not pay any attention to #line
directives for generating its error messages, still giving you
a line number in reference to stdin, and simply passing on line
number information to its output.
In this case the line numbers for
.I /lib/cpp
messages will be entirely bogus.
The author tested this on two systems, Pyramid OSx and Sun 3/60.  The Sun
.I /lib/cpp
generates messages based on the #line directives, and maintains an approximate
correctness, while the Pyramid version simply gives you messages based from
the start of input, no matter what #line directives you inserted.
Redefinitions may be ignored, anyway, since the analysis will
tell you about them in gory detail.
.sp
The analyzer is a dynamic memory pig.  It builds a symbol table containing
all definitions and references for every #define, structure name and
typedef across the entire set of files.  On a system with limited
memory, the program will probably halt with an "out of memory"
message on some number of input files which seems perfectly
reasonable to handle.
.sp
If there is a conceivable way that lines beginning with "@" other than
those inserted by the parser could reach the analyzer, it will cause
problems.
.SH "SEE ALSO"
.I coat(local)
.SH AUTHOR
Robert L. McQueer, bobm at rtech.
SHAR_EOF
fi
echo shar: "extracting 'coat.tpl'" '(321 characters)'
if test -f 'coat.tpl'
then
	echo shar: "will not over-write existing file 'coat.tpl'"
else
cat << \SHAR_EOF > 'coat.tpl'
#!/bin/sh

TMP=/tmp/coat.$$
NMOUT=/tmp/coat.nm.$$
OPT=

for x in $*
do
	case $x in
	-*) OPT="$OPT $x" ;;
	*) echo "@=$x" >>$TMP
		echo $x >&2
		nm -g $x >$NMOUT
		grep " [TBDC] " $NMOUT | sed -e "s/^.* _/@!/" >>$TMP
		grep " U " $NMOUT | sed -e "s/^.* _/@?/" >>$TMP
	esac
done

cat $TMP | ANALYZER -z $OPT
rm $TMP $NMOUT
SHAR_EOF
chmod +x 'coat.tpl'
fi
echo shar: "extracting 'parse.y'" '(6842 characters)'
if test -f 'parse.y'
then
	echo shar: "will not over-write existing file 'parse.y'"
else
cat << \SHAR_EOF > 'parse.y'

/*
**
**	Copyright (c) 1988, Robert L. McQueer
**		All Rights Reserved
**
** Permission granted for use, modification and redistribution of this
** software provided that no use is made for commercial gain without the
** written consent of the author, that all copyright notices remain intact,
** and that all changes are clearly documented.  No warranty of any kind
** concerning any use which may be made of this software is offered or implied.
**
*/

%token TYPEDEF EXTERN STRUCT ENUM
%token DEFINE WORD CPP CPPEND MEND AWORD NAWORD
%token ADJ STCLASS NTYPE KEYWORD
%token LSQ RSQ LBRACE RBRACE SEMICOLON COMMA
%right ADJ NTYPE
%%
file	:
		{
			p_init();
		}
	| file blurb
	;

blurb	: TYPEDEF st tdef tlist SEMICOLON
	| EXTERN ext SEMICOLON
	| pre
	| WORD
		{
			if (Qflag)
				r_out(next_str());
			else
				next_str();
		}
	| COMMA
	| LBRACE
	| RBRACE
	| SEMICOLON
	| KEYWORD
	| STCLASS
	| native
	| s1def
	| e1def
	| arrdim
	| error
		{
			Sn_def = 0;
			Head = Tail = 0;
			p_init();
		}
	;

pre	: DEFINE NAWORD
		{
			d_enter(next_str());
		}
		macro CPPEND
		{
			do_refs();
		}
	| DEFINE AWORD
		{
			d_enter(next_str());
		} margs MEND
		macro CPPEND
		{
			do_refs();
		}
	| CPP pjunk CPPEND
	;

margs	: 
	| mlist
	;

mlist	: WORD
		{
			a_enter(next_str());
		}
	| mlist COMMA WORD
		{
			a_enter(next_str());
		}
	;

s1def	: snword LBRACE
		{
			d_out(next_str());
		}
		struct RBRACE
	| STRUCT LBRACE struct RBRACE
	| snword WORD
		{
			r_out(next_str());
			next_str();
		}
	;

snword	: STRUCT WORD
	;

e1def	: eword LBRACE
		{
			d_out(next_str());
		}
		elist RBRACE
	| ENUM LBRACE elist RBRACE
	| eword WORD
		{
			r_out(next_str());
			next_str();
		}
	;

eword	: ENUM WORD
	;

st	:
	| st STCLASS
	;

tdef	: s2def
	| e2def
	| WORD
		{
			r_out(next_str());
			Sn_def = 0;
		}
	| native
		{
			Sn_def = 0;
		}
	;

elist	: 
		{
			Ecount = 0;
		}
	| elist WORD
		{
			if (Ecount)
				r_out(next_str());
			else
				d_out(next_str());
			++Ecount;
		}
	| elist COMMA
		{
			Ecount = 0;
		}
	;

native	: NTYPE
	| ADJ
	| ADJ native
	;

s2def	: snword LBRACE
		{
			strcpy(Sname,next_str());
			d_out(Sname);
			Sn_def = 1;
		}
		struct RBRACE
	| STRUCT LBRACE struct RBRACE
		{
			Sn_def = 0;
		}
	| snword WORD
		{
			Sn_def = 0;
			r_out(next_str());
			d_out(next_str());
		}
	;

e2def	: eword LBRACE
		{
			strcpy(Sname,next_str());
			d_out(Sname);
			Sn_def = 1;
		}
		elist RBRACE
	| ENUM LBRACE elist RBRACE
		{
			Sn_def = 0;
		}
	| eword WORD
		{
			Sn_def = 0;
			r_out(next_str());
			d_out(next_str());
		}
	;

struct	:
	| struct pre
	| struct s1def items SEMICOLON
	| struct e1def items SEMICOLON
	| struct WORD
		{
			r_out(next_str());
		} items SEMICOLON
	| struct native items SEMICOLON;
	;

items	:
	| items COMMA
	| items WORD
		{
			if (Qflag)
				d_out(next_str());
			else
				next_str();
		}
	| items arrdim
	;

tlist	:
	| tlist COMMA
	| tlist WORD
		{
			char *ptr;

			ptr = next_str();
			if (! Sn_def || strcmp(ptr,Sname) != 0)
				d_out(ptr);
		} tlarr
	;

tlarr	:
	| tlarr arrdim
	;

ext	: WORD
		{
			r_out(next_str());
		}
		ejunk
	| STRUCT WORD 
		{
			r_out(next_str());
		}
		ejunk
	| ENUM WORD
		{
			r_out(next_str());
		}
		ejunk
	| native ejunk
	;

ejunk	: 
	| ejunk WORD
		{
			next_str();
		}
	| ejunk LBRACE ebal RBRACE
	| ejunk STRUCT
	| ejunk COMMA
	| ejunk arrdim
	;

ebal	:
	| ebal LBRACE ebal RBRACE
	| ebal WORD
		{
			next_str();
		}
	| ebal COMMA
	| ebal SEMICOLON
	| ebal STRUCT
	| ebal arrdim
	| ebal native
	;

macro	:
	| macro WORD
		{
			r_enter(next_str());
		}
	| macro COMMA
	| macro SEMICOLON
	| macro LBRACE
	| macro RBRACE
	| macro STRUCT
	| macro LSQ
	| macro KEYWORD
	| macro RSQ
	| macro MEND
	| macro native
	| macro STCLASS
	| macro TYPEDEF
	| macro EXTERN
	| macro ENUM
	;

pjunk	:
	| pjunk WORD
		{
			next_str();
		}
	| pjunk COMMA
	| pjunk SEMICOLON
	| pjunk LBRACE
	| pjunk RBRACE
	| pjunk STRUCT
	| pjunk DEFINE
	| pjunk TYPEDEF
	| pjunk ENUM
	| pjunk EXTERN
	| pjunk LSQ
	| pjunk RSQ
	| pjunk native
	| pjunk STCLASS
	| pjunk KEYWORD
	;

arrdim	: LSQ dstuff RSQ
	;

dstuff	:
	| dstuff COMMA
	| dstuff WORD
		{
			r_out(next_str());
		}
	| dstuff STRUCT
	| dstuff native
	| dstuff KEYWORD
	;
%%

#include <stdio.h>
#include "config.h"


/*
** yyparse can look ahead one token.  Must make SDEPTH
** sufficient for all tokens parsed before calling next_str(), taking
** this into account.  I think 2 would actually work, currently.
*/
#define SDEPTH 3

static char Sq[SDEPTH][BUFSIZ];
static int Tail = 0;
static int Head = 0;

static int Sn_def;
static int Ecount;

static char Sname[BUFSIZ];

static char *Dtab = NULL;
static char *Def;

extern char *Ftab;

extern int Qflag;
extern int Iflag;
extern int Verbosity;

extern int Add_line;	/* see scanner */

char *htab_init();
char *htab_find();

char *str_store();

/* called by yylex() */
q_str(s)
char *s;
{
	strcpy(Sq[Tail],s);
	Tail = (Tail+1)%SDEPTH;
	if (Verbosity > 4)
		diagnostic("PUSH: %s",s);
}

static char *
next_str()
{
	char *ptr;

	ptr = Sq[Head];
	Head = (Head+1)%SDEPTH;
	if (Verbosity > 4)
		diagnostic("NEXT: %s",ptr);
	return (ptr);
}

static
p_init()
{
	if (Dtab == NULL)
	{
		Dtab = htab_init(SMALL_TABLE,NULL,NULL,NULL);
		if (Verbosity > 3)
			diagnostic("tab INIT");
	}
	else
	{
		htab_clear(Dtab);
		if (Verbosity > 3)
			diagnostic("tab CLEAR");
		str_free();
	}

	if (Tail != Head)
		diagnostic("OOPS - parser/lex synch problem");
	Head = Tail;
}

static
d_enter(s)
char *s;
{
	if (Verbosity > 2)
		diagnostic("#def: %s",s);
	if (keycheck(s) != WORD)
		diagnostic("Redefining keywords is not a good idea");
	Def = str_store(s);
}

static
a_enter(s)
char *s;
{
	if (Verbosity > 2)
		diagnostic("#arg: %s",s);
	htab_enter(Dtab,str_store(s),"ARG");
	if (Verbosity > 3)
		diagnostic("tab enter '%s', 'ARG'",s);
}

static
r_enter(s)
char *s;
{
	if (Verbosity > 2)
		diagnostic("#ref: %s",s);
	if (htab_find(Dtab,s) == NULL)
	{
		if (Verbosity > 3)
			diagnostic("tab enter '%s', ''",s);
		htab_enter(Dtab,str_store(s),"");
	}
	else
	{
		if (Verbosity > 3)
			diagnostic("tab contains '%s'",s);
	} 
}

static
do_refs()
{
	int i;
	char *k,*d;

	if (Verbosity > 2)
		diagnostic("#end");
	d_out(Def);
	for (i=htab_list(Dtab,1,&d,&k); i != 0; i=htab_list(Dtab,0,&d,&k))
	{
		if (Verbosity > 3)
			diagnostic("tab list '%s', '%s'",k,d);
		if (*d == '\0')
			r_out(k);
	}
	p_init();
}

static
r_out(s)
char *s;
{
	if (Verbosity > 1)
		diagnostic("REF: %s",s);
	if (Ftab != NULL && htab_find(Ftab,s) != NULL)
	{
		if (Verbosity > 1)
			diagnostic("Forget %s",s);
		return;
	}
	printf("@?\"%s\"\n",s);
	++Add_line;
}


static
d_out(s)
char *s;
{
	if (Verbosity > 1)
		diagnostic("DEF: %s",s);
	if (Ftab != NULL && htab_find(Ftab,s) != NULL)
	{
		if (Verbosity > 1)
			diagnostic("Forget %s",s);
		return;
	}
	printf("@!\"%s\"\n",s);
	++Add_line;
}

yyerror(s)
char *s;
{
	diagnostic(s);
}
SHAR_EOF
fi
echo shar: "extracting 'scan.l'" '(9833 characters)'
if test -f 'scan.l'
then
	echo shar: "will not over-write existing file 'scan.l'"
else
cat << \SHAR_EOF > 'scan.l'
 
 /*
 **
 ** Copyright (c) 1988, Robert L. McQueer
 ** 	All Rights Reserved
 **
 ** Permission granted for use, modification and redistribution of this
 ** software provided that no use is made for commercial gain without the
 ** written consent of the author, that all copyright notices remain intact,
 ** and that all changes are clearly documented.  No warranty of any kind
 ** concerning any use which may be made of this software is offered or implied.
 **
 */

 extern int Diag_line;
 extern char *Diag_file;
 extern int Iflag;
 extern int Xflag;
 extern int Aflag;
 extern int Cflag;
 extern int Verbosity;

 extern char *Ftab;

 extern char Fextra[];

 int Add_line;	/* referenced by yyparse() */

 static int Outflag;
 static int Oldstate;
 static int Close_include;

 static int Squelch;

 /*
 ** ifdef stack.  Cheap to specify a large number of nesting levels,
 ** so we don't bother making this configurable.  Number allowed
 ** is 32 times the length of the array (it's a bit map)
 ** Estack length matches Ifstack, and simply says whether the
 ** else goes with an if we are processing or not.
 */
 static unsigned long Ifstack[50];	/* yep, 1600 nested ifdefs! */
 static unsigned long Estack[50];
 static int Ifidx;

 /*
 ** a local stdio.h is used to pull in yytab.h and define
 ** MYPUTS(), which outputs a string conditionally on the
 ** setting of Outflag.  Also defines SQRET as a conditional
 ** return based on Squelch setting.
 */

 /*
 ** NOTES on interaction with yyparse:
 **
 ** returning WORD, NAWORD or AWORD indicates a string constant
 ** has been queued up using q_str().  It is up to the parser
 ** to dequeue the returned strings without overflowing the queue.
 **
 ** All /lib/cpp syntax is delimited for yyparse with the token
 ** CPPEND once we reach the end of the # line, or possible
 ** continuations for #define constructs.  Separate tokens
 ** DEFINE and CPP distinguish #define lines from other cpp
 ** lines.  States <PREPRO>, <DEF1> and <INDEF> are used for this.
 **
 ** Squelch controls #ifdef treatment.  If set, we simply continue
 ** scanning rather than passing tokens back to yyparse.  Thus
 ** we turn off #ifdef'ed out sections by simply not allowing the
 ** parser to see them.  Squelch is changed in such a manner as
 ** to send back or suppress things only on CPP boundaries. An
 ** entire CPP -> CPPEND statement is suppressed when we suppress,
 ** together with everything until the CPP -> CPPEND sequence
 ** turning it back on again, which is sent back in its entirety.
 **
 ** MYPUTS controls output of /lib/cpp constructs.  The scanner
 ** outputs all the /lib/cpp lines, unless the Xflag is on.  It
 ** also outputs "@<" , "@>" lines around includes if Iflag
 ** is on, and outputs the "@=" line, a "#line" directive, and
 ** any optional stuff specified by the command line at the
 ** beginning of each file.  The parser outputs "@!", "@?" lines
 ** only.
 **
 ** Much stuff is never seen by the parser.  Many characters such
 ** as +, -, (, ), = which are not important to the constructs the
 ** parser is looking for are treated as simple white space.  Numeric,
 ** single-quote and double-quote constants are also ignored, being
 ** treated much as commentary.  <COMMENT> and <QUOTE> states apply
 ** to this.  Note that they have to resume an old state, rather
 ** than an unconditional 0 to handle comment and quote constants
 ** within cpp syntax.
 **
 ** AWORD, NAWORD tokens apply only to the word following #define.
 ** AWORD indicates an argument list, NAWORD none.  ')' is returned
 ** as a token (MEND) only inside #define's, allowing the parser
 ** to pick up argument lists.
 **
 ** Obvious item references, .<something> or -><something>, are
 ** also thrown out inside #defines, so we don't see them as
 ** symbol references.
 **
 ** Add_line is a mechanism to attempt to make line numbers
 ** match up for /lib/cpp.  Every time we generate a spare \n
 ** for a @ line, we bump it.  When an "optional" newline comes along,
 ** we decrement it if > 0, or output a newline.
 */

%Start COMMENT QUOTE INDEF DEF1 PREPRO
%%
<COMMENT>\n		{
				if (!Outflag && !Xflag)
				{
					if (Add_line > 0)
						--Add_line;
					else
						fputs(yytext,stdout);
				}
				++Diag_line;
			}
<COMMENT>[^*\n]+	;
<COMMENT>\*\/		BEGIN Oldstate;
<COMMENT>\*		;

<QUOTE>\n		{
				MYPUTS("\n");
				++Diag_line;
				diagnostic("unclosed quote");
				BEGIN Oldstate;
			}
<QUOTE>[^\\"\n]+	MYPUTS(yytext);
<QUOTE>\\\\		MYPUTS(yytext);
<QUOTE>\\\"		MYPUTS(yytext);
<QUOTE>\"		{
				BEGIN Oldstate;
				MYPUTS(yytext);
			}
<QUOTE>\\\n		{
				++Diag_line;
				MYPUTS(yytext);
			}
<PREPRO>\n		{
				++Diag_line;
				if (Close_include)
				{
					MYPUTS("\n@>\n");
					Add_line += 2;
				}
				else
					MYPUTS("\n");
				Close_include = Outflag = Oldstate = 0;
				BEGIN 0;
				SQRET(CPPEND);
			}

<DEF1>[A-Za-z_][A-Za-z0-9_]*\(	{
					MYPUTS(yytext);
					yytext[yyleng-1] = '\0';
					BEGIN INDEF;
					if (!Squelch)
					{
						q_str(yytext);
						SQRET(AWORD);
					}
				}

<DEF1>[A-Za-z_][A-Za-z0-9_]*	{
					MYPUTS(yytext);
					BEGIN INDEF;
					if (!Squelch)
					{
						q_str(yytext);
						SQRET(NAWORD);
					}
				}

<DEF1>\\\n	MYPUTS(yytext);
<DEF1>[ \t]+	MYPUTS(yytext);
<DEF1>.		{
			diagnostic("bizarre #define - can't find symbol");
			MYPUTS(yytext);
			BEGIN 0;
			REJECT;
		}

<INDEF>\\\n		{
				++Diag_line;
				MYPUTS(yytext);
			}
<INDEF>\)		{
				MYPUTS(yytext);
				SQRET(MEND);
			}

<INDEF>\-\>[A-Za-z0-9_]*	MYPUTS(yytext);
<INDEF>\.[A-Za-z0-9_]*		MYPUTS(yytext);

<INDEF>\\\\		MYPUTS(yytext);
<INDEF>\n		{
				++Diag_line;
				MYPUTS("\n");
				Outflag = Oldstate = 0;
				BEGIN 0;
				SQRET(CPPEND);
			}

\/\*	BEGIN COMMENT;
\"	{
		MYPUTS("\"");
		BEGIN QUOTE;
	}

^\#[ \t]*define		{
				Oldstate = INDEF;
				BEGIN DEF1;
				Outflag = ! Xflag;
				MYPUTS(yytext);
				SQRET(DEFINE);
			}
^\#[ \t]*include	{
				if (Iflag)
				{
					Close_include = 1;
					Outflag = 1;
					MYPUTS("\n@<\n");
					Add_line += 2;
					MYPUTS(yytext);
				}
				else
					Close_include = 0;
				Oldstate = PREPRO;
				BEGIN PREPRO;
				SQRET(CPP);
			}

^\#[ \t]*ifdef[ \t]*[A-Za-z0-9_]+	{
						Outflag = ! Xflag;
						MYPUTS(yytext);
						if (Cflag || Aflag)
							do_ifd(yytext,0);
						Oldstate = PREPRO;
						BEGIN PREPRO;
						SQRET(CPP);
					}
^\#[ \t]*ifndef[ \t]*[A-Za-z0-9_]+	{
						Outflag = ! Xflag;
						MYPUTS(yytext);
						if (Cflag || Aflag)
							do_ifd(yytext,1);
						Oldstate = PREPRO;
						BEGIN PREPRO;
						SQRET(CPP);
					}
^\#[ \t]*if		{
				Outflag = ! Xflag;
				MYPUTS(yytext);

				if (Cflag || Aflag)
					do_if();
				Oldstate = PREPRO;
				BEGIN PREPRO;
				SQRET(CPP);
			}
^\#[ \t]*else		{
				Outflag = ! Xflag;
				MYPUTS(yytext);
				if (Cflag || Aflag)
					do_else();
				Oldstate = PREPRO;
				BEGIN PREPRO;
				SQRET(CPP);
			}
^\#[ \t]*endif		{
				Outflag = ! Xflag;
				MYPUTS(yytext);
				if (Cflag || Aflag)
					do_end();
				Oldstate = PREPRO;
				BEGIN PREPRO;
				SQRET(CPP);
			}

^\#	{
		Oldstate = PREPRO;
		Outflag = ! Xflag;
		BEGIN PREPRO;
		MYPUTS("#");
		SQRET(CPP);
	}

\;	{
		MYPUTS(yytext);
		SQRET(SEMICOLON);
	}
\{	{
		MYPUTS(yytext);
		SQRET(LBRACE);
	}
\}	{
		MYPUTS(yytext);
		SQRET(RBRACE);
	}
\[	{
		MYPUTS(yytext);
		SQRET(LSQ);
	}
\]	{
		MYPUTS(yytext);
		SQRET(RSQ);
	}
\,	{
		MYPUTS(yytext);
		SQRET(COMMA);
	}

[0-9]+\.[0-9]*[Ee][+-]*[0-9]	MYPUTS(yytext);
[0-9]+[Ee][+-]*[0-9]		MYPUTS(yytext);

[0-9]+\.[0-9]*		MYPUTS(yytext);
[0-9]+L			MYPUTS(yytext);
[0-9]+			MYPUTS(yytext);
0[Xx][a-fA-F0-9]*L	MYPUTS(yytext);
0[Xx][a-fA-F0-9]*	MYPUTS(yytext);

\'\\\'\'	MYPUTS(yytext);
\'.*\'		MYPUTS(yytext);

[A-Za-z_][A-Za-z0-9_]*	{
				int i;

				MYPUTS(yytext);
				if (!Squelch)
				{
					if ((i = keycheck(yytext)) == WORD)
						q_str(yytext);
					SQRET(i);
				}
			}

\n	{
		if (!Outflag && !Xflag)
		{
			if (Add_line > 0)
				--Add_line;
			else
				fputs(yytext,stdout);
		}
		else
			MYPUTS(yytext);
		++Diag_line;
	}
.	MYPUTS(yytext);
%%

/*
** debugging hook: report each token before it is returned to the parser
*/
tok_out(i)
int i;
{
	diagnostic("scanned token %d",i);
	return (i);
}

/*
** called on each new file, to reset the scanner
*/
init_lex(name)
char *name;
{
	Add_line = Close_include = Ifidx = Squelch = Outflag = Oldstate = 0;

	printf("@=\"%s\"\n",name);
	if (! Xflag)
		printf("%s#line 1 \"%s\"\n",Fextra,name);
	Diag_line = 1;
	Diag_file = name;
}

char *strtok();
char *htab_find();

/*
** do_ifd is destructive, so it must be called AFTER output of text
*/
do_ifd(s,rev)
char *s;
int rev;
{
	int idx;
	int shift;

	idx = Ifidx/32;
	shift = Ifidx % 32;

	if (Squelch)
		Ifstack[idx] |= 1L << shift;
	else
		Ifstack[idx] &= ~(1L << shift);

	++Ifidx;

	if (Squelch)
	{
		Estack[idx] &= ~(1L << shift);
		return;
	}

	strtok(s," \t#");
	s = strtok(NULL," \t");
	--s;		/* safe: strtok left at least one separator before the token */
	*s = '+';
	if (htab_find(Ftab,s) != NULL)
	{
		if (rev)
			Squelch = 1;
		else
			Squelch = 0;
		Estack[idx] |= 1L << shift;
	}
	else
	{
		*s = '-';
		if (htab_find(Ftab,s) != NULL || Aflag)
		{
			if (rev)
				Squelch = 0;
			else
				Squelch = 1;
			Estack[idx] |= 1L << shift;
		}
		else
			Estack[idx] &= ~(1L << shift);
	}
}

/*
** ignore sense of all #if statements.
*/
do_if()
{
	int idx;
	int shift;

	idx = Ifidx/32;
	shift = Ifidx % 32;

	if (Squelch)
		Ifstack[idx] |= 1L << shift;
	else
		Ifstack[idx] &= ~(1L << shift);

	++Ifidx;

	Estack[idx] &= ~(1L << shift);
}

do_end()
{
	int idx;
	int shift;

	if (Ifidx == 0)
	{
		diagnostic("unmatched #endif");
		Squelch = 0;
		return;
	}

	--Ifidx;
	idx = Ifidx/32;
	shift = Ifidx % 32;

	Squelch = (Ifstack[idx] >> shift) & 1;
}

do_else()
{
	int idx;
	int shift;

	if (Ifidx == 0)
	{
		diagnostic("unmatched #else");
		Squelch = 0;
		return;
	}

	idx = Ifidx - 1;
	shift = idx % 32;
	idx /= 32;

	if ((Estack[idx] >> shift) & 1)
		Squelch = ! Squelch;
}
SHAR_EOF
fi
echo shar: "extracting 'Makefile'" '(2167 characters)'
if test -f 'Makefile'
then
	echo shar: "will not over-write existing file 'Makefile'"
else
cat << \SHAR_EOF > 'Makefile'
#
# libraries.  You will want the bobm.a utility library, wherever
# you decided to put it, and the lex library.
#
LIBS = $(HOME)/lib/bobm.a -ll

#
# -d is needed to generate y.tab.h for the scanner, and for keycheck.c
#
YFLAGS = -d

#
# For SYSV you will need at least:
#	-Dindex=strchr -Drindex=strrchr
# and possibly more.
#
# To change the set of keywords recognized - say, to add special ones
# for your C compiler - see keycheck.c
#
CFLAGS = -O

#
# LOCAL CONFIGURATION
#
# These definitions also drive the making of a header file.  HATDIR is the
# directory you want the analyzer and parser placed in, BINDIR is the
# directory you want the command programs to be placed in.  PARSER and
# ANALYZER are the names you want to give those respective executables.
# CPPCMD is the C preprocessor, and SHELLCMD is the shell which will be
# execl'ed to run the PARSER | [CPPCMD] | ANALYZER pipe.
# MANDIR is where to put the manual pages.
#
# Some of the definitions will be placed in header file localnames.h, and
# moved to lastnames.h after compiling hat.c
#
MANDIR = .
HATDIR = $(HOME)/bin
BINDIR = $(HOME)/bin
PARSER = hat_p
ANALYZER = hat_a
CPPCMD = "/lib/cpp"
SHELLCMD = "/bin/sh"

#
# object lists for the three executables - no remarks from the peanut gallery
# concerning the analyzer abbreviation :-).
#
ANALOBJ = amain.o anread.o table.o analyze.o topsort.o listsort.o
PARSOBJ = parse.o scan.o pmain.o keycheck.o
HATOBJ = hat.o

all: hat parser anal coat man

parser: $(PARSOBJ)
	cc -o $(PARSER) $(PARSOBJ) $(LIBS)
	mv $(PARSER) $(HATDIR)

anal:	$(ANALOBJ)
	cc -o $(ANALYZER) $(ANALOBJ) $(LIBS)
	mv $(ANALYZER) $(HATDIR)

hat:	$(HATOBJ)
	cc -o hat $(HATOBJ)
	mv hat $(BINDIR)

coat:
	sed -e "s/ANALYZER/$(ANALYZER)/" coat.tpl >coat
	chmod 755 coat
	mv coat $(BINDIR)

man:
	cp hat.1 $(MANDIR)
	cp coat.1 $(MANDIR)

hat.o:
	echo "#define HATDIR \"$(HATDIR)\"" >localnames.h
	echo "#define PARSER \"$(PARSER)\"" >>localnames.h
	echo "#define ANALYZER \"$(ANALYZER)\"" >>localnames.h
	echo "#define CPPCMD \"$(CPPCMD)\"" >>localnames.h
	echo "#define SHELLCMD \"$(SHELLCMD)\"" >>localnames.h
	cc $(CFLAGS) -c hat.c
	mv localnames.h lastnames.h
SHAR_EOF
fi
exit 0
#	End of shell archive