v23i044: Flex, a fast lex replacement, Part08/10

Rich Salz rsalz at bbn.com
Fri Oct 12 01:29:41 AEST 1990


Submitted-by: Vern Paxson <vern at cs.cornell.edu>
Posting-number: Volume 23, Issue 44
Archive-name: flex2.3/part08

#! /bin/sh
# This is a shell archive.  Remove anything before this line, then feed it
# into a shell via "sh file" or similar.  To overwrite existing files,
# type "sh file -c".
# The tool that generated this appeared in the comp.sources.unix newsgroup;
# send mail to comp-sources-unix at uunet.uu.net if you want that tool.
# Contents:  flexdoc.1.02 misc.c nfa.c
# Wrapped by rsalz at litchi.bbn.com on Wed Oct 10 13:24:04 1990
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
echo If this archive is complete, you will see the following message:
echo '          "shar: End of archive 8 (of 10)."'
if test -f 'flexdoc.1.02' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'flexdoc.1.02'\"
else
  echo shar: Extracting \"'flexdoc.1.02'\" \(15514 characters\)
  sed "s/^X//" >'flexdoc.1.02' <<'END_OF_FILE'
XAnother area where the user can increase a scanner's performance
X(and one that's easier to implement) arises from the fact that
Xthe longer the tokens matched, the faster the scanner will run.
XThis is because with long tokens the processing of most input
Xcharacters takes place in the (short) inner scanning loop, and
Xdoes not often have to go through the additional work of setting up
Xthe scanning environment (e.g.,
X.B yytext)
Xfor the action.  Recall the scanner for C comments:
X.nf
X
X    %x comment
X    %%
X            int line_num = 1;
X
X    "/*"         BEGIN(comment);
X
X    <comment>[^*\\n]*
X    <comment>"*"+[^*/\\n]*
X    <comment>\\n             ++line_num;
X    <comment>"*"+"/"        BEGIN(INITIAL);
X
X.fi
XThis could be sped up by writing it as:
X.nf
X
X    %x comment
X    %%
X            int line_num = 1;
X
X    "/*"         BEGIN(comment);
X
X    <comment>[^*\\n]*
X    <comment>[^*\\n]*\\n      ++line_num;
X    <comment>"*"+[^*/\\n]*
X    <comment>"*"+[^*/\\n]*\\n ++line_num;
X    <comment>"*"+"/"        BEGIN(INITIAL);
X
X.fi
XNow instead of each newline requiring the processing of another
Xaction, recognizing the newlines is "distributed" over the other rules
Xto keep the matched text as long as possible.  Note that
X.I adding
Xrules does
X.I not
Xslow down the scanner!  The speed of the scanner is independent
Xof the number of rules or (modulo the considerations given at the
Xbeginning of this section) how complicated the rules are with
Xregard to operators such as '*' and '|'.
X.LP
XA final example in speeding up a scanner: suppose you want to scan
Xthrough a file containing identifiers and keywords, one per line
Xand with no other extraneous characters, and recognize all the
Xkeywords.  A natural first approach is:
X.nf
X
X    %%
X    asm      |
X    auto     |
X    break    |
X    ... etc ...
X    volatile |
X    while    /* it's a keyword */
X
X    .|\\n     /* it's not a keyword */
X
X.fi
XTo eliminate the back-tracking, introduce a catch-all rule:
X.nf
X
X    %%
X    asm      |
X    auto     |
X    break    |
X    ... etc ...
X    volatile |
X    while    /* it's a keyword */
X
X    [a-z]+   |
X    .|\\n     /* it's not a keyword */
X
X.fi
XNow, if it's guaranteed that there's exactly one word per line,
Xthen we can reduce the total number of matches by a half by
Xmerging in the recognition of newlines with that of the other
Xtokens:
X.nf
X
X    %%
X    asm\\n    |
X    auto\\n   |
X    break\\n  |
X    ... etc ...
X    volatile\\n |
X    while\\n  /* it's a keyword */
X
X    [a-z]+\\n |
X    .|\\n     /* it's not a keyword */
X
X.fi
XOne has to be careful here, as we have now reintroduced backtracking
Xinto the scanner.  In particular, while
X.I we
Xknow that there will never be any characters in the input stream
Xother than letters or newlines,
X.I flex
Xcan't figure this out, and it will plan for possibly needing backtracking
Xwhen it has scanned a token like "auto" and then the next character
Xis something other than a newline or a letter.  Previously it would
Xthen just match the "auto" rule and be done, but now it has no "auto"
Xrule, only a "auto\\n" rule.  To eliminate the possibility of backtracking,
Xwe could either duplicate all rules but without final newlines, or,
Xsince we never expect to encounter such an input and therefore don't
Xhow it's classified, we can introduce one more catch-all rule, this
Xone which doesn't include a newline:
X.nf
X
X    %%
X    asm\\n    |
X    auto\\n   |
X    break\\n  |
X    ... etc ...
X    volatile\\n |
X    while\\n  /* it's a keyword */
X
X    [a-z]+\\n |
X    [a-z]+   |
X    .|\\n     /* it's not a keyword */
X
X.fi
XCompiled with
X.B -Cf,
Xthis is about as fast as one can get a
X.I flex 
Xscanner to go for this particular problem.
X.LP
XA final note:
X.I flex
Xis slow when matching NUL's, particularly when a token contains
Xmultiple NUL's.
XIt's best to write rules which match
X.I short
Xamounts of text if it's anticipated that the text will often include NUL's.
X.SH INCOMPATIBILITIES WITH LEX AND POSIX
X.I flex
Xis a rewrite of the Unix
X.I lex
Xtool (the two implementations do not share any code, though),
Xwith some extensions and incompatibilities, both of which
Xare of concern to those who wish to write scanners acceptable
Xto either implementation.  At present, the POSIX
X.I lex
Xdraft is
Xvery close to the original
X.I lex
Ximplementation, so some of these
Xincompatibilities are also in conflict with the POSIX draft.  But
Xthe intent is that except as noted below,
X.I flex
Xas it presently stands will
Xultimately be POSIX conformant (i.e., that those areas of conflict with
Xthe POSIX draft will be resolved in
X.I flex's
Xfavor).  Please bear in
Xmind that all the comments which follow are with regard to the POSIX
X.I draft
Xstandard of Summer 1989, and not the final document (or subsequent
Xdrafts); they are included so
X.I flex
Xusers can be aware of the standardization issues and those areas where
X.I flex
Xmay in the near future undergo changes incompatible with
Xits current definition.
X.LP
X.I flex
Xis fully compatible with
X.I lex
Xwith the following exceptions:
X.IP -
XThe undocumented
X.I lex
Xscanner internal variable
X.B yylineno
Xis not supported.  It is difficult to support this option efficiently,
Xsince it requires examining every character scanned and reexamining
Xthe characters when the scanner backs up.
XThings get more complicated when the end of buffer or file is reached or a
XNUL is scanned (since the scan must then be restarted with the proper line
Xnumber count), or the user uses the yyless(), unput(), or REJECT actions,
Xor the multiple input buffer functions.
X.IP
XThe fix is to add rules which, upon seeing a newline, increment
Xyylineno.  This is usually an easy process, though it can be a drag if some
Xof the patterns can match multiple newlines along with other characters.
X.IP
Xyylineno is not part of the POSIX draft.
X.IP -
XThe
X.B input()
Xroutine is not redefinable, though it may be called to read characters
Xfollowing whatever has been matched by a rule.  If
X.B input()
Xencounters an end-of-file the normal
X.B yywrap()
Xprocessing is done.  A ``real'' end-of-file is returned by
X.B input()
Xas
X.I EOF.
X.IP
XInput is instead controlled by redefining the
X.B YY_INPUT
Xmacro.
X.IP
XThe
X.I flex
Xrestriction that
X.B input()
Xcannot be redefined is in accordance with the POSIX draft, but
X.B YY_INPUT
Xhas not yet been accepted into the draft (and probably won't; it looks
Xlike the draft will simply not specify any way of controlling the
Xscanner's input other than by making an initial assignment to
X.I yyin).
X.IP -
X.I flex
Xscanners do not use stdio for input.  Because of this, when writing an
Xinteractive scanner one must explicitly call fflush() on the
Xstream associated with the terminal after writing out a prompt.
XWith
X.I lex
Xsuch writes are automatically flushed since
X.I lex
Xscanners use
X.B getchar()
Xfor their input.  Also, when writing interactive scanners with
X.I flex,
Xthe
X.B -I
Xflag must be used.
X.IP -
X.I flex
Xscanners are not as reentrant as
X.I lex
Xscanners.  In particular, if you have an interactive scanner and
Xan interrupt handler which long-jumps out of the scanner, and
Xthe scanner is subsequently called again, you may get the following
Xmessage:
X.nf
X
X    fatal flex scanner internal error--end of buffer missed
X
X.fi
XTo reenter the scanner, first use
X.nf
X
X    yyrestart( yyin );
X
X.fi
X.IP -
X.B output()
Xis not supported.
XOutput from the
X.B ECHO
Xmacro is done to the file-pointer
X.I yyout
X(default
X.I stdout).
X.IP
XThe POSIX draft mentions that an
X.B output()
Xroutine exists but currently gives no details as to what it does.
X.IP -
X.I lex
Xdoes not support exclusive start conditions (%x), though they
Xare in the current POSIX draft.
X.IP -
XWhen definitions are expanded,
X.I flex
Xencloses them in parentheses.
XWith lex, the following:
X.nf
X
X    NAME    [A-Z][A-Z0-9]*
X    %%
X    foo{NAME}?      printf( "Found it\\n" );
X    %%
X
X.fi
Xwill not match the string "foo" because when the macro
Xis expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?"
Xand the precedence is such that the '?' is associated with
X"[A-Z0-9]*".  With
X.I flex,
Xthe rule will be expanded to
X"foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match.
XNote that because of this, the
X.B ^, $, <s>, /,
Xand
X.B <<EOF>>
Xoperators cannot be used in a
X.I flex
Xdefinition.
X.IP
XThe POSIX draft interpretation is the same as
X.I flex's.
X.IP -
XTo specify a character class which matches anything but a left bracket (']'),
Xin
X.I lex
Xone can use "[^]]" but with
X.I flex
Xone must use "[^\\]]".  The latter works with
X.I lex,
Xtoo.
X.IP -
XThe
X.I lex
X.B %r
X(generate a Ratfor scanner) option is not supported.  It is not part
Xof the POSIX draft.
X.IP -
XIf you are providing your own yywrap() routine, you must include a
X"#undef yywrap" in the definitions section (section 1).  Note that
Xthe "#undef" will have to be enclosed in %{}'s.
X.IP
XThe POSIX draft
Xspecifies that yywrap() is a function and this is very unlikely to change; so
X.I flex users are warned
Xthat
X.B yywrap()
Xis likely to be changed to a function in the near future.
X.IP -
XAfter a call to
X.B unput(),
X.I yytext
Xand
X.I yyleng
Xare undefined until the next token is matched.  This is not the case with
X.I lex
Xor the present POSIX draft.
X.IP -
XThe precedence of the
X.B {}
X(numeric range) operator is different.
X.I lex
Xinterprets "abc{1,3}" as "match one, two, or
Xthree occurrences of 'abc'", whereas
X.I flex
Xinterprets it as "match 'ab'
Xfollowed by one, two, or three occurrences of 'c'".  The latter is
Xin agreement with the current POSIX draft.
X.IP -
XThe precedence of the
X.B ^
Xoperator is different.
X.I lex
Xinterprets "^foo|bar" as "match either 'foo' at the beginning of a line,
Xor 'bar' anywhere", whereas
X.I flex
Xinterprets it as "match either 'foo' or 'bar' if they come at the beginning
Xof a line".  The latter is in agreement with the current POSIX draft.
X.IP -
XTo refer to yytext outside of the scanner source file,
Xthe correct definition with
X.I flex
Xis "extern char *yytext" rather than "extern char yytext[]".
XThis is contrary to the current POSIX draft but a point on which
X.I flex
Xwill not be changing, as the array representation entails a
Xserious performance penalty.  It is hoped that the POSIX draft will
Xbe emended to support the
X.I flex
Xvariety of declaration (as this is a fairly painless change to
Xrequire of
X.I lex
Xusers).
X.IP -
X.I yyin
Xis
X.I initialized
Xby
X.I lex
Xto be
X.I stdin;
X.I flex,
Xon the other hand,
Xinitializes
X.I yyin
Xto NULL
Xand then
X.I assigns
Xit to
X.I stdin
Xthe first time the scanner is called, providing
X.I yyin
Xhas not already been assigned to a non-NULL value.  The difference is
Xsubtle, but the net effect is that with
X.I flex
Xscanners,
X.I yyin
Xdoes not have a valid value until the scanner has been called.
X.IP -
XThe special table-size declarations such as
X.B %a
Xsupported by
X.I lex
Xare not required by
X.I flex
Xscanners;
X.I flex
Xignores them.
X.IP -
XThe name
X.bd
XFLEX_SCANNER
Xis #define'd so scanners may be written for use with either
X.I flex
Xor
X.I lex.
X.LP
XThe following
X.I flex
Xfeatures are not included in
X.I lex
Xor the POSIX draft standard:
X.nf
X
X    yyterminate()
X    <<EOF>>
X    YY_DECL
X    #line directives
X    %{}'s around actions
X    yyrestart()
X    comments beginning with '#' (deprecated)
X    multiple actions on a line
X
X.fi
XThis last feature refers to the fact that with
X.I flex
Xyou can put multiple actions on the same line, separated with
Xsemi-colons, while with
X.I lex,
Xthe following
X.nf
X
X    foo    handle_foo(); ++num_foos_seen;
X
X.fi
Xis (rather surprisingly) truncated to
X.nf
X
X    foo    handle_foo();
X
X.fi
X.I flex
Xdoes not truncate the action.  Actions that are not enclosed in
Xbraces are simply terminated at the end of the line.
X.SH DIAGNOSTICS
X.I reject_used_but_not_detected undefined
Xor
X.I yymore_used_but_not_detected undefined -
XThese errors can occur at compile time.  They indicate that the
Xscanner uses
X.B REJECT
Xor
X.B yymore()
Xbut that
X.I flex
Xfailed to notice the fact, meaning that
X.I flex
Xscanned the first two sections looking for occurrences of these actions
Xand failed to find any, but somehow you snuck some in (via a #include
Xfile, for example).  Make an explicit reference to the action in your
X.I flex
Xinput file.  (Note that previously
X.I flex
Xsupported a
X.B %used/%unused
Xmechanism for dealing with this problem; this feature is still supported
Xbut now deprecated, and will go away soon unless the author hears from
Xpeople who can argue compellingly that they need it.)
X.LP
X.I flex scanner jammed -
Xa scanner compiled with
X.B -s
Xhas encountered an input string which wasn't matched by
Xany of its rules.
X.LP
X.I flex input buffer overflowed -
Xa scanner rule matched a string long enough to overflow the
Xscanner's internal input buffer (16K bytes by default - controlled by
X.B YY_BUF_SIZE
Xin "flex.skel".  Note that to redefine this macro, you must first
X.B #undefine
Xit).
X.LP
X.I scanner requires -8 flag -
XYour scanner specification includes recognizing 8-bit characters and
Xyou did not specify the -8 flag (and your site has not installed flex
Xwith -8 as the default).
X.LP
X.I
Xfatal flex scanner internal error--end of buffer missed -
XThis can occur in an scanner which is reentered after a long-jump
Xhas jumped out (or over) the scanner's activation frame.  Before
Xreentering the scanner, use:
X.nf
X
X    yyrestart( yyin );
X
X.fi
X.LP
X.I too many %t classes! -
XYou managed to put every single character into its own %t class.
X.I flex
Xrequires that at least one of the classes share characters.
X.SH DEFICIENCIES / BUGS
XSee flex(1).
X.SH "SEE ALSO"
X.LP
Xflex(1), lex(1), yacc(1), sed(1), awk(1).
X.LP
XM. E. Lesk and E. Schmidt,
X.I LEX - Lexical Analyzer Generator
X.SH AUTHOR
XVern Paxson, with the help of many ideas and much inspiration from
XVan Jacobson.  Original version by Jef Poskanzer.  The fast table
Xrepresentation is a partial implementation of a design done by Van
XJacobson.  The implementation was done by Kevin Gong and Vern Paxson.
X.LP
XThanks to the many
X.I flex
Xbeta-testers, feedbackers, and contributors, especially Casey
XLeedom, benson at odi.com, Keith Bostic,
XFrederic Brehm, Nick Christopher, Jason Coughlin,
XScott David Daniels, Leo Eskin,
XChris Faylor, Eric Goldman, Eric
XHughes, Jeffrey R. Jones, Kevin B. Kenny, Ronald Lamprecht,
XGreg Lee, Craig Leres, Mohamed el Lozy, Jim Meyering, Marc Nozell, Esmond Pitt,
XJef Poskanzer, Jim Roskind,
XDave Tallman, Frank Whaley, Ken Yap, and those whose names
Xhave slipped my marginal mail-archiving skills but whose contributions
Xare appreciated all the same.
X.LP
XThanks to Keith Bostic, John Gilmore, Craig Leres, Bob
XMulcahy, Rich Salz, and Richard Stallman for help with various distribution
Xheadaches.
X.LP
XThanks to Esmond Pitt and Earle Horton for 8-bit character support;
Xto Benson Margulies and Fred
XBurke for C++ support; to Ove Ewerlid for the basics of support for
XNUL's; and to Eric Hughes for the basics of support for multiple buffers.
X.LP
XWork is being done on extending
X.I flex
Xto generate scanners in which the
Xstate machine is directly represented in C code rather than tables.
XThese scanners may well be substantially faster than those generated
Xusing -f or -F.  If you are working in this area and are interested
Xin comparing notes and seeing whether redundant work can be avoided,
Xcontact Ove Ewerlid (ewerlid at mizar.DoCS.UU.SE).
X.LP
XThis work was primarily done when I was at the Real Time Systems Group
Xat the Lawrence Berkeley Laboratory in Berkeley, CA.  Many thanks to all there
Xfor the support I received.
X.LP
XSend comments to:
X.nf
X
X     Vern Paxson
X     Computer Science Department
X     4126 Upson Hall
X     Cornell University
X     Ithaca, NY 14853-7501
X
X     vern at cs.cornell.edu
X     decvax!cornell!vern
X
X.fi
END_OF_FILE
  if test 15514 -ne `wc -c <'flexdoc.1.02'`; then
    echo shar: \"'flexdoc.1.02'\" unpacked with wrong size!
  fi
  # end of 'flexdoc.1.02'
fi
if test -f 'misc.c' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'misc.c'\"
else
  echo shar: Extracting \"'misc.c'\" \(14912 characters\)
  sed "s/^X//" >'misc.c' <<'END_OF_FILE'
X/* misc - miscellaneous flex routines */
X
X/*-
X * Copyright (c) 1990 The Regents of the University of California.
X * All rights reserved.
X *
X * This code is derived from software contributed to Berkeley by
X * Vern Paxson.
X * 
X * The United States Government has rights in this work pursuant
X * to contract no. DE-AC03-76SF00098 between the United States
X * Department of Energy and the University of California.
X *
X * Redistribution and use in source and binary forms are permitted provided
X * that: (1) source distributions retain this entire copyright notice and
X * comment, and (2) distributions including binaries display the following
X * acknowledgement:  ``This product includes software developed by the
X * University of California, Berkeley and its contributors'' in the
X * documentation or other materials provided with the distribution and in
X * all advertising materials mentioning features or use of this software.
X * Neither the name of the University nor the names of its contributors may
X * be used to endorse or promote products derived from this software without
X * specific prior written permission.
X * THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
X * WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
X * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
X */
X
X#ifndef lint
Xstatic char rcsid[] =
X    "@(#) $Header: /usr/fsys/odin/a/vern/flex/RCS/misc.c,v 2.9 90/08/14 00:10:24 vern Exp $ (LBL)";
X#endif
X
X#include <ctype.h>
X#include "flexdef.h"
X
X
X/* ANSI C does not guarantee that isascii() is defined */
X#ifndef isascii
X#define isascii(c) ((c) <= 0177)
X#endif
X
X
X
X/* declare functions that have forward references */
X
Xvoid dataflush PROTO(());
Xint otoi PROTO((Char []));
X
X
X/* action_out - write the actions from the temporary file to lex.yy.c
X *
X * synopsis
X *     action_out();
X *
X *     Copies the action file up to %% (or end-of-file) to lex.yy.c
X */
X
Xvoid action_out()
X
X    {
X    char buf[MAXLINE];
X
X    while ( fgets( buf, MAXLINE, temp_action_file ) != NULL )
X	if ( buf[0] == '%' && buf[1] == '%' )
X	    break;
X	else
X	    fputs( buf, stdout );
X    }
X
X
X/* allocate_array - allocate memory for an integer array of the given size */
X
Xvoid *allocate_array( size, element_size )
Xint size, element_size;
X
X    {
X    register void *mem;
X
X    /* on 16-bit int machines (e.g., 80286) we might be trying to
X     * allocate more than a signed int can hold, and that won't
X     * work.  Cheap test:
X     */
X    if ( element_size * size <= 0 )
X        flexfatal( "request for < 1 byte in allocate_array()" );
X
X    mem = (void *) malloc( (unsigned) (element_size * size) );
X
X    if ( mem == NULL )
X	flexfatal( "memory allocation failed in allocate_array()" );
X
X    return ( mem );
X    }
X
X
X/* all_lower - true if a string is all lower-case
X *
X * synopsis:
X *    Char *str;
X *    int all_lower();
X *    true/false = all_lower( str );
X */
X
Xint all_lower( str )
Xregister Char *str;
X
X    {
X    while ( *str )
X	{
X	if ( ! isascii( *str ) || ! islower( *str ) )
X	    return ( 0 );
X	++str;
X	}
X
X    return ( 1 );
X    }
X
X
X/* all_upper - true if a string is all upper-case
X *
X * synopsis:
X *    Char *str;
X *    int all_upper();
X *    true/false = all_upper( str );
X */
X
Xint all_upper( str )
Xregister Char *str;
X
X    {
X    while ( *str )
X	{
X	if ( ! isascii( *str ) || ! isupper( (char) *str ) )
X	    return ( 0 );
X	++str;
X	}
X
X    return ( 1 );
X    }
X
X
X/* bubble - bubble sort an integer array in increasing order
X *
X * synopsis
X *   int v[n], n;
X *   bubble( v, n );
X *
X * description
X *   sorts the first n elements of array v and replaces them in
X *   increasing order.
X *
X * passed
X *   v - the array to be sorted
X *   n - the number of elements of 'v' to be sorted */
X
Xvoid bubble( v, n )
Xint v[], n;
X
X    {
X    register int i, j, k;
X
X    for ( i = n; i > 1; --i )
X	for ( j = 1; j < i; ++j )
X	    if ( v[j] > v[j + 1] )	/* compare */
X		{
X		k = v[j];	/* exchange */
X		v[j] = v[j + 1];
X		v[j + 1] = k;
X		}
X    }
X
X
X/* clower - replace upper-case letter to lower-case
X *
X * synopsis:
X *    Char clower();
X *    int c;
X *    c = clower( c );
X */
X
XChar clower( c )
Xregister int c;
X
X    {
X    return ( (isascii( c ) && isupper( c )) ? tolower( c ) : c );
X    }
X
X
X/* copy_string - returns a dynamically allocated copy of a string
X *
X * synopsis
X *    char *str, *copy, *copy_string();
X *    copy = copy_string( str );
X */
X
Xchar *copy_string( str )
Xregister char *str;
X
X    {
X    register char *c;
X    char *copy;
X
X    /* find length */
X    for ( c = str; *c; ++c )
X	;
X
X    copy = malloc( (unsigned) ((c - str + 1) * sizeof( char )) );
X
X    if ( copy == NULL )
X	flexfatal( "dynamic memory failure in copy_string()" );
X
X    for ( c = copy; (*c++ = *str++); )
X	;
X
X    return ( copy );
X    }
X
X
X/* copy_unsigned_string -
X *    returns a dynamically allocated copy of a (potentially) unsigned string
X *
X * synopsis
X *    Char *str, *copy, *copy_unsigned_string();
X *    copy = copy_unsigned_string( str );
X */
X
XChar *copy_unsigned_string( str )
Xregister Char *str;
X
X    {
X    register Char *c;
X    Char *copy;
X
X    /* find length */
X    for ( c = str; *c; ++c )
X	;
X
X    copy = (Char *) malloc( (unsigned) ((c - str + 1) * sizeof( Char )) );
X
X    if ( copy == NULL )
X	flexfatal( "dynamic memory failure in copy_unsigned_string()" );
X
X    for ( c = copy; (*c++ = *str++); )
X	;
X
X    return ( copy );
X    }
X
X
X/* cshell - shell sort a character array in increasing order
X *
X * synopsis
X *
X *   Char v[n];
X *   int n, special_case_0;
X *   cshell( v, n, special_case_0 );
X *
X * description
X *   does a shell sort of the first n elements of array v.
X *   If special_case_0 is true, then any element equal to 0
X *   is instead assumed to have infinite weight.
X *
X * passed
X *   v - array to be sorted
X *   n - number of elements of v to be sorted
X */
X
Xvoid cshell( v, n, special_case_0 )
XChar v[];
Xint n, special_case_0;
X
X    {
X    int gap, i, j, jg;
X    Char k;
X
X    for ( gap = n / 2; gap > 0; gap = gap / 2 )
X	for ( i = gap; i < n; ++i )
X	    for ( j = i - gap; j >= 0; j = j - gap )
X		{
X		jg = j + gap;
X
X		if ( special_case_0 )
X		    {
X		    if ( v[jg] == 0 )
X			break;
X
X		    else if ( v[j] != 0 && v[j] <= v[jg] )
X			break;
X		    }
X
X		else if ( v[j] <= v[jg] )
X		    break;
X
X		k = v[j];
X		v[j] = v[jg];
X		v[jg] = k;
X		}
X    }
X
X
X/* dataend - finish up a block of data declarations
X *
X * synopsis
X *    dataend();
X */
X
Xvoid dataend()
X
X    {
X    if ( datapos > 0 )
X	dataflush();
X
X    /* add terminator for initialization */
X    puts( "    } ;\n" );
X
X    dataline = 0;
X    datapos = 0;
X    }
X
X
X
X/* dataflush - flush generated data statements
X *
X * synopsis
X *    dataflush();
X */
X
Xvoid dataflush()
X
X    {
X    putchar( '\n' );
X
X    if ( ++dataline >= NUMDATALINES )
X	{
X	/* put out a blank line so that the table is grouped into
X	 * large blocks that enable the user to find elements easily
X	 */
X	putchar( '\n' );
X	dataline = 0;
X	}
X
X    /* reset the number of characters written on the current line */
X    datapos = 0;
X    }
X
X
X/* flexerror - report an error message and terminate
X *
X * synopsis
X *    char msg[];
X *    flexerror( msg );
X */
X
Xvoid flexerror( msg )
Xchar msg[];
X
X    {
X    fprintf( stderr, "%s: %s\n", program_name, msg );
X
X    flexend( 1 );
X    }
X
X
X/* flexfatal - report a fatal error message and terminate
X *
X * synopsis
X *    char msg[];
X *    flexfatal( msg );
X */
X
Xvoid flexfatal( msg )
Xchar msg[];
X
X    {
X    fprintf( stderr, "%s: fatal internal error, %s\n", program_name, msg );
X    flexend( 1 );
X    }
X
X
X/* flex_gettime - return current time
X *
X * synopsis
X *    char *flex_gettime(), *time_str;
X *    time_str = flex_gettime();
X *
X * note
X *    the routine name has the "flex_" prefix because of name clashes
X *    with Turbo-C
X */
X
X/* include sys/types.h to use time_t and make lint happy */
X
X#ifndef MS_DOS
X#ifndef VMS
X#include <sys/types.h>
X#else
X#include <types.h>
X#endif
X#endif
X
X#ifdef MS_DOS
X#include <time.h>
Xtypedef long time_t;
X#endif
X
Xchar *flex_gettime()
X
X    {
X    time_t t, time();
X    char *result, *ctime(), *copy_string();
X
X    t = time( (long *) 0 );
X
X    result = copy_string( ctime( &t ) );
X
X    /* get rid of trailing newline */
X    result[24] = '\0';
X
X    return ( result );
X    }
X
X
X/* lerrif - report an error message formatted with one integer argument
X *
X * synopsis
X *    char msg[];
X *    int arg;
X *    lerrif( msg, arg );
X */
X
Xvoid lerrif( msg, arg )
Xchar msg[];
Xint arg;
X
X    {
X    char errmsg[MAXLINE];
X    (void) sprintf( errmsg, msg, arg );
X    flexerror( errmsg );
X    }
X
X
X/* lerrsf - report an error message formatted with one string argument
X *
X * synopsis
X *    char msg[], arg[];
X *    lerrsf( msg, arg );
X */
X
Xvoid lerrsf( msg, arg )
Xchar msg[], arg[];
X
X    {
X    char errmsg[MAXLINE];
X
X    (void) sprintf( errmsg, msg, arg );
X    flexerror( errmsg );
X    }
X
X
X/* htoi - convert a hexadecimal digit string to an integer value
X *
X * synopsis:
X *    int val, htoi();
X *    Char str[];
X *    val = htoi( str );
X */
X
Xint htoi( str )
XChar str[];
X
X    {
X    int result;
X
X    (void) sscanf( (char *) str, "%x", &result );
X
X    return ( result );
X    }
X
X
X/* is_hex_digit - returns true if a character is a valid hex digit, false
X *		  otherwise
X *
X * synopsis:
X *    int true_or_false, is_hex_digit();
X *    int ch;
X *    val = is_hex_digit( ch );
X */
X
Xint is_hex_digit( ch )
Xint ch;
X
X    {
X    if ( isdigit( ch ) )
X	return ( 1 );
X
X    switch ( clower( ch ) )
X	{
X	case 'a':
X	case 'b':
X	case 'c':
X	case 'd':
X	case 'e':
X	case 'f':
X	    return ( 1 );
X
X	default:
X	    return ( 0 );
X	}
X    }
X
X
X/* line_directive_out - spit out a "# line" statement */
X
Xvoid line_directive_out( output_file_name )
XFILE *output_file_name;
X
X    {
X    if ( infilename && gen_line_dirs )
X        fprintf( output_file_name, "# line %d \"%s\"\n", linenum, infilename );
X    }
X
X
X/* mk2data - generate a data statement for a two-dimensional array
X *
X * synopsis
X *    int value;
X *    mk2data( value );
X *
X *  generates a data statement initializing the current 2-D array to "value"
X */
Xvoid mk2data( value )
Xint value;
X
X    {
X    if ( datapos >= NUMDATAITEMS )
X	{
X	putchar( ',' );
X	dataflush();
X	}
X
X    if ( datapos == 0 )
X	/* indent */
X	fputs( "    ", stdout );
X
X    else
X	putchar( ',' );
X
X    ++datapos;
X
X    printf( "%5d", value );
X    }
X
X
X/* mkdata - generate a data statement
X *
X * synopsis
X *    int value;
X *    mkdata( value );
X *
X *  generates a data statement initializing the current array element to
X *  "value"
X */
Xvoid mkdata( value )
Xint value;
X
X    {
X    if ( datapos >= NUMDATAITEMS )
X	{
X	putchar( ',' );
X	dataflush();
X	}
X
X    if ( datapos == 0 )
X	/* indent */
X	fputs( "    ", stdout );
X
X    else
X	putchar( ',' );
X
X    ++datapos;
X
X    printf( "%5d", value );
X    }
X
X
X/* myctoi - return the integer represented by a string of digits
X *
X * synopsis
X *    Char array[];
X *    int val, myctoi();
X *    val = myctoi( array );
X *
X */
X
Xint myctoi( array )
XChar array[];
X
X    {
X    int val = 0;
X
X    (void) sscanf( (char *) array, "%d", &val );
X
X    return ( val );
X    }
X
X
X/* myesc - return character corresponding to escape sequence
X *
X * synopsis
X *    Char array[], c, myesc();
X *    c = myesc( array );
X *
X */
X
XChar myesc( array )
XChar array[];
X
X    {
X    Char c, esc_char;
X    register int sptr;
X
X    switch ( array[1] )
X	{
X	case 'a': return ( '\a' );
X	case 'b': return ( '\b' );
X	case 'f': return ( '\f' );
X	case 'n': return ( '\n' );
X	case 'r': return ( '\r' );
X	case 't': return ( '\t' );
X	case 'v': return ( '\v' );
X
X	case '0':
X	case '1':
X	case '2':
X	case '3':
X	case '4':
X	case '5':
X	case '6':
X	case '7':
X	case '8':
X	case '9':
X	    { /* \<octal> */
X	    sptr = 1;
X
X	    while ( isascii( array[sptr] ) && isdigit( array[sptr] ) )
X		/* don't increment inside loop control because if
X		 * isdigit() is a macro it might expand into multiple
X		 * increments ...
X		 */
X		++sptr;
X
X	    c = array[sptr];
X	    array[sptr] = '\0';
X
X	    esc_char = otoi( array + 1 );
X
X	    array[sptr] = c;
X
X	    return ( esc_char );
X	    }
X
X	case 'x':
X	    { /* \x<hex> */
X	    int sptr = 2;
X
X	    while ( isascii( array[sptr] ) && is_hex_digit( array[sptr] ) )
X		/* don't increment inside loop control because if
X		 * isdigit() is a macro it might expand into multiple
X		 * increments ...
X		 */
X		++sptr;
X
X	    c = array[sptr];
X	    array[sptr] = '\0';
X
X	    esc_char = htoi( array + 2 );
X
X	    array[sptr] = c;
X
X	    return ( esc_char );
X	    }
X
X	default:
X	    return ( array[1] );
X	}
X    }
X
X
X/* otoi - convert an octal digit string to an integer value
X *
X * synopsis:
X *    int val, otoi();
X *    Char str[];
X *    val = otoi( str );
X */
X
Xint otoi( str )
XChar str[];
X
X    {
X    int result;
X
X    (void) sscanf( (char *) str, "%o", &result );
X
X    return ( result );
X    }
X
X
X/* readable_form - return the the human-readable form of a character
X *
X * synopsis:
X *    int c;
X *    char *readable_form();
X *    <string> = readable_form( c );
X *
X * The returned string is in static storage.
X */
X
Xchar *readable_form( c )
Xregister int c;
X
X    {
X    static char rform[10];
X
X    if ( (c >= 0 && c < 32) || c >= 127 )
X	{
X	switch ( c )
X	    {
X	    case '\n': return ( "\\n" );
X	    case '\t': return ( "\\t" );
X	    case '\f': return ( "\\f" );
X	    case '\r': return ( "\\r" );
X	    case '\b': return ( "\\b" );
X
X	    default:
X		(void) sprintf( rform, "\\%.3o", c );
X		return ( rform );
X	    }
X	}
X
X    else if ( c == ' ' )
X	return ( "' '" );
X
X    else
X	{
X	rform[0] = c;
X	rform[1] = '\0';
X
X	return ( rform );
X	}
X    }
X
X
X/* reallocate_array - increase the size of a dynamic array */
X
Xvoid *reallocate_array( array, size, element_size )
Xvoid *array;
Xint size, element_size;
X
X    {
X    register void *new_array;
X
X    /* same worry as in allocate_array(): */
X    if ( size * element_size <= 0 )
X        flexfatal( "attempt to increase array size by less than 1 byte" );
X
X    new_array =
X	(void *) realloc( (char *)array, (unsigned) (size * element_size ));
X
X    if ( new_array == NULL )
X	flexfatal( "attempt to increase array size failed" );
X
X    return ( new_array );
X    }
X
X
X/* skelout - write out one section of the skeleton file
X *
X * synopsis
X *    skelout();
X *
X * DESCRIPTION
X *    Copies from skelfile to stdout until a line beginning with "%%" or
X *    EOF is found.
X */
Xvoid skelout()
X
X    {
X    char buf[MAXLINE];
X
X    while ( fgets( buf, MAXLINE, skelfile ) != NULL )
X	if ( buf[0] == '%' && buf[1] == '%' )
X	    break;
X	else
X	    fputs( buf, stdout );
X    }
X
X
X/* transition_struct_out - output a yy_trans_info structure
X *
X * synopsis
X *     int element_v, element_n;
X *     transition_struct_out( element_v, element_n );
X *
X * outputs the yy_trans_info structure with the two elements, element_v and
X * element_n.  Formats the output with spaces and carriage returns.
X */
X
Xvoid transition_struct_out( element_v, element_n )
Xint element_v, element_n;
X
X    {
X    printf( "%7d, %5d,", element_v, element_n );
X
X    datapos += TRANS_STRUCT_PRINT_LENGTH;
X
X    if ( datapos >= 75 )
X	{
X	putchar( '\n' );
X
X	if ( ++dataline % 10 == 0 )
X	    putchar( '\n' );
X
X	datapos = 0;
X	}
X    }
END_OF_FILE
  if test 14912 -ne `wc -c <'misc.c'`; then
    echo shar: \"'misc.c'\" unpacked with wrong size!
  fi
  # end of 'misc.c'
fi
if test -f 'nfa.c' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'nfa.c'\"
else
  echo shar: Extracting \"'nfa.c'\" \(17603 characters\)
  sed "s/^X//" >'nfa.c' <<'END_OF_FILE'
X/* nfa - NFA construction routines */
X
X/*-
X * Copyright (c) 1990 The Regents of the University of California.
X * All rights reserved.
X *
X * This code is derived from software contributed to Berkeley by
X * Vern Paxson.
X * 
X * The United States Government has rights in this work pursuant
X * to contract no. DE-AC03-76SF00098 between the United States
X * Department of Energy and the University of California.
X *
X * Redistribution and use in source and binary forms are permitted provided
X * that: (1) source distributions retain this entire copyright notice and
X * comment, and (2) distributions including binaries display the following
X * acknowledgement:  ``This product includes software developed by the
X * University of California, Berkeley and its contributors'' in the
X * documentation or other materials provided with the distribution and in
X * all advertising materials mentioning features or use of this software.
X * Neither the name of the University nor the names of its contributors may
X * be used to endorse or promote products derived from this software without
X * specific prior written permission.
X * THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
X * WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
X * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
X */
X
X#ifndef lint
Xstatic char rcsid[] =
X    "@(#) $Header: /usr/fsys/odin/a/vern/flex/RCS/nfa.c,v 2.6 90/06/27 23:48:29 vern Exp $ (LBL)";
X#endif
X
X#include "flexdef.h"
X
X
X/* declare functions that have forward references */
X
Xint dupmachine PROTO((int));
Xvoid mkxtion PROTO((int, int));
X
X
X/* add_accept - add an accepting state to a machine
X *
X * synopsis
X *
X *   add_accept( mach, accepting_number );
X *
X * accepting_number becomes mach's accepting number.
X */
X
Xvoid add_accept( mach, accepting_number )
Xint mach, accepting_number;
X
X    {
X    /* hang the accepting number off an epsilon state.  if it is associated
X     * with a state that has a non-epsilon out-transition, then the state
X     * will accept BEFORE it makes that transition, i.e., one character
X     * too soon
X     */
X
X    if ( transchar[finalst[mach]] == SYM_EPSILON )
X	accptnum[finalst[mach]] = accepting_number;
X
X    else
X	{
X	int astate = mkstate( SYM_EPSILON );
X	accptnum[astate] = accepting_number;
X	mach = link_machines( mach, astate );
X	}
X    }
X
X
X/* copysingl - make a given number of copies of a singleton machine
X *
X * synopsis
X *
X *   newsng = copysingl( singl, num );
X *
X *     newsng - a new singleton composed of num copies of singl
X *     singl  - a singleton machine
X *     num    - the number of copies of singl to be present in newsng
X */
X
Xint copysingl( singl, num )
Xint singl, num;
X
X    {
X    int copy, i;
X
X    copy = mkstate( SYM_EPSILON );
X
X    for ( i = 1; i <= num; ++i )
X	copy = link_machines( copy, dupmachine( singl ) );
X
X    return ( copy );
X    }
X
X
X/* dumpnfa - debugging routine to write out an nfa
X *
X * synopsis
X *    int state1;
X *    dumpnfa( state1 );
X */
X
Xvoid dumpnfa( state1 )
Xint state1;
X
X    {
X    int sym, tsp1, tsp2, anum, ns;
X
X    fprintf( stderr, "\n\n********** beginning dump of nfa with start state %d\n",
X	     state1 );
X
X    /* we probably should loop starting at firstst[state1] and going to
X     * lastst[state1], but they're not maintained properly when we "or"
X     * all of the rules together.  So we use our knowledge that the machine
X     * starts at state 1 and ends at lastnfa.
X     */
X
X    /* for ( ns = firstst[state1]; ns <= lastst[state1]; ++ns ) */
X    for ( ns = 1; ns <= lastnfa; ++ns )
X	{
X	fprintf( stderr, "state # %4d\t", ns );
X
X	sym = transchar[ns];
X	tsp1 = trans1[ns];
X	tsp2 = trans2[ns];
X	anum = accptnum[ns];
X
X	fprintf( stderr, "%3d:  %4d, %4d", sym, tsp1, tsp2 );
X
X	if ( anum != NIL )
X	    fprintf( stderr, "  [%d]", anum );
X
X	fprintf( stderr, "\n" );
X	}
X
X    fprintf( stderr, "********** end of dump\n" );
X    }
X
X
X/* dupmachine - make a duplicate of a given machine
X *
X * synopsis
X *
X *   copy = dupmachine( mach );
X *
X *     copy - holds duplicate of mach
X *     mach - machine to be duplicated
X *
X * note that the copy of mach is NOT an exact duplicate; rather, all the
X * transition states values are adjusted so that the copy is self-contained,
X * as the original should have been.
X *
X * also note that the original MUST be contiguous, with its low and high
X * states accessible by the arrays firstst and lastst
X */
X
Xint dupmachine( mach )
Xint mach;
X
X    {
X    int i, init, state_offset;
X    int state = 0;
X    int last = lastst[mach];
X
X    for ( i = firstst[mach]; i <= last; ++i )
X	{
X	state = mkstate( transchar[i] );
X
X	if ( trans1[i] != NO_TRANSITION )
X	    {
X	    mkxtion( finalst[state], trans1[i] + state - i );
X
X	    if ( transchar[i] == SYM_EPSILON && trans2[i] != NO_TRANSITION )
X		mkxtion( finalst[state], trans2[i] + state - i );
X	    }
X
X	accptnum[state] = accptnum[i];
X	}
X
X    if ( state == 0 )
X	flexfatal( "empty machine in dupmachine()" );
X
X    state_offset = state - i + 1;
X
X    init = mach + state_offset;
X    firstst[init] = firstst[mach] + state_offset;
X    finalst[init] = finalst[mach] + state_offset;
X    lastst[init] = lastst[mach] + state_offset;
X
X    return ( init );
X    }
X
X
X/* finish_rule - finish up the processing for a rule
X *
X * synopsis
X *
X *   finish_rule( mach, variable_trail_rule, headcnt, trailcnt );
X *
X * An accepting number is added to the given machine.  If variable_trail_rule
X * is true then the rule has trailing context and both the head and trail
X * are variable size.  Otherwise if headcnt or trailcnt is non-zero then
X * the machine recognizes a pattern with trailing context and headcnt is
X * the number of characters in the matched part of the pattern, or zero
X * if the matched part has variable length.  trailcnt is the number of
X * trailing context characters in the pattern, or zero if the trailing
X * context has variable length.
X */
X
Xvoid finish_rule( mach, variable_trail_rule, headcnt, trailcnt )
Xint mach, variable_trail_rule, headcnt, trailcnt;
X
X    {
X    add_accept( mach, num_rules );
X
X    /* we did this in new_rule(), but it often gets the wrong
X     * number because we do it before we start parsing the current rule
X     */
X    rule_linenum[num_rules] = linenum;
X
X    /* if this is a continued action, then the line-number has
X     * already been updated, giving us the wrong number
X     */
X    if ( continued_action )
X	--rule_linenum[num_rules];
X
X    fprintf( temp_action_file, "case %d:\n", num_rules );
X
X    if ( variable_trail_rule )
X	{
X	rule_type[num_rules] = RULE_VARIABLE;
X
X	if ( performance_report )
X	    fprintf( stderr, "Variable trailing context rule at line %d\n",
X		     rule_linenum[num_rules] );
X
X	variable_trailing_context_rules = true;
X	}
X
X    else
X	{
X	rule_type[num_rules] = RULE_NORMAL;
X
X	if ( headcnt > 0 || trailcnt > 0 )
X	    {
X	    /* do trailing context magic to not match the trailing characters */
X	    char *scanner_cp = "yy_c_buf_p = yy_cp";
X	    char *scanner_bp = "yy_bp";
X
X	    fprintf( temp_action_file,
X	"*yy_cp = yy_hold_char; /* undo effects of setting up yytext */\n" );
X
X	    if ( headcnt > 0 )
X		fprintf( temp_action_file, "%s = %s + %d;\n",
X			 scanner_cp, scanner_bp, headcnt );
X
X	    else
X		fprintf( temp_action_file,
X			 "%s -= %d;\n", scanner_cp, trailcnt );
X	
X	    fprintf( temp_action_file,
X		     "YY_DO_BEFORE_ACTION; /* set up yytext again */\n" );
X	    }
X	}
X
X    line_directive_out( temp_action_file );
X    }
X
X
X/* link_machines - connect two machines together
X *
X * synopsis
X *
X *   new = link_machines( first, last );
X *
X *     new    - a machine constructed by connecting first to last
X *     first  - the machine whose successor is to be last
X *     last   - the machine whose predecessor is to be first
X *
X * note: this routine concatenates the machine first with the machine
X *  last to produce a machine new which will pattern-match first first
X *  and then last, and will fail if either of the sub-patterns fails.
X *  FIRST is set to new by the operation.  last is unmolested.
X */
X
Xint link_machines( first, last )
Xint first, last;
X
X    {
X    if ( first == NIL )
X	return ( last );
X
X    else if ( last == NIL )
X	return ( first );
X
X    else
X	{
X	mkxtion( finalst[first], last );
X	finalst[first] = finalst[last];
X	lastst[first] = max( lastst[first], lastst[last] );
X	firstst[first] = min( firstst[first], firstst[last] );
X
X	return ( first );
X	}
X    }
X
X
X/* mark_beginning_as_normal - mark each "beginning" state in a machine
X *                            as being a "normal" (i.e., not trailing context-
X *                            associated) states
X *
X * synopsis
X *
X *   mark_beginning_as_normal( mach )
X *
X *     mach - machine to mark
X *
X * The "beginning" states are the epsilon closure of the first state
X */
X
Xvoid mark_beginning_as_normal( mach )
Xregister int mach;
X
X    {
X    switch ( state_type[mach] )
X	{
X	case STATE_NORMAL:
X	    /* oh, we've already visited here */
X	    return;
X
X	case STATE_TRAILING_CONTEXT:
X	    state_type[mach] = STATE_NORMAL;
X
X	    if ( transchar[mach] == SYM_EPSILON )
X		{
X		if ( trans1[mach] != NO_TRANSITION )
X		    mark_beginning_as_normal( trans1[mach] );
X
X		if ( trans2[mach] != NO_TRANSITION )
X		    mark_beginning_as_normal( trans2[mach] );
X		}
X	    break;
X
X	default:
X	    flexerror( "bad state type in mark_beginning_as_normal()" );
X	    break;
X	}
X    }
X
X
X/* mkbranch - make a machine that branches to two machines
X *
X * synopsis
X *
X *   branch = mkbranch( first, second );
X *
X *     branch - a machine which matches either first's pattern or second's
X *     first, second - machines whose patterns are to be or'ed (the | operator)
X *
X * note that first and second are NEITHER destroyed by the operation.  Also,
X * the resulting machine CANNOT be used with any other "mk" operation except
X * more mkbranch's.  Compare with mkor()
X */
X
Xint mkbranch( first, second )
Xint first, second;
X
X    {
X    int eps;
X
X    if ( first == NO_TRANSITION )
X	return ( second );
X
X    else if ( second == NO_TRANSITION )
X	return ( first );
X
X    eps = mkstate( SYM_EPSILON );
X
X    mkxtion( eps, first );
X    mkxtion( eps, second );
X
X    return ( eps );
X    }
X
X
X/* mkclos - convert a machine into a closure
X *
X * synopsis
X *   new = mkclos( state );
X *
X *     new - a new state which matches the closure of "state"
X */
X
Xint mkclos( state )
Xint state;
X
X    {
X    return ( mkopt( mkposcl( state ) ) );
X    }
X
X
X/* mkopt - make a machine optional
X *
X * synopsis
X *
X *   new = mkopt( mach );
X *
X *     new  - a machine which optionally matches whatever mach matched
X *     mach - the machine to make optional
X *
X * notes:
X *     1. mach must be the last machine created
X *     2. mach is destroyed by the call
X */
X
Xint mkopt( mach )
Xint mach;
X
X    {
X    int eps;
X
X    if ( ! SUPER_FREE_EPSILON(finalst[mach]) )
X	{
X	eps = mkstate( SYM_EPSILON );
X	mach = link_machines( mach, eps );
X	}
X
X    /* can't skimp on the following if FREE_EPSILON(mach) is true because
X     * some state interior to "mach" might point back to the beginning
X     * for a closure
X     */
X    eps = mkstate( SYM_EPSILON );
X    mach = link_machines( eps, mach );
X
X    mkxtion( mach, finalst[mach] );
X
X    return ( mach );
X    }
X
X
X/* mkor - make a machine that matches either one of two machines
X *
X * synopsis
X *
X *   new = mkor( first, second );
X *
X *     new - a machine which matches either first's pattern or second's
X *     first, second - machines whose patterns are to be or'ed (the | operator)
X *
X * note that first and second are both destroyed by the operation
X * the code is rather convoluted because an attempt is made to minimize
X * the number of epsilon states needed
X */
X
Xint mkor( first, second )
Xint first, second;
X
X    {
X    int eps, orend;
X
X    if ( first == NIL )
X	return ( second );
X
X    else if ( second == NIL )
X	return ( first );
X
X    else
X	{
X	/* see comment in mkopt() about why we can't use the first state
X	 * of "first" or "second" if they satisfy "FREE_EPSILON"
X	 */
X	eps = mkstate( SYM_EPSILON );
X
X	first = link_machines( eps, first );
X
X	mkxtion( first, second );
X
X	if ( SUPER_FREE_EPSILON(finalst[first]) &&
X	     accptnum[finalst[first]] == NIL )
X	    {
X	    orend = finalst[first];
X	    mkxtion( finalst[second], orend );
X	    }
X
X	else if ( SUPER_FREE_EPSILON(finalst[second]) &&
X		  accptnum[finalst[second]] == NIL )
X	    {
X	    orend = finalst[second];
X	    mkxtion( finalst[first], orend );
X	    }
X
X	else
X	    {
X	    eps = mkstate( SYM_EPSILON );
X
X	    first = link_machines( first, eps );
X	    orend = finalst[first];
X
X	    mkxtion( finalst[second], orend );
X	    }
X	}
X
X    finalst[first] = orend;
X    return ( first );
X    }
X
X
X/* mkposcl - convert a machine into a positive closure
X *
X * synopsis
X *   new = mkposcl( state );
X *
X *    new - a machine matching the positive closure of "state"
X */
X
Xint mkposcl( state )
Xint state;
X
X    {
X    int eps;
X
X    if ( SUPER_FREE_EPSILON(finalst[state]) )
X	{
X	mkxtion( finalst[state], state );
X	return ( state );
X	}
X
X    else
X	{
X	eps = mkstate( SYM_EPSILON );
X	mkxtion( eps, state );
X	return ( link_machines( state, eps ) );
X	}
X    }
X
X
X/* mkrep - make a replicated machine
X *
X * synopsis
X *   new = mkrep( mach, lb, ub );
X *
X *    new - a machine that matches whatever "mach" matched from "lb"
X *          number of times to "ub" number of times
X *
X * note
X *   if "ub" is INFINITY then "new" matches "lb" or more occurrences of "mach"
X */
X
Xint mkrep( mach, lb, ub )
Xint mach, lb, ub;
X
X    {
X    int base_mach, tail, copy, i;
X
X    base_mach = copysingl( mach, lb - 1 );
X
X    if ( ub == INFINITY )
X	{
X	copy = dupmachine( mach );
X	mach = link_machines( mach,
X			      link_machines( base_mach, mkclos( copy ) ) );
X	}
X
X    else
X	{
X	tail = mkstate( SYM_EPSILON );
X
X	for ( i = lb; i < ub; ++i )
X	    {
X	    copy = dupmachine( mach );
X	    tail = mkopt( link_machines( copy, tail ) );
X	    }
X
X	mach = link_machines( mach, link_machines( base_mach, tail ) );
X	}
X
X    return ( mach );
X    }
X
X
X/* mkstate - create a state with a transition on a given symbol
X *
X * synopsis
X *
X *   state = mkstate( sym );
X *
X *     state - a new state matching sym
X *     sym   - the symbol the new state is to have an out-transition on
X *
X * note that this routine makes new states in ascending order through the
X * state array (and increments LASTNFA accordingly).  The routine DUPMACHINE
X * relies on machines being made in ascending order and that they are
X * CONTIGUOUS.  Change it and you will have to rewrite DUPMACHINE (kludge
X * that it admittedly is)
X */
X
Xint mkstate( sym )
Xint sym;
X
X    {
X    if ( ++lastnfa >= current_mns )
X	{
X	if ( (current_mns += MNS_INCREMENT) >= MAXIMUM_MNS )
X	    lerrif( "input rules are too complicated (>= %d NFA states)",
X		    current_mns );
X	
X	++num_reallocs;
X
X	firstst = reallocate_integer_array( firstst, current_mns );
X	lastst = reallocate_integer_array( lastst, current_mns );
X	finalst = reallocate_integer_array( finalst, current_mns );
X	transchar = reallocate_integer_array( transchar, current_mns );
X	trans1 = reallocate_integer_array( trans1, current_mns );
X	trans2 = reallocate_integer_array( trans2, current_mns );
X	accptnum = reallocate_integer_array( accptnum, current_mns );
X	assoc_rule = reallocate_integer_array( assoc_rule, current_mns );
X	state_type = reallocate_integer_array( state_type, current_mns );
X	}
X
X    firstst[lastnfa] = lastnfa;
X    finalst[lastnfa] = lastnfa;
X    lastst[lastnfa] = lastnfa;
X    transchar[lastnfa] = sym;
X    trans1[lastnfa] = NO_TRANSITION;
X    trans2[lastnfa] = NO_TRANSITION;
X    accptnum[lastnfa] = NIL;
X    assoc_rule[lastnfa] = num_rules;
X    state_type[lastnfa] = current_state_type;
X
X    /* fix up equivalence classes base on this transition.  Note that any
X     * character which has its own transition gets its own equivalence class.
X     * Thus only characters which are only in character classes have a chance
X     * at being in the same equivalence class.  E.g. "a|b" puts 'a' and 'b'
X     * into two different equivalence classes.  "[ab]" puts them in the same
X     * equivalence class (barring other differences elsewhere in the input).
X     */
X
X    if ( sym < 0 )
X	{
X	/* we don't have to update the equivalence classes since that was
X	 * already done when the ccl was created for the first time
X	 */
X	}
X
X    else if ( sym == SYM_EPSILON )
X	++numeps;
X
X    else
X	{
X	if ( useecs )
X	    /* map NUL's to csize */
X	    mkechar( sym ? sym : csize, nextecm, ecgroup );
X	}
X
X    return ( lastnfa );
X    }
X
X
X/* mkxtion - make a transition from one state to another
X *
X * synopsis
X *
X *   mkxtion( statefrom, stateto );
X *
X *     statefrom - the state from which the transition is to be made
X *     stateto   - the state to which the transition is to be made
X */
X
Xvoid mkxtion( statefrom, stateto )
Xint statefrom, stateto;
X
X    {
X    if ( trans1[statefrom] == NO_TRANSITION )
X	trans1[statefrom] = stateto;
X
X    else if ( (transchar[statefrom] != SYM_EPSILON) ||
X	      (trans2[statefrom] != NO_TRANSITION) )
X	flexfatal( "found too many transitions in mkxtion()" );
X
X    else
X	{ /* second out-transition for an epsilon state */
X	++eps2;
X	trans2[statefrom] = stateto;
X	}
X    }
X
X/* new_rule - initialize for a new rule
X *
X * synopsis
X *
X *   new_rule();
X *
X * the global num_rules is incremented and the any corresponding dynamic
X * arrays (such as rule_type[]) are grown as needed.
X */
X
Xvoid new_rule()
X
X    {
X    if ( ++num_rules >= current_max_rules )
X	{
X	++num_reallocs;
X	current_max_rules += MAX_RULES_INCREMENT;
X	rule_type = reallocate_integer_array( rule_type, current_max_rules );
X	rule_linenum =
X	    reallocate_integer_array( rule_linenum, current_max_rules );
X	}
X
X    if ( num_rules > MAX_RULE )
X	lerrif( "too many rules (> %d)!", MAX_RULE );
X
X    rule_linenum[num_rules] = linenum;
X    }
END_OF_FILE
  if test 17603 -ne `wc -c <'nfa.c'`; then
    echo shar: \"'nfa.c'\" unpacked with wrong size!
  fi
  # end of 'nfa.c'
fi
echo shar: End of archive 8 \(of 10\).
cp /dev/null ark8isdone
MISSING=""
for I in 1 2 3 4 5 6 7 8 9 10 ; do
    if test ! -f ark${I}isdone ; then
	MISSING="${MISSING} ${I}"
    fi
done
if test "${MISSING}" = "" ; then
    echo You have unpacked all 10 archives.
    rm -f ark[1-9]isdone ark[1-9][0-9]isdone
else
    echo You still must unpack the following archives:
    echo "        " ${MISSING}
fi
exit 0
exit 0 # Just in case...
-- 
Please send comp.sources.unix-related mail to rsalz at uunet.uu.net.
Use a domain-based address or give alternate paths, or you may lose out.



More information about the Comp.sources.unix mailing list