yacc sorrows

Paul Stath prs at tcsc3b2.tcsc.com
Sat Feb 10 04:15:57 AEST 1990


evil at arcturus.UUCP (Wade Guthrie) writes:

[... Introduction text deleted ...]

>My problem is this: I am trying to get access to the strings that
>got matched by lex to make the tokens which are passed to yacc.
>Given this, I can do the job (I think).  This is on a sun 3/60 under 
>the 3.4 version of the operating system.  After RTFMing (and
>gratuitous consultation of my local guru), I got to the part that 
>says "the programmer includes in the declaration section [of the 
>yacc grammar] %union { body }  This declares the yacc value stack 
>[...] the value is referenced through a $$ or $n construction, yacc 
>automatically inserts the appropriate union name", or some such.  
>I tried this approach (and another that I will get to soon).

The string that gets matched in LEX is stored in a character array called
`yytext'.

>At this point, I would like to give an example of what I think the
>pertinent pieces of code are.  The lex source looks something like:

>	%{
>	#include "y.tab.h"
>	[...]
>	%}
>	[...]
>	%%
>	auto        { return(AUTO); }
>	register    { return(REGISTER); }
>	[...]
>	"->"        { return(ARROW); }
>	";"         { return(SEMICOLON); }
>	.           { return(yytext[0]); }

>And the yacc grammar that looks like . . .

>	%{
>	#include <stdio.h>
>	[...]
>	%}
>	%union VALTYPE {
>	    int type;
>	    char *string;
>	};
>	%token AUTO REGISTER STATIC EXTERN TYPEDEF ENUM
>	[...]
>	%token COMMA SEMICOLON
>	%left   COMMA
>	[...]
>	%left   ARROW '.'
>	%%  
>	translation_unit
>	    : external_declaration
>	    | translation_unit external_declaration
>	    ;
>	function_definition
>	    : decln_spec declarator decln_list compound_statement
>		{ printf("Found function %s\n",$2);}
>	    | decln_spec declarator compound_statement
>		{ printf("Found function %s\n",$2);}
>	    | declarator decln_list compound_statement
>		{ printf("Found function %s\n",$1);}
>	    | declarator compound_statement
>		{ printf("Found function %s\n",$1);}
>	    ;
>	[. . .]

>For those that care, my y.tab.h looks something like:

>	typedef union  VALTYPE {
>	    int type;
>	    char *string;
>	} YYSTYPE;

>	# define AUTO 257
>	# define REGISTER 258
>	[...]
>	# define COMMA 318
>	# define SEMICOLON 319

>Which should be okay, since I compile my grammar with:

>	yacc -vd grammar.y

>Assuming that my interpretation of the manual is correct, this
>(may I call your attention to the function_definition rule of the
>yacc grammar) should give me the proper info.  Instead, the $n 
>values turn out to be NULL pointers.

Your interpretation of the manual is ALMOST correct, but not quite.  YACC allows
the $n construct as a shorthand to access the stack of token values that are
being shifted during the parse sequence.  By doing the:

%union {
	/* union declaration */
}

declaration in YACC, you redefine the stack type to be something other
than int.  Since YACC and LEX are -mostly- independent programs, you have
to do a little bit more work when you change the default stack.

In your LEX actions which find strings, you need to allocate space for the
string to be stored, point the matching member of `yylval' at that allocated
space, and then copy the string into that space.  The copy matters because
LEX reuses `yytext' for the next token it matches.

Here is an example from the LEX code for that parser:

%{
#include <stdlib.h>	/* malloc */
#include <string.h>	/* strlen, strcpy */
#include "y.tab.h"

char	*str_ptr;
%}
%%
[....]
${alpha}{alphanum}*	{
				yylval.str=malloc(strlen(yytext)+1);
				strcpy(yylval.str, yytext);
				return (Identifier);
			}
[....]
.					return (yytext[0]);
%%
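
The allocate-and-copy step in the Identifier action above can be pulled out
into a small C helper.  This is only a sketch: `save_text' is a name made up
for illustration, not part of LEX or YACC, and the NULL check on malloc is an
addition that the action above does not have.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * save_text: hypothetical helper, not part of lex or yacc.
 * Returns a freshly malloc'ed copy of `text', or NULL if malloc fails.
 * The caller owns the copy and should free() it when done with it.
 */
char *save_text(const char *text)
{
    char *copy;

    assert(text != NULL);
    copy = malloc(strlen(text) + 1);    /* +1 for the terminating '\0' */
    if (copy != NULL)
        strcpy(copy, text);
    return copy;
}
```

In a LEX action this would be used as `yylval.str = save_text(yytext);'
followed by the usual `return (Identifier);'.  Because the copy lives in its
own storage, it stays intact after LEX moves on to the next token.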


Here is the relevant YACC code for that parser:
%{

extern	char	yytext[];
%}

%union {
	char	*str;
}

[......]
%token <str> Identifier
[......]
%%
[......]
file_declaration:
		FILE Identifier sysname ';'
		{
			token_in(token_init(Identifier, $2));
			file_rec_insert (&token_anchor);
		}
	;
[......]
%%
[......]
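
To see why the $n values in the original grammar came out as NULL pointers,
here is a toy C model of the value stack.  It is NOT the code YACC generates:
`toy_shift', `toy_dollar_str' and the `num' member are invented for
illustration, and real YACC counts $n relative to the rule being reduced
rather than from the bottom of the stack.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the %union above; `num' is an extra member added for illustration. */
typedef union {
    char *str;
    int   num;
} YYSTYPE;

#define STACK_MAX 16

/* Written by the lexer.  As a global it starts out with str == NULL. */
YYSTYPE yylval;

/* Toy value stack; the real parser keeps its own. */
static YYSTYPE value_stack[STACK_MAX];
static int     stack_top = 0;

/* What a shift does to the value stack: copy yylval as it is right now. */
void toy_shift(void)
{
    assert(stack_top < STACK_MAX);
    value_stack[stack_top++] = yylval;
}

/*
 * Roughly what $n means for a symbol tagged <str>: pick the n'th slot
 * (counting from 1 in this toy) and read its `str' member.
 */
char *toy_dollar_str(int n)
{
    assert(n >= 1 && n <= stack_top);
    return value_stack[n - 1].str;
}
```

If a lexer action only does `return(TOKEN);' and never assigns to yylval,
the slot that gets shifted holds whatever yylval already contained, which
in this model is a NULL `str'.  Once the action stores a pointer in
yylval.str before returning, the same slot carries the string.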

>Anyone got any ideas?  Can normal yacc and lex do this sort of
>thing?  How?  In lieu of this, can you name a good single malt
>whiskey in which to drown my programming sorrows?

LEX and YACC are powerful tools which IMHO are poorly documented.  Just
RTFM'ing will NOT help.  You have to read both between the lines, and
sometimes THROUGH the page to find out what you want.  Making LEX and
YACC do the work is MUCH easier than writing a parser yourself,
but the learning curve involved is VERY steep!  I spent almost 3 months up
to my elbows in YACC, LEX and C code to produce a parser for a database
report language.  (Not a terribly complex grammar, but hard enough.)  Most of
my knowledge of LEX and YACC came from poking them until they broke.  I STILL
refer back to this code whenever I need to do something tricky, because I
probably had to do it to write the report language parser.

I would like to see this thing posted to the net if it is not something
you are doing on company time.  I would be happy to help if you have any
other YACC or LEX questions.  Just E-Mail.

Just another application hacker drowning in a world of suits and stuffed shirts!
-- 
===============================================================================
Paul R. Stath       The Computer Solution Co., Inc.       Voice: 804-794-3491
------------------------------------------------+------------------------------
INTERNET:	prs at tcsc3b2.tcsc.com		| "There was no diety involved,


