You, too, can look at strings.

Cameron Laird cl at lgc.com
Thu Feb 21 02:02:04 AEST 1991


I asked for help extracting string constants from source code.
I summarize the responses I received:
1.  my own was to write (approximately)
		echo 's/"[^"]*$/"/
		s/[^"]*"/"/' >/tmp/string_script
		grep '".*"' | tee /tmp/string_list | \
			sed -f /tmp/string_script | ...
		rm /tmp/string_script
    as part of a filter.  The filter does these things:
    a.  puts a grep-listing (not egrep, not fgrep, but grep)
        of all lines with at least two "-s into /tmp/string_list,
        for my later convenience in examining the contexts where
        the strings occur; and
    b.  copies what's left of those lines after throwing away
        everything before the first " and after the last " to
        stdout.
    This was something I knew how to write in a few minutes,
    and it works well enough, although it knows nothing about
    the syntax of C beyond looking for a pair of "-s.
2.  various folks suggested combinations of
	{m,}xstr--available on uunet:bsd-sources/pgrm/{m,}xstr/*
	     I thought this had possibilities, but didn't
	     work with it much.
	cxref
	     I didn't find any quick way to make this do
	     something useful to me.
	strings--this was definitely not what I had in
	     mind (I'm thinking about source code, and,
	     as far as I'm concerned, strings is for
	     working with object files), but I've invoked
	     strings hundreds of times for other chores,
	     and I'm happy to give it a bit of publicity.
3.  a few folks wrote to say that perl could do it in
    one line; no one delivered such a line, but I didn't
    ask.  Does perl remind anyone else of APL?  That's not
    entirely a bad thing ...
4.  comp.compilers publishes each month a list of sites that
    distribute lexical analyzers and such.  I haven't checked this
    list.  I also received the advice that, "At site
    primost.cs.wisc.edu (128.105.2.115) in directory
    /pub/comp.compilers are files called *grammar.Z
    They contain grammars for lex/yacc for c, c++ ftn
    and pascal.  . . ."
5.  a Swedish HPUX user reported that he relies on findstr,
    in the NLS (Natural Language Support) package that is part
    of HPUX.
6.  William A. Hoffman posted the kind of lapidary answer I expected
    from the net:  a couple dozen lines, definitive (in some sense),
    no-nonsense, functional, and a starting-point for yet more
    refinements (or arguments).
	
	... string.lex
	--------------------------------------------------------
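	%{
	#include <stdio.h>
	%}
	string       \"([^"\\\n]|\\(.|\n))*\"
	%%
	{string}	{ printf("%s\n", yytext); return(1); }
	\n		;
	.		;
	%%
	int main()
		{
		/* yylex() returns 1 for each string, 0 at end of input */
		while (yylex())
			;
		return 0;
		}
	
	int yywrap()
		{
		return 1;	/* no further input to scan */
		}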
	------------------------------------------------------------
	to run, just:
	lex string.lex
	cc lex.yy.c -o string
	cat *.c | ./string

    The moderator noted that this deserved to be beefed up "... to
    handle character constants and comments ..."
7.  One reader wrote that he'd send a finite-state machine which
    models C syntax as soon as he found his copy.  I haven't heard
    from him since.  I'll pass it along when it arrives.
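
For anyone who wants to try the filter from item 1 without the
temporary script file, here is a minimal sketch.  The function name
extract_strings and its optional context-file argument are assumptions
made for illustration; the grep pattern and the two sed substitutions
are the ones shown in item 1.  The sample input is invented.

```shell
#!/bin/sh
# extract_strings: hypothetical name, not part of the original filter.
# Reads source text on stdin.  The optional first argument names a file
# that receives the untrimmed matching lines (the role /tmp/string_list
# plays in item 1); it defaults to /dev/null.
extract_strings() {
    context_file=${1:-/dev/null}
    grep '".*"' |
        tee "$context_file" |
        sed -e 's/"[^"]*$/"/' -e 's/[^"]*"/"/'
}

# Invented sample input: the second line holds no string constant,
# the third holds two.
printf '%s\n' \
    'printf("hello, world\n");' \
    'int i = 0;' \
    'fputs("a", f); fputs("b", g);' |
    extract_strings
```

Run as is, this prints

	"hello, world\n"
	"a", f); fputs("b"

The second output line shows the limitation admitted in item 1: when
a line holds two strings, everything between the first and the last "
survives, which is the kind of case the lex answer in item 6 handles.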
My apologies to Henry Spencer for misremembering his name as "Harry".

Thanks, all.
--
Cameron Laird		USA 713-579-4613
cl at lgc.com		USA 713-996-8546 
-- 
Send compilers articles to compilers at iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.


