Finding words in paragraphs (was: Help a novice: Will "sed" do?)

John Lacey lacey at batcomputer.tn.cornell.edu
Tue Jul 18 02:41:30 AEST 1989


As regards the question of finding paragraphs in text which 
contain a particular word, I sent the following reply directly
to the asker of the question.  But then I saw the reply that no Unix
utility could handle this, and I have to disagree.  Awk will handle
this case with no problem.  Certainly the Awk solution is much nicer
than the previous proposal.

----------------

Awk is what you want in this case.  Try something like this:

	awk 'BEGIN { RS = ""; FS = "\n" } /the-word-here/' the-filename-here

An Awk program is a series of pattern-action pairs.  Whenever a record
matching the pattern is read, the associated action is taken.  BEGIN is a
special pattern that matches exactly once, before any input is read.  END
is the related pattern that matches after the input has reached EOF.
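
As a small illustration of the pattern-action idea, here is a sketch that
counts matching lines.  The sample text, the temporary file, and the
variable names are made up for the example; only BEGIN, END, and the
/regex/ pattern come from Awk itself.

```shell
# Hypothetical sample input: three lines, two of which mention "awk".
sample=$(mktemp)
printf 'awk is handy\nsed is terse\nI like awk\n' > "$sample"

# BEGIN runs once before any input is read, END once after the last record.
# The middle rule runs for every line that matches /awk/.
summary=$(awk 'BEGIN { count = 0 }
     /awk/ { count++ }
     END   { print count " matching lines" }' "$sample")

echo "$summary"
rm -f "$sample"
```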

FS is the field separator, RS is the record separator.  Setting RS to the
empty string makes each paragraph (separated by blank lines) a different
record, and setting FS to a newline makes each line of the paragraph a
field.  Then, we search for the word in question.  Patterns in Awk are
egrep-type regular expressions, bounded by /'s.  I left off the action,
to save space.  A missing action is taken to be print-the-record.
You can do this explicitly with a print command.
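
Here is a sketch of the same search with the print action written out.
It relies on Awk's paragraph mode, where an empty RS makes blank lines
separate records.  The sample paragraphs and the word "apples" are
invented for the example, and the extra print "" is just one way to keep
a blank line between the paragraphs in the output.

```shell
# Hypothetical sample input: three paragraphs separated by blank lines.
para_file=$(mktemp)
printf 'first paragraph\nmentions apples\n\nsecond paragraph\nmentions pears\n\nthird one\nhas apples too\n' > "$para_file"

# RS = "" selects paragraph mode; FS = "\n" makes each line a field.
# print $0 writes the whole paragraph; print "" adds a blank line after it.
matches=$(awk 'BEGIN { RS = ""; FS = "\n" }
     /apples/ { print $0; print "" }' "$para_file")

echo "$matches"
rm -f "$para_file"
```

With records set up this way, NF is the number of lines in the paragraph
and $1 is its first line, should you want to print less than the whole
thing.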

Awk is a lovely language.  I write a lot of one liners like this, and
I also use it to write reasonably large applications (including a small
relational database).

If you don't have awk documentation around, there is a book by Aho,
Kernighan, and Weinberger (the A, K, and W of AWK) called, appropriately,
The AWK Programming Language, that explains the whole thing.

Good luck, and cheers,


-- 
John Lacey           |  Internet:  lacey at tcgould.tn.cornell.edu
running unattached   |  BITnet:    lacey at crnlthry
                     |  UUCP:      cornell!batcomputer!lacey
"Whereof one cannot speak, thereof one must remain silent."  ---Wittgenstein


