Stupid awk question

Mitchell Wyle wyle at inf.ethz.ch
Thu Oct 12 03:00:11 AEST 1989


In article <DMAUSTIN.89Oct10145918 at vivid.sun.com> 
dmaustin at vivid.sun.com (Darren Austin) writes:

>I am trying to split up a large output file into several
>smaller files.  The smaller files are named after the value in
>the second field.  I tried using the following simple awk script,
>
>(current == $2) {print > current".summary"}
>(current != $2) {close(current".summary");
> current=$2;print > current".summary";}
>
>but it fails with 
>
>awk: too many output files 10


Even though everyone will soon have new awk and all these old awk problems
will go away, I think this question deserves to be in the "Frequently
asked questions and answers" periodic postings.  Who moderates it?  How
should one post to it?
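
For anyone who already has a new awk (nawk, gawk, or any POSIX awk), here is a minimal sketch of how the script in the question might look once close() is available. The sample data and the name big.out are made up for illustration. Two details are assumptions, not part of the original script: the file-name expression is parenthesized, because an unparenthesized concatenation after > is parsed differently by different awks, and output is appended with >> so that a key which reappears later in unsorted input does not truncate its own file.

```shell
# Hypothetical sample input in a scratch directory; field 2 names the output file.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' 'r1 alpha' 'r2 alpha' 'r3 beta' 'r4 alpha' > big.out

awk '
($2 "") != current {                  # appending "" forces a string comparison
        if (current != "") close(current ".summary")   # free the descriptor
        current = $2 ""
}
{ print >> (current ".summary") }     # only one output file is open at a time
' big.out
```

Because of the >>, any *.summary files left over from an earlier run should be removed first.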

* * *

To answer the question, I shall quote verbatim an old article.

>>I am trying to use AWK to split one file into many formatted, smaller files.
>>The problem I am having is that I cannot output to more than 10 files...
>  
> Well, it won't help you right now, but the long-term fix is to complain
> to your software supplier and ask them to get rid of the silly limit.
> It's not that hard.

The limits are based on the number of file descriptors that can be open
at one time (usually small).  One way that I often get around this is
by writing something like this, which splits up the input on the
field $1.

sort +0 |
awk '
{
        if (last != $1) {
                if (NR > 1) print "!XYZZY";   # nothing to terminate before the first record
                print "cat > " $1 "<<!XYZZY";
                last = $1;
        }
        print;
}
END { if (NR > 0) print "!XYZZY"; }' | /bin/sh

        Tony O'Hagan                    tonyo at qitfit.qitcs.oz

* * *

I use Tony's solution all the time.  I have seen it used by at least
two other people (David Goodenough and Amos Shapiro) in shell scripts
posted to the net.

It is very important to put that trailing End_of_Here_Document string
in the END clause of your awk program!  Without it, the last here
document is never terminated.  Depending on the complexity of your
parse, you might need other cleanup code there as well.
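
As an illustration of the END-clause point, here is a hedged variant of the here-document trick on made-up input (input.txt and its contents are hypothetical). It differs from the quoted script in ways that are assumptions, not part of Tony's article: sort -k1,1 replaces the older sort +0 syntax, the delimiter is written as \!XYZZY so the shell does not expand $ or backquotes inside the data, and the key is compared as a string. It still assumes that every key is safe to use as a file name and that no data line consists of exactly !XYZZY.

```shell
# Hypothetical sample input in a scratch directory; field 1 names the output file.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' 'beta costs $5' 'alpha one' 'alpha two' > input.txt

sort -k1,1 input.txt |
awk '
{
        key = $1 ""                             # force a string comparison
        if (key != last) {
                if (NR > 1) print "!XYZZY"      # end the previous here-document
                print "cat > " key " <<\\!XYZZY" # quoted delimiter: data stays literal
                last = key
        }
        print
}
END { if (NR > 0) print "!XYZZY" }              # end the final here-document
' | /bin/sh
```

Only the generated shell script holds files open, one cat at a time, so the awk limit on output files never comes into play.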

Happy hacking, 

-Mitchell F. Wyle
Institut fuer Informationssysteme         wyle at inf.ethz.ch 
ETH Zentrum / 8092 Zurich, Switzerland    +41 1 256 5237
--
If this appears in _IN_MODERATION_ or ClariNet, please let me know.
I am forbidden to tell you that you can reach me at:
...!uunet!mcvax!ethz!wyle   or    wyle at rascal.ics.utexas.edu


