Eliminating Duplicate Mail Headers

Lyndon Nerenberg lyndon at cs.athabascau.ca
Thu May 2 08:44:09 AEST 1991


[ Tried mailing this but oss670.uucp was unknown to us ]

In comp.mail.headers you write:

>I'm not able to fix the mailer myself, but can pass its output
>through standard filters--awk, sed, etc.--before it goes
>out the door.  My first thought was to pass things through 'uniq',
>but this would also delete consecutive identical lines in the body (the
>mailer doesn't distinguish between header and body).  The probability
>of consecutive, identical lines in the body of mail messages seems
>low, but not low enough to chance this.

You almost answered your own question :-)

Use sed to split the headers and body into separate files. Run the header
file through sort|uniq, then append the body file. Note that you will
have to deal with header continuation lines somehow. A short piece of
C code should handle joining each continuation line onto its header
before the sort, and splitting them apart again when you're done.
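
A rough sketch of that helper code, joining continuation lines before
the sort and splitting them again afterwards, might look like the
following. The function names and the JOIN_MARK placeholder byte are
made up for illustration, and the sketch assumes that byte never occurs
in a real header and that lines end in a bare newline.

```c
#include <stdio.h>

/* Hypothetical placeholder that stands in for a hidden line break.
   Assumed never to appear in a genuine header. */
#define JOIN_MARK '\001'

/* Copy the header block from in to out, turning "newline followed by
   space or tab" into JOIN_MARK so each header occupies one line.
   Returns after copying the empty line that ends the headers, leaving
   in positioned at the start of the body. */
void join_continuations(FILE *in, FILE *out)
{
    int c;
    int held = 0;   /* a newline has been read but not yet written */
    int bol = 1;    /* the next character begins a line */

    while ((c = getc(in)) != EOF) {
        if (bol) {
            if (held) {
                /* Decide what the held newline becomes. */
                putc((c == ' ' || c == '\t') ? JOIN_MARK : '\n', out);
                held = 0;
            }
            if (c == '\n') {        /* empty line: headers are over */
                putc('\n', out);
                return;
            }
            bol = 0;
        }
        if (c == '\n') {
            held = 1;
            bol = 1;
        } else {
            putc(c, out);
        }
    }
    if (held)
        putc('\n', out);
}

/* Undo join_continuations: every JOIN_MARK becomes a newline again. */
void split_continuations(FILE *in, FILE *out)
{
    int c;

    while ((c = getc(in)) != EOF)
        putc(c == JOIN_MARK ? '\n' : c, out);
}
```

Each function would be wrapped in a tiny main() that passes stdin and
stdout, one program placed before sort|uniq and the other after it.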

Perhaps the easiest way to deal with this would be to write the entire
filter in C. All you need to do is maintain a linked list of headers
you have seen. During the scanning phase, if you encounter a header that's
already on the linked list, ignore it (and any possible continuation
lines). If it's a new header, start up a second linked list of lines
containing the header contents. If there are continuation lines in the
header, simply append them to the linked list for that header. This
eliminates the need to fold/spindle/mutilate the header continuation
lines.

Once you've fallen out of the headers, just copy the message body
through and you're done!
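
For what it's worth, here is a minimal sketch of such a filter. It
makes several assumptions that are not part of the description above:
a header counts as a duplicate when its field name (the text before the
colon) has been seen before, compared without regard to case; no line
is longer than MAXLINE - 1 bytes; and lines end in a bare newline. It
also simplifies the scheme by writing each kept header straight to the
output instead of storing its lines on a second linked list. The names
dedup_headers, seen_before and same_name are invented for the example.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXLINE 1024    /* assumed limit; longer lines are not handled */

struct seen {           /* one node per header name already emitted */
    char *name;
    struct seen *next;
};

/* Case-insensitive string equality, written out to stay within ISO C. */
static int same_name(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Return 1 if name is already on the list; otherwise add it, return 0. */
static int seen_before(struct seen **list, const char *name)
{
    struct seen *p;

    for (p = *list; p != NULL; p = p->next)
        if (same_name(p->name, name))
            return 1;
    p = malloc(sizeof *p);
    if (p == NULL || (p->name = malloc(strlen(name) + 1)) == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    strcpy(p->name, name);
    p->next = *list;
    *list = p;
    return 0;
}

/* Copy a message from in to out, dropping repeated headers along with
   their continuation lines. The body is copied unchanged. */
void dedup_headers(FILE *in, FILE *out)
{
    char line[MAXLINE], name[MAXLINE];
    struct seen *list = NULL, *p;
    int keep = 1;       /* is the current header being copied? */
    int c;

    while (fgets(line, sizeof line, in) != NULL) {
        if (line[0] == '\n') {          /* empty line ends the headers */
            fputs(line, out);
            break;
        }
        if (line[0] != ' ' && line[0] != '\t') {    /* a new header */
            size_t n = strcspn(line, ":\n");
            memcpy(name, line, n);
            name[n] = '\0';
            keep = !seen_before(&list, name);
        }
        if (keep)                       /* continuations follow keep */
            fputs(line, out);
    }
    while ((c = getc(in)) != EOF)       /* copy the body through */
        putc(c, out);

    while (list != NULL) {              /* release the list */
        p = list;
        list = list->next;
        free(p->name);
        free(p);
    }
}
```

A main() that calls dedup_headers(stdin, stdout) turns this into a
filter. Matching on the field name alone would also discard legitimate
repeats such as multiple Received: lines, so if those matter, compare
the whole header text instead.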


