Eliminating Duplicate Mail Headers

Tom Christiansen tchrist at convex.COM
Thu May 2 09:47:39 AEST 1991


From the keyboard of lyndon at cs.athabascau.ca (Lyndon Nerenberg):
:[ Tried mailing this but oss670.uucp was unknown to us ]

Right, me too.

:In comp.mail.headers you write:
:
:>I'm not able to fix the mailer myself, but can pass its output
:>through standard filters--awk, sed, etc.--before it goes
:>out the door.  My first thought was to pass things through 'uniq',
:>but this would also delete consecutive identical lines in the body (the
:>mailer doesn't distinguish between header and body).  The probability
:>of consecutive, identical lines in the body of mail messages seems
:>low, but not low enough to chance this.
:
:You almost answered your own question :-)
:
:Use sed to split the headers and body into separate files. Run the header
:file through sort|uniq, then append the body file. Note that you will 
:have to deal with header continuation lines somehow. A short piece of
:C code should handle folding the headers, and unfolding them when you're
:done.
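For comparison, the split-and-sort suggestion might look like this as a POSIX shell function. It is only a sketch, assuming the whole message arrives on standard input and that `mktemp` is available; `split_sort_dedup` is a hypothetical name. As the quoted text warns, it does nothing about continuation lines, and sorting also changes the order of the headers.

```shell
# Hypothetical helper: split_sort_dedup reads one message on stdin,
# runs the header (everything before the first blank line) through
# sort | uniq, then appends the body (the blank line onward) untouched.
# Caveat: sorting reorders the headers and separates continuation
# lines from the header they belong to.
split_sort_dedup() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"                                  # buffer the message so it can be read twice
    sed '/^$/,$d' "$tmp" | LC_ALL=C sort | uniq   # header lines only, duplicates removed
    sed -n '/^$/,$p' "$tmp"                       # blank separator line plus the body
    rm -f "$tmp"
}
```

Repeated lines in the body survive because only the header half ever reaches `uniq`.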

That's a lot of work!!


:Perhaps the easiest way to deal with this would be to write the entire
:filter in C. All you need to do is maintain a linked list of headers
:you have seen. During the scanning phase, if you encounter a header that's
:already on the linked list, ignore it (and any possible continuation
:lines). If it's a new header, start up a second linked list of lines
:containing the header contents. If there are continuation lines in the
:header, simply append them to the linked list for that header. This
:eliminates the need to fold/spindle/mutilate the header continuation
:lines.

:Once you've fallen out of the headers, just copy the message body
:through and you're done!
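The "remember what you have seen" scheme can be approximated without C: in the sketch below an awk associative array stands in for the linked list (a swapped-in technique, not Lyndon's code). It assumes a header starts on a line beginning with a non-blank character and continues on lines beginning with a space or tab, and it treats two headers as duplicates only when their full text, continuation lines included, is identical. The function name `dedup_headers` is hypothetical.

```shell
# Hypothetical helper: dedup_headers reads a message on stdin and drops
# any header (with its continuation lines) whose full text has already
# been printed. The body is copied unchanged.
# An awk associative array replaces the linked list of seen headers.
dedup_headers() {
    awk '
    # Print the buffered header unless an identical one was already seen.
    function flush() {
        if (buf != "" && !(buf in seen)) {
            seen[buf] = 1
            printf "%s", buf
        }
        buf = ""
    }
    body     { print; next }                     # past the headers: copy verbatim
    /^$/     { flush(); body = 1; print; next }  # first blank line ends the header
    /^[ \t]/ { buf = buf $0 "\n"; next }         # continuation line: append to buffer
             { flush(); buf = $0 "\n" }          # new header: start a new buffer
    END      { flush() }                         # message with no body at all
    '
}
```

Comparing whole headers rather than single lines matters: if two different `Received:` headers each end with the same continuation line, both survive intact, whereas a filter that remembered every individual line would drop the second continuation line. A duplicate that was folded differently is not recognized.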

That's a HELLUVA lotta work!

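Here's an awk solution. It only compares lines that come before the first blank line, so repeated lines in the body pass through untouched: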

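    #!/bin/awk -f
    # Drop a header line identical to the line just before it;
    # copy everything from the first blank line on unchanged.
    /^$/ { body = 1 }                    # first blank line ends the header
    {
        if (!body) {
            if (lastline == $0) next     # consecutive duplicate: skip it
            lastline = $0
        }
        print
    }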

And here's a perl solution:

    perl -ne 'print if (/^$/ .. eof) || $lastline ne $_; $lastline = $_'


If you want solutions for non-consecutive or especially multi-line
headers, ask, but I can lay odds they'll be in perl. :-)

--tom
