Perl version of from (Was: Re: from.sed (v1.2))

Johan Vromans jv at mh.nl
Fri Dec 22 16:11:43 AEST 1989


In article <1989Dec20.222732.5633 at trigraph.uucp> john at trigraph.uucp (John Chew) writes:
   Here's a new version of from.sed, my sed script that does the job
   of from(1) better and faster.  It now truncates long subjects,
   correctly handles messages without subjects and From lines with %
   or @foo: routing.

   Yes, I tried writing this in Perl.  I'm not an expert Perl programmer,
   but I couldn't get it to run faster than about 70% slower than sed.

I've been using a perl version of 'from' for a long time, so I trow it
in. Features:
  - shortens the date, so there's more room for subject
  - shortens long subjects
  - uses "From: " headers if possible
  - provide "<none>" subject
  - automatic determination of system mailbox
  - maybe more
  - output sample:

  Nov 29 00:14 "jv           " Re: your mail through the list got here
  Nov 28 21:21 "David Dyck   " your mail through the list got here
  Nov 29 08:28 "Mark H. Colbu" Re: output compatibility

It runs about as fast as the sed version. Typical times for a large
mailbox (46585 lines) real/user/sys 50/16/8 for sed, 50/22/7 for perl.

------ begin of from -- ascii -- complete ------
#!/usr/bin/perl

# This program requires perl version 3.0, patchlevel 4 or higher

# Usage "from MAILBOX..."

# Don't forget: perl is a Practical Extract and Report Language!

format =
@<<<<<<<<<<< "@<<<<<<<<<<<<" ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<..
$date,        $from,         $subj
.

if ( $#ARGV < 0 ) {
  if ( ! ($user = getlogin)) {
    @a = getpwuid($<);
    $user = $a[0];
  }
  if ( -r "/usr/mail/$user" ) {
    @ARGV = ("/usr/mail/$user");
  }
  elsif ( -r "/usr/spool/mail" ) {
    @ARGV = ("/usr/spool/mail/$user");
  }
  else {
    printf STDERR "No mail for $user.\n";
    exit 1;
  }
}
  

# read through input file(s)
while ( $line = <> ) {
  chop ($line);

  # scan until "From_" header found
  next unless $line =~ /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;
  $from = $1;  
  $date = $2;
  if ( $date eq "" || $from eq "" ) {
    print STDERR "Possible garbage: $line\n";
    next;
  }

  # get user name from uucp path
  $from = $1 if $from =~ /.*!(.+)/;

  # now, scan for Subject or empty line
  $subj = "";
  while ( $line = <> ) {
    chop ($line);

    if ( $line =~ /^$/ || $line =~ /^From / ) {
      # force fall-though
      $subj = "<none>" unless $subj;
    }
    else {
      $subj = $1 if $line =~ /^Subject\s*:\s*(.*)/i;
      if ( $line =~ /^From\s*:\s*/ ) {
        $line = $';
        if ( $line =~ /\((.+)\)/i ) { $from = $1; } 
        elsif ( $line =~ /^\s*(.+)\s*<.+>/i ) { $from = $1; } 
        elsif ( $line =~ /^<.+>\s*(.+)/i ) { $from = $1; } 
      }
    }

    # do we have enough info?
    if ( $from && $subj ) {
      write;
      last;
    }
  }
}
------ end of from -- ascii -- complete ------

Have fun,
Johan
--
Johan Vromans				       jv at mh.nl via internet backbones
Multihouse Automatisering bv		       uucp: ..!{uunet,hp4nl}!mh.nl!jv
Doesburgweg 7, 2803 PL Gouda, The Netherlands  phone/fax: +31 1820 62944/62500
------------------------ "Arms are made for hugging" -------------------------



More information about the Alt.sources mailing list