Bandwidth Wasters Hall of Fame - The Code

Kent Paul Dolan xanthian at well.UUCP
Fri Sep 22 21:32:53 AEST 1989


Here's a slightly tongue-in-cheek bandwidth-decreasing tool.  Use it in
good health.  Please forgive my beta-release shar program, which adds a
blank line at the end of each file and then complains about it during
unsharing; it doesn't seem to hurt anything.

well!xanthian
Kent, the man from xanth, now just another echo from The Well.
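
One small awk idiom buried in bwhf2.awk below is worth flagging up
front: it centers its report header by building the printf format
string at run time, with 40 as the midpoint of an 80-column line.
Here is a stand-alone sketch of that trick (mine, not part of the
archive):

```shell
# Center a string on an 80-column line by computing a "%Ns" printf
# format at run time, as bwhf2.awk does for its report header.
centered=$(echo "BANDWIDTH WASTERS HALL OF FAME" | awk \
  '{ pformat = "%" int(40 + (length($0) + 1) / 2) "s\n"; printf(pformat, $0) }')
printf '%s\n' "$centered"
```

For this 30-character title the computed format is "%55s", which
produces the 25 leading spaces you can see on the first line of
bwhf.example_output below.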

#! /bin/sh
# This is a shell archive.  Remove anything before this line, then feed it
# into a shell via "sh file" or similar.  To overwrite existing files,
# type "sh file -c".
# The tool that generated this appeared in the comp.sources.unix newsgroup;
# send mail to comp-sources-unix at uunet.uu.net if you want that tool.
# If this archive is complete, you will see the following message at the end:
#		"End of shell archive."
# Contents:  bwhf.hype bwhf.csh bwhf1.awk bwhf2.awk bwhf.example_output
# Wrapped by kent as a guest on Thu Sep 21 20:45:03 1989
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f 'bwhf.hype' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'bwhf.hype'\"
else
echo shar: Extracting \"'bwhf.hype'\" \(1011 characters\)
sed "s/^X//" >'bwhf.hype' <<'END_OF_FILE'
X		    BANDWIDTH WASTERS HALL OF FAME
X
X	     You've seen the postings, now read the code!
X
XHave a group of blowhards taken over your favorite newsgroup, with
Xpostings of negligible content and awesome volume?
X
XIs it getting hard to cut through the chaff in your search to find
Xthose grains of meaning?
X
XAre you mad enough to _take measures_?
X
XDo you wish you had a way to get, not just even, but ahead?
X
XWish no more!  Here are the tools you need to publish your _very own_
XBandwidth Wasters Hall of Fame articles, and point the finger of
X_public ridicule_ at the guilty parties.
X
XEnclosed are two awk scripts, and a cshell script to run them.  These
Xare for a BSD 4.3 system (Sun 4.0.3) with a really wimpy
Ximplementation of awk.  You may have to fiddle things a bit to make it
Xgo on your system, but the basics are here.
X
XRead, enjoy, and most of all, use it to _nail the miscreants_!
X
XYours for an improved signal to noise ratio,
X
Xwell!xanthian
XKent, the man from xanth, now just another echo from The Well.
X

END_OF_FILE
echo shar: NEWLINE appended to \"'bwhf.hype'\"
if test 1012 -ne `wc -c <'bwhf.hype'`; then
    echo shar: \"'bwhf.hype'\" unpacked with wrong size!
fi
# end of 'bwhf.hype'
fi
if test -f 'bwhf.csh' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'bwhf.csh'\"
else
echo shar: Extracting \"'bwhf.csh'\" \(1689 characters\)
sed "s/^X//" >'bwhf.csh' <<'END_OF_FILE'
X#!/bin/csh
X#
X# bwhf.csh by Kent Paul Dolan - Public Domain
X#
X# Bandwidth Waster's Hall of Fame master shell script; runs two awk
X# scripts with a sort step between them.  Set the execute bit on this
X# file with chmod and put it in your path.  It expects the two awk
X# scripts to be in the current directory, and needs access to the
X# "awk", "sort", and "date" Unix(tm) commands.  I don't know whether
X# this command set would work under "sh"; I didn't try it.
X#
X# usage:  bwhf.csh <path-to-newsgroup-articles> <output-file-name>
X#
X# example: bwhf.csh /usr/spool/news/alt/sources BWHF.alt.sources
X#
X# The first awk script accumulates the statistics for each author in
X# an array, then dumps the array to a temp file for sorting.  The
X# [0-9] are to exclude subordinate directories from being processed as
X# articles.
X#
Xawk -f bwhf1.awk ${1}/[0-9]* > /tmp/$$.bwhf.1
X#
X# The sort step sorts on the bytes wasted column, numerically because it
X# has leading blanks, and reversed because we want to list the worst
X# bandwidth wasters first.
X#
Xsort -nr < /tmp/$$.bwhf.1 > /tmp/$$.bwhf.2
X#
X# The second awk script prints a header, including the path to the
X# newsgroup and the date, prints a line for each byte-burner, then
X# prints a footer with a totals line and an apology for not including
X# "beyond AI" capabilities in the output.
X#
Xawk -f bwhf2.awk newsgrouppath=$1 date="`date`" /tmp/$$.bwhf.2 > $2
X#
X# Clean up the temp files - why wait for a reboot?
X#
Xrm /tmp/$$.bwhf.[1-2]
X#
X# You might want to put this back in, to preview the output before you
X# send it off to your favorite newsgroup; it was giving me fits when I
X# ran this script in background, so I commented it out.
X#
X#more $2
X

END_OF_FILE
echo shar: NEWLINE appended to \"'bwhf.csh'\"
if test 1690 -ne `wc -c <'bwhf.csh'`; then
    echo shar: \"'bwhf.csh'\" unpacked with wrong size!
fi
chmod +x 'bwhf.csh'
# end of 'bwhf.csh'
fi
if test -f 'bwhf1.awk' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'bwhf1.awk'\"
else
echo shar: Extracting \"'bwhf1.awk'\" \(4966 characters\)
sed "s/^X//" >'bwhf1.awk' <<'END_OF_FILE'
X#
X# bwhf1.awk by Kent Paul Dolan - Public Domain
X#
X# Bandwidth Waster's Hall of Fame first awk script; finds article
X# authors on the "From:" line, credits them with the article and the bytes
X# it contains, accumulates byte and article counts into arrays indexed
X# by author (_love_ those associative array indices), counts total
X# bytes, lists bytes, byte share, articles, author's login, and any
X# other author info from the "From:" line.
X#
X# Fails to merge postings from the same author at different sites,
X# because it is not possible to distinguish the case of different
X# people at different sites with the same login, and the same person
X# and login from different sites, by mechanical means.
X#
X# This script is normally run by csh script bwhf.csh, but anyway, here is:
X#
X# usage: awk -f bwhf1.awk <path-to-newsgroup>/[0-9]* > <outfile-to-sort-step>
X#
X# example: awk -f bwhf1.awk /usr/spool/news/alt/sources/[0-9]* temp1
X#
X# where the [0-9]* takes care of the case of a newsgroup with articles
X# which also has one or more subgroups (whose names won't start with
X# [0-9])
X#
X# Setup a couple of variables for file swapping control and multiple
X# "From:" line detection.
X#
XBEGIN		{
X#
X# use this to detect when we have changed files and need to start a
X# new bytecount for a new file and save the old one to the old
X# author's count.
X#
X		  lastfile = FILENAME
X#
X# Use this to avoid problems with multiple "From:" lines in the same
X# article (not really needed, since awk zeros all variables at
X# creation, but the code is a lot easier to comprehend with this in
X# here): 
X#
X		  sawfrom = 0
X		}
X#
X# Although this is the first record processing code physically,
X# logically it is not executed until the top of the second and
X# subsequent articles of the input, therefore the "From:" code below
X# has been executed once before this code.  This pattern/action pair
X# has to be up here to make sure that the bytecount and sawfrom fields
X# are cleared before any other processing on second and subsequent
X# articles.
X#
X# When the article for the current record has changed:
X#
X# Accumulate the byte count for the previous article for its author
X# (saved as "from" in the "From:" pattern/action set); then clear the
X# bytecount.  Reset the lastfile item to the current file name, and
X# clear sawfrom so that we are again looking for a "From:" line.
X#
Xlastfile != FILENAME	{ bytes[from] = bytes[from] + bytecount
X			  bytecount = 0
X			  lastfile = FILENAME
X			  sawfrom = 0
X			}
X#
X# For every record (line) in the file (article), count its bytes (the
X# + 1 takes care of the '\n', which is ignored by "length($0)") into a
X# total byte count for the file.
X#
X{ bytecount = bytecount + length($0) + 1 }
X# 
X# One line in the article gets special processing: the _first_ "From:"
X# line.  If we haven't set sawfrom to 1 in this article, and this line
X# _starts_ with "From:", then it is the one we want to identify the
X# author of the article.  Pull the login at site out of the second field
X# as element "from" (the author ID), use it as an array index of an
X# associative array "articles" to (possibly create with contents zero
X# and) bump the article count for this author.  Most authors' posting
X# software includes a vanity ID after the login at site information. Use
X# the index and substr commands to pull that off and store it too,
X# indexed by author in associative array "fromtags" .  The authors who
X# use more than one vanity ID from the same site get the usage from
X# the last of their articles.  Set sawfrom to 1 (true) to avoid
X# processing a second "From:" line where an article includes some of
X# the header of another article without a protecting lead character.
X# 
X/^From:/ && sawfrom == 0 { 
X		  from = $2
X		  articles[from]++
X		  ind = index($0,$2) + length($2) + 1
X		  fromtags[from] = substr($0,ind)
X		  sawfrom = 1
X		}
X# 
X# After all the articles have been processed, we need to add the
X# bytecount for the last article to the credit of the last wastrel,
X# because we don't see another line to process through the "lastfile =
X# FILENAME" pattern/action pair above, which does that crediting for
X# all other articles but the last one.
X# 
X# Loop through the associative byte count array by author to get a
X# total byte count for all the articles, to use in determining an
X# author's share of the total bandwidth waste.  Use that information
X# in a second loop which prints per-author summary information to
X# calculate the share percentage field.  For each author, print the
X# bytes wasted, the waste share, the articles exuded, and the author
X# ID and author vanity ID.
X#
X# The resulting file is ready for the sort step.
X#
XEND	{ bytes[from] = bytes[from] + bytecount
X
X	  for (from in articles)
X	  {
X	    bytestotal = bytestotal + bytes[from]
X	  }
X	  for (from in articles)
X	  {
X	    
X            printf("%8s %6.2f%% %4s  %s %s\n", \
X	           bytes[from], \
X		   (bytes[from]*100)/bytestotal, \
X		   articles[from], \
X		   from, \
X		   fromtags[from])
X	  }
X	}

END_OF_FILE
echo shar: NEWLINE appended to \"'bwhf1.awk'\"
if test 4967 -ne `wc -c <'bwhf1.awk'`; then
    echo shar: \"'bwhf1.awk'\" unpacked with wrong size!
fi
# end of 'bwhf1.awk'
fi
if test -f 'bwhf2.awk' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'bwhf2.awk'\"
else
echo shar: Extracting \"'bwhf2.awk'\" \(3187 characters\)
sed "s/^X//" >'bwhf2.awk' <<'END_OF_FILE'
X#
X# bwhf2.awk by Kent Paul Dolan - Public Domain
X#
X# Bandwidth Waster's Hall of Fame second awk script; prints header,
X# prints by-author lines and (re)accumulates byte and article totals,
X# prints a footer showing the totals of bytes, share, and article
X# counts.
X#
X# A "sort" step to sort the by-author lines in reverse bytes-wasted
X# order should be run after the first script and before this one to
X# rank the bandwidth wasters from most to least heinous, although this
X# script is not dependent on the sort order of the input lines.
X# 
X# This awk script is normally run by csh script bwhf.csh, but here is:
X#
X# usage: awk -f bwhf2.awk newsgrouppath=/usr/spool/news/whatever \
X#        date="some-string" <input-from-sort-of-output-of-bwhf1.awk>
X#
X#example: awk -f bwhf2.awk newsgrouppath=/usr/spool/news/alt/sources \
X#	  date="`date`" temp2
X#
X# (the "\" means each of these is all supposed to be on one line)
X#
X# Start the header:
X#
XBEGIN	{ 
X	  printf("%55s\n", "BANDWIDTH WASTERS HALL OF FAME")
X          printf("%48s\n","for articles in")
X	}
X#
X# Finish the header:
X#
X# This has to be done at the first line, because until awk tries to
X# read the first line, it hasn't seen the command line settings for
X# newsgrouppath and date, so putting this in the BEGIN block failed.
X#
XNR == 1	{ pformat = "%" int(40 + ((length(newsgrouppath)  + 1) / 2) ) "s\n"
X	  printf(pformat,newsgrouppath)
X	  pformat = "%" int(40 + ((length(date)  + 1) / 2) ) "s\n"
X	  printf(pformat,date)
X	  print ""
X	  print "   Bytes  Volume  Offending"
X	  print "  Wasted   Share  Articles     Guilty Party"
X	  print ""
X	}
X#
X# Accumulate the total bytes and total articles, and print each
X# wastrel's contribution line:
X# 
X# I was faking the share total to 100 percent, but then I thought a
X# bit more.  Now it is calculated, giving BWHF posters the chance to
X# edit the sort output down to just the worst ten or so offenders, and
X# pass just those records through this second awk script.  My own
X# experience is that people just hate being omitted from the list, but
X# your mileage may vary, so I changed the code to accommodate that.
X# 
X		{ bytestotal = bytestotal + $1
X#
X# We have to strip off the trailing "%" from $2 to make a number:
X#
X		  share = substr($2,1,length($2)-1)
X                  sharetotal = sharetotal + share
X		  articlestotal = articlestotal + $3
X		  print
X		}
X#
X# Print the footer, consisting of a Totals line, an apology that this
X# awk script doesn't do AI name matches for posters who use multiple
X# sites, and a none too subtle piece of author puffery and general
X# purpose mischief making.
X#
XEND	{ 
X	  print "-------- ------- ----"
X	  printf("%8s %6.2f%% %4s  Totals for %d authors\n", \
X	         bytestotal,sharetotal,articlestotal,NR)
X	  print ""
X	  print "(Roundoff fuzz may make total share not equal 100.00%)"
X	  print ""
X	  print "(Sorry, if you posted from more than one site, you got more"
X	  print "than one entry.  It's unavoidable; think about it!  But even"
X	  print "though your subtotals look smaller, we know who you are!)"
X	  print ""
X	  print "[A shar file of the scripts used to create this article was"
X	  print "posted to alt.sources by the author, Kent Paul Dolan.]"
X
X	}

END_OF_FILE
echo shar: NEWLINE appended to \"'bwhf2.awk'\"
if test 3188 -ne `wc -c <'bwhf2.awk'`; then
    echo shar: \"'bwhf2.awk'\" unpacked with wrong size!
fi
# end of 'bwhf2.awk'
fi
if test -f 'bwhf.example_output' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'bwhf.example_output'\"
else
echo shar: Extracting \"'bwhf.example_output'\" \(1159 characters\)
sed "s/^X//" >'bwhf.example_output' <<'END_OF_FILE'
X                         BANDWIDTH WASTERS HALL OF FAME
X                                 for articles in
X                           /usr/spool/news/alt/sources
X                          Thu Sep 21 19:54:16 PDT 1989
X
X   Bytes  Volume  Offending
X  Wasted   Share  Articles     Guilty Party
X
X  730081  44.23%   18  pokey at well.UUCP (Jef Poskanzer)
X  294802  17.86%    6  mark at unix386.Convergent.COM (Mark Nudelman)
X  195626  11.85%    5  lwall at jato.Jpl.Nasa.Gov (Larry Wall)
X  149560   9.06%    1  raivio at procyon.hut.FI (Perttu Raivio)
X
X[30 lines of example output omitted to save bandwidth!]
X
X     496   0.03%    1  larrym at rigel.uucp (24121-E R Inghrim(3786)556)
X     477   0.03%    1  garyc at quasi.tek.com (Gary Combs;685-2072;60-720;;tekecs)
X-------- ------- ----
X 1650582  99.98%   67  Totals for 36 authors
X
X(Roundoff fuzz may make total share not equal 100.00%)
X
X(Sorry, if you posted from more than one site, you got more
Xthan one entry.  It's unavoidable; think about it!  But even
Xthough your subtotals look smaller, we know who you are!)
X
X[A shar file of the scripts used to create this article was
Xposted to alt.sources by the author, Kent Paul Dolan.]

END_OF_FILE
echo shar: NEWLINE appended to \"'bwhf.example_output'\"
if test 1160 -ne `wc -c <'bwhf.example_output'`; then
    echo shar: \"'bwhf.example_output'\" unpacked with wrong size!
fi
# end of 'bwhf.example_output'
fi
echo shar: End of shell archive.
exit 0
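
[Postscript: for readers who would rather see the core trick than
unpack the archive, here is a self-contained sketch (mine, not part of
the shar) of the per-author accumulation bwhf1.awk performs, run
against three one-line fake articles.  The sample data and mktemp are
illustrative only; the real script detects article boundaries by
comparing FILENAME rather than testing FNR == 1, and adds the
vanity-ID bookkeeping.]

```shell
# Sketch of bwhf1.awk's accumulation: credit each article's bytes and
# article count to the author named on its first "From:" line.
dir=$(mktemp -d)
printf 'From: alice@host (Alice)\nbody line\n'         > "$dir/1"
printf 'From: bob@host (Bob)\nlonger body line here\n' > "$dir/2"
printf 'From: alice@host (Alice)\nmore\n'              > "$dir/3"
result=$(awk '
  FNR == 1             { sawfrom = 0 }   # new article: expect a new From: line
  /^From:/ && !sawfrom { from = $2; articles[from]++; sawfrom = 1 }
                       { bytes[from] += length($0) + 1 }  # +1 for the newline
  END { for (f in articles)
          printf "%8d %4d  %s\n", bytes[f], articles[f], f }
' "$dir"/[0-9]* | sort -nr)
printf '%s\n' "$result"
rm -r "$dir"
```

Note that the "From:" rule precedes the byte-count rule, so the
"From:" line's own bytes are credited to the right author; the sort
step then ranks wastrels from most to least bytes, as in bwhf.csh.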


