Sort(1) on E-format numerics

Greg "Bucket" Woods woods at hao.UUCP
Sat Sep 29 08:27:11 AEST 1984


> 
>   We have a need to numerically sort files which contain columns of numbers in
> E-format, i.e. something of the form [+-]#.####e[+-]##, where "#" means
> a digit and [+-] means an optional sign. Unfortunately, the -n option to
> sort(1) does not recognize exponents and stops numerical conversion of the
> sort field when it sees the "e". This results in incorrect sorting in some
> cases, like it will put 1.0e-07 before 2.0e-09. 
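
To show why -n gets this wrong, here is a hypothetical stand-in for the old numeric key conversion. It is not the code from sort.c. It accepts an optional minus sign, digits and one decimal point, and stops at anything else, so "1.0e-07" is read as 1.0. strtod() is shown next to it for contrast, since it understands the whole [+-]#.####e[+-]## form.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the "sort -n" key conversion (not sort.c's
 * actual code): optional '-', digits, one '.', then stop. The 'e' and
 * the exponent are silently ignored. */
double prefix_value(const char *s)
{
    double v = 0.0, scale = 1.0;
    int neg = 0;

    if (*s == '-') {
        neg = 1;
        s++;
    }
    while (*s >= '0' && *s <= '9')
        v = v * 10.0 + (*s++ - '0');
    if (*s == '.') {
        s++;
        while (*s >= '0' && *s <= '9') {
            scale /= 10.0;
            v += (*s++ - '0') * scale;
        }
    }
    return neg ? -v : v;
}

/* strtod() reads the sign, mantissa and exponent. */
double full_value(const char *s)
{
    return strtod(s, NULL);
}
```

With the prefix-only reading, 1.0e-07 compares as 1.0 and 2.0e-09 as 2.0, which produces exactly the misordering described above.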

   In reply to my own question, after a bit of trial and error I discovered
a method that seems to work. It does depend on the fact that every line
is identical in format, which is true in all cases we have. Here is an example:

 1.27000E-07 8.91000E+04 6.00495E+09 9.82000E+05 1.66451E+05 4.99966E+09 
 1.43000E-07 5.00000E+04 1.04275E+10 9.76000E+05 2.38238E+06 8.68145E+09 
 8.09000E-07 2.30000E+04 2.35302E+10 8.87000E+05 4.11476E+08 2.02331E+10 
 1.67000E-07 3.20000E+04 1.57815E+08 9.71000E+05 3.63586E+07 1.31336E+10 
 1.97000E-07 2.55000E+04 1.93346E+10 9.68000E+05 1.92010E+08 1.61099E+10 
 2.30000E-07 2.45000E+04 2.00822E+10 9.64000E+05 2.55430E+08 1.68091E+10 
 1.81000E-07 2.80000E+04 1.78057E+10 9.70000E+05 9.50806E+07 1.48126E+10 
 1.58000E-07 3.70000E+04 1.38137E+10 9.73000E+05 1.38215E+07 1.14989E+10 
 4.70000E-07 2.40000E+04 2.14417E+10 9.33000E+05 2.84392E+08 1.80507E+10 
 6.56000E-07 2.35000E+04 2.25669E+10 9.08000E+05 3.37865E+08 1.91669E+10 
 3.37000E-07 2.42000E+04 2.07391E+10 9.49000E+05 2.70261E+08 1.74114E+10 

   We want to sort on the third column. The command "sort +2.9 -n +2"
run on this file does what we want. It says "go to the third field, skip its
first 9 characters (the mantissa, the "E" and the exponent's sign), sort what
is left (the exponent) numerically, then subsort on the whole third field".
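
A rough C model of what that key specification does to each pair of lines follows. It is a sketch under stated assumptions. It assumes the 9 characters are counted from the first non-blank character of the field, which is what makes the key start at the exponent digits in the data above. It also assumes every third field is at least 9 characters long. The names field_start and cmp_exponent_then_field are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: pointer to the first non-blank character of
 * field n (0-based) in a blank-separated line. */
static const char *field_start(const char *line, int n)
{
    const char *p = line;

    while (*p == ' ' || *p == '\t')
        p++;
    while (n-- > 0) {
        while (*p != '\0' && *p != ' ' && *p != '\t')
            p++;
        while (*p == ' ' || *p == '\t')
            p++;
    }
    return p;
}

/* Models "sort +2.9 -n +2": primary key is the number 9 characters into
 * the third field (the exponent digits, sign already skipped); ties are
 * broken by comparing the text from the third field onward. */
int cmp_exponent_then_field(const char *a, const char *b)
{
    const char *fa = field_start(a, 2), *fb = field_start(b, 2);
    /* Base 10, so "09" is not rejected as a bad octal number. */
    long ea = strtol(fa + 9, NULL, 10);
    long eb = strtol(fb + 9, NULL, 10);

    if (ea != eb)
        return ea < eb ? -1 : 1;
    /* Equal exponents: the mantissas share one format, so text order works. */
    return strcmp(fa, fb);
}
```

Because the sign is skipped, a key of "E-09" reads as 9 and lands after "E+08". That is the limitation described next.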
It took a lot of trial and error to figure this one out! The only problem with
it is that it won't work if some of the exponents are negative, because the
9-character skip throws away the exponent's sign (in all of our cases, the
exponents are all the same sign). I tried using "sort +2.8" instead, so that
the key would start at the sign, but apparently the stupid numeric sort
algorithm knows about minus signs but not plus signs (AAARGH!), and so
sort +2.8 failed totally. I'm going to see about fixing that so a plus sign as
a leading character in a numeric field will be ignored instead of aborting
the field.
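
The proposed fix amounts to treating a leading '+' like no sign at all. A minimal sketch of such a key reader follows. The name signed_key is hypothetical and this is not a patch to sort.c.

```c
#include <ctype.h>

/* Hypothetical key reader: skips blanks, accepts an optional '+' or '-',
 * then reads digits. A '+' no longer ends the number. */
long signed_key(const char *s)
{
    long v = 0;
    int neg = 0;

    while (*s == ' ' || *s == '\t')
        s++;
    if (*s == '-' || *s == '+') {
        neg = (*s == '-');
        s++;
    }
    while (isdigit((unsigned char)*s))
        v = v * 10 + (*s++ - '0');
    return neg ? -v : v;
}
```

A key read this way orders an exponent of -07 ahead of +09. A subsort on the mantissa could then handle mixed-sign exponents, assuming all mantissas are positive and normalized as in the data above.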
  Thanks to all those who responded. Some people gave me kludges using "sed"
and/or "awk". I didn't actually try any of these, but from the looks of it, 
"awk" is aptly named! :-)
   One person even sent me mods to sort.c to make numeric sorts work on 
E-format.
   If anyone is interested in any of those, drop me a line and I'll be glad to
mail you everything I got.
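
The sort.c modifications themselves are not reproduced here. As a hypothetical sketch of the same idea, the comparator below converts the whole third field with strtod(), which accepts a leading sign, a decimal point and an e/E exponent. This makes mixed-sign exponents and negative mantissas one numeric key with no subsort. The names nth_field and cmp_third_field are illustrative only.

```c
#include <stdlib.h>

/* Hypothetical helper: pointer just past field n-1 of a blank-separated
 * line; strtod() skips the leading blanks of field n itself. */
static const char *nth_field(const char *line, int n)
{
    const char *p = line;

    while (n-- > 0) {
        while (*p == ' ' || *p == '\t')
            p++;
        while (*p != '\0' && *p != ' ' && *p != '\t')
            p++;
    }
    return p;
}

/* qsort() comparator over an array of line pointers, keyed on the
 * third field read as a full E-format number. */
int cmp_third_field(const void *pa, const void *pb)
{
    double a = strtod(nth_field(*(const char *const *)pa, 2), NULL);
    double b = strtod(nth_field(*(const char *const *)pb, 2), NULL);

    return (a > b) - (a < b);
}
```

Usage would be qsort(lines, nlines, sizeof lines[0], cmp_third_field) on an array of line pointers.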

--Greg
-- 
{ucbvax!hplabs | allegra!nbires | decvax!stcvax | harpo!seismo | ihnp4!stcvax}
       		        !hao!woods
   
     "Every silver lining has a touch of grey..."