Value of microefficiency

Doug Gwyn gwyn at brl-smoke.ARPA
Mon Jul 11 23:45:37 AEST 1988


In article <449 at proxftl.UUCP> bill at proxftl.UUCP (T. William Wells) writes:
>	Once upon a time I got very tired of the slowness of the
>	UNIX sort program. So I wrote a clone of it. I invented a
>	whole new algorithm for it which you could call a
>	linked-list merge sort.  It beat every other sort method
>	that I tested hands-down.  Its only drawback is the
>	memory for the linked list.  (Actually, I'd bet that
>	someone invented this sort before me, but at the time,
>	I'd never heard of anything like it.)

You don't actually need linked lists for this method; arrays of
pointers to the records will do fine.  That is known as "list merge"
sorting (see Knuth vol. 3), and it is my favorite method.

I, too, reimplemented the sort utility, although my motivation was
to sort binary records instead of text lines.  As I iteratively
improved it, it evolved into purely merge sorting, using external
"run" files when necessary and sorting the internal buffer via
list merge (originally this was done via heapsort).  Someday I
hope to polish this up and contribute it to the sources newsgroup.

>Most people who defend the notion that there are good and bad
>algorithms try to treat algorithms as if they exist in a vacuum.
>This is Idealist hokum. Algorithms exist to serve purposes, the
>quality of a given algorithm is simply how well it serves a
>given purpose and thus can not be spoken of independently of
>said purpose.

Well said, but this should be tempered with the realization that
code often survives its original intended purpose and gets pressed
into service for unanticipated uses.  Therefore it often pays off
in the long run to select an algorithm that works well enough to
not cause problems when this happens.  Some sorting methods, e.g.
bubble sort, are a real catastrophe when scaled up, so usually it
is wise to pick a method that scales better.

One of my pet peeves is code that assumes that everything will
always fit into main memory, when it would have been only slightly
more trouble to code a method that doesn't rely on that assumption.
Of course, sometimes there is a huge performance difference, but
not always -- especially when you consider that relying on virtual
memory amounts to surrendering control over external buffering to
the operating system, which doesn't know the best buffering
strategy for the particular application.



More information about the Comp.lang.c mailing list