Value of microeffiency

T. William Wells bill at proxftl.UUCP
Tue Jul 19 18:13:47 AEST 1988


In article <8238 at brl-smoke.ARPA>, gwyn at brl-smoke.ARPA (Doug Gwyn ) writes:
) In article <449 at proxftl.UUCP> bill at proxftl.UUCP (T. William Wells) writes:
) >     Once upon a time I got very tired of the slowness of the
) >     UNIX sort program. So I wrote a clone of it. I invented a
) >     whole new algorithm for it which you could call a
) >     linked-list merge sort.  It beat every other sort method
) >     that I tested hands-down.  Its only drawback is the
) >     memory for the linked list.  (Actually, I'd bet that
) >     someone invented this sort before me, but at the time,
) >     I'd never heard of anything like it.)
)
) You don't actually need linked lists for this method; arrays of
) pointers to the records will do fine.  That is known as "list merge"
) sorting (see Knuth vol. 3), and it is my favorite method.

Actually, my sort method did not use a linked list per-se;
instead, it used an array of pointers into that array, the index
into the array was the index into the items being sorted.

And, while this sort is very similar to the list-merge method,
there are some differences.  You'll see what I mean when I
publish the code.

) I, too, reimplemented the sort utility, although my motivation was
) to sort binary records instead of text lines.  As I iteratively
) improved it, it evolved into purely merge sorting, using external
) "run" files when necessary and sorting the internal buffer via
) list merge (originally this was done via heapsort).  Someday I
) hope to polish this up and contribute it to the sources newsgroup.

This does seem to be the canonical method of doing general
sorts.  Having heard of sorts which do not behave as if they were
this kind of sort, I do wonder whether any netters have any
comments on better ways of sorting large amounts of data?

) >Most people who defend the notion that there are good and bad
) >algorithms try to treat algorithms as if they exist in a vacuum.
) >This is Idealist hokum. Algorithms exist to serve purposes, the
) >quality of a given algorithm is simply how well it serves a
) >given purpose and thus can not be spoken of independently of
) >said purpose.
)
) Well said, but this should be tempered with the realization that
) code often survives its original intended purpose and gets pressed
) into service for unanticipated uses.  Therefore it often pays off
) in the long run to select an algorithm that works well enough to
) not cause problems when this happens.  Some sorting methods, e.g.
) bubble sort, are a real catastrophe when scaled up, so usually it
) is wise to pick a method that scales better.

In general, this is true.  However, it does not change the point
of what I said, it just points out that one had better consider
as part of the purpose while selecting an algorithm the future
uses of the algorithm.  Else, that algorithm will not be suited
to its future use.  As an aside, when I chose the bubble sort,
the items being sorted were entries in an input transaction;
since this was strictly limited by how much the user could type
there was no chance that the bubble sort would become
inappopriate.

) One of my pet peeves is code that assumes that everything will
) always fit into main memory, when it would have been only slightly
) more trouble to code a method that doesn't rely on that assumption.
) Of course, sometimes there is a huge performance difference, but
) not always -- especially when you consider that relying on virtual
) memory amounts to surrendering application control over external
) buffering to the operating system, which doesn't know what the
) best buffering strategy for the particular application is.

This is right in line with one of my pet peeves: the presumption
that, because the power of computers is growing by leaps and
bounds, one can be as wasteful as one cares, and that somehow,
the computers will get powerful enough to run your program in
spite of the waste.  Argh!!!!

By the way, this isn't really C; where ought this kind of
discussion go?  I'd say comp.misc, but I am already embroiled in
an idiot discussion there and do not want to move anything more
over there unless I must.



More information about the Comp.lang.c mailing list