strstr sources: a summary of responses
John Baldwin
johnb at srchtec.UUCP
Thu Aug 30 02:18:08 AEST 1990
In article <11246 at alice.UUCP> andrew at alice.UUCP (Andrew Hume) writes:
>
>people interested in using REAL algorithms for strstr should check
>out the fast substring algorithm in the august CACM
Great article. But I digress before I even start. :-)
As a result of the comments being tossed around, I went home last night and
checked the source code for strstr() in the Microsoft C ver 5.10 standard
library. Not that I'm claiming that "the Microsoft way is the Right Way"
(in fact, I think I'd have to argue *against* it! :) :) :)), but I figured
its production code in a heavily-used production compiler, and they would
want their library to perform well in benchmarks.
They used the brute-force algorithm.
Just to double-check, I looked at several other library functions at random.
All seemed to be very good choices as far as algorithms go, so it seems like
they *didn't* just use whatever was familiar, but instead put a good bit
of thought into what they were doing. To be sure, strstr() was written in
assembler, and they used a machine instruction to scan the "search-in" string
for the next character of the "search-for" string.
Probably, for the most-general case, letting the hardware do the looking,
and avoiding the setup costs of the arrays necessary for Boyer-Moore or
Knuth-et-al, ends up being faster.
In fact, I can think of another good illustration of this effect: a colleague
was trying to figure out what sorting and storage methods to use for certain
information in his part of a very large program. We talked about whether
the information should be maintained in sorted order, or sorted just before
use, sorted-on-entry, et cetera ad infinitum. Then it occurred to me to ask
about the scale of the problem. No more than a dozen items would ever be
"active" at any one time. The fastest mechanism turned out to be the most
simple: store things FIFO (unordered) and do a linear search!
--
John T. Baldwin | johnb%srchtec.uucp at mathcs.emory.edu
Search Technology, Inc. |
| "... I had an infinite loop,
My opinions; not my employers'. | but it was only for a little while..."
More information about the Comp.lang.c
mailing list