strings

Henry Spencer henry at utzoo.uucp
Sun May 14 07:12:18 AEST 1989


In article <10245 at socslgw.csl.sony.JUNET> diamond at csl.sony.junet (Norman Diamond) writes:
>>Improvements to C library routines are quite possible.  Like all such,
>>cleverness is sometimes required.  One convention is not intrinsically
>>worse than the other.
>
>How do you improve a C library routine to look in a string descriptor
>to just grab the current length of the string? 

You can't, any more than you can improve the equivalent in some other
languages to get the length of a trailing substring without having to
go back to the beginning and then subtract.  The data structures do
constrain your ability to improve the functions.  That doesn't mean
you can't make improvements.
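
To make the trailing-substring point concrete, here is a small sketch. The `counted_str` type and both function names are hypothetical and exist only for comparison. With a NUL-terminated string, a pointer into the middle is itself a complete string, so strlen gives the tail's length directly. With a counted string, the tail has no count of its own, so its length has to be derived from the whole string's count by subtraction.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-counted string, for comparison only. */
struct counted_str {
    size_t len;        /* number of bytes in data */
    const char *data;  /* not necessarily NUL-terminated */
};

/* NUL-terminated: the tail starting at offset k is itself a string,
   so nothing about the start of the string is needed.
   Assumes k <= strlen(s). */
size_t c_suffix_len(const char *s, size_t k)
{
    return strlen(s + k);
}

/* Counted: the tail has no length field of its own, so we go back to
   the whole string's count and subtract.  Assumes k <= cs.len. */
size_t counted_suffix_len(struct counted_str cs, size_t k)
{
    return cs.len - k;
}
```

For example, `c_suffix_len("hello, world", 7)` and `counted_suffix_len` on `{ 12, "hello, world" }` with offset 7 both give 5 (the length of "world"), but only the second needed the original count.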

(If you're going to tell me that other languages can change the underlying
implementation, note that they *have* to use a length-count implementation
if the language semantics require that '\0' be a valid string character,
unless still worse convolutions are used.)
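
The embedded-'\0' point can be sketched the same way, again with a hypothetical `counted_str` type and a made-up helper `counted_count`. strlen stops at the first '\0', while a length field lets '\0' be an ordinary data byte.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted string: the length field, not a terminator,
   marks the end, so '\0' can appear inside the data. */
struct counted_str {
    size_t len;
    const char *data;
};

/* Count occurrences of byte c, using the length field to find the end.
   An embedded '\0' is treated like any other byte. */
size_t counted_count(struct counted_str cs, char c)
{
    size_t n = 0;
    for (size_t i = 0; i < cs.len; i++)
        if (cs.data[i] == c)
            n++;
    return n;
}
```

With `static const char raw[] = "ab\0cd";`, strlen(raw) is 2, but a counted string built as `{ sizeof raw - 1, raw }` has length 5 (sizeof includes the compiler-supplied trailing '\0', hence the -1), and `counted_count` finds the embedded '\0' once.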

>... On the other hand,
>a correct strlen() function has to scan every byte of (for example)
>my 300K array...

Nonsense; it only has to scan the words in that array that comprise the
actual text of your string... which normally is measured in bytes, not
hundreds of Kbytes.  It doesn't have to do it a byte at a time, by the way,
even on machines with no special string-scan facilities -- you just have
to be clever.
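
One well-known way to be clever is sketched below, assuming a 64-bit word and the common zero-byte bit trick. It is not necessarily what any particular C library does, and `word_strlen` and `has_zero_byte` are hypothetical names. The final aligned word may include bytes past the terminator. An aligned read cannot cross a page boundary, but ISO C does not promise such a read is valid, which is why production libraries typically do this in assembly or under implementation-specific guarantees.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Classic bit trick: the result is nonzero exactly when some byte
   of w is 0x00. */
static int has_zero_byte(uint64_t w)
{
    return ((w - UINT64_C(0x0101010101010101)) & ~w
            & UINT64_C(0x8080808080808080)) != 0;
}

size_t word_strlen(const char *s)
{
    const char *p = s;

    /* Step a byte at a time until p is aligned to a word boundary. */
    while ((uintptr_t)p % sizeof(uint64_t) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Examine a whole word at a time; memcpy avoids aliasing problems
       and compiles to a single load.  Stop at the first word that
       contains a '\0'. */
    for (;;) {
        uint64_t w;
        memcpy(&w, p, sizeof w);
        if (has_zero_byte(w))
            break;
        p += sizeof w;
    }

    /* Find the exact terminator within that last word (this also makes
       the result independent of byte order). */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

For long strings this does roughly one test per eight bytes instead of one per byte, which is the kind of improvement that needs no change to the NUL-terminated representation.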

>Or my 509-byte array, maybe 510-byte array, but several thousand times...

If you are applying strlen to the same string thousands of times, your
code is badly written, period.  I recommend re-reading that gem of a paper,
"News Need Not Be Slow", co-written by yours truly, in the Winter 87
Usenix proceedings, for sage words of advice on avoiding inefficiency. :-)
Nobody ever said that strlen was *always* the right way to get string
lengths.
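
A common form of the mistake is recomputing strlen in a loop condition. The two hypothetical functions below compute the same result. The first rescans the whole string on every iteration, so its cost grows with the square of the length. The second computes the length once and reuses it.

```c
#include <stddef.h>
#include <string.h>

/* Wasteful: strlen(s) is evaluated on every iteration, rescanning the
   whole string each time round the loop (unless the compiler can prove
   the string is unchanged and hoists the call). */
size_t count_char_slow(const char *s, char c)
{
    size_t n = 0;
    for (size_t i = 0; i < strlen(s); i++)
        if (s[i] == c)
            n++;
    return n;
}

/* Better: scan once for the length, then reuse it. */
size_t count_char_fast(const char *s, char c)
{
    size_t n = 0;
    size_t len = strlen(s);
    for (size_t i = 0; i < len; i++)
        if (s[i] == c)
            n++;
    return n;
}
```

Both return 3 for `("banana", 'a')`; only the second is linear in the string's length regardless of what the optimizer manages to prove.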

> It seems intrinsically worse to me.

That depends on what you are doing.  In certain ways it is, given that
a length-count implementation has more information immediately available.
In other ways it isn't, because that semi-redundant information has to
be updated whenever the string is modified, and that has a non-zero cost.
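
Both sides of that trade-off can be sketched with a hypothetical counted-string type, `struct cstr`, and made-up helpers. Asking for the length costs nothing, but every append or truncation must also maintain `len`; by contrast, truncating a plain C string is just storing a '\0'.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-counted string.  len is the semi-redundant
   information that must track every modification. */
struct cstr {
    size_t len;   /* bytes in use, not counting the trailing '\0' */
    size_t cap;   /* bytes allocated for data */
    char *data;   /* also kept NUL-terminated for convenience */
};

/* Constant time: no scan needed. */
size_t cstr_len(const struct cstr *s)
{
    return s->len;
}

/* Append n bytes from src.  Returns 0 on success, -1 if allocation
   fails.  (Overflow checks on the size arithmetic are omitted.) */
int cstr_append(struct cstr *s, const char *src, size_t n)
{
    if (s->len + n + 1 > s->cap) {
        size_t newcap = (s->len + n + 1) * 2;
        char *p = realloc(s->data, newcap);  /* realloc(NULL, n) acts like malloc */
        if (p == NULL)
            return -1;
        s->data = p;
        s->cap = newcap;
    }
    memcpy(s->data + s->len, src, n);
    s->len += n;                 /* the extra bookkeeping */
    s->data[s->len] = '\0';
    return 0;
}

/* Shorten to at most n bytes; again the count must be updated. */
void cstr_truncate(struct cstr *s, size_t n)
{
    if (n < s->len) {
        s->len = n;
        s->data[n] = '\0';
    }
}
```

Starting from `{ 0, 0, NULL }`, appending "hello" and then ", world" gives a length of 12 without any scan, at the price of touching `len` on every change.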
-- 
Mars in 1980s:  USSR, 2 tries, |     Henry Spencer at U of Toronto Zoology
2 failures; USA, 0 tries.      | uunet!attcan!utzoo!henry henry at zoo.toronto.edu


