compare strings, strcmp

Liam R. E. Quin lee at sq.sq.com
Tue Nov 21 12:51:41 AEST 1989


Some general thoughts on STREQ, sorting, arrays and cranberry sauce...

Doug Gwyn mentions
>>>    #define StrEq(a,b) (*(a) == *(b) && strcmp(a, b) == 0)	/*UNSAFE*/

and comments it as UNSAFE...
I first saw this as a Henry Spencer-ism, and hence generally write
#define STREQ(henry, utzoo) (*(henry) == *(utzoo) && !strcmp(henry, utzoo))

(this has the same side-effect problems, but is a little more fun).

For the original question, about sorting a large "in-core" array,
the 4.3 BSD qsort routine is publicly available by uucp and ftp, and was
on the Tahoe release.  I mention this because it has much better behaviour
in some cases, tending not to recurse as deeply as the older ones.

strcmp() is usually quite fast, but if you are calling it a lot of times
and this is a problem, at the loss of some generality and flexibility I
suggest you take the BSD qsort() (or write your own) and WIRE IN the
compare function to be the STREQ macro.
Otherwise, STREQ does not help at all, as you have to give qsort(3) a
function, not a macro.

You might also want to think about using a different sort algorithm
altogether.  One that sorts one part of the large array and then
goes on to look at another may take more swaps/compares, but will
exhibit greater locality of reference -- in other words, will cause
fewer page faults!  This might be *much* more significant!

Of course, if the strings themselves came from malloc individually,
you can't predict where they are (in general) and this won't work, but I
seem to remember seeing that you had an array
	char WhateverItWasCalled[1000][200];
implying a thousand strings each of exactly 200 characters.

If the strings are in practice generally shorter, you may get a big win
by calling malloc for each of them, possibly generating them in sorted
order (for example using a tree or linked list).
That way, you don't use up so much storage, and things will probably
work out faster in the end.  1,000 calls to malloc doesn't seem to take
very long -- especially if you have the newer System V malloc(3X).


Hope this helps.

Lee		(I lied about the cranberry sauce)
-- 
Liam R. Quin, Unixsys (UK) Ltd [note: not an employee of "sq" - a visitor!]
lee at sq.com (Whilst visiting Canada from England)
lee at anduk.co.uk (Upon my return to England at Christmas)



More information about the Comp.lang.c mailing list