strstr sources: a summary of responses

Manoj Srivastava srivasta at umvlsi.ecs.umass.edu
Sun Aug 26 06:37:13 AEST 1990


	I want to thank all the people who replied to my request for a
 srtstr function, I really appreciated the response I got.

	Most of the answers were  some  variation of the linear (brute
 force) sub-string match algorithm O(NM) in the  worst case [where N =
 strlen (text) and M = strlen (substring)]. Here is a gem:

--------cut here--------
#include <string.h>

char *strstr(register char const *s, register char const *t) {
    do {
	register char const *ss = s;
	register char const *tt = t;
	do {
	    if (*tt == '\0') return ((char *)s);
	} while (*ss++ == *tt++);
    } while (*s++ != '\0');
    return (NULL);
}
--------cut here--------

	There  were other examples,  but  since all  of  them  are   a
 variation  of  the same theme, I   shall  not waste net bandwidth  by
 posting them all.

	In the mean while I had hacked up  the  following based on the
 Knuth-Morris-Pratt method [I guess I should have made it clear that I
 was looking for "safeguards" which I (naively)  assume are built into
 library routines as  opposed  to  code by "mere" programmers ;-) From
 the lib excerpts which were posted I guess lib  programmers are human
 too.;-) ;-) ]. This is O(N+M) in the worst case.

--------cut here--------
#include <string.h>

char *
  strstr (char * cs, char * ct)
{
  register int i;
  register int j;
  int M = strlen (ct);
  int N = strlen (cs);
  int * next = (int *) malloc ((M + 1) * sizeof(int));

  /* initialize the next array */
  next[0] = -1;
  
  for (i = 0, j = -1; i < M; i++, j++, next[i] = (ct[i] == ct[j]) ?
       next[j] : j)
      while ((j >= 0) && (ct[i] != ct[j]))
        j = next[j];
  
  /*now for the pattern match  */
  for (i = 0, j = 0; j < M && i < N - M + j + 1; i++, j++)
    while ((j >= 0) && (cs[i] != ct[j]))
      j = next[j];

  free (next);
  
  if (j == M)
    return & cs [i - M];
  else 
    return NULL;
}
--------cut here--------


	And finally, an implementation of the Boyer-Moore algorithm. I
 am  not  sure offhand, but  while  the worst case complexity  remains
 O(N+M), the avg case behaviour is O(N/M) ???


------8-<---Cut here---8-<-----
/* strstr, Copyright (c) 1990 by John Lacey.  -*- C -*-

   This program is the ANSI C strstr() string routine.  It is
   basically a string search routine.

   This implementation uses the Boyer-Moore algorithm.  

   Some rough timings found this implementation about twice as fast as 
   the following straight string search.  

   char *
   strstr( const char *text, const char * pattern )
   {
   int i = 0, j = 0;
   
   while ( text[i] && pattern[j] )
     {
       i++;
       j = ( text[i] == pattern[j] ) ? j + 1 : 0;
     }
   
   return (char *) ( ( pattern[j] ) ? NULL : text + i - j + 1 );
   }

   So, while the algorithm itself is more complicated, that complexity
   is worth it. 
*/

#include <limits.h>
#include <stdio.h>
#include <string.h>

char *
strstr( const char *text, const char * pattern )
{
int N = strlen( text );
int M = strlen( pattern );
int i = M - 1;
int i0, j, k;
int ch, d[SCHAR_MAX + 1];

for ( ch = 0; ch <= SCHAR_MAX; ch++ )
  d[ch] = M;

for ( j = 0; j < M-1; j++ )
  d[pattern[j]] = M - j - 1;

do
    {
      j = M - 1;
      k = i;
      while ( j >= 0 && pattern[j] == text[k] )
	{
	  k--; j--;
	}
      i0 = i;
      i += d[text[i]];
    }
while ( j >= 0 && i < N );
return (char *)(( j < 0 ) ? text + i0 - M + 1 : NULL);
}

#ifdef TEST
main( int argc, char *argv[] )
{
char *status = NULL;

if ( argc != 3 )
  {
    puts( "Try again, with a text string and a pattern string." );
    exit( 1 );
  }
else
  {
    status = strstr( argv[1], argv[2] );
    if ( status != NULL )
      printf( "Found substring: %s\n", status );
    else
      printf( "Nothing doing, boss.  I can't see a thing.\n" );
  }
}
#endif /* TEST */
------8-<---Cut here---8-<-----

Man weeps to think that he will die so soon; woman, that she was born so long
 ago.  -- H. L. Mencken



More information about the Comp.lang.c mailing list