experience with LWPs; bug fixes (l

Michael Schwartz boulder!latour!schwartz at ncar.ucar.edu
Wed Jun 28 05:48:16 AEST 1989


I have used Sun's Lightweight Process (LWP) package for a couple of
different research prototypes.  I came across and solved some problems
with the package.  I am reporting the bugs to Sun, but since there won't
be fixes for them until SunOS 4.1 comes out, I thought I'd tell people
about them now.

Summary of problems:
        1.  Problem with the use of the non-blocking I/O library
            (libnbio.a) that sometimes caused the shell on which the program
            was running to exit.
        2.  Problem with the LWP library where, after a while, an internal
            thread called the stkreaper gets into a high-priority infinite
            loop inside a locked critical section, starving all the other
            threads and wedging the CPU.  This was by far the most difficult
            bug to track down, and also the one that caused the most trouble.
        3.  Problem with the nbio implementation of select, which causes
            threads to awaken when there is no I/O available and before
            a timeout has occurred.
        4.  Problem with the nbio implementation of connect, which returns
            with errno == EINVAL instead of ECONNREFUSED when the
            connection is refused.
        5.  Dependency bug in the nbio library that sometimes causes it not
            to be linked in, even though you specify -lnbio when linking.
        6.  Inability to get sequentially numbered thread IDs, for use in
            indexing a global data structure.

Problem 1.

Since the LWP package is implemented as library code rather than in the
kernel, a standard UNIX I/O call made by any thread blocks the entire
UNIX process.  The Non-Blocking I/O package gets around this by
supplying LWP-aware versions of most of the main I/O library routines,
which do the right thing (i.e., block only the calling thread).

The problem I had was that when an error caused the UNIX process to
crash, the shell on which the program was running would exit, as if it
had received an EOF.  It turned out that the cause is something mentioned
only in passing in the BUGS section of the Intro.3l manpage:
        "Killing a process that uses the non-blocking I/O library may leave
        objects (such as its standard input) in a non-blocking state.  This
        could cause confusion to the shell."
When the program died, the shell was left with non-blocking input, and
so it thought it was getting EOFs.  To deal with this problem, you can
catch signals and restore the file descriptors to blocking state, as
follows:

#include <stdio.h>
#include <signal.h>
#include <sys/param.h>		/* NOFILE */
#include <sys/file.h>		/* F_GETFL, F_SETFL, FASYNC, FNDELAY */
#include <lwp/lwp.h>		/* pod_exit() */

CleanupHandler(sig)
int sig;
{
	int fd;
	int FDState;

	sigsetmask(~0);			/* hold off further signals while cleaning up */
	printf("CleanupHandler received signal %d\n", sig);
	for (fd = 0; fd < NOFILE; fd++) {
		FDState = fcntl(fd, F_GETFL, 0);
		if (FDState != -1)	/* fd is open; clear async/non-blocking modes */
			fcntl(fd, F_SETFL, FDState & ~(FASYNC|FNDELAY));
	}
	pod_exit(0);
}
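
Arming the handler is the usual business.  Here is a sketch (which
signals you catch is up to you; this particular set is just
illustrative):

#include <signal.h>

main()
{
	/* CleanupHandler() is the routine shown above */
	signal(SIGINT, CleanupHandler);
	signal(SIGTERM, CleanupHandler);
	signal(SIGSEGV, CleanupHandler);
	signal(SIGBUS, CleanupHandler);

	/* ... create the threads and run the application as usual ... */
}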

Even after I did this, the problem still happened for some reason when
the program wrote to stderr; I don't know why.  Writing to stdout
doesn't trigger it.


Problem 2.

When a user-level thread dies, an internal thread called the stkreaper
runs, and reclaims the stack space used by that thread.  The problem I
had was that the stkreaper did not properly clean up its data
structures after a thread died, causing it to get into a high-priority
infinite loop inside a locked critical section.  When this happens,
none of the other threads make any further progress, and the CPU gets
wedged.  The problem only occurs when some threads finish executing
before others are started, which is why it tends to show up only in
nontrivial LWP applications.

The fix is to use a modified version of one of the LWP source files
(stack.c).  I can't give out the fixed file (since it is copyrighted),
but if you have a source license I could give you a context diff of the
fixed file.  I could also give you .o files for Sun 3's and Sun 4's, so
you can link with them (if you trust me!).


Problem 3.

A common application level paradigm using the non-blocking I/O package
is as follows:
	send a message
	call select() to wait for the response (calling recv() at this
		      point wouldn't work -- it would fail with errno ==
		      EWOULDBLOCK)
	call recv() to get the response
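
Concretely, the naive version looks something like the following sketch
(the socket s, the request/answer buffers, and the 5-second timeout are
made up for illustration):

	fd_set readfds;
	struct timeval timeout;

	send(s, request, reqlen, 0);		/* send a message */

	FD_ZERO(&readfds);
	FD_SET(s, &readfds);
	timeout.tv_sec = 5;			/* whatever timeout you want */
	timeout.tv_usec = 0;
	select(s + 1, &readfds, (fd_set *)0, (fd_set *)0, &timeout);

	recv(s, answer, anslen, 0);		/* get the response */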

When the I/O is complete, the UNIX process gets a SIGIO, and the LWP
package must decide which thread(s) to awaken.  But as it is currently
implemented, every thread that is waiting on SIGIO gets reawakened.
The result is that the semantics of select are not correctly
implemented -- a thread could return from select with no I/O available,
and before the timeout has expired.  This breaks existing applications
that you are trying to convert to use LWPs (as I was doing in one
case).

This should be fixed in the LWP package, but in the mean time, you can
get around the problem by changing the above code to something like
	send a message
	do {
		call select()
		call recv()
	} while the recv call failed with errno == EWOULDBLOCK

This doesn't quite do it, though, because each time you do the select
you will be using the full original timeout.  I ended up keeping track
of how much time had passed, using something like:

	send a message;
	SelectAgain:
	gettimeofday(&starttime, &timezone);
	select(..., &timeout);
	if (recv(s, answer, anslen, 0) <= 0) {
		if (errno == EWOULDBLOCK) {
			/* Charge the time we just spent against the timeout */
			gettimeofday(&endtime, &timezone);
			timeout.tv_sec -= (endtime.tv_sec - starttime.tv_sec);
			timeout.tv_usec -= (endtime.tv_usec - starttime.tv_usec);
			if (timeout.tv_usec < 0) {
				timeout.tv_sec--;
				timeout.tv_usec += 1000000;
			}
			else if (timeout.tv_usec >= 1000000) {
				timeout.tv_sec++;
				timeout.tv_usec -= 1000000;
			}
			if (timeout.tv_sec > 0 ||
			    (timeout.tv_sec == 0 && timeout.tv_usec > 0))
				goto SelectAgain;
			/* Otherwise the full timeout has expired -- give up */
		}
		/* Otherwise it's a recv error */
	}

This is definitely a suboptimal solution, because there is a lot of
overhead in having every thread that's waiting for I/O (usually all of
them) wake up and do this check, including the system calls to get the
time and do a select again.  This should be fixed in the guts of the
LWP library.


Problem 4.

When a connection is refused (because no server is listening at a port
that a program tries to connect to), the connect call should return
with errno == ECONNREFUSED.  For some reason, it comes back with EINVAL
instead.  I don't know why this is.  For the time being I just hacked
the connect code to translate EINVAL into ECONNREFUSED, but that's
obviously only a stop-gap hack.
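
If you'd rather not touch the library source, a caller-side wrapper
gets the same effect.  This is only a sketch (the wrapper name is
mine), and remember that EINVAL can also mean a genuinely bad argument,
so it's every bit as much of a kludge:

#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>

int ConnectWrapper(s, name, namelen)
int s;
struct sockaddr *name;
int namelen;
{
	int ret;

	ret = connect(s, name, namelen);
	if (ret < 0 && errno == EINVAL)
		errno = ECONNREFUSED;	/* assume EINVAL really meant "refused" */
	return (ret);
}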


Problem 5.

To use the nbio library, you are supposed to be able to just do
	cc -o prog prog.c -lnbio -llwp
However, I found that this didn't always work -- sometimes the entire
UNIX process would block as soon as one of the threads did an I/O call.
This problem has to do with the dependency chain between the routines
in libnbio.a and liblwp.a: the linker makes only a single pass over
each archive, so a library later on the command line cannot pull in
objects from one it has already scanned.  You can get around the
problem either by doing
	ar x /lib/libnbio.a nb.o
	cc -o prog prog.c nb.o -lnbio -llwp
	rm nb.o
or by doing
	cc -o prog prog.c -lnbio -llwp -lnbio -llwp



Problem 6.

Sometimes it is useful to build a global data structure shared by all
threads, indexed by thread ID.  The problem is that the thread IDs
returned in the lwp_create call (or the lwp_self call) are not
sequential -- thread and monitor IDs come from a single number space,
and any monitors created behind the scenes by the LWP package cause the
ID numbers for the threads to skip values -- e.g., I got back IDs 2, 5,
7, 8, 10 when creating a set of 5 threads.  One way to get around this
is simply to allocate extra space in the global data structure that you
want to index by thread ID.  A more space-efficient way is to build a
routine that keeps a mapping from Sun thread IDs to sequential thread
IDs, using an array, as follows:

#include <stdio.h>
#include <lwp/lwp.h>

/* MAX_LWPS is assumed to be defined elsewhere (the app's thread limit) */
static int ThreadIDs[5*MAX_LWPS];   /*
                                     * The numbering of unique ids generated by
                                     * Sun LWPs seems to skip no more than 2 or
                                     * 3 values between each LWP; use stride
                                     * size of 5 just to be safe.  This is a
                                     * hack, but it's safe (since even if more
                                     * than 5 values are used sometimes, on
                                     * average fewer than 5 are, so we won't
                                     * overflow this array) and efficient.
                                     */
static int GlobalIDCount = 0;

int MyThreadID()
{
	thread_t tid;

	if (lwp_self(&tid) < 0) {
		fprintf(stderr, "MyThreadID: lwp_self failed\n");
		pod_exit(1);
	}
	if (ThreadIDs[tid.thread_key] == 0) { 	/*
						 * Not yet initialized --
						 * assumes globals are
						 * initialized to all 0's
						 */
		ThreadIDs[tid.thread_key] = ++GlobalIDCount;
	}
	return(ThreadIDs[tid.thread_key] - 1);	/*
						 * -1 so the IDs are numbered
						 * 0..whatever-1, rather than
						 * 1..whatever
						 */
}
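
Each thread can then use its sequential ID to index shared per-thread
data without any locking -- for example (illustrative only, not code
from my prototypes):

static int RequestsHandled[MAX_LWPS];	/* one slot per thread */

WorkerBody()
{
	int myid = MyThreadID();

	/* ... do some work ... */
	RequestsHandled[myid]++;	/* one writer per slot, so no monitor needed */
}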


Again, it would be better if Sun provided a way to get sequential IDs
directly, since this approach is somewhat expensive.


 - Mike Schwartz
   Dept. of Computer Science
   U. Colorado - Boulder


