Need help on SysV semaphores, or vice versa (long)

Tue May 13 05:38:33 AEST 1986

	SUMMARY.  Although AT&T SysV semaphore _operations_ are atomic,
	semaphore creation, initialization, and deletion are done by
	separate system calls.  This complicates semaphore management
	when multiple copies of a program use a common resource, such as
	a database.  In particular, I believe having the program create
	a semaphore when it's first needed cannot be made to work, since
	initialization does not occur (System V Interface Definition,
	Issue 2, Volume 1, p. 283) until the next system call.  Atomic
	create-and-initialize-to-zero would help a lot.  Deletion offers
	a similar problem, but can be done with herculean and somewhat
	grotty use of extra semaphores, described below.

	Apologies, of course, if this issue has been dealt with recently
	-- and I'd appreciate being mailed a copy of how it turned out.

--------

	I have a couple of programs that display stuff for the user, and
	incidentally write to databases when something interesting happens.
	The particular job isn't important.  One program displays 'sysinfo'
	event frequencies, and logs new high water marks.  Another example
	-- not mine -- is `rogue`, which keeps a database of user scores.
	Seems like this ought to be a common situation.

	The problem is:  how does one mutually exclude multiple copies of
	a program from a single database, using AT&T System V semaphores?
	Shouldn't this be a solved problem?  I feel kind of silly asking
	the wizards, but here goes ...

	So 14 copies of `Foo` are about to run.  Since semaphores are a
	limited system resource, I assume that one should be created by the
	first copy of Foo that comes up, and deleted by the last copy of
	Foo that exits.  (This assumption turns out to make the problem
	intractable -- see below.  Comments?  Alternatives?)

	Creating a semaphore is handled by semget(2) -- semget(ke_os) in
	the System V Interface Definition, Issue 2, Volume I, Select Code
	320-011.  I see only the following three possibilities.

	1]	sem_id = semget(key, n, permissions | IPC_CREAT)

		Get id of the semaphore for "key".  Transparently create
		one if it didn't exist yet.  But the SVID says (I, p. 283):
		"The data structure associated with each semaphore in the
		set is not initialized.  The function 'semctl' with the
		command SETVAL or SETALL can be used to initialize each
		semaphore".  Which copy of Foo does the initialization?
		There would have to be an atomic test for "I'm about to be
		the first one to use this".  [Right, I just need another
		semaphore to manage initialization of this one ... :-)]

	2]	sem_id = semget(key, n, permissions | IPC_CREAT | IPC_EXCL)

		Get id of a new semaphore for "key".  Fail if one already
		exists.  The copy of Foo in which this call succeeds must
		initialize the semaphore set.  But suppose this copy of
		Foo runs out of time slice between creating the semaphore
		and initializing it.  Aha, wrong, the copy of Foo about to
		first access the semaphore set must initialize it.  Back
		to square 1.

	3]	sem_id = semget(key, n, permissions)

		Get id of an existing semaphore.  Fail if it doesn't exist.
		This is the same situation as 2), with the test reversed,
		and the test is irrelevant anyway.

	Have I missed something?  I know about open O_APPEND, for vanilla
	log files.  This is different:  I need to change information that's
	already in the file.  I don't know how (if) BSD handles semaphores.
	I haven't tried playing with SysV messages or shared memory yet.

	Are there other alternatives?  Create the semaphore during /etc/rc?
	From under my System Administrator's Hat I declare this a crock.
	Create and delete the semaphore before and after the point of use,
	instead of before and after executing?  Might as well wait if the
	semaphore exists, and proceed if it doesn't -- i.e., use existence
	of the semaphore instead of its value.  Actually, might as well
	use a link to something -- less chance of colliding key_t's -- and
	we're back to /etc/locks.

--------

	I think this could be made to work if creation and initialization,
	say to all zeros, were a single atomic operation.  ("Automatic"?
	See SVID Issue 2, Volume I, semop(ke_os).  Also, in that section,
	somebody changed "IPC_NOWAIT" (Issue 1) to "IPC_CREAT" (Issue 2).
	I can't believe they meant it.  Without further apology, I'll use
	IPC_NOWAIT here -- when I mean don't wait.)

	For those who don't have an SVID or SysV manual handy, here's a
	brief review.  Semop(ke_os) atomically does an array of individual
	semaphore operations.  Either the whole array succeeds, and all the
	semaphore values change as specified, or the whole array waits (or
	fails).  Each operation is specified by a semaphore number (zero
	based), a positive/negative/zero integer "operation", and a set of
	flags.  Positive means increment, negative means decrement (but
	wait or fail if result would be negative), and zero means wait or
	fail if nonzero.  The flag IPC_NOWAIT means fail instead of wait.
	The flag SEM_UNDO means back out of a change if the process exits
	-- e.g., even if it gets SIGKILL.

	The following code gets a semaphore id (creating the semaphore set
	if it doesn't exist yet), and initializes the semaphore set if it
	hasn't been initialized yet.  Warning:  this is speculative code,
	based on the false assumption that semaphore values begin as zero.


	{
		struct	sembuf	init[]
		= {
			{ 1,   0,   IPC_NOWAIT },	/* initialized?	*/
			{ 1,   1,   0 },		/* initialized!	*/
			{ 0,   1,   0 }			/* sem[0] = 1	*/
		  };


		/*------------------------------------------------------*/
		/* Create (or use existing) semaphore set, 2 members.	*/
		/* The 1st one is for general use.  The 2nd indicates	*/
		/* whether the set is initialized.  (Zero means "no".)	*/
		/* I assume that both members of the semaphore set are	*/
		/* initialized to zero when created, although that is	*/
		/* explicitly denied at this time by the SVID, Issue 2.	*/
		/*------------------------------------------------------*/

		sem_id = semget(key, 2, permissions | IPC_CREAT);
		if ( sem_id == -1 )
		{
			perror("create");
			exit(errno);
		}


		/*------------------------------------------------------*/
		/* Initialize.  Assuming no outright errors, either:	*/
		/*   1) atomically find sem[1]==0, set sem[1]=1,	*/
		/*      set sem[0]=1 (the goal), and return 0; or	*/
		/*   2) atomically find sem[1]!=0, set errno=EAGAIN	*/
		/*      (i.e., already initialized), and return -1.	*/
		/* The first to execute does 1); everyone else does 2).	*/
		/*------------------------------------------------------*/

		if (    (semop(sem_id, init, 3) == -1)
		     && (errno != EAGAIN)
		   )
		{
			perror("init");
			exit(errno);
		}
	}


	After that one could acquire the common reasource by decrementing
	semaphore 0, and release it by incrementing semaphore 0, as usual.

	Deleting the semaphore when all copies of Foo are done with it is
	another issue, and thornier.  My first try used a 3-semaphore set
	instead of the 2-semaphore set above, with the new one containing
	a reference count.  After creating the semaphore set Foo would
	atomically initialize-or-not-and-increment-the-reference-count; and
	before exiting Foo would decrement the reference count and delete
	the semaphore if the count had gone to 0.  Surprise, same problem.
	Foo decides to delete the semaphore, time passes, Foo finally does
	delete the semaphore.  Meanwhile, another copy of Foo has created
	and initialized it (both no-ops), and (gasp) acquired the resource.
	Then the semaphore disappears.  Disorder follows.

	There appears to be a solution, though.  (Thanks to James Sutton
	-- same address -- for helping me work it up.)  One more semaphore,
	meaning "I'm about to be deleted", does it.  That's a 4-semaphore
	set, 3 for overhead, 1 for doing the job.  Here's how it works.
	Foo takes a SysV semaphore set through the following life cycle:
	1) create, 2) accept, 3) initialize, 4) use, 5) cleanup, 6) delete.


	1)	"Create" is as above -- i.e., transparently create if it
		doesn't exist -- but get 4 elements in the semaphore set.

		sem[0]		0 -- db busy	1 -- db available

		sem[1]		0 -- new	1 -- initialized

		sem[2]		(# of processes using this semaphore set)

		sem[3]		0 -- safe	1 -- about to be deleted

		Again I assume that if the semaphore set was created, its
		elements were given initial value zero.  "Create" may be
		retried.  The next step does so until Foo gets a semaphore
		set that isn't about to be deleted.


	2)	"Accept" atomically tests the about-to-be-deleted bit and,
		if it was 0, increments the reference count.  ("Cleanup",
		later, does roughly the opposite:  decrement the reference
		count and, if it just went to zero, turn on the about-to-
		-be-deleted bit.  The idea is that a successful accept
		prevents the semaphore set from being deleted and, vice
		versa, the decision to delete one prevents it from being
		accepted.)  Here's the code, still speculative of course.


		struct	sembuf	accept[]
		= {
			{ 3,   0,   IPC_NOWAIT },	/* deleting it?	*/
			{ 2,   1,   SEM_UNDO }		/* refcount ++	*/
		  };


		/*------------------------------------------------------*/
		/* Accept.  Assuming no outright errors, either:	*/
		/*   1) atomically find sem[3]==0, increment sem[2],	*/
		/*      and return 0; or				*/
		/*   2) atomically find sem[3]!=0, set errno=EAGAIN	*/
		/*      (i.e., about to be deleted), and return -1.	*/
		/* Retry the "create" step until situation 1) happens.	*/
		/*------------------------------------------------------*/

		do
		{
			/*----------------------------------------------*/
			/* "Create" step here ...			*/
			/*----------------------------------------------*/
		}
		until (    ((accept_it = semop(sem_id, accept, 2)) != -1)
			|| (errno != EAGAIN)
		      );

		if ( (accept_it == -1) && (errno != EAGAIN) )
		{
			perror("accept");
			exit(errno);
		}


	3)	"Initialize" is exactly as before.  It doesn't do anything
		with semaphores 2 and 3.


	4)	"Use" is just acquiring and releasing the resource as Foo
		needs to, by (respectively) decrementing and incrementing
		semaphore 0.  This may happen many times.  I assume Foo
		releases the resource before proceeding to the "cleanup"
		step.  (SEM_UNDO should be used to release it in case of
		Foo's untimely death.  This is why SEM_UNDO was used for
		the reference count above, so a later copy of Foo would
		eventually delete the semaphore set.)


	5)	"Cleanup" is two steps, both atomic, for technical reasons.
		The first step is to decrement the reference count.  (Other
		copies of Foo may get at the semaphore set while it has a
		zero reference count, but only to "create" and "accept" --
		which increments the reference count again.)  The next step
		atomically tests the reference count for zero and, if so,
		turns on the about-to-be-deleted bit.  The technical reason
		for breaking this into two steps is that semop(ke_os) has
		no conditional operations.  It can only attempt something
		unconditional, and either succeed or fail; and if it fails
		the semaphore values are unchanged.  But the desired result
		of "cleanup" is:  if the reference count is greater than 1,
		decrement it; otherwise set the about-to-be-deleted bit --
		i.e., a change either way, hence two semop's.  Just lucky
		the state between fits in with everything else going on.
		

		struct	sembuf	clean1[]
		= {
			{ 2,  -1,   SEM_UNDO }		/* refcount --	*/
		  };

		struct	sembuf	clean2[]
		= {
			{ 2,   0,   IPC_NOWAIT },	/* refcount ? 0	*/
			{ 3,   1,   SEM_UNDO }		/* deleting it!	*/
		  };


		/*------------------------------------------------------*/
		/* Clean1.  Decrement the reference count.  Must work.	*/
		/*------------------------------------------------------*/

		if ( semop(sem_id, clean1, 1) == -1 )
		{
			perror("clean1");
			exit(errno);
		}


		/*------------------------------------------------------*/
		/* Clean2.  Assuming no outright errors, either:	*/
		/*   1) atomically find sem[2]!=0, set errno=EAGAIN,	*/
		/*      and return -1; or				*/
		/*   2) atomically find sem[2]==0, set sem[3]=1		*/
		/*      (i.e., about to be deleted), and return 0.	*/
		/* Delete the semaphore set if situation 2) happened.	*/
		/*------------------------------------------------------*/

		delete_it = semop(sem_id, clean2, 2);
		if ( (delete_it == -1) && (errno != EAGAIN) )
		{
			perror("clean2");
			exit(errno);
		}


	6)	"Delete" just deletes the semaphore set if the about-to-
		-be-deleted bit is set, and leaves it alone otherwise.


		if ( delete_it == 0 )
		{
			if ( semctl(sem_id, 0, IPC_RMID, 0) == -1 )
			{
				perror("delete");
				exit(errno);
			}
		}


	An atomic test-for-zero-and-maybe-delete semaphore operation would
	make the above a lot easier, but it's not clear to me where such an
	idea would fit in among the existing semaphore system calls.

	This looks like a pyrrhic victory:  3 semaphores for overhead, and
	an exotic 6-step ritual in place of create/use/delete.  Of course,
	it's not even that.  The SVID still says members of a semaphore set
	aren't initialized, and the problem described above remains subject
	to glitches of probability epsilon.  [Are you listening, Ma? :-)]

	Anyone who's read this far:  thanks for your attention!  I really
	would like to know how to do the job described.  Semaphores seemed
	like a good idea at the time.  Exclusive open (of an existing file)
	actually fits the situation better -- the common resource is a file,
	which could be off-site these days -- but that idea isn't even in
	the SVID "future directions".  Meanwhile, guess I'll create a file
	in /etc/locks.


-- Bill Meyers ...{decvax,akgua,ihnp4}!mcnc!rti-sel!scirtp!bill
   SCI Systems, Inc., Research Triangle Park, N.C. (919/549-8334)