How can a user program hang the system?

Scott E. Townsend fsset at neptune.lerc.nasa.gov
Fri Jun 8 00:28:03 AEST 1990


I've got a question regarding intermittent system hangs.  A 'system hang'
means that all apparent activity on the Personal Iris monitor freezes,
including the mouse cursor.

Here's my setup:

	1. There are two ordinary user programs communicating with a
	   remote system via shared memory in the remote system's
	   VME rack.  One program loads & monitors the remote system's
	   execution, the other program provides a 'real-time' display
	   of remote data via a 3D surface plot.

	2. The Personal Iris's VME adaptor is connected to the remote
	   system's VME rack via a VME repeater, the model 2000 repeater
	   from HVE Engineering Inc.

	3. The remote system's shared memory is mapped via the mmap()
	   call:

	   cube_data = mmap(0, MEMORY_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED,
			    mem_fd, (VME_A24SBASE | 0XA0000000) + CUBE_ADDR);

	   where MEMORY_SIZE is 1 Meg and CUBE_ADDR is set at 4 Meg.

	   (This is for the surface plotter; a similar call is made for
	    the loader/monitor, mapping 4 Meg starting at 0.)
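
As a sanity check on the two windows described above, here is a minimal sketch in C of the address arithmetic. It assumes "Meg" means 2^20 bytes and that both windows are offsets within the 16 MB VME A24 space (24 address bits). MEMORY_SIZE and CUBE_ADDR come from the post; LOADER_ADDR, LOADER_SIZE, struct vme_window, and the two helper functions are hypothetical names introduced only for illustration. It does not touch the hardware and says nothing about the cause of the hang.

```c
#include <stdbool.h>

#define MEG            (1024UL * 1024UL)   /* assumption: 1 Meg = 2^20 bytes */
#define A24_SPACE_SIZE (16UL * MEG)        /* 24 address bits = 16 MB */

/* Values stated in the post (surface plotter window). */
#define MEMORY_SIZE    (1UL * MEG)
#define CUBE_ADDR      (4UL * MEG)

/* Hypothetical names for the loader/monitor window (4 Meg starting at 0). */
#define LOADER_ADDR    0UL
#define LOADER_SIZE    (4UL * MEG)

/* One mapped window, expressed as an offset and length within A24 space. */
struct vme_window {
    unsigned long base;
    unsigned long size;
};

/* True if the window is non-empty and lies entirely inside A24 space. */
static bool window_fits_a24(struct vme_window w)
{
    return w.size > 0 &&
           w.base < A24_SPACE_SIZE &&
           w.size <= A24_SPACE_SIZE - w.base;
}

/* True if the two half-open ranges [base, base + size) share any byte. */
static bool windows_overlap(struct vme_window a, struct vme_window b)
{
    return a.base < b.base + b.size && b.base < a.base + a.size;
}
```

With these numbers the loader/monitor window covers [0, 4 Meg) and the plotter window covers [4 Meg, 5 Meg), so under the assumptions above the two mappings are adjacent, do not overlap, and both fit inside A24 space.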

Here are my symptoms:

	1. The system usually runs fine, but some days (phase of the moon?)
	   it will hang after running for 30 seconds or so (enough time
	   to record 5-10 plots).  This is semi-repeatable.

	2. It seems to hang only if the surface plotter is running.  The
	   surface plotter is just a simple tmesh algorithm, nothing
	   fancy, with a mesh size of 34 x 34 typically.  Note that when
	   the system is using the surface plotter there is significantly
	   more communication across the VME repeater.  Also, I can't run
	   just the plotter, I need the loader/monitor to get things
	   to run.

	3. There is no console output if I have a console running on
	   the Iris display.

	4. When the system hangs, the remote system still runs until its
	   communications time out waiting for the frozen Iris.

So now the questions:

	1. Has anyone had any experience with hanging repeaters on the
	   Iris's VME adaptor?

	2. Under what conditions could a normal user program freeze the
	   system?  I would think a programming error would simply cause
	   a segmentation fault or something similar.

	3. Any suggestions on how I could debug this thing?  I have access to
	   a VME bus monitor, but its triggering facilities are a bit
	   primitive.

Thanks for any and all help you can provide.

--
------------------------------------------------------------------------
Scott Townsend               |   Phone: 216-433-8101
NASA Lewis Research Center   |   Mail Stop: 5-11
Cleveland, Ohio  44135       |   Email: fsset at neptune.lerc.nasa.gov
------------------------------------------------------------------------



More information about the Comp.sys.sgi mailing list