Paging-space problems

Alex Martelli staff at cadlab.sublink.ORG
Thu Nov 8 18:59:39 AEST 1990


Maybe it's the pageable kernel, who knows, but for sure AIX 3 is rather
funny wrt programs that do dynamic allocation (don't they all?-).  We
have an interactive program (a solid modeler) which allocates memory
dynamically, depending on what the user is doing, with straight calls to
malloc().  On most Unix platforms (...all of them, except AIX 3...!),
if the program (=its user) is too ambitious for the amount of paging
space, malloc() will eventually return NULL - the program religiously
tests for this (to free all that's freeable, retry the malloc(), and,
if it still fails, gracefully inform the user).  On some machines the
failing malloc()'s give console messages suggesting expansion of paging
space, which is bothersome enough (we DO NOT want to expand paging space
to infinity, there's NO limit on the complexity of solid models that a
user will *attempt* to build, we JUST want to inform the user that a
given model takes more memory than he/she's got, is that such an unusual
approach on our part, grumble grumble!-).
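Concretely, the allocate/test/retry discipline described above amounts to
something like this sketch in portable C (release_caches() is a hypothetical
hook standing in for whatever the application can free):

```c
#include <stdlib.h>

/* Hypothetical hook: release whatever caches or scratch data the
 * application can live without.  Returns nonzero if anything was freed. */
extern int release_caches(void);

/* Allocate nbytes, retrying once after freeing all that's freeable.
 * Returns NULL only when the request truly cannot be satisfied, so the
 * caller can gracefully inform the user instead of crashing. */
void *careful_malloc(size_t nbytes)
{
    void *p = malloc(nbytes);
    if (p == NULL && release_caches())
        p = malloc(nbytes);    /* retry after freeing what we can */
    return p;
}
```

On most Unix platforms this is all it takes; the whole complaint below is
that on AIX 3 the NULL never arrives.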

On AIX 3, things are worse.  It appears that malloc() does succeed, BUT
then "system paging space" gets low, and funny things happen.  If our
application does not catch SIGDANGER, it gets killed; if it DOES catch
SIGDANGER, the *X Window System* (under which the app's running) gets
killed instead!  The app does not appear to be able to really "free"
memory to the system, i.e. normal free() probably does not sbrk()
(this happens on MANY platforms... the malloc()/free() pair seems to
attempt to minimize system-call overhead).  We could try funneling
our malloc()'s through a safemalloc() which will check psdanger() and
refuse to allocate if this would take paging-space too low, but this
does not appear to solve the problem: I believe Xlib, Xt, Ingres, and
whatever else we link with our app, use raw malloc()'s.  Ok, so I COULD
completely rewrite the malloc() package and fix things for OUR process,
but this STILL would not solve it - it's quite likely that the malloc()
from OUR process happens when everything's fine, but right after that
any process running some application whose source we don't have might
well allocate more memory and cause the danger condition!
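For the record, the safemalloc() idea would look something like the sketch
below; the AIX-specific psdanger() call and SIGDANGER signal are stubbed out
on other systems so the sketch stays self-contained, and the page size and
safety margin are arbitrary assumptions of mine:

```c
#include <signal.h>
#include <stdlib.h>

#ifdef _AIX
#include <sys/vminfo.h>   /* psdanger() */
#else
/* Stubs so this sketch compiles off AIX.  On AIX, psdanger(sig) reports
 * how many free paging-space blocks remain above the level at which the
 * system would send that signal. */
#define SIGDANGER SIGUSR1
static long psdanger(int sig) { (void)sig; return 100000; }
#endif

#define PAGE_BYTES   4096L   /* assumed paging-space block size */
#define PAGES_MARGIN 256L    /* arbitrary extra safety margin   */

static volatile sig_atomic_t danger_seen = 0;

/* Catching SIGDANGER keeps our process from being killed outright when
 * paging space runs low (though, as noted, X may then die instead). */
static void on_danger(int sig) { (void)sig; danger_seen = 1; }

void install_danger_handler(void)
{
    signal(SIGDANGER, on_danger);
}

/* Refuse the request if satisfying it would push free paging space
 * below PAGES_MARGIN above the danger level. */
void *safemalloc(size_t nbytes)
{
    long free_pages = psdanger(SIGDANGER);
    long need_pages = (long)(nbytes / PAGE_BYTES) + 1;
    if (free_pages - need_pages < PAGES_MARGIN)
        return NULL;          /* caller informs the user gracefully */
    return malloc(nbytes);
}
```

Again: this only guards OUR calls; every raw malloc() in Xlib, Xt, Ingres
and friends sails right past it.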

It upsets me that an application programmer is supposed to fix these
low-level, system-oriented things, and the only alternative appears to
be to forgo dynamic memory allocation completely!  Just having a program
(or the X Window System, indispensable for interfacing to it) die on
the user when he/she attempts construction of a complex solid model
does NOT appear to be a viable approach for a commercial application!!!

A system-level solution would be best, but I can't find a good one.
I think we now have ALL the documentation IBM supplies, but I don't
see a way there to reserve some amount of resource (paging space) for
the kernel, or for root-owned processes, or whatever.  The limits file
allows things like fixing maximum amount of data area PER PROCESS, but
what good will this do me???  I can't predict how many processes WILL
be running when a dangerous situation approaches!  Why oh why can't
brk()/sbrk() just REFUSE to expand space into a dangerous situation?
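(For reference, the per-process cap the limits file imposes can also be set
from within a process via setrlimit() - a sketch, with arbitrary numbers; as
argued above, this fences in ONE process and is no system-wide cure:)

```c
#include <sys/resource.h>

/* Cap this process's own memory so an over-ambitious malloc() fails
 * cleanly with NULL instead of eating into system paging space.
 * The limit value is the caller's arbitrary choice. */
int cap_memory(rlim_t max_bytes)
{
    struct rlimit rl;
    rl.rlim_cur = max_bytes;
    rl.rlim_max = max_bytes;
    /* RLIMIT_DATA matches the "data area" limit in the limits file;
     * RLIMIT_AS, where available, also covers mmap-based allocation. */
#ifdef RLIMIT_AS
    return setrlimit(RLIMIT_AS, &rl);
#else
    return setrlimit(RLIMIT_DATA, &rl);
#endif
}
```

With, say, cap_memory(256UL * 1024 * 1024) at startup, a gigabyte malloc()
returns NULL as it should - but only for the process that called it.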

Suggestions will be appreciated, particularly on how to AVOID danger
situations, but also on how to GRACEFULLY HANDLE them.  Our plight
does NOT appear to me to be a very strange one, so I would really hope
somebody else's "been there before"!  Thanks in advance.

-- 
Alex Martelli - CAD.LAB s.p.a., v. Stalingrado 45, Bologna, Italia
Email: (work:) staff at cadlab.sublink.org, (home:) alex at am.sublink.org
Phone: (work:) ++39 (51) 371099, (home:) ++39 (51) 250434; 
Fax: ++39 (51) 366964 (work only), Fidonet: 332/401.3 (home only).



More information about the Comp.unix.aix mailing list