malloc (was: making a request to IBM)

Dick Dunn rcd at ico.isc.com
Fri Apr 19 05:55:49 AEST 1991


mbrown at testsys.austin.ibm.com (Mark Brown) writes:
  [lost the previous attribution for problem statement]
> | The problem:  as you all remember,  malloc()  returns  NULL  only
> | when the process exceeds its datasize limit.  If malloc returns a
> | non-null pointer, the memory  may  turn  out  to  be  exceedingly
> | virtual...
...
> | Personally, I think it's a bug.  If  there  is  no  memory  left,
> | malloc  should  return  a  NULL.  IBM says it's a feature,  catch
> | SIGDANGER if you don't like it.

The way I read this, the complaint is from the normal-programmer point of
view:  There's a defined way to indicate that there's no more memory
available--return NULL from malloc().  SIGDANGER is an IBM invention.
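
To make the complaint concrete, here's a minimal sketch of how the quoted
behavior looks from a program's side (the 64Mb figure is just an arbitrary
"too big" size for illustration):

	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>

	int main(void)
	{
		size_t n = 64 * 1024 * 1024;	/* something "too big" */
		char *p = malloc(n);

		if (p == NULL) {		/* the defined failure report */
			fprintf(stderr, "out of memory\n");
			return 1;
		}
		memset(p, 0, n);	/* on an overcommitting system the
					   process may instead die here, long
					   after malloc() claimed to succeed */
		return 0;
	}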

> Yeah, I've heard complaints (and roses) on this one.
> The Rationale: Rather than panic the machine, we'd like for it to keep
> running as long as possible. Hence, we try to keep running at all costs,
> including doing things like this. So, when we do get close to the limit,
> we send a warning, then as we go over we start killing the biggest memory
> users. (Warning - the processes involved have been overly simplified).
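
For reference, catching the warning looks roughly like this; it's a minimal
sketch, assuming an AIX <signal.h> that defines SIGDANGER (on anything else
the #ifdef simply drops it out):

	#include <signal.h>
	#include <unistd.h>

	#ifdef SIGDANGER			/* AIX-specific */
	static void danger(int sig)
	{
		/* Very little is safe inside a signal handler; write() is,
		   so at least leave a trace that paging space is low. */
		write(2, "paging space low\n", 17);
	}
	#endif

	int main(void)
	{
	#ifdef SIGDANGER
		signal(SIGDANGER, danger);
	#endif
		/* ... the rest of the program ... */
		return 0;
	}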

As various folks have pointed out, various UNIX systems have had more-or-
less graceless responses to running out of (memory+swap).  One might
therefore ask that a new behavior be better, not just different.

The "mistake" (if I may call it that) in what Mark is saying is treating
the overcommitment of memory/pagespace as anything other than a kernel
problem.  The kernel created the problem by overallocating, so the kernel
(being the piece of code responsible for allocating/managing the
hardware!) should solve it rather than handing it back to the
applications.  Look at the problem from the application point of view.

> The Idea was to make the machine 'more reliable'...

I'll object to the idea that killing some arbitrary process makes the
machine "more reliable".  If you want "more reliable", don't overcommit!

>...Our research led us
> to believe that many processes allocated more memory than actually used in
> page space (I think) and we used this knowledge...

There's something wrong with this.  What type of programs were studied in
this "research"?  I know that typical style in C is:
	p = (struct whatzit *)malloc(sizeof(struct whatzit));
	...
	p->thing1 = stuff1;
	p->thing2 = stuff2;
where "..." is rarely more than a check for NULL.
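
Spelled out (whatzit and the field names are just the placeholders from
the fragment above), the usual shape is something like:

	#include <stdlib.h>

	struct whatzit {
		int	thing1;
		int	thing2;
	};

	struct whatzit *new_whatzit(int stuff1, int stuff2)
	{
		struct whatzit *p;

		p = (struct whatzit *)malloc(sizeof(struct whatzit));
		if (p == NULL)
			return NULL;	/* the "..." is rarely more than this */
		p->thing1 = stuff1;	/* the new memory is touched right */
		p->thing2 = stuff2;	/* away, not at some distant point */
		return p;
	}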

The trouble with SIGDANGER is that it occurs at a time which makes no sense
to the programmer.  Just because you happened to touch some particular
piece of memory (and it's unlikely you really know where your page
boundaries are) for the first time...or worse yet, some *other* process
touched memory for the first time!...you get SIGDANGERed up 'side the head?
What do you do?  How did you get there?  It's fiendishly difficult to tie
it back to a real event in terms of what the program knows.  Add to that
two other considerations:
	- SIGDANGER is not portable.  While IBM may not mind having people
	  write IBM-specific code, many programmers find that requirement
	  objectionable (especially since it's hard to use; it's an anti-
	  feature).
	- There's a defined way to report insufficient memory to a program
	  (NULL from malloc()), and it happens in a way/place a programmer
	  can use.
...and you can see why a programmer would get upset.

> So, do we go back to blowing up processes that allocate too much memory,
> even though that memory may actually be there by the time the process
> actually uses it?...

In the case of C programs and malloc(), yes.  If you can't allocate usable
memory (meaning "usable" at the point of return from malloc()), you should
return NULL.  That doesn't "blow up" the process; it gives it a fair chance
to decide what to do.
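
What "a fair chance" can buy you in practice, as a sketch; the
release_caches() here is purely hypothetical, standing in for whatever a
real program could give back or clean up:

	#include <stdio.h>
	#include <stdlib.h>

	/* Placeholder: a real program might free caches, flush buffers,
	   or save the user's work here. */
	static void release_caches(void)
	{
	}

	void *xmalloc(size_t n)
	{
		void *p = malloc(n);

		if (p == NULL) {
			release_caches();	/* try to recover... */
			p = malloc(n);
		}
		if (p == NULL) {
			fprintf(stderr, "out of memory (%lu bytes)\n",
			    (unsigned long)n);
			exit(1);		/* ...or at least fail cleanly */
		}
		return p;
	}

None of that is possible for a process the kernel simply picks off.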
-- 
Dick Dunn     rcd at ico.isc.com -or- ico!rcd       Boulder, CO   (303)449-2870
   ...While you were reading this, Motif grew by another kilobyte.


