Bus error DURING call to malloc()

Wed Jun 13 04:26:11 AEST 1990

In article <62083 at sgi.sgi.com> yohn at tumult.asd.sgi.com (Mike Thompson) writes:
>In article <14525 at thorin.cs.unc.edu>, taylorr at glycine.cs.unc.edu (Russell Taylor) writes:
>> 
>> 	We are running OS 3.2.2 on an IRIS 4D/240GTX.  I ran a program and
>> got the proverbial 'Bus error (core dumped)' message.  The catch is that
>> when I run dbx and look for the error, it tells me that the error occurred
>> IN malloc():
>> ...
>> 	There are several calls to malloc() in the code.  There have been
>> successful calls before this call is made.  All calls are passed constant
>> references, and this code compiles and runs correctly on a variety of other
>> machines (VAX, sun 4, DecStation).
>> ...
>> 	Is there a known bug (and hopefully fix) for this?
>
>I cannot guarantee that there are no bugs in malloc (I assume you are
>getting malloc from libc), but I don't know of any (besides performance
>problems when allocating many memory areas).  But I have seen many,
>many user programs that bomb in malloc because the user code overran
>the memory allocated by a call to malloc.  malloc(strlen(s)) and
>copying s is a classic way to get into trouble (user forgets that
>strlen does not account for the trailing null character) -- there are
>many other possibilities.
>
>Since malloc(3X) -- the malloc in /usr/lib/libmalloc.a -- aligns
>requests to eight-byte boundaries and malloc(3C) aligns only to
>four-byte boundaries, switching to libmalloc may help, if only because
>it masks the problem by giving the caller a little more unrequested
>rounding space.
>

i've examined a number of malloc() problems throughout the last 7 years
or so, and have always traced the problem back to the application...
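
as a sketch of the classic application mistake quoted above (malloc(strlen(s))
followed by a copy), here is one way to size the buffer correctly.  the
helper name dup_string is hypothetical and is not taken from the original
poster's program; it only illustrates the "+ 1" that usually gets forgotten.

```c
#include <stdlib.h>
#include <string.h>

/* dup_string: hypothetical helper, for illustration only.
 * returns a heap copy of s, or NULL if malloc() fails. */
char *dup_string(const char *s)
{
    size_t len = strlen(s);        /* does NOT count the trailing '\0' */
    char *copy = malloc(len + 1);  /* + 1 leaves room for the terminator */

    if (copy == NULL)
        return NULL;
    memcpy(copy, s, len + 1);      /* copies the characters and the '\0' */
    return copy;
}
```

with malloc(len) instead of malloc(len + 1), the copy writes one byte past
the end of the block, and whether that hurts depends on what the allocator
put there.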

there are a couple of good reasons that malloc() usage problems are masked
on some machines, and by some malloc libraries, but not on others.

first of all, i know that a number of VMS programs have malloc problems
once they are ported to unix.  the VMS malloc rounds the request up to
the nearest multiple of 512 (page size).  then it skips the next virtual
page.  this turns out to be a great debug tool since you get core dumps
when you hit the next page instead of quietly corrupting some other data
structure.  unfortunately, the granularity is only at the page level,
so small problems are masked and only surface in other environments.
VAX unix may act similarly, but i don't know for sure.
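
the rounding described above can be sketched with a little arithmetic.  this
is only a model of the behavior as i described it, not VMS code; the names
VMS_PAGE, round_to_page and silent_slack are made up for the example.

```c
#include <stddef.h>

#define VMS_PAGE 512   /* page size quoted above; an assumption of this model */

/* round a request up to the next multiple of the page size */
size_t round_to_page(size_t request)
{
    return (request + VMS_PAGE - 1) / VMS_PAGE * VMS_PAGE;
}

/* number of bytes past the requested size that an overrun can
 * scribble on without ever reaching the skipped guard page */
size_t silent_slack(size_t request)
{
    return round_to_page(request) - request;
}
```

a 5-byte request is rounded up to 512, which leaves 507 bytes of slack, so
the off-by-one overrun never touches the guard page.  only a request that is
an exact multiple of 512 has no slack at all.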

the traditional libc malloc approach uses a linked list scheme where the
next pointers are embedded in the memory arena.  if you overwrite a chunk
of malloced memory, you corrupt the linked list and the next call to
malloc() will traverse into the boonies.  the libmalloc approach keeps
the pointers into the memory arena in a separate area and therefore, if
you overwrite a chunk of malloced memory, you may corrupt some other data
structure that doesn't really matter anyway... (at least not at the time).
since the next pointers are saved from corruption, malloc() won't dump
core.  but you still have a problem lurking out there somewhere.
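
to make the embedded-pointer case concrete, here is a toy model, not the
real libc malloc.  the "next pointer" is a one-byte index stored in the
arena right in front of each payload, and every name in it (arena, toy_init,
toy_payload_a, toy_walk) is invented for the example.  unlike a real malloc,
the walk checks each link and reports a bad one instead of wandering off.

```c
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE 32
#define PAYLOAD     8      /* usable bytes in each toy block */
#define END_MARK 0xFF      /* header value meaning "no next block" */

/* layout: [hdr A][8 bytes payload A][hdr B][payload B ...] */
static unsigned char arena[ARENA_SIZE];

/* build a two-block list whose links live inside the arena */
void toy_init(void)
{
    memset(arena, 0, sizeof arena);
    arena[0] = 1 + PAYLOAD;          /* block A's header -> block B's header */
    arena[1 + PAYLOAD] = END_MARK;   /* block B is the last block */
}

/* the pointer a caller would get back for block A */
unsigned char *toy_payload_a(void)
{
    return &arena[1];
}

/* follow the links; return the number of blocks, or -1 on a bad link */
int toy_walk(void)
{
    size_t pos = 0;
    int count = 0;

    for (;;) {
        unsigned char next = arena[pos];

        count++;
        if (next == END_MARK)
            return count;
        if (next <= pos || next >= ARENA_SIZE)
            return -1;               /* a real malloc would just crash here */
        pos = next;
    }
}
```

after toy_init(), toy_walk() returns 2.  filling exactly PAYLOAD bytes of
block A leaves the list intact, but writing PAYLOAD + 1 bytes lands on block
B's header and the next toy_walk() returns -1.  that is the same effect as
the bus error showing up inside a later, innocent call to malloc().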

i think i'd stick to the old malloc() and narrow the problem down more.
if you mask this symptom, you will make it even more difficult to isolate
a problem further down the road.

-- mds	[aka Mark D Stadler  mds at sgi.com  ...!uunet!sgi!mds  (415)335-1327]