GNU Emacs, memory usage, releasing

Piercarlo Grandi pcg at aber-cs.UUCP
Fri Dec 29 12:04:26 AEST 1989


There has been some discussion recently about the memory usage of GNU Emacs.
I had stated that, using my sbrk-lowering malloc package, I could get
micrognu etc... to release memory to the OS, but not GNU emacs. I speculated
that this was because, when a buffer is allocated, something is
allocated after it that is not released when the buffer is
deallocated. Well, I was wrong, but there is an interesting story
nonetheless.

I looked at the code in buffer.c and insdel.c, which are the low level
editing engine parts of GNU emacs.

What happens is that when a buffer is created, a buffer header is
allocated, and then memory for the buffer contents is reserved (just a
little initially, 20 bytes). When the buffer is killed, the contents
and some other data structures are released, but *not* the buffer
header and some dependent data structures. These are only released at
the next garbage collection. Indeed, if I force a garbage collection
after killing a buffer, the virtual address space of my GNU emacs
process shrinks, as it should (using the proper malloc library; don't
ask for it, by the way: it has been posted to alt.sources, and I will
shortly contribute it to comp.sources and/or to the FSF).

This is not the end of the story. It also appears that the buffer
contents are kept using the "gap" method, in which the entire file is
represented as a single huge string with a "gap" at the (insertion)
point. When the gap has been filled by too many insertions, a new gap
is opened by shifting the tail of the buffer upwards.
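
To make the discussion concrete, here is a minimal sketch of the
method as I understand it; the names are mine, not those of buffer.c
or insdel.c, and memmove(3) stands in for whatever copy loop is
actually used:

    #include <string.h>

    /* A minimal gap buffer; illustrative names, not Emacs's own.
       text[0..gap_pos) and text[gap_pos+gap_len..size) hold the file,
       text[gap_pos..gap_pos+gap_len) is the free gap. */
    struct gapbuf {
        char   *text;
        size_t  gap_pos;    /* where the gap starts */
        size_t  gap_len;    /* bytes free in the gap */
        size_t  size;       /* total bytes, contents plus gap */
    };

    /* Moving point to POS means sliding the gap there: every byte
       between the old and new positions is copied across the gap. */
    static void
    move_gap (struct gapbuf *b, size_t pos)
    {
        if (pos < b->gap_pos)           /* slide gap left */
            memmove (b->text + pos + b->gap_len, b->text + pos,
                     b->gap_pos - pos);
        else if (pos > b->gap_pos)      /* slide gap right */
            memmove (b->text + b->gap_pos,
                     b->text + b->gap_pos + b->gap_len,
                     pos - b->gap_pos);
        b->gap_pos = pos;
    }

    /* An insertion at POS then just fills one byte of the gap. */
    static void
    insert_byte (struct gapbuf *b, size_t pos, char c)
    {
        move_gap (b, pos);
        b->text[b->gap_pos++] = c;
        b->gap_len--;       /* when this reaches 0, a new gap must be
                               opened: see point 7) below */
    }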

Unfortunately, as it is implemented in GNU emacs, the "gap" method is
a disgrace for performance, because:

1) every time you move the insertion point by N characters, N
characters are *copied* from one end of the gap to the other to shift
it.

2) thank goodness the gap is moved only when an insertion actually
happens at a new spot, not every time point moves. Too bad that it
takes three seconds on a SUN 3/50 to move the gap by 750KB...
something like 250KBytes per second of bandwidth, which is more like a
poorly optimized disc or an Ethernet than like main memory.

3) The copy is effected by the classic C loop, in segments of 32000 bytes:

      while (--i >= 0)
	*--to = *--from;

when instead, of course, if i is larger than a few dozen bytes,
memcpy(3) or bcopy(3) should be used, in the hope that these do cache
line sized and aligned transfers, avoiding the catastrophic
performance of byte-at-a-time writes.
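
For instance, with i, to and from as in the loop above, the whole move
collapses to a single library call (a sketch; memmove(3) is the ANSI
name, and like bcopy(3), but unlike memcpy(3), it is guaranteed to
cope with the overlap that the backwards byte loop is there to
handle):

    memmove (to - i, from - i, i);  /* or: bcopy (from - i, to - i, i); */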

4) You may argue that since the gap is created at about 2000 bytes, and
usually many insertions are done at the same point, the cost of moving
the gap is spread across many insertions. Too bad that the cost is still
so terribly high.

5) the idea of using the gap method is not too convincing in itself,
mostly because of its poor virtual memory profile. Most movement
around the huge string that is the buffer contents will wreak havoc
with most paging algorithms, because it is strictly sequential, and
this is a known bad pattern of reference for most LRU-based paging
algorithms. When the gap is moved, hell ensues. Programs with
sequential patterns of access to memory have been reported to improve
their performance markedly when it was possible to inform the pager of
this peculiarity.

	By the way, this is why, all other advantages notwithstanding,
	memory mapped files often perform less well than traditional IO;
	the latter is usually heavily tuned to expect sequential access
	patterns, both as to reading ahead and to forgetting behind.
	Larger page sizes help a bit with the former, but with some
	loss elsewhere.
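
Where the system allows it, the hint can actually be given; a sketch,
assuming a SunOS-style madvise(2) (the wrapper name is mine, and on
systems without the call this compiles to nothing):

    #include <sys/types.h>
    #include <sys/mman.h>

    /* Hint that ADDR..ADDR+LEN will be swept sequentially, so that
       the pager can read ahead and forget behind. */
    static void
    hint_sequential (char *addr, size_t len)
    {
    #ifdef MADV_SEQUENTIAL
        madvise (addr, len, MADV_SEQUENTIAL);
    #endif
    }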

6) Having a linked list of lines is a fairly cheap technology, and
inasmuch as it can make the working set smaller (no need to move the
gap) it will often actually *save* memory, even if it has an overhead
for the links (which for small files is often smaller than the cost of
keeping a gap).

7) Most interestingly, when the gap closes, because we have inserted
enough data to fill it, the *entire* buffer contents are
realloc()ated. If the buffer is not the top block in virtual memory,
or it is but you have a realloc() that will not attempt to extend a
block in place, and simply allocates a new block and copies the old
into the new, you are in for extra overheads. These are no longer
proportional to the amount of work you do, but to the total size of
the file as well, which may be bad news.
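
Continuing the earlier sketch, this is the step in question; when
realloc() cannot grow the block in place, every byte of the buffer is
copied, however small the insertion that triggered it (names mine
again, error handling minimal):

    #include <stdlib.h>

    /* Re-open the gap once insertions have exhausted it.  A copying
       realloc() touches all b->size bytes; the memmove() then shifts
       the tail once more to open the new gap. */
    static int
    grow_gap (struct gapbuf *b, size_t extra)
    {
        size_t tail = b->size - (b->gap_pos + b->gap_len);
        char *p = realloc (b->text, b->size + extra);

        if (p == NULL)
            return -1;
        memmove (p + b->gap_pos + b->gap_len + extra,
                 p + b->gap_pos + b->gap_len, tail);
        b->text = p;
        b->gap_len += extra;
        b->size += extra;
        return 0;
    }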

8) just as interestingly, realloc()ating the buffer will leave large
holes in your virtual address space, which will expand forever, taking
up swap space if nothing else. The holes are also likely to get
fragmented as your session progresses (remember, GNU emacs has such
high startup overheads that you are supposed to start it up once as a
server and then use emacsclient as the "real" editor), and therefore
the virtual memory profile of your session will worsen steadily.

9) GNU emacs also has some other interesting overheads. The screen display
procedure is no speed daemon either...

What are the solutions?

Well, essentially GNU Emacs is not built for paged virtual memory
machines; it is designed for core-based segment swapping systems. It
is not by chance that the gap method was very popular on base+limit
relocation machines, and (I think) it comes from the editor on the MIT
CTSS.

There are palliatives of course. The default size of the gap could be
increased; this would make realloc()ations less frequent, and would
not greatly increase the working set size, as the pages making up the
gap are never referenced. A gap of, say, 4 pages per buffer may do,
and may actually waste less address space than more frequent
realloc()ations would. Also use a nice realloc() that tries as hard as
possible to avoid copying blocks that are being extended. Improving
the locality of the lisp area with a compacting collector may help, as
would a faster, simpler redisplay. Profiling GNU emacs could be great
fun, I suppose.
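
The first palliative is nearly a one-liner; a sketch, assuming a
BSD-style getpagesize(2) (the function name is mine):

    #include <unistd.h>

    /* Open new gaps at a whole number of pages instead of ~2000
       bytes.  Gap pages are never referenced until used, so they
       cost address space, not working set. */
    static size_t
    default_gap_bytes (void)
    {
        return 4 * (size_t) getpagesize ();
    }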

The better solution, made relatively easy by the reasonably modular
and layered structure of GNU emacs, would be to accept The Emacs
Cookbook recommendation (adopted by Jove and the MicroEmacs/Gnu family
of editors) and implement a linked list system. I would suggest just
reading (or mapping, on the operating systems that allow it) the file
to be edited as a large lump of virtual address space, then building a
linked list of pointers to lines therein, and then allocating modified
lines from a general purpose arena. Informing the OS of the peculiar
access patterns would also help, where possible.
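
A sketch of what I have in mind (all names are mine, and error
handling is omitted): the file sits read-only in one lump, the
descriptors point into it, and only lines that actually get modified
are copied into the arena:

    #include <stdlib.h>

    struct line {
        struct line *next;
        char        *text;     /* into the mapped lump, or the arena */
        size_t       len;
        int          modified; /* nonzero: text now lives in the arena */
    };

    /* Scan the lump once for newlines; only the descriptors are
       allocated, no text is copied. */
    static struct line *
    build_lines (char *lump, size_t nbytes)
    {
        struct line *head = NULL, **tailp = &head;
        size_t i = 0;

        while (i < nbytes) {
            struct line *l = malloc (sizeof *l);
            size_t start = i;

            while (i < nbytes && lump[i] != '\n')
                i++;
            l->text = lump + start;
            l->len = i - start;
            l->modified = 0;
            l->next = NULL;
            *tailp = l;
            tailp = &l->next;
            if (i < nbytes)
                i++;           /* skip the newline itself */
        }
        return head;
    }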

To keep the line descriptor small (and thus its overhead), it would be
perfectly feasible, I think, to have for example a byte-sized line
length field, and then use hash linking to store the lengths of the
(expectedly few) lines longer than 254 characters. Many other tricks
could be used.
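
A sketch of the trick (illustrative throughout: the names, the flag
value and the fixed table size are all mine): one byte holds lengths
up to 254 directly, and 255 means "look the real length up in a side
table keyed by the descriptor's address":

    #include <stddef.h>

    #define LONG_LINE 255          /* flag: length is in the side table */

    struct cline {                 /* compact line descriptor */
        struct cline  *next;
        char          *text;
        unsigned char  len;        /* 0..254 directly, 255 = look aside */
    };

    /* A tiny open-addressed table for the rare long lines; a real
       implementation would grow it and support deletion. */
    #define NSLOTS 64
    static struct { struct cline *key; size_t len; } longtab[NSLOTS];

    static size_t *
    long_len_slot (struct cline *l)
    {
        size_t h = ((size_t) l / sizeof (struct cline)) % NSLOTS;

        while (longtab[h].key != NULL && longtab[h].key != l)
            h = (h + 1) % NSLOTS;
        longtab[h].key = l;
        return &longtab[h].len;
    }

    static size_t
    line_len (struct cline *l)
    {
        return l->len == LONG_LINE ? *long_len_slot (l) : l->len;
    }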

I am very busy currently doing other work for the greater glory of the
FSF, so I hope somebody else will enjoy the honour of doing something
about this.
If Toshiba, Matsushita, Samsung and the other DRAM suppliers don't get
him/her first :->.
-- 
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk at nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg at cs.aber.ac.uk


