Directory compression

Chris Torek chris at mimsy.umd.edu
Sun Aug 19 18:18:07 AEST 1990


In article <7913 at lynx.UUCP> m5 at lynx.uucp (Mike McNally) writes:
>Does 4.3BSD (or anything else) dynamically compress directories when
>entries are removed?  If so, how does the system deal with
>consistency?

I cannot speak for `anything else', but the answer for 4BSD is `no'.
Recent versions of 4BSD will reduce the sizes of directories, but not
when files are removed: only when files are created.  (There is another
answer to the above question, which appears below.)

The code goes something like this (the order is a bit different):

	if (this operation is not CREATE or RENAME)
		we can use the name cache;
	else {
		we have to scan the entire directory anyway;
		we need a slot with enough space for this name;
	}
	if (we can use the cache && we should use the cache) {
		if (cache lookup yeilds valid result)
			return result;
		remove invalid entry, if any, from cache;
	}
	enduseful = 0;
	for (offset = 0; offset < dirsize; offset += cur_record_len) {
		get current record stuff;
		if (entry is valid) {	/* found a nonempty slot */
			if (this entry matches) {
				if (we can use the cache)
					add it to the cache;
				return it;
			}
			enduseful = offset + cur_record_len;
		} else {		/* found an empty slot */
			if (this slot is big enough && we need a slot)
				remember this slot;
		}
	}
	/* entry not found */
	if (we can use the cache) {
		/* 4.3BSD-Reno only: */
		add a `known not present' entry to the cache;
	}
	save slot information, if any;
	return (NOTFOUND);

Later, if we decide to write to the directory:

	if (enduseful < dirsize) {
		/* directory can be shrunk */
		itrunc(inode for directory, enduseful);
	}

The logic is this:  if we are creating a new entry (CREATE or RENAME
operations), a cache lookup will only tell us that we cannot proceed,
not that we can.  This is rarely helpful (the typical case is that we
can proceed, and looking in the cache will just mean spending some time
only to get a `no information' result).  Otherwise (for a LOOKUP or a
DELETE) we can use the cache; however, profiles gathered on ucbvax
showed that cacheing DELETE operations was not a win, so it is not
done.

In any case, if the cache lookup fails or cannot be done, the whole
directory must be scanned.  (This is actually done in two parts:
rather than from 0 to N, the scan goes from M to N, then 0 to M, where
M is the place the last search ended.  This means that, e.g., if you
read a directory, then stat each file in the directory in directory
order, the loop always hits the next entry `perfectly'.)  While we are
scanning it, however, we might as well find the place (if any) that a
new name could be stashed, and---since it only takes one more
variable---the last place there are any non-empty slots in the
directory.  If we later decide to scribble on the directory (which
means we will have to write back the inode), we may as well trim the
directory at the same time.  Since this was all `detritus', any user
program that is about to read this part of the directory is not losing
any data if it gets EOF instead.

This still leaves the other possible question:  `Since records within
4BSD-style directories are variable length, it would be possible to
take an existing directory and make it smaller by packing the names in
a better way.  That is, we have a bin-packing problem, and when the
contents of the bins change a different packing might be more optimal.
Does any 4BSD system do such a repacking?'  The answer to this question
is `sort of'.

In the pseudo-code above, there really should be three states for the
`found a big enough slot' variable: `no', `yes', and `yes, but only
after moving stuff around'.  (These are actually three enumerations,
NONE, FOUND, and COMPACT respectively.)  The `compaction' state,
however, applies only to a single directory block (512 bytes on the
VAX, 1024 bytes on the Tahoe).  Some records in one block might get
moved (uniformly towards the front, in this case) in order to combine
free space from two different records.  That is:

	entry for foo+free space|entry for bar+free space
	<----200----> <---30--->|<----200----> <---82---> (bytes)
	<-------record 0------->|<-------record 1------->
	    (total size 230)	      (total size 282)

(assuming VERY long file names :-) ).  If we wanted to add a 95-byte
entry, it would fit in this 512-byte block, but only if we moved record
1 somewhat leftwards.  In this case we might get a slot status of
`COMPACT'.

When this happens, the directory rewriting code moves entries leftwards
as necessary, then fills in the (newly expanded) hole.  This means that
a program that (stupidly) reads directories one byte at a time may get
an inconsistent picture of the directory block.  Any program that reads
a whole directory block at a time, however, will get a consistent
picture; and any program that uses getdirentries()---this is required
for NFS, among other things, and is used by the current readdir()---will
likewise get a consistent picture.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris at cs.umd.edu	Path:	uunet!mimsy!chris
	(New campus phone system, active sometime soon: +1 301 405 2750)



More information about the Comp.unix.wizards mailing list