Computational complexity of rm & ls

Arnold D. Robbins {EUCC} arnold at mathcs.emory.edu
Sat Mar 11 04:17:10 AEST 1989


In article <9000012 at m.cs.uiuc.edu> wsmith at m.cs.uiuc.edu writes:
>The 4.3 Unix 
>appeared to be quadratic in system time for this operation, while the 4.2 Unix
>was cubic or worse in the amount of system time required!  
>
>Surely rm can do better than cubic, can't it?   "rm *" seems to be a common
>enough idiom that rm could be optimized to handle that case better.   I would
>like rm to be linear in system time.  Is that possible?

This is not really an 'rm' issue; it's a file system issue. The shell (pick
a shell, any shell: sh, csh, ksh) sorts the expansion of *; the order the
filenames appear in the directory is irrelevant to rm. Rm then unlinks each
file in turn, which, since alphabetical order bears no relation to directory
order, effectively deletes entries at random from the middle of the
directory.  4.3 made a number of changes to directory handling over 4.2,
one of which was to compact the blocks of a directory, where possible,
as entries are deleted.
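
If you want to see the two orderings for yourself, something like this
will show them (just a sketch; I'm assuming a scratch directory of your
own, and a BSD ls whose -f flag lists entries in raw directory order,
unsorted):

	cd /tmp/scratch		# some throwaway directory with files in it
	echo *			# the sorted expansion the shell hands to rm
	ls -f			# raw directory order; . and .. show up here too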

I'll bet that if you were to run

	rm `ls -f`
and
	rm `ls -f | tac`

you'd see very interesting behavior (i.e., removing files starting from
the first and from the last entry in the directory, respectively), since
each unlink has to search the directory from scratch to find its entry.
(Actually, on 4.3 the second command would be pathologically bad, while the
first would be really quick, due to the directory caching in namei.)
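
If you want to try the timings, here is roughly how I'd set it up (a
sketch only: the /tmp path, the file count, and the Bourne shell loop are
my own choices; rm will gripe harmlessly about the '.' and '..' that
ls -f emits; and if you have no tac, 'tail -r' will reverse the list on
a BSD system):

	mkdir /tmp/rmtest
	cd /tmp/rmtest
	i=0
	while [ $i -lt 500 ]		# create a pile of empty files
	do
		: > f$i
		i=`expr $i + 1`
	done
	time rm `ls -f`			# unlink in directory order

	# repopulate the directory the same way, then:
	time rm `ls -f | tac`		# unlink in reverse directory order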

I guess the point here is to know where to point the finger; since it
was all system time, it's not something that can really be fixed in rm.
Your 4.2 vendor should upgrade their filesystem code to 4.3.
-- 
Unix is a Registered   | Arnold Robbins -- Emory University Computing Center
Bell of AT&T Trademark | DOMAIN: arnold at unix.cc.emory.edu		
Laboratories.          | UUCP: gatech!emory!arnold	PHONE:	+1 404 727-7636
        -- Donn Seeley | BITNET: arnold at emoryu1		FAX:	+1 404 727-2599


