Computational complexity of rm & ls

wsmith at m.cs.uiuc.edu wsmith at m.cs.uiuc.edu
Sun Mar 12 03:25:00 AEST 1989


>To clean up my mess, I removed the files in two steps:  first deleting
>half with `rm *a` and then deleting the rest with `rm *`.  The 4.3 Unix
>appeared to be quadratic in system time for this operation, while the 4.2 Unix
>was cubic or worse in the amount of system time required!
>(I'm only measuring the curve at 2 points, so it is hard to get a good 
>estimate of the complexity.)
>
>Surely rm can do better than cubic, can't it?   "rm *" seems to be a common
>enough idiom that rm could be optimized to handle that case better.   I would
>like rm to be linear in system time.  Is that possible?

Please ignore these two paragraphs.   My method of testing rm was invalid.

The reason is that on some systems, the way directories work causes
the time to remove several files to depend on the order in which they
were created in the directory.   The directory is typically searched
linearly from the front, so it is more work to delete the most recently
created file than it is to delete the first file that was created.
(There is probably some imprecision in the previous sentence because
creation and deletion can be interleaved.  I meant only to cover the case
of a single file creation phase followed by a single file deletion phase.)
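For anyone who wants to see the effect for themselves, here is a minimal
timing sketch (my own, not something from the original test): it creates
NFILES empty files in the current directory and then unlinks them either
in creation order or, with -r, in reverse order, reporting the system time
consumed.  NFILES, the naming scheme, and the use of times() are all
illustrative choices; on a filesystem that searches directories linearly,
the reverse-order pass should cost noticeably more.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/times.h>

#define NFILES 2000

static void name_for(int i, char *buf)
{
    sprintf(buf, "f%05d", i);              /* f00000, f00001, ... */
}

int main(int argc, char **argv)
{
    int reverse = (argc > 1 && strcmp(argv[1], "-r") == 0);
    char name[32];
    struct tms t0, t1;
    int i, fd;

    /* Creation phase: entries appear in the directory in ascending order. */
    for (i = 0; i < NFILES; i++) {
        name_for(i, name);
        fd = open(name, O_CREAT | O_WRONLY, 0644);
        if (fd < 0) { perror(name); exit(1); }
        close(fd);
    }

    /* Deletion phase: forward (creation) order, or reverse order with -r. */
    times(&t0);
    for (i = 0; i < NFILES; i++) {
        name_for(reverse ? NFILES - 1 - i : i, name);
        if (unlink(name) < 0) { perror(name); exit(1); }
    }
    times(&t1);

    printf("%s order: %ld clock ticks of system time\n",
           reverse ? "reverse" : "creation",
           (long)(t1.tms_stime - t0.tms_stime));
    return 0;
}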

(It has also been pointed out that I am not really measuring the complexity 
of rm or ls at all, but am instead measuring the kernel's name-to-inode 
translation behavior.)  

What brought this up in the first place was an attempt to decide whether 
it was better to have 10000 or more files in one directory, to have a 
fraction of the files in each of several directories, or, in the extreme 
case, to have 100 of the files in each of 100 directories, if I wanted to 
be able to open() individual files quickly.
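To make the fanned-out alternative concrete, here is a hypothetical sketch
of the 100-directories layout: a helper hashes each file name into one of
NBUCKETS subdirectories (d00 .. d99) so that every per-directory search
stays short.  The bucket count, the dNN naming, the hash, and the
open_bucketed() helper are my own illustrative choices; whether this beats
one big flat directory in practice depends on the kernel's name lookup and
caching behavior.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

#define NBUCKETS 100

/* Map a name to "dNN/name", creating the bucket directory on demand. */
static int open_bucketed(const char *name, int flags, int mode)
{
    unsigned hash = 0;
    const char *p;
    char bucket[16], path[1024];

    for (p = name; *p; p++)                /* simple string hash */
        hash = hash * 31 + (unsigned char)*p;

    sprintf(bucket, "d%02u", hash % NBUCKETS);
    mkdir(bucket, 0755);                   /* harmless if it already exists */
    sprintf(path, "%s/%s", bucket, name);
    return open(path, flags, mode);
}

int main(void)
{
    /* Creation and later lookups use the same hash, so a file is
     * always found in the same small bucket directory. */
    int fd = open_bucketed("example.dat", O_CREAT | O_RDWR, 0644);
    if (fd >= 0) {
        printf("opened example.dat via its bucket directory\n");
        close(fd);
    }
    return 0;
}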

Bill Smith
wsmith at cs.uiuc.edu
uiucdcs!wsmith


