Case sensitive file names

std-unix at ut-sally.UUCP
Sat Oct 18 02:35:48 AEST 1986


From: cbosgd!cbosgd.ATT.COM!mark at ucbvax.berkeley.edu (Mark Horton)
Date: Fri, 17 Oct 86 11:20:32 edt
Organization: AT&T Medical Information Systems, Columbus

Don Provan raises some interesting questions about foreign languages.
In general, I think we know how to do a case insensitive comparison
appropriately, by extending a function (I think it's called strcoll,
but I don't have my X3J11 draft handy) defined in ANSI C; the function
is like strcpy, but the destination buffer gets a translation of the
string that will collate properly when a lexicographic comparison like
strcmp is used.  If we extend this function to also translate to one
case (as appropriate) and allow each country to define its own function,
it's technically possible to ignore case.  Whether it's fast enough for
the UNIX filesystem is unclear, although this problem is not restricted
to UNIX.
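
As a rough illustration of the idea above, here is a minimal sketch in Python. It folds a name to one case and then applies the collation transform, so that a plain comparison of the resulting keys ignores case. In the C standard as finally published, the transform with this strcpy-like behavior is named strxfrm, and Python's locale.strxfrm wraps it. The helper names fold_key and same_file_name are hypothetical, and str.casefold stands in for the per-country case mapping, which it does not really provide (it is one fixed folding, not a locale-specific one).

```python
import locale


def fold_key(name: str) -> str:
    """Return a key for `name` that compares equal regardless of case.

    Hypothetical helper: fold to a single case first, then apply the
    current locale's collation transform (locale.strxfrm).
    """
    return locale.strxfrm(name.casefold())


def same_file_name(a: str, b: str) -> bool:
    """Compare two names the way a case-insensitive filesystem might."""
    return fold_key(a) == fold_key(b)
```

A directory listing could be ordered with sorted(names, key=fold_key). The cost question raised above remains: every lookup would have to compute such a key for each name it compares.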

I think it would be interesting to hear what other, case-insensitive
operating systems do about these issues.  What do MS DOS, or VM/CMS,
or VMS, or whatever, do with their case insensitive file names in
Europe, or Japan, or wherever?

If the answer is that file names are restricted to use the same character
set as in the USA, and that extra letters are disallowed, then we need to
know how well this is accepted by the users on other systems.  Maybe it's
good enough.  Do users in other countries often create files whose names
contain extra letters?  If they try, does the shell get in the way if their
letter happens to be "|", for example?

If the answer is that other operating systems have forced other countries
to put up with Americanisms, and that POSIX is an opportunity to break new
ground by handling other languages properly, then by all means let's do it
right.  This might require 8 bit characters in file names, for example.

Incidentally, I've seen it claimed here that UNIX allows arbitrary byte
streams in file names.  Perhaps this is the intent, but in reality the
UNIX filesystem is far from a transparent path.  There are lots of
restrictions, some of which are:

	The slash character is special.
	The null character is special.
	Sequences of more than 14 chars not containing a slash are
		either illegal or only significant to 14 chars or
		significant to 256 chars, depending on the version of UNIX.
	Characters with the 8th bit turned on are not allowed.
	Since many commands take names beginning with "-" as flags,
		file names beginning with "-" don't always work.
	Since the shell treats many of the punctuation characters
		specially, file names containing space, #, $, &, *, (, ),
		[, ], ;, ', ", \, |, <, >, and ? do not always work
		properly.  Even if you quote them, the shell strips
		off the quotes, so that if multiple layers of shell
		are involved (for example, uux) it still fails.
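
The restrictions listed above can be sketched as a simple checker. This is an illustration only: the function name filename_problems, the wording of the messages, and the default limit of 14 are assumptions chosen to mirror the list, not the behavior of any particular UNIX version.

```python
# Bytes the shell treats specially, taken from the list above.
SHELL_SPECIAL = frozenset(b" #$&*()[];'\"\\|<>?")


def filename_problems(name: bytes, max_len: int = 14) -> list:
    """Return the reasons a single file name component may cause trouble.

    Hypothetical helper; max_len is an assumption and varies by system.
    """
    problems = []
    if b"/" in name:
        problems.append("contains slash")
    if b"\0" in name:
        problems.append("contains null")
    if len(name) > max_len:
        problems.append("longer than limit")
    if any(byte >= 0x80 for byte in name):
        problems.append("8th bit set")
    if name.startswith(b"-"):
        problems.append("leading dash")
    if any(byte in SHELL_SPECIAL for byte in name):
        problems.append("shell-special character")
    return problems
```

An empty result means none of the listed restrictions applies. The first four checks concern the filesystem itself, while the last two only matter when the name passes through a command or the shell.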

Because some of these problems only affect certain uses of the filesystem
(whether or not you go through the shell, whether or not you're going
through a command that takes arguments) it's not unusual for casual users
to create a file and then have trouble using, renaming, or even removing it.
I recall that removing a file whose 8th bit has been set is a frequent topic
on net.unix.
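
To illustrate the point that the trouble depends on the path taken to the filesystem, here is a small sketch that creates an awkwardly named file and removes it with a direct unlink call, so neither shell parsing nor a command's flag parsing is involved. The function name create_and_remove is hypothetical, and the scratch directory is only there to keep the experiment contained.

```python
import os
import tempfile


def create_and_remove(name: str) -> bool:
    """Create a file called `name` in a scratch directory, then unlink it.

    Returns True if the file no longer exists afterwards.  The name is
    handed straight to the system call, with no shell in between.
    """
    with tempfile.TemporaryDirectory() as scratch:
        path = os.path.join(scratch, name)
        with open(path, "w"):
            pass  # create an empty file
        os.unlink(path)
        return not os.path.exists(path)
```

From the shell, the usual workaround for a leading "-" is to name the file as ./-rf, so that the argument no longer begins with a dash.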
	
If the filesystem were really transparent, the designers of /proc would
not have had to encode process IDs in ASCII digits; they could have
directly used the binary representation.
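
A short worked example shows why a binary process ID does not survive as a file name. The 4-byte little-endian layout is an assumption made for illustration, and binary_name and usable_as_name are hypothetical helpers.

```python
import struct


def binary_name(pid: int) -> bytes:
    """Pack a PID as a 4-byte little-endian integer (assumed layout)."""
    return struct.pack("<I", pid)


def usable_as_name(raw: bytes) -> bool:
    """A file name component may contain neither a null nor a slash."""
    return b"\0" not in raw and b"/" not in raw
```

Under this layout, PID 1 packs to the bytes 01 00 00 00, which contain nulls, and PID 47 begins with byte 0x2F, which is the slash character. The ASCII digits "1" and "47" have neither problem.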

It's for these reasons that I feel that a conservative UNIX user should
restrict themselves to certain "reasonable" filename conventions; basically
using only lower case letters, digits, and a few safe punctuation characters
such as . and - in their filenames.  Just because it's possible to put a
space in a file name doesn't make it a good idea.
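
A minimal sketch of such a convention, written as a check that a name uses only lower case letters, digits, "." and "-". Two details are assumptions added here rather than part of the recommendation above: the first character must be a letter or digit (so a name cannot start with "-" or "."), and the length is capped at 14. The names CONSERVATIVE and is_conservative are hypothetical.

```python
import re

# First character: lower case letter or digit (assumption).
# Remaining characters: lower case letters, digits, "." or "-".
CONSERVATIVE = re.compile(r"[a-z0-9][a-z0-9.-]*")


def is_conservative(name: str, max_len: int = 14) -> bool:
    """Return True if `name` follows the conservative convention."""
    if len(name) > max_len:
        return False
    return CONSERVATIVE.fullmatch(name) is not None
```

For example, is_conservative("notes.txt") is true, while "My File" fails on both the capital letters and the space.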

	Mark

Volume-Number: Volume 7, Number 67