Case sensitive file names

Moderator, John Quarterman std-unix at ut-sally.UUCP
Sun Oct 26 12:51:48 AEST 1986


From: guy at sun.com (Guy Harris)
Date: Mon, 20 Oct 86 10:49:33 PDT

Responses to a couple of messages:

From Mark Horton:

> Any solution to this problem must be in the kernel, or possibly
> in libc underneath such subroutines as open, unlink, and chmod, (if you
> have shared libraries or full source to recompile) or it won't work all
> the time.

Any solution to this problem must be applied to operating systems other than
UNIX.  As John Bruner pointed out, mandating case-insensitivity will only
have the effect of removing UNIX from the list of standard-conforming
systems.  Changing the semantics of file names at this late date is unlikely
to meet with approval from many UNIX vendors and users.  For one thing, what
are you going to do about directories that contain files named, say,
"makefile" and "Makefile" (yes, they exist)?  You may feel that having
directories like this is a mistake, but declaring them to be a mistake isn't
going to make them go away.

There seem to be two issues here:

1) Should POSIX mandate case-sensitivity?

2) Should UNIX be changed to be case-insensitive if POSIX doesn't mandate
case-sensitivity?

These are rather separate issues.  A case can be made that POSIX should not
mandate case-sensitivity.  Applications must then not depend on
case-sensitivity.  This will affect programs that create files with names
other than those provided by the user.  It could also affect programs that
*read* directories, since they'd have to know that "foobar" and "FoOBaR"
refer to the same file.
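
To make the directory-reading point concrete, here is a minimal sketch of
how a program might find names that collide once case is ignored.  The
function name case_collisions is hypothetical, and folding with lower()
is an assumption that only holds for plain ASCII names.

```python
from collections import defaultdict


def case_collisions(names):
    """Group names that differ only in letter case (ASCII folding assumed)."""
    groups = defaultdict(list)
    for name in names:
        # Names that fold to the same key would be one file on a
        # case-insensitive system.
        groups[name.lower()].append(name)
    # Keep only the keys that more than one original name maps to.
    return {key: group for key, group in groups.items() if len(group) > 1}


listing = ["makefile", "Makefile", "foobar", "README"]
print(case_collisions(listing))  # {'makefile': ['makefile', 'Makefile']}
```

A directory holding both "makefile" and "Makefile" shows up as one
collision group; what a case-insensitive system should then do with the
two files is exactly the open question.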

I see great difficulty in changing UNIX to be case-insensitive, however.  It
certainly wouldn't pose any great *implementation* difficulties, but I would
not like to bet that no user or program would be greatly affected.

From Mark R. Crispin:

>     It seems that the two sides in this issue boil down to this:
> . "gee, since we're defining a standard portable operating system
>   that isn't necessarily the present de facto Unix, let's fix
>   this case sensitivity cretinism"
> . "case sensitivity is what makes Unix better than any other
>   operating system, and only a cretin can't understand why this
>   is wonderful"

Not really.  A POSIX standard that does not *mandate* case-sensitivity need
not *forbid* it.  And I have seen *no* arguments that "case sensitivity is
what makes UNIX better than any other operating system."

>      Let's start by discarding the arguments which are bogus.
> The most glaring of these has got to be the international
> compatibility argument.  The only advocates of this argument seem
> to be pro case sensitivity Americans who have seized upon this as
> an argument to shore up their position without really thinking
> over the issue carefully.

Well, it may seem that way, but it isn't.  I admit to being a United States
citizen, but I am not unreservedly pro-case-sensitivity.  I see the merits
to both sides of the argument, but I see more problems with
case-insensitivity than with case-sensitivity.

>      Unix does not allow arbitrary strings in filenames.  Any
> number of "funny" characters must be within a quoted string.  I
> can't say
> 	rm foo.bar;1
> I have to say
> 	rm "foo.bar;1"
> Guess what.  A number of foreign keyboards use those "funny"
> characters to be non-English glyphs.

As the moderator pointed out, the shell, not the operating system,
interprets these funny characters.  Applications need not get file names
passed as arguments from the shell.  The office automation system we
developed at CCI had its own shell, which did no parsing of path names
whatsoever; the only characters it forbade were the slash and the null
character (because they are not allowed in UNIX filenames) and those
characters its forms package didn't allow you to type in (because we never
got around to changing it to do so).  I frequently used file names
containing blanks within this application, even though it made it
inconvenient to manipulate those files using commands typed at the UNIX
shell.
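
As a rough illustration of a program creating such names without any
shell involved, the sketch below passes the names straight to open(2)
through Python's os.open.  The helper create_and_list is hypothetical,
and the sketch assumes a UNIX-like file system that accepts these
characters.

```python
import os
import tempfile


def create_and_list(names):
    """Create each name in a fresh temporary directory; return the sorted listing."""
    with tempfile.TemporaryDirectory() as directory:
        for name in names:
            # The name reaches the system call untouched: no quoting is
            # needed because no shell ever parses it.
            fd = os.open(os.path.join(directory, name),
                         os.O_WRONLY | os.O_CREAT, 0o644)
            os.close(fd)
        return sorted(os.listdir(directory))


print(create_and_list(["foo.bar;1", "name with blanks"]))
```

The semicolon and the blanks only need quoting when the same names are
typed at a shell prompt.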

>      I have yet to hear of any organization in Japan using kanzi
> or hiragana or katakana in filenames.

I have a document in front of me from ASCII Corporation in Japan, describing
changes made to 4.2BSD to support Kanji and Kana.  It says:

	It is possible to create a file whose name contains Kana and/or
	Kanji characters, since the file system and Kanji version of
	the shell support it.  However, we don't recommend such filenames,
	because it is impossible to handle such files from ASCII terminals.

The argument against such file names would not apply if, for example, no
terminals attached to the machine were ASCII terminals and the site didn't
expect to export these files to machines with only ASCII terminals attached.
The developers of this system may be coming from a more "traditional" UNIX
environment, where you have many ASCII terminals attached to the machine and
where you frequently exchange files with other sites not running the same
hardware and software that you are running.  In an office environment, it
may be possible to provide everyone with a Kanji/Kana terminal, and it may
not be as important to worry about exchanging files with some random
development machine in the United States.

>   There are good reasons for
> this!  One is that there isn't a single way of representing
> written Japanese.  In older terminals, the high order bit when
> set indicated katakana (much as DEC VT220's use the high order
> bit for their "international characters").  Modern Japanese
> terminals use the JIS (Japanese Industrial Standard) system of
> ESCAPE followed by two bytes to define a 14 bit character.

The system they describe uses "Shift JIS" code for Kanji, and supports both
terminals that use this code and the regular JIS code for Kanji; it does
code conversion between the codes for JIS-Kanji terminals.
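
The kind of conversion involved can be sketched with a present-day
library.  The example below uses Python's "shift_jis" and "iso2022_jp"
codecs as stand-ins for the two codes; that choice, and the function
names, are assumptions for illustration and say nothing about how the
ASCII Corporation system does it.

```python
def sjis_to_jis(data: bytes) -> bytes:
    """Re-encode Shift JIS bytes as escape-sequence JIS (ISO-2022-JP here)."""
    return data.decode("shift_jis").encode("iso2022_jp")


def jis_to_sjis(data: bytes) -> bytes:
    """Re-encode escape-sequence JIS bytes as Shift JIS."""
    return data.decode("iso2022_jp").encode("shift_jis")


kanji = "\u6f22\u5b57"                 # the two characters of the word "kanji"
stored = kanji.encode("shift_jis")     # form kept inside the system
to_terminal = sjis_to_jis(stored)      # form sent to a JIS-Kanji terminal

print(stored.hex())
print(to_terminal.hex())
```

The Shift JIS form uses bytes with the eighth bit set, while the JIS form
stays within 7 bits and brackets the Kanji with escape sequences, which
is why a driver can translate between them without the rest of the
system noticing.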

>      Some German keyboards use various 7-bit glyphs (I believe
> "@" is umlaut-a) for their umlauts and ess-tset.  Or, there's the
> VT220 system.  I just tried creating a file called Goethestrasse
> (using umlaut-o for "oe" and ess-tset for "ss") on my local Unix
> system using my VT220 clone.  It made "GVthestra_e", the 7-bit
> form.

The latter sounds like ISO Latin Alphabet No. 1; "umlaut-O" has the hex code
D6 and capital V has the code 56; 56 hex + 80 hex is D6 hex.  (I believe DEC
recommended the VT220 code set to ISO for standardization.)
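
The arithmetic can be checked with a short sketch that clears the eighth
bit of each byte.  It assumes the terminal sent ISO Latin-1 codes and
that something between the keyboard and the file system stripped the
high bit; the function name strip_high_bit is hypothetical.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear the eighth bit of every byte, as a 7-bit path would."""
    return bytes(b & 0x7F for b in data)


# "G", capital umlaut-O (D6 hex), "thestra", ess-tset (DF hex), "e"
typed = b"G\xd6thestra\xdfe"

print(strip_high_bit(typed).decode("ascii"))  # GVthestra_e
```

D6 hex becomes 56 hex (capital V) and DF hex becomes 5F hex (underscore),
which matches the name that was actually created.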

>   Dare I mention that in German, only nouns (and the first
> word in a sentence) are capitalized?

The same is true of English; so what?

>      The point is that Unix does *not* support international
> character sets in filenames.  It supports 7-bit USASCII.  So
> let's leave that issue to rest.

As the moderator pointed out, this is not the case.  The kernel supports all
characters except slash and the null character, except for the 4.[23]BSD
kernel which (too helpfully) refuses to create files with characters in
their name that have the eighth bit set.  Certain UNIX utilities do not
handle 8-bit characters; this is not, however, an intrinsic characteristic
of the UNIX system.  I would ask European and Asian customers what they
wanted the UNIX system to do about character sets other than 7-bit USASCII
before I casually dismissed the possibility of supporting them.

>      I haven't yet heard of any serious use of full 8-bit bytes
> for filenames on any other operating system, which, if you are
> serious about supporting international character sets, you must
> do.  There's this small problem of getting 8-bit (as opposed to
> 7-bit) ASCII through various pieces of hardware and networks
> which think that the high order bit is parity...

Not all such pieces of hardware have this limitation.  The paper from ASCII
Corporation simply says "Kana and Kanji terminals must be set up to use 8
bit no parity mode."  If other terminals use a 7-bit encoding of an 8-bit
data stream, the terminal driver can do code translation transparently to
the rest of the system.

The fact that most OSes haven't solved these problems, and don't provide for
full 8-bit characters in file names, doesn't mean there is no demand for
full 8-bit characters in file names.  The users in non-English-speaking
countries may just have learned to get around this problem, and either use
English-language file names or approximate their native spelling in file
names.

Volume-Number: Volume 7, Number 76


