Case sensitive file names

Sat Oct 4 07:14:14 AEST 1986

From sun!gorodish!guy at utastro.UUCP Fri Oct  3 15:34:59 1986
Date: Fri, 3 Oct 86 12:26:22 PDT
From: sun!gorodish!guy at utastro.UUCP (Guy Harris)
Message-Id: <8610031926.AA09026 at gorodish.sun.com>
To: ut-sally!std-unix at utastro.uucp
Subject: Re: Case sensitive file names

> From: mark at cbosgd.att.com (Mark Horton)
> Subject: Case sensitive file names

> I think this is a mistake.  UNIX is the only major operating system
> that treats things like file names, logins, host names, and commands
> as case sensitive.

It's been a while since I used Multics; I think it was case-sensitive.  Of
course, I don't know whether it counts as "major" here or not; I don't know
how many sites are around.  Are you sure there are no others?

> It's also reasonable to leave the case alone, but ignore case in
> comparisons.

This would probably be the best scheme (I think the Xerox Alto's operating
system did this).  Some people may want to use mixed case in file names for
aesthetic reasons, for example.
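
As a rough illustration of "leave the case alone, but ignore case in
comparisons", here is a minimal sketch in Python.  The Directory class and
its method names are hypothetical, not any real file system interface, and
the comparison key assumes that only ASCII letters are folded.

```python
class Directory:
    """Toy directory: stores names as typed, compares them ignoring case."""

    def __init__(self):
        # comparison key -> (name exactly as created, contents)
        self._entries = {}

    @staticmethod
    def _key(name):
        # Assumption: fold ASCII letters only; all other characters are
        # compared exactly as stored.
        return "".join(c.lower() if c.isascii() else c for c in name)

    def create(self, name, contents=""):
        key = self._key(name)
        if key in self._entries:
            # A name differing only in case counts as the same file.
            raise FileExistsError(self._entries[key][0])
        self._entries[key] = (name, contents)

    def lookup(self, name):
        # Returns the name in the case the creator chose.
        return self._entries[self._key(name)][0]

    def listing(self):
        return [stored for stored, _ in self._entries.values()]
```

With this sketch, a file created as "ReadMe" is found by a lookup of
"README" and is still listed as "ReadMe"; creating "readme" afterwards is
rejected as a duplicate.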

> There is also probably a good argument for keeping it case sensitive
> (after all, there are probably 5 or 6 people out there who really need
> both makefile and Makefile...

This means UNIX probably can't change, at least not without a fair bit of
pain.  I know of at least one directory on a UNIX system that has both
"makefile" and "Makefile" in it; this would cause some upset on a
case-mapping UNIX system.

However, there is another problem with case mapping.  It's dependent on the
language the text is in!  Doing case mapping is all very well and good for
English-speaking users; the algorithm for mapping characters between cases
in English is straightforward.  However, in German the sharp s ("ss") is a
single special character in lower case but is written "SS" in upper case.
Even if you don't have anomalies like this, the current schemes proposed by
AT&T for "international UNIX" use various ISO codes; this means that the
character whose hex value is E6 is the "ae" ligature in the ISO Latin
Alphabet #1, and thus matches the character whose hex value is C6 (which is
the "AE" ligature); however, in the JIS C6226 Kanji set, it is probably the
first byte of a two-byte sequence representing a Kanji symbol, and I don't
think it gets case mapped at all.
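
The dependence on character set can be seen in a short Python sketch.  It
leans on two assumptions that are not part of the proposals discussed here:
Python's built-in case tables stand in for "the" case-mapping rules, and the
"euc_jp" codec stands in for an 8-bit encoding of the JIS Kanji set.  The
helper name upper_as is made up for the example.

```python
# German sharp s: one lower-case character, two upper-case characters.
sharp_s_upper = "\u00df".upper()


def upper_as(raw, encoding):
    """Decode raw bytes in the given character set, upper-case the text,
    and encode it back into the same character set."""
    return raw.decode(encoding).upper().encode(encoding)


# Read as ISO Latin Alphabet #1, the byte E6 maps to the byte C6.
latin1_result = upper_as(b"\xe6", "latin-1")

# Read as the first byte of a two-byte Kanji (EUC-JP is an assumption),
# the same byte E6 is not case mapped at all.
kanji_result = upper_as(b"\xe6\xbc", "euc_jp")
```

The same byte value therefore has to be treated differently depending on
which character set it is taken to belong to.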

This means that the operating system would have to know what character set a
particular character was in, so that it could map its case correctly; this
would be best done with sequences embedded in the file name indicating
shifts in the character set to which bytes belong.  (These same sequences
should be used in text files, character strings in programs, etc.  Other
suggestions include a per-file character set designator that would
presumably apply to any file containing character strings, including
directories; however, this means that *all* strings in that file must be in
the same character set, which is not always a reasonable restriction.)  It
would then have to know how to do case mapping for all character sets
supported by the system, and would have to be modified or have new
information supplied to it if a new character set was to be supported.
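
A minimal sketch of the embedded-shift idea, in Python.  Everything about
the encoding here is hypothetical: the shift sequence is taken to be ESC
followed by a one-byte tag, the tags "A", "L" and "J" are invented for the
example, and Python codecs (with "euc_jp" standing in for the Kanji set)
supply the per-character-set case rules.  Supporting a new character set
would mean adding an entry to the CHARSETS table.

```python
ESC = 0x1B

# Hypothetical one-byte tags following ESC, mapped to Python codec names.
CHARSETS = {ord("A"): "ascii", ord("L"): "latin-1", ord("J"): "euc_jp"}


def split_runs(name, default="ascii"):
    """Yield (codec, raw_bytes) runs from a file name with embedded shifts."""
    codec, run = default, bytearray()
    i = 0
    while i < len(name):
        if name[i] == ESC:
            if i + 1 >= len(name) or name[i + 1] not in CHARSETS:
                raise ValueError("unknown shift sequence")
            if run:
                yield codec, bytes(run)
            codec, run = CHARSETS[name[i + 1]], bytearray()
            i += 2
        else:
            run.append(name[i])
            i += 1
    if run:
        yield codec, bytes(run)


def fold(name):
    """Comparison key: each run lower-cased under its own character set."""
    return [(codec, raw.decode(codec).lower())
            for codec, raw in split_runs(name)]


def same_file(a, b):
    """True if two raw file names match when case is ignored."""
    return fold(a) == fold(b)
```

Under these assumptions the byte pairs E6 BC and C6 BC name the same file
when they follow the Latin tag, but different files when they follow the
Kanji tag, which is the behaviour the operating system would have to get
right for every character set it supports.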

Volume-Number: Volume 7, Number 16