Converting DOS text files

John G. DeArmond jgd at rsiatl.UUCP
Wed Oct 17 01:56:10 AEST 1990


erc at pai.UUCP (Eric Johnson) writes:

>This is for those of you who have SCO's OpenDesktop with a DOS
>under UNIX, or any other DOS under UNIX that has this problem.

>The problem is this: when you use a DOS-based copy command to copy a text
>file onto your system (from a PC floppy, say), that DOS text file
>is full of CR/LFs (instead of the UNIX line feed) and has a trailing
>Ctrl-Z. On SCO, there is a program to take care of this, called
>dtox. Unfortunately, dtox is a filter. That is, you call it
>with something like:

[program with BIG copyright deleted.]

Please don't take this wrong but your approach, while probably necessary
in a DOS tool-less environment, is terrible for Unix.  Here's how you do
it without any programming.  Get to know Mr. Shell.  He is your friend. 
Here's how:  

for i in `ls *.txt`
do
	# takes care of read-only temp file name collisions
	rm -f /tmp/$i	>/dev/null 2>&1
	tr -d '\032''\015' <$i >/tmp/$i
	if [ $? -eq 0 -a -f /tmp/$i ]
	then
		mv -f /tmp/$i $i
	else
		rm -f /tmp/$i >/dev/null 2>&1	# just in case
		echo "tr returned an error on file $i"
		exit 1
	fi
done

If you want to put this in a shell script, simply substitute this for the 
first line:

for i in `ls $*`

What this script does is first execute the command in back-ticks ("ls *.txt")
and then step through the list of files via the shell variable "i".
Each file is run through tr (translate) invoked in its "delete" mode (-d).
Tr is told to delete ^M (octal 015) and ^Z (octal 032).  The return code
from tr is stored in the shell intrinsic "$?".  If tr is successful,
this value will be 0.  The "if" statement checks to see if tr ran ok AND
if the temporary file was created ok and if so moves the temporary file
back on top of the original.  There are even simpler ways to do this,
but this is what popped out of my head when reading your post.  There are
several unaddressed error conditions in this script, such as when a temp
file name collision occurs and the temp file is not owned by you, but 
these problems are left as an exercise to the reader :-)
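
One of those unaddressed conditions, the temp file name collision, can be
sidestepped by letting the system pick a unique temp name.  What follows is
only a minimal sketch, not part of the original script.  It assumes a mktemp
command is available (not every system has one), and the function name
strip_dos is made up for illustration:

```shell
# strip_dos FILE...  (hypothetical helper, not from the original post)
# Removes CR (octal 015) and Ctrl-Z (octal 032) from each file in place.
strip_dos() {
	for f in "$@"
	do
		# Assumption: mktemp exists; it creates a uniquely named file we own.
		tmp=`mktemp "${TMPDIR:-/tmp}/strip_dos.XXXXXX"` || return 1
		if tr -d '\032\015' <"$f" >"$tmp"
		then
			# Write the contents back so the original's mode and links survive.
			cat "$tmp" >"$f" || { rm -f "$tmp"; return 1; }
			rm -f "$tmp"
		else
			rm -f "$tmp"
			echo "tr returned an error on file $f" >&2
			return 1
		fi
	done
}
```

Call it as "strip_dos *.txt".  Writing the result back with cat instead of mv
keeps the original file's permissions, at the cost of a second pass over the
data.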

You could, of course, use dtox in place of tr but this solution is unix 
vendor-independent.  You could also use sed, awk, Perl (if installed) and
who knows what else.  In other words, get with the Unix tools show, man :-)
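
As a rough sketch of the sed alternative: the post names sed but gives no
command, so this is one possible rendering and not necessarily what the
author had in mind.  The control characters are built with printf first,
since not every sed understands an escape like \r in a pattern.  The name
dos_filter is hypothetical:

```shell
# Build literal CR and Ctrl-Z characters; assumes printf is available.
cr=`printf '\015'`
eof=`printf '\032'`

# dos_filter (hypothetical name): reads stdin, writes cleaned text to stdout.
dos_filter() {
	# Drop a CR at the end of each line, then any Ctrl-Z characters.
	sed -e "s/$cr\$//" -e "s/$eof//g"
}
```

Use it as "dos_filter <in.txt >out.txt".  Unlike the tr version, this strips
a CR only at the end of a line and leaves one in the middle of a line alone.
If the file ends in a bare Ctrl-Z with no newline after it, some seds may
leave an empty final line behind.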

Minor programming note.  I don't usually critique coding practices on the
net but in this case I gotta.  Your approach is terribly inefficient,
requiring twice as much system resource as necessary.  Namely, you first
process the input file a character at a time (which is OK for a quick
hack) and then you copy the temp file back onto the input file a
character at a time (NO NO).  The easiest way to move the temp file back
onto the original is to use a system() call with mv.  Example: 

sprintf(tmpstr, "mv %s %s", tmpname, filename);
system(tmpstr);

For a bit of error checking, you could fork() and exec() mv and look at the
return code from wait().  Or, assuming the files are both on the same file
system, you could simply unlink() the old file, link() the old name to the 
temp file and unlink() the temp file.  That is the most efficient way of doing it.  
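
The same three steps can be tried from the shell, where rm and ln are thin
wrappers around unlink() and link().  This is only a sketch of the sequence
just described, in shell rather than C.  The function name and the error
message are made up for illustration, and it assumes both names are on the
same file system:

```shell
# replace_by_link TMP ORIG  (hypothetical helper)
# Makes ORIG refer to TMP's data without copying any bytes.
replace_by_link() {
	tmp=$1
	orig=$2
	rm -f "$orig" || return 1       # unlink() the old file
	if ln "$tmp" "$orig"            # link() the old name to the temp file
	then
		rm -f "$tmp"                # unlink() the temp name; the data stays
	else
		echo "could not link $tmp to $orig" >&2
		return 1
	fi
}
```

Between the rm and the ln the original name briefly does not exist, so a
failure at that point leaves only the temp file.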

While one could (successfully) argue that a system() or fork() call
would be more expensive than processing small files a byte at a time,
for typical files, this would not be the case.  And for machines that
process I/O system calls slowly (NCR towers come to mind), copying even
small files a byte at a time would seriously degrade performance,
especially if you are doing a lot of them.

John

John De Armond, WD4OQC  | "The truly ignorant in our society are those people 
Radiation Systems, Inc. | who would throw away the parts of the Constitution 
Atlanta, Ga             | they find inconvenient."  -me   Defend the 2nd
{emory,uunet}!rsiatl!jgd| with the same fervor as you do the 1st.



More information about the Comp.unix.sysv386 mailing list