a backup program, early alpha stage

Steve Nuchia steve at nuchat.UUCP
Sat Dec 9 13:52:12 AEST 1989


The following is an experimental set of backup software I've
been playing with for several months.  It was inspired by
the formidable problem of backing up a mid-sized unix system
on floppies, and goes far toward making that prospect tolerable.

It is running on two 386/ix machines, an A/UX box, and a 3b1.
There are differences between the versions, this is my 386
version.  It is not yet suitable for use by someone who won't
or can't read and understand the code -- my purpose in posting
it at this stage is to publish the basic algorithm and to
get some input on which way to go with it.  If anyone hacks
on it I'd like to get a copy of your results.

-- 
Steve Nuchia	      South Coast Computing Services      (713) 964-2462
"Man is still the best computer that we can put aboard a spacecraft --
 and the only one that can be mass produced with unskilled labor."
					- Wernher von Braun

#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g..  If this archive is complete, you
# will see the following message at the end:
#		"End of shell archive."
# Contents:  README all_kill bkscan fileinfo.c fit.awk killcrunch.c
#   mcheck minit mrecycle script
# Wrapped by steve at nuchat on Fri Dec  8 20:50:34 1989
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f README -a "${1}" != "-c" ; then 
  echo shar: Will not over-write existing file \"README\"
else
echo shar: Extracting \"README\" \(4007 characters\)
sed "s/^X//" >README <<'END_OF_README'
XThis is a backup package based on an algorithm distinct from
Xthe usual full/incremental method.  It maintains a database
Xof initialized media and the files backed up on each volume.
XIt cycles through the set of volumes over time, filling each
Xvolume with the files "most in need" of backing up, with the
Xresult that the files backed up to the next volume to be written
Xare always saved somewhere else before the volume is scratched.
X
XIt uses afio as its copy-out program, and maintains a simple
Xmachine readable lable on each volume.  It needs some smarts
Xfor bulk restores, as it stands you read in all the volumes
X(without the -u cpio option) and try to decide what to delete.
X
XThe system in use on three machines I am responsible for, and
Xworks reasonably well.  There is some customization required,
Xnotably in minit, mcheck, and script.  Script is the main
Xperiodic backup entry point.
X
XThe kill files mechanism has worked but is currently in
Xdissaray.  The idea is to take .nobackup files scattered
Xaround the filesystem containing patterns specifying what
Xfiles from there on down don't need backing up.  The
X.nobackup files need to be observerd by a re-written
Xdisk scan process -- as it is the kill_all file, a sed
Xscript, is being maintained manually and should be checked.
X
XThe main things that need doing are to keep the databases
Xunder some kind of dbms or at least indexes file structure
Xand to make the disk scan a C (or perl?) program.  And
Xdevise an algorithm for restore scheduling.
X
X
X[the rest of this file is my design notes scratch pad.]
X
XGoals:
X
X	efficiency when filesystem is reorganized
X	efficiency in face of many links
X	work well with both small and large media available
X	positive id of volumes
X	support both backup and archiving
X	provide flexible user preference facility
X	media compatible with standard utility(s)
X	database human-readable and recoverable from media set
X	able to handle files larger than medium
X	use "cycle" algorithm
X
XEntities:
X
X	Files:  keyed by: mount-point, dev/inode, ctime, & mtime
X		contains: size, usr/grp, mode, volume assignment(s)
X
X	Directories:
X		keep track of pathnames and deletions
X
X	Volumes:
X		keep id, type, class, name + sequence
X			type = {tape,floppy...}
X			class = {offsite bkup, regular bkup, archive...}
X			time of creation
X
X	Hint files:
X		user can specify patterns, subdirs, etc
X		to not backup per directory.
X
X	Delta files: database update files.
X
X	volume header: copy of volume database record.
X
XCommands:
X
X	bkscan -- scan a directory recursively (normally /) and
X		write a delta file reflecting current reality.
X		takes .bkup-hint files into account. apply the
X		delta to the master by default if all goes well.
X
X	bkinit -- initialize or reinitialize a medium.  If reinitializing
X		it will delete references in the file table, use this
X		when a medium is lost or damaged.
X
X	bkup -- examines the files list and copies files not yet backed up
X		to media of specified class.  For media using the cycle
X		algorithm, consider the oldest N volumes lost when determining
X		what to back up.  Verify each medium as it is brought online
X		and perform all I/O for the underlying program (cpio).  Writes
X		a delta file and if all goes well applies it to the database.
X
X	bkupdate -- apply a delta to a master database file
X
X	bkreport -- format a report at various levels of detail from
X		a master or delta file.
X
X	bkgrok -- scan a bk backup medium and produce a delta for it.
X
X	bkselect -- interactive restore (or archive) file selector.
X		writes a delta file suitable for feeding to bkup
X		or bkrestore.  Runs in batch mode to drive bkup...
X
X	bkrestore -- from a delta file describing the files to be
X		restored and optional renaming parameters schedule
X		and drive restoration from specified media class.
X
XNotes: must update master before running bkup when scratching volumes.
X       what happens when the mount arrangement changes?
X
Xscan stage should be integrated with .nobackup facility
Xto save time and prevent timing windows unpleasantries.
END_OF_README
if test 4007 -ne `wc -c <README`; then
    echo shar: \"README\" unpacked with wrong size!
fi
chmod +x README
# end of overwriting check
fi
if test -f all_kill -a "${1}" != "-c" ; then 
  echo shar: Will not over-write existing file \"all_kill\"
else
echo shar: Extracting \"all_kill\" \(199 characters\)
sed "s/^X//" >all_kill <<'END_OF_all_kill'
X/\/core$/d
X/^\/tmp\/.*$/d
X/^\/usr\/tmp\/.*$/d
X/^\/files\/news\/.*\//d
X/^\/files\/extra\/.*\//d
X/^\/usr\/backup\/files$/d
X/^\/usr\/backup\/saved$/d
X/^\/usr\/backup\/tlist$/d
X/^\/usr\/backup\/wlist$/d
END_OF_all_kill
if test 199 -ne `wc -c <all_kill`; then
    echo shar: \"all_kill\" unpacked with wrong size!
fi
chmod +x all_kill
# end of overwriting check
fi
if test -f bkscan -a "${1}" != "-c" ; then 
  echo shar: Will not over-write existing file \"bkscan\"
else
echo shar: Extracting \"bkscan\" \(235 characters\)
sed "s/^X//" >bkscan <<'END_OF_bkscan'
XPATH=/u/steve/bkup:$PATH; export PATH
Xif [ $# = 0 ]
Xthen
X	set `pwd`
Xfi
Xfor x
Xdo
X	d=`cd $x; pwd`
X	find $d -name '.nobackup' -print | bk_buildkill > bk_sed$$
X	find $d -print | sed -f bk_sed$$
Xdone | bk_fileprops > bk_delta$$
Xrm bk_sed$$
END_OF_bkscan
if test 235 -ne `wc -c <bkscan`; then
    echo shar: \"bkscan\" unpacked with wrong size!
fi
chmod +x bkscan
# end of overwriting check
fi
if test -f fileinfo.c -a "${1}" != "-c" ; then 
  echo shar: Will not over-write existing file \"fileinfo.c\"
else
echo shar: Extracting \"fileinfo.c\" \(883 characters\)
sed "s/^X//" >fileinfo.c <<'END_OF_fileinfo.c'
X/*
X *	fileinfo -- print simplified info record for each filename
X *		on stdin or argv
X */
X
X#include <sys/types.h>
X#include <sys/stat.h>
X#include <stdio.h>
X
Xchar	fnbuf[512];
X
Xmain ( argc, argv )
X	int	argc;
X	char	*argv[];
X{
X	int	i;
X
X    if ( argc > 1 ) for ( i = 1; i < argc; i++ )
X	fmtrec(argv[i]);
X    else while ( fgets ( fnbuf, sizeof(fnbuf), stdin ) )
X    {
X	fnbuf[strlen(fnbuf)-1] = 0;
X	fmtrec(fnbuf);
X    }
X}
X
X
Xstruct	stat	st;
X
Xfmtrec(f)
X	char	*f;
X{
X	long	age;
X
X    if ( stat ( f, &st ) )
X	perror ( f );
X    else
X    {
X	age = st.st_mtime;
X	if ( st.st_ctime > age ) age = st.st_ctime;
X	switch ( st.st_mode & S_IFMT )
X	{
X	case S_IFIFO: /* fifo */
X	case S_IFCHR: /* char */
X	case S_IFBLK: /* blok */
X	case S_IFDIR: /* diry */
X	    printf ( "0 0 %ld %s\n", age, f );
X	    break;
X	case S_IFREG: /* file */
X	    printf ( "0 %ld %ld %s\n", st.st_size, age, f );
X	    break;
X	}
X    }
X}
END_OF_fileinfo.c
if test 883 -ne `wc -c <fileinfo.c`; then
    echo shar: \"fileinfo.c\" unpacked with wrong size!
fi
chmod +x fileinfo.c
# end of overwriting check
fi
if test -f fit.awk -a "${1}" != "-c" ; then 
  echo shar: Will not over-write existing file \"fit.awk\"
else
echo shar: Extracting \"fit.awk\" \(635 characters\)
sed "s/^X//" >fit.awk <<'END_OF_fit.awk'
XBEGIN { state = 1; nfile=0; bot=1 }
Xstate == 3 {
X	print
X	exit
X}
Xstate == 2 {
X	size = int($3 / 512) + 3
X	if ( size > maxcap )
X	{
X		print $5 ": too big for any volume" > "assign.errors"
X		next
X	}
X	for ( i = bot; i <= nfile; i++ )
X	{
X	    if ( size <= cap[i] )
X	    {
X		cap[i] -= size;
X		break;
X	    }
X	    if ( cap[i] < 3 && i == bot && ++bot > nfile ) state = 3;
X	}
X	fname = "flist." 
X	if ( i > nfile ) print
X	else print volno[i], $3, $4, $5 > vol[i]
X}
Xstate == 1 && NF == 3 {
X	nfile++;
X	cap[nfile] = $2
X	volno[nfile] = $1
X	vol[nfile] = "flist." $1
X	if ( $2 > maxcap ) maxcap = $2
X	next
X}
X$0 == "end-of-media-list" {
X	state = 2
X	next
X}
END_OF_fit.awk
if test 635 -ne `wc -c <fit.awk`; then
    echo shar: \"fit.awk\" unpacked with wrong size!
fi
chmod +x fit.awk
# end of overwriting check
fi
if test -f killcrunch.c -a "${1}" != "-c" ; then 
  echo shar: Will not over-write existing file \"killcrunch.c\"
else
echo shar: Extracting \"killcrunch.c\" \(1207 characters\)
sed "s/^X//" >killcrunch.c <<'END_OF_killcrunch.c'
X/*
X *	read .nobackup files names from stdin or argv and
X *	emit sed commands to be applied to find -print output
X *	implementing the requests found in those files.
X */
X
X#include <stdio.h>
X#include <string.h>
X
Xchar	fnbuf[512];
X
Xmain ( argc, argv )
X	int	argc;
X	char	*argv[];
X{
X	int	i;
X
X    if ( argc > 1 ) for ( i = 1; i < argc; i++ )
X	fmtkill(argv[i]);
X    else while ( fgets ( fnbuf, sizeof(fnbuf), stdin ) )
X    {
X	fnbuf[strlen(fnbuf)-1] = 0;
X	fmtkill(fnbuf);
X    }
X}
X
Xchar	rebuf[512];
X
Xfmtkill(fn)
X	char	*fn;
X{
X	FILE	*kf;
X	char	*p;
X    
X    if ( ! (kf = fopen ( fn, "r" )) ) return perror(fn);
X    if ( p = strrchr(fn,'/') ) *p = 0;
X    else fn = ".";
X    while ( fgets ( rebuf, sizeof(rebuf), kf ) )
X    {
X	rebuf[strlen(rebuf)-1] = 0;
X	fputs ( "/^", stdout );
X	re_fmt(fn);
X	fputs ( "\\/", stdout );
X	if ( rebuf[0] == '/' ) /* regexp */
X	    fputs ( rebuf+1, stdout );
X	else
X	    re_fmt(rebuf);
X	fputs ( "$/d\n", stdout );
X    }
X    fclose(kf);
X}
X
Xre_fmt(s)
X	char	*s;
X{
X    while ( *s )
X    {
X	switch ( *s )
X	{
X	case '\\':
X	case '.':
X	case '/':
X		putc('\\',stdout);
X		break;
X	case '*':
X		fputs("[^/]",stdout);
X		break;
X	case '?': 
X		fputs("[^/]",stdout);
X		s++;
X		continue;
X	}
X	putc(*s++,stdout);
X    }
X}
END_OF_killcrunch.c
if test 1207 -ne `wc -c <killcrunch.c`; then
    echo shar: \"killcrunch.c\" unpacked with wrong size!
fi
chmod +x killcrunch.c
# end of overwriting check
fi
if test -f mcheck -a "${1}" != "-c" ; then 
  echo shar: Will not over-write existing file \"mcheck\"
else
echo shar: Extracting \"mcheck\" \(254 characters\)
sed "s/^X//" >mcheck <<'END_OF_mcheck'
Xvol=$1
Xwhile true
Xdo
X	echo load volume  $vol  and press return\\c
X	line > /dev/null
X	dd if=/dev/rmt0 bs=512 count=1 of=tmp_header 2>/dev/null
X	vid=`head -1 tmp_header`
X	if [ $vid != $vol ]
X	then
X		echo "hey, that's volume " $vid
X	else
X		exit 0;
X	fi
Xdone
END_OF_mcheck
if test 254 -ne `wc -c <mcheck`; then
    echo shar: \"mcheck\" unpacked with wrong size!
fi
chmod +x mcheck
# end of overwriting check
fi
if test -f minit -a "${1}" != "-c" ; then 
  echo shar: Will not over-write existing file \"minit\"
else
echo shar: Extracting \"minit\" \(320 characters\)
sed "s/^X//" >minit <<'END_OF_minit'
X#
X# initialize a backup volume
X#
Xwhile true
Xdo
X    vnum=`awk '$1 > top { top = $1 } END { print top + 1 }' media.db`
X    echo "insert volume " $vnum " and press return"
X    line > /dev/null
X    mt ret
X    echo $vnum > tmp_label
X    dd if=tmp_label count=1 conv=sync of=/dev/rmt0
X    echo $vnum 110000 0 >> media.db
Xdone
END_OF_minit
if test 320 -ne `wc -c <minit`; then
    echo shar: \"minit\" unpacked with wrong size!
fi
chmod +x minit
# end of overwriting check
fi
if test -f mrecycle -a "${1}" != "-c" ; then 
  echo shar: Will not over-write existing file \"mrecycle\"
else
echo shar: Extracting \"mrecycle\" \(78 characters\)
sed "s/^X//" >mrecycle <<'END_OF_mrecycle'
Xfor x
Xdo
X	echo "recycling volume " $x
X	ed - saved <<foo
Xg/^$x /d
Xw
Xq
Xfoo
Xdone
END_OF_mrecycle
if test 78 -ne `wc -c <mrecycle`; then
    echo shar: \"mrecycle\" unpacked with wrong size!
fi
chmod +x mrecycle
# end of overwriting check
fi
if test -f script -a "${1}" != "-c" ; then 
  echo shar: Will not over-write existing file \"script\"
else
echo shar: Extracting \"script\" \(3473 characters\)
sed "s/^X//" >script <<'END_OF_script'
X# backup script
XPATH=/usr/backup:/usr/lbin:$PATH; export PATH
Xcd /usr/backup
X
Xcompress < files > files.Z
Xcompress < saved > saved.Z
X
X# N is number of media to use per run, max 9
XN=1
X
X# DEV is the name of the backup device
XDEV=/dev/rmt0
X
X# select N oldest media into mlist
Xsort -n +2 +0 media.db | head -$N > mlist
X
X# remember the age A of youngest
XA=`tail -1 mlist | cut -d' ' -f3`
X
X# now sort them for user convenience
Xsort -n -o mlist mlist
X
X# for unattended (tape, N=1) backup get the volume mounted,
X# otherwise let the operator gather the volumes
Xif [ $N = 1 ]
Xthen
X	vol=`cut -d' ' -f1 mlist`
X	echo Will need volume $vol
X	mrecycle $vol
X	mcheck $vol
Xelse
X	echo Will need volumes `cut -d' ' -f1 mlist`
X	mrecycle `cut -d' ' -f1 mlist`
Xfi
X
X# now scan the disk
Xecho Scanning the disk.
Xfind / -print | sed -f all_kill | fileinfo > files
X
Xecho Assigning files to media.
X
X# format of files and saved is {vol, size, tstamp, name}.  vol is
X# zero in files.  format of media.db is {vol, size, btime}.  We
X
X# files and saved are of the form (vol, size, tstamp, name),
X# with vol degenerate (0) in files.  media.db is of the
X# form (vol, size, btime). We join them into
X# tlist: {vol, btime, size, tstamp, name}.
X
X# join { files U saved } X { media.db U (0,0,0) } -> tlist
Xecho 0 0 0 | sort - media.db > mdb
Xsort +0 -1 files saved | join -j 1 -o 2.1 2.3 1.2 1.3 1.4 - mdb > tlist
X
X# format of wlist (and tlist) is (vol, btime, size, tstamp, name)
X
X# for each group differing only in volume,btime we want to keep
X# the youngest btime found, but only if the file was found in
X# the scan -- in which case there will be a vol=0 record also.
X
X# so we sort tlist such that records differing only in vol,btime are
X# grouped together in ascending btime order.  the result is piped
X# into an awk script that prints the last (most recently saved)
X# line for each group having a 0 (freshly scanned) element.
X# The sentinel is provided to help flush the last group.
X
X(sort +4 +2 -4 +1n tlist ; echo sentinel ) |
Xawk '$3 != s || $4 != t || $5 != n { if (z) print v,b,s,t,n; z=0 }
X	{ if(!$1) z=1; v = $1; b = $2; s = $3; t = $4; n = $5}' > wlist
X
X# Now wlist has one record for each record in files.  We sort
X# them into ascending btime order so that the files most in need
X# of saving will be assigned first.
X
X# now sort by btime, oldest first --
X# the zeros will sort out first.
Xsort +1n -2 +4 -o wlist wlist
X
X# now assign from wlist to N flists
X# flist.x, flist.y ... where x,y... are the volumes selected.
X# flist.0 gets any we can't assign.  In the process the btime gets stripped
X# from flist.i but not from flist.0.
X
Xrm -f flist.*
Xcp /dev/null assign.errors
Xecho end-of-media-list | awk -f fit.awk mlist - wlist > flist.0
X
X# report any errors from assignment
Xcat assign.errors
X
X# remember age T of oldest file not backed up
Xif [ -s flist.0 ]
Xthen
X	T=`head -1 flist.0 | cut -d' ' -f4`
Xelse
X	T=`rawdate`
Xfi
X
X# report overlap percentage = 100(A-T)/A
Xnow=`rawdate`
Xecho "Overlap =" `expr 100 \* \( $T - $now \) / \( $A - $now \)`"%"
X
X# for each flist.i (i != 0) copy to medium and update data bases
Xrm flist.0
Xls flist.* | cut -d. -f2 | sort -n |
Xwhile read vol
Xdo
X	if [ $N != 1 ]
X	then
X		mcheck $vol < /dev/tty
X	fi
X	dd if=$DEV count=1 of=tmp_label 2>/dev/null
X	sed -e 's/[^ ]* [^ ]* [^ ]* //' flist.$vol | sort |
X		(cat tmp_label; afio -of -b 8k -c 16 -) > $DEV
X	cat flist.$vol >> saved
X	rm flist.$vol
X	ed - media.db <<foobar
X/^$vol /s/[0-9]*\$/$now/
Xw
Xq
Xfoobar
X
Xdone
X
Xecho Backup complete.
END_OF_script
if test 3473 -ne `wc -c <script`; then
    echo shar: \"script\" unpacked with wrong size!
fi
chmod +x script
# end of overwriting check
fi
echo shar: End of shell archive.
exit 0
-- 
Steve Nuchia	      South Coast Computing Services      (713) 964-2462
"Man is still the best computer that we can put aboard a spacecraft --
 and the only one that can be mass produced with unskilled labor."
					- Wernher von Braun



More information about the Comp.unix.wizards mailing list