Checkpoint/Restart

Barry Shein bzs at world.std.com
Wed Aug 22 15:19:41 AEST 1990


TOPS-20 made this sort of thing trivial via the SAVE command. It just
rolled all of your current foreground processes' virtual memory into a
file. Kinda like a core dump, but re-executable. Actually, the
foreground processes' virtual memory was always just kind of there,
sort of like being able to TSTP a process and then adb (ahem, DDT) it.
Not horribly different from adb (et al) defaulting to "core", tho I
think you could continue stepping a stopped job. (CMS also had that
virtual memory quality, certainly before TOPS-20, but I don't remember
any easy way to save it to a file and restart it.)

TOPS-20 would issue an interrupt (signal) when the program was
restarted, which could be trapped to re-init anything you wanted.
Again, not that different from SIGCONT, but across a checkpoint.
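
For the Unix side of that comparison, here is a minimal sketch of
trapping SIGCONT to re-init. It is only an illustration, not how
TOPS-20 did it: the `state` dict and the `reinit` handler are
hypothetical names, and the process sends itself SIGCONT to stand in
for being continued after a stop.

```python
import os
import signal
import time

# Hypothetical stand-in for whatever the program would need to refresh.
state = {"started": time.time(), "reinit_count": 0}

def reinit(signum, frame):
    """Called when the process is continued; refresh anything that may be stale."""
    state["reinit_count"] += 1
    state["resumed"] = time.time()

# Trap SIGCONT so the program gets a chance to re-init itself.
signal.signal(signal.SIGCONT, reinit)

# Simulate being continued: SIGCONT sent to a running process still
# invokes the handler.
os.kill(os.getpid(), signal.SIGCONT)
```

The handler only gets you the notification. Deciding what is stale
(open connections, locks, timestamps) is still up to the program.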

*BUT* it was surely fraught with all the problems mentioned for Unix.
Nothing magic: the process had to be able to reinit itself when it got
a restart interrupt, and hope that nothing in the external state had
changed much.

So experience bears out what people are trying to say.

Some of the problems with checkpoint/restart are probably also
potential problems with SIGTSTP'd jobs (try seeing how long you can ^Z
a local uucico process and still continue where you left off).

Another concern is that once TOPS-20 had a SAVE facility, it tended to
get in the way of other design decisions. The question "why doesn't
TOPS-20 do this?" was sometimes answered with "if they did that then
SAVE couldn't work right." I seem to remember this coming up in some
peculiarities of the RESCAN buffer design (the RESCAN buffer being sort
of like Unix's argv/argc), or maybe it was just that it never worked
quite right on restarted jobs.

That's the real design problem: if the OS should promise to do this, it
has the potential of becoming an enormous, draconian tail wagging a
quite harried dog. I vote for the library routine and applications
being responsible.
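
To make the library-routine idea concrete, here is a rough sketch in
which the application decides what its state is and saves it itself.
Everything in it is an assumption made for illustration: the
`checkpoint`, `restart`, and `run` names, the JSON file format, and
the toy loop that just sums integers.

```python
import json
import os
import tempfile

def checkpoint(path, state):
    """Save state atomically: write a temp file, then rename it over path."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)

def restart(path, default):
    """Load the saved state if a checkpoint exists, else start from default."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return dict(default)

def run(path, total_steps, stop_after=None):
    """Toy job: sum 0..total_steps-1, checkpointing after every step.

    stop_after simulates the job being killed after that many steps.
    """
    state = restart(path, {"next_step": 0, "total": 0})
    steps_done = 0
    while state["next_step"] < total_steps:
        if stop_after is not None and steps_done == stop_after:
            return state
        state["total"] += state["next_step"]
        state["next_step"] += 1
        checkpoint(path, state)
        steps_done += 1
    return state
```

Calling `run` with `stop_after` set and then calling it again on the
same path picks up at the saved step instead of starting over. Note
that only state the application chose to save comes back; anything
external still has to be re-established by the application itself.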

(History buffs, earn points for valuable prizes! Didn't OS/MVT do this
kind of cold/warm reboot, where warm reboots, when possible, just
continued everything other than perhaps the job active when the system
crashed?)
-- 
        -Barry Shein

Software Tool & Die    | {xylogics,uunet}!world!bzs | bzs at world.std.com
Purveyors to the Trade | Voice: 617-739-0202        | Login: 617-739-WRLD
