Disk Mirroring (was Re: Altos 5000)

Dermot Tynan dtynan at altos86.Altos.COM
Fri Aug 31 12:12:51 AEST 1990


In article <1990Aug27.183821.13518 at ico.isc.com>, rcd at ico.isc.com (Dick Dunn) writes:
> > Even "reliable" disks eventually die.
> 
> True.  So do reliable controllers.

I don't know what your hardware background is, but let me assure you that the
following statement is Law:

	MTBF(controllers) >> MTBF(disks)	..........................(i)

No-one can claim to produce a completely fault-free system.  Most of the
rhetoric is exactly that.  "Fault Tolerant", "Fault Resilient", etc.  No
matter what you do, as long as there is a probability (no matter how small),
of something failing, your system is not fault-free.  The whole idea behind
disk mirroring is not to replace disk backups (which can also be faulty),
but to reduce the fault probability by a considerable margin.  In general
terms, if you want to make a system more resilient to failure, the first
place to look is at any non-solid-state component, i.e., anything with moving
parts.  In the average system, this means the disk drives.  While mirroring
won't eradicate the probability of failure, it will reduce it considerably,
at least from the user's point of view.

> What I want to get at--and it's something I didn't say at all in my previous
> posting--is that if you're looking for a certain level of reliability, it's
> a lot harder than just tossing on extra disks and mirroring.

See above.  Nobody is trying to produce a fault-free system.  We are just
trying to reduce the likelihood of having to restore a filesystem.  Believe
me.  Disk mirroring will slow down disk writes (which aren't the bulk of
disk operations, anyway), but it will double your disk reliability.
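
To put rough numbers on the size of that gain, here is a minimal sketch.
It assumes independent disk failures with exponentially distributed
lifetimes.  The MTBF and repair-time figures are hypothetical placeholders,
not Altos or vendor data, and the repaired-mirror figure uses the common
approximation MTTDL = MTBF^2 / (2 * MTTR).

```python
import math

def p_fail(mtbf_hours, window_hours):
    """Probability of at least one failure within the window,
    assuming an exponential lifetime with the given mean."""
    return 1.0 - math.exp(-window_hours / mtbf_hours)

def mttdl_mirror(mtbf_hours, mttr_hours):
    """Approximate mean time to data loss for a two-disk mirror
    whose failed member is replaced within mttr_hours."""
    return mtbf_hours ** 2 / (2.0 * mttr_hours)

DISK_MTBF = 30_000.0  # hypothetical disk MTBF, in hours
MTTR = 24.0           # hypothetical hours to swap and resync a disk
YEAR = 8_760.0        # hours in one year

single = p_fail(DISK_MTBF, YEAR)
# Mirror that is never repaired: data is lost only if both disks die.
unrepaired_pair = single ** 2
# Mirror with repair: treat data loss as exponential with mean MTTDL
# (itself an approximation).
repaired_pair = p_fail(mttdl_mirror(DISK_MTBF, MTTR), YEAR)

print(f"single disk, P(loss in a year):       {single:.4f}")
print(f"mirror, never repaired:               {unrepaired_pair:.4f}")
print(f"mirror, failed disk replaced in 24h:  {repaired_pair:.6f}")
```

Under these assumptions the size of the gain depends heavily on how quickly
a failed disk is replaced, so the repair time matters as much as the disk
MTBF itself.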

>   - Is there another way to get comparable recovery capability?
> To the second question, I'll suggest "journaling" as providing a lot of
> what you need, possibly at much less cost.  I'm more interested in the
> first question.

Certainly "journaling" is another approach.  However, it puts the onus on
the person writing the application, rather than hiding it in the OS.
Furthermore, it is just as valid to label "journaling" a marketing bullet
item as it is disk mirroring.  It is a question of what the user community
wants.  Altos, like most companies, is a slave to its user community.  Most
product development is based on what our customers want.  They want mirroring.
We implemented it.  It has nothing to do with bullet items.  It has to do
with what the market wants.

> I had pointed out that it takes extra I/O bandwidth to handle mirroring;
> someone responded that if you have the right sort of controller, it will
> write both disks at once for you.  OK, fine, now you've made the controller
> a single-point-of-failure.

	MTBF(controller) >> MTBF(disks)		Get it?

> I've seen as many motherboard and controller
> failures as disk failures.  I don't pretend my experience is typical, but
> suppose that it might be.  The disks are not the only failure points in the
> system.

I suggest that you have some serious design flaws here.  See Law (i).
Furthermore, even if the controller *does* die, you can snap on a new
controller, and continue, a lot faster than you can replace a disk, and
restore from backups.  Assuming, of course, that your backups were done
*right* before the disk died, or that you log all transactions to tape.
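
The effect of (i) on the whole system can be sketched by treating the
controller and the disk subsystem as components in series, where failure
rates add.  This is only an illustration: the controller MTBF, disk MTBF and
repair time below are hypothetical, and "failure" here means any event that
interrupts service, whether or not data is lost.

```python
def series_mtbf(*mtbfs):
    """MTBF of a chain of components that must all work.
    Failure rates (1 / MTBF) add for components in series."""
    return 1.0 / sum(1.0 / m for m in mtbfs)

CONTROLLER_MTBF = 300_000.0  # hypothetical: ten times the disk figure
DISK_MTBF = 30_000.0         # hypothetical disk MTBF, in hours
MTTR = 24.0                  # hypothetical hours to replace a failed disk

# Approximate mean time to data loss for a repaired two-disk mirror.
MIRROR_MTTDL = DISK_MTBF ** 2 / (2.0 * MTTR)

plain = series_mtbf(CONTROLLER_MTBF, DISK_MTBF)
mirrored = series_mtbf(CONTROLLER_MTBF, MIRROR_MTTDL)

print(f"controller + single disk:   {plain:,.0f} hours")
print(f"controller + mirrored pair: {mirrored:,.0f} hours")
```

With a single disk, the disk dominates the result.  With a mirrored pair, the
controller becomes the limiting term, so the overall figure is only as good
as the gap between controller and disk MTBF allows.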

> If you're essentially running on one disk and just writing the
> other as a backup mirror, you're not getting the ongoing check that you
> really need for reliability.

Again, the reliability gained from even the simplest of mirroring schemes
far exceeds not doing *any* mirroring.  If, indeed, reliability is a concern.
If this isn't enough, there are other things you can do.  This sort of
falls into the standard Cache argument, which goes like this...
"With a 256K cache, you can get a 95% hit rate.  So why bother only using
 a 64K cache?".
The correct answer, of course, is that the 64K cache may only give you
an 80% hit rate (arbitrary figure), but it's still a lot better than 0%.
And it's one quarter the cost!
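
The cache arithmetic can be worked through as a small sketch.  The hit rates
and the four-to-one cost ratio are the arbitrary figures from the argument
above.  The access times are hypothetical, chosen only to make the comparison
concrete.

```python
def effective_access_ns(hit_rate, cache_ns, memory_ns):
    """Average access time: hits are served from the cache,
    misses go to main memory."""
    return hit_rate * cache_ns + (1.0 - hit_rate) * memory_ns

CACHE_NS = 25.0    # hypothetical cache access time
MEMORY_NS = 200.0  # hypothetical main-memory access time

# name -> (hit rate, relative cost); cost is in units of one 64K cache
options = {
    "no cache": (0.00, 0),
    "64K":      (0.80, 1),
    "256K":     (0.95, 4),
}

for name, (hit_rate, cost) in options.items():
    avg = effective_access_ns(hit_rate, CACHE_NS, MEMORY_NS)
    print(f"{name:>8}: {avg:6.2f} ns average, relative cost {cost}")
```

With these numbers, most of the improvement over having no cache arrives with
the first unit of cost, which is the same shape of argument being made for
simple mirroring.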

> In this case, I'm not arguing that
> mirroring is worthless, but I do argue that it's inordinately expensive
> and only addresses one small part of the overall reliability problem.  A
> single system with mirrored disks on one controller has only one element of
> redundancy.

A third time:
	MTBF(controller) >> MTBF(disks)

What exactly do you mean when you say "expensive"?  Altos doesn't charge
anything for disk mirroring, and since it is, for the most part, developed in
conjunction with disk striping (which is worth its weight in gold), it doesn't
require any noticeable NRE.  As for its performance expense, this is borne
*only* by those who enable it (SCO and C2 could learn something here :).
There is therefore *no* expense to those people (the majority, probably) who
don't use it.  For those who do, you've failed to convince me that the
performance expense is not worth the gain.
						- Der
-- 
	Dermot Tynan,  Altos Computer Systems,  San Jose, CA   95134
	dtynan at altos86.Altos.COM		(408) 432-6200 x4237

	"Five to one, baby, one in five.  No-one here gets out alive."



More information about the Comp.unix.i386 mailing list