Question: With one disk per vdev, does a "nonrecoverable read error" mean we lose only a file, as opposed to the entire vdev/pool?
Context:
I'm running out of space, so I'm redoing my servers. The new configuration is 11 disks (14TB each) in a RaidZ3 plus 2 offline spares, with a second server as a backup via replication. I'm looking to minimize the number of disks (16TB each) on the second server. Since the primary is already a Z3, for the replication destination I'm thinking of using single-disk vdevs stitched together in a pool.
If I understand this correctly, the chance of the primary pool failing (227,000 hours MTBF), then the rebuild of the 11-disk Z3 failing (0.016%), and then the replica being lost to a drive failure (500,000 hours MTBF) is pretty much nil.
Details:
- MTBF of 1 disk is 2,500,000 hours
- MTBF of a 5-disk pool is 500,000 hours <== mtbf(1||2) = (mtbf1 X mtbf2) / (mtbf1 + mtbf2)
- MTBF of an 11-disk pool is 227,000 hours
- Nonrecoverable read error rate is 1 per 10^16 bits read (1.12% for a 14TB disk and 1.28% for a 16TB disk)
- Z3 Rebuild Fail is 0.016% <== 1 - (1-df)^(nd-1) - (nd-1)*df*(1-df)^(nd-2) - (nd-1)*(nd-2)*(df)^(2)*(1-df)^(nd-3)/2 where df = probability of a drive failing during rebuild (1.12%), nd = number of drives (11)
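For anyone who wants to check my arithmetic, here is a rough Python sketch of the numbers above. The function names are just mine, and it assumes the same simplifications I used: disks fail independently, the read-error chance is the plain linear estimate (bits read times error rate), and a Z3 rebuild fails only if 3 or more of the remaining nd-1 drives fail.

```python
from math import comb

DISK_MTBF_HOURS = 2_500_000   # per-disk MTBF from the list above
URE_PER_BIT = 1e-16           # 1 nonrecoverable read error per 10^16 bits


def pool_mtbf(disk_mtbf, n_disks):
    """MTBF until the first failure among n identical disks.

    Repeating mtbf(1||2) = (mtbf1 * mtbf2) / (mtbf1 + mtbf2) for
    identical disks reduces to disk_mtbf / n_disks.
    """
    return disk_mtbf / n_disks


def ure_probability(capacity_tb, ure_per_bit=URE_PER_BIT):
    """Linear estimate of hitting a read error when reading a whole disk."""
    bits = capacity_tb * 1e12 * 8   # decimal terabytes to bits
    return bits * ure_per_bit


def z3_rebuild_fail(df, nd):
    """Probability that 3 or more of the nd-1 remaining drives fail."""
    n = nd - 1
    # Probability of 0, 1, or 2 failures, which a Z3 rebuild survives
    # under the assumption stated above.
    survive = sum(comb(n, k) * df**k * (1 - df)**(n - k) for k in range(3))
    return 1 - survive


if __name__ == "__main__":
    print(pool_mtbf(DISK_MTBF_HOURS, 5))
    print(pool_mtbf(DISK_MTBF_HOURS, 11))
    print(ure_probability(14), ure_probability(16))
    print(z3_rebuild_fail(ure_probability(14), 11))
```

With df = 1.12% and nd = 11 the last function comes out at about 0.016%, which matches the figure in my list.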
Thoughts?
Thanks!