Most people will phase out as soon as they see the formulas.
Maybe you should have a table up front before the formulas scare people off?
And other people will be put off by a table of numbers too, but there has to be a limit to the hand-holding, I guess.
Answers I would be looking for:
- P(pool loss) for single, double, triple disk failure vs pool geometry: mirror, raidz1 (I know, but people will want to know, then ignore it), raidz2, raidz3.
- Geometries: double/triple mirror, raidz2 at 2/4/6/8D+2P, raidz3 similar I guess; I haven't considered z3 much. Too many rows would be too dense.
- Not sure there is enough out there about draid to add something useful?
- How do multiple vdevs affect things? I know it depends and is complicated, but everyone wants a single number.
- Combining P(disk failure) and P(URE) somehow, again to come up with a single number to rank options.
- How does the shorter resilver duration of a mirror vs raidz* affect P(pool loss) or P(data loss)?
- How does sending a disk for RMA and waiting for the replacement, which increases exposure time (a week, or 2 or 3 or 4), affect P(data/pool loss)? Might convince people to keep a cold spare.
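For what it's worth, the first couple of bullets can at least be sketched with a binomial model. This is only a sketch under the (strong) assumption of independent, identical per-disk failure probabilities over some window; the 5% figure below is made up for illustration:

```python
from math import comb

def p_vdev_loss(n_disks: int, parity: int, p_disk: float) -> float:
    """P(vdev lost): more than `parity` of `n_disks` fail in the window,
    assuming independent failures with equal per-disk probability p_disk."""
    return sum(comb(n_disks, k) * p_disk**k * (1 - p_disk)**(n_disks - k)
               for k in range(parity + 1, n_disks + 1))

def p_pool_loss(p_vdev: float, n_vdevs: int) -> float:
    """Losing any one vdev loses the pool, so vdevs multiply exposure."""
    return 1 - (1 - p_vdev)**n_vdevs

# 3-way mirror (survives 2 failures) vs 6-wide raidz2 (also survives 2),
# with a made-up 5% per-disk failure probability over the window:
p_mirror = p_vdev_loss(3, 2, 0.05)   # all 3 disks must fail
p_raidz2 = p_vdev_loss(6, 2, 0.05)   # any 3+ of the 6 fail
```

This is also the shape of the multi-vdev answer: for a fixed per-vdev loss probability, more vdevs strictly increases P(pool loss), since any single vdev going down is fatal.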
Looks difficult, maybe impossible; I'm not even sure these are good questions.
Reading through it, though, it reinforces my plan of a single HDD pool, ~10TB 3x mirror, replacing one disk a year with the best available price/capacity modulo features (esp. URE rating), keeping the new disk as a cold spare.
It seems I might have to consider a 4x mirror sooner than I was expecting, given 1−(1−10^−15)^(80% * 10*10^12 * 8) ≈ 6.19% (i.e. reading 80% of a 10 TB disk, counted in bits).
Even reducing to 40% capacity only brings that down to 3.13%.
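Those two numbers are just 1 − (1 − rate)^bits_read. A quick sanity check (using log1p/expm1 so the tiny rate doesn't wash out in floating point — my choice, not essential):

```python
from math import expm1, log1p

def p_ure(bytes_read: float, ure_rate: float = 1e-15) -> float:
    """P(at least one URE) when reading bytes_read, at ure_rate errors/bit."""
    bits = bytes_read * 8
    return -expm1(bits * log1p(-ure_rate))   # == 1 - (1 - ure_rate)**bits

print(p_ure(0.8 * 10e12))   # 80% of a 10 TB disk -> ~6.2%
print(p_ure(0.4 * 10e12))   # 40% -> ~3.1%
```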
Or should I be thinking P(lose 2 disks from a 3-disk mirror vdev) = P(lose) = 1.19% and P(URE) = 6.19%, so
P(data loss) = 1.19% * 6.19% ≈ 0.000737 = 0.0737%? Or should I be doing that differently, stats not being my forte.
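Spelled out, that multiplication is: the vdev is down to its last disk, and then a URE hits while reading it back. Treating the two events as independent is itself an assumption (resilver stress correlates failures), but as a single ranking number:

```python
# Inputs from above: both are assumed, rounded figures.
p_two_of_three_fail = 0.0119   # P(lose 2 of 3 mirror disks) = 1.19%
p_ure_survivor = 0.0619        # P(URE reading 80% of the last 10 TB disk)

p_data_loss = p_two_of_three_fail * p_ure_survivor
print(f"{p_data_loss:.4%}")    # -> 0.0737%
```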
P(pool loss) should be much lower, since metadata is at least duplicated and should be recoverable from a single URE, and I could set copies=2 for critical datasets.
Interestingly, P(URE at 10^-15) seems to be roughly linear vs data read: double the disk size => 12%, halve the data load to 40% => 3%. Not what I was expecting from an exponent, but (1−10^−15) is so close to one that 1−(1−p)^n ≈ np while np << 1.
Using 10^-14 is however entirely different: base = 47%, then 72% and 27% (the linearity starts breaking down), so P(data loss, 10^-14, 3xM 10TB@80%) = 47% * 1.19% ≈ 0.56%.