Yes/No question regarding ZFS.

Status
Not open for further replies.

eretron

Dabbler
Joined
Dec 11, 2013
Messages
27
Hi guys,

I have a quarrel to settle. Let me first say I'm sorry for wasting anybody's time, as I am a layman in the field, but I am trying to grasp the logic of this system, so I have the following, multifaceted question:

When it comes to the resilvering process, would I be right in saying that, with each increment of added pool capacity, the job gets easier, if more time-consuming? Let me illustrate:

Say we had two scenarios:
- a total storage capacity of 1000TB in 500 2TB units where 1 of them were to fail
- a total storage capacity of 10TB in 5 2TB units, where 1 of them were to fail

Can it be said that the job of the resilvering hard drive in these respective scenarios is comparable to, for example, performing a simple mathematical operation like adding 2 + 2 a great many times (in the first case), versus performing a significantly harder operation of the same type, like adding 137 + 149, proportionally fewer times?

The theory is obviously based on the fact that in the latter case the percentage of data missing is more significant.

Thanks!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
To rebuild, you must read enough data from all of the other disks to feed the appropriate equations and recover what was lost.

In your first example, it might be A+B+C+D+E+F... (and so on for the remaining 499 units), all combining into 1 very large equation.

In your second example it might be just A+B+C+D=E, and you solve for E. A much simpler equation.
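That "solve for the missing disk" idea can be sketched with single-parity XOR. This is a toy simplification, not ZFS's actual on-disk layout (real RAIDZ stripes are variable-width), but it shows why the survivors' data plus parity is enough to reconstruct a lost unit:

```python
# Toy sketch of single-parity ("RAIDZ1-style") reconstruction.
# Simplification for illustration: real RAIDZ stripes are
# variable-width, but the algebra is the same idea.

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# 4 data disks + 1 parity disk (the A+B+C+D=E example)
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)            # E = A ^ B ^ C ^ D

# Disk holding b"CCCC" dies; rebuild it from survivors + parity.
survivors = [data[0], data[1], data[3], parity]
rebuilt = xor_blocks(survivors)      # XOR cancels everything but C
assert rebuilt == b"CCCC"
```

Because XOR is its own inverse, reading the survivors and XORing them together cancels out everything except the missing disk's contents.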

But, that's not entirely true either. Those equations assume you used a single vdev. Generally you shouldn't put more than about 10 disks in the same vdev. So in the first example, if 1 disk failed, that one vdev would be busy at work with the 9 remaining disks in it. In the second example you'd have just the other 4 disks.

Now, to make things worse, the type of vdev matters too. RAIDZ1 parity is just XOR. XOR is quite a bit faster than RAIDZ2's parity calculations (Reed-Solomon error correction), which maintain 2 sets of parity and are more complex. RAIDZ3 is slower still because it calculates 3 sets of parity. How much slower? That depends on how fast your CPU handles XOR versus the RAIDZ2 and RAIDZ3 parity math. Some people will never notice because they have a fairly powerful CPU. Others will be crippled by RAIDZ2.
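To give a feel for why the second parity is costlier: schemes like RAIDZ2 compute a second parity over a Galois field, GF(2^8), where each byte needs a field multiplication instead of a plain XOR. The sketch below is a minimal, hypothetical illustration (one byte per disk, textbook coefficients, not ZFS's actual layout or code), showing a second parity Q recovering a disk even when the XOR parity P is also gone:

```python
# Minimal GF(2^8) sketch of dual parity (RAID-6/RAIDZ2 style).
# Illustrative only: one byte per "disk", textbook coefficients;
# not ZFS's actual on-disk format.

def gf_mul(a, b):
    """Multiply in GF(2^8) with the polynomial 0x11d."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        hi = a & 0x80
        a = (a << 1) & 0xFF
        if hi:
            a ^= 0x1D
    return p

def gf_inv(a):
    """Brute-force multiplicative inverse (fine for a demo)."""
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)

data = [0x11, 0x22, 0x33, 0x44]      # one byte per data disk
P, Q, coeffs, g_i = 0, 0, [], 1
for d in data:
    P ^= d                           # plain XOR parity (cheap)
    Q ^= gf_mul(g_i, d)              # GF multiply per byte (costlier)
    coeffs.append(g_i)
    g_i = gf_mul(g_i, 2)             # next power of the generator

# Double failure P can't handle alone: lose disk 2 AND parity P.
partial = 0
for j, d in enumerate(data):
    if j != 2:
        partial ^= gf_mul(coeffs[j], d)
recovered = gf_mul(Q ^ partial, gf_inv(coeffs[2]))
assert recovered == data[2]
```

The point isn't the exact math so much as the cost difference: P is one XOR per byte, while Q needs a field multiplication per byte, which is where the extra CPU work for double (and triple) parity comes from.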

Clear everything up?
 

eretron

Dabbler
Joined
Dec 11, 2013
Messages
27
Thank you for your time, cyberjock; it indeed cleared up a lot about the subject. One last thing: within your explanation, would you say that action A in the first example is as complex as action A in the second one, or would there be a difference in complexity?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Far different. If you had a 500-disk RAIDZ1 you'd be lucky if it stayed online for more than a few hours. Disk failures would happen so frequently that you could never rebuild the pool before another disk failed. With a 5-disk RAIDZ1, the risk is much lower that a second disk fails before the rebuild completes and you lose your pool.

But keep in mind that in this day and age RAIDZ1 is not considered safe, because you don't need a 2-disk failure to lose data. All you really need is 1 disk to fail and another to hit a URE (unrecoverable read error, aka lost data). That can ruin hardware and software RAIDs alike. So RAIDZ2 is considered the "safest bet" given the failure rates of hard drives. Read the link in my sig if you want to see the math behind it. In short, single-disk redundancy stopped being safe years ago.
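The URE math is easy to approximate yourself. Assuming the common spec-sheet figure of one unrecoverable read error per 10^14 bits, and (a simplification real drives don't strictly obey) independent bit errors, the odds of a clean rebuild shrink quickly as the amount of data you must read grows:

```python
# Back-of-envelope model: probability of hitting at least one URE
# while reading the surviving disks of a degraded single-parity array.
# Assumes independent bit errors at the quoted spec rate, which is a
# simplification of real drive behavior.

URE_RATE = 1e-14          # errors per bit read (common spec-sheet figure)

def p_rebuild_hits_ure(surviving_disks, disk_tb):
    bits_read = surviving_disks * disk_tb * 1e12 * 8
    return 1 - (1 - URE_RATE) ** bits_read

# 5x 2TB RAIDZ1 rebuild: must read the 4 surviving 2TB disks
small = p_rebuild_hits_ure(4, 2)
# Hypothetical 10x 2TB RAIDZ1: must read 9 survivors
big = p_rebuild_hits_ure(9, 2)
print(f"4 survivors: {small:.0%}, 9 survivors: {big:.0%}")
```

Under these assumptions the small array already has a sizable chance of a URE during rebuild, and the wider one is worse, which is the intuition behind "single-disk redundancy stopped being safe": with RAIDZ1, that one URE during resilver means lost data.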
 