SOLVED 6 disk RaidZ2 vs Striped Mirrors (Raid10)

Status
Not open for further replies.

Terrigat

Dabbler
Joined
Apr 5, 2016
Messages
13
I just finished a smartctl long / badblocks / smartctl long burn-in on my new 6 x 3TB drives. I was about to power the box on and copy my nearly full home NAS (md 4x2TB RAID5 with a hot spare) over to a new RaidZ2. Then that last-minute what-if started nagging at me.

From my understanding, RaidZ2 would stress the disks more over time and during rebuilds than equivalent striped mirrors [Raid10], for a measly gain of about 1/2 TB of usable space [after the 20% recommended free-space overhead].
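
Rough arithmetic behind that half-terabyte figure, as a quick Python sketch (the 3 TB nominal drive size and the 80% fill guideline are assumptions on my part, and real formatted capacity runs somewhat lower):

Code:
# Back-of-the-envelope capacity comparison for 6 x 3 TB drives (nominal sizes)
drives, size_tb = 6, 3.0

raidz2_raw  = (drives - 2) * size_tb      # 2 parity disks -> 12.0 TB of data space
mirrors_raw = (drives // 2) * size_tb     # 3 x 2-way mirror vdevs -> 9.0 TB

# Applying the oft-cited "keep the pool under ~80% full" guideline to the RAIDZ2
# pool reproduces the rough half-terabyte difference: 9.6 - 9.0
print(raidz2_raw * 0.80 - mirrors_raw)    # ~0.6 TB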

I doubt the extra performance from striped mirrors would matter to me. But they also allow easier future expansion, avoid the parity-calculation overhead of RaidZ, don't require policing the recommended free space with quotas, and are much easier to recover from during a rebuild.

Does anyone have experience with whether the fault tolerance of RaidZ2 is actually more effective during rebuilds at 6 drives than striped mirrors?

TL;DR: Which gives safer rebuilds: Raid10 [less rebuild time and R/W stress] or RaidZ2 [survives an extra disk failure before data loss]?

PS: The important data has an offline backup. I'm not worried about recovering the extraneous data after a failure.
 

Sakuru

Guru
Joined
Nov 20, 2015
Messages
527
Personally I'd prefer the extra safety of Z2, especially if you don't need the performance of striped mirrors.
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
If you have a backup, then you would be fine going striped mirrors (it would be really nice if you had a spare sitting on the shelf).
Don't forget, it's also easier to expand a striped mirror. You only need 2 disks.

I'd go with the RAIDZ2 though. The likelihood of 3 drives crapping out is lower than the likelihood of losing 2 drives in the same mirror. And either way, as long as you have backups, you're OK.
 

bestboy

Contributor
Joined
Jun 8, 2014
Messages
198
Code:
P(X) = (n choose X) * (p)^X * (1-p)^(n-X)

p = probability of any single drive failing
    (assumed to be 0.03, based on drive failure rates observed in data centers
     rather than spec-sheet MTTF figures: http://lwn.net/Articles/237924/)

n = array size (i.e. number of disks)

X = number of drives that fail simultaneously

2-way MIRROR:
=======================
P(X) = (2 choose X) * (0.03)^X * 0.97^(2-X)

P(1) = (2 choose 1) * (0.03)^1 * 0.97^(2-1) = 0.0582
P(2) = (2 choose 2) * (0.03)^2 * 0.97^(2-2) = 0.0009

3-way MIRROR:
=======================
P(X) = (3 choose X) * (0.03)^X * 0.97^(3-X)

P(1) = (3 choose 1) * (0.03)^1 * 0.97^(3-1) = 0.084681
P(2) = (3 choose 2) * (0.03)^2 * 0.97^(3-2) = 0.002619
P(3) = (3 choose 3) * (0.03)^3 * 0.97^(3-3) = 0.000027

RAIDZ with 6 drives:
=======================
P(X) = (6 choose X) * (0.03)^X * 0.97^(6-X)

P(1) = (6 choose 1) * (0.03)^1 * 0.97^(6-1) = 0.154572124626
P(2) = (6 choose 2) * (0.03)^2 * 0.97^(6-2) = 0.011951452935
P(3) = (6 choose 3) * (0.03)^3 * 0.97^(6-3) = 0.00049284342
P(4) = (6 choose 4) * (0.03)^4 * 0.97^(6-4) = 0.000011431935
P(5) = (6 choose 5) * (0.03)^5 * 0.97^(6-5) = 1.41426 * 10^-7
P(6) = (6 choose 6) * (0.03)^6 * 0.97^(6-6) = 7.29 * 10^-10

===============================================

2-way mirror vdev failure on X = 2:
P(3x2 mirrors) = 3 * P(2) = 0.0027

3-way mirror vdev failure on X = 3:
P(2x3 mirrors) = 2 * P(3) = 0.000054

RAIDZ1 5+1: vdev failure on X >= 2:
P(RAIDZ1 5+1) = P(2) + P(3) + P(4) + P(5) + P(6) = 0.01245587044

RAIDZ2 4+2: vdev failure on X >= 3:
P(RAIDZ2 4+2) = P(3) + P(4) + P(5) + P(6) = 0.00050441751

RAIDZ3 3+3: vdev failure on X >= 4:
P(RAIDZ3 3+3) = P(4) + P(5) + P(6) = 0.00001157409
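
For anyone who wants to rerun these numbers, here is a small Python sketch of the same binomial math (it assumes the same p = 0.03 per-drive failure probability and independent failures):

Code:
from math import comb

P_FAIL = 0.03  # assumed per-drive failure probability, as in the table above

def prob_exactly(n, x, p=P_FAIL):
    """Probability that exactly x out of n drives fail (binomial)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def prob_at_least(n, x, p=P_FAIL):
    """Probability that x or more out of n drives fail."""
    return sum(prob_exactly(n, k, p) for k in range(x, n + 1))

# Pool-loss probabilities for 6 drives, matching the figures above
print("3 x 2-way mirrors:", 3 * prob_exactly(2, 2))   # ~0.0027
print("2 x 3-way mirrors:", 2 * prob_exactly(3, 3))   # ~0.000054
print("RAIDZ1 (5+1):", prob_at_least(6, 2))           # ~0.012456
print("RAIDZ2 (4+2):", prob_at_least(6, 3))           # ~0.000504
print("RAIDZ3 (3+3):", prob_at_least(6, 4))           # ~0.0000116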
 

Terrigat

Dabbler
Joined
Apr 5, 2016
Messages
13
Thanks for the data. Already started the RaidZ2.

So I finished the move and scrubbed once for good measure. Afterwards, a smartctl test turned up this:

Code:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   141   141   054    Pre-fail  Offline      -       66
  3 Spin_Up_Time            0x0007   100   100   024    Pre-fail  Always       -       436
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       9
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       4
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   124   124   020    Pre-fail  Offline      -       33
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       208
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       9
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       551
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       551
194 Temperature_Celsius     0x0002   240   240   000    Old_age   Always       -       25 (Min/Max 16/40)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       4
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

Can I / should I RMA this drive within the first 30 days? The THRESH value says 5 and the raw count is already at 4, so is it close to a fail condition already?
 

Sakuru

Guru
Joined
Nov 20, 2015
Messages
527
It's certainly not good to see reallocated sectors so soon. Watch it closely. If they keep increasing, RMA it.
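
One low-effort way to watch it is a small script run from cron. This is only a sketch (the /dev/ada0 device name and the log path are placeholders, and FreeNAS's built-in S.M.A.R.T. service and email alerts cover the same ground):

Code:
import subprocess, datetime

DEVICE = "/dev/ada0"   # placeholder: point this at the disk showing reallocations
WATCHED = {"Reallocated_Sector_Ct", "Reallocated_Event_Count", "Current_Pending_Sector"}

def read_counts(device=DEVICE):
    """Run 'smartctl -A' and pull the raw values of the watched attributes."""
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True, check=True).stdout
    counts = {}
    for line in out.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] in WATCHED:
            counts[fields[1]] = int(fields[9])   # RAW_VALUE is the last column
    return counts

if __name__ == "__main__":
    # Append a timestamped snapshot; run from cron and diff the log over time.
    with open("/mnt/tank/smart_watch.log", "a") as log:
        log.write(f"{datetime.datetime.now().isoformat()} {read_counts()}\n")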
 

Terrigat

Dabbler
Joined
Apr 5, 2016
Messages
13
This drive went through smart long | badblocks | smart long | rsync | scrub | smart long. The first two tests came through clean, which SHOULD mean these sectors worked during the default 4 badblocks write passes. I assume reallocations only show up during writes? The RMA window closes the 22nd of this month...
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
This drive went through smart long | badblocks | smart long | rsync | scrub | smart long. The first two tests came through clean, which SHOULD mean these sectors worked during the default 4 badblocks write passes. I assume reallocations only show up during writes? The RMA window closes the 22nd of this month...

I would do an RMA now then.
 