Hardware advice & ZFS questions (URE/BER)

Status
Not open for further replies.

cjdavies

Cadet
Joined
Nov 1, 2012
Messages
7
I currently run 3x 2TB Samsung F3EG (1 URE in 10^15) in a software (mdadm) RAID5 in my main computer. I'm running out of free space & want to build a separate NAS running FreeNAS, using these 3x 2TB disks plus 3x new 2TB disks - which will be 1 URE in 10^14, because since Seagate bought Samsung's disk division & ceased production of the 1-in-10^15 F4EG disks, there are no longer any 10^15 disks that aren't 'enterprise' models with the accompanying price tag.

I want to use ZFS, either RAID-Z2 or RAID-Z3, using all 6x 2TB disks; however, I'm having difficulty working out the probability of multiple UREs when rebuilding. In a worst-case scenario the array will be full, so when a single disk fails it will need to read all of the remaining 5x disks to rebuild, which means 10TB of data. It's fairly easy to work out that the probability of a single URE when reading 10TB, assuming all disks are 1 URE in 10^14, is given by:

(8.8E13 bits read) x (1 URE / 10^14 bits) x 100 = 88%

That isn't great at all, so RAID-Z with a single layer of parity is out of the question. However, I don't know how to calculate the probability of getting a *second* & even a *third* URE in that same 10TB of read data. Can anybody with better maths knowledge than me help?
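The best I can come up with myself is to treat UREs as independent events at the quoted 1-in-10^14 rate (a big assumption, I know - the spec sheet figure is a worst-case rate, not a measured distribution), in which case the number of UREs in a given read follows a binomial, or near enough a Poisson, distribution. That also means my 88% above is really the *expected number* of UREs rather than a probability; the chance of at least one works out to 1 - (1 - 10^-14)^(8.8E13) ≈ 58%. A rough sketch of the sums (the rate & bit count are just the numbers from above):

```python
from math import exp, factorial

# Rough model (my assumptions, not gospel): UREs are independent & occur at the
# spec-sheet rate of 1 per 10^14 bits read. Real drives don't promise either.
URE_RATE = 1e-14      # probability of a URE per bit read
BITS_READ = 8.8e13    # ~10TB read from the 5 surviving disks during a rebuild

def prob_at_least(k, n_bits=BITS_READ, p=URE_RATE):
    """P(at least k UREs in n_bits read), using a Poisson approximation to the
    binomial(n_bits, p) model - fine here since n_bits is huge & p is tiny."""
    lam = n_bits * p  # expected number of UREs (the 0.88 from above)
    return 1.0 - sum(exp(-lam) * lam**i / factorial(i) for i in range(k))

for k in (1, 2, 3):
    print(f"P(>= {k} UREs during rebuild) = {prob_at_least(k):.3f}")
```

If that model is anywhere near right it gives roughly 59% for at least 1 URE, 22% for at least 2 & 6% for at least 3 somewhere in the 10TB read - but can someone sanity-check the approach?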

Also wrt UREs, can anybody confirm what happens when ZFS encounters a URE that it can't recover from (e.g. a failed disk on a RAID-Z2 & then 2x UREs in the same place when rebuilding)? I've read in some places that ZFS will alter the filename of any affected files so that they are identified as 'unrecoverable' & then continue rebuilding. But I've also read that when a disk produces a URE the controller/ZFS marks the entire disk as failed & the rebuild would stop/fail.

As for hardware, I'm planning on using the Fractal Array R2. I've read that a dedicated RAID controller doesn't really provide any performance benefit over onboard SATA when using ZFS, so I'm planning on getting a Mini-ITX motherboard with 6x SATA connectors. The only choice that is still stocked seems to be the Zotac H67ITX-C-E, which I would pair with an i3 & 8GiB or 16GiB of RAM. I'm guessing that H67 is fairly well supported & there shouldn't be any issues with the onboard LAN?

For the disks, I already have the 3x 2TB Samsung F3EG & will probably go for Western Digital for the new 3x. I don't trust Seagate's quality as much as WD's, & now that Samsung's disk division has been taken over by Seagate they're one & the same. My first choice would have been the 2TB WD Green, but I've read that the 'Advanced Format' 4K sectors (rather than standard 512B sectors) make using them with FreeNAS more difficult. The WD Red is the next choice, especially as they claim to be '24/7' disks, but I can't tell if they also have this 'Advanced Format' rubbish. Any input?

For RAM, will I benefit from having 16GiB instead of 8GiB? The NAS will normally only be serving a single computer, if it makes a difference. The price of 16GiB is exactly double that of 8GiB, so I don't mind getting 16GiB if I will notice a difference.

tl;dr

1.) How to calculate probability of multiple UREs in a certain amount of data read?

2.) Is this a good selection of hardware?

Fractal Array R2
Zotac H67ITX-C-E
i3 2120
8GiB or 16GiB RAM
3x 2TB Samsung F3EG
3x 2TB WD Red (WD20EFRX?)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Someone in the IRC channel was discussing UREs. There's lots of ambiguity as to what the hardware will do, because each vendor can do whatever they want. If you don't have drives with TLER then a drive could, in theory, keep retrying in perpetuity - in which case FreeNAS may basically lock up waiting for it.

A SATA or RAID controller may consider the hard drive "disconnected" after a certain period of time too, so that would need to be factored in.

If ZFS becomes aware that a file is corrupted, it is usually listed in a 'zpool status' printout. I haven't had a file be corrupted before, so I can't vouch for a file name being changed. Honestly, that somewhat flies in the face of the FreeBSD philosophy. FreeBSD uses the principle that the administrator knows what he/she is doing, so it doesn't do much automatically without the admin specifying it. Taken to the extreme, FreeBSD will let you do things that are completely idiotic if you don't have a good foundation in ZFS.

You should follow my guide wrt RAM. More the merrier. You can't have too much RAM.

Not sure where you've heard that advanced format is bad when FreeNAS has a checkbox SPECIFICALLY to enable 4k sector alignment when creating the zpool. My FreeNAS server is 22 Green drives, all advanced format.
While your approach to protecting your data is sound, the engineered behavior and the behavior you desire are not the same thing, nor can the difference be inferred or guesstimated without what is likely proprietary vendor information for the hardware.

I'm confused by your attention to detail wrt the hard drives while completely ignoring other potential issues, such as not using ECC RAM. You could, theoretically, have failing RAM that causes devastating damage to your data - corruption that ZFS cannot correct.

As for Red vs non-Red, there are plenty of threads you can search for and read if you want some gouge on which is more recommended. I think the general consensus is that Green drives work, Red drives have the potential to be better for FreeNAS, but it will take time to see how well Red drives hold up when they start wearing out.

But yeah, throw out all the number crunching, because it really won't help you much when you have approximately zero chance of predicting how your hardware will respond to a URE without actually owning the exact hardware you plan to use and testing it. Don't forget that a simple firmware revision to a hard drive or SATA controller could change the way in which the device responds.
 

cjdavies

Cadet
Joined
Nov 1, 2012
Messages
7
So the takeaway message is that without knowing how the disks & the controller will respond after a URE, the data isn't safe at all? Looks like I need to do some more research then.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
It's not that your data isn't safe, it's that you can't do some math calculation and say one is better than the other. If a drive is going to fail a particular way, ZFS is far better than some of the other options. Are you going to tell me that your desktop or laptop data isn't "safe" because you haven't applied the exact same analytical processes to those machines?

It's all about risk analysis. How much risk are you willing to take? You took the question to the extreme and started trying to analyze things you can only guess at. If you are going to go to the extreme, you should buy a pre-made system. Those systems come with parts that have guaranteed compatibility (if you trust that the manufacturer isn't full of sh*t). You'll pay a hefty premium, and performance may be subpar for the hardware you have, but it's probably the safest. That's why server manufacturers are not threatened by the obscure geeks that build their own servers. Companies WILL still buy their stuff at the end of the day.

But yeah, go back and reanalyze whatever you want. I'm sure emails from manufacturers won't necessarily be the correct answer, but they have the virtue of being the answer from someone that probably speaks English as a second language, and you can't provide any evidence to the contrary. Keep in mind that you're analyzing hard drive errors but completely ignoring potential RAM errors. You really have to decide which priorities are important and weigh them accordingly.
 

cjdavies

Cadet
Joined
Nov 1, 2012
Messages
7
Going for ECC RAM instead of regular RAM shouldn't be an issue, unless it requires a 'server' motherboard/chipset (e.g. not an H67)? It's double the cost, so I would probably just go for 8GiB. If it does require a 'server' chipset, there is always a SuperMicro Mini-ITX board that I could use, which I expect will support ECC.

My paranoid calculations were wrong anyway - I was calculating the risk of (for example) a single disk failing & then 3x UREs *in total* whilst rebuilding, whereas what I actually wanted to know was the probability of a single disk failing & then 3x UREs *in the same chunk* (e.g. a URE, then another URE when reading the corresponding parity to rebuild it, etc.), which is probably substantially less likely.
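To put rough numbers on that (purely illustrative - I'm making up the chunk size, assuming UREs are independent at the spec-sheet rate, & ignoring the real RAID-Z stripe geometry), the sums look something like this:

```python
from math import comb

# Illustrative assumptions (mine): the rebuild reads the pool in fixed-size
# chunks per surviving disk, UREs are independent at 1 per 10^14 bits, & what
# defeats a RAID-Z2/Z3 rebuild is 2/3 UREs landing in the *same* chunk group.
URE_RATE = 1e-14             # URE probability per bit read
CHUNK_BYTES = 128 * 1024     # made-up per-disk chunk size for one stripe group
SURVIVORS = 5                # disks left after one failure in the 6-disk vdev
REBUILD_BYTES = 2e12         # ~2TB to resilver onto the replacement disk

p_chunk = CHUNK_BYTES * 8 * URE_RATE     # P(one disk's chunk hits a URE)
n_groups = REBUILD_BYTES / CHUNK_BYTES   # number of chunk groups in the rebuild

def p_group_at_least(k):
    """P(at least k of the surviving disks' chunks in one group hit a URE)."""
    return sum(comb(SURVIVORS, i) * p_chunk**i * (1 - p_chunk)**(SURVIVORS - i)
               for i in range(k, SURVIVORS + 1))

for k in (1, 2, 3):
    print(f"expected chunk groups with >= {k} UREs: {n_groups * p_group_at_least(k):.2g}")
```

With those made-up numbers a single URE somewhere in the rebuild is still fairly likely (~0.8 expected), but 2 UREs in the same chunk group comes out around 10^-8 & 3 around 10^-16, so the same-chunk case does look vanishingly unlikely. Happy to be corrected if the model is nonsense.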
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
It does require a server CPU.

If you are looking for 3 UREs in the same ZFS stripe, I'd say it's so remote I wouldn't even consider calculating it. It's more likely the world will end tomorrow than you'd ever see 3 UREs in the same ZFS stripe at the same time. Even 2 is incredibly remote.
 

cjdavies

Cadet
Joined
Nov 1, 2012
Messages
7
Looks like I'm going to be looking at the 'Intel Server Board S1200KP', as it is the only Mini-ITX board I can find that supports ECC. It only has 4x SATA connectors onboard, so I'd have to buy a PCIe controller as well. But I would at least have more peace of mind that cosmic rays aren't flipping bits whilst I'm sleeping. I suppose moving up to a larger form factor would give me more choice in motherboards, maybe some that have both ECC support & 6x SATA connectors. Google tells me that AMD is much more lenient about ECC support...
 