Best way to get started?

Status
Not open for further replies.

rt11

Dabbler
Joined
Oct 6, 2015
Messages
10
Ok, crappy title, but here's the thing:

I've been mucking about with a RocketRAID 2720SGL controller and 4 x 3TB WD Green disks in a RAID5 configuration. As you might expect, the Green disks don't do too well in RAID5 and although I got the setup to work (eventually, gah!) I don't actually trust it now. In part because the write performance is suspiciously poor, in part because I did a rebuild which took over a week to finish, and also I've read too many warnings about Idle3 and TLER.

So I'd like to give FreeNAS a try now, and I'm wondering what's the best way to proceed. Here's what I have:

- a spare motherboard (Asrock something, I forget)
- some i5 CPU
- 16 GB of RAM
- 3 x 80 GB Intel SSDs (hoping to leverage them for caching, maybe?)
- 4 x 3 TB WD30EZRX (Green) drives
- 2720SGL RAID controller which I'm hoping to use as a straight SATA controller (it has an additional 8 ports, the motherboard only has 4)

Now I could just jump straight in, but why not try to get a few pointers before wasting my time, you know? Cause you see I have questions!! Behold:

If I use the RAID controller without setting up any arrays, any attached disks will appear (at least in Win7) as regular SATA devices. However, I'm worried that the controller might still decide to drop one of the drives if an error recovery takes longer than ten seconds (as it would in RAID mode). Would anyone know what the expected behaviour of the 2720SGL is in this case?
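
(Side note: as I understand it, smartctl can report whether a drive exposes a configurable error-recovery timeout -- SCT ERC, i.e. the TLER setting this ten-second business is about. Something like the following should do it, with /dev/sdb standing in for whatever one of the Greens shows up as:

# query the drive's SCT Error Recovery Control (TLER) settings
smartctl -l scterc /dev/sdb
# if the drive supports it, set the read/write recovery limits to 7.0 seconds
smartctl -l scterc,70,70 /dev/sdb

From what I've read, the Greens usually just report that the command isn't supported, which is the whole problem.)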

Would it make sense to set up a hardware RAID0 for the SSDs? And if so, say I install FreeNAS to that array, will it be able to use the unused space for caching purposes or does FreeNAS need dedicated drives for that? Or should I use a whole 80 GB drive just for FreeNAS?

Without the 2720SGL I only have the four SATA ports on the motherboard, but another option would be to use those for the HDDs, ditch the 2720SGL altogether, forgo caching on the SSDs, and boot FreeNAS from a USB drive. Is this advisable? And how would performance be in such a setup, generally speaking, noting that I'm on gigabit Ethernet so I have no use for throughput above 130 MB/s?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
With only 16 GB of RAM, cache devices won't do you any good and will likely hurt performance. Your "another option" (ditch the RAID controller and use a USB stick as a boot device) is probably best. Even so, your hardware doesn't meet the recommendations for FreeNAS, most notably because it doesn't support ECC RAM.

If drives are dropping out, though, they may have problems. FreeNAS won't fix problems with your drives.
 

Fuganater

Patron
Joined
Sep 28, 2015
Messages
477
I started the same way as you: a Windows 7 box as my "server", then I switched some old hardware over to FreeNAS, and now I'm buying real server-grade equipment for my new box.

Most people boot FreeNAS from a USB drive and many of them mirror the installation on 2 of them.

Make sure you are using ECC RAM. It is documented everywhere why you should do this. If your mobo does not support it, get a new one.

I've seen people on here using the WD Green drives in their servers. Just do the Idle3 fix so they don't constantly park the heads and you are set.
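
On Linux the usual tool for that is idle3-tools (idle3ctl), if I remember right. Roughly like this, with /dev/sdb standing in for each Green, and note the drive needs a full power cycle afterwards for the new setting to take effect:

# read the current idle3 (head-parking) timer
idle3ctl -g /dev/sdb
# disable the timer entirely
idle3ctl -d /dev/sdb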
 

rt11

Dabbler
Joined
Oct 6, 2015
Messages
10
Gotcha. So I guess ECC is a big enough deal to get people all worked up. Whether it's really documented, though, I dunno about that. I couldn't find any attempts to actually quantify the danger of using non-ECC RAM. You know, to weigh it against the risks of not isolating your NAS server in an underground bunker with an active fire suppression system and armed guards watching over it 24/7. But for the sake of not having that argument, let's just say it's a genuine concern.

What sort of motherboard should I be looking for if I want ECC RAM support? I have a high-end i7 CPU I'd like to reuse, and though I can't recall the model I do know it wants to live in an LGA1156 socket. Sadly retailers don't like to list things like ECC support in their specs, which sucks for me. So, anyone know of any LGA1156 motherboards with ECC support? (Cheap ones, of course. Cause I'm like that.)
 

Fuganater

Patron
Joined
Sep 28, 2015
Messages
477
I just checked Newegg and WOW, they don't even sell 1156 motherboards anymore. A quick Amazon search turned up a Supermicro Intel X58 board. It supports up to 32 GB of RAM, which is more than enough for most people. I'd check how much DDR3 RAM costs on top of that mobo. It might be worth it to sell your old 1156 i7 and get a 2011 mobo with DDR4.

I went the cheap route before and it didn't bite me in the arse, but that FreeNAS box was my second backup, so if I lost it, no big deal. I'm moving to a FreeNAS system for my primary storage now, so I decided to buy the "right" equipment, meaning server-grade.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
i7 CPUs don't support ECC, but current i3s do. For motherboards, you'll be looking at a server board.

The cheapest way to get started with decent, server-grade hardware is with a Lenovo TS140 or a Dell T20--last I heard, the T20 was available under $200US. Both have server-grade boards which support ECC.
 

rt11

Dabbler
Joined
Oct 6, 2015
Messages
10
Aw bugger. That's a really nice CPU. :( I won't have any use for it then.

But I found this motherboard right here, which isn't exactly server-grade, but it is dirt cheap and does support ECC memory. And it should work with a 6th-gen i3 CPU like this, right?

Altogether with 16 GB of DDR4 ECC RAM that works out to like $380 for me (in Europe where everything inexplicably costs 50% more than it should, mind you.) Which I guess is sort of acceptable. Either a TS140 or a T20 would cost at least twice that (upgraded to 16 GB) and I have no need for a chassis. So am I going completely the wrong way there? I know it's not "server grade" but it seems like a lot of value. Any idea how FreeNAS would feel running on a gaming board like that?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Will it work? Probably. Is it ideal? Not really. The problem with consumer-grade boards, and especially gaming boards, is that they include a lot of stuff that isn't needed for a server. Some of it can interfere with what is needed for the server. In the case of this board, the LAN looks like a complete disaster.
 

TheDubiousDubber

Contributor
Joined
Sep 11, 2014
Messages
193
If you are looking to spend the least amount, I would follow danb35's suggestion and look into a complete build, or at least a barebones system. Trying to build your own from unproven parts (particularly a gaming board) can lead to headaches down the road. If you're set on building one instead of purchasing one, then look into some older Supermicro parts. The X8 boards can be had for under $50, and a Xeon proc to go with one for even less; then it's just a matter of finding RAM, which is likely the most expensive part in that case. The PSU can be reused, though depending on its specs it may not be ideal, but it could work for the time being. The case shouldn't really matter either; it just needs to hold the number of drives you plan to use.
 

rt11

Dabbler
Joined
Oct 6, 2015
Messages
10
That would be second-hand, I take it? Because the cheapest X8 boards I can find are more like $500 than $50. Xeon processors aren't cheap, either. Keep in mind that various American outlets don't help me much, and Supermicro apparently isn't very big in Europe. (There's a wealth of second-hand options from US sellers on eBay etc., but with postage it all works out super expensive in the end.) But yeah, I guess I could go second-hand and spend $150 or so on a refurbished Supermicro board, maybe. Then I'd still need some discontinued Xeon CPU... it seems like a bit of a headache. :(

I keep coming back to the thought that a consumer-grade board ought to work "just fine." Obviously you have concerns, and I've seen similar concerns expressed in many other places, but they're always very vague, aren't they? Of course that gaming board has some scary-looking LAN hardware, I agree, but even so I imagine any issues I'd run into would be easier for me to deal with, as I'm much more familiar with consumer-grade hardware. Maybe? As long as FreeNAS supports the hardware, I mean.

Anyway, I found this other MSI board which seems to be more or less a version of the gaming board without all those gaming features. Simpler is better, I take it, and it's even slightly cheaper. So that's my current favourite. I'd note that all I really need from the setup is this:

- R/W speeds high enough to saturate the LAN (i.e. 130+ MB/s)
- RAID5-like redundancy with reasonable rebuild times (so I can survive one disk failing completely every 24 hours or so)

I'd be very sad to lose the data I intend to store on the server but all the really important stuff is backed up on a remote server. So I think I'm leaning towards the consumer-grade stuff. Let me know if I'm kidding myself, though. In which case I'll get back to looking for these cheap barebones servers and Supermicro boards you all keep going on about. ;)
 

TheDubiousDubber

Contributor
Joined
Sep 11, 2014
Messages
193
Supermicro is a common choice, so you're more likely to get help with certain issues than you are with a consumer board. It seems to be a bit of a tradeoff: accept a small headache now by hunting down specific hardware to avoid potential headaches later, or avoid the headache now by going with what is readily available and risk it down the road. There is no doubt that server-grade hardware is going to make for a better server, but that's not to say you can't get by with other choices. I can only suggest you do lots of research on what is available at what cost before you make your decision.

The board I'm currently using is the X8SIL-F; there is a seller on eBay selling them for under $50, so then you just have to factor in shipping, which I can't imagine would be twice the cost of the board. Then a Xeon X3440, which can be had for around $100 from a UK seller. After that you just need to find RAM, which is far less specific and more readily available worldwide (at least I would imagine).

Just my thoughts. Good luck to you in whatever you decide.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
On the ECC thing--with any OS and any filesystem, if your data gets corrupted while it's in RAM, before it's written to disk, you will be SOL. The data on disk will be corrupt, and you won't have any way to recover it. To this universal risk, ZFS adds scrubs. When ZFS does a scrub, it reads all the data on your pool and compares the data to its checksum. If the data and the checksum don't match, the system tries to fix the data from your redundancy (mirror, parity, etc.). But if the reason that the data and the checksum don't match is because one of them has been read into a spot of bad RAM, the system has no hope of correcting this, and a significant chance of corrupting the data on disk that wasn't corrupt to begin with. This is why the lack of ECC RAM is a somewhat bigger deal with ZFS than it is with other filesystems. How much bigger? I don't think I've ever seen it quantified.
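
If you want to watch that process, a scrub is just a couple of commands (assuming, for the sake of example, a pool named tank):

# start a scrub of the whole pool
zpool scrub tank
# check progress and see any checksum errors that were found and repaired
zpool status -v tank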

The folks here, by and large, care about their data, and assume that if you're going to be using FreeNAS, you must care about your data too. That's why we tend to be pretty conservative about things like ECC RAM--it is (in most cases) inexpensive, and it's very good protection for data in transit to your storage (that's also why we recommend Intel CPUs over AMD, Intel NICs over Realtek, etc.). If ECC isn't a live option for you, then it isn't, but just be aware that there is some, non-zero additional risk to your data from not having it.

We recommend server boards because they work, they have the features that are needed (and those features work well), they don't have many features that aren't needed, and they're stable. They also often have handy extras like IPMI. In the US, where most (but by no means all) of us are, suitable hardware is available pretty inexpensively if you look (the TS140 and T20 previously mentioned being probably the best deals). If server-grade hardware isn't an option for whatever reason, then yes, consumer-grade hardware will likely work as well. The problem you'll encounter is that you probably won't find much support for that hardware, because not too many of us use it or recommend it.

If you're going to go with consumer-grade hardware, in general, the less stuff you get built in to the board, the better. You really want an Intel NIC, and you don't want any sort of onboard RAID.
 

rt11

Dabbler
Joined
Oct 6, 2015
Messages
10
I understand the nature of the risk with non-ECC RAM (and ZFS in particular), but still, all risk should be quantified. If not, anything is a potential disaster and you have no basis for making rational decisions. Like, are you protected against lightning strike? I'm not, I think. But there are steps you could take to insulate your server from lightning, spending an arbitrary amount of money to feel arbitrarily secure that you won't lose your data to bad weather. But then there are all those other concerns too, like RAM corruption, or fire, or hacking, or physical theft of the hardware, or alien voodoo, or whatever. The list is as long as you want it to be, so you're absolutely forced to disregard some of the concerns you could have.

With regards to ECC, the FreeNAS community seems to have determined that the danger is very real, but I haven't found even anecdotal evidence to suggest that anyone has ever lost anything to faulty RAM chips. Maybe I haven't looked hard enough? But then again, current Intel chipsets and i3s support it, which makes all the difference. It means the market is actually flooded with cheap, ECC-capable consumer hardware as it turns out, that MSI B150M-PRO DH with an i3-6100T being one example. So if I stay in the consumer range I may as well get an ECC motherboard and RAM for 10-20% extra cost.

Of course I still don't understand why, given the obvious importance of memory integrity, ZFS doesn't take measures to guard against RAM corruption. It seems that any time a checksum fails it would be easy to run a memory diagnostic on the buffer that holds the corrupt data before accepting that the drive caused the corruption. Should be a rare event that only takes a few microseconds, but it should give some (maybe most) of the protection ECC systems get to non-ECC systems. Also any idle time could be spent doing aggressive memory diagnostics, which would help guard against manufacturing imperfections (supposedly the primary cause of RAM glitches.) Eh, anyway...

So what's the deal with Intel NICs, then?

And second question, what's the deal with onboard RAID? Seems that many (all?) of the Supermicro boards have it. Obviously you wouldn't enable it, but is it a problem that it's there in the first place?
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
In fact, there are some members who have lost their pools because of bad RAM. The search feature on the forum is crappy, so it's hard to find what we want, but I know there are threads where members had this problem, and I remember a post from jgreco with links to several of those threads, though I can't find it now.

The UPS (which you should have) should protect the server against lightning strikes. Hacking is pretty unlikely as you shouldn't expose your server on the WAN. For the other threats, that's why we do backups :)

Of course I still don't understand why, given the obvious importance of memory integrity, ZFS doesn't take measures to guard against RAM corruption. It seems that any time a checksum fails it would be easy to run a memory diagnostic on the buffer that holds the corrupt data before accepting that the drive caused the corruption.

There are some problems with that. For example, the OS, which uses the RAM, can be running in the corrupted part of the RAM and crash, and the same goes for the program that checks the RAM. Also, ZFS wasn't designed for cheap home servers but rather for enterprise-grade servers, so cost isn't really a problem there.

So what's the deal with Intel NICs, then?

Intel NICs are the stablest (yeah, that's a word... :P), it's as simple as that.
 

rt11

Dabbler
Joined
Oct 6, 2015
Messages
10
In fact, there are some members who have lost their pools because of bad RAM.

That's more than nothing, I guess. :) Still not really a quantified risk but I'll accept the need for ECC RAM anyway.

The UPS (which you should have) should protect the server against lightning strikes.

Against electrical surges via the power lines, at least. If lightning strikes right down through the server itself you're probably screwed either way. ;) It could happen.

Hacking is pretty unlikely as you shouldn't expose your server on the WAN.

The FreeNAS server might not be exposed but if it's on a LAN with other devices like Android phones and Windows PCs (even "Smart" TVs and similar junk) that connect to remote servers all the time for no good reason, well... you know. Even well-maintained Linux clients have potential vulnerabilities. So I think the point stands that you'd want to measure the risks before deciding which ones are most cost-efficient to address.

There are some problems with that. For example, the OS, which uses the RAM, can be running in the corrupted part of the RAM and crash, and the same goes for the program that checks the RAM. Also, ZFS wasn't designed for cheap home servers but rather for enterprise-grade servers, so cost isn't really a problem there.

Well no, sanity checks like that wouldn't altogether remove the need for ECC RAM in super-critical systems, but they might be "almost as good" for a large number of cases. That is, they would introduce a whole new range of hardware that's if not ideally suited to run FreeNAS then at least much less unsuited. And with such a small addition, a few hundred lines of code in principle (of course carefully implemented by someone who's very intimate with the code and who isn't me,) and such a huge number of existing consumer-grade systems that'll become more secure with a minor update... I mean, I know a bunch of developers who prefer to shrug these things off and say, "meh, it's not my fault if the hardware is faulty, I put my requirements in the specs so don't blame me if the user tries to cut corners, and besides if I can't trust a given component absolutely why even bother trying to work with it... " etc. Even so, if basic sanity checks are likely to prevent catastrophic failures in common setups... it just seems like a no-brainer -- unless the ECC issue is actually overblown.

As for code running in corrupted RAM, it's far more likely to crash (i.e. halt) the server than trash the file system. And extra sanity checks in key places can make the code even more sensitive and thus even more likely to crash before destroying anything. In fact, I haven't looked at the code but I imagine it's littered with redundant sanity checks already. It should be, anyway. ;)

(I'm only ranting cause I'm bored. Excuse me.)
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Against electrical surges via the power lines, at least. If lightning strikes right down through the server itself you're probably screwed either way. ;) It could happen.
Unless you have your NAS sitting outside in a field or a tree, the electrical surge is going to have to flow into your facility via some wiring. And it's worth pointing out that there's more than just the power path that an electrical surge can take: I've had a cable modem and the connected router both fried due to a close strike. Which reminds me, I need to go buy that optical Ethernet isolator.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
So I think the point stands that you'd want to measure the risks before deciding which ones are most cost-efficient to address.

Exactly, that's why you don't put the NAS in a Faraday cage with full galvanic isolation of the power and data cables to protect it against lightning, but you do still use ECC RAM to protect it against RAM corruption. The second one is far more likely to happen and is far easier and cheaper to take care of ;)

As for code running in corrupted RAM, it's far more likely to crash (i.e. halt) the server than trash the file system.

Hmmm, it's true for NTFS on Windows, but with ZFS I wouldn't bet on that, because ZFS checks every byte of data and every byte goes through the RAM...
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
lack of ECC RAM is a somewhat bigger deal with ZFS than it is with other filesystems. How much bigger? I don't think I've ever seen it quantified.
Today I noticed that this case study is linked from section 1.2.1 of the 9.3.1 manual. Section 5 addresses in-memory data integrity and how ZFS responds to in-memory bit-flips. Section 6 attempts to quantify the risks of certain types of failure assuming a certain number of flipped bits. I've only skimmed it, but a few things caught my eye.
From section 5:
ZFS was not specifically designed to tolerate memory corruptions
ZFS fails to catch data block corruptions due to memory errors in both read and write experiments.
The window of vulnerability of blocks in the page cache is unbounded.
In summary, ZFS fails to detect and recover from many corruptions. Checksums in the page cache are not used to protect the integrity of blocks. Therefore, bad data blocks are returned to the user or written to disk. Moreover, corrupted metadata blocks are accessed by ZFS and lead to operation failure and system crashes.
From section 6:
In summary, when a single bit flip occurs, the chances of failure scenarios happening can not be ignored. Therefore, efforts should be made to preserve data integrity in memory and prevent these failures from happening.

Regarding this comment
I still don't understand why, given the obvious importance of memory integrity, ZFS doesn't take measures to guard against RAM corruption
the authors merely note that
ZFS could use the checksums inside block pointers in the page cache, update them on block updates, and verify the checksums on reads. However, this does incur an overhead in computation as well as some complexity in implementation; these are always the tradeoffs one has to make for reliability.

In other news, section 4 addresses on-disk data integrity, noting that
ZFS detects all corruptions due to the use of checksums. In our fault injection experiments on all metadata and data, we found that bad data was never returned to the user because ZFS was able to detect all corruptions due to the use of checksums in block pointers.

In other words, like any software, ZFS has strengths and weaknesses, and you can significantly mitigate (perhaps even eliminate) its greatest weakness by using ECC RAM.
 

rt11

Dabbler
Joined
Oct 6, 2015
Messages
10
Today I noticed that this case study is linked from section 1.2.1 of the 9.3.1 manual.

That's pretty interesting. Certainly a strong case for using ECC RAM with ZFS, but then I've already decided on that, so...

What strikes me, though, is that they reference this study here, but they don't carry over the distinction between correctable and uncorrectable errors, the former being what you're protected against using ECC memory. As the referenced study puts it:

About a third of machines and over 8% of DIMMs in our fleet saw at least one correctable error per year. Our per-DIMM rates of correctable errors translate to an average of 25,000–75,000 FIT (failures in time per billion hours of operation) per Mbit and a median FIT range of 778 – 25,000 per Mbit (median for DIMMs with errors), while previous studies report 200-5,000 FIT per Mbit. The number of correctable errors per DIMM is highly variable, with some DIMMs experiencing a huge number of errors, compared to others. The annual incidence of uncorrectable errors was 1.3% per machine and 0.22% per DIMM.

That's pretty scary, suggesting that with four DIMMs of ECC RAM, there's still a 0.9% chance per year of data corruption that ZFS will be more or less oblivious to. Still beats my wonky budget RAID card with WD Green drives, but even so them are some scary statistics. :eek:
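
(For the record, that 0.9% is just the study's 0.22%-per-DIMM uncorrectable rate scaled to four DIMMs, assuming the DIMMs fail independently: 1 - (1 - 0.0022)^4 ≈ 0.0088, i.e. roughly 0.9% per year.)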
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
with four DIMMs of ECC RAM, there's still a .9% chance per year of data corruption that ZFS will be more or less oblivious to
I don't think so. When ECC RAM encounters an uncorrectable error, the system is halted.
 