Replacing an Awful Storage Server

emoulton · Apr 30, 2015

Hello, I'm Eli, previously the builder of a terrible, (but very, very lucky), storage array. Nineteen 1TB SATA disks in a Windows RAID5 set. It runs on 64 bit Windows XP, uses SATA port multipliers, and miraculously has never dropped a drive in the 6 years it's been alive. This monstrosity must be replaced!

I'm quite new to FreeBSD and ZFS. After doing lot of reading on the subject of building and setting up a storage server using ZFS, I've decided to go with FreeNAS. Hardware wise, this build is one of the more ambitious ones that I've done, and my lack of experience with FreeBSD and ZFS has made a somewhat nerve wracking build all the more stressful. I would appreciate any and all comments and advice that anyone may have regarding this build. (Especially the configuration).

First, the hardware:

Chassis: Supermicro SC846XE26-R1K28B
Motherboard: Supermicro X10SRA-F-O
CPU: Intel Xeon E5-1620 v3 3.5GHz
Backplane: Supermicro SAS3-846EL2
RAM: 4x Samsung M386A4G40DM0-CPB 32GB (128GB total)
HBA: 2x LSI 9207-4i4e (Flashed to P16 firmware)
OS Drive: Intel SSDSC2BF120H501 120GB SSD
Storage Drives: 24x WD RE WD4001FYYG 4GB Nearline SAS
L2ARC Drives: 2x Intel SSDSC2BA400G401 400GB SSD
NIC: Emulex OCe11102-NT Dual Port 10GBaseT
Switch: Cisco SG500XG-8F8T 16 Port (8 copper and 8 SFP) 10GbE

The software and configuration:

I'm running FreeNAS-9.3-STABLE-201504152200. I've got all 24 disks setup as a single volume containing four 6 disk RAIDZ2 sets. The two 400GB SSDs are being used for L2ARC for the volume. My main concern about how I configured things has to do with a non-standard swap setup, and a manual setup of SAS multipathing.

I didn't want swap on the storage pool, so I created a swap file on the OS disk. The file is stored at /root/.swap0, and I configured it to be mounted using an rc.conf tunable in the web GUI. It survives reboots, but it would be good to know if this might become an issue later.

As for the multipath setup, FreeNAS initially set it up automatically, but it was in active/passive mode, and the backplane supports active/active. I deleted the multipath setup from the command line, and manually reconfigured it. From what I could find out, it looks like the multipath setup in FreeBSD writes some metadata to the disks that stores the configuration; I'm hoping it'll be persistent. It has survived reboots, but once again, I'd like to know if this could cause me problems later.

I configured the ZFS volume on the command line so that I could make sure that the drives that I wanted in a particular RAIDZ2 set were the right ones. As far as I can tell, this shouldn't be a problem as the configuration is stored on the drives.

This system will be mainly used to store compressed video. It'll pretty much be seeing only sequential reads and writes from two 10GbE workstations, and two or three 1GbE servers. I've read that enabling lz4 compression costs almost nothing, and might as well be enabled even if the files being stored aren't very compressible. Given the workload I've described, what would you recommend?

I haven't setup the 10GbE interfaces yet, though I'll probably be doing that today.

I'd like to finish up this post by mentioning how useful I've found these forums while doing all my research. The many excellent guides, and discussions I've seen here have been a great boon. Thanks for that everyone!

cyberjock · Apr 30, 2015

emoulton said:
I didn't want swap on the storage pool, so I created a swap file on the OS disk. The file is stored at /root/.swap0, and I configured it to be mounted using an rc.conf tunable in the web GUI. It survives reboots, but it would be good to know if this might become an issue later.

I liked everything I read until I read this. I do NOT recommend this, at all. Those 2GB swap partitions have saved many people, and can be a savior if you have to replace one of those 4TB drives and the replacement is slightly smaller. Do not do this unless you want to risk it backfiring... badly.

emoulton said:
As for the multipath setup, FreeNAS initially set it up automatically, but it was in active/passive mode, and the backplane supports active/active. I deleted the multipath setup from the command line, and manually reconfigured it. From what I could find out, it looks like the multipath setup in FreeBSD writes some metadata to the disks that stores the configuration; I'm hoping it'll be persistent. It has survived reboots, but once again, I'd like to know if this could cause me problems later.

I don't recommend you change this. There's not much to be gained from going with active/active unless you've got very poor bottlenecks on the SAS channels. I'd highly recommend you stick with the default setup. The main reason for this is there is no guarantee that things might not go horribly sideways on some future update. Going with FreeNAS for your OS is accepting it as an appliance. You should use it as it is designed and not try to deviate from it because you want something different. If the defaults are not acceptable you should probably switch to full-fledge FreeBSD.

emoulton said:
I configured the ZFS volume on the command line so that I could make sure that the drives that I wanted in a particular RAIDZ2 set were the right ones. As far as I can tell, this shouldn't be a problem as the configuration is stored on the drives.

See what I just wrote about doing things on your own. VERY VERY BAD idea and I'd recommend you reconfigure your zpool right now, using the WebGUI.

You sound like a tinkerer (it's okay if you are). But you need to either check the tinkerer inhibitions at the door or you'll probably find that one day something somewhere went horribly wrong and you can't access your data anymore.

Stick with the defaults. FreeNAS devs have done what they have done to make sure you aren't having to do stuff yourself. As soon as you start doing stuff yourself, you are deviating from the design considerations the devs have already made. Being that you are new to FreeBSD, this is probably a really bad idea. Even I don't do stuff like disable swap, make my zpool from the CLI, or mess with active/active multipathing. I know better, and I could probably get myself out of almost any problem I could get myself into. :)

If you go down this road and don't take this advice, expect that if you have any kind of question about poor performance, problems with FreeNAS, etc it will become obvious pretty quickly you've deviated from the recommendations. Everyone will quickly abandon you and you'll be on your own. It's easy to make assumptions when everyone uses FreeNAS properly. It's also easy to ignore people when someone like to break assumptions and ignore recommendations because it's just not worth the forum's effort to try to figure out how to undo whatever madness you've brought on yourself.

I've clearly explained this in my noobie guide. If you don't want to take my advice right now, don't expect to get my advice when things go different. You will have already told me that my advice means nothing, which means I do not need to spend more time on you when there's a problem. ;)

SweetAndLow · May 1, 2015

What cyber said^.

emoulton · May 1, 2015

Hi Cyberjock,

I'll tell you right now that, regardless of what I end up doing, your advice is appreciated. Thank you for spending the time that you already have.

I'm glad that you mentioned how the swap partitions can be helpful when replacing failed drives. I now remember reading something about how you should partition drives that will be used for ZFS, (with FreeBSD or Linux), for the same reason. I had forgotten about that, and used the drives directly when I setup the pool on the command line.

I've reconfigured the pool with 2GB swap partitions enabled. I've done so using the GUI as you suggest. I was uncertain before about how I could setup what drives were in what RAIDZ2 set using the GUI. I'm not sure why I didn't notice the “Volume to Extend” option before, but I thought I was going to end up with four pools setting it up through the GUI, which is why I ended up on the command line. (I want particular drives in particular RAIDZ2 so that drives manufactured on the same date aren't grouped into the same RAID set).

Finally, the multipathing: The active/active configuration shows up in the GUI under view multipaths correctly. The GUI allows me to configure volumes using the multipath devices, and I'm fairly certain that it's setup correctly. (I used camcontrol to verify that each drive I used to create the multipath device had matching serial numbers). It has also survived the update that was released today. Having said all that: I don't currently need the bandwidth that the active/active configuration provides. However, I may need it in the future if I end up adding more disks via another chassis using the cascading ports on the backplane. Even so, I suppose that I'd still have a bottleneck at the 10GbE ports unless I add another NIC. If I stick with FreeNAS, and down the road I find out that I do need the active/active configuration, would it be better to have that configuration in place now, or to change it when it's needed while there is data on the volume that I can't lose?

cyberjock · May 1, 2015

emoulton said:
Finally, the multipathing: The active/active configuration shows up in the GUI under view multipaths correctly. The GUI allows me to configure volumes using the multipath devices, and I'm fairly certain that it's setup correctly. (I used camcontrol to verify that each drive I used to create the multipath device had matching serial numbers). It has also survived the update that was released today. Having said all that: I don't currently need the bandwidth that the active/active configuration provides. However, I may need it in the future if I end up adding more disks via another chassis using the cascading ports on the backplane. Even so, I suppose that I'd still have a bottleneck at the 10GbE ports unless I add another NIC. If I stick with FreeNAS, and down the road I find out that I do need the active/active configuration, would it be better to have that configuration in place now, or to change it when it's needed while there is data on the volume that I can't lose?

Well, I have two thoughts on what I'd recommend:

#1 - Stick with active/passive because that's pretty much all that FreeNAS/TrueNAS has ever used from what I've seen. You probably don't want to be doing things that haven't been heavily tested by other users, so active/passive is the way to go. This also alleviates the possibility of problems with active/active and upgrades, software conflicts with active/active, etc. active/active is literally untested by the community, so you're totally on your own (boo) if you go with active/active.
#2 - Stick with the mode that you are going to need long term (keyword: need). The problem with this is that if you have a problem, the solution you are going to be told is to go active/passive just like FreeNAS is already designed to do. So why go active/active if any problem is going to be resolved with "use it as designed you knucklehead". As you are building this yourself nobody is going to be around to support you if things go badly. If you call iX they're going to go "what... you customized the config from what FreeNAS does? Well, good luck." It just won't end well if there is ever a problem. Best case if there is a problem is you can fix it by rolling back the boot environment. Worst case, things go sideways and you lose the pool. I will tell you from experience that losing really large pools is more of a pain in the butt than you *ever* want to deal with. Recovery is time consuming, sometimes career limiting, and often impossible as "that big pool was our backups!"

I also find it very very unlikely you're every going to legitimately need more than 10Gb throughput. I've worked with a lot of servers over the last 3 years. Only 3 people I have worked with actually needed (not wanted, but had a legitimate need) for that kind of throughput. One of them I've worked with this week. Everyone thinks they need lots of speed, low latency, etc. But how many actually have a legitimate need for it? Very, very few. Everyone wants the highest specs, the fastest bandwidth, the lowest latency. But you really have to put things in perspective and decide what is actually needed versus what is wanted.

To be honest, if you are trying to build this big badass server that needs lots of bandwidth, just go buy a TrueNAS or FreeNAS certified from iXsystems and let *them* be responsible for making your stuff work right. When something is too slow there is a problem, you get to blame them and make them fix it. Better to make someone else fix stuff that breaks than have stuff break that you are incapable of fixing and nobody else is going to touch it without a large up-front check and no guarantee of an actual fix. Most contractors that I've talked to that do one-off stuff for FreeNAS users start at $150/hour, and they do not accept all customers and they cannot always guarantee a good solution to the problem. They don't handle the FreeNAS code, they aren't paid to handle FreeNAS code, and so you are, at some point, at the whims of the developers. If they code stuff that breaks things for you, you aren't going to get much sympathy from them.

Famous last words I generally see are from people that say one or more of the following:

My boss is making me...
I have to do this with what I have...
I have to do this for cheap...
I'm new to FreeBSD or FreeNAS, but....

On the other hand, if you have the kind of balls it takes to do this and risk ending up unemployed (and potentially unemployable in IT because of how famous your screwup was) then go for it. To me, I don't do try to get involved with projects that can cause that kind of thing. Doesn't make sense for you, as the employee, to have a company expect *you* to build a server (thereby saving them lots of money) but when things go sideways, *you* get escorted out the front door. I'm not a fan of "all responsibility and none of the rewards".

emoulton · May 2, 2015

You're right, I probably don't need that much bandwidth, and latency isn't an issue, unless it's getting above, say, 2 or 3 seconds. Only two machines will have 10GbE connections to the server. They are video editing rigs, and their users would have to be using several layers of uncompressed HD footage to get near that 10Gb/s, and most of the clips they are working with are compressed and need only 100Mb/s each. Everything else will use 1GbE connections to the storage server. I'll change it back to active/passive. (I'll be destroying the pool once more, and deleting the manual multipathing setup I've done. Not just changing the config as I used non-standard names for the multipathing labels). Thanks for talking some sense into me.

Some irrelevant (as far as my configuration is concerned) stuff I think I've learned about FreeNAS and SAS multipathing in case anyone is interested, and/or wants to discuss it/correct me, etc:

After putting up my last post, I was curious about how FreeNAS handled creating the multipath setup. The manual mentions that it will automatically detect active/active and active/passive capable hardware, and I wanted to know why FreeNAS ended up configuring it as active/passive. After searching around to find out if there was a standard way of querying a SAS expander for multipathing capabilities in FreeBSD, it looks like there is no such thing. I then spent a bit of time reading through the FreeNAS source to satisfy my curiosity about how it detects multipathing capabilities.

I'm putting an important disclaimer here in case anyone else reads this and assumes I know what I'm talking about: I don't. The following is all supposition based upon my knowledge of programming languages that aren't Python. (My being able to learn anything from the code is a testament to how well commented and written it is). While I think the following is where this is done in the FreeNAS code, I can't be sure of that without spending WAY more time than the 30 minutes or so that I have reading FreeNAS code. Take all of this with a 30 ton boulder of salt.

Having gotten that out of the way: It looks like it checks the disk devices to see if there are any that have matching serial numbers, and then sets up the multipath configuration if they match, and the device isn't in use. There is a function that is called to check if it should be setup as active/active. If the function returns true, then it'll do an active/active setup. If it's false, then it does active/passive. Here's the function that does that check:

def _multipath_is_active(self, name, geom):

return False

So, if I'm correct, there is why FreeNAS, (and probably TrueNAS), always ends up with an active/passive multipathing setup.

I also learned that while the labels I used when setting up the multipathing manually should be okay, as the current code seems to import the multipath devices without making any assumptions about their names, it's probably a bad idea to do so. And not just for the excellent reasons cyberjock gives, but also because the automatically generated names in the code, (disk0, disk1, etc...), are hardcoded, and it's not impossible that the devs might rely on that later for importing the configuration. (I suppose that this is one of the excellent reasons cyberjock gives against going around the GUI after all).

cyberjock · May 2, 2015

I cried a little when I read that post. That was beautiful!

emoulton · May 6, 2015

It's looking good!

I've got the two machines with 10GbE interfaces connected, and have finished migrating the data over from one of them. (Data from the other one is still pending). I'm surprised at how much compression I'm getting considering most of the data is MPEG2 and H.264 encoded video. (It's about 1.22x right now with about 2.1TB moved over, which is pretty decent).

Performance is decent. I'm not saturating the 10GbE links, but I'm doing a lot better than 1GbE. I can't really say just how quick, as all of the real world numbers I have are limited to the speed of the drives I'm copying the data from. I did run an artificial sequential read/write benchmark and got ~600MB/s read and ~300MB/s write, which is a huge improvement over what those machines were getting off of their local volumes. Opening an Adobe Premiere project file shows ~50% utilization of the 10GbE interface which seems to agree with the ~600MB/s read speed from the benchmark.

cyberjock · May 6, 2015

Glad to see it is working for you. :)

Important Announcement for the TrueNAS Community.

Replacing an Awful Storage Server

emoulton

Cadet

cyberjock

Inactive Account

SweetAndLow

Sweet'NASty

emoulton

Cadet

cyberjock

Inactive Account

emoulton

Cadet

cyberjock

Inactive Account

emoulton

Cadet

cyberjock

Inactive Account

Similar threads

Important Announcement for the TrueNAS Community.

Replacing an Awful Storage Server

Cadet

Inactive Account

Sweet'NASty

Cadet

Inactive Account

Cadet

Inactive Account

Cadet

Inactive Account

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "Replacing an Awful Storage Server"

Similar threads