Future-proof high capacity NAS configuration

Status
Not open for further replies.

JR Arseneau

Cadet
Joined
Apr 8, 2014
Messages
4
Hi all,

I am by no means new to the world of storage. Having worked professionally in this environment for a while and having run my own NAS at home, I feel pretty comfortable navigating these waters. I've used many enterprise-grade systems but I'm relatively new to ZFS (although I've pored over the literature). In fact, at the office a few years back, I led the initiative to implement an Isilon scale-out NAS solution, which has been purring along since 2010.

I know there are a bunch of threads about various configuration paradigms, but I wanted to throw my hat into the ring and get some honest-to-goodness feedback from the resident experts here. So here we go:

Goal
My goal is to create a NAS that will grow as my data needs grow. Much like the Isilon (for those who know it), where you can add an extra storage node (a node being, for example, a 2U server with 12TB of space that gets pooled into the rest of the cluster, which presents itself as one huge block of contiguous space), I will periodically need to increase the capacity of this NAS. With the kind of space I will be dealing with, it will be difficult (or extremely costly) to "copy the data, destroy the zpool, rebuild the zpool and copy the data back".


History
  • 2006: First home NAS running under Gentoo with mdadm RAID5 (500GB disks). I expanded once or twice by adding a 500GB disk and growing the RAID5. Capacity: 2.5TB
  • 2009: Decided I wanted to mix and match disks as I grew (to save costs), so I migrated the NAS to UnRAID. Capacity: 6TB
  • 2012: Unhappy with UnRAID's flaky performance, slow updates, inadequate plugin architecture and various other unpleasantries, I decided to move to Linux + FlexRAID. Capacity: 10TB
  • 2013 (Dec): Unhappy with FlexRAID (it crashed a lot under Linux, recovery never fully completed after a couple of drive failures and replacements, and the author decided to focus on Windows first and treat Linux as a second-class citizen for his upcoming tRAID solution), I moved from FlexRAID to SnapRAID + mhddfs for pooling the various disks. Capacity: 18TB (it grew significantly in a year because I bought extra disks thinking I'd use ZFS, but I got a bit worried and settled on SnapRAID temporarily)
In the years since I started running a NAS, I've gone from 500GB disks to 1.5TB, 2TB and now 3TB disks (4TB is still a bit cost-prohibitive). In my current configuration I have mostly 3TB drives plus 2x2TB drives, for a total of 18TB (15TB usable). Growing the SnapRAID + mhddfs pool is as easy as adding a new 3TB drive and adding it to the SnapRAID and mhddfs mount configuration, and it's available. The problems with this are numerous, however. Pooling solutions are notoriously unstable: they operate above the kernel, sometimes in FUSE (like mhddfs), and often conflict with services such as Samba, NFS, etc. If mhddfs crashes, these services often don't know how to handle it, causing all kinds of other issues (stale FUSE mount handles, unexpected drive drops, etc.).
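For anyone curious, the growth step looks roughly like this (paths and names below are illustrative, not my actual layout): mount the new drive, register it in snapraid.conf, sync parity, and add it to the mhddfs pool mount.

Code:
# new disk mounted at /mnt/disk3 (example path)
echo "disk d3 /mnt/disk3/" >> /etc/snapraid.conf   # register the data disk with SnapRAID
snapraid sync                                       # bring parity up to date with the new layout
# remount the mhddfs pool with the extra member so it joins the contiguous view
mhddfs /mnt/disk1,/mnt/disk2,/mnt/disk3 /mnt/pool -o allow_other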

Requirements

My requirements aren't many, I don't think they're unique, but here they are in order of priority.
  1. A contiguous block of space without having to deal with various partitions/zpools/etc.
  2. Ability to grow the contiguous space as capacity is needed.
  3. Bitrot protection (this has bit me in the ass a few times, especially with UnRAID and FlexRAID)
  4. Ability to replace smaller drives with bigger ones. In other words, in order to meet #2, I don't want to end up with 30-40 drives. I had 4x1.5TB drives before, but they are no longer in use; I replaced them with 3TB drives last year (copied the data from the 1.5TB drives to the new 3TB drives and that was it).
  5. Stability and maturity - it has to work. I've wasted so much time tinkering with mdadm, UnRAID, FlexRAID, SnapRAID, oh my!
  6. Speed isn't so much a factor; if I can saturate a GbE link, that's good enough for me.
Were I rich, I'd have an Isilon, but at $35k for 12TB, that's a bit outside my range. For those who aren't aware: you can have a 300TB cluster (all the space is contiguous), add a 24TB node, and within seconds you have 324TB of contiguous space. It's as easy as plugging in an InfiniBand cable and a network cable, and the cluster auto-balances itself. There is zero maintenance (I've essentially repurposed my storage analyst as a systems analyst because we didn't need to manage our storage anymore). But alas, that's not an option here.
Questions on Configuration with ZFS

Sorry for the long preamble, but I've now gotten to the part where people here can help. Hopefully some people are in the same boat as I am. I feel my situation is getting harder to manage as my space grows (which is why SnapRAID + mhddfs is appealing, but flaky). If I create a ZFS zpool of 18-24TB, I am most likely stuck with that for a while. It will not be feasible for me to find 18-24TB of "extra" space to copy my data off and rebuild the zpool.

I use Crashplan (cloud version) with a custom 448-bit encryption key and I back up my entire NAS to the cloud. Yes, currently, my entire 13TB of used space is backed up there. So I think I'm good for backups, but obviously my ISP would probably shit a brick if I downloaded (or attempted to download) 13TB in a month, to say nothing of the overage costs.
  1. Solution 1: Zpool comprised of multiple mirrored vdevs. There is a lot of wasted space, and every time you want to expand, it costs you two drives. However, you can (I believe, someone could confirm) "swap out" (expand) each of the mirrored vdevs with bigger disks. So if one of the vdevs in the zpool is 2x3TB, I think ZFS allows me to replace one of those with a 4TB drive, wait for the resilver, then replace the other one with a 4TB drive, resilver again, and by the end I'll have gained 1TB of usable space (there's a quick command sketch after this list). This solution doesn't provide the most IOPS, but should be able to max a GbE link. There is of course still the risk that both drives in a vdev could fail, taking the entire zpool with it. I think this would be rare (someone can correct me), and because it's a mirror, the resilver should be much quicker.
  2. Solution 2: Zpool comprised of multiple RAIDZ2 vdevs. This means less "wasted" space and more redundancy. However, expanding means I have to add (if I want to be consistent, seeing as you can't convert a mirrored or RAIDZ1 vdev to RAIDZ2) a minimum of 4, or ideally 6, extra disks. This means if I have a zpool of 6x3TB (~12TB usable), I'd have to buy 6 more drives to expand (at current pricing of approximately $140 per 3TB drive, that's $840 to expand - ouch). In a few years, as the 3TB drives get older, I could replace them with 5TB or 6TB drives, but again, the expansion isn't amortized; it comes in large chunks.
In the long run, Solution #2 gives you more storage and more redundancy, but it will most likely also use more disks and cost more per expansion (although less over the entire life of the pool). I don't believe creating a zpool comprised of various RAIDZ1 vdevs would make sense, which is why I haven't mentioned it. I am aware of the apparent risks, especially with 3TB or 4TB drives: the risk of another failure during a resilver is higher. I suppose multiple RAIDZ1 vdevs of 3 disks each (2+1) could be an option.
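To make the Solution 1 expansion concrete, here's roughly what I have in mind, assuming a pool called tank and made-up device names (someone correct me if the mechanics are off):

Code:
zpool set autoexpand=on tank    # let a vdev grow once all of its members are larger
zpool replace tank da0 da4      # swap the first 3TB disk for a 4TB disk
zpool status tank               # wait for the resilver to complete
zpool replace tank da1 da5      # then swap the second disk and resilver again
zpool list tank                 # the extra space shows up after the last resilver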
Any experts care to chime in? Am I missing something obvious here? Is ZFS possibly the wrong solution for what I'm trying to do? Were Btrfs more mature, I think it would be a better "fit" for me, but its RAID5/6 support is experimental, likely won't land in a stable kernel this year, and even when it does stabilize it won't have the maturity ZFS has. I also like how ZFS doesn't really have the concept of partitions and lets me change the configuration (dedupe, compression) per dataset.
Many thanks!
Cheers,
JR
 

johnblanker

Explorer
Joined
Apr 5, 2014
Messages
96
I can't really help you out but I am curious as to the negative experiences you had with unRAID. I am considering it as an alternative to my freeNAS setup. Check out this post here if you have time. I would welcome your opinions. Thanks.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
First of all, I have experience with how Isilon's NAS works and what its feature set is. When I started looking into ZFS and FreeNAS, I was surprised to see how limited it is when it comes to disk pools and data distribution. With that knowledge I chose to go the route of creating a 6x3TB disk RAIDZ2, with the plan to expand with another 6x4TB disk RAIDZ2 if and when I need more space. With this solution I don't plan on increasing the disk size of the first set by replacing each disk with 4TB drives. Having to buy 6 new drives just to gain a couple of extra TB by replacing older drives doesn't make sense; I'd be better off creating an entirely new pool with those drives.

I have been running my server for a couple of months now and once it is set up there is very little management to be done. Adding more storage does take some work, because you need to physically install the HDDs. One other thing to think about: as a zpool gets filled up, performance starts to drop off, so if you have a pool that is 80% full and you then add more storage, it makes sense to rebalance the data. You will need to do that by hand, since there isn't a built-in mechanism to do it for you.
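For reference, if the second set of disks goes in as another RAIDZ2 vdev in the same pool (hypothetical pool and device names below), ZFS will stripe new writes across both vdevs but won't touch existing data, which is why the rebalancing has to be done by hand (e.g. by copying datasets over and deleting the originals):

Code:
zpool add tank raidz2 da6 da7 da8 da9 da10 da11   # second 6-disk RAIDZ2 vdev joins the pool
zpool list -v tank                                 # shows per-vdev capacity and free space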
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Hi JR,

Being well versed in the storage world, you're probably aware that what you're asking for re: the easy expansion is falling a lot more on the enterprise side of things rather than the consumer-grade world. That means "more money" naturally. I'll try to provide more details, but you can very easily build a solution along the lines of the major commercial vendors, with a compute and RAM-heavy "head unit" and then attach multiple shelves of disk via external SAS connectors. FreeNAS is SAS-multipath aware; you could probably even use a pair of external HBAs to provide fully redundant links given the right shelves (but I haven't tested that specifically myself.) With this, you could have 12 disks per shelf, set them up fairly easily as vdevs, and then join them to the main pool, controlled by a 1U/2U head server.

Of course, that's all assuming you're willing to put rackmount gear in your house and deal with the noise, heat, and power consumption, as well as buy NL-SAS drives so that they play nice with daisy-chained expanders and multipath. There are other gotchas in this setup as well, not the least of them being "price."

Regarding your mirrors vs RAIDZ2 question - the mirrors will actually provide the best random I/O, but since you're only looking at a single GbE link, either will be just fine unless you're serving virtualization workloads. Mirrors are both more and less resilient - in a hypothetical 12-drive setup, mirrors could lose "up to" 6 drives if only one drive from each vdev fails. But as you noted, if you lose two from the same vdev, there goes the pool. A 12-drive setup of two 6-drive RAIDZ2 vdevs can only handle "up to" 4 drive failures, but within each vdev, any two could fail and you'd still be okay (albeit degraded).
 

JR Arseneau

Cadet
Joined
Apr 8, 2014
Messages
4
I can't really help you out but I am curious as to the negative experiences you had with unRAID. I am considering it as an alternative to my freeNAS setup. Check out this post here if you have time. I would welcome your opinions. Thanks.


Solutions like UnRAID, FlexRAID and SnapRAID are very interesting and downright alluring for someone who wants to create a big pool of storage and grow that space as time progresses and space requirements increase. There's no worrying about rebalancing your pool, destroying/creating, etc.

However, there are also some caveats to UnRAID, and these may or may not be important depending on your use case:

  • Only 1 parity (I think there's a plan for more, but with the author, who knows)
  • No bit rot protection - when parity is calculated or written, if there are bad blocks, these will be written to parity.
  • ReiserFS is old
  • If you're using it solely as a storage appliance with no plugins, it isn't bad. I run many things (NFS, SMB, Time Machine, Plex, web server, etc.) and UnRAID falls apart.
  • It's running slackware (really?)
  • Performance is an issue. A cache drive will help, but I've had occasions where the cache contents weren't written to the drives.
  • There is an active community, but most things require mucking around.
  • *IF* You run any services, you have to ensure these services start AFTER the pool has started. If the pool crashes (it happens), these services will be orphaned. They may (depending on the service) write arbitrary data to a mount point when the pool isn't mounted and block the pool from re-mounting because the folder isn't empty (this is especially nasty if you don't realize it, because the pool won't start and you have no idea why - but it's because a service like Plex or web or something else wrote data to a non-mounted pool folder).
Again, there is a lot to like about these appliance solutions, but of all of them (especially if you can find your way around a Linux machine), do yourself a favour and go with SnapRAID + mhddfs (http://snapraid.sourceforge.net).
 

JR Arseneau

Cadet
Joined
Apr 8, 2014
Messages
4
I have been running my server for a couple of months now and once it is set up there is very little management to be done. Adding more storage does take some work, because you need to physically install the HDDs. One other thing to think about: as a zpool gets filled up, performance starts to drop off, so if you have a pool that is 80% full and you then add more storage, it makes sense to rebalance the data. You will need to do that by hand, since there isn't a built-in mechanism to do it for you.


I'm glad you mentioned this because I forgot to in my post.

How real is fragmentation? Since starting to use FlexRAID (when I had 1.5 and 2TB drives), I've since added 3TB drives, replaced some 1.5TB and 2TB drives, but I've never, ever had to copy the data off and restart from scratch.

For data needs, I won't be having a whole ton of write operations. 90% of the space is used by media that never changes. Once it's copied, it doesn't move.

The idea is that if I start building a zpool (Solution 1) with multiple mirrored vdev's, I would want to expand indefinitely as I needed more space. So let's say I get to 80% capacity, I buy 2x4TB drives and add 4TB to my zpool, rinse and repeat for the next 5+ years. As disk capacity increases, I replace the older, smaller drives with bigger ones.

If I did this back when I had 500GB drives (mirrored vdev's), I could theoretically have replaced those with 1.5TB, then with 3TB, without necessarily increasing the quantity of spindles in the zpool. For example, with 6 drives:

2007: 3x500GB mirrored vdevs == 1.5TB of space
2009: Replace all 500, 3x1.5TB mirrored vdevs == 4.5TB of space
2013: Replace all 1.5TB's, 3x3TB mirrored vdevs == 9TB of space

Of course, all this requires replacing the 6 drives (which is also costly), or I could just add another 1.5TB or 3TB vdev, up to the maximum number of drives I want in my zpool (let's say my case can support 24 drives; that's 12 vdevs).
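For what it's worth, the "add another vdev" path is a one-liner per expansion (made-up pool and device names again):

Code:
zpool add tank mirror da12 da13   # each new 2-disk mirror vdev adds its usable size to the pool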
 

JR Arseneau

Cadet
Joined
Apr 8, 2014
Messages
4
Hi JR,

Being well versed in the storage world, you're probably aware that what you're asking for re: the easy expansion is falling a lot more on the enterprise side of things rather than the consumer-grade world. That means "more money" naturally. I'll try to provide more details, but you can very easily build a solution along the lines of the major commercial vendors, with a compute and RAM-heavy "head unit" and then attach multiple shelves of disk via external SAS connectors. FreeNAS is SAS-multipath aware; you could probably even use a pair of external HBAs to provide fully redundant links given the right shelves (but I haven't tested that specifically myself.) With this, you could have 12 disks per shelf, set them up fairly easily as vdevs, and then join them to the main pool, controlled by a 1U/2U head server.

Of course, that's all assuming you're willing to put rackmount gear in your house and deal with the noise, heat, and power consumption, as well as buy NL-SAS drives so that they play nice with daisy-chained expanders and multipath. There are other gotchas in this setup as well, not the least of them being "price."

Regarding your mirrors vs RAIDZ2 question - the mirrors will actually provide the best random I/O, but since you're only looking at a single GbE link, either will be just fine unless you're serving virtualization workloads. Mirrors are both more and less resilient - in a hypothetical 12-drive setup, mirrors could lose "up to" 6 drives if only one drive from each vdev fails. But as you noted, if you lose two from the same vdev, there goes the pool. A 12-drive setup of two 6-drive RAIDZ2 vdevs can only handle "up to" 4 drive failures, but within each vdev, any two could fail and you'd still be okay (albeit degraded).


Hey HoneyBadger,

Thanks for the detailed response. While I manage this type of infrastructure professionally, at home my needs aren't anywhere near what I use at the office.

Most of my data is static media that never gets deleted (it just accumulates - photos, videos, music). I would say 90% of it (currently 13TB of used space) is never-changing media that is written once and never moved. The other 10% is backups, home folders, and MAYBE (unlikely, and nothing requiring high throughput) a few ESXi VMs. Again, the entire contents of my RAID are currently being sent to Crashplan Cloud, so I could recover from a complete catastrophic loss, but ideally I would want to prevent that if at all possible.

I will eventually have a 24-bay storage chassis to hold all this, which means a few options:

1. 4x 6-disk vdevs in RAIDZ2 ("losing" 8 disks of the 24)
2. 12x 2-disk mirrored vdevs ("losing" 12 disks of the 24, or 50%)

#1 has bigger upfront and replacement costs, provides better resiliency, and favours sequential throughput over random I/O.
#2 has lower upfront and replacement costs but a higher long-term cost (higher $/GB) and lower resiliency per vdev, with better (significantly?) random read I/O.
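Back-of-the-envelope usable capacity for those two layouts with 3TB drives (ignoring ZFS metadata and slop overhead, which shaves off a few more percent):

Code:
echo "4x 6-disk RAIDZ2 : $(( 4 * (6 - 2) * 3 )) TB usable"   # 48 TB; any 2 disks per vdev can fail
echo "12x 2-disk mirror: $(( 12 * 3 )) TB usable"            # 36 TB; 1 disk per vdev can fail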
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
No problem at all; FreeNAS/ZFS in general are capable of that "enterprise-grade" functionality, but you'll pay "enterprise-grade" money for it. ;)

For your usage pattern (media collection and streaming) you definitely want to optimize for capacity over speed. As I mentioned before, a single GbE link can get saturated very easily; my tiny little two-drive home server can do about 90MB/s read and 70MB/s write (over CIFS).

In terms of "performance" the question boils down to random I/O vs sequential. RAIDZ levels are bad for virtualization because of the small random I/O, but they're great for sequential things like media streaming or backups, since there are more spindles to stripe data across. In your situation I'd absolutely save up and start with a 6-drive RAIDZ2 setup.
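For completeness, that 6-drive RAIDZ2 starting point is a single command (pool and device names below are just placeholders), and lz4 compression is cheap enough to leave on even for a mostly incompressible media pool:

Code:
zpool create tank raidz2 da0 da1 da2 da3 da4 da5   # 6x3TB -> roughly 12TB usable before overhead
zfs set compression=lz4 tank                        # effectively free for already-compressed media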
 

johnblanker

Explorer
Joined
Apr 5, 2014
Messages
96
Hi JR, I looked at SnapRAID and it seems to have everything I want, except for the fact that it's command-line driven and it seems to need an OS to run inside(?). At this time, I can't really invest in learning command-line Linux.

So, I was reading this thread on the Windows 8.1 ReFS file system and "Storage Spaces". VERY INTERESTING. The guy even had great things to say about freeNAS, but did condemn its expensive hardware requirements. He had only great things to say about Storage Spaces, but seemed to leave out the fact that someone using it in a NAS has the added project of maintaining a full-fledged OS (and a crappy one I might add, IMO).

  • It has self-healing similar to ZFS, but only with mirrors, not parity (RAID 5-style) layouts
  • All drives spin up when accessing a single file :(
  • Can add/swap/remove drives in the pool at will without the need to rebuild the entire pool
  • Files are still accessible with a crashed drive (mirror only, I think)
  • Can take an HDD out, put it in any Windows 8 machine and retrieve the contents (I think this might only apply to mirror setups though)
  • He did mention sub-par write performance with parity setups (around 20MB/s)

Interesting. Here is the article.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
That article was painful to read, from calling FreeNAS a "Linux distribution" to touting all the awesome features you'll eventually get from ReFS (which you can have right now with ZFS) to complaining about the "high cost" of 16GB of RAM.
 

johnblanker

Explorer
Joined
Apr 5, 2014
Messages
96
Yes, I saw that too. The hardware requirements are a lot of money for a newbie to digest. All that server-grade equipment is not cheap - 2x the price of a normal desktop PC build. Don't get me wrong, I think it's justified, but I think a lot of newbs (like myself) assumed you could use leftover, old parts with freeNAS. Isn't ECC memory more expensive than non-ECC? Maybe it's just my aging board; I'm looking at >$200 for only 8GB max for my board.
 

johnblanker

Explorer
Joined
Apr 5, 2014
Messages
96
Maybe I found it so interesting because I understood everything he was talking about!:eek: Once they get into that VM crap it's all over my head and I'm lost!
 