Clustering services?


mediahound
Dabbler | Joined Mar 11, 2013 | Messages: 15
As good as ZFS is, I'm a little frustrated with some of its limitations.

Not file system limitations, more like RAM limitations to start. :P Being able to store a zettabyte-sized file only if I can supply an exabyte of RAM (that 1:1000 ratio) seems like the first real bottleneck. It sounds like much over 32TB is unknown territory, and much over 16 physical drives in a system also seems to be pretty rare and to require some seriously specialized knowledge to make work at all. Surely there must be a way to have a group of two or more computers operate a single zpool visible to the outside, coordinating so that the RAM requirement ends up being "per hardware machine", for instance? I wouldn't mind having 96GB of RAM for a 100TB array, but there aren't many motherboards allowing that yet, and stretching things any further borders on impossible even if you can budget for the physical drives to store the data. I don't think this concern is frivolous either: each new CPU revision only seems to double the allowed RAM, while hard drives have done more than double, and the enterprise-class needs of future ZFS users will likely run into these limits as well.
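
(To spell out the arithmetic I'm going by - treating the 1 GB of RAM per 1 TB of storage figure purely as the commonly quoted rule of thumb, not a hard ZFS requirement - a quick back-of-the-envelope sketch:)

Code:
# Back-of-the-envelope check of the "1 GB of RAM per 1 TB of storage" rule of
# thumb often quoted for ZFS (a sizing guideline, not a hard requirement;
# workload and features like dedup change the picture considerably).

def suggested_ram_gb(pool_tb, gb_per_tb=1.0):
    """RAM in GB suggested by the rule of thumb for a pool of pool_tb terabytes."""
    return pool_tb * gb_per_tb

for size_tb in (32, 100, 1000, 10**9):   # 32 TB, 100 TB, 1 PB, 1 ZB
    print(f"{size_tb:>13,} TB pool -> ~{suggested_ram_gb(size_tb):>16,.0f} GB RAM")
# 100 TB -> ~100 GB of RAM; 1 ZB (10**9 TB) -> ~10**9 GB, i.e. roughly an exabyte of RAM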

There are separate clustering programs out there that provide fault-tolerant storage, but as far as I can tell they all work "outside" the benefits of ZFS - meaning once you put one in charge of, say, keeping data present in at least 2-3 locations across a large cluster of dumb disk nodes, you open the door right back to silent data corruption, failed writes, and all the rest. Yet even if you had a single 'master' computer in charge of a monster zpool made up of many machines, if it has to maintain that 1:1000 RAM-to-storage ratio to manage it all, it still won't work without a way to distribute that load down to the individual machines, whether through some kind of top-down hierarchy, a peer-to-peer arrangement, or both together.


So as of right now, are there any credible discussions going on about some kind of clustering ability that would work around those bottlenecks? Just as "ZFS loves cheap disks" means not needing a RAID card, something along the lines of "ZFS-cluster loves cheap computers" that doesn't demand server-class hardware, providing much better 'catastrophe protection' where physically separate servers - either on the same LAN in a different room or far away over the internet - could coordinate to prevent silent data corruption while reducing sysadmin hassles? Where expanding the storage pool could be done as readily by hooking another NAS box up to the LAN as by adding drives to a single NAS? In a way the ZFS philosophy almost works better at the cluster level than on a single computer; it seems like a natural extension, and something that shouldn't be too hard to hook code into since ZFS already does such a good job preventing data loss at the local level.


In part I'm wondering what kinds of strategies could be used to maintain an "always in ZFS" data chain after the data leaves a specific computer - going over the network, going onto a USB drive and then back onto another machine, etc. Once the data leaves ZFS it can be corrupted; the new checksum might not match the old if corruption occurs en route, and we're back to the problem we've always had. So it would seem that one way or another - even if most of the work would be up to other projects to interface from their end - there should still be good ways for ZFS to provide data 'upstream' of that computer to be used that way. (Or maybe it already does - I'm not a programmer or sysadmin, after all, just thinking out loud about my feature request so I don't have to become either. :P)
 

cyberjock
Inactive Account | Joined Mar 25, 2012 | Messages: 19,525
mediahound said:
As good as ZFS is, I'm a little frustrated with some of its limitations.

Not file system limitations, more like RAM limitations to start. :P Being able to store a zettabyte-sized file only if I can supply an exabyte of RAM (that 1:1000 ratio) seems like the first real bottleneck. It sounds like much over 32TB is unknown territory, and much over 16 physical drives in a system also seems to be pretty rare and to require some seriously specialized knowledge to make work at all. I wouldn't mind having 96GB of RAM for a 100TB array, but there aren't many motherboards allowing that yet, and stretching things any further borders on impossible even if you can budget for the physical drives to store the data. I don't think this concern is frivolous either: each new CPU revision only seems to double the allowed RAM, while hard drives have done more than double, and the enterprise-class needs of future ZFS users will likely run into these limits as well.

I completely disagree....

In order to have a zettabyte of hard drive space, you'd need a billion 1TB hard drives! So no, that's not a limitation I'll see in the next 20 years (let alone in my lifetime), and I'm not sure why you even consider it one.
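
(Quick sanity check on that arithmetic, using the decimal units drive vendors use:)

Code:
# Sanity check: number of drives needed for a zettabyte (decimal units,
# as drive vendors label capacity).
ZB, TB = 10**21, 10**12

for drive_tb in (1, 4):
    print(f"1 ZB / {drive_tb} TB drives = {ZB // (drive_tb * TB):,} drives")
# 1 TB drives -> 1,000,000,000; 4 TB drives -> 250,000,000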

There are quite a few motherboards that support 128GB of RAM. But you are talking BIG BIG dollars, and even more if you order incompatible hardware.

And while you make the comment that each new CPU generation only doubles the amount of RAM, the first 4TB hard drive shipped in Dec 2011 (more than 2 years ago) and there hasn't been any larger drive yet. But CPU generations have continued to plug along just fine.

There are plenty of people who have extensive experience with very large systems, but you won't find them here often, because people who work with very large systems wouldn't waste their time on an "amateur" project like FreeNAS. They probably use FreeBSD and like it.

mediahound said:
There are separate clustering programs out there that provide fault-tolerant storage, but as far as I can tell they all work "outside" the benefits of ZFS - meaning once you put one in charge of, say, keeping data present in at least 2-3 locations across a large cluster of dumb disk nodes, you open the door right back to silent data corruption, failed writes, and all the rest. Yet even if you had a single 'master' computer in charge of a monster zpool made up of many machines, if it has to maintain that 1:1000 RAM-to-storage ratio to manage it all, it still won't work without a way to distribute that load down to the individual machines, whether through some kind of top-down hierarchy, a peer-to-peer arrangement, or both together.

I disagree again. This isn't really necessary and would add a mess of complexity (read: more RAM) that is unnecessary for all but a very small handful of places (probably fewer than 100) on the planet. It's relatively straightforward to have 256 hard drives (though the cost is far from trivial), but you have to know what you are doing and have the budget for it. Also, as you get bigger and bigger, your LAN connections become your bottleneck. What's the point of having a 1PB file server if, at maximum LAN speed, it would take an absurdly long time to fill?
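
To put rough numbers on that - assuming the link somehow ran flat out with zero protocol overhead, which real-world CIFS/NFS never manages:

Code:
# Rough time to fill a 1 PB pool at common LAN speeds, assuming the link runs
# flat out with zero protocol overhead (real-world NFS/CIFS will be slower).
PB = 10**15          # bytes
DAY = 86400          # seconds

for name, gbit in (("1GbE", 1), ("10GbE", 10)):
    bytes_per_sec = gbit * 10**9 / 8
    days = PB / bytes_per_sec / DAY
    print(f"{name}: ~{days:,.0f} days to move 1 PB")
# 1GbE -> ~93 days of continuous transfer; 10GbE -> ~9 days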

mediahound said:
So as of right now, are there any credible discussions going on about some kind of clustering ability that would work around those bottlenecks? Just as "ZFS loves cheap disks" means not needing a RAID card, something along the lines of "ZFS-cluster loves cheap computers" that doesn't demand server-class hardware, providing much better 'catastrophe protection' where physically separate servers - either on the same LAN in a different room or far away over the internet - could coordinate to prevent silent data corruption while reducing sysadmin hassles?

There are no such discussions; your expectations are not even close to reality. If you want to add a feature (such as catastrophe protection), you are adding complexity to your design. You aren't going to "reduce sysadmin hassles" while adding more features, except in very rare circumstances. Since you're dealing with a local/remote setup, you are inevitably adding sysadmin hassles, and expecting anything less is crazy. There's a reason storage admins make good money and have a certain amount of job security: their jobs aren't easily outsourced, and companies learn VERY quickly why they pay their storage admins as soon as they lay them off.


mediahound said:
In part I'm wondering what kinds of strategies could be used to maintain an "always in ZFS" data chain after the data leaves a specific computer - going over the network, going onto a USB drive and then back onto another machine, etc. Once the data leaves ZFS it can be corrupted; the new checksum might not match the old if corruption occurs en route, and we're back to the problem we've always had. So it would seem that one way or another - even if most of the work would be up to other projects to interface from their end - there should still be good ways for ZFS to provide data 'upstream' of that computer to be used that way. (Or maybe it already does - I'm not a programmer or sysadmin, after all, just thinking out loud about my feature request so I don't have to become either. :P)

Adding features = more sysadmin responsibilities. Period. There is no silver bullet. Until FreeNAS, if you weren't big into Unix OSes you had no options at all for using ZFS. While FreeNAS/NAS4Free and other similar projects simplify ZFS a lot, they add their own new problems. There are people who now feel that since FreeNAS is so easy to install, everything else is "hard". You can only lower the bar so far; then it's up to the server admin to do the rest.

The reason ZFS will never be on Windows: it can't ever be dumbed down enough that your average Windows admin won't screw up and lose data. Microsoft doesn't want the blame (or the tech support calls) for admins making horribly stupid mistakes, and implementing a new file system on an OS is far from trivial - especially one as complicated as ZFS.
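
As for keeping some check on data once it leaves the pool: replication with zfs send/receive keeps the data inside ZFS's own checksum chain end to end, and for everything else the usual low-tech answer is a checksum manifest that you re-verify on the other side. A minimal sketch of the manifest idea, assuming plain SHA-256 and made-up paths:

Code:
# Minimal sketch of a checksum manifest: hash files before they leave the ZFS
# box, then re-verify the copies on the destination. Paths are hypothetical.
import hashlib
import json
from pathlib import Path

def sha256_of(path, chunk=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def build_manifest(root):
    """Map each file's relative path to its SHA-256 digest."""
    root = Path(root)
    return {str(p.relative_to(root)): sha256_of(p)
            for p in root.rglob("*") if p.is_file()}

def verify(root, manifest):
    """Return the relative paths whose copies no longer match the manifest."""
    root = Path(root)
    return [rel for rel, digest in manifest.items()
            if sha256_of(root / rel) != digest]

# Example usage (hypothetical paths):
# manifest = build_manifest("/mnt/tank/photos")
# Path("photos.manifest.json").write_text(json.dumps(manifest))
# ... copy the files to the USB drive / remote box ...
# bad = verify("/media/usb/photos",
#              json.loads(Path("photos.manifest.json").read_text()))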
 

jgreco
Resident Grinch | Joined May 29, 2011 | Messages: 18,680
cyberjock said:
There are quite a few motherboards that support 128GB of RAM. But you are talking BIG BIG dollars, and even more if you order incompatible hardware.

"BIG BIG dollars?" I disagree. I'll take that to task below.

cyberjock said:
And while you make the comment that each new CPU generation only doubles the amount of RAM, the first 4TB hard drive shipped in Dec 2011 (more than 2 years ago) and there hasn't been any larger drive yet.

That's something of a historical fluke, and seems to be the result of a "perfect storm": large-capacity, reasonably priced SSDs were displacing spinny disks in high-end laptops; small-capacity, reasonably priced SSDs along with Intel Smart Response and a 5400RPM spinny disk were eating into the market for 3.5" 7200RPM drives (note Seagate's disinterest in delivering a 4TB 7200RPM desktop drive); further consolidation in the hard drive manufacturing world left only three manufacturers; and the big flood (the 2011 Thailand flooding) seriously interrupted drive manufacturing, which threw a big wrench into things.

I think there are some factors to be considered:

1) With the advent of SSD, manufacturers are no longer as focused on providing several classes of hard drives (5400, 7200, 10K, 15K) because SSD is rapidly eating into the high-performance SAS market; why would I buy a 2.5" 15K 300GB SAS drive for $370 when I can get a Crucial 512GB SSD for $379? Okay, well, the SAS drive has a lot more endurance, but the effect is legitimate.

2) How big a hard drive is "big enough"? A lot of the growth in hard drive sizes has been driven by the PC market. With PCs on the decline, and hard drives generally large enough to hold all the things the average user wants to put on a computer, HDD manufacturers seem to have discovered that the clamor for 4TB wasn't nearly as significant as they were expecting. As a result, instead of the 6TB disks that 2010 would have projected as available in 2013, we'll be lucky to see 5TB show up soon. What big things are people storing? Video? How much video do people actually want to store?

3) Even with fast broadband, content has to come from somewhere. If people are acquiring video from iTunes or other services over the Internet, broadband is a limiting factor. Have we reached the point where the slow growth in broadband speeds is causing a slowdown in hardware development? (See the rough numbers sketched after this list.)

etc
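
To put a number on point 3 (assuming a sustained 25 Mbit/s connection, an arbitrary figure used purely for illustration):

Code:
# How long does it take to even acquire a drive's worth of content over
# broadband? Assumes a sustained 25 Mbit/s link (illustrative figure only).
TB = 10**12
link_bps = 25 * 10**6            # 25 Mbit/s
bytes_per_day = link_bps / 8 * 86400

for drive_tb in (1, 4):
    days = drive_tb * TB / bytes_per_day
    print(f"{drive_tb} TB of downloads at 25 Mbit/s: ~{days:.0f} days")
# 1 TB -> ~4 days of continuous downloading; 4 TB -> ~15 days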

cyberjock said:
But CPU generations have continued to plug along just fine.

Well, in ~2005, a large system might have had an 8GB-per-CPU limit (Opteron Sledgehammer boards like the Tyan S2882/S4882, or Xeon Lindenhurst), which was in some ways only a practical limit, because some of them could take DIMMs-made-of-gold if you could find someplace to manufacture them for you. Once Nehalem came around, though, and Intel followed AMD in moving the memory controller onto the CPU, that seems to have opened the memory floodgates for Intel.

http://en.wikipedia.org/wiki/List_of_Intel_chipsets
http://en.wikipedia.org/wiki/Intel_Xeon_chipsets

The low-end chipsets are still limited to 32GB, so when you're browsing Amazon or wherever, looking at all the prosumer boards, it is going to seem like attaching a lot of memory is difficult. What you need to do is look at the E5/Nehalem-EX/Westmere-EX (Socket 1567/2011) boards, where options for up to 1TB of RAM have existed since maybe 2010. Earlier versions of these platforms were available primarily as prebuilts like the Supermicro 8046B-TRF ("I want one of those for my desk!"). So at $3299 I'll grant that's "BIG BIG" dollars, but these days reasonable options exist:

Supermicro X9SRL ... $269 for up to 256GB.

Supermicro X9DRI ... $395 for up to 512GB.

Supermicro X9DRI-LN4F+ ... $443 for up to 768GB.

cyberjock said:
There are plenty of people who have extensive experience with very large systems, but you won't find them here often, because people who work with very large systems wouldn't waste their time on an "amateur" project like FreeNAS. They probably use FreeBSD and like it.

Ah, "hi there?" :smile:

But don't kid yourself, FreeNAS isn't an amateur project. FreeNAS is aimed at amateur users, but the developers are really doing the work for TrueNAS. There's a significant amount of value in having a bunch of users hammering on your product and finding all the bugs for free. TrueNAS is aimed at paying users, amateur or otherwise.

From my perspective, FreeNAS is simply a packaged FreeBSD that's already had most of the difficult bits taken care of, and given that there are so MANY things to address when working with big systems and complex networks, having something that is a prepackaged appliance is a blessing - as long as it works. And that's where FreeNAS is a real win.
 

JaimieV
Guru | Joined Oct 12, 2012 | Messages: 742
mediahound said:
There are separate clustering programs out there that provide fault-tolerant storage, but as far as I can tell they all work "outside" the benefits of ZFS

This isn't the case. If you use something like Lustre (not on FreeBSD, but there we go!) then you can use ZFS or any other filesystem for your object data stores. You get all the benefits of both ZFS and clustering. Fortunately, ZFS-on-Linux just hit what its maintainers consider "production ready" status.
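
For a flavour of how that looks in practice - a sketch only, assuming a Linux box with both ZFS-on-Linux and the Lustre utilities installed; the filesystem name, pool, devices and MGS address are all made up:

Code:
# Illustrative only: formatting a Lustre object storage target (OST) whose
# backing store is a ZFS dataset, via mkfs.lustre's ZFS backend support.
# Assumes the Lustre utilities and ZFS-on-Linux are installed; all names
# (fsname, pool, devices, MGS address) are hypothetical.
import subprocess

subprocess.run(
    [
        "mkfs.lustre",
        "--ost",                        # this target stores file data (objects)
        "--backfstype=zfs",             # put the OST on a ZFS dataset
        "--fsname=demo",                # hypothetical Lustre filesystem name
        "--index=0",
        "--mgsnode=192.168.1.10@tcp",   # hypothetical management server
        "ostpool/ost0",                 # ZFS pool/dataset to create for the OST
        "mirror", "/dev/sdb", "/dev/sdc",  # vdev spec for the new pool
    ],
    check=True,
)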

Adding clustering to ZFS is a project approaching the scale of the work already done on ZFS, so don't expect it soon!
 