BUILD Doing a ~$14 000 build, input on harddrive choice (and the rest of the components)


Christer

Dabbler
Joined
Mar 25, 2015
Messages
11
I'll start with the most important tl;dr question, then go into the details of the server.
It will have a SAS-3 HBA and a SAS-3 backplane and use 7200 RPM SAS drives. Do I need SAS-3 drives for the HBA<->backplane link to run in SAS-3 mode, or will the SAS-2 speed only affect the backplane<->hard drive link?
The reason for asking is that Super Micro Computer (SMC) seems to be dumping the older HGST 7K4000 drives for 30% less than the 7K6000, and that makes quite an impact on the price of the whole system.
Edit: My findings about this can be read in my replies #12 and #15. The question is answered.


Background:
Obviously this is a system for a company (my employer), but I hope I can get good advice here on the forum. I've been reading the stickied threads and some others, so I hope I avoid some newbie mistakes in the choice of hardware.

We run 100-150 virtual machines (mostly Windows Server) spread across 4 VMware ESXi servers, and they are really bottlenecked by the storage solution we have today. The most immediate problem is lack of space (and it can no longer be expanded); there is a year or so left of the expected life span for that storage solution. But many of the virtual machines don't have a heavy load and are not mission critical; quite a few are just inactive space hogs (although the virtual machine owners want to keep them), and there are powered-off machines as well. From now on we will tier our virtual machine storage: one expensive solution with a lot of redundancy (multipath, master/slave controllers, etc.) and uptime (on-site support), and another with more space but redundancy only for the data itself. This system will be sort of a retirement home for low-use virtual machines! It should have a lot of space, but if, say, the motherboard breaks, we won't have a problem waiting for a replacement.


Okay, over to the fun part: the choice of components for the build itself. It will run the latest FreeNAS version (9.3 STABLE).
We're going with an SMC machine and I'm leaning towards letting SMC build it as well.
  • Barebone. SuperStorage Server 5048R-E1CR36L.
    • Which includes the motherboard X10SRH-CLN4F. The LSI3008 SAS-3 controller ships in RAID mode (IR), but an "application engineer" at SMC (via my sales representative) replied that it is possible to flash it to IT mode (plain HBA) and pointed to ftp://ftp.supermicro.nl/driver/sas/lsi/3008/Firmware/ for LSI3008 firmware, so I believe them.
      Via the mpr driver it should be supported, right?
    • The chassis is the 847BE1C-R1K28LPB with space for 36 3.5" hot-swap hard drives. I plan to have the MCP-220-82609-0N 2x2.5" rear hot-swap kit installed; I'm just asking SMC to confirm that it connects to SATA ports. Also some internal MCP-220-84701-0N 2x2.5" drive bays. I'm also requesting 2 additional 3.5" hot-swap caddies so I can have spare drives mounted in caddies, ready to go. Plus the MCP-210-84601-0B front bezel, because I like to have bezels on servers :)
  • CPU. The motherboard is a single-socket LGA 2011 board, which probably holds the price down a bit, plus the processors are a lot cheaper for the same speed. I'm thinking of the Intel Xeon E5-1650 v3: 6 cores, 15 MB Smart Cache and 3.5 GHz. The closest contender in the dual-CPU range is the E5-2643 v3, which only offers 5 MB more cache at nearly three times the price.
  • RAM. The motherboard has 8 DIMM slots and the price for 16 GB modules is quite okay, so I'm thinking of 4x16 = 64 GiB RAM (DDR4 ECC REG). That leaves room for expansion without replacing modules. Looking at the Samsung memory that is on the list of tested memory for the board.
  • Zpool hard drives. Thinking of buying 20 drives with 18 in the chassis. Looking at 4 TB drives, which will give us plenty of space: 3 vdevs, each a RAIDZ2 of 6 drives. Usable drives and capacity: 12*4 = 48 TB, minus 10-20% free space (see the pool layout sketch after this list). Write performance is helped by more vdevs, right? I'm leaning towards the HGST Ultrastar 7K6000 4TB SAS if I need SAS-3, or its predecessor the 7K4000 if SAS-2 is okay (the first question I asked). Anyone with thoughts on these HGST drives? Another choice would be the (PDF) Seagate Enterprise Capacity 3.5 HDD SAS 512E 4TB ST4000NM0034.
  • SLOG. As this is going to be a datastore for virtual machine disks (vmdk) and I really want sync writes (sync=always) with a lot of random writes, I've gathered this is the case where you should have a SLOG. I understand you actually only need a few GB, but I think the HGST ZeusRAM SSD is too expensive. I think I will choose an Intel S3710 SSD with 200 GB and crank up the over-provisioning (I hope that's possible with hdparm in Linux for this drive; see the partitioning sketch after this list) so I only end up with, say, 20 GB of usable space. In its standard config it is rated for 10 DWPD over 5 years and I don't expect we'll stress it that hard. Have I got it right that the ZIL does sequential writes? The 200 GB drive is rated for 300 MB/s where the 400 GB drive is rated for 470 MB/s; they have the same rated 43 000 IOPS, in case I've misunderstood the ZIL. I don't think I will put the SLOG on mirrored drives. Don't shoot me yet! As I understand it, newer ZFS versions work fine if the SLOG dies: the ZIL is simply redirected to the zpool, and nothing is lost since the ZIL contents also live in RAM. The SSD is capacitor-backed and the data in the SLOG only matters for something like 15 seconds, so the SSD would have to die AND the computer die (reset or similar) within that little time frame to lose the most recent sync writes. The system will be powered through a UPS with an extra battery, so we're looking at ~30 minutes of battery time.
  • L2ARC. Bigger, and not so much writing. Thinking of the Intel S3610 SSD with 800 GB: 3 DWPD for 5 years. Perhaps we will have little use for it with 64 GiB RAM?
  • Boot media. The Supermicro SATA DOM (Disk on Module), 32 GB. They're quite inexpensive, so I'm thinking of buying 4. Perhaps plugging them all in (2 via the special SATA DOM ports, 2 via normal ports and an extra power cable), or keeping some as spares outside the machine. I know FreeNAS boots to RAM, but I would like to avoid having to reinstall and load config files. And mirroring only two would make me nervous if one breaks and I have to wait a few weeks for a replacement.
  • Storage NIC. Intel X520-DA2. The two SAN switches have mostly SFP+ ports and I have good experience with the X520 from before. It should be supported by the ixgbe driver.
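To make the intended layout concrete, here is a rough sketch of the pool creation from the shell (pool and device names are placeholders, not what I will actually end up with):

# three 6-disk RAIDZ2 vdevs: 3 vdevs x 4 data disks x 4 TB = 48 TB before free-space headroom
zpool create tank \
  raidz2 da0  da1  da2  da3  da4  da5 \
  raidz2 da6  da7  da8  da9  da10 da11 \
  raidz2 da12 da13 da14 da15 da16 da17

Writes are striped across the vdevs, so more vdevs should indeed help write performance.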
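And a sketch of the SLOG under-provisioning idea done on FreeBSD/FreeNAS itself, which would avoid needing hdparm/Linux at all: after a secure erase, only partition a small slice for the log and leave the rest of the SSD unwritten, which should give the controller roughly the same spare area to work with (device names are placeholders):

# assume the secure-erased S3710 shows up as da18 and the S3610 as da19
gpart create -s gpt da18
gpart add -t freebsd-zfs -a 1m -s 20g -l slog0 da18   # only 20 GB partitioned, the rest stays untouched
zpool add tank log gpt/slog0
# and the L2ARC, if it turns out to be worthwhile with this much RAM
zpool add tank cache da19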

Thanks for any advice and best regards.
Christer
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
I've got the same MB. Flashed to IT mode and running with SAS-2 drives without issue.
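For anyone following along, the crossflash generally looks something like this from an EFI or DOS shell (the file names below are placeholders; use the IT firmware for the onboard LSI3008 from the SMC FTP, and double-check the erase step against the LSI/Avago docs before running it):

sas3flash -o -e 6                              # erase the existing IR flash (advanced mode)
sas3flash -o -f SAS3008_IT.fw -b mptsas3.rom   # write the IT firmware and matching boot ROM
sas3flash -list                                # verify the controller now reports IT firmware

Once FreeNAS is booted, dmesg | grep -i mpr is a quick way to confirm the controller is attached via the mpr driver.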

Specs look somewhat similar to mine (except scaled up). :smile: The only thing I would suggest you consider is swapping out the Intel 10G NIC for a Chelsio. Not critical, but Chelsio seems to be better supported/easier to get running (and performs a little faster).
 

marbus90

Guru
Joined
Aug 2, 2014
Messages
818
You could also contact iXsystems for a TrueNAS. A single-controller Z20 with a 24-drive JBOD should be around that price. It comes with dual hexacores and 64GB RAM.

If you stick with that build, go for a dual-CPU board with the possibility of more RAM. 64GB RAM for 72TiB of VM storage is really cutting it close.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
You're going to want mirrored vdevs, not RAIDZ2; you will need the IOPS for the VMs. You might also want to look into having someone build this for you and spec out the parts based on your use case. Try calling iXsystems, tell them what it will be used for, and see what they spec out and at what cost.
 

Christer

Dabbler
Joined
Mar 25, 2015
Messages
11
Thanks for the confirmation about the possibility of flashing the onboard SAS controller.

Chelsio: I looked for resellers here in Sweden now and found only some IBM- and NEC-branded Chelsio cards (of course totally out of stock), which cost the equivalent of $1000 more than the Intel NIC...

iXsystems: I've looked through their site but can't really make out if they sell outside the US. I'm quite certain I won't get approval to buy something from across the Atlantic instead of from our preferred reseller, which in turn has a good connection with SMC.

marbus90, I forgot: the 1 GB RAM per TB of storage is calculated from the raw storage. And then, as you wrote, that ends up at 72 TB. Dual-CPU board because of more DIMM slots, or because FreeNAS/ZFS is CPU hungry? I can just throw 8x16 = 128 GiB into this single-socket board (32 GB DIMMs are quite expensive, so I think 128 is the limit at a reasonable price). Or go with fewer or smaller hard drives.

SweetAndLow, do you mean 9 vdevs of 2-disk mirrors (if I go with 18 drives)? Or 6 vdevs of 3-disk mirrors? I've been reading more about this, and RAIDZ doesn't seem to be the best choice for high IOPS. But is it stupid to use "just" 2-disk mirrors? I could potentially lose it all with 2 disk failures before resilvering has completed. On the other hand, RAIDZ2 with 6 drives per vdev makes each vdev dependent on more disks: if half of the disks (3) fail, I'm toast, just as I would be in the 9x2 setup.
 

marbus90

Guru
Joined
Aug 2, 2014
Messages
818
128GB is still quite low for VM storage. I did count all 36 bays full of 4TB disks.
 

Christer

Dabbler
Joined
Mar 25, 2015
Messages
11
128GB is still quite low for VM storage. I did count all 36 bays full of 4TB disks.
Okay. If I swear not to use too many drives? ;-)

SMC hasn't launched a single-socket X10 barebone with more than 12 and fewer than 36 3.5" bays, so I opted for the bigger one. The 3U chassis doesn't seem to have been updated with newer backplanes if I were to go the route of picking chassis and motherboard separately.
So say I scale down to 14x4 = 56 TB of drives in use (+2 spares), which leaves 8 GB for the bare-minimum RAM recommendation for FreeNAS. That gives 28 TB of usable space (7 striped 2-disk mirrors) before accounting for not filling the filesystem to 100%.
And then I could add more drives, but also add 4 more RAM DIMMs to get to 128GB.

That is already a lot, and if I double it (28 drives, 120 GB minimum by the RAM rule of thumb) I'm definitely sure we don't want to put any more eggs in the same basket.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
marbus90, I forgot: the 1 GB RAM per TB of storage is calculated from the raw storage.

No, it isn't. It's deliberately left vague because it's only a guideline. For stressy things like VM storage, expect to desire/require more memory.

SweetAndLow, do you mean 9 vdevs of 2-disk mirrors (if I go with 18 drives)? Or 6 vdevs of 3-disk mirrors? I've been reading more about this, and RAIDZ doesn't seem to be the best choice for high IOPS. But is it stupid to use "just" 2-disk mirrors? I could potentially lose it all with 2 disk failures before resilvering has completed. On the other hand, RAIDZ2 with 6 drives per vdev makes each vdev dependent on more disks: if half of the disks (3) fail, I'm toast, just as I would be in the 9x2 setup.

You've got the idea there. So yes, you do three-way mirrors to maintain redundancy during a drive failure, or even four-way if you're super paranoid.

The new VM server I'm working with here uses that strategy. It's a Supermicro SC216 (24 x 2.5") and the intention is to stick 24 2TB hard drives in it: 8 SpinPoint M9T SATAs, 8 Toshiba 2TBs, and 8 TBD (I'm kinda hoping WD releases a Red 2TB because the Green 2TB has been out awhile). It will have seven vdevs, each a three-way mirror of three heterogeneous drives, plus three spares. I'm purposely avoiding the SAS drive options because they're 3x-4x more expensive; I'd rather go wider with cheap SATA if it comes down to that.
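Spelled out, that layout is seven three-way mirrors plus hot spares, something along these lines (device names are placeholders):

zpool create vmpool \
  mirror da0  da1  da2 \
  mirror da3  da4  da5 \
  mirror da6  da7  da8 \
  mirror da9  da10 da11 \
  mirror da12 da13 da14 \
  mirror da15 da16 da17 \
  mirror da18 da19 da20 \
  spare  da21 da22 da23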

The upside to this is that it closely resembles our workload: there's probably a lot of reading going on, but most of our VMs try to minimize unnecessary writes. Still, the ability to throw things around now and then is nice, so it needs to perform well for writes too. The read capacity, though, should be amazing. The filer will have about 48TB of raw disk but only 14TB of space in the pool, and given the amount of free space ZFS needs to maintain performance and avoid fragmentation on an iSCSI pool, we're only expecting 7TB of usable space. That's the sucky part of mirrors and iSCSI.

The other thing I was going to add: I notice you say you have some inactive space hogs. You might want to contemplate whether setting aside some of your storage as RAIDZ2 might work out. It is much more space efficient, and if you have the VMs powered off, or maybe only a few running with very low activity, RAIDZ2 is perfectly fine for that. Also, if you can migrate stuff to/from the RAIDZ2 pool, you might want to seriously ponder that. We archive many old and unused VMs here on a RAIDZ3 array and that works fine; you can even register them and power them on.

Finally, you might contemplate SATA disks as an option. They're slightly less reliable than the nearline class drives, and somewhat slower, but the cost reduction might allow you some options you hadn't considered, like another whole machine or something. ;-)
 

ser_rhaegar

Patron
Joined
Feb 2, 2014
Messages
358
I believe the inactive space hogs are the only things going onto this system, or at least that is how I read:

This system will be sort of a retirement home for low-use virtual machines! It should have a lot of space, but if, say, the motherboard breaks, we won't have a problem waiting for a replacement.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I didn't read it that way, based on

But many of the virtual machines don't have a heavy load and are not mission critical; quite a few are just inactive space hogs (although the virtual machine owners want to keep them), and there are powered-off machines as well.

If only powered off virtual machines are being stored, a RAIDZ2 based system on a smaller platform would probably be quite suitable. That would be like a "nearline" storage tier.

I think the poster is looking for something that can actually support VM's in use, a midway tier that offers large amounts of space for less-active VM's.

The conventional SAN storage systems are mostly extremely costly and can be difficult to work with. The irony is that a FreeNAS system sometimes turns out to be less expensive AND performs better. No promises of course, but it'd be interesting to hear back in six months how the poster's deployment went.
 

Christer

Dabbler
Joined
Mar 25, 2015
Messages
11
I don't think I will go down to the simplest SATA NAS drives. Compare how WD has the Red, the Red Pro and the RAID Edition (RE). The server will be mounted in a rack cabinet with a lot of other servers that also generate vibration.

Well, I know we will move one ~1 TB database - but that one is seldom used, and only by 1 or 2 users at a time. Thinking more and more about this, the workload probably wouldn't merit striped mirrors. With 14 drives I could just try out 2x RAIDZ2 or 7x 2-disk mirrors and see what performance I get. Today's storage is under the heaviest load when we start Windows Update. Performance has gone down (more VMs and more ESXi hosts contribute to that), so the first machines on this FreeNAS would surely see drastically faster disk access than before.
 

Christer

Dabbler
Joined
Mar 25, 2015
Messages
11
jgreco, I'm thinking of going with NFS instead of iSCSI for this. I have some experience with iSCSI through the Compellent storage solution, but from what I've gathered from reading, it's easier to mess things up with iSCSI.
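For what it's worth, the dataset side of that plan would look roughly like this from the shell (pool/dataset names are placeholders; the NFS export itself would normally be set up in the FreeNAS GUI):

# a dedicated dataset for the ESXi datastore
zfs create tank/vmds
# force synchronous semantics so the SLOG is actually used for the NFS writes
zfs set sync=always tank/vmds
# sanity check
zfs get sync,compression tank/vmds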
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Someone was saying nasty things about using NFS for VM storage recently. I don't recall why, but it might be worth searching for.

iSCSI is harder to configure.
 

Christer

Dabbler
Joined
Mar 25, 2015
Messages
11
My sales representative has gotten a firm answer from SMC: the backplanes in the chassis use LSI expander chips, and they do have DataBolt.

Thus SAS-2 drives will not bring the whole HBA-to-expander link down to SAS-2 speed, which also means that if I add a SAS-3 SSD for some reason in the future, it won't be forced down to a lower bus speed (my reasoning, not something SMC replied).
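Once the box is running, I figure the negotiated link rates can be checked roughly like this (device names are assumptions, and I haven't tried these exact commands on this hardware yet):

# list the expander's PHYs and their negotiated rates via SMP passthrough
camcontrol smpphylist ses0
# for an individual SAS drive, smartctl's protocol-specific port page shows the phy rate
smartctl -x /dev/da0 | grep -i 'link rate'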
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
jgreco, I'm thinking of going with NFS instead of iSCSI for this. I have some experience with iSCSI through the Compellent storage solution, but from what I've gathered from reading, it's easier to mess things up with iSCSI.

So, on a tangent here: how's the Compellent stuff? I've worked with some of Dell's EqualLogic gear, and while it kind of has a nice feature set (sometimes, when it isn't doing something mind-bogglingly stupid), the performance is excessively unimpressive. I seem to recall that the Compellent line has a stronger hardware component whereas EQL has a heavier software layer.

Both seem very expensive and have more vendor lock-in, platform lock-in, and obsolescence issues than I care for. One of the reasons I wanted to do ZFS for our VM storage...
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
(Disclaimer: I used to work for Dell)
As I'm sure you are aware, both were acquisitions by Dell. When Compellent was acquired, one of the things that really stuck with me was the customer retention rate. It was ridiculously high; people who used it seemed to love it and rarely left. That being said, both have hardware lock-in, but Dell is on a mission to provide an "open" system to the greatest extent possible (supporting industry standards and such) on the software/manageability/interoperability side.

I'd say both have a strong software emphasis. Compellent (prior to Dell) was running on SuperMicro hardware; it was just certified/branded SuperMicro stuff. As for capabilities, they are both very robust, and they both do a pretty good job of providing a simple interface to get the bulk of tasks done quickly and easily, while still allowing an admin to drill down if needed. One of my favorite features of Compellent was the policy-based storage tiering. It made it brain-dead easy to set up a policy to slowly move stale data from tier 1 to 2 to 3 within an array (something I would LOVE to see in FreeNAS). And since all writes went to tier 1, there really wasn't much of a performance penalty. Dell took capabilities like this from Compellent and added them to EqualLogic.

My 2 cents. Christer will give you his impression from a customer perspective. :smile:
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I was under the perhaps mistaken impression that Dell had aimed EqualLogic at the large-capacity but slower tier of storage requirements, while they had beefed up Compellent with more hardware acceleration and better performance.

As for policy-based storage tiering: while ZFS developers occasionally make noise about it, it is of course rather tricky to manage and would be hard to bolt onto the ZFS design. More likely, at some point resources will be cheap enough that someone finds a clever way to layer a tiering solution on top of ZFS, so you can have several ZFS pools of varying performance and have the tiering solution migrate stuff transparently. I know that seems unlikely right now, but looking back to ~1980 and the design of storage systems back then, I have to tell you I'd have been amazed to see what evolved in just 25 years.
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
EqualLogic was (is) aimed at the smaller-business storage market (small and medium business). They have very high-performing arrays that are really easy to set up and use, and super simple to extend (just buy another node and add it to the mix), whereas Compellent was designed to scale to handle the needs of big businesses.

The tradeoff with EQL is that every array comes with its own storage, compute and network ports, which provides nearly linear performance increases, but there comes a point where it really isn't cost effective at larger scale compared to a traditional frame-based array.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
EqualLogic was (is) aimed at the smaller-business storage market (small and medium business). They have very high-performing arrays that are really easy to set up and use, and super simple to extend (just buy another node and add it to the mix), whereas Compellent was designed to scale to handle the needs of big businesses.

The tradeoff with EQL is that every array comes with its own storage, compute and network ports, which provides nearly linear performance increases, but there comes a point where it really isn't cost effective at larger scale compared to a traditional frame-based array.

Hahaha. Wow. This'll go full circle.

Back in ~2003-2004ish some of my clients were complaining that large storage arrays were unaffordable. Things were rough in the Usenet business, and people were trying to get retention out to a few weeks, but the cost was astronomical. Companies like UseNetServer were throwing massive SAN resources at the problem at a premium cost. So we took a 4U chassis with 24 SATA drives, filled it with 250GB drives, and arrived at 6TB of storage in 4U for about $11,400 ea. Didn't need redundancy, handled that up at the application layer. Reduced storage costs substantially (1/5th to 1/10th depending on whose storage had been used). Took what had been a simmering competition in the Usenet business and lit off the retention wars.

At a larger scale, it CAN be totally effective to have every array equipped with its own resources, but only if the array itself isn't outrageously priced to begin with. The usual problem is that the big "frame based arrays" are designed for rich enterprises who need zero downtime and can afford to pay big bucks.

That had been a problem for ZFS because ZFS required relatively large resources on the local host. If you look at something like the BackBlaze Pod, that's among the cheapest options for attaching raw storage to the network, but unsuitable for ZFS because it lacked the big CPU and memory resources needed for a high performance filer, and retrofitting it with better resources jacked up the cost substantially. ZFS had been closer to the cost of one of those big arrays because of the extra resources required.

But now we're finally getting to the point where the per-node cost for a ZFS box isn't totally outrageous, and with the newest CPU's supporting massive memory, and memory prices falling, it is becoming ever more practical to load lots of CPU and RAM into a host along with a pile of disk. And ZFS can leverage SSD for L2ARC when it has sufficient RAM.

I'm estimating that the VM filer I'm building will cost around $5000-$6000 total. That's to store about 7TB of data, so it isn't exactly cheap, but it also isn't oppressively expensive. True, it's using SATA drives, but all carefully selected to provide a high performance array.

By comparison, a Dell PowerVault MD3800i in default config (single controller, 2 x 600GB SAS HDD) is being quoted on the Dell web site at $7139. And it'll have the approximate speed of a single 15K RPM HDD.

So what I'm seeing is that the ZFS stuff is dropping in price, slowly, steadily, as a function of the drop in price of computing gear in general, and/or increasing in performance, in that same way.

I propose that sooner or later, "cost effective at a larger scale as compared to a traditional frame based array" - which I realize you said specifically referring to EQL - isn't going to be true for racks of independent arrays such as ZFS builds. It'll probably continue to be true for EQL but we're now seeing ZFS arrays getting amazingly cheap to create, and that's what ultimately undercuts the cost effectiveness of the monster enterprise arrays. Pretty sure we're there already, and of course it isn't just ZFS doing the undercutting.

We live in an interesting time.
 