First Build @ ~140Tb: I want to get this Right! Please critique.

Status
Not open for further replies.

hkent1

Cadet
Joined
Jan 15, 2014
Messages
5
Hi folks, we have a project coming online that will require ~140TB of disk space. My chassis build will basically be two of the following, connected via SAS expander (or I may build out rather than up???). If the pilot proves successful, we could end up setting up several (6+) of these.

Motherboard: SuperMicro X9DRI-F Socket LGA2011
CPU: Intel Xeon Quad Core E5-2609V2 2.5 GHz Processor
RAM: 32GB DDR3-1333 PC3-10600 CL9 ECC Registered DDR3 RAM (4x8GB)
NIC: 4 x Intel® i350 Gigabit Ethernet Controller
Controller: LSI 9211 (Host) Controller in JBOD mode
Case: Supermicro CSE-846BE16-R920B
PwrSupply: 920 Watts Redundant Power Supply
HDD: 4TB X 24 Seagate Constellation ES.3 7200 RPM

I plan on creating one zpool by combining four vdevs of 12 drives each running RAIDZ2. I considered RAIDZ3, but the capacity requirements necessitate that I stop at RAIDZ2.

The plan is to attach to a Windows server via iSCSI (1Gb) for now, possibly bonding in the future. We'll be storing security footage (H.264 = compressed). Every day we'll be adding roughly 1TB of HD (5-megapixel) video with a retention of just under 140 days.

If you have any critiques, please lay them on me. I've spent the last two weeks studying and devouring anything I can get on ZFS, and I'm hoping more eyes will shed light on weak spots. Over the past two years I've run smaller (much smaller) implementations with only one vdev per zpool, which I can't even compare to this. Could someone tell me whether I'm thinking correctly on my zpool setup, or do I need to break the vdevs into smaller units for performance?

Thank you for your time!
..K
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
Definitely more RAM. I'd start at a minimum of 64 gig; 128 would probably be better.

RAIDZ2 vdevs of 12 drives are just too wide, especially in a business environment. I'd strongly recommend more parity overall. With a 24-drive chassis I'd probably do either 4x 6-drive Z2s, or 2x 11 (or 12) drive Z3s. Also keep in mind you don't want to go more than 80% full on the zpool, so don't just plan for a 140TB zpool if you actually intend to write 140TB to it.

With 24 drives, the most I see getting out of it is 18 drives' worth of capacity (2x 12-drive Z3). With 4TB drives that's 72TB of data space; after the 1000 -> 1024 conversion and a bit of ZFS overhead, you're probably looking at a zpool of about 62TB.
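
To put that math in one place, here's a back-of-envelope sketch, not a promise: the ~4% metadata/overhead allowance is my guess, the 80% ceiling is the rule of thumb mentioned above, and the rest of the numbers come from the layout we're discussing:

# Rough usable-capacity estimate for 2x 12-drive RAIDZ3 with 4 TB drives
DRIVE_TB = 4.0                    # marketing terabytes (10^12 bytes) per drive
data_drives = 2 * (12 - 3)        # two 12-disk RAIDZ3 vdevs -> 18 data disks

raw_tb = data_drives * DRIVE_TB                 # 72 "TB" of data space
tib = raw_tb * 1e12 / 2**40                     # ~65.5 TiB after the 1000 -> 1024 conversion
after_overhead = tib * 0.96                     # assume a few percent ZFS/metadata overhead
usable_at_80 = after_overhead * 0.8             # stay under ~80% full

print(f"{tib:.1f} TiB of data space, ~{after_overhead:.1f} TiB after overhead,"
      f" ~{usable_at_80:.1f} TiB you'd actually want to write")
# -> 65.5 TiB, ~62.9 TiB, ~50.3 TiB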

jgreco probably knows more about these size systems than most. Maybe he'll make an appearance.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
hkent1 said:
Motherboard: SuperMicro X9DRI-F Socket LGA2011
CPU: Intel Xeon Quad Core E5-2609V2 2.5 GHz Processor
RAM: 32GB DDR3-1333 PC3-10600 CL9 ECC Registered DDR3 RAM (4x8GB)
NIC: 4 x Intel® i350 Gigabit Ethernet Controller
Controller: LSI 9211 (Host) Controller in JBOD mode
Case: Supermicro CSE-846BE16-R920B
PwrSupply: 920 Watts Redundant Power Supply
HDD: 4TB X 24 Seagate Constellation ES.3 7200 RPM

I plan on creating one zpool by combining four vdevs of 12 drives each running RAIDZ2. I considered RAIDZ3, but the capacity requirements necessitate that I stop at RAIDZ2.


You'll be so soooooorrrryyyyy...

Probably not quite the right path. Not super far off, BUT, the off bits will kill you.

You have four mistakes. Well, three mistakes and two half mistakes (equals four), then maybe some debatable issues.

Mistake 1: X9DRi-F is the wrong board. Period. If you're going to add quad NIC, then why the heck aren't you going with an X9DRi-LN4F+? Quad network PLUS extra memory slots, bonus! There are some caveats about the speed of the extra memory slots, but basically you will be beating your head against the RAM issue at some point on a large system.

I've been playing with an X9DR7-TF+ here for a little over a year, similar to the X9DRi-LN4F+ but with 10GbE and an LSI2208 added (and no quad). Very nice bit of hardware.

Mistake 2: Do not under any circumstances get 8GB registered DDR3 modules. They're slot stuffers - crap you have to pull and discard when you discover you need more. Get 16's. Preferably DDR3-1600. One set of four 16's gets you to 64GB. Two takes you to 128GB. And three will take you to 192GB - but at a speed penalty. For 140TB of space, 64GB is very strongly recommended, and 128GB ought to be sufficient.

Mistake 3: Your storage goal never dictates your pool layout. Your business reliability and problem survivability requirements guide your pool design. RAIDZ2 on a 12-drive wide vdev is risky. I've got RAIDZ3 on an 11-drive wide plus a warm spare ready for hands-off replacement. It turns 48TB of fallible 4TB drives into a fairly nice 30TB usable array. Multiple RAIDZ2 vdevs striped together basically links a bunch of risky vdevs together, which increases the risk. If you absolutely must do that, make separate pools so that you limit the scope of your loss.

Mistake 4a) That E5-2609. It sucks. No hyperthreading. DDR3-1066. Slow. Got one, I used it as a placeholder while I held out for the Ivy Bridge E5's to come out last September. It is a contemptible CPU. You may not need the fastest CPU known to man, but you'll be sorry with the 2609. The ideal CPU is a low core count, high clock speed E5. 2637, 2643, etc. You don't need ideal but everything E5 is expensive, except ones like the E5-2609. Not a showstopper but definitely a "think carefully".

Mistake 4b) The Constellation ES.3. No need for 7200 RPM drives. Just burns watts and contributes to cost and early failures when they're tightly packed in a storage array like the CSE-846.

Non-mistakes:

Awesome chassis. I've got the BE26 here; it just adds the secondary SAS expander, which isn't currently meaningful anyway. The R920B is a good pick, especially if you don't anticipate going to dual CPU. It'd be a better pick still if you went with 5400/5900RPM drives with their slightly lower start current. Definitely also the right supply to use for any expansion chassis.

Unsolicited commentary:

Given the size goal you have, you might wish to consider the SC847BE16. It is harder to cool and more cramped to work in though. Also go take a look at the SC847DE16 ("holy crap!") but more realistically consider the SC847E16-R1K28JBOD as a possibility. These are only things to do if you're tight on rack space, of course. The 846 is always a better choice for cooling purposes.
 

jyavenard

Patron
Joined
Oct 16, 2013
Messages
361
titan_rw said:
Also keep in mind you don't want to go more than 80% full on the zpool, so don't just plan for a 140TB zpool if you actually intend to write 140TB to it.

First time I've read about this 80% business... What's the reason for this?
 

Knowltey

Patron
Joined
Jul 21, 2013
Messages
430
Yeah, you'll definitely need more RAM.
 

hkent1

Cadet
Joined
Jan 15, 2014
Messages
5
WOW! Thank you to everyone who posted: this is exactly the information/critiques I was looking for.

jgreco, I'm making the changes with your insights! thank you!

I do have a question regarding:

jgreco said:
Mistake 4b) The Constellation ES.3. No need for 7200 RPM drives. Just burns watts and contributes to cost and early failures when they're tightly packed in a storage array like the CSE-846

What would be ideal here without losing speed? Would you recommend stepping to a 5400RPM drive instead?

Thanks again!

...K
 

hkent1

Cadet
Joined
Jan 15, 2014
Messages
5
OK, after re-reading I saw your post regarding the speed of the recommended hard disks (apologies for not reading closer o_O).

Now to get my lack of experience to really shine. :eek:

One remaining question - will 5400/5900 RPM drives give sufficient write speed to saturate one 1Gb link, or even two 1Gb links?

...K
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
A 1GbE port has a theoretical max of about 125MBytes/sec. A single contemporary 5400RPM drive transfers about 100-150MBytes/sec peak. An array of them should do better. Drives are impacted by seeks and related cruft so hands can be waved to make these numbers support various claims.

The 7200's, great for heavy seeking like lots of tiny files or database transactions.

The 5400/5900's, just dandy for storing big files.
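
To put rough numbers on that (back-of-envelope only: the 100MB/s per-drive streaming figure is a conservative guess, the 11-disk Z3 vdev is just the layout suggested earlier, and the assumption that a RAIDZ vdev streams at roughly the sum of its data disks on big sequential writes is mine):

# Back-of-envelope: can slow drives outrun the network? (estimates, not benchmarks)
GBE_MBPS = 125                  # theoretical max of one 1GbE link in MBytes/sec
links = 2                       # say, two bonded 1Gb links
drive_stream = 100              # conservative sequential MBytes/sec for one 5400/5900 RPM drive
data_disks = 11 - 3             # e.g. one 11-disk RAIDZ3 vdev -> 8 data disks

# Large sequential writes stream at very roughly the aggregate of the data disks;
# seeks, metadata and fragmentation will pull the real number down.
array_mbps = data_disks * drive_stream
network_mbps = links * GBE_MBPS

print(f"array ~{array_mbps} MB/s vs network ~{network_mbps} MB/s")
# -> ~800 MB/s vs ~250 MB/s: even 'slow' drives swamp a couple of gigabit links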
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
A single 5400 rpm drive may or may not be able to do gigabit, but since you're talking 4 or more drives, and here we are, there certainly shouldn't be a problem.

The only reason I see for using 7200 rpm drives is if you need huge sequential IO for local stuff running in a jail, or for 10 gig-e or something that'll actually use it. 7200s will also give you slightly lower rotational latency for access times, but you're not using the storage for something that's going to be high random IO (like VM storage). And if you were, lots of RAM and a properly sized L2ARC would mitigate most of it. Since you're not going to be doing huge random IO, and not going 10 gig, the slower drives should be fine. (And if you were going to be doing lots of random IO, you'd want to pick a different zpool layout anyway.)

Performance is another reason not to run ridiculously large vdevs. I know I've read of cases where people haven't been able to complete a resilver because their vdevs were so huge. I'd consider 11-disk Z3s to be the widest you should ever go. One of my NAS boxes is an 11-disk Z3; the other is 2x 6 disks in Z2 (both have 8 drives' worth of capacity).
 

hkent1

Cadet
Joined
Jan 15, 2014
Messages
5
@jgreco and titan_rw, points taken. Thank you!

I came up with one more question. Given that the files will already be compressed, is it worth enabling compression on the pool? I've done some reading on it and I understand how it could help with files coming on board that aren't compressed, but I'm not sure I understand how it can improve performance on a system where the files coming on board are already compressed.

I guess what I'm really asking is... can it ever hurt system performance to have it on, given that sufficient RAM & CPU requirements have been met?

Thanks again!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
It can kill performance if you use an aggressive compression algorithm. I recommend you not use compression.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Or perhaps better yet just avoid aggressive compression algorithms. Or test to see how performance is with them enabled, since you can twiddle it at any time. It is generally the COMpression that is piggy, the DEcompression is usually very lightweight.
 

Dusan

Guru
Joined
Jan 29, 2013
Messages
1,165
In my testing LZ4 was basically transparent with a modern CPU. It aborts early if it detects incompressible data. In fact, LZ4 seems to be the new default as pools created in 9.2.1 will have it enabled automatically (https://bugs.freenas.org/issues/3872).
However, if all your data is already compressed then there's no point in enabling compression.
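
A quick way to see why (a toy illustration using zlib from the Python standard library rather than LZ4, but the conclusion is the same):

import os
import zlib

# Compressible text vs. data that is effectively already compressed (like H.264).
text = b"surveillance frame metadata " * 10_000   # highly repetitive, compresses well
random_blob = os.urandom(len(text))               # stand-in for already-compressed video

for name, payload in (("repetitive text", text), ("already-compressed-ish", random_blob)):
    ratio = len(zlib.compress(payload)) / len(payload)
    print(f"{name}: {ratio:.0%} of original size after compression")

# The random blob stays at ~100%: CPU spent, nothing saved. LZ4's early abort
# keeps that cost tiny, but on all-video data there's simply nothing to gain.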
 

jyavenard

Patron
Joined
Oct 16, 2013
Messages
361
Dusan said:
In my testing LZ4 was basically transparent with a modern CPU. It aborts early if it detects incompressible data. In fact, LZ4 seems to be the new default as pools created in 9.2.1 will have it enabled automatically (https://bugs.freenas.org/issues/3872).
However, if all your data is already compressed then there's no point in enabling compression.

My experience too. With already compressed data, overhead was virtually nil.
So on average, with compression enabled you actually get much greater speed overall.
 

KTrain

Dabbler
Joined
Dec 29, 2013
Messages
36
The only thing I haven't seen much discussion on yet is the kind of writes/reads that will be performed against the storage. I've seen some camera systems use storage very inefficiently. If it were me, I'd want to validate the I/O being run against the storage array for CYA. If this array is only going to be used for archiving then it seems like it'll be just fine. In contrast, if the application managing the data is fondling the storage target frequently it could cause a lot of inefficient I/O. Should that kind of behavior agitate large volumes of data it could affect your performance. Lastly, if the application head-end is writing active data directly to this storage array for active capture it could cause a lot of I/O. Since there probably isn't a sensitive performance SLA on your implementation this could all be moot, but they're things to think about.

I'm drawing from experience with a security system that has around 250 cameras on it. The software for the system was ultra-crappy and did a really bad job managing writes and reads to the storage array. In short, the behavior ended up more like a heavy SQL DB banging the heck out of the storage array than a camera system moving big chunks of data.

You may also consider presenting the storage to your security system as multiple logical volumes. Windows is limited in the number of threads it can run against one logical volume, so depending on how the camera system performs this could cause you headaches.

Outside of that this looks like a really exciting build! Good Luck!
 

hkent1

Cadet
Joined
Jan 15, 2014
Messages
5
Everyone Thank you!
I think I'll leave LZ4 enabled, as testing has shown very little overhead.

KTrain said:
The only thing I haven't seen much discussion on yet is the kind of writes/reads that will be performed against the storage. I've seen some camera systems use storage very inefficiently. If it were me, I'd want to validate the I/O being run against the storage array for CYA. If this array is only going to be used for archiving then it seems like it'll be just fine. In contrast, if the application managing the data is fondling the storage target frequently it could cause a lot of inefficient I/O. Should that kind of behavior agitate large volumes of data it could affect your performance. Lastly, if the application head-end is writing active data directly to this storage array for active capture it could cause a lot of I/O. Since there probably isn't a sensitive performance SLA on your implementation this could all be moot, but they're things to think about.

I'm drawing from experience with a security system that has around 250 cameras on it. The software for the system was ultra-crappy and did a really bad job managing writes and reads to the storage array. In short, the behavior ended up more like a heavy SQL DB banging the heck out of the storage array than a camera system moving big chunks of data.

You may also consider presenting the storage to your security system as multiple logical volumes. Windows is limited in the number of threads it can run against one logical volume, so depending on how the camera system performs this could cause you headaches.


Thanks KTrain! Basically, we are using the FreeNAS system as an archive. After reading your post, I went and did some research on the software. The software lands the video on local 10K SAS drives, bundles it, and lays down a large sequential file to the archive (FreeNAS). I also found that the system will fill one store/volume up, then move to the next volume. We've decided to break the storage down into multiple smaller zvols: if we lose a zvol, we lose a week, not 120 days' worth!
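
Rough sizing for that plan (my arithmetic only, reusing the ~1TB/day and ~140-day retention figures from my first post; the week-per-zvol split and the usable-pool placeholder of ~124 TiB across two chassis are assumptions, not measurements):

import math

# Rough zvol sizing for week-sized archive volumes (assumptions, not a spec)
INGEST_TB_PER_DAY = 1.0       # ~1 TB/day of footage (from the original post)
RETENTION_DAYS = 140          # just under 140 days of retention
DAYS_PER_ZVOL = 7             # lose a zvol, lose a week
POOL_USABLE_TIB = 124         # hypothetical: roughly two chassis at ~62 TiB each

zvol_count = math.ceil(RETENTION_DAYS / DAYS_PER_ZVOL)    # 20 zvols
zvol_size_tb = INGEST_TB_PER_DAY * DAYS_PER_ZVOL          # ~7 TB each
total_tib = zvol_count * zvol_size_tb * 1e12 / 2**40      # ~127 TiB committed

print(f"{zvol_count} zvols x ~{zvol_size_tb:.0f} TB = ~{total_tib:.0f} TiB,"
      f" {total_tib / POOL_USABLE_TIB:.0%} of {POOL_USABLE_TIB} TiB usable"
      " (the 80% rule says leave more headroom)")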

Thanks everyone!
We'll be rolling the first chassis into production this week. In a few weeks I'll post back on how things are going!

cheers....K.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Good luck. With large sequential files you have the best scenario for this sort of thing.
 