NFS Datastore for LOTS of VMs - 768GB of RAM - 12x 4TB Drives

vektorprime

Cadet
Joined
Oct 20, 2020
Messages
2
Hi all, I wanted to get some feedback and share my experience with FreeNAS.

Thought I'd share my latest build, but first I'll review the previous one. Note these are all used parts from eBay; only the Optane drives were new.

My old build:
  • Dell R720xd (12x 3.5" LFF bays)
  • For the HBA, I flashed an H310 Mini to IT mode, as I wanted to save PCIe slots. Thanks to this forum, this was possible.
  • For CPU, originally I had some E5-26xxL CPUs, but after some benchmarking and reading I found those might be a bottleneck, so I got 2x E5-2650 v2 @ 2.60GHz.
  • For RAM, 376GB of DDR3 at 1333MHz.
  • For SLOG I used an Intel Optane 280GB AIC NVMe PCIe card. (LOVE IT)
  • For L2ARC I used 2x 900GB Samsung 9xxP M.2 NVMe drives. (Pretty good; they would've made good SLOGs too, but the Optane performed way better.)
  • For the drives, I bought 12 used Seagate 4TB drives on eBay for about $200-300.
    • I configured these in a 3-way mirror (3 drives in a mirror instead of 2) because I was worried about all the SMART errors and wasn't sure how long they'd last. (Rough pool layout sketched just below.)
    • Unfortunately I didn't use the 2x SSD slots in the back; I wish I had put 2 small SSDs in a RAID 1 for the OS, since this is a prod storage server.
  • For NICs I originally had 2x dual-port 10G Intel NICs, but the drivers weren't that great and I really had to tweak them a lot.
    • I ended up trading for a quad-port 10G Chelsio NIC, which just worked very well.
  • Currently storage is served over 4x separate networks with NFSv3, as Proxmox's NFSv4 support isn't great.
Currently the above pool sits at: HEALTHY: 8.56 TiB (61%) Used / 5.49 TiB Free with 1.22 compression ratio.
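
For anyone who wants to picture that layout: 12 disks as four 3-way mirror vdevs, with the Optane as SLOG and the two NVMe drives as L2ARC, comes out to roughly the following at the zpool level. The device names and pool name here are just placeholders (not my actual ones), and in practice the FreeNAS GUI builds the equivalent for you:

  zpool create tank \
    mirror da0 da1 da2 \
    mirror da3 da4 da5 \
    mirror da6 da7 da8 \
    mirror da9 da10 da11 \
    log nvd0 \
    cache nvd1 nvd2

Four data vdevs of 4TB mirrors also lines up with the ~14 TiB usable shown above.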

I house mostly Linux and Windows VMs using Proxmox and a bunch of customized scripts for automation. Originally I used VMware, but I needed more automation with less effort, so I went with Proxmox/KVM.

When I tune my TXG settings, with sync on/forced and compression on, I can easily saturate the 10G NICs. I did increase my TXG timeout to give me more (insane) IOPS. My ZIL sits at around 20-50GB of TXG data, which I know can be bad if power is lost, but I'll take the risk; at least it also sits on the Optane drive for replay if needed. My hope is that with the new build I can get this down to under 5GB and not mess with the TXG settings.
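
For anyone wanting to poke at the same knobs, these are the sysctls I mean (OpenZFS names on FreeBSD/FreeNAS, settable under System -> Tunables). The values below are purely illustrative placeholders, not my production settings:

  # stretch the interval between forced TXG commits (OpenZFS default is 5 seconds)
  sysctl vfs.zfs.txg.timeout=30
  # raise the ceiling on dirty (not-yet-flushed) data held in RAM, in bytes (20 GiB here)
  sysctl vfs.zfs.dirty_data_max=21474836480

The trade-off is exactly the one above: the more you let pile up per TXG, the bigger the window that has to be replayed from the SLOG (or lost, for anything async) if power drops.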


For the new build:
  • Dell R720xd (12x 3.5" LFF bays)
  • For the HBA, I used an H710 Mini flashed to IT mode, again thanks to this forum and this guide (https://fohdeesha.com/docs).
  • For CPU, I have 2x E5-2680 v2.
  • For RAM, 768GB of DDR3 at 1333MHz (24x 32GB).
  • For SLOG I got another Optane 280GB as it just worked so well.
  • For L2ARC I have 2x 2TB Samsung M.2 NVMe drives.
  • For the drives, this time I found a REALLY good deal on 6 new 4TB HGST drives, and the other 6 I bought at normal price. At least they were all new this time.
    • The goal here was to skip the 3-way mirror so I can get more storage and more write speed/IOPS for the TXG flushing.
  • For the NICs, I went with 2x 10G Broadcom NICs, as I wanted at least some NIC redundancy.
  • Storage is served via 4x 10G LACP (vPC, i.e. Cisco MLAG), which is dirty to say, but I just needed the redundancy, and I have enough hosts that I can load-balance across the links fine. NFSv4 would've been nice to use, but again, Proxmox doesn't play too nice with it (Proxmox config example below).
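
On the Proxmox side each host just consumes this as a plain NFSv3 storage entry, roughly like the following in /etc/pve/storage.cfg (the storage name, server IP, and export path are placeholders, not my actual ones):

  nfs: freenas-vmstore
      server 10.10.10.10
      export /mnt/tank/vmstore
      content images
      options vers=3

Pinning vers=3 in the options just makes the NFS version explicit instead of leaving it to mount-time negotiation.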

I haven't run any benchmarks or created the pool on the new build yet; I'll be doing that today and updating the post.
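
For the benchmarks, a basic fio run along these lines from one of the Proxmox hosts against the NFS mount is the kind of thing I'll compare against. The parameters and the mount path are generic placeholders, not a tuned test plan:

  fio --name=vm-randwrite --directory=/mnt/pve/freenas-vmstore \
      --rw=randwrite --bs=4k --ioengine=libaio --direct=1 \
      --iodepth=32 --numjobs=4 --size=4G --runtime=60 \
      --time_based --group_reporting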
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Nice initial post, and welcome!

It sounds like you've got a good grasp on what's needed for high performance, as well as an understanding of the tradeoffs made when adjusting your TXG and dirty_data settings (more data allowed to be dirty is faster, but riskier).

Just bear in mind that the "consumer" Optane cards like the 900P technically have a clause in their warranty that voids them if used in multi-tenant or multiple-concurrent user systems - but considering that you can usually buy three 900P's for the price of a single P4800X, I imagine you've done the math in your head there as well.

Looking forward to the updates and the benchmark racing!
 

vektorprime

Cadet
Joined
Oct 20, 2020
Messages
2
Thanks HoneyBadger, I've read quite a few of your posts here while lurking. Yeah, I'm finding the trade-off to be worth it for sure. I can get the IOPS of an SSD array without the SSDs. I think I'll see even better performance now that I have more write capacity from using 2-way mirrors instead of 3-way (six vdevs instead of four). In the future I may try to get the 2.5" variant of the server and use used Intel SATA SSDs, as the cost is not bad.

I didn't realize that point about the warranty, so thanks for noting it.

I'll probably put it in production this week, so I'll update the post then.

Thanks!
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I (and a couple of other users) would probably be interested in seeing your tunable values for the dirty data and TXG sizing, e.g. vfs.zfs.dirty_data_*.
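
If it's easiest, the output of something like the following would cover it (stock OpenZFS sysctl names on FreeBSD; adjust if yours differ):

  sysctl vfs.zfs.txg.timeout
  sysctl vfs.zfs | grep dirty_data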

For used Intel SSDs, if you're buying used pulls, the HP/Dell-firmware models may be less expensive but often mask critical SMART data regarding wear-leveling. Check for the Samsung and Toshiba datacenter models as another option; Micron has historically struggled with high latency outliers (which is why the 5100s are often cheap), but their 5300 series seems to be fairly well-received.
 