Adding NVMe to 'fix' slow writes - sanity check / second set of eyes?

atmarx

Cadet
Joined
Jan 19, 2020
Messages
2
Hi folks --

My team has used FreeNAS a few times whenever a researcher wants us to set up shared storage between compute nodes and doesn't want to spend... well, anything. Usually set it and forget it. So when the 10-bay Synology box backing a smaller 4-node Hyper-V cluster started freezing up every week or so, I figured I'd try FreeNAS.

I took one of our old Dell PE R720 servers with two E5-2630 procs, loaded it up with 128GB of RAM and 5 Micron 5200 8TB SSDs. I swapped the H710 for an H310 HBA so the drives could be individually addressed (I originally tried making single-drive RAID0 arrays on the H710, but realized there was no way to replace a failed drive without a reboot). I've got them set up as 2 mirrored vdevs with a hot spare.
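For reference, I built the pool in the GUI, but the layout comes out roughly equivalent to this from the CLI (the pool name "tank" and the daN device names here are placeholders -- FreeNAS actually references disks by gptid label):

# two 2-way mirrors plus a hot spare
zpool create tank \
  mirror da0 da1 \
  mirror da2 da3 \
  spare da4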

I put two Intel X550-T2 cards in there, and bonded one port from each card together using LACP (so ix0+ix2=lag0 on vlan40, ix1+ix3=lag1 on vlan41).
I put more of the same cards in each of the 4 blades, installed Intel's latest drivers, set the card profile to Storage Server (low latency, VMQ off), and turned flow control off on the switch (a Netgear XS728T). Jumbo frames across the board.
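The lagg side of that is GUI-driven too, but from the FreeBSD shell it works out to something like this sketch (interface numbering is from my box, and I'm only showing the first lagg -- lagg1/vlan41 on ix1+ix3 is the same pattern):

ifconfig ix0 mtu 9000
ifconfig ix2 mtu 9000
ifconfig lagg0 create
ifconfig lagg0 laggproto lacp laggport ix0 laggport ix2 up
ifconfig vlan40 create vlan 40 vlandev lagg0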

I got everything set up, got iSCSI connected using MPIO, mounted the extent, and boom - I'm in business. Then I ran some speed tests and... the reads are good (200-300MB/s). The writes suck. Sequential was okay (~100MB/s) but random was terrible (single digits). Copying over a large file would go fast (~100MB/s) for the first few hundred megs and then tank, bouncing between 5MB/s and 0.

I know those SSDs are not speed demons, but I didn't think it would be that terrible. The VMs I'm hosting are mostly for research computation, so writes are bursty (scratch files, results, logging) -- it doesn't need to sustain multiple TBs at a time, but more than 1GB would be nice.

So I read a bunch more and have two Intel Optane 900P 280GB PCIe cards on the way. My question is: what's the most advantageous way to deploy them to get better write speeds? Mirrored SLOG? Single SLOG plus L2ARC? I have another R720 I can sacrifice and steal the RAM from to get up to 192GB.

I also have more of those 8TB drives on the way (once the SSD backlog clears up) -- I don't need the space, but I'm guessing more vdevs will make a difference across the board.

When all is said and done, not counting the reused server, I'll have put in <$10k to get a (hopefully) decently useful SAN.

If there's anything glaring I've missed, please let me know -- thanks for reading and any advice you can give.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I think you've misunderstood a number of things, but most importantly to your question...

For fastest write speeds, turn off sync writes. This is the absolute fastest your pool can sustain. Adding SLOG will always be slower than simply disabling sync writes.
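If you want to A/B it from the shell, it's one property on the dataset or zvol backing your extent (the "tank/iscsi" name here is a placeholder -- substitute your own):

zfs get sync tank/iscsi            # check the current setting
zfs set sync=disabled tank/iscsi   # fastest possible; unsafe across a crash
zfs set sync=standard tank/iscsi   # put it back when you're done testing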

Make sure your H310 has been crossflashed to LSI IT mode and that you are running the 20.00.07.00 firmware on it. The Dell firmware suffers from insanely low queue depths and won't work reliably with FreeNAS anyway. This is actually what I suspect your problem is, but don't disregard the other points.
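You can check what the card is running right from the FreeNAS shell:

sas2flash -listall   # should report firmware 20.00.07.00 and an IT-mode product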

The X550 isn't a card that strikes me as working particularly well with FreeNAS. See if you can dig up a single X520-DA2 or -SR2 card and set up a normal 10G configuration that's easy to test and get working, before you try crazy LACP setups with unusual cards.
 

atmarx

Cadet
Joined
Jan 19, 2020
Messages
2
Thanks -- I'll flash the LSI IT firmware and retest. I'll disable sync writes to test -- but from what you've written previously, I wouldn't want sync writes disabled long term, right?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
atmarx said:
Thanks -- I'll flash the LSI IT firmware and retest. I'll disable sync writes to test -- but from what you've written previously, I wouldn't want sync writes disabled long term, right?

Depends!

But your complaint is about write speeds. When building these things, you usually want to start with the simpler configuration, make sure it is working as expected, and then add complications. I've been doing this for decades, and my experience with piling on all the complications right away is poor at best.

Get your pool working. Make it work fast locally. It cannot work faster from remote than it does locally, and the networking and SLOG are basically additional potential bottlenecks, so skip those at first.
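fio run right on the NAS gives you that local baseline. A minimal sketch -- the path is an example, and you'll want a test dataset with compression off so easily-compressed data doesn't lie to you:

fio --name=seq --directory=/mnt/tank/test --rw=write --bs=1M --size=8g      # sequential write
fio --name=rand --directory=/mnt/tank/test --rw=randwrite --bs=8k --size=2g # random write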

Then check how it performs over the network. Suddenly getting crap-for-speeds? Check your network. Check that you're using a good NIC. Check whether you're using jumbo frames (generally: don't). Because if either the pool OR the network is crap, you get crap, and it's much more miserable to isolate the issue when multiple things are working in concert.
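iperf3 takes the disks out of the picture entirely, so you can judge the network on its own:

iperf3 -s                       # on the NAS
iperf3 -c <nas-ip> -t 30 -P 4   # from a hypervisor node; expect near line rate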

Still good? Turn on sync writes. Performance tanks? Then you know that you need SLOG, or need to avoid sync writes.

The decision tree for SLOG basically comes down to how you value your data and what the risk factors are. The biggest risk that should have you running a SLOG is if you're running VMs that can be corrupted, or any other truly valuable data where losing ten seconds' worth of writes would be an issue. VMs are particularly treacherous because if your NAS reboots and "loses" writes, but the hypervisors and VMs believe those writes were committed to disk, you can end up with disk corruption inside your VMs. And other similar situations.

If you're just doing research, none of your VMs hold precious data, and your NAS never crashes or reboots unexpectedly, it's quite possible this isn't as significant an issue as it would be for someone running commercial VMs handling banking transactions. The reality is probably somewhere in between for most people. Only you can really answer the SLOG question for sure.
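And if the testing does point at sync writes, attaching those Optanes as a mirrored SLOG is a one-liner (the nvdX device names are assumptions -- check what yours enumerate as):

zpool add tank log mirror nvd0 nvd1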
 