Strange speed problem - it might be a config item that I am missing.

geekmaster64

Explorer
Joined
Mar 14, 2018
Messages
50
Hey all!

So I've been running FreeNAS for about a year or two and have done a good amount of reading on the forums, but I've run into a strange bottleneck. I'll admit there may be a setting I should have configured and haven't, but I'm really struggling.

Short story: the FreeNAS rig (in my signature) provides iSCSI storage to another similarly beefy HPE DL380p G8, over four 1Gb iSCSI NICs in MPIO with the Least Queue Depth policy.

I presently have 13 VMs on it (though the issue occurs even with just one running), and all but one are Windows Server 2016. The virtual disks are thin-provisioned VHDX files.

Inside one of these VMs, if I copy a 5GB file, the transfer starts at around 200MB/s (source and destination are the same disk, e.g. copying D:\temp\filename to D:\temp2\filename) and then slows dramatically to around 10-30MB/s. Frankly, with this many spinners, this much memory, and the SSDs, I would expect to overcome this problem, but I don't know what I don't know.

With the hardware behind this, I really don't think it's a hardware problem or a lack of hardware (everything is on the same 24-port Dell PowerConnect switch with no VLANs); it's more likely a configuration item or the way my storage is set up. I can see that MPIO is working properly, and when data is in the memory cache I can saturate all four NICs.

I'm sorry I don't know what else to provide; I come from a strong Nimble Storage / NetApp background for VMware and Hyper-V, so I'm happy to run commands to supply any missing information. I'm presently running 11.2-U1. Also, as a note: in Reports, the Disk Activity (busy %) for the spinners is typically in the 60-75% range (none go above that), and the SSD for the ZIL sits at about 80%.
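
For example, I can pull live per-disk busy numbers from the shell with gstat if that's useful (just a sketch, assuming the standard FreeBSD tooling on FreeNAS; adjust the interval as needed):

# per-disk %busy, physical providers only, refreshing every second
gstat -p -I 1s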

Thank you for reading this and I appreciate your help!
 

praecorloth

Contributor
Joined
Jun 2, 2011
Messages
159
Alright, assuming this is the setup in your sig, the first problem I see is RAIDZ2. Yes, you've got a ton of spindles, but that topology is going to murder write performance. I'm not entirely sure how much the ZIL helps here. I've seen marginal performance increases from adding an SSD ZIL to a RAIDZ2 of spinning disks, but nothing to get too excited about.

The piece I'm really not sure about is MPIO. This part is a lot of theory based on the little I know. If I understand MPIO properly, and assuming I understand your setup correctly, your VM is likely only using one 1Gb channel at a time. A performance-testing application like fio, run on your virtualization host, would likely be able to saturate all of the lanes given multiple worker threads. But unless configured otherwise, a VM is only going to generate a single stream, and as such I think it will only use one lane at a time.

To test that, you might try a performance application like fio and give it only one worker. If you get roughly the same performance as your VM, then we might be on to something. If that is the case, then maybe try IO threads on your VM, but I think that would still require multiple disk controllers and multiple virtual disks. Copying from D: to D: would still be painful.
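
For the fio test, something along these lines would do (just a sketch; the path and size are placeholders — point it at storage that actually lives on the iSCSI LUN):

# one sequential writer, roughly what a single VM file copy looks like
fio --name=one-stream --directory=/path/to/iscsi-datastore --size=5g \
    --rw=write --bs=1m --numjobs=1 --direct=1 --group_reporting

# same test with four workers, to see whether MPIO spreads the load across links
fio --name=four-streams --directory=/path/to/iscsi-datastore --size=5g \
    --rw=write --bs=1m --numjobs=4 --direct=1 --group_reporting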
 

Jessep

Patron
Joined
Aug 19, 2018
Messages
379
If I'm reading that correctly, you have a 23-wide Z2 vdev? The recommended RAIDZ2 width is no more than 8-10 disks.

For hypervisor VM access (ESXi and the like), mirrors are the recommendation, so 11 x mirrors with one disk left over.

For iSCSI, the recommendation is not to fill the pool beyond 50% because of the performance loss; I don't see you mention how full your pool currently is.

1Gb isn't your best choice for networking; for any business case in this day and age you should be using 10Gb. It's cheap, standard, and ubiquitous.

Rebuild your pool as mirrors, keep the SLOG if it's a proper enterprise device with power-loss protection (PLP) and otherwise replace it, add more RAM if possible, keep pool usage under 50%, and move to 10Gb networking with RoCE-capable adapters.
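
For reference, a striped-mirror pool like that would look roughly like this if built by hand (device names are placeholders; in practice you'd build it through the FreeNAS GUI):

# 11 two-way mirror vdevs striped together, plus a dedicated log device
zpool create tank \
  mirror da0  da1  \
  mirror da2  da3  \
  mirror da4  da5  \
  mirror da6  da7  \
  mirror da8  da9  \
  mirror da10 da11 \
  mirror da12 da13 \
  mirror da14 da15 \
  mirror da16 da17 \
  mirror da18 da19 \
  mirror da20 da21 \
  log nvd0          # SLOG -- only worth keeping if it has power-loss protection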
 

geekmaster64

Explorer
Joined
Mar 14, 2018
Messages
50
Hey there,

I've attached a screenshot of the storage configuration. I used the wizard when I created it, so my apologies for not fully understanding it.

Regarding pool capacity, I had read that 80% was the "do not cross" line for performance, but other sources suggest write speed deteriorates much earlier (one claimed writes drop to 50% speed at only 15% full). I can't find that link in my history, which doesn't really help. Regardless, the pool is presently at 41.9%.
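
If it helps, I can pull the exact numbers from the shell with something like this ("tank" is a stand-in for my actual pool name):

# pool-level capacity and fragmentation
zpool list tank
# space used by the datasets/zvols backing the iSCSI extents
zfs list -r -o name,used,avail,refer tank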

10GbE is definitely in the future, but in the meantime it isn't the bottleneck. When I read a file that's already in the ARC, I can saturate all four 1GbE NICs (~4Gb/s), so read performance isn't the problem, just write.

Based on the attached screenshot, am I in violation of the width you are referring to? Ultimately, it sounds like the takeaway for my Z2 pool is that more mirror vdevs = more IOPS.

I can't add memory in the near future; the wife budgeting committee has other appropriations for the monies this year.

Last question, would an NVMe SSD for my SLOG improve matters?

Again, thank both of you for your help and wisdom!
 

Attachments

  • Raid Capture.PNG

geekmaster64

Explorer
Joined
Mar 14, 2018
Messages
50
I just spent some more time on this and found that the pool had "Sync - Always" set. When I set it to Standard, writes really increased on certain workloads, but the original copy/paste behavior inside a VM changed wildly.

My read speeds go all the way up to a few hundred MB/s, then the reads stop and the writes run at 200-400MB/s, but only in exceptionally short bursts. So it reads a little at a high rate, then writes a little at a high rate, and it alternates like that, often with pauses of zero activity in between.

Then, for fun, I set Sync - Disabled to see what the difference would be. It was staggering: reads averaged 150-220MB/s (sustained now, with few dips) and the writes came in bursts of 300-400MB/s to keep up, averaging the file transfer out at about 150MB/s.
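
For anyone following along, the change is just the ZFS sync property on the pool/zvol; from the shell it would be something like this ("tank/iscsi-zvol" is a stand-in for my actual dataset):

zfs get sync tank/iscsi-zvol             # show the current setting
zfs set sync=standard tank/iscsi-zvol    # honor sync writes only when the initiator asks for them
# zfs set sync=always tank/iscsi-zvol    # force every write through the SLOG (what I had)
# zfs set sync=disabled tank/iscsi-zvol  # fastest, but not safe for VM storage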

I'm going to take that SLOG out of the pool and test it; I wonder whether it's no longer healthy or simply not capable of keeping up, and whether I actually do need an NVMe add-in card.
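
The rough plan for testing it is to hit the device with synchronous writes directly, something like this ("/dev/da24" is a stand-in for the actual device, and these write tests destroy whatever is on it):

# FreeBSD's diskinfo has a sync-write latency test (-S; -w allows the destructive write test),
# assuming this build's diskinfo supports it
diskinfo -wS /dev/da24

# or with fio, forcing the small synchronous writes a SLOG actually sees
fio --name=slog-sync --filename=/dev/da24 --rw=write --bs=4k \
    --sync=1 --iodepth=1 --numjobs=1 --runtime=30 --time_based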
 