Extremely Slow Pool Performance with SAS Drives

janderson121

Cadet
Joined
Feb 17, 2023
Messages
3
Hello! I'm seeing extremely poor performance with my TrueNAS setup. Below are my specs; is there anything I might be doing wrong? Transfer speeds vary, mostly hovering around 5-10 MB/s, sometimes climbing to 30 MB/s, and on rare occasions hitting 100-112 MB/s.
Any feedback and advice is welcome and appreciated! Thank you!

System Specs:

TrueNAS-13.0-U3.1
Dell R610 1U
2x Intel Xeon X5670 - 2.93 GHz
48 GB of ECC DDR3 memory
4x Gigabit Ethernet Adapters in a LACP Group - lagg0
Dell PE PERC H200E - 2 SAS cables attached, 1 per DS4246 controller


Storage Specs:

NetApp DS4246 24-Bay Disk Shelf - JBOD
2x IOM6 Controllers in DS4246

Pool info:
16x HGST HUS724030ALS640 3TB (2.73 TiB) SAS Drives
Raidz1 - 4 Disks per vDev, 4 vDevs
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Why do you blame the disks? They're one of many links in the chain and it's far from clear where the problem lies:
4x Gigabit Ethernet Adapters
What kind? Broadcom or Intel?
in a LACP Group - lagg0
You should probably get rid of that and see if your problem is solved. If you do need more than 1 Gb/s, get a 10GbE NIC.
Raidz1 - 4 Disks per vDev, 4 vDevs
That's something of a weird configuration. Sure, it supports more IOPS than two 8-wide RAIDZ2 vdevs, but if you're strapped for IOPS, you should really be using SSDs instead of HDDs.
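
Back-of-the-envelope, as a sketch (assuming ~150-200 random IOPS per 7,200 RPM SAS disk, and that a RAIDZ vdev delivers roughly the random IOPS of a single member disk):

4 vdevs x ~150-200 IOPS/vdev = ~600-800 random IOPS
2 vdevs x ~150-200 IOPS/vdev = ~300-400 random IOPS

Either way you're orders of magnitude below even a single SATA SSD, which can do tens of thousands.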
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Hello,

Thanks for providing your system specs. You're using what I would consider older, but still "correct for the period" hardware for ZFS and TrueNAS - but the performance you're seeing is clearly below what you're after.

Let's look at a couple of things - first, the network connections. LACP will not increase the speed of a single-client connection, so the 100-112 MB/s peaks you see are likely your maximum. If possible, try removing the LACP/link-aggregation setup and attempt to get a steady speed on a single wire first.

Next up, direction and type of traffic. Are you reading from or writing to the TrueNAS machine - and are you working with a large number of small files, or a small number of large files?

Finally, if you're using a ZFS feature like deduplication, that can significantly impact your throughput. Definitely turn that off in this case.
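
If you want to double-check from the shell, something like the following works - assuming your pool is named "tank" (substitute your own pool name):

zfs get -r dedup tank

Every dataset should report "off".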

We can also set up an iperf test between your client and the TrueNAS server, along with some quick local disk testing, to identify which link in the chain is the problem.
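
As a rough sketch of what that would look like - assuming iperf3 on both ends (it ships with TrueNAS CORE), a pool named "tank", and the TrueNAS box at 192.168.1.100 (substitute your own names and addresses):

# On TrueNAS, start a listener:
iperf3 -s

# On the client, run a 30-second test:
iperf3 -c 192.168.1.100 -t 30

For a quick local disk test, note that writing zeroes is only meaningful with compression off (the default lz4 would shrink them to nearly nothing), so use a dedicated test dataset:

zfs create -o compression=off tank/speedtest
dd if=/dev/zero of=/mnt/tank/speedtest/testfile bs=1m count=10240
dd if=/mnt/tank/speedtest/testfile of=/dev/null bs=1m

(For the read test, use a file larger than your RAM, or it will mostly come back from the ARC cache.) If iperf3 shows a clean ~940 Mbit/s but the dd numbers are low, suspect the pool; if the reverse, suspect the network.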

What kind? Broadcom or Intel?
If my memory is correct, the R610 has a quad Broadcom BCM5709C onboard.

[4x 4-wide Z1] is something of a weird configuration.
Not too far outside the norm, depending on the type of data intended to be stored there.
 

janderson121

Cadet
Joined
Feb 17, 2023
Messages
3
Thank you for the replies!

You should probably get rid of that and see if your problem is solved.
If possible, try removing the LACP/link-aggregation setup
I will remove the LACP configuration once my transfers are complete.

Are you reading from or writing to the TrueNAS machine
Sorry, I should have included this in my original post: both. An NFS share is being accessed by my ESXi cluster. I have two hosts, and my intent is to run VMs with compute provided by ESXi and storage provided by TrueNAS.

are you working with a large number of small files, or a small number of large files?
A bit of both. The larger files (which make up the majority of the transfers) are .vmdk files, varying in size from 10 GB to 70 GB. The smaller files are the logs and general files that VMware creates, ranging from 1 KB up to 11 MB, with generally around 10 of them per VM. I currently have 10 VMs. The VMs are not being stored on the TrueNAS while I'm facing these issues; I'm just transferring to and from it to test throughput.

ZFS feature like deduplication
Dedup is not configured.

We can also set up an iperf test between your client and the TrueNAS server, along with some quick local disk testing, to identify which link in the chain is the problem.
I'm very new to the TrueNAS system, how would I go about this?

Thanks!
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
An NFS share is being accessed by my ESXi cluster.

There's the issue. The ESXi NFS client uses synchronous writes (it wants your storage to confirm that data has been committed to non-volatile storage before acknowledging), and your VMDK files behave much closer to a "block storage" workload than a regular file-storage one.
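
A quick way to confirm this diagnosis - purely as a temporary test, since running like this risks losing in-flight writes on a power loss - is to disable sync writes on the dataset backing the NFS share and re-run your transfer. Assuming the dataset is named "tank/vmstore" (substitute your own):

zfs set sync=disabled tank/vmstore
# ...re-run the transfer and note the speed...
zfs set sync=standard tank/vmstore

If throughput jumps dramatically with sync=disabled, sync writes are confirmed as the bottleneck, and an SLOG is the safe way to get most of that speed back.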

I'm going to provide a few links to community resources here for some background reading - but you will definitely need to make some adjustments to your pool layout, and you'll likely need some additional hardware.




Related to the first resource, about "sync writes" - the way to accelerate these safely is with an SLOG (Separate LOG) device; there's a handy link to a benchmark thread in my signature:


TL;DR
Virtualization is one of the toughest workloads for a storage server to handle. You'll need to reconfigure your 16x SAS drives as a group of 8x 2-drive mirrors, add a fast SLOG device in the R610 (if you have a free PCIe slot, an Optane P1600X would be a good budget option; there are SAS/SATA devices if the budget is tight, but with less performance), and grab as much RAM as you can get your hands on (thankfully, DDR3 RDIMMs are relatively cheap).
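
For illustration only, here's the target layout as it would look from the CLI - assuming a pool named "tank" and FreeBSD device names da0 through da15; in practice you'd build this through the TrueNAS web UI, and creating it destroys everything on the disks, so move your data off first:

zpool create tank \
  mirror da0 da1 \
  mirror da2 da3 \
  mirror da4 da5 \
  mirror da6 da7 \
  mirror da8 da9 \
  mirror da10 da11 \
  mirror da12 da13 \
  mirror da14 da15

Eight mirror vdevs gives you roughly 8x the random IOPS of a single disk, at the cost of 50% usable capacity.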
 

janderson121

Cadet
Joined
Feb 17, 2023
Messages
3
There's the issue. The ESXi NFS client uses synchronous writes (it wants your storage to confirm that your files are stored on non-volatile storage) and your VMDK files act much closer to a "block storage" workload than a regular file storage one.
Thank you so much for the reply! I've been trying to get input from r/truenas on Reddit, but you just get berated over there. I truly appreciate your guidance!
the way to accelerate these safely is with an SLOG (Separate LOG) device
When you refer to SLOG device, is this a Log disk when creating a pool? (See image)
[screenshot of the pool-creation screen showing the Log VDEV option]


Secondly, does the SSD need to be directly attached to the TrueNAS host? Could I use an SSD in the NetApp DS4246 JBOD?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Thank you so much for the reply! I've been trying to get input from r/truenas on Reddit, but you just get berated over there. I truly appreciate your guidance!

When you refer to SLOG device, is this a Log disk when creating a pool? (See image)
[screenshot of the pool-creation screen showing the Log VDEV option]


Secondly, does the SSD need to be directly attached to the TrueNAS host? Could I use an SSD in the NetApp DS4246 JBOD?
Glad to help!

The "Log" device is indeed the correct type of VDEV - the SSD can be in an expansion shelf if it's SAS/SATA, but the reason I suggested locating it in the R610 head unit is because those SLOGs are almost always 2.5" form factor (I did make the assumption that your R610 was SFF) and also that the best-performing devices are NVMe-based and would have to sit there as that's the only option for a PCIe slot.
 