TrueNAS Core performance questions

TxAggieEngineer

Dabbler
Joined
Apr 25, 2023
Messages
16
I have been experimenting with TrueNAS over the last month or so and have it loaded on two QNAP TS-879U units. Granted, they're probably not the best platforms for testing, but they're all I have that will hold enough drives. The configurations are as follows:

unit 1 (currently at ~90% capacity)
Intel Core i3-3220 3.3GHz
8GB RAM
Intel X520 NIC for 10G connectivity
1x WD Gold 4TB disk (boot drive)
7x WD Gold 4TB disks in RAIDZ1

unit 2 (near 0% capacity)
Intel Core i3-3220 3.3GHz
32GB RAM
Intel X520 NIC for 10G connectivity
2x Kingston 240GB A400 SSDs
6x HGST Ultrastar 2TB disks (various RAIDZ configs during testing)
Samsung 980 Pro 2TB NVMe SSD on PCIe expansion card
Silicon Power 128GB NVMe SSD on PCIe expansion card

A Nimble CS300 is the production iSCSI storage connected to the same hosts and 10G LAN. I've also benchmarked a Nimble CS1000 and a Nimble AF20 for comparison in different environments. Performance testing was done with CrystalDiskMark 8.0.4 on a VM, using the default and "Real World Performance +Mix" profiles with 16MB and 32GB data sets.

The in-house testing was done by creating a new 100GB vdisk for an existing VM and moving that vdisk between the CS300 and the TrueNAS systems to minimize variables. Very low I/O loads were present on all units at the time of testing.

iSCSI sync was disabled on the TrueNAS units. The rationale: most organizations have daily backups and maybe hourly snapshots, so they have already implicitly accepted data losses of 1-24 hours; losing the ~5 seconds of async writes buffered in RAM (pending transaction group commit) during an extremely rare crash or power outage seemed acceptable by comparison. Both TrueNAS systems are using a single 10G connection (with jumbo frames) because I can't get VMware to see the second path (it works fine on the Nimble, so it's not a VMware or network issue).
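For reference, here is one way sync is typically disabled on ZFS. This is a minimal sketch, assuming a pool named tank and a zvol named tank/iscsi-test backing the iSCSI extent (both names hypothetical, not taken from the systems above):

  # Disable sync writes on the zvol backing the iSCSI extent:
  zfs set sync=disabled tank/iscsi-test
  # Confirm the setting:
  zfs get sync tank/iscsi-test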

Results based on testing so far:
  • CrystalDiskMark shows that both TrueNAS units outperform both Nimble hybrid units. Testing showed that unit 2 even outperforms the Nimble all-flash unit.
  • As indicated in other posts, enabling L2ARC with the Samsung SSD in unit 2 resulted in poorer performance.
  • Unit 2 with the 6-disk RAIDZ1 configuration slightly outperformed three striped 2-disk mirrors (3 vdevs, each a 2-disk mirror; "striped mirrors" seems to be the usual term; see the pool-layout sketch after this list).
  • A non-critical terminal server was migrated between the CS300 and unit 2 and normal office applications were run. Although CrystalDiskMark shows better performance, VMware consistently shows latency several milliseconds higher for the TrueNAS system under the VM -> Monitor -> Advanced graphs for the datastore and virtual disks.
  • Performance was slightly better with compression enabled. This surprised me at first, but on reflection the CPU overhead of compressing data is offset by less data moving to and from the disks, which nets out to better performance.
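For clarity, here is roughly how the two unit-2 layouts compare at creation time. A minimal sketch, assuming a pool named tank and disks da0-da5 (all names hypothetical):

  # 6-disk RAIDZ1:
  zpool create tank raidz1 da0 da1 da2 da3 da4 da5
  # Alternative: three striped 2-disk mirrors ("striped mirrors"):
  zpool create tank mirror da0 da1 mirror da2 da3 mirror da4 da5
  # Compression (LZ4) as tested above, and checking the achieved ratio:
  zfs set compression=lz4 tank
  zfs get compressratio tank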

Some questions...
  • Would "normal" workloads affect TrueNAS performance? In other words, is there something specific to the CrystalDiskMark tests that would show superior TrueNAS performance that would not be reflected in regular VM workloads? "Normal" (steady-state) loads are around 750-1000 IOPS at 4MiB/s according to the Nimble real-time performance stats.
  • Are there any faults with the testing methodology I used that would produce unrealistic results?
  • Is the rationale valid for disabling sync? Is it normal to have it enabled or disabled in typical business environments?
  • How often is L2ARC beneficial? It would seem a rare case that the same data is being read repeatedly and if a 5-pass CrystalDiskMark test doesn't use L2ARC I don't know what would.
  • The RAIDZ1 configuration outperforming the striped mirrors was surprising. Is there a better way to configure the pool/vdevs with stripes and mirrors that would perform better than RAIDZ1?
  • Any thoughts as to why VMware would see higher datastore and virtual disk latency for TrueNAS than the Nimble?
  • I'm having a really hard time believing that low-end hardware with commodity components can outperform these very expensive enterprise-class storage units so I'm trying to understand how these results are possible. Could it simply be that the Nimbles have the equivalent of sync enabled to prevent data loss? I know this is a relatively small and isolated test, but I would expect the Nimble units, especially the all-flash one, to significantly outperform the test systems. This is not to disparage TrueNAS in any way, but simply to point out that the QNAP units are nothing special.
I am looking to potentially replace the CS300 with a TrueNAS Enterprise system, which is what led me to start this testing. Any thoughts on the above questions or using TrueNAS Enterprise systems in production environments would be appreciated!
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Would "normal" workloads affect TrueNAS performance? In other words, is there something specific to the CrystalDiskMark tests that would show superior TrueNAS performance that would not be reflected in regular VM workloads? "Normal" (steady-state) loads are around 750-1000 IOPS at 4MiB/s according to the Nimble real-time performance stats.
Well, benchmarks are always benchmarks. Nothing beats testing with your workload.
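One way to get closer to that than CrystalDiskMark is to replay the steady-state numbers from the first post with fio. A rough sketch, assuming fio is installed in a test VM on the datastore; the file path, read/write mix, and queue depth are guesses, not measurements:

  fio --name=steady-state --filename=/mnt/test/fio.dat --size=32G \
      --rw=randrw --rwmixread=70 --bs=4k --ioengine=posixaio \
      --iodepth=16 --rate_iops=700,300 --runtime=600 --time_based

That caps reads at 700 and writes at 300 IOPS, roughly matching the 750-1000 IOPS steady state reported above.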
Is the rationale valid for disabling sync? Is it normal to have it enabled or disabled in typical business environments?
Can you live without the 5 seconds of data? Note that the application (VMware?) might freak out, leading to longer times to recovery. For VMs, you would typically want sync enabled given all the layers involved down to the guest OS' filesystem.
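If sync stays on, the relevant knob per zvol is (name hypothetical):

  # Force every write to stable storage, regardless of what the initiator asks for:
  zfs set sync=always tank/vm-zvol

With sync=standard, only writes the initiator flags as synchronous are committed immediately; a SLOG device is the usual way to make either setting affordable for block storage.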
How often is L2ARC beneficial? It would seem a rare case that the same data is being read repeatedly and if a 5-pass CrystalDiskMark test doesn't use L2ARC I don't know what would.
I'll answer a different question: L2ARC is useful when the ARC hit rate is low but the ARC ghost-list hit rate is high and adding more memory is not practical. The impact is greater for a demanding workload like block storage, where a read served from ARC/L2ARC is one less operation that needs to be issued to the main disks.
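On TrueNAS Core (FreeBSD under the hood), those rates can be checked with standard tools; a quick sketch:

  # Summary view, including ARC and L2ARC efficiency:
  arc_summary
  # Raw counters; high ghost hits relative to hits suggest more cache would pay off:
  sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses
  sysctl kstat.zfs.misc.arcstats.mru_ghost_hits kstat.zfs.misc.arcstats.mfu_ghost_hits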
The RAIDZ1 configuration outperforming the striped mirrors was surprising. Is there a better way to configure the pool/vdevs with stripes and mirrors that would perform better than RAIDZ1?
I suspect the benchmark was dodgy, though async-only writes do make life a lot easier for RAIDZ when the workload is mostly writes.
Also, keep in mind that RAIDZ is going to end up with significant wasted space when used with small blocks, like those you'd have with block storage workloads.
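To put a number on that: assume ashift=12 (4 KiB sectors) and a 16 KiB volblocksize on the 6-disk RAIDZ1. A 16 KiB block is 4 data sectors plus 1 parity sector, and RAIDZ pads each allocation up to a multiple of parity+1 = 2 sectors, so the block actually consumes 6 sectors, i.e. 24 KiB on disk for 16 KiB of data. That's 50% overhead instead of the nominal one-parity-per-five-data (~20%), which is part of why mirrors are generally preferred for zvol-backed block storage.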
I'm having a really hard time believing that low-end hardware with commodity components can outperform these very expensive enterprise-class storage units so I'm trying to understand how these results are possible. Could it simply be that the Nimbles have the equivalent of sync enabled to prevent data loss?
That would likely be part of it.
I am looking to potentially replace the CS300 with a TrueNAS Enterprise system
Well, give iX a call and go over your scenario with them. Your scenario sounds fairly typical, and they'll have the experience to help you out with the details.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
unit 1 (currently at ~90% capacity)
Intel Core i3-3220 3.3GHz
8GB RAM
Intel X520 NIC for 10G connectivity
1x WD Gold 4TB disk (boot drive)
7x WD Gold 4TB disks in RAIDZ1

And you're doing iSCSI on this? Wow, that's horrible.

 

TxAggieEngineer

Dabbler
Joined
Apr 25, 2023
Messages
16
And you're doing iSCSI on this? Wow, that's horrible.

If you're referring to the hardware configuration and current capacity utilization, this was strictly for testing and comparing different hardware configurations. I wanted to see how different factors affected performance. For example, I've read that small amounts of RAM and high capacity utilization will result in poor performance and I wanted to see just how much of an effect there was. There's no production data on either of these systems.
 
Joined
Jun 15, 2022
Messages
674
And you're doing iSCSI on this? Wow, that's horrible.
So are some of the women that guys in I.T. date, or worse yet, marry. Still, it happens on a regular basis. :tongue:

(the converse may or may not be true, the sample size of 'women in I.T.' is too small to be conclusive)
 