Poor iSCSI 10GbE performance

Zak95

Dabbler
Joined
Mar 4, 2018
Messages
20
I rebuilt my TrueNAS server to the latest version and upgraded the ESXi hosts, and this time used multipathing, but I get terrible iSCSI performance. It starts off really great, then speed drops to a crawl.

Here is my setup: Dell R720xd, 256GB DDR3 ECC, 12 x 4TB SAS-2; the pool is split into 3 x 4-drive Z2 vdevs. I have an SSD for cache (240GB) and 2 x PCIe NVMe drives mirrored as LOG (also 240GB). I have 2 ESXi hosts running 7.0 U1, all fully patched; they are HPE, using their custom ISO, all extensions and drivers patched and updated. I have 2 x 10GbE switches and have iSCSI multipath set up properly with 2 physical QLogic 57810 10 Gigabit Ethernet adapters, which support hardware iSCSI. Each adapter is connected to a different switch, with 2 different VLANs for iSCSI, and the TrueNAS is also connected by a single 10GbE DAC to the appropriate switches, so there is no routing or cross-switch traffic happening. All other cards are Intel X520-DA2; I have read about possible issues with these cards but can't see any errors or anything to indicate I am having hardware issues. The cards are genuine, and complain if non-Intel SFP+ modules are connected, but are happy with Cisco DACs, again genuine ones.

I have played about with sync on and off. I was using autotune, but I turned that off and rebooted while trying to find better settings. Searching for recommendations brings up so many old, conflicting, or outright ambiguous options that I have tried filtering to the last year to get up-to-date recommendations, but it is confusing and difficult to navigate. Does anyone have proven or current settings to try?

For example, here is a test copying 1GB dummy text files inside a Server 2016 VM. This one shows sync disabled, and you can see it starts off at 600MB/s and drops to less than single-disk performance. Any ideas?
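
For anyone who wants to reproduce the test, something like this rough Python sketch shows the idea (the paths and file size are just placeholders for what I'm doing by hand in the VM): write a 1GB dummy file to the iSCSI-backed disk, then time a straight copy of it.

import os, shutil, time

SIZE = 1 * 1024**3                 # 1 GiB dummy file (placeholder size)
CHUNK = 64 * 1024**2               # write it in 64 MiB chunks
SRC = r"D:\iscsi-test\dummy.bin"   # placeholder paths on the iSCSI-backed volume
DST = r"D:\iscsi-test\dummy-copy.bin"

# Create the dummy file once
os.makedirs(os.path.dirname(SRC), exist_ok=True)
with open(SRC, "wb") as f:
    remaining = SIZE
    while remaining:
        n = min(CHUNK, remaining)
        f.write(os.urandom(n))
        remaining -= n

# Time a plain copy, roughly what copying the file in Explorer does
# (note the guest OS write cache can flatter the early part of the number)
start = time.time()
shutil.copyfile(SRC, DST)
elapsed = time.time() - start
print(f"copied {SIZE / 1024**2:.0f} MiB in {elapsed:.1f}s "
      f"-> {SIZE / 1024**2 / elapsed:.0f} MB/s")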

Screenshot 2021-03-07 120026.png
 

Zak95

Dabbler
Joined
Mar 4, 2018
Messages
20
Why do all my posts get marked as awaiting approval?

Anyway, here are my tunables; these were created by autotune. I've turned it off and rebooted, and plan on changing these to suit my setup.
Screenshot 2021-03-07 120252.png
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Why do all my posts get marked as awaiting approval?

The forum has had some problems with spam. You don't meet some threshold for "known good participant" quite yet. Don't ask me which one, I don't know.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I rebuilt my TrueNAS server to the latest version and upgraded the ESXi hosts, and this time used multipathing, but I get terrible iSCSI performance. It starts off really great, then speed drops to a crawl.

Yup, that is hardly unexpected...

Here is my setup: Dell R720xd, 256GB DDR3 ECC, 12 x 4TB SAS-2; the pool is split into 3 x 4-drive Z2 vdevs. I have an SSD for cache (240GB) and 2 x PCIe NVMe drives mirrored as LOG (also 240GB). I have 2 ESXi hosts running 7.0 U1, all fully patched; they are HPE, using their custom ISO, all extensions and drivers patched and updated. I have 2 x 10GbE switches and have iSCSI multipath set up properly with 2 physical QLogic 57810 10 Gigabit Ethernet adapters, which support hardware iSCSI. Each adapter is connected to a different switch, with 2 different VLANs for iSCSI, and the TrueNAS is also connected by a single 10GbE DAC to the appropriate switches, so there is no routing or cross-switch traffic happening. All other cards are Intel X520-DA2; I have read about possible issues with these cards but can't see any errors or anything to indicate I am having hardware issues. The cards are genuine, and complain if non-Intel SFP+ modules are connected, but are happy with Cisco DACs, again genuine ones.

The X520s are basically the second-best-choice card. I don't know what "possible issues" you're referring to; fakes are the primary cause of "issues" (which genuine cards rule out), and overheating is the next most common, which won't be happening in an R720XD.

But it is a real shame that you put all this effort and nice hardware into it and then sabotaged yourself with RAIDZ2.

Please do go read both linked articles:

https://www.truenas.com/community/r...and-why-we-use-mirrors-for-block-storage.112/

which is massively relevant to your particular case, and

https://www.truenas.com/community/threads/the-path-to-success-for-block-storage.81165/

which contains information about other pitfalls.
 

Zak95

Dabbler
Joined
Mar 4, 2018
Messages
20
Thanks for your reply, very helpful, and I will do some reading. I went with the Z2 vdevs on someone else's recommendation on the forums before I rebuilt this, which illustrates my point: I wanted to do 6 x 2-drive mirrors and was going to stick with that, but was convinced not to. It's cool though. This is a homelab setup and I can just blow away the config and rebuild. I'll do some testing and report back.
 

Zak95

Dabbler
Joined
Mar 4, 2018
Messages
20
Well, I rebuilt the pool with 6 x 2-drive mirrors and the same cache and LOG setup, and I'm not impressed with the speeds again. I was getting a stable 140MB/sec copying data inside the pool with sync set to always, so really no better than I was getting with the Z2 vdevs.

Here is sync disabled, and again it was impressive at about 500MB/sec until it ran out of steam, and you can see the speeds plunge. I'm nearly at the limits of my patience with this; I don't understand how people are getting so much performance out of their setups. Where should I look for clues to see what is wrong? Obviously I can't run with sync disabled even with a UPS, but this setup is capable of so much faster writes.


Capture.PNG
 

Lix

Dabbler
Joined
Apr 20, 2015
Messages
27
QLogic 57810 10 Gigabit - Is this handling iSCSI in the Dell?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Well, I rebuilt the pool with 6 x 2-drive mirrors and the same cache and LOG setup, and I'm not impressed with the speeds again. I was getting a stable 140MB/sec copying data inside the pool with sync set to always, so really no better than I was getting with the Z2 vdevs.

Your transfer seems to have gone for a lot longer before running out of steam here, so I wouldn't say it's "no better."

I have an SSD for cache (240GB) and 2 x PCIe NVMe drives mirrored as LOG (also 240GB).

Can you post the exact model numbers of the SSDs and what model HBA is in your R720XD?
 

Zak95

Dabbler
Joined
Mar 4, 2018
Messages
20
Apologies, I meant that with sync enabled it was not much better, not the night-and-day difference I was expecting. The screenshot is with sync disabled.

The cache is a SanDisk SSD; it's not the best, it's a 240GB 'SanDisk SSD PLUS', and the quoted speeds are 550 read and 440 write. The NVMe drives are Corsair Force MP510s, 240GB, supposed to be 3100 read and 1050 write. All of these are MB/s, of course.

I'm using an Intel X520-DA2 for the 10GbE iSCSI interfaces. I've tried with and without jumbo frames, currently using 9000 MTU, but it made next to no difference. The server does have a combo dual SFP+ and dual 1GbE adapter, an official Dell card that plugs into a mezzanine slot, but I actually get worse performance with that using the same cables and switch setup.

And for completeness, the SAS controller is a Dell H710 Mini flashed to IT mode, according to the seller (he has a YouTube channel, so I feel I trust him more than a normal eBay seller):
  1. LSI Avago IT mode firmware version P20 (20.00.07.00)
  2. MPTSAS2 BIOS ROM flashed version 07.39.02.00
  3. MPTSAS2 UEFI ROM flashed version 07.27.01.01
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Apologies, I meant that with sync enabled it was not much better, not the night-and-day difference I was expecting. The screenshot is with sync disabled.

Ah, that's something entirely different. I'll review this later, but re: the sync enabled being identical, this is a result of the SSDs in use.

The cache is a SanDisk SSD; it's not the best, it's a 240GB 'SanDisk SSD PLUS', and the quoted speeds are 550 read and 440 write. The NVMe drives are Corsair Force MP510s, 240GB, supposed to be 3100 read and 1050 write. All of these are MB/s, of course.

Your choice of cache/L2ARC drive is fine, because it's just serving up random reads; almost every SSD can handle L2ARC duties from a performance perspective.

SLOG duties, on the other hand, are an entirely different and much more punishing workload. The link in my signature helps explain some of the nitty-gritty, but the short version is that SLOG performance has almost nothing to do with the "hero numbers" published by SSD vendor marketing departments. Those are done with large records (megabytes at a time) often written sequentially, at high queue depths. Basically, the benchmark is tuned for "max throughput over time." A sync-write workload that hammers your SLOG, on the other hand, often comes in at very small block sizes (4K, 8K, 16K) and with no queue depth - ZFS isn't going to send the acknowledgement back to the client system until the data is safe on non-volatile storage, such as your SLOG.
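
If you want to see that gap for yourself, fio is the proper tool, but here's a rough Python sketch of the idea (the TARGET path is a placeholder, and it assumes a POSIX system where os.O_SYNC is available): time small synchronous writes at queue depth 1 against large buffered sequential writes to the same file.

import os, time

TARGET = "/mnt/tank/slog-test.bin"   # placeholder path - point it wherever you want to test
TOTAL = 256 * 1024**2                # write 256 MiB per pass

def timed_writes(block_size, per_write_sync):
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if per_write_sync:
        flags |= os.O_SYNC           # each write() must reach stable storage before returning
    fd = os.open(TARGET, flags, 0o600)
    buf = os.urandom(block_size)
    start = time.time()
    for _ in range(TOTAL // block_size):
        os.write(fd, buf)            # queue depth 1: the next write waits for this one
    os.fsync(fd)                     # flush anything still buffered
    os.close(fd)
    return TOTAL / 1024**2 / (time.time() - start)

# SLOG-style traffic: 4 KiB synchronous writes, one at a time
print(f"4K sync, QD1     : {timed_writes(4 * 1024, per_write_sync=True):7.1f} MB/s")
# "Hero number"-style traffic: 1 MiB buffered sequential writes
print(f"1M buffered, seq : {timed_writes(1024 * 1024, per_write_sync=False):7.1f} MB/s")
os.remove(TARGET)

It isn't a real benchmark (no O_DIRECT, no preconditioning), but the ratio between the two numbers tends to be dramatic on consumer drives, which is exactly the point.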

I couldn't find a benchmark on the 240GB model, but the larger 960GB was only able to turn in around 169MB/s at 4K Q1T1 according to a review on Tom's Hardware - and larger drives tend to have much better performance than smaller ones.

1615154241173.png


Obviously not every write will be at 4K granularity, but it shows the difference between "real-world" SLOG performance and the "marketing numbers" of peak writes at 1MB Q32T1, which turn in a screaming-fast 3GB/s for the 960GB drive.

1615154331078.png


The unfortunate truth here is that the Corsair Force MP510 is "not a good SLOG device" - sorry. If you're gunning for fast, you're likely thinking Intel Optane DC series.

And for completeness, the SAS controller is a Dell H710 Mini flashed to IT mode, according to the seller (he has a YouTube channel, so I feel I trust him more than a normal eBay seller):

Based on the firmware/stats that controller is fine.

As far as why your speeds are dropping off when you're running without sync writes, that one also has a simple answer: your network pipe is faster than your member vdevs. A 10Gbps network can feed data in at (roughly) 1GB/s, and twelve SAS drives simply can't keep up with it. The mirrors do an admirable job of trying, and as you saw they cope better than the 3 x 4-drive Z2, but eventually you overwhelm the amount of outstanding data allowed to live in RAM, and the ZFS write throttle has to kick in and back everything off. (Although 38MB/s is far too low for the "sustained write speed" - unless of course there's a lot of other activity on the pool competing for I/O time.)
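
As a back-of-the-envelope (all figures below are illustrative assumptions, not measurements from your system), you can estimate how long that initial fast burst lasts before the throttle bites:

# How long can a 10GbE ingest outrun the pool before ZFS has to throttle?
# All numbers are illustrative assumptions, not measurements.
ingest_rate  = 1000   # MB/s, roughly what a single 10GbE link can deliver
pool_rate    = 400    # MB/s, assumed sustained write speed of the vdevs
dirty_max_mb = 4096   # MB, assumed cap on outstanding dirty data held in RAM

backlog_growth = ingest_rate - pool_rate          # MB/s of dirty data piling up
burst_seconds = dirty_max_mb / backlog_growth
print(f"burst lasts roughly {burst_seconds:.0f}s before the write throttle "
      f"pushes the client back toward ~{pool_rate} MB/s")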

Purely from a theoretical "don't-ever-run-production-this-way" perspective you could configure it as a 12-drive stripe and see how long it can sustain the speeds that way; this would let you extrapolate and hazard a guess as to how a 24-drive 12x2 mirror setup would perform.
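
The extrapolation itself is simple arithmetic; here's a throwaway sketch of the reasoning (example numbers only - plug in whatever the stripe test actually measures):

# Each 2-way mirror vdev writes at roughly single-drive speed, so a pool of
# N mirror vdevs should write about like an N-drive stripe of the same drives.
stripe_drives     = 12
stripe_write_mb_s = 1800   # example figure - use the measured stripe result
per_drive         = stripe_write_mb_s / stripe_drives

mirror_vdevs = 12          # hypothetical 24-drive, 12 x 2-way mirror pool
estimate = per_drive * mirror_vdevs
print(f"estimated sustained write for 12x2 mirrors: ~{estimate:.0f} MB/s")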

HPE using their custom ISO, all extensions and drivers patched and updated.

Quick sidebar on this one - what's the hardware model here (DL360/DL380?) and have you managed to update the firmware, especially for the HP 57810 cards? Might need the HP SPP (Service Pack for ProLiant) ISO for that one.
 

Zak95

Dabbler
Joined
Mar 4, 2018
Messages
20
Thanks for another helpful and detailed reply, really appreciate it. It makes sense, and I don't mind hearing that the hardware, especially the NVMe drives, isn't that great. I bought the drives new from scan.co.uk, which I've used many times before; they were selling 'refurbished' NVMe drives which were brand new and sealed with 0 hours on them, for £22 each delivered, so I can find plenty of other uses for them, and I don't mind buying something that will make a difference. But it's starting to look like ZFS is a terrible file system for iSCSI, so I might need to look at alternatives; I really just want the speeds I know the hardware can easily do, however that needs to be done. My consumer-grade QNAP NAS can give full 1-gig wire speed for VMs and isn't really that far off what TrueNAS was giving me. Optane could be an idea, but I feel I have spent enough so far and not really got the speeds I would expect, so another £600 for further disappointment isn't very appealing.

The DL380s are Gen9s, and the 57810 cards were about £16 each, so again not a huge loss if they don't perform. The firmware all looks to be from 2019, so maybe the latest for this model. In truth, both hosts have purple-screened just mounting the iSCSI test LUNs, so there could be something in that, BUT...

before an unwarranted 'Aha...!': I had poor performance before, with FreeNAS and ESXi 6.7 on the same Dell hardware as now (apart from the cache and LOG), and started a thread about wanting to do it properly this time round. I wasn't trying MPIO back then, but the performance issues aren't new and aren't caused by the cards or drivers. I even got a 2nd 10-gig switch to split the traffic into multiple paths because I had no free ports, and had high hopes for this. I appreciate the Gen9s aren't that new, but I was running a 10-year-old Lenovo x3650 M3 for years that never missed a beat, never crashed, never hung, though it struggles by today's standards.
 

Lix

Dabbler
Joined
Apr 20, 2015
Messages
27
No, these are handling iSCSI on the ESXi hosts

I asked because I had some issues with slow sequential reads over iSCSI from TrueNAS and Windows (StarWind) with those 57810 cards in my R720 and R720xd. It did not matter what the underlying storage was.
 

Zak95

Dabbler
Joined
Mar 4, 2018
Messages
20
Thanks, I can try with another Intel card. I am actually testing with unRAID; it's going to take a while to build the parity, but at least I will know if it's my setup or ZFS.
 

Lix

Dabbler
Joined
Apr 20, 2015
Messages
27
They work great in unRAID, both native and with ZFS, as an NFS datastore.
 

Lix

Dabbler
Joined
Apr 20, 2015
Messages
27
TrueNAS SCALE also gave some good read numbers (NFS); it might be linked to the version of the bnx2x driver that the 57810 cards use on Linux. I did not investigate that well, as this setup is just for fun/play. I can run some fio benchmarks from a VM on the unRAID NFS datastore. I have 2 x Micron S630DC-1600 1.6TB SAS SSDs and an Intel 3700 200GB SATA SSD that can be used for sync testing, if that is of interest, of course.
 