Performance regression after FreeNAS 11.2 (or 11.3?)

CDRG · Dabbler · Joined Jun 12, 2020 · Messages: 18
Hi all,

I'm hoping that the collective can help here with what seems to be a regression of performance in my setup. I'll try to be as concise as possible.
For the HW:

2x E5-2697 v2 server with 256GB RAM
Broadcom 9305 HBA
8x WD60EFRX
6x WD30EFRX
Chelsio T4 NIC
ESXi 6.7

For the Truenas VM:
12.0-U2.1
12x CPU
128GB RAM
No SLOG
HBA passthrough
Disks above set up as 7x mirror VDEVs (a rough sketch of the layout follows this list)
3x datasets
-Media (1M record size)
-My storage (default 128k record size)
-Wife's storage (default 128k record size)
-All with default lz4 compression
SMB shares for Win10 clients
10Gb Intel NIC X520-DA2 configured for "users"
1Gb NIC from server onboard for CCTV feeds (connected, but not currently in use).
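For clarity, here's a rough command-line sketch of that layout. Everything was actually done through the GUI, so the pool name and device names below are just placeholders, not my real devices:

# Placeholder names -- 8x WD60EFRX + 6x WD30EFRX arranged as 7 two-way mirrors
zpool create tank \
  mirror da0 da1  mirror da2 da3  mirror da4 da5  mirror da6 da7 \
  mirror da8 da9  mirror da10 da11  mirror da12 da13

# Datasets with the record sizes noted above
zfs create -o recordsize=1M tank/media
zfs create tank/my-storage       # default 128K recordsize
zfs create tank/wife-storage     # default 128K recordsize
zfs set compression=lz4 tank     # default lz4 everywhere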

For the Client:
Win10 Pro
ASUS Zenith II Extreme mobo with Aquantia 10Gb NIC
All data sources on NVME disks, mainly 2TB 970 Evo or 1TB 970 Pro.

Network:
UniFi XG 6POE switch
No jumbo frames set anywhere.
In my original setup, I had 8x WD30EFRX disks as 4x mirror VDEVs on FreeNAS 11.x. Its performance for sequential transfers was pushing upwards of 600MBps or so over 10Gb. Upon upgrading to (I believe) a later version of 11.3, there was a massive regression in performance where I would barely touch 200MBps, mostly less. There was chatter about an issue related to Chelsio NICs, but as this was a VM, I wasn't aware of how that would have directly affected me. The fix for it in whatever version it landed seemed to do nothing.

Fast forward to today: I've just rebuilt as shown above, rolling all the disks into the same pool and creating the various datasets within it. I'd have expected that with the extra fan-out of VDEVs I'd get that performance back, but it seems to be exactly the same as the previous setup. I'm now attributing this to some HW-related issue.

I do not believe there's any network-related issue here, and by that I mean there are no drops, errors, etc. of any significance (3.05e-8 errors / 0.0003% discards), nor are they incrementing while I'm performing these tests. The server is connected via fiber back to the corresponding switch; the client is on 10Gb copper to the same switch, all within the same L2 segment.
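For anyone wanting to check the same counters, this is roughly how I'm watching them from the TrueNAS shell (vmx0 is just an example interface name for a VMXNET3 adapter on FreeBSD; substitute your own):

netstat -i              # one-shot view of Ierrs/Oerrs/Drop per interface
netstat -I vmx0 -w 1    # refresh one interface's counters every second while a transfer runs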

I do have the ability to swap in an Intel X520 NIC, though that will be a bigger to-do given my firewall is also a VM on that host. I'm not against trying it, but no changes have occurred within the host, so I'm struggling to see how that could be the issue.

What I do see that's interesting comes from iperf testing, though I'll take this with a grain of salt.

To my Plex VM, I can get about 4.5Gb on a single socket; 8.5Gb with parallel connections.
To the NAS VM, about 2.6Gb regardless of how many connections.
To the host itself, only 285Mb, but that could just be the host doing something odd, and I'm not worried about it all that much.
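For context, those numbers came from plain iperf runs along these lines (IP is a placeholder; flags shown are the iperf3 versions):

iperf3 -s                       # on the target VM or host
iperf3 -c 192.168.1.10          # single stream from the Win10 client
iperf3 -c 192.168.1.10 -P 8     # eight parallel streams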

So there are some wild differences here, and no logical reason for them unless it's some issue with the NIC itself and how it's presented, though for both the Plex and NAS VMs it's identical (VMXNET 3).

That all said, short of swapping the NIC out, I'm very much open to suggestions.

TIA
 

CDRG · Dabbler · Joined Jun 12, 2020 · Messages: 18
Well...I stumbled on what may be the answer here: it seems it was the NIC configuration. VMXNET 3, which is what had been used up to this point, somehow picked up an issue. I've changed it to E1000e and I'm back up to ~900MBps sequential transfers. I don't quite know what happened, or when, but it seems I was a little premature with my post in that I didn't troubleshoot it all that well.
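For reference, the change boils down to the adapter type on the VM's virtual NIC. I made it through the vSphere UI, but in .vmx terms it's the virtualDev setting (assuming ethernet0 is the adapter in question):

ethernet0.virtualDev = "vmxnet3"    # what I had been running
ethernet0.virtualDev = "e1000e"     # what it's set to now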

That said, as I'm not an ESXi guru by any stretch, if someone has better insight into what this issue could be, I'd be happy to dig into it further.
 