10 GbE: poor read performance accessing non-sequential data (image file sequences) using various Intel NICs

LeifK

Cadet
Joined
Dec 4, 2020
Messages
5
Dear community,

We are in the process of building a file server for our small post-production / 3D company. We need fast and responsive access from several Windows 10 clients (workstations) to a centralized storage server for our collaborative projects. Besides video data / raw footage, we mainly work with image file sequences (EXR images in Full HD or 4K resolution, typically about 8 MByte per file, varying from several hundred KByte to 25 MByte). Accessing our data from an internal SSD allows for smooth playback and operation; our goal is to achieve a similar degree of operability with a 10 GbE network to our new centralized storage.
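To put the target in numbers (assuming playback at around 24 fps): 24 frames/s × ~8 MByte per frame is roughly 190 MByte/s, i.e. about 1.5 Gbit/s per stream - more than a 1 GbE link can deliver, but something a 10 GbE link should handle comfortably, even for the larger 25 MByte frames (~600 MByte/s, ~4.8 Gbit/s).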

So far we have successfully built a server based on decent Supermicro hardware with a sufficient amount of RAM. We are running TrueNAS Core 12 and wanted to run some initial network performance tests before implementing our full storage. Please take a look at the signature for detailed system specs.

Right now we only have a few workstations to connect to the server, so our idea was to initially skip a 10 GbE switch and instead put 2-3 network interface controllers into the server and connect it directly to the Win10 clients, each link using a different network ID (so not on the same subnet). No bridging intended. The 10 GbE connections are used for nothing but pure storage access; all other communication between the workstations runs over the conventional 1 GbE network.
A switch will be added in the future once there is a need to connect additional clients.
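To illustrate the layout (the addresses below are just examples, not our real ones), each direct link gets its own small subnet, so nothing is bridged or routed between the 10 GbE segments:

server NIC 1   10.10.10.1/24  <->  workstation A   10.10.10.2/24
server NIC 2   10.10.20.1/24  <->  workstation B   10.10.20.2/24
server NIC 3   10.10.30.1/24  <->  workstation C   10.10.30.2/24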

For testing purposes we are running only a single storage device: a 512 GByte Samsung 860 EVO SSD. It has been tested under Windows 10 and runs flawlessly.
We basically got our network running as intended, with several independent connections from several clients to the server.

Beforehand we did some tests using Windows 10 only, directly connecting our workstations to each other in different combinations. Under Win10 everything works as intended: connection speeds are perfect, and all NICs, transceivers and cabling work flawlessly. We get the expected benchmark numbers, and real-life file access scenarios behave more or less as if the files (image sequences) were stored on a local SSD.

Right now we have access to:

- two different Intel X520 cards (genuine, passed YottaMark verification) with original Intel transceivers as well as Intel-compatible ones from fs.com,
- one "OEM" Intel X710-DA4 (probably a Chinese knockoff),
- two Mellanox ConnectX-3 cards with transceivers from Mellanox and fs.com,
- proper OM4 fiber cable and optional DAC cables,
- as well as a 10GBase-T option (Supermicro / Intel X540 NIC in the server and onboard Aquantia NICs in two workstations).


Next we did the same performance tests against TrueNAS 12, using different combinations of NICs in the server and the workstations.
The single SSD is shared via SMB.

The problem we ran across is that whenever we use an Intel-based NIC inside the TrueNAS server, we get poor read performance when accessing small(er) files (such as our typical image sequences), while sequential read performance is very good in every situation.

When we instead go through one of the Mellanox CX3 adapters in the server, read performance is perfect.
The behavior is always the same regardless of which combination of NIC, transceiver, PCIe slot and so on we use: Mellanox NICs give us great read performance for both sequential and small-file access, while Intel NICs struggle with smaller files.

The problem can be measured in benchmarks as well as experienced under real-world conditions (processing/playback of image sequences in our editing software). Accessing image sequences via the server's Intel NICs is even slower than over a conventional 1 GbE connection, while transfers of huge files (ZIP archives, raw video footage) perfectly saturate 10 GbE. Interestingly, it does not matter whether the image sequences are already cached in TrueNAS' ARC or not - the read performance remains slow.
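If anybody wants to reproduce the access pattern without our editing software or a benchmark GUI, a small Python script along these lines (run on a Win10 client against the mapped share; all paths are placeholders) compares one big sequential read against many individual frame-sized reads, which is essentially the difference we are seeing:

# Rough comparison sketch - paths are placeholders, adjust to your own share.
# Reads one large file in 8 MiB chunks, then a directory of ~8 MByte frames
# one by one, and prints the throughput of each access pattern.
import glob
import time

BIG_FILE = r"Z:\test\raw_footage.mov"      # any single multi-GByte file on the share
FRAME_GLOB = r"Z:\test\shot010\*.exr"      # an image sequence on the same share

def sequential_read(path, chunk=8 * 2**20):
    total, start = 0, time.perf_counter()
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            total += len(data)
    return total / 2**20 / (time.perf_counter() - start)

def per_frame_read(pattern):
    files = sorted(glob.glob(pattern))
    total, start = 0, time.perf_counter()
    for name in files:
        with open(name, "rb") as f:
            total += len(f.read())           # one open/read per frame, like playback
    return total / 2**20 / (time.perf_counter() - start)

print(f"sequential read: {sequential_read(BIG_FILE):6.0f} MiB/s")
print(f"per-frame read : {per_frame_read(FRAME_GLOB):6.0f} MiB/s")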

We made some benchmarks / screenshots to explain the problem. For further explanations, please see below:

iperf3_intel_S-C.gif
iperf3_intel_C-S.gif
iperf3_mellanox_S-C.gif
iperf3_mellanox_C-S.gif
CDM_intelx520_optical.gif
CDM_intelx540SM_RJ45.gif
CDM_Mellanox.gif
BM_Intel.jpg
BM_Mellanox.jpg


The above screenshots (iperf3) show that the connections themselves are okay. The NIC mentioned in red letters is the one inside the TrueNAS box.
With iperf3, the results are almost the same regardless of the kind of NIC in the Win10 client, the transceiver and so on. The RJ45 connections also show more or less the same behavior.
To keep things simple, jumbo frames are off (though I get slightly different results with jumbo frames enabled; the Mellanox cards in particular run a bit faster with jumbos).
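For anyone who wants to compare numbers: the raw link tests are plain iperf3 runs in both directions, roughly like this (the address is just an example for one of the point-to-point links):

on the TrueNAS box:   iperf3 -s
on the Win10 client:  iperf3 -c 10.10.10.1 -t 30
                      iperf3 -c 10.10.10.1 -t 30 -R   (reverse direction, server sends)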

Next, we have CrystalDiskMark. As you can see, when going through the Mellanox cards we get snappy file access in every situation, but with Intel, read performance tanks dramatically in certain situations.
The same goes for the Blackmagic Disk Speed Test, which delivers perfect performance values with the Mellanox but very bad performance with every Intel NIC.

The RJ45 connection (Intel X540) seems generally a bit faster than the X520 / X710 with optics or DAC, but the performance is still not acceptable.


While the above screenshots were taken without any tunables, we did try different settings, especially regarding buffer sizes and other things we came across in various postings. While this produced small variances, the main problem persisted: the Intel controllers in the TrueNAS box take a major performance hit when accessing smaller files, while huge sequential transfers are no problem at all.
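For the record, the buffer-related settings we experimented with were sysctl-type tunables of this kind (exact names and values varied between the postings we followed; the numbers below are just typical examples from 10 GbE tuning guides for FreeBSD):

kern.ipc.maxsockbuf = 16777216
net.inet.tcp.recvbuf_max = 16777216
net.inet.tcp.sendbuf_max = 16777216
net.inet.tcp.recvspace = 4194304
net.inet.tcp.sendspace = 4194304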

Has anybody made similar experiences? Is there a simple solution to this - some driver setup, tuning or optimization required? Everything described above is right out of the box, so maybe there is a simple setting we missed?

Do you need additional information?

Thank you very much,

Leif
 

LeifK

Cadet
Joined
Dec 4, 2020
Messages
5
Update: for additional testing we did a fresh installation of FreeNAS 11.3 instead of TrueNAS 12.0. It seems that our problem does NOT exist on the FreeNAS installation. We will do some further investigation and try another fresh TrueNAS installation.
 

LeifK

Cadet
Joined
Dec 4, 2020
Messages
5
Update #2: Another fresh TrueNAS 12 installation brought the issue back. So our conclusion is:

The default TrueNAS CORE installation seems to be buggy, incomplete, or does some other kind of crippling to various Intel 10 GbE driver setups.

This behavior does NOT exist in FreeNAS 11.3.


Is there nobody else experiencing this problem?
 

MikeyG

Patron
Joined
Dec 8, 2017
Messages
442
There have been multiple reports of similar issues. I've seen the same thing on Intel and Chelsio cards. Very interesting that you got a different result with the Mellanox. I wonder if it really is the driver.

Anyway, you can see two of the bugs that have been reported here:

There may be more. After about 3 months, it doesn't seem like we are close to a solution, nor have the bug reports for this been highly prioritized.
 

LeifK

Cadet
Joined
Dec 4, 2020
Messages
5
Thank you very much for this useful information. It seems I should have taken a closer look at the bug reports earlier; instead I concentrated on web research and forum posts, with less success.

This is really disappointing, especially the low priority the problem has been given.
As it stands, this renders TrueNAS 12 unusable for us, so we will stick with FreeNAS 11.3 until things get fixed. Fingers crossed...

Thanks!!
 

MikeyG

Patron
Joined
Dec 8, 2017
Messages
442
@LeifK You may want to submit your own bug report. Your experience comparing the different NICs might be new and useful information. I was not able to go that far, since none of my NICs worked correctly. I think the more people who report their issues, and the more information iX has, the more likely it is to get fixed. Up to you though.
 

LeifK

Cadet
Joined
Dec 4, 2020
Messages
5
Sure, I will definitely submit a report. I am out of the office now, but back tomorrow to do so.
Really interesting - and strange - that the Chelsio NICs are also problematic.
 