Lackluster 10G performance

goosesensor

Dabbler
Joined
Jan 7, 2019
Messages
10
I've recently put together a new FreeNAS box and invested in some 10G infrastructure equipment for my home office. Now that the dust has settled and the NAS has been up and running for a while, I'm turning my attention to performance tuning. Depending on client OS/protocol (see below) I get around 450MB/sec read/write transfers via SFP+. I'd like to improve this number. Here are the details of the setup:

FreeNAS 11.2
ASRock C236 WSI
Xeon E3-1245
16GB ECC Memory
8x Pioneer 1TB SSDs connected via SATA
Mellanox ConnectX-2 SFP+ NIC, MTU 9000

Pool is RAIDZ2 across all drives; I have tried both lz4 and uncompressed datasets.

The Workstation PC (main client) is a beefy Intel machine with NVMe storage and a 10Gtek Intel chipset SFP+ NIC.

The NAS and PC are connected through a MikroTik CRS305-1G-4S+IN, the NAS via DAC and the PC via OM3. I have tested with the NAS & PC connected directly via OM3 with no difference in performance.


For a given platform I see the best performance using its native file sharing protocol.

Arch Linux: ~ 470MB/sec up&down via NFS with "-rw,soft,intr,rsize=131072,wsize=131072,async", NIC MTU 9000
Windows: ~ 470MB/sec up&down via SMB, default NIC MTU
macOS: ~ 470MB/sec up&down via AFP, NIC MTU 9000


To test NAS disk performance, I made an uncompressed dataset and used dd to write about 20GB from /dev/zero and read it back. I see about 1,400MB/sec read and 1,000MB/sec write:

Code:
% dd if=/dev/zero of=testfile bs=2048k count=10000
10000+0 records in
10000+0 records out
20971520000 bytes transferred in 19.359261 secs (1083280995 bytes/sec)


Code:
% dd if=testfile of=/dev/zero bs=2048k count=10000
10000+0 records in
10000+0 records out
20971520000 bytes transferred in 14.378746 secs (1458508227 bytes/sec)


To test TCP performance, I used iperf yielding a full 10Gbps between NAS and PC:

iperf PC -> NAS:

Code:
% iperf -c 10.0.1.2
------------------------------------------------------------
Client connecting to 10.0.1.2, TCP port 5001
TCP window size:  845 KByte (default)
------------------------------------------------------------
[  3] local 10.0.1.4 port 50812 connected with 10.0.1.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  11.4 GBytes  9.80 Gbits/sec



iperf NAS -> PC:

Code:
% iperf -c 10.0.1.4
------------------------------------------------------------
Client connecting to 10.0.1.4, TCP port 5001
TCP window size: 35.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.1.2 port 22862 connected with 10.0.1.4 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  11.4 GBytes  9.75 Gbits/sec


iperf NAS loopback:

Code:
% iperf -c 10.0.1.2
------------------------------------------------------------
Client connecting to 10.0.1.2, TCP port 5001
TCP window size: 47.9 KByte (default)
------------------------------------------------------------
[  3] local 10.0.1.2 port 37612 connected with 10.0.1.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  71.1 GBytes  61.1 Gbits/sec


iperf PC loopback:

Code:
% iperf -c 10.0.1.4
------------------------------------------------------------
Client connecting to 10.0.1.4, TCP port 5001
TCP window size: 2.50 MByte (default)
------------------------------------------------------------
[  3] local 10.0.1.4 port 35818 connected with 10.0.1.4 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  98.8 GBytes  84.9 Gbits/sec


During sustained large file transfers between NAS and workstation via NFS I see NAS CPU usage of about 30%, and memory usage usually hovers around 80%. The workstation doesn't seem to blink.


So where is the bottleneck happening? Here's what I think I've ruled out:

- NAS resource constraints. CPU and memory are happy. Disks are fast.
- Workstation resource constraints. CPU and memory are happy. NVMe should be able to keep up, and testing with a ramdisk yields no changes.
- Network resource constraints. The SFP+ switch seems happy, as bypassing it yields no changes.


This makes me think that NFS is improperly configured on the host, client, or both.

I've done some Googling and tinkering, and the most common suggestions I come across are:

1. Increase MTU [done]
2. Ensure full duplex [done]
3. For NFS (my main use) increase rsize and wsize [done]
4. For NFS use async. [not done]

For #4 -- async NFS -- I can't seem to get this working. I've tried adding it to the server exports but it just gets overwritten on restart. I've also tried setting vfs.nfsd.async=1 with sysctl but it too disappears after reboot. 'nfsstat -m' on the workstation never reports 'async' for any shares.
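For reference, this is all I've been able to do by hand, and it only sticks until the next reboot:

Code:
# check the current value
sysctl vfs.nfsd.async
# enable it on the running system -- this is the setting that doesn't survive a reboot
sysctl vfs.nfsd.async=1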

I'm out of ideas... help!
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
Is the Z2 pool arranged as 2 vdevs or one? How quick are those SSDs by themselves?
 

goosesensor

Dabbler
Joined
Jan 7, 2019
Messages
10
Some notes about my original post:

With sync off I actually see 800MB/sec-1GB/sec write via NFS. It's still somewhat sporadic, but a big improvement.

I tried an eBay Chelsio NIC in the FreeNAS box and read/write results are identical to the Mellanox ConnectX-2 that's usually in it. I also tried a Mellanox ConnectX-2 from another PC in the workstation and see a consistent increase of roughly 80MB/sec in read. That's still only about 550-560, though.

@sotiris.bos
I played with opening the cases and hitting the cards with a fan but that didn't seem to have any effect, so it doesn't appear to be a thermal issue.

@Constantin
Love the avatar. Not really sure about the vdev.. probably one... :rolleyes: See screenshot below.

The manufacturer rates the SSDs at 550MB/sec read, and since I can get a steady 800-1024MB/sec write over NFS I *assume* the read speed of the array is okay.
[Attached screenshot: Screenshot_2019.03.08_23.06.29.png — pool layout]


Edit: One more detail. Using the sparse, all-zeros file from the disk performance test (dd from /dev/zero), NFS read & write saturate the 10G connection. Files filled with data from /dev/urandom, or otherwise non-sparse files (like tgzs), still only read at 450-500ish over NFS.
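A quick way to check whether a test file is actually sparse or compressed away on disk is to compare its logical size with the space it occupies:

Code:
ls -lh testfile   # logical size
du -h testfile    # blocks actually allocated on disk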
 

sotiris.bos

Explorer
Joined
Jun 12, 2018
Messages
56
What brand is the onboard SATA controller? How is it connected to the CPU? I would tell you to try with an HBA but your motherboard does not have a second PCI-e slot.

Since you are getting the same speeds using different protocols and iperf shows no problems, I am led to believe that it either is a hardware problem or a pool problem, not a problem with NFS. Did you try SMB/Samba with jumbo frames on the workstation?

What is the clock speed on the workstation machine?
Is there a chance the NVMe controller is overheating and throttling?
Is trim working properly on both machines?
How full are your drives? (both FreeNAS and workstation)
Is the transfer speed the same when copying either small or large files?
Try dd with different block sizes. Check out this thread:
https://www.ixsystems.com/community...n-sync-writes-with-smaller-block-sizes.54675/
Here the OP is using an SLOG but post #12 might help with troubleshooting.
Also, can you try reading from the NAS from two or more clients and see what the total read speed is? That might also be helpful.
https://www.ixsystems.com/community...e-numbers-i-need-more-speed.30499/post-197028
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
If your transfer protocol and software support multiple parallel transfers, I’d consider splitting the pool into two Z2 vdevs and testing again. The IOPS of a 1-vdev pool is limited by the performance of a single disk; a 2-vdev pool will potentially double IOPS. However, you lose another 2 disks to parity, reducing the pool's usable data capacity (see explanation here).
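For reference, here is roughly what a 2-vdev layout looks like at the zpool level (device names are placeholders, and in FreeNAS you'd build this through the GUI rather than by hand):

Code:
# two 4-disk RAIDZ2 vdevs instead of one 8-disk RAIDZ2
zpool create tank \
  raidz2 ada0 ada1 ada2 ada3 \
  raidz2 ada4 ada5 ada6 ada7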

I’d also consider doing the performance tests on each drive per the instructions in the SLOG discussion. This gives you a better idea of how well your disks handle various block / data sizes and verifies that each drive behaves similarly. After all, the speed of the pool is limited by the slowest drive in it.
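Something along these lines per drive gives a quick sequential-read sanity check (device name is just an example; reading the raw device is non-destructive):

Code:
# sequential read of ~4GB straight off one pool member
dd if=/dev/ada0 of=/dev/null bs=1m count=4096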

Past testing I’ve read about also suggests making sure that the test file being transferred between the server and the NAS is filled with random content so that LZ4 compression doesn’t skew the results. Otherwise some transfers might look better than they should.
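You can also check after the fact whether compression inflated the numbers (dataset name is a placeholder):

Code:
zfs get compression,compressratio tank/testdataset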

Thanks for the kind word re our vampire pig. That was our first Halloween project.

EDITED to remove a boneheaded mistake, thank you TimoJ
 

TimoJ

Dabbler
Joined
Jul 28, 2018
Messages
39
A 1-vdev pool write speed is limited to the speed of a single disk.
Isn't it: write speed of a single drive x number of drives (- parity drives) ?
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
Let me re-word that. I meant IOPS. Argh. I shouldn't post before coffee.
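To put rough numbers on it for an 8-wide Z2 of SATA SSDs (assuming ~500 MB/s per drive -- purely ballpark figures):

Code:
# sequential / streaming, best case: data striped across (8 - 2 parity) drives
#   6 x ~500 MB/s  ≈ 3,000 MB/s
# random IOPS of the vdev: roughly that of a single drive,
#   since every drive has to touch every stripe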

Monday will be interesting. New motherboard arrives with a single LSI 2116 HBA instead of the merry band of assorted Marvell and Intel storage controllers on the C2750D4I. While the new CPU will have a lower clock speed than the C2750D4I I use today (1.7-2.3 vs. 2.4GHz), I suspect that the HBA and integrated SFP+ will make a positive difference (presently, SFP+ is via a PCIe 2.0 card). Time will tell!
 

goosesensor

Dabbler
Joined
Jan 7, 2019
Messages
10
@sotiris.bos

Thanks for the ideas. See answers below.


What brand is the onboard SATA controller? How is it connected to the CPU? I would tell you to try with an HBA but your motherboard does not have a second PCI-e slot.

- The SATA controller is the Intel C236 chipset. I assume that means it's connected via PCIe. No word on what the max combined SATA throughput is. If I were in the mood to throw money at the machine (I'm not!! :D ) I'd try a Supermicro X11SDV-4C-TLN2F (with built-in 10Gbase-T) plus an HBA and a 10Gbase-T SFP+ module for my SFP+ switch. Another option would be a Gigabyte MB10-DS4 (with built-in SFP+) and an HBA...

What is the clock speed on the workstation machine?

- Workstation is 5.0GHz.

Is there a chance the NVMe controller is overheating and throttling?

- I tried a number of disk stress tests on the workstation while monitoring the NVMe temp. The temp did creep up a bit, but the drive stayed plenty fast (I forget the exact numbers) and never slowed.

Is trim working properly on both machines?

- On the workstation it looks like TRIM is supported and enabled. On FreeNAS, `camcontrol identify /dev/ada0` says it's supported but the enabled field is blank... Also recall that I see the performance issues when reading from the NAS, not just writing.
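To dig further on the FreeNAS side I could look at the ZFS TRIM sysctls (names from memory for FreeBSD 11 / FreeNAS 11.2; they may differ by version):

Code:
sysctl vfs.zfs.trim.enabled        # whether ZFS issues TRIM at all
sysctl kstat.zfs.misc.zio_trim     # cumulative TRIM counters; all zeros = no TRIM happening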

How full are your drives? (both FreeNAS and workstation)

- They're all < 50%. I've also used multiple OS's on the workstation with different types of disks, and a separate Debian client machine with an NVMe connected via DAC.

Is the transfer speed the same when copying either small or large files?

- Yes -- it generally appears the same. It does, however, look like there is an initial boost when writing, as if the data is being buffered in memory before hitting the disk. Clients often see 1GB+/sec for the first few seconds of a transfer to the NAS.

Try dd with different block sizes. Check out this thread:
https://www.ixsystems.com/community...n-sync-writes-with-smaller-block-sizes.54675/

- I basically did what 'xyzzy' did in post #12, but reading from disk and writing to memory, instead of the other way around. I created a 2GB file sourced from /dev/urandom, placing it in an uncompressed dataset. I then created a ramdisk (in the same manner as xyzzy) and did some tests copying the file from the dataset to the ramdisk:
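Roughly, the setup was as follows (a sketch -- the md unit number and exact sizes are from memory / illustrative):

Code:
# create a swap-backed ramdisk, format and mount it
mdconfig -a -t swap -s 4g     # prints the md unit it allocates, e.g. md1
newfs -U /dev/md1
mkdir -p /mnt/md1
mount /dev/md1 /mnt/md1
# 2GB incompressible source file in the uncompressed dataset
dd if=/dev/urandom of=random2G bs=1024k count=2048

And the copies from the dataset to the ramdisk: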

Code:
# pwd: /mnt/DefaultPool/Uncompressed

dd if=random2G of=/mnt/md1/randomDst bs=1024k
2147483648 bytes transferred in 0.708013 secs (3033113327 bytes/sec)  # 2,893 MB/sec

dd if=random2G of=/mnt/md1/randomDst bs=512k
2147483648 bytes transferred in 0.706123 secs (3041232401 bytes/sec)  # 2,900 MB/sec

dd if=random2G of=/mnt/md1/randomDst bs=256k
2147483648 bytes transferred in 0.726005 secs (2957945792 bytes/sec)  # 2,821 MB/sec

dd if=random2G of=/mnt/md1/randomDst bs=128k
2147483648 bytes transferred in 0.763019 secs (2814454983 bytes/sec)  # 2,684 MB/sec

dd if=random2G of=/mnt/md1/randomDst bs=64k
2147483648 bytes transferred in 0.793015 secs (2707997521 bytes/sec)  # 2,583 MB/sec

dd if=random2G of=/mnt/md1/randomDst bs=32k
2147483648 bytes transferred in 0.850199 secs (2525859001 bytes/sec)  # 2,409 MB/sec

dd if=random2G of=/mnt/md1/randomDst bs=16k
2147483648 bytes transferred in 1.053377 secs (2038666201 bytes/sec)  # 1,944 MB/sec

dd if=random2G of=/mnt/md1/randomDst bs=8k
2147483648 bytes transferred in 1.457112 secs (1473794756 bytes/sec)  # 1,405 MB/sec



Here the OP is using an SLOG but post #12 might help with troubleshooting.
Also, can you try reading from the NAS from two or more clients and see what the total read speed is? That might also be helpful.
https://www.ixsystems.com/community...e-numbers-i-need-more-speed.30499/post-197028

- I did try reading from two clients at once and I see around 700MB/sec TX with some bursts up to 800 or so in the FreeNAS web GUI.
Interesting that cyberjock says "Don't expect to do more than about 500MB/sec over NFS from a single client to/from your server no matter what you do. You just won't. If you had Chelsio cards, you might get to 800MB/sec."
I did see a slight increase when using a Mellanox on the workstation over the Intel card I usually use. The NAS is currently using a Chelsio card. I wonder if it's worth trying one in the workstation, too.
 