10GbE - 8Gbps with iperf, 1.3Mbps with NFS

Joined
Apr 26, 2015
Messages
320
db01 is a VM on the ESX host I've been testing.
I tested from both a VM and from the host itself, with the NFS share mounted as another datastore.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Can I suggest testing from a physical machine?
Just to eliminate ESX shenanigans.
 
Joined
Apr 26, 2015
Messages
320
Sure, that's a great idea. Only problem is figuring out how to do it.
I do have another TN server with a 10GbE card in it, so I'll have to dig around to see how I can mount the NFS share we've been testing onto it and then transfer some files. I also have to make sure that transfer goes through the 10GbE card.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
OK - this is my test.
Server: TrueNAS /mnt/SSDPool/NFS/NFS.SSDPool.NewNAS.
Client: Ubuntu. 10Gb NIC (through 2 * 10Gb switches).

I mounted the TN folder on Ubuntu.
I copied a 32GB file from TN to /tmp on the client (a plain SATA SSD).
Renamed the file to howfastami.bin.

I am limited by the SATA SSD here and any file caching the client is doing

Sorry - I don't understand your pv command
but
sync && time cp howfastami.bin /mnt/SSD/target.file

real    0m57.247s
user    0m0.055s
sys     0m20.470s

File was 30544 MB
30544/57.247 = 533.548 MB/s = 4.26Gb/s
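For reference, the whole sequence on the client is roughly this (the /mnt/SSD mount point and <truenas> hostname are illustrative - substitute your own):

Code:
# mount the TN NFS export on the Ubuntu client
sudo mkdir -p /mnt/SSD
sudo mount -t nfs <truenas>:/mnt/SSDPool/NFS/NFS.SSDPool.NewNAS /mnt/SSD

# flush dirty pages, then time a copy of a known-size local file onto the share
sync && time cp howfastami.bin /mnt/SSD/target.file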

OK - the file is too big for RAM and I have now reached my Linux knowledge limit.
I think I need a ramdisk and a smaller file, as the client only has 16GB.

So....................

mkdir /mnt/Ramdisk
mount -t tmpfs -o size=6G tmpfs /mnt/Ramdisk/
Find a 5GB file to copy around (Windows Pro ISO)
Copied the file from TN to the ramdisk
Renamed the file to newtestfile.wibble
sync && time cp newtestfile.wibble /mnt/SSD/transferredtestfile.wibble

File is 5216.02 MB in 6.855 seconds ≈ 761 MB/s ≈ 6.09 Gbps
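(If you want the arithmetic done for you, a one-liner along these lines works - the size and time are the values from the copy above:)

Code:
# throughput: megabytes divided by seconds, then x8/1000 for Gbps
awk 'BEGIN { mb=5216.02; s=6.855; printf "%.1f MB/s = %.2f Gbps\n", mb/s, mb/s*8/1000 }'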

This is all done on my primary TrueNAS (QNAS would be horribly slow, and only has 1Gb)

Hopefully my maths is correct
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Hang on - are you confusing Mb and MB?
[attached screenshot: 2022-01-01_170652.jpg]

This shows 533 MiB/s ≈ 4.26 Gbps (plus ~4.9% for the MiB-to-MB factor), which for a single-threaded NFS copy isn't shabby.
 
Joined
Apr 26, 2015
Messages
320
Mb = megabits, MB = megabytes. I might also have written MB/s at some point in error.

You're right, however:

READ: bw=702MiB/s (736MB/s), 702MiB/s-702MiB/s (736MB/s-736MB/s), io=41.2GiB (44.2GB), run=60054-60054msec
WRITE: bw=702MiB/s (736MB/s), 702MiB/s-702MiB/s (736MB/s-736MB/s), io=41.2GiB (44.2GB), run=60054-60054msec

According to this test, I'm getting 702 MiB/s ≈ 5.89 Gbps on both read and write, which seems impossible, but OK.
The problem is, that's what I get with the fio test; over NFS I'm barely getting 1 Gbps.

I'd like to try the second-machine test you suggested, but I'm having a hard time configuring an IP for the 10G card on the other TN server I have. The server complains about not allowing DHCP, and setting a static IP offers no gateway option, which I think is why it isn't working.

I'll have to spend some time on that.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Err, your NFS test that I copied in my post says you are getting 4+ Gbps, not less than 1 Gbps, writing to the remote store.
Here's the copy again:
[attached screenshot: 2022-01-01_170652.jpg]
 
Joined
Apr 26, 2015
Messages
320
I apologize for not noticing these things; I'm doing too many things at the same time and end up losing focus.
Aside from that, if you look back at some of the images I posted, ESX is reporting in Mbps while the TN NFS graphs show both Gbps and MB/s. One of those tests shows 1.5 Gbps.

With iperf getting over 9 Gbps, I know there is no networking issue, and when testing against NFS I'm able to get 4+ Gbps as you mentioned, but the ESX and TN reporting shows I'm not getting much more than about 1 Gbps.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
So, replicate my test (I built an Ubuntu client on good hardware): hardware to hardware, without ESX and its layers of abstraction anywhere near the process. Also, don't believe graphs and statistics - believe what actually happens, which is why I timed a file copy of known size. Then test the NFS transfer the same way I did; what do you get? (Run iperf from this client first as well, to prove connectivity is good.) Do the copy from a ramdisk as I did, to eliminate the disk subsystem on the client.
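Something along these lines on the client, in other words (the <truenas-ip>, export path, and file names are placeholders - adjust to your setup):

Code:
# 1. prove raw connectivity first (run "iperf3 -s" on the TN box)
iperf3 -c <truenas-ip> -t 30

# 2. stage a test file on a ramdisk so the client's own disks aren't the limit
sudo mkdir -p /mnt/Ramdisk
sudo mount -t tmpfs -o size=6G tmpfs /mnt/Ramdisk
cp /path/to/some-5GB-file.iso /mnt/Ramdisk/newtestfile.wibble

# 3. mount the TN NFS export and time a copy of known size onto it
sudo mkdir -p /mnt/SSD
sudo mount -t nfs <truenas-ip>:/mnt/pool01/<share> /mnt/SSD
sync && time cp /mnt/Ramdisk/newtestfile.wibble /mnt/SSD/transferredtestfile.wibble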

Honestly, at this point I am looking at the problem being somewhere other than TN (but am not sure).

We/you have proved that the network seems good (iperf). We seem to have proved that the disk subsystem is good (fio) at high queue depth, though maybe not at low queue depth - but that's just a benchmark. You seem to have proved with a straight NFS copy to the TN that things are performing OK.
Please run fio --bs=128k --direct=1 --directory=/mnt/pool01/io --gtod_reduce=1 --ioengine=posixaio --iodepth=1 --group_reporting --name=randrw --numjobs=12 --ramp_time=10 --runtime=60 --rw=randrw --size=256M --time_based
This is a specific test, based on the last one but forcing iodepth to 1 - what do you get?
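For anyone following along, here is the same command with each flag annotated (descriptions are the standard fio flag meanings, as I understand them):

Code:
# --bs=128k            128 KiB blocks (matches the default ZFS recordsize)
# --direct=1           non-buffered I/O, so the page cache doesn't flatter the numbers
# --directory=...      run against the dataset under test
# --gtod_reduce=1      fewer timestamp calls, less measurement overhead
# --ioengine=posixaio  POSIX async I/O engine
# --iodepth=1          only one outstanding I/O per job (the low-queue-depth case)
# --group_reporting    report all jobs as a single aggregate
# --numjobs=12         twelve parallel workers
# --ramp_time=10 --runtime=60 --time_based   ignore the first 10s, then measure for 60s
# --rw=randrw --size=256M                    mixed random read/write over a 256 MiB file per job
fio --bs=128k --direct=1 --directory=/mnt/pool01/io --gtod_reduce=1 --ioengine=posixaio \
    --iodepth=1 --group_reporting --name=randrw --numjobs=12 --ramp_time=10 \
    --runtime=60 --rw=randrw --size=256M --time_based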

BTW - I get
READ: bw=1654MiB/s (1734MB/s), 1654MiB/s-1654MiB/s (1734MB/s-1734MB/s), io=96.9GiB (104GB), run=60023-60023msec
WRITE: bw=1655MiB/s (1735MB/s), 1655MiB/s-1655MiB/s (1735MB/s-1735MB/s), io=97.0GiB (104GB), run=60023-60023msec
= 13.2 Gbps

One thing - can you post your complete TN hardware spec in your signature, please? That way I don't have to search back through 9 pages of posts to find it each time I want to look.
 
Joined
Apr 26, 2015
Messages
320
I have added the system I'm working on to my sig.

The only other hardware I have with 10G NICs is my older TN server, but I've not been able to assign an IP to its 10G card - still trying to figure that out.

Running your fio command, this is what I get.

Code:
# fio --bs=128k --direct=1 --directory=/mnt/pool01/io --gtod_reduce=1 --ioengine=posixaio --iodepth=1 --group_reporting --name=randrw --numjobs=12 --ramp_time=10 --runtime=60 --rw=randrw --size=256M --time_based
randrw: (g=0): rw=randrw, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=posixaio, iodepth=1
...
fio-3.27
Starting 12 processes
Jobs: 12 (f=12): [m(12)][100.0%][r=381MiB/s,w=380MiB/s][r=3047,w=3036 IOPS][eta 00m:00s]
randrw: (groupid=0, jobs=12): err= 0: pid=9337: Wed Jan  5 10:09:10 2022
  read: IOPS=5947, BW=743MiB/s (779MB/s)(43.6GiB/60085msec)
   bw (  KiB/s): min=317024, max=1764703, per=100.00%, avg=764073.83, stdev=26329.92, samples=1428
   iops        : min= 2472, max=13783, avg=5965.63, stdev=205.70, samples=1428
  write: IOPS=5939, BW=742MiB/s (778MB/s)(43.6GiB/60085msec); 0 zone resets
   bw (  KiB/s): min=365824, max=1714839, per=100.00%, avg=762970.52, stdev=25587.57, samples=1428
   iops        : min= 2858, max=13390, avg=5957.02, stdev=199.90, samples=1428
  cpu          : usr=1.09%, sys=1.25%, ctx=718405, majf=0, minf=1
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=357329,356856,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=743MiB/s (779MB/s), 743MiB/s-743MiB/s (779MB/s-779MB/s), io=43.6GiB (46.8GB), run=60085-60085msec
  WRITE: bw=742MiB/s (778MB/s), 742MiB/s-742MiB/s (778MB/s-778MB/s), io=43.6GiB (46.8GB), run=60085-60085msec
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Which is 5.9+ Gbps, a lot slower than mine - but still a lot better than what you get from ESX.

Can you run the same test, but first remove the Optane SLOG and set sync=disabled, please? (I don't know if this will make a difference, but I am curious, as you might be running into the limitations of the Optane. I use 2 * 900p, which are a lot faster and cost a lot more.) Actually, you don't need to remove the SLOG; just make IO a dataset and set the sync value on it.
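From the shell, that is roughly the following (pool01/io is assumed from the --directory path you've been using):

Code:
# make io a proper dataset (skip the create if it already is one), then set sync on just that dataset
zfs create pool01/io
zfs set sync=disabled pool01/io   # run fio, note the numbers
zfs set sync=always pool01/io     # run fio again for comparison
zfs set sync=standard pool01/io   # back to the default when done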

Your hardware - full specs please: HBAs, RAID configuration, network card, etc.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
If I run the fio command (iodepth=1) on my IO dataset I get
disabled: 2,142MiB/s = 17.1 Gbps
always: 1,677MiB/s = 13.4 Gbps
So the SLOG would seem to have a significant effect on things (slower but safer).
The disabled value is the absolute maximum the pool will run at (wide open throttle, no seatbelt, full ahead and damn the torpedoes, to mix metaphors).
 
Joined
Apr 26, 2015
Messages
320
I don't see any option for removing the SLOG once it's created.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Set up a dataset called IO (rather than a folder)
And then run the fio test on that dataset using sync=always and sync=disabled - see if there is a difference and what the values are
 
Joined
Apr 26, 2015
Messages
320
Code:
With sync=always:
# fio --bs=128k --direct=1 --directory=/mnt/pool01/io --gtod_reduce=1 --ioengine=posixaio --iodepth=1 --group_reporting --name=randrw --numjobs=12 --ramp_time=10 --runtime=60 --rw=randrw --size=256M --time_based
randrw: (g=0): rw=randrw, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=posixaio, iodepth=1
...
Run status group 0 (all jobs):
   READ: bw=258MiB/s (270MB/s), 258MiB/s-258MiB/s (270MB/s-270MB/s), io=15.1GiB (16.2GB), run=60069-60069msec
  WRITE: bw=258MiB/s (271MB/s), 258MiB/s-258MiB/s (271MB/s-271MB/s), io=15.1GiB (16.3GB), run=60069-60069msec

With sync=disabled:
# fio --bs=128k --direct=1 --directory=/mnt/pool01/io --gtod_reduce=1 --ioengine=posixaio --iodepth=1 --group_reporting --name=randrw --numjobs=12 --ramp_time=10 --runtime=60 --rw=randrw --size=256M --time_based
randrw: (g=0): rw=randrw, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=posixaio, iodepth=1
...
Run status group 0 (all jobs):
   READ: bw=674MiB/s (707MB/s), 674MiB/s-674MiB/s (707MB/s-707MB/s), io=39.5GiB (42.4GB), run=60031-60031msec
  WRITE: bw=673MiB/s (706MB/s), 673MiB/s-673MiB/s (706MB/s-706MB/s), io=39.5GiB (42.4GB), run=60031-60031msec
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Those numbers (~260MB/s) are about right for being bottlenecked by the Optane M10 32G SLOG device.

SLOG can be removed from the pool - I'm not sure exactly where it is in the TN12 UI, but I believe it's under Pool Status, using the gear/menu option to the right of the log vdev.
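From the CLI, the equivalent would be something like this (the gptid is a placeholder - use whatever zpool status lists under the log vdev):

Code:
zpool status pool01                       # note the gptid shown under "logs"
zpool remove pool01 gptid/<slog-gptid>    # detaches the log vdev from the pool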
 
Joined
Apr 26, 2015
Messages
320
Bottleneck? I'm totally confused now. Several times in this thread it was said that adding a SLOG, especially a fast Optane device, would increase performance drastically. Did I miss something?

Yes, you're right - I removed it from the pool using Status.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112

sync=disabled is "fast but unsafe"
sync=always is "slow but safe"
Adding the SLOG and keeping sync=always makes it "not as slow, but still safe"

Is pool01 the SSD-only pool, or the HDD pool?

If it's the HDD one, re-enable sync=always and do the fio test against the dataset again with the SLOG missing.
 
Joined
Apr 26, 2015
Messages
320
I replaced all the drives with non-encrypted SSDs for the pool, plus two M.2 devices for the OS and SLOG.
One of the SSDs has already died, so I'm waiting on a replacement to get back to the full set of drives.

Code:
# zpool status -v pool01
  pool: pool01
 state: ONLINE
config:

        NAME                                            STATE     READ WRITE CKSUM
        pool01                                          ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/cc25d298-6b5e-11ec-8cb3-90b11c1dd891  ONLINE       0     0     0
            gptid/cc6c3490-6b5e-11ec-8cb3-90b11c1dd891  ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            gptid/cc542f9d-6b5e-11ec-8cb3-90b11c1dd891  ONLINE       0     0     0
            gptid/cc742ee6-6b5e-11ec-8cb3-90b11c1dd891  ONLINE       0     0     0
          mirror-2                                      ONLINE       0     0     0
            gptid/cbf7fe61-6b5e-11ec-8cb3-90b11c1dd891  ONLINE       0     0     0
            gptid/cc3dbd73-6b5e-11ec-8cb3-90b11c1dd891  ONLINE       0     0     0


The thing is, I was fine with reliability over high speed, but then I got all caught up in performance somehow and, thanks to really bad transfer rates, ended up getting all this new hardware.
Since I have much better hardware in this server at this point, I'd like to take advantage of it with a mix of performance and safety. Some of this storage will be used to serve web pages for load-balanced web servers, so it needs to be reliable.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
OK. We can deal with the Optane issue later. The M10 is great for an HDD pool, not so hot for an SSD pool, but this is the wrong rabbit hole for the moment. We are getting sidetracked.

I suggest leaving the pool as sync=disabled for the moment

We/you need to eliminate ESX as the cause of the problem. A separate 10Gb machine, mounting NFS and then copying a large file from that machine to the TN we are testing, will measure overall write performance to the pool without going through the layers of virtualisation that ESX adds. If transfer speeds are high, then the issue is pointing at ESX. If the speeds are low, then we are pointing at TN.

You need iperf from the hardware client, and then a transfer of a large file with sync && time cp howfastami.bin /mnt/SSD/target.file - leave sync=disabled.

Interestingly, if we look at your pool, it's set up (ignoring the SLOG) the same way mine is: 6 SSDs (except mine are Intel DC 3610s and a bit larger). I have more memory (irrelevant) and a faster CPU (Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz), which might be relevant (I also have a network card!!! - see below)*
Code:
  pool: SSDPool
 state: ONLINE
  scan: scrub repaired 0B in 00:11:43 with 0 errors on Mon Dec 20 16:14:19 2021
config:

        NAME                                            STATE     READ WRITE CKSUM
        SSDPool                                         ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/42b6f9b3-3b92-11ec-b85f-3cecef246b70  ONLINE       0     0     0
            gptid/42cab853-3b92-11ec-b85f-3cecef246b70  ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            gptid/4be68b19-3b92-11ec-b85f-3cecef246b70  ONLINE       0     0     0
            gptid/4bef25da-3b92-11ec-b85f-3cecef246b70  ONLINE       0     0     0
          mirror-2                                      ONLINE       0     0     0
            gptid/566a328b-3b92-11ec-b85f-3cecef246b70  ONLINE       0     0     0
            gptid/568e945a-3b92-11ec-b85f-3cecef246b70  ONLINE       0     0     0
        logs
          mirror-3                                      ONLINE       0     0     0
            gptid/b4f0c857-3bf4-11ec-8e4f-3cecef246b70  ONLINE       0     0     0
            gptid/b8c655c1-3bf4-11ec-8e4f-3cecef246b70  ONLINE       0     0     0

errors: No known data errors

*BTW - you still haven't posted full specs. At the moment I am surprised your hardware is working at all, as you have no network card listed. We need full specs, not half specs. Assumptions here will not help.
 