10GbE - 8 Gbps with iperf, 1.3 Gbps with NFS

Joined: Apr 26, 2015 · Messages: 320
Code:
# midclt call tunable.query | jq
[
  {
    "id": 3,
    "value": "2",
    "type": "LOADER",
    "comment": "",
    "enabled": true,
    "var": "hint.isp.0.role"
  },
  {
    "id": 4,
    "value": "2",
    "type": "LOADER",
    "comment": "",
    "enabled": true,
    "var": "hint.isp.1.role"
  },
  {
    "id": 5,
    "value": "2",
    "type": "LOADER",
    "comment": "",
    "enabled": true,
    "var": "hint.isp.2.role"
  },
  {
    "id": 6,
    "value": "2",
    "type": "LOADER",
    "comment": "",
    "enabled": true,
    "var": "hint.isp.3.role"
  },
  {
    "id": 7,
    "value": "0",
    "type": "LOADER",
    "comment": "?",
    "enabled": true,
    "var": "hw.pci.honor_msi_blacklist"
  },
  {
    "id": 8,
    "value": "2048",
    "type": "LOADER",
    "comment": "256",
    "enabled": true,
    "var": "hw.vmx.rxndesc"
  },
  {
    "id": 9,
    "value": "16",
    "type": "LOADER",
    "comment": "8",
    "enabled": true,
    "var": "hw.vmx.rxnqueue"
  },
  {
    "id": 10,
    "value": "1024",
    "type": "LOADER",
    "comment": "512",
    "enabled": true,
    "var": "hw.vmx.txndesc"
  },
  {
    "id": 11,
    "value": "16777216",
    "type": "SYSCTL",
    "comment": "2097152",
    "enabled": true,
    "var": "kern.ipc.maxsockbuf"
  },
  {
    "id": 12,
    "value": "1024",
    "type": "SYSCTL",
    "comment": "128",
    "enabled": true,
    "var": "kern.ipc.soacceptqueue"
  },
  {
    "id": 13,
    "value": "2048",
    "type": "SYSCTL",
    "comment": "256",
    "enabled": true,
    "var": "net.inet.ip.intr_queue_maxlen"
  },
  {
    "id": 14,
    "value": "4194304",
    "type": "SYSCTL",
    "comment": "16384",
    "enabled": true,
    "var": "net.inet.tcp.recvbuf_inc"
  },
  {
    "id": 15,
    "value": "16777216",
    "type": "SYSCTL",
    "comment": "2097152",
    "enabled": true,
    "var": "net.inet.tcp.recvbuf_max"
  },
  {
    "id": 16,
    "value": "4194304",
    "type": "SYSCTL",
    "comment": "65536",
    "enabled": true,
    "var": "net.inet.tcp.recvspace"
  },
  {
    "id": 17,
    "value": "4194304",
    "type": "SYSCTL",
    "comment": "8192",
    "enabled": true,
    "var": "net.inet.tcp.sendbuf_inc"
  },
  {
    "id": 18,
    "value": "16777216",
    "type": "SYSCTL",
    "comment": "2097152",
    "enabled": true,
    "var": "net.inet.tcp.sendbuf_max"
  },
  {
    "id": 19,
    "value": "4194304",
    "type": "SYSCTL",
    "comment": "32768",
    "enabled": true,
    "var": "net.inet.tcp.sendspace"
  },
  {
    "id": 20,
    "value": "0",
    "type": "SYSCTL",
    "comment": "Preclude unnecessary ARP info logging",
    "enabled": true,
    "var": "net.link.ether.inet.log_arp_movements"
  },
  {
    "id": 21,
    "value": "2048",
    "type": "SYSCTL",
    "comment": "256",
    "enabled": true,
    "var": "net.route.netisr_maxqlen"
  }
]
 

Samuel Tai · Moderator · Joined: Apr 24, 2020 · Messages: 5,399
OK, so it looks like your tunables are missing cubic.
  • LOADER cc_cubic_load="YES"
  • SYSCTL net.inet.tcp.cc.algorithm="cubic"
Although, since you've got the ESX host and TN on the same 10G switch, you could also try:
  • LOADER cc_dctcp_load="YES"
  • SYSCTL net.inet.tcp.cc.algorithm="dctcp"
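If you add the dctcp (or cubic) pair, a quick post-reboot sanity check is worth doing. A minimal sketch, assuming a stock FreeBSD-based TrueNAS shell; the tunable.create payload is an assumption modeled on the tunable.query output above (the GUI works just as well):
Code:
# add the loader tunable via the middleware; fields assumed to mirror tunable.query above
midclt call tunable.create '{"var": "cc_dctcp_load", "value": "YES", "type": "LOADER", "enabled": true}'

# after the reboot, confirm the module loaded and the algorithm is active
kldstat | grep cc_
sysctl net.inet.tcp.cc.available net.inet.tcp.cc.algorithm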
 
Joined: Apr 26, 2015 · Messages: 320
Hi,

Just got back to this and something is still not right.
At this point, the only NIC being used on the ESX host is the 10GbE one, since it's just a test host anyhow.
The memory upgrade seems to have added no value, as the system said it was only using a few gigs during the transfer. I need to keep the wattage down as much as possible, so if I don't need all those chips once we're done, I'll revert back to 32GB. We'll see.

Here is the updated list of tunables (in case I made a typo somewhere), and some images of the results of moving a multi-GB file/VM.
As you can see, I'm hitting only around 1.3 Gbps.

Code:
# midclt call tunable.query | jq
[
  {
    "id": 3,
    "value": "2",
    "type": "LOADER",
    "comment": "",
    "enabled": true,
    "var": "hint.isp.0.role"
  },
  {
    "id": 4,
    "value": "2",
    "type": "LOADER",
    "comment": "",
    "enabled": true,
    "var": "hint.isp.1.role"
  },
  {
    "id": 5,
    "value": "2",
    "type": "LOADER",
    "comment": "",
    "enabled": true,
    "var": "hint.isp.2.role"
  },
  {
    "id": 6,
    "value": "2",
    "type": "LOADER",
    "comment": "",
    "enabled": true,
    "var": "hint.isp.3.role"
  },
  {
    "id": 7,
    "value": "0",
    "type": "LOADER",
    "comment": "?",
    "enabled": true,
    "var": "hw.pci.honor_msi_blacklist"
  },
  {
    "id": 8,
    "value": "2048",
    "type": "LOADER",
    "comment": "256",
    "enabled": true,
    "var": "hw.vmx.rxndesc"
  },
  {
    "id": 9,
    "value": "16",
    "type": "LOADER",
    "comment": "8",
    "enabled": true,
    "var": "hw.vmx.rxnqueue"
  },
  {
    "id": 10,
    "value": "1024",
    "type": "LOADER",
    "comment": "512",
    "enabled": true,
    "var": "hw.vmx.txndesc"
  },
  {
    "id": 11,
    "value": "16777216",
    "type": "SYSCTL",
    "comment": "2097152",
    "enabled": true,
    "var": "kern.ipc.maxsockbuf"
  },
  {
    "id": 12,
    "value": "1024",
    "type": "SYSCTL",
    "comment": "128",
    "enabled": true,
    "var": "kern.ipc.soacceptqueue"
  },
  {
    "id": 13,
    "value": "2048",
    "type": "SYSCTL",
    "comment": "256",
    "enabled": true,
    "var": "net.inet.ip.intr_queue_maxlen"
  },
  {
    "id": 14,
    "value": "4194304",
    "type": "SYSCTL",
    "comment": "16384",
    "enabled": true,
    "var": "net.inet.tcp.recvbuf_inc"
  },
  {
    "id": 15,
    "value": "16777216",
    "type": "SYSCTL",
    "comment": "2097152",
    "enabled": true,
    "var": "net.inet.tcp.recvbuf_max"
  },
  {
    "id": 16,
    "value": "4194304",
    "type": "SYSCTL",
    "comment": "65536",
    "enabled": true,
    "var": "net.inet.tcp.recvspace"
  },
  {
    "id": 17,
    "value": "4194304",
    "type": "SYSCTL",
    "comment": "8192",
    "enabled": true,
    "var": "net.inet.tcp.sendbuf_inc"
  },
  {
    "id": 18,
    "value": "16777216",
    "type": "SYSCTL",
    "comment": "2097152",
    "enabled": true,
    "var": "net.inet.tcp.sendbuf_max"
  },
  {
    "id": 19,
    "value": "4194304",
    "type": "SYSCTL",
    "comment": "32768",
    "enabled": true,
    "var": "net.inet.tcp.sendspace"
  },
  {
    "id": 20,
    "value": "0",
    "type": "SYSCTL",
    "comment": "Preclude unnecessary ARP info logging",
    "enabled": true,
    "var": "net.link.ether.inet.log_arp_movements"
  },
  {
    "id": 21,
    "value": "2048",
    "type": "SYSCTL",
    "comment": "256",
    "enabled": true,
    "var": "net.route.netisr_maxqlen"
  },
  {
    "id": 23,
    "value": "dctcp",
    "type": "SYSCTL",
    "comment": "",
    "enabled": true,
    "var": "net.inet.tcp.cc.algorithm"
  },
  {
    "id": 24,
    "value": "yes",
    "type": "LOADER",
    "comment": "",
    "enabled": true,
    "var": "cc_dctcp_load"
  }
]


2021-12-20_081003.jpg
2021-12-20_081019.jpg
2021-12-20_081046.jpg
 
Joined: Apr 26, 2015 · Messages: 320
Should I rebuild this, remove the tunables, something else? Maybe I need to post asking other R620 owners what performance they're getting.
 

Samuel Tai · Moderator · Joined: Apr 24, 2020 · Messages: 5,399
At this point, it's unclear whether you have a network issue, a VMware issue, or a pool throughput issue. If iperf still gives you line rate, then we can rule out a network issue. fio will indicate the throughput of your pool.
 
Joined: Apr 26, 2015 · Messages: 320
Well, the two devices I've been testing are fairly isolated, meaning I'm not testing against servers that sit across other switches.
The ESX host and the TN server are both connected directly into the same 10GbE switch.
That switch has a connection to the upstream network, but when I run this testing, the traffic should not extend beyond the 10GbE switch that both are connected to.
I'll also point out that the 10GbE switch ports connecting the two devices are set to MTU 9000, along with the ESX host NIC.
I think I can rule out the network as the problem.

The following test was run from a VM on the same ESX host, a VM that cannot get better than 1.4 Gbps transferring a file to TN over NFS.

Code:
root@truenas[~]# iperf -s -w 1024k
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 1000 KByte
------------------------------------------------------------
[  4] local 192.168.1.150 port 5001 connected with 192.168.1.84 port 58908
[ ID] Interval       Transfer     Bandwidth
[  4] 0.0-10.0 sec   9.79 GBytes  8.41 Gbits/sec

And for good measure, I ran an iperf server on a machine that is not on the same switch and only has a 1 Gbps NIC.
This is the result from the TN server.

Code:
[root@dev50 /]# iperf3 -s
**snip**
[  5]   0.00-10.36 sec  1.09 GBytes   908 Mbits/sec   receiver

This is why I mentioned, when starting this thread, that it's confusing to see the NFS transfer being so slow.

At this point we have spent a lot of time and effort tuning and re-installing TN, and I've ordered more parts and am about to install a SLOG device when it arrives (maybe today), only to end up at the same point. Some of these suggestions may be small increments that add up, but we were already seeing over 8 Gbps transfers from machine to machine at least.

Since iperf can reach such high speeds without using NFS, to me it means there is something about the TN NFS service that is not right or needs tuning. I'm not saying that in a blaming way; I just mean it's obvious there is no network issue, since we can see such speeds using other methods but not using NFS.

There are no mount options exposed when mounting an NFS share on ESX, unless they are in some other section of the host like Manage, etc.
I wanted to try iperf from the ESX command line, but all the articles I found on how to install it failed for me. That could be a good test.
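(For reference, a commonly cited way to run iperf on ESXi without installing anything is to use the copy of iperf3 that ships with the vSAN tooling; the exact path and the need for the copy step vary by ESXi build, so treat this as a sketch.)
Code:
# on the ESXi host shell (path as commonly reported for ESXi 6.7/7.0; adjust if it differs)
cp /usr/lib/vmware/vsan/bin/iperf3 /usr/lib/vmware/vsan/bin/iperf3.copy
# the host firewall may block the test; open it temporarily for the run
esxcli network firewall set --enabled false
/usr/lib/vmware/vsan/bin/iperf3.copy -c 192.168.1.150
esxcli network firewall set --enabled true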
 
Joined: Apr 26, 2015 · Messages: 320
Finally, I was able to get iperf installed on the ESX host, and the results are interesting.
I first tested using our MTU 9000 setup, then I tested after changing both ends back to MTU 1500.
The results are almost identical: with MTU 9000 I got 9.36 Gbps, and with MTU 1500 I got 9.38 Gbps with iperf.

Host to host without NFS, I get over 9 Gbps. So... something to do with the NFS service?

2021-12-21_180627.jpg
 

Samuel Tai · Moderator · Joined: Apr 24, 2020 · Messages: 5,399
Could be, or it could be the underlying storage layer. Can you report the results of fio on the TN box? That will give you an idea of the raw performance possible from your pool. You're likely correct that this is an NFS issue, but let's make sure there aren't any other bottlenecks.

For an explanation of fio, see https://arstechnica.com/gadgets/202...-disks-find-out-the-open-source-way-with-fio/. You'll want to set fio options to those closest to your anticipated workload. Since you're moving VMs one at a time, you should probably try it with a 2 GiB sequential write.
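A minimal sketch of that test, run from within a dataset on the pool (the file name and job name below are arbitrary placeholders):
Code:
# 2 GiB sequential write, 1 MiB blocks, single job -- closest to a one-at-a-time VM copy
fio --name=seqwrite --filename=./fio-seq-test --rw=write --bs=1M --size=2G \
    --ioengine=psync --direct=1 --numjobs=1 --group_reporting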

Also, please check if you've disabled sync on your pool, since you don't have an SLOG yet.
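Checking that is a one-liner; a sketch with the pool name as a placeholder:
Code:
# report the sync setting for the pool and everything under it ("tank" is a placeholder)
zfs get -r sync tank
# if it isn't already disabled on the dataset backing the NFS export:
zfs set sync=disabled tank/vmstore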
 
Joined: Apr 26, 2015 · Messages: 320
Sure.
The Optane device should be in today, but it was delayed, so it's hard to know from the tracking. Sync is disabled.
I ran two test examples I found on the net. Hope these options are OK; if not, let me know and I'll run it again.

Code:
# fio --filename=./test --direct=1 --rw=randrw --refill_buffers --norandommap --randrepeat=0 --bs=4k --rwmixread=100 --iodepth=16 --numjobs=16 --runtime=60 --group_reporting --name=4ktest --size=4G
4ktest: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=16
...
fio-3.27
Starting 16 processes
4ktest: Laying out IO file (1 file / 4096MiB)
Jobs: 13 (f=13): [E(1),r(7),f(1),r(2),E(2),r(2),f(1)][100.0%][r=768MiB/s][r=197k IOPS][eta 00m:00s]
4ktest: (groupid=0, jobs=16): err= 0: pid=35562: Wed Dec 22 07:43:15 2021
  read: IOPS=211k, BW=823MiB/s (863MB/s)(48.2GiB/60002msec)
    clat (usec): min=4, max=1552, avg=74.14, stdev=34.53
     lat (usec): min=4, max=1553, avg=74.31, stdev=34.54
    clat percentiles (usec):
     |  1.00th=[    9],  5.00th=[   11], 10.00th=[   13], 20.00th=[   63],
     | 30.00th=[   70], 40.00th=[   74], 50.00th=[   78], 60.00th=[   82],
     | 70.00th=[   88], 80.00th=[   96], 90.00th=[  109], 95.00th=[  123],
     | 99.00th=[  149], 99.50th=[  163], 99.90th=[  281], 99.95th=[  375],
     | 99.99th=[  578]
   bw (  KiB/s): min=604020, max=1265007, per=100.00%, avg=844666.28, stdev=9900.74, samples=1884
   iops        : min=150998, max=316246, avg=211160.84, stdev=2475.20, samples=1884
  lat (usec)   : 10=3.93%, 20=11.39%, 50=0.81%, 100=68.37%, 250=15.35%
  lat (usec)   : 500=0.12%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%
  cpu          : usr=3.08%, sys=96.89%, ctx=16607, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=12646671,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=823MiB/s (863MB/s), 823MiB/s-823MiB/s (863MB/s-863MB/s), io=48.2GiB (51.8GB), run=60002-60002msec

# fio --filename=./test --direct=1 --rw=randrw --refill_buffers --norandommap --randrepeat=0 --bs=4k --rwmixread=100 --iodepth=16 --numjobs=16 --runtime=60 --group_reporting --name=4ktest --size=2G
4ktest: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=16
...
fio-3.27
Starting 16 processes
Jobs: 2 (f=2): [_(4),r(1),_(8),E(1),_(1),r(1)][96.0%][r=345MiB/s][r=88.4k IOPS][eta 00m:02s]
4ktest: (groupid=0, jobs=16): err= 0: pid=35598: Wed Dec 22 07:44:46 2021
  read: IOPS=175k, BW=685MiB/s (719MB/s)(32.0GiB/47803msec)
    clat (usec): min=3, max=2587, avg=77.33, stdev=37.70
     lat (usec): min=3, max=2587, avg=77.50, stdev=37.71
    clat percentiles (usec):
     |  1.00th=[    9],  5.00th=[   11], 10.00th=[   14], 20.00th=[   59],
     | 30.00th=[   67], 40.00th=[   72], 50.00th=[   77], 60.00th=[   84],
     | 70.00th=[   93], 80.00th=[  106], 90.00th=[  123], 95.00th=[  139],
     | 99.00th=[  165], 99.50th=[  174], 99.90th=[  243], 99.95th=[  314],
     | 99.99th=[  537]
   bw (  KiB/s): min=640638, max=1115190, per=100.00%, avg=825662.89, stdev=8488.41, samples=1295
   iops        : min=160154, max=278793, avg=206410.06, stdev=2122.10, samples=1295
  lat (usec)   : 4=0.01%, 10=3.39%, 20=11.13%, 50=1.52%, 100=59.94%
  lat (usec)   : 250=23.92%, 500=0.08%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%
  cpu          : usr=3.08%, sys=96.86%, ctx=12342, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=8388608,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=685MiB/s (719MB/s), 685MiB/s-685MiB/s (719MB/s-719MB/s), io=32.0GiB (34.4GB), run=47803-47803msec
 

Samuel Tai · Moderator · Joined: Apr 24, 2020 · Messages: 5,399
OK, the problem appears to be your pool, despite the stripe of mirrors. The max transfer bandwidth it can sustain appears to be 700-800 MB/s, which limits the bandwidth available to NFS. I think we're maxing out what your drives are physically capable of.

The only way to go faster is to use SSDs for your pool.
 
Joined: Apr 26, 2015 · Messages: 320
I can't afford to change out the drives at this point. This is just one of many machines being built for an upgrade.
Maybe the SLOG will help some?

I think I'm not quite understanding something in terms of drive speeds. Even an SSD that does 500 MB/s only does just that, 500 MB/s, so how could it reach anywhere near the 10 Gbps network speed? What is the most I could expect, and with what combination?

These are the specs on the 1TB drives being used for storage.

I do have a bunch of 600GB 10K SAS drives rated at 135 MB/s sustained transfer, but the drives above are only a little less at around 115 MB/s, so even going from a 7200 RPM drive to a 10K one would not make a lot of difference.

I might have some drives that do 500 MB/s reads, but before I go through yet more steps, how sure are we about this? I mean, 1.3 Gbps seems really slow, doesn't it? Should I try reconfiguring the pool in some other way, just to be sure and perhaps to get more space out of it?

The goal isn't so much to get the fastest possible speed through tiny increments as to get as much use as possible out of the 10 Gbps network.
 
Last edited:

Samuel Tai · Moderator · Joined: Apr 24, 2020 · Messages: 5,399
You have to distinguish between the I/O transfer rate (600 MB/s, which is to the drive buffer) and the sustained rate (115 MB/s, which is to the platters). With a 4-way stripe, you can get 4 × 115 MB/s, or up to 3.7 Gbps, which is around what I expected. Since we're seeing only 800 Mbps, I suspect we're being limited by the SED encryption engine on these specific SAS drives. Do you have any drives that don't have SED encryption?
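(Spelling out that arithmetic, under the assumption that each mirror vdev streams at roughly the sustained rate of one member disk: 4 vdevs × 115 MB/s ≈ 460 MB/s, and 460 MB/s × 8 bits per byte ≈ 3.7 Gbps. The 600 MB/s figure is the SAS interface rate to the drive's buffer, which by the same conversion would be 4.8 Gbps per drive, but the platters can't sustain that.)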
 
Joined: Apr 26, 2015 · Messages: 320
I looked, and I do have some other 1TB drives, but all of them seem to have this SED encryption function.
Is there a way to disable that, maybe? Perhaps throwing them into a Linux box and running some commands against them? Looking on the net, there might be some way of doing that. I need to read more when I get a chance.

Code:
# sedutil-cli --scan
Scanning for Opal compliant disks
/dev/da0 No SEAGATE ST91000640SS AS08
/dev/da1 No SEAGATE ST91000640SS AS08
/dev/da2 No SEAGATE ST91000640SS AS08
/dev/da3 No SEAGATE ST91000640SS AS08
/dev/da4 No IBM-ESXSMBE2147RC SC18
/dev/da5 No SEAGATE ST91000640SS AS08
/dev/da6 No IBM-ESXSMBE2147RC SC17
/dev/da7 No SEAGATE ST91000640SS AS08
/dev/da8 No SEAGATE ST91000640SS AS08
/dev/da9 No SEAGATE ST91000640SS AS08
No more disks present ending scan
 

Samuel Tai · Moderator · Joined: Apr 24, 2020 · Messages: 5,399
I don't think it's possible to disable the SED encryption engine on those drives. The most I'm aware you can do with SEDs is to disable the need for a key, but the drive still encrypts/decrypts with a null key.

Do you have any cheap SATA drives instead of these SAS drives? The SATA drives tend not to have SED functionality to meet the cheaper price point.
 
Joined: Apr 26, 2015 · Messages: 320
Unfortunately, I don't. I bought a bunch of drives to build a dozen or so servers, but all of them are in use.

I have some MM1000EBKAF SATA drives, which I think are rated at 300 MB/s or 375 MB/s transfer, but the SATA drives are all 7200 RPM if that matters.
The problem is that the 4-way stripe we created will not be enough storage, and I can't afford to get 2TB drives at this point.

I'll do a little more digging and see what I can come up with. I already missed my deadline so it's looking like I'm into January at this point so maybe there is an option to find some SSD drives. I'm already way over budget.

Cool thread, learning all kinds of things so appreciate that.
 
Joined: Apr 26, 2015 · Messages: 320
What do you think about the following options?
SSDSC2BB800G4P 800GB R500/W450, MTFDDAA800MBB 800GB R425/W375.

This one, the MTFDDAK960MAV 960GB, I'm having a hard time confirming R/W speeds for. One page says R279/W227 and another says R500/W400.

With much faster SSD transfer rates, I could get a larger amount of storage space with fewer mirrors, right?

What kind of transfer speeds do you think a setup with eight SSDs could see over the 10 Gbps switch?
 

Samuel Tai · Moderator · Joined: Apr 24, 2020 · Messages: 5,399
Those should all be good options, except possibly the Microns, which claim to support SED functionality. If these are new drives, they'll be fantastic. If they're refurbished, you'll need to keep an eye on the remaining write lifetime.

Using the Intels as a baseline, they claim 450 MBps, which works out to 3.6 Gbps each. A 4-way stripe of 2-way mirrors would saturate your 10G link at a total capacity of 3.2 TB. A 2-way stripe of 4-way RAIDZ1s would top out around 7 Gbps, and you'd have ~4.5 TB capacity, close to your requirement of 5 TB.
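(The arithmetic behind those numbers, assuming 800 GB per drive, eight drives total, and the Intel's claimed 450 MB/s: 450 MB/s × 8 bits per byte = 3.6 Gbps per drive. Striped 2-way mirrors keep one usable copy per mirror, so 4 × 800 GB = 3.2 TB. Two 4-disk RAIDZ1 vdevs each give up one disk to parity, so 2 × 3 × 800 GB = 4.8 TB raw, landing around the ~4.5 TB usable figure once overhead is taken out.)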
 
Joined: Apr 26, 2015 · Messages: 320
For the 960GB drives, I would not have any way to know if SED had been enabled.
Those have an MTTF of some 1.2 million hours, so I think they should be OK.

The stored data will not be critical, because anything critical gets backed up multiple times, so maybe I can trade some redundancy in stripes/mirrors for more storage. I'm also willing to give up some transfer speed, since nothing will be running off this storage and the only 'live' data will be shared HTML/PHP pages for some of the servers. The servers are load balanced, so they serve up the same web pages.
 
Joined: Apr 26, 2015 · Messages: 320
BTW, for my intended use, do you think they need to be enterprise drives or do you think good quality laptop/desktop drives would do the job?
 

Samuel Tai · Moderator · Joined: Apr 24, 2020 · Messages: 5,399
I would go with enterprise drives, since they're spec'ed for much longer write endurance, and are engineered for a much more stressful duty cycle than desktop/laptop SSDs.
 