10GbE - 8Gbps with iperf, 1.3Mbps with NFS

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Common error. You'll need to target the storage device nvd0 with the diskinfo command.

The last device is the actual enclosure itself (ses0), which can be addressed with the sesutil command to do things like blink drive-bay LEDs.
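For example (da2 here is just a stand-in for whichever drive you want to light up):

Code:
# list the enclosure slots and which disks sit in them
sesutil map
# turn the locate/identify LED for a given disk on, then off again
sesutil locate da2 on
sesutil locate da2 off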
 
Joined
Apr 26, 2015
Messages
320
Thanks. I should have noticed that in the info I was able to pull up and in the disks list on the system.

Code:
# diskinfo -v nvd0
nvd0
        512                     # sectorsize
        29260513280             # mediasize in bytes (27G)
        57149440                # mediasize in sectors
        0                       # stripesize
        0                       # stripeoffset
        INTEL MEMPEK1J032GA     # Disk descr.
        PHBT8030015K032P        # Disk ident.
        Yes                     # TRIM/UNMAP support
        0                       # Rotation rate in RPM

I also read that the SLOG is assigned to a pool (or pools), and since I'm still waiting for the drives, maybe I can't set anything up yet.
That's IF I understood what was meant in the post.

The enclosure access is interesting.
That means it's similar to some of the embedded devices I work with, like mini routers and Pi boards, where I can control the LEDs (as you mentioned) and other hardware built into the unit.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Log devices (and cache devices) can be attached to and detached from pools without harming their contents, so you can attach it to your current pool and see if it has a positive effect. Once you get your new SSDs, you can detach it from the current pool and attach it to the SSD pool.
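On TrueNAS you'd normally do this through the web UI so the middleware stays aware of it, but under the hood it's roughly the following (yourpool and nvd0 are placeholders for your pool name and however the Optane shows up):

Code:
# add the Optane as a log vdev to the existing pool
zpool add yourpool log nvd0
# later: remove it again before moving it to the SSD pool
zpool remove yourpool nvd0
zpool status yourpool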

Re: sesutil - another user, @Ender117, created a neat little script to check pools and blink the LED of the failed drive. I'm not sure if it's been updated since release, but unless the commands have changed it should still work.
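The gist of it is something like this (a rough sketch, not his actual script - it assumes whole-disk daX names appear in zpool status; a pool built on gptid labels would need a glabel lookup first):

Code:
#!/bin/sh
# blink the locate LED of any disk zpool doesn't report as ONLINE
for disk in $(zpool status | awk '$1 ~ /^da[0-9]+$/ && $2 != "ONLINE" {print $1}'); do
        echo "${disk} is not ONLINE - turning on its locate LED"
        sesutil locate "${disk}" on
done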

 
Joined
Apr 26, 2015
Messages
320
Well, Happy New Year everyone. I've got some downtime this morning, so I thought I'd play with this since I got the SSDs yesterday.
I created the following pool and later added the SLOG, but I'm not sure I've done it right since there isn't much of a speed improvement.

Code:
# zpool status -v pool01
  pool: pool01
 state: ONLINE
config:

        NAME                                            STATE     READ WRITE CKSUM
        pool01                                          ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/5177bf3c-6a94-11ec-b980-90b11c1dd891  ONLINE       0     0     0
            gptid/523d87a8-6a94-11ec-b980-90b11c1dd891  ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            gptid/5096ab16-6a94-11ec-b980-90b11c1dd891  ONLINE       0     0     0
            gptid/52148cbb-6a94-11ec-b980-90b11c1dd891  ONLINE       0     0     0
          mirror-2                                      ONLINE       0     0     0
            gptid/50c02623-6a94-11ec-b980-90b11c1dd891  ONLINE       0     0     0
            gptid/52633bce-6a94-11ec-b980-90b11c1dd891  ONLINE       0     0     0
          mirror-3                                      ONLINE       0     0     0
            gptid/514863e8-6a94-11ec-b980-90b11c1dd891  ONLINE       0     0     0
            gptid/51fb7702-6a94-11ec-b980-90b11c1dd891  ONLINE       0     0     0
          mirror-4                                      ONLINE       0     0     0
            gptid/51b7e69d-6a94-11ec-b980-90b11c1dd891  ONLINE       0     0     0
            gptid/526a2d98-6a94-11ec-b980-90b11c1dd891  ONLINE       0     0     0
        logs
          gptid/0b3db7a3-6b29-11ec-8cb3-90b11c1dd891    ONLINE       0     0     0

errors: No known data errors


I'm not even hitting 1Gbps, either when copying a file from the ESXi host to TrueNAS or when copying to the TrueNAS NFS share mounted in a VM.

[Screenshot attachment: 2022-01-01_104806.jpg]



And I see this in the logs. I guess I should have ordered a couple of extras; now I have to wait again for a replacement to come in.
Is it possible the performance is crummy because of just this one drive?

Device: /dev/da2, SMART Failure: WARNING: ascq=0x5

[Screenshot attachment: 2022-01-01_115417.jpg]


Code:
# smartctl -a /dev/da2
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p10 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               STEC
Product:              S842E800M2
Revision:             E4T1
Compliance:           SPC-4
User Capacity:        800,166,076,416 bytes [800 GB]
Logical block size:   512 bytes
LU is resource provisioned, LBPRZ=1
Rotation Rate:        Solid State Device
Form Factor:          2.5 inches
Logical Unit id:      0x5000a7203007e1ad
Serial number:        STM000176F52
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Sat Jan  1 10:56:47 2022 PST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: WARNING: ascq=0x5 [asc=b, ascq=5]

Current Drive Temperature:     34 C
Drive Trip Temperature:        75 C

Accumulated power on time, hours:minutes 17792:50
Elements in grown defect list: 2

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   88546974545 50384615  88546974545  88597359160   88597359160    1134504.022         389
write:         0        1         1         1          1     623811.367           0
verify: 1335998615      228  1335998615  1335998843   1335998843      12658.790          15

Non-medium error count:        0

No Self-tests have been logged


I decided to try removing that drive from the pool and making a simple pool. I only got up to 900+Mbps.

Code:
# zpool status -v pool01
  pool: pool01
 state: ONLINE
config:

        NAME                                            STATE     READ WRITE CKSUM
        pool01                                          ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/b493dcba-6b52-11ec-8cb3-90b11c1dd891  ONLINE       0     0     0
            gptid/b4a62c73-6b52-11ec-8cb3-90b11c1dd891  ONLINE       0     0     0
        logs
          gptid/b4804181-6b52-11ec-8cb3-90b11c1dd891    ONLINE       0     0     0

errors: No known data errors
 
Joined
Apr 26, 2015
Messages
320
I can't seem to find a solution/answer to this. I'm trying to create a smaller pool of three mirrors and add a SLOG, but I keep getting the stripe warning with no options to change anything.

I went ahead anyhow just to test. This is all being done with the MTU back at defaults, since the DC doesn't allow jumbo frames between different locations, so there's no point in testing with jumbo.
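To re-confirm the raw line rate at the default MTU before blaming the storage, I can run iperf3 again, something like this (10.0.0.10 standing in for the TrueNAS address):

Code:
# on TrueNAS
iperf3 -s
# from a VM on the ESXi host (or any other 10GbE client)
iperf3 -c 10.0.0.10 -t 30 -P 4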

I mounted the NFS share I created in a VM on the ESXi host, which is connected with a 10Gbps NIC.
I was able to hit just above 1Gbps, but it always drops to just under 300Mbps as the file is copied.

[Screenshot attachment: 2022-01-01_165151.jpg]


[Screenshot attachment: 2022-01-01_170652.jpg]


I'm quite discouraged and nervous - I've missed my deadline and have spent a lot of additional money without getting much further.
I must be missing something again.

Code:
root@truenas[~]# fio --name=pool01 --size=5g --rw=write --ioengine=posixaio --direct=1 --bs=1m
pool01: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=1
fio-3.27
Starting 1 process
pool01: Laying out IO file (1 file / 5120MiB)
Jobs: 1 (f=1): [W(1)][100.0%][eta 00m:00s]
pool01: (groupid=0, jobs=1): err= 0: pid=20820: Sat Jan  1 16:58:24 2022
  write: IOPS=249, BW=249MiB/s (261MB/s)(5120MiB/20534msec); 0 zone resets
    slat (usec): min=25, max=1507, avg=79.84, stdev=32.80
    clat (usec): min=351, max=3700.8k, avg=3927.01, stdev=105220.39
     lat (usec): min=379, max=3700.9k, avg=4006.86, stdev=105219.98
    clat percentiles (usec):
     |  1.00th=[    371],  5.00th=[    433], 10.00th=[    510],
     | 20.00th=[    523], 30.00th=[    537], 40.00th=[    545],
     | 50.00th=[    553], 60.00th=[    586], 70.00th=[    627],
     | 80.00th=[    652], 90.00th=[    676], 95.00th=[    709],
     | 99.00th=[   2737], 99.50th=[   2769], 99.90th=[   2933],
     | 99.95th=[3338666], 99.99th=[3707765]
   bw (  KiB/s): min=324982, max=1579596, per=100.00%, avg=918998.55, stdev=428309.41, samples=11
   iops        : min=  317, max= 1542, avg=897.00, stdev=418.16, samples=11
  lat (usec)   : 500=8.01%, 750=87.77%, 1000=0.06%
  lat (msec)   : 4=4.06%, >=2000=0.10%
  cpu          : usr=2.00%, sys=0.41%, ctx=5136, majf=1, minf=1
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,5120,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=249MiB/s (261MB/s), 249MiB/s-249MiB/s (261MB/s-261MB/s), io=5120MiB (5369MB), run=20534-20534msec
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
A SLOG device is pool-critical, which is why you are getting the warning. If you tick force you can still create the pool, which is fine for testing.
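If you were doing the same thing from the CLI it would look roughly like this (device names made up - adjust to your disks; the -f plays the same role as the force tickbox):

Code:
zpool create -f testpool \
        mirror da3 da4 \
        mirror da5 da6 \
        mirror da7 da8 \
        log nvd0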
 
Joined
Apr 26, 2015
Messages
320
Hi, that's what I figured - nothing to lose, just testing - so I went ahead, but I must still have something messed up since the speeds aren't great yet.
 
Joined
Apr 26, 2015
Messages
320
I noticed the 10GbE driver was a little older, so I updated it, but I'm still barely getting 1Gbps.
 
Joined
Apr 26, 2015
Messages
320
I would really appreciate some help from anyone following this, as I'm super late getting this installed and there's no point running the storage on 10G NICs when I can only get around 1Gbps.

I must be missing something, since I thought I followed all of the recommendations in this thread.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
For my hardware configuration (Pool design) - please see my signature

Running the same test as you on my pools:
fio --name=pool01 --size=5g --rw=write --ioengine=posixaio --direct=1 --bs=1m
pool01: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=1

BigPool, HDD, backed by Optane SLOG:
WRITE: bw=222MiB/s (233MB/s), 222MiB/s-222MiB/s (233MB/s-233MB/s), io=5120MiB (5369MB), run=23081-23081msec

SSDPool, SSD, backed by Optane SLOG
WRITE: bw=231MiB/s (243MB/s), 231MiB/s-231MiB/s (243MB/s-243MB/s), io=5120MiB (5369MB), run=22117-22117msec
So I get the same results as you from that test

NVMEPool for giggles. No SLOG, Sync=disabled
WRITE: bw=2129MiB/s (2232MB/s), 2129MiB/s-2129MiB/s (2232MB/s-2232MB/s), io=5120MiB (5369MB), run=2405-2405msec

New Test:
fio --bs=128k --direct=1 --directory=/mnt/lol/fio --gtod_reduce=1 --ioengine=posixaio --iodepth=32 --group_reporting --name=randrw --numjobs=12 --ramp_time=10 --runtime=60 --rw=randrw --size=256M --time_based

BigPool:
READ: bw=254MiB/s (266MB/s), 254MiB/s-254MiB/s (266MB/s-266MB/s), io=15.0GiB (16.1GB), run=60435-60435msec
WRITE: bw=255MiB/s (267MB/s), 255MiB/s-255MiB/s (267MB/s-267MB/s), io=15.0GiB (16.1GB), run=60435-60435msec

SSDPool:
READ: bw=1605MiB/s (1683MB/s), 1605MiB/s-1605MiB/s (1683MB/s-1683MB/s), io=94.1GiB (101GB), run=60033-60033msec
WRITE: bw=1605MiB/s (1682MB/s), 1605MiB/s-1605MiB/s (1682MB/s-1682MB/s), io=94.1GiB (101GB), run=60033-60033msec

NVMEPool, No SLOG, Sync=disabled (if it makes a difference)
READ: bw=308MiB/s (323MB/s), 308MiB/s-308MiB/s (323MB/s-323MB/s), io=18.3GiB (19.6GB), run=60699-60699msec
WRITE: bw=308MiB/s (323MB/s), 308MiB/s-308MiB/s (323MB/s-323MB/s), io=18.3GiB (19.6GB), run=60699-60699msec

So to summarise

Old Test (your testing command)
BigPool = 1.8Gb/s
SSDPool = 1.8Gb/s
NVMEPool = 17.0Gb/s !!!!

New Test
BigPool = 1.8 Gb/s
SSDPool = 12.9 Gb/s
NVMEPool = 2.5 Gb/s

My takeaway from this is that benchmarks are crap - what does anyone else think?
Maybe we could agree on a fio command that a group of us could post results for!
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Hmmm, that NVMe test is weird - the results feel wrong.
Repeating the NVMe tests:
Old: 7.4Gb/s - much different. Still quick though.
New: 2.6Gb/s
 
Joined
Apr 26, 2015
Messages
320
Is there anything else I can share about the test setup? Maybe I don't have the SLOG set up right, but even without it I now have all SSD drives with no encryption, so why am I still not seeing over 1Gbps? That's confusing.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Try running the fio command I used in the second set of tests. Does it give different results?
 
Joined
Apr 26, 2015
Messages
320
Sure, though without the --directory option since I'm not sure what that should be right now.


Code:
# fio --bs=128k --direct=1 --gtod_reduce=1 --ioengine=posixaio --iodepth=32 --group_reporting --name=randrw --numjobs=12 --ramp_time=10 --runtime=60 --rw=randrw --size=256M --time_based
randrw: (g=0): rw=randrw, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=posixaio, iodepth=32
...
fio-3.27
Starting 12 processes
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
Jobs: 12 (f=12): [m(12)][100.0%][eta 00m:00s]
randrw: (groupid=0, jobs=12): err= 0: pid=65554: Tue Jan  4 08:37:55 2022
  read: IOPS=603, BW=75.6MiB/s (79.3MB/s)(4994MiB/66027msec)
   bw (  MiB/s): min=  139, max= 2341, per=100.00%, avg=1076.09, stdev=61.61, samples=108
   iops        : min= 1108, max=18729, avg=8603.11, stdev=492.83, samples=108
  write: IOPS=610, BW=76.8MiB/s (80.5MB/s)(5069MiB/66027msec); 0 zone resets
   bw (  MiB/s): min=  167, max= 2359, per=100.00%, avg=1087.94, stdev=61.71, samples=108
   iops        : min= 1335, max=18870, avg=8697.67, stdev=493.75, samples=108
  cpu          : usr=0.09%, sys=0.26%, ctx=9279, majf=0, minf=1
  IO depths    : 1=2.1%, 2=4.7%, 4=10.0%, 8=21.1%, 16=55.4%, 32=6.6%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=96.4%, 8=0.6%, 16=0.4%, 32=2.6%, 64=0.0%, >=64=0.0%
     issued rwts: total=39818,40316,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=75.6MiB/s (79.3MB/s), 75.6MiB/s-75.6MiB/s (79.3MB/s-79.3MB/s), io=4994MiB (5237MB), run=66027-66027msec
  WRITE: bw=76.8MiB/s (80.5MB/s), 76.8MiB/s-76.8MiB/s (80.5MB/s-80.5MB/s), io=5069MiB (5315MB), run=66027-66027msec
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
The --directory option lets you indicate where the fio files will go; without it I don't know where you were testing. Just create a folder called IO on the top-level dataset/pool you are testing (it makes it easier to delete afterwards) so we know what the fio command is exercising. Adjust to match your setup.
For example, my SSD pool is called SSDPool, so I used the shell, created /mnt/SSDPool/IO, and pointed the fio test at that.

For all I know, you might have been testing your boot disk with that command.
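In your case that would be something like this, taking pool01 from your earlier zpool status (adjust the path if you want to test a child dataset instead):

Code:
mkdir -p /mnt/pool01/io
fio --bs=128k --direct=1 --directory=/mnt/pool01/io --gtod_reduce=1 \
        --ioengine=posixaio --iodepth=32 --group_reporting --name=randrw \
        --numjobs=12 --ramp_time=10 --runtime=60 --rw=randrw --size=256M --time_based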
 
Joined
Apr 26, 2015
Messages
320
Ok, done.

Code:
# fio --bs=128k --direct=1 --directory=/mnt/pool01/io --gtod_reduce=1 --ioengine=posixaio --iodepth=32 --group_reporting --name=randrw --numjobs=12 --ramp_time=10 --runtime=60 --rw=randrw --size=256M --time_based
randrw: (g=0): rw=randrw, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=posixaio, iodepth=32
...
fio-3.27
Starting 12 processes
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
randrw: Laying out IO file (1 file / 256MiB)
Jobs: 12 (f=12): [m(12)][100.0%][r=666MiB/s,w=660MiB/s][r=5325,w=5282 IOPS][eta 00m:00s]
randrw: (groupid=0, jobs=12): err= 0: pid=68841: Tue Jan  4 13:07:22 2022
  read: IOPS=5614, BW=702MiB/s (736MB/s)(41.2GiB/60054msec)
   bw (  KiB/s): min=277199, max=1780099, per=99.99%, avg=718838.66, stdev=26213.18, samples=1428
   iops        : min= 2158, max=13904, avg=5611.39, stdev=204.81, samples=1428
  write: IOPS=5611, BW=702MiB/s (736MB/s)(41.2GiB/60054msec); 0 zone resets
   bw (  KiB/s): min=286652, max=1782065, per=99.93%, avg=718326.28, stdev=25961.68, samples=1428
   iops        : min= 2232, max=13918, avg=5607.60, stdev=202.83, samples=1428
  cpu          : usr=1.20%, sys=2.19%, ctx=561857, majf=0, minf=1
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.2%, 16=71.3%, 32=28.5%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=93.9%, 8=4.4%, 16=1.6%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=337176,336988,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=702MiB/s (736MB/s), 702MiB/s-702MiB/s (736MB/s-736MB/s), io=41.2GiB (44.2GB), run=60054-60054msec
  WRITE: bw=702MiB/s (736MB/s), 702MiB/s-702MiB/s (736MB/s-736MB/s), io=41.2GiB (44.2GB), run=60054-60054msec
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Which is a very different figure from your first result.
702MiB/s ≈ 5.6 Gbps (actually a bit more - the converter I'm using, because I'm lazy, only takes MB/s rather than MiB/s, so the real figure is about 4.9% higher, roughly 5.9 Gbps).
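Working it out by hand:

Code:
# 702 MiB/s -> Gbit/s: 702 * 2^20 bytes * 8 bits / 10^9
echo "702 * 1048576 * 8 / 1000000000" | bc -l
# prints ~5.89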

So you are at 50%+ of your NIC speed now with a higher iodepth
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
So - question - whose benchmark is correct? (answer - probably neither, or rather "it depends")
 
Joined
Apr 26, 2015
Messages
320
Well, that's an interesting observation and calculation, but why am I still not seeing over 1Gbps in actual transfers?
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Ahh - I don't know.
Is db01 physical or virtual? Is there a direct path to the storage, or does it go through other switches? [You may have answered these before - if so, sorry.]
The pv command - what hardware was it running on?

I might try something similar on mine; I just don't have anything set up atm.
 