Help needed - Low performance on NVMe pool

Rand

Guru
Joined
Dec 30, 2013
Messages
906
And some results to see the impact:

fio --direct=1 --refill_buffers --norandommap --randrepeat=0 --group_reporting --ioengine=posixaio --name="a" --size=100G --bs=64k --iodepth=1 --numjobs=1 --rw=write --filename=/mnt/p3xz2/out.fio
=>
Run status group 0 (all jobs):
WRITE: bw=1614MiB/s (1692MB/s), 1614MiB/s-1614MiB/s (1692MB/s-1692MB/s), io=100GiB (107GB), run=63449-63449msec


fio --direct=1 --refill_buffers --norandommap --randrepeat=0 --group_reporting --ioengine=posixaio --name="a" --size=100G --bs=64k --iodepth=16 --numjobs=16 --rw=write --filename=/mnt/p3xz2/out.fio
=>
Run status group 0 (all jobs):
WRITE: bw=8748MiB/s (9173MB/s), 8748MiB/s-8748MiB/s (9173MB/s-9173MB/s), io=1600GiB (1718GB), run=187284-187284msec

(That's a 3xZ2 pool, each vdev made of 6 SAS3 SSDs, recordsize 128K, async)
 

potzkin

Dabbler
Joined
Dec 15, 2019
Messages
16
Code:
fio  --direct=1 --refill_buffers --norandommap --randrepeat=0 --group_reporting --ioengine=posixaio --name="a"  --runtime=300 --size=100G --time_based  --bs=64k --iodepth=1 --numjobs=1 --rw=write --filename=/mnt/<pool>/<dataset>/out.fio


This runs a 64K block size, streaming-write test with 1 thread and queue depth 1, for a duration of 300 seconds.

Code:
fio  --direct=1 --refill_buffers --norandommap --randrepeat=0 --group_reporting --ioengine=posixaio --name="a"  --size=100G --bs=64k --iodepth=1 --numjobs=1 --rw=write --filename=/mnt/<pool>/<dataset>/out.fio


This is the same test, except it ends once the 100G test file has been written instead of running for a fixed time.

Further options for --rw:
Code:
#--rw=
#    read          Sequential reads.
#    write         Sequential writes.
#    trim          Sequential trims (Linux block devices and SCSI character devices only).
#    randread      Random reads.
#    randwrite     Random writes.
#    randtrim      Random trims (Linux block devices and SCSI character devices only).
#    rw,readwrite  Sequential mixed reads and writes.
#    randrw        Random mixed reads and writes.
#    trimwrite     Sequential trim+write sequences. Blocks will be trimmed first, then the same blocks will be written to.
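
For example, a random-read variant of the base command could look like this (just a sketch: it assumes the test file already exists from a previous write run, and the path/size are placeholders):

Code:
fio --direct=1 --norandommap --randrepeat=0 --group_reporting --ioengine=posixaio --name="a" --runtime=300 --time_based --size=100G --bs=4k --iodepth=1 --numjobs=1 --rw=randread --filename=/mnt/<pool>/<dataset>/out.fio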


To run with more threads, change numjobs; to add more queue depth (stacked requests of a single thread), change iodepth.
At some point you will become CPU bound when scaling up (usually once the thread count reaches the number of cores, unless the cores are very fast).
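For example, a sketch of a scaled-up run with 4 threads at queue depth 8 (numbers chosen only as a starting point, not a recommendation):

Code:
fio --direct=1 --refill_buffers --norandommap --randrepeat=0 --group_reporting --ioengine=posixaio --name="a" --size=100G --bs=64k --iodepth=8 --numjobs=4 --rw=write --filename=/mnt/<pool>/<dataset>/out.fio

Note that with numjobs > 1 each job writes its own --size, so the total amount of data written grows accordingly (you can cap the run with --runtime and --time_based instead).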

But remember, it's of no use whatsoever to reach gigantic numbers in benchmarks if they don't reflect your actual use case. Measure for your needs, not to see huge numbers :)


Edit: Fixed double entry, added info re scaling + comments
Nice!!!
How can I make it run multithreaded?
 

Rand

Guru
Joined
Dec 30, 2013
Messages
906
As in my example, set iodepth and/or numjobs to > 1.

edit: multithreaded would be numjobs, to be precise ;)
 

potzkin

Dabbler
Joined
Dec 15, 2019
Messages
16
As in my example, set iodepth and/or numjobs to > 1.

edit: multithreaded would be numjobs, to be precise ;)

Cool, I started playing with this tool and am getting around 6GB/s on RAID 10.
Much more reasonable numbers...
I'll add a ConnectX-5 to see how it performs over the network.
Thanks for the help!
 

Rand

Guru
Joined
Dec 30, 2013
Messages
906
Let me quote myself:
But remember, it's of no use whatsoever to reach gigantic numbers in benchmarks if they don't reflect your actual use case. Measure for your needs, not to see huge numbers :)

It's not about the numbers ;)
What's your actual use case?
 

potzkin

Dabbler
Joined
Dec 15, 2019
Messages
16
Another issue:
I connected the FreeNAS with a Mellanox NIC; via iperf I'm getting 50Gbit/s between the client and the FreeNAS.
So far that is fine...
But when I mount the NFS share on Ubuntu I cannot get past 1.5GB/s.


cat /proc/mounts | grep nfs
12.12.12.10:/mnt/pool /mnt/storage nfs4 rw,noatime,nodiratime,vers=4.1,rsize=131072,wsize=131072,namlen=255,hard,nocto,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=12.12.12.1,local_lock=none,addr=12.12.12.10 0 0


fio --direct=0 --norandommap --randrepeat=0 --group_reporting --ioengine=posixaio --name="a" --size=1G --bs=1M --iodepth=20 --numjobs=20 --rw=write --filename=/mnt/storage/out.fio

Run status group 0 (all jobs):
WRITE: bw=1570MiB/s (1646MB/s), 1570MiB/s-1570MiB/s (1646MB/s-1646MB/s), io=20.0GiB (21.5GB), run=13046-13046msec


Any help would be great...
 

Rand

Guru
Joined
Dec 30, 2013
Messages
906
Well -

The following things come into play...
  • sync (you didn't mount with sync, so it shouldn't be an issue)
  • NFS 4.1 - iirc FreeNAS only does 4.0 at this time - not sure if that has a negative impact
  • rsize/wsize - is your pool record size 128K, or why did you set wsize to that? Have you tried playing with it?
  • You can also play with network parameters (NIC level or sysctls) to enlarge buffers et al. (rough sketch at the end of this post)
  • latency - on (very) fast network connections that's a topic which unfortunately leads to RDMA as the solution - and that's not something FN really supports at this time
Else - I am not there yet, as I am still stuck on local performance issues and my old pools didn't go beyond 1GB/s, sorry.

Still considering switching to ZoL due to the lack of RDMA capabilities in FN, to be honest... but maybe if I delay long enough we'll get the FN GUI on Linux - though I'm not sure whether it would support non-BSD capabilities then...
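
To make the rsize/wsize and buffer points a bit more concrete, here is a rough sketch of what you could try on the Ubuntu client (the values are placeholders to experiment with, not recommendations - change one thing at a time and re-run fio):

Code:
# remount with a different rsize/wsize (1M here just as an example)
mount -t nfs -o vers=4.0,proto=tcp,hard,rsize=1048576,wsize=1048576 12.12.12.10:/mnt/pool /mnt/storage

# enlarge the client's socket buffer limits (illustrative values)
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.ipv4.tcp_rmem="4096 1048576 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 1048576 16777216"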
 

potzkin

Dabbler
Joined
Dec 15, 2019
Messages
16
Well -

The following things come into play...
  • sync (you didn't mount with sync, so it shouldn't be an issue)
  • NFS 4.1 - iirc FreeNAS only does 4.0 at this time - not sure if that has a negative impact
  • rsize/wsize - is your pool record size 128K, or why did you set wsize to that? Have you tried playing with it?
  • You can also play with network parameters (NIC level or sysctls) to enlarge buffers et al. (rough sketch at the end of this post)
  • latency - on (very) fast network connections that's a topic which unfortunately leads to RDMA as the solution - and that's not something FN really supports at this time
Else - I am not there yet, as I am still stuck on local performance issues and my old pools didn't go beyond 1GB/s, sorry.

Still considering switching to ZoL due to the lack of RDMA capabilities in FN, to be honest... but maybe if I delay long enough we'll get the FN GUI on Linux - though I'm not sure whether it would support non-BSD capabilities then...


I'll try to set NFS 4.0.
I also tried different rsize/wsize values.
For NICs I'm connected via Mellanox, so bandwidth and latency are as good as it gets.
Enlarging buffers - do you have experience with that?

What is ZoL?
 

Rand

Guru
Joined
Dec 30, 2013
Messages
906
Buffers as in the FreeNAS 10G tuning guide, or the Mellanox tuning guides for CX4/CX5 cards.
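
The kind of tunables those guides cover looks roughly like this on the FreeNAS/FreeBSD side (values are placeholders only - take the real ones from the guides and your own testing):

Code:
# larger socket buffers / TCP windows for fast NICs (illustrative values)
sysctl kern.ipc.maxsockbuf=16777216
sysctl net.inet.tcp.sendbuf_max=16777216
sysctl net.inet.tcp.recvbuf_max=16777216
sysctl net.inet.tcp.sendspace=262144
sysctl net.inet.tcp.recvspace=262144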

ZoL is ZFS on Linux.
 

paulg

Cadet
Joined
Apr 11, 2020
Messages
5
I'll try to set NFS 4.0.
I also tried different rsize/wsize values.
For NICs I'm connected via Mellanox, so bandwidth and latency are as good as it gets.
Enlarging buffers - do you have experience with that?

What is ZoL?
Any updates on your project?
I am going to build a NAS with NVMe drives, so I want to see how it went for you.
Thanks
 