SOLVED: Very Poor 10GbE Performance

Status
Not open for further replies.

Donny Davis

Contributor
Joined
Jul 31, 2015
Messages
139
So I am better off just putting disks back in those slots and expanding my pool to 18 striped mirror vdevs?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The S320s are not very fast SSDs, so going down to one will most likely be more painful.

No, it won't be. As I said, commits to the ZIL happen sequentially. If you are writing to a single device, or writing to one of six devices but only one at a time, which is the harder thing? Hint: it's managing the round-robin across the six devices. It also means that if any one of the SLOG component devices fails, your SLOG is toast, so your reliability is SIGNIFICANTLY impacted. The only meaningful thing to do is to use two in a mirror, which increases availability.
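If you do go with a mirrored pair, the rough shape of the command is just an extra "mirror" keyword on the log vdev; the pool and device names below are placeholders, not your actual layout:

# Add two SSDs as a mirrored SLOG (log) vdev; "tank", da8 and da9 are examples
zpool add tank log mirror /dev/da8 /dev/da9
# Confirm the log vdev shows up as a mirror
zpool status tank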
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
So I am better off just putting disks back in those slots and expanding my pool to 18 striped mirror vdevs?

You should do that anyway. Mount the SLOG SSDs inside the chassis. Suggestion: industrial Velcro, if there's no mounting point available.
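Growing to 18 mirrors is the same zpool add pattern, one pair at a time; the pool and device names here are only examples:

# Each command stripes one more mirrored pair into the existing pool ("tank" is a placeholder)
zpool add tank mirror /dev/da20 /dev/da21
zpool add tank mirror /dev/da22 /dev/da23
zpool status tank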
 

Donny Davis

Contributor
Joined
Jul 31, 2015
Messages
139
I do have NVMe PCIe cards in them; they just don't have the greatest SSDs backing them up. The SSD model is the SM951, I believe. They were for a different project I had on these servers when I was running Ceph. Are there any good M.2 options that can be had for *reasonable* money?

Thanks for your time @jgreco, it has value and is very much appreciated.
 

Donny Davis

Contributor
Joined
Jul 31, 2015
Messages
139
No, it won't be. As I said, commits to the ZIL happen sequentially. If you are writing to a single device, or writing to one of six devices but only one at a time, which is the harder thing? Hint: it's managing the round-robin across the six devices. It also means that if any one of the SLOG component devices fails, your SLOG is toast, so your reliability is SIGNIFICANTLY impacted. The only meaningful thing to do is to use two in a mirror, which increases availability.


OK, that took a lot longer for me to get than it should have. So it's not about the raw speed or I/O of the ZIL; latency is much more important. Makes perfect sense when you break it down Barney-style like that for me. The SLOG is only holding the intent log; it's not a cache-tiering device.
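For my own notes: the SLOG only comes into play for synchronous writes, and the dataset's sync property controls that behavior. Something like this is how to check it (the dataset name is just an example):

# standard = honor the application's sync requests, always = force every write sync, disabled = ignore sync (unsafe)
zfs get sync tank/vmstore
zfs set sync=standard tank/vmstore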
 

Donny Davis

Contributor
Joined
Jul 31, 2015
Messages
139
And how about that... the same numbers without the SLOG.

./fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=mount/test --iodepth=64 --size=7G --readwrite=randrw --rwmixread=50 --bs=1M
test: (g=0): rw=randrw, bs=1M-1M/1M-1M, ioengine=libaio, iodepth=64
fio-2.0.9
Starting 1 process
Jobs: 1 (f=1): [m] [100.0% done] [147.9M/143.9M /s] [147 /143 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=1137: Thu Jan 18 22:00:54 2018
read : io=3603.0MB, bw=171468KB/s, iops=167 , runt= 21517msec
write: io=3565.0MB, bw=169659KB/s, iops=165 , runt= 21517msec
cpu : usr=1.23%, sys=6.22%, ctx=930, majf=0, minf=17
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.4%, >=64=99.1%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=3603/w=3565/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
READ: io=3603.0MB, aggrb=171467KB/s, minb=171467KB/s, maxb=171467KB/s, mint=21517msec, maxt=21517msec
WRITE: io=3565.0MB, aggrb=169659KB/s, minb=169659KB/s, maxb=169659KB/s, mint=21517msec, maxt=21517msec

Disk stats (read/write):
vda: ios=10778/10627, merge=1/0, ticks=1162612/1447102, in_queue=2638626, util=99.54%



That is from a VM on an iSCSI-mounted pool (a zvol, to be specific).

Anything else I can do to make this run faster?

I still have the 600MB/s (as measured on the NIC) issue server to server. It doesn't seem to matter: 10G or 40G, switch or no switch... all the exact same numbers. Read/write FreeNAS to FreeNAS is 60MB/s.

Right now it's running a zfs send/recv... benchmarks like the one above (albeit fio is good) are too synthetic to give you real-world numbers.

I just can't understand what I am doing wrong here.
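One thing I can still do is take the disks out of the picture: something like iperf3 between the two boxes would measure the raw network path by itself (the IP below is a placeholder, and iperf3 may need to be installed first):

# On the receiving server
iperf3 -s
# On the sending server: 4 parallel streams for 30 seconds
iperf3 -c 10.0.0.2 -P 4 -t 30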
 

Donny Davis

Contributor
Joined
Jul 31, 2015
Messages
139
I think I have this figured out. I started to watch the CPU utilization in top, and when compression is enabled it's pinned at 100%.

The attached photo has a little bump and then a big bump. The little one is with compression turned on.

The compression process looks to have been my bottleneck. My little 2.4GHz CPUs just couldn't keep up with the workload.

I will report back with results when I have more to share.
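For reference, this is roughly how I was watching it; the pool name is just an example:

# Which datasets have compression enabled, and with which algorithm
zfs get -r compression tank
# Per-thread CPU view on FreeBSD, including the kernel threads doing the compression work
top -SH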
 

Attachments

  • freenas-compression.png (14.7 KB)

Donny Davis

Contributor
Joined
Jul 31, 2015
Messages
139
rsync performance is still not very good at all (the same as before), but at least ZFS replication is fast now. This is a large improvement for me.

To be honest, I see only modest savings in the compression department, so this is not really a loss for me. I have 150TB of usable space, and not 150TB of requirements.
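The "modest savings" part is easy to quantify; something like this shows how much compression is actually buying (the pool name is an example):

# compressratio close to 1.00x means compression isn't saving much on this data
zfs get compressratio,used,logicalused tank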
 

Donny Davis

Contributor
Joined
Jul 31, 2015
Messages
139
The top NIC graph is from my oVirt deployment, the bottom one from my replication. I think this thread can be marked as solved(ish). I will open a new thread in the right forum if I have more issues.

For now I can't expect 1GB/s writes and reads over the network without spending more time testing and tuning.

Thanks for your help @jgreco @Nick2253 @SweetAndLow
 

Attachments

  • PostCompressionPerformance.png (31.2 KB)

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
OK, that took a lot longer for me to get than it should have. So it's not about the raw speed or I/O of the ZIL; latency is much more important. Makes perfect sense when you break it down Barney-style like that for me. The SLOG is only holding the intent log; it's not a cache-tiering device.

No problem; this stuff is hard to learn, and what's obvious to me isn't always obvious to everyone. I enjoy being able to get you over that rough bit, especially since you figured out the compression issue yourself... it suggests to me that my time is well spent. ;-)

It may not be a good idea to disable compression completely. You may wish to try different compression algorithms to see if one performs better. You can switch as needed: previously written data remains compressed with the previous algorithm, and all of the decompression algorithms are very fast. Compression is almost always faster than writing uncompressed data to the raw disk, and should always be faster when reading.
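Switching is a one-line property change per dataset; the dataset name below is a placeholder, and lz4 is the usual low-CPU choice:

# lz4: very cheap on CPU, reasonable ratio; only newly written blocks use the new algorithm
zfs set compression=lz4 tank/vmstore
zfs get compression,compressratio tank/vmstore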
 

Donny Davis

Contributor
Joined
Jul 31, 2015
Messages
139
I needed to come back and correct my statement.

I disabled compression on the replication. I tried disabling it on both the pool and the replication, but in the end I only needed to disable compression on the replication.

I also took your advice on doing something better for the SLOG. I have two NVMe drives in the server; they aren't great for this workload, but they have surely yielded respectable results in my synthetic testing. I am using two Samsung SM951s in a stripe for the SLOG. They have no power-loss protection and are not the fastest on the market... but I am out of lab dollars for now. I will look into getting an Optane drive for the SLOG and use the Samsung drives for L2ARC if I need them.
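For reference, this is roughly how I check that the log devices are in place and actually being used; the pool name is just an example:

# The two NVMe devices should show up under a "logs" section
zpool status tank
# Per-vdev throughput once a second; the log vdevs only see synchronous writes
zpool iostat -v tank 1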

Depending on block size, it varies from 100MB/s to 400MB/s.

I am asking quite a lot from my cheap setup; I only have $6K in these two servers, and that is not bad for 100TB of usable space. $17/TB usable is not a bad price, all things considered. I could probably rebuild my pools in a more space-efficient manner, get similar numbers, and drive my cost/TB down much further.


Write Test

512K Blocks

./fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=mount/test --iodepth=64 --size=8G --readwrite=randrw --rwmixread=0 --bs=512K
test: (g=0): rw=randrw, bs=512K-512K/512K-512K, ioengine=libaio, iodepth=64
fio-2.0.9
Starting 1 process
Jobs: 1 (f=1): [w] [100.0% done] [0K/185.4M /s] [0 /370 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=1108: Fri Jan 19 12:22:24 2018
write: io=8192.0MB, bw=252099KB/s, iops=492 , runt= 33275msec
cpu : usr=1.79%, sys=5.33%, ctx=2065, majf=0, minf=17
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.2%, >=64=99.6%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=0/w=16384/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
WRITE: io=8192.0MB, aggrb=252099KB/s, minb=252099KB/s, maxb=252099KB/s, mint=33275msec, maxt=33275msec

Disk stats (read/write):
vda: ios=0/16335, merge=0/16243, ticks=0/2091323, in_queue=2096702, util=99.77%

1M Blocks
./fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=mount/test --iodepth=64 --size=8G --readwrite=randrw --rwmixread=0 --bs=1M
test: (g=0): rw=randrw, bs=1M-1M/1M-1M, ioengine=libaio, iodepth=64
fio-2.0.9
Starting 1 process
Jobs: 1 (f=1): [w] [100.0% done] [0K/301.8M /s] [0 /301 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=1111: Fri Jan 19 12:23:19 2018
write: io=8192.0MB, bw=305451KB/s, iops=298 , runt= 27463msec
cpu : usr=2.20%, sys=6.18%, ctx=1480, majf=0, minf=17
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.4%, >=64=99.2%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=0/w=8192/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
WRITE: io=8192.0MB, aggrb=305451KB/s, minb=305451KB/s, maxb=305451KB/s, mint=27463msec, maxt=27463msec


Read Test
512K Blocks

./fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=mount/test --iodepth=64 --size=8G --readwrite=randrw --rwmixread=100 --bs=512K
test: (g=0): rw=randrw, bs=512K-512K/512K-512K, ioengine=libaio, iodepth=64
fio-2.0.9
Starting 1 process
Jobs: 1 (f=1): [r] [100.0% done] [438.6M/0K /s] [877 /0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=1105: Fri Jan 19 12:21:39 2018
read : io=8192.0MB, bw=425149KB/s, iops=830 , runt= 19731msec
cpu : usr=0.33%, sys=9.06%, ctx=2063, majf=0, minf=17
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.2%, >=64=99.6%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=16384/w=0/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
READ: io=8192.0MB, aggrb=425148KB/s, minb=425148KB/s, maxb=425148KB/s, mint=19731msec, maxt=19731msec

1M Blocks
./fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=mount/test --iodepth=64 --size=8G --readwrite=randrw --rwmixread=100 --bs=1M
test: (g=0): rw=randrw, bs=1M-1M/1M-1M, ioengine=libaio, iodepth=64
fio-2.0.9
Starting 1 process
Jobs: 1 (f=1): [r] [100.0% done] [289.8M/0K /s] [289 /0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=1117: Fri Jan 19 12:33:37 2018
read : io=8192.0MB, bw=404817KB/s, iops=395 , runt= 20722msec
cpu : usr=0.18%, sys=7.75%, ctx=1005, majf=0, minf=17
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.4%, >=64=99.2%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=8192/w=0/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
READ: io=8192.0MB, aggrb=404816KB/s, minb=404816KB/s, maxb=404816KB/s, mint=20722msec, maxt=20722msec

Disk stats (read/write):
vda: ios=24335/1, merge=1/0, ticks=2550564/119, in_queue=2556103, util=99.48%


I could do more synthetic testing, but it's not worth the time. I will just put a workload on it and see where I have issues.


 

Donny Davis

Contributor
Joined
Jul 31, 2015
Messages
139
With a real workload, copying virtual disks from one server to another using oVirt, I am getting the performance I would expect to see. It did require some tuning, but I am a happy camper.
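For anyone curious, typical 10/40GbE tuning on a FreeBSD-based system revolves around the network buffer sysctls; the values below are illustrative only, not necessarily what I ended up with (on FreeNAS these would go in as sysctl-type Tunables in the GUI so they persist across reboots):

# Raise the socket buffer ceiling and the TCP auto-tuning maximums (example values only)
sysctl kern.ipc.maxsockbuf=16777216
sysctl net.inet.tcp.sendbuf_max=16777216
sysctl net.inet.tcp.recvbuf_max=16777216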

I am marking this thread solved.
 

Attachments

  • mlxen_tuned.png (18.1 KB)

Donny Davis

Contributor
Joined
Jul 31, 2015
Messages
139

Attachments

  • Freenas Tuning.png (62.3 KB)