Very slow pool with TrueNAS on LSI SAS2308 HBA

x130844

Dabbler
Joined
Oct 25, 2020
Messages
46
Unsure why I have all these problems on TrueNAS.
Now I'm trying to debug the super slow speeds.

zpool iostat shows the drives basically either idling or doing 200K or so.
Is anyone else with a SAS2308 (IT mode) having issues?

I assume that because I migrated my pools to TrueNAS 12, I can't go back to FreeNAS 11, which was working perfectly for me.
I shouldn't get 170MB/s read/write on RAIDZ1 (unless FreeNAS lied to me when I was doing 450 to 650 MB/s easily).
 
Last edited:

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
I assume that because I migrated my pools to TrueNAS 12, I can't go back to FreeNAS 11
I believe that is a true statement if you did upgrade your pools. Make sure you remember this lesson: never upgrade your pools until you are absolutely sure you will not be rolling back. Many of us have been bitten by this and speak from experience.

If you decide to roll back, you could back up all your data (if it's not already backed up), destroy your vdevs, roll back to FreeNAS 11, recreate your vdevs, and restore your data.

I shouldn't get 170Mb/s read/write on zraid1. (unless freenas lied to me when I was doing, 450 to 650 Mb/s easy).
Hmm... 170MB/sec sounds good, not 170Mb/sec. MB = megabyte, Mb = megabit. Please make sure you are using proper terminology; it matters. For now I will assume you mean MB/sec, because Mb/sec would make your system almost useless.
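For reference, the conversion is simply a factor of 8: 170 MB/s × 8 = 1,360 Mb/s, whereas 170 Mb/s ÷ 8 ≈ 21 MB/s.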

Also, depending on your testing and your configuration, I guess you could get 650 MB/sec, but I'm not sure you could on a RAIDZ1 unless it was cached (small files, since you do have 32GB RAM); I'm not the expert in this area.

EDIT: If you are having super slow speeds (Mb/sec), what is the capacity of your pool? Are you over 90% full? Also, add any other details.
 

x130844

Dabbler
Joined
Oct 25, 2020
Messages
46
Corrected my post to reflect MB/s

I'm unsure how 170MB/s is good for a RAIDZ1 with 5 drives.

And does this look normal?

zpool iostat -v

capacity operations bandwidth
pool alloc free read write read write
---------------------------------------------- ----- ----- ----- ----- ----- -----
myPool 3.93T 50.6T 12 56 154K 822K
raidz1 3.93T 50.6T 12 56 154K 822K
gptid/fe04a05b-d3bc-11ea-86cf-94de80aeb20b - - 2 11 30.7K 164K
gptid/fde98c3f-d3bc-11ea-86cf-94de80aeb20b - - 2 11 31.1K 164K
gptid/fef0d2bc-d3bc-11ea-86cf-94de80aeb20b - - 2 11 30.8K 164K
gptid/ff0c60c3-d3bc-11ea-86cf-94de80aeb20b - - 2 11 31.0K 164K
gptid/ff230d5a-d3bc-11ea-86cf-94de80aeb20b - - 2 11 30.9K 164K


Under load: copying a big file over.
myPool 3.93T 50.6T 12 56 161K 817K
raidz1 3.93T 50.6T 12 56 161K 817K
gptid/fe04a05b-d3bc-11ea-86cf-94de80aeb20b - - 2 11 32.1K 163K
gptid/fde98c3f-d3bc-11ea-86cf-94de80aeb20b - - 2 11 32.5K 163K
gptid/fef0d2bc-d3bc-11ea-86cf-94de80aeb20b - - 2 11 32.2K 163K
gptid/ff0c60c3-d3bc-11ea-86cf-94de80aeb20b - - 2 11 32.2K 163K
gptid/ff230d5a-d3bc-11ea-86cf-94de80aeb20b - - 2 11 32.2K 163K
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
No, that doesn't look right. I hope you are testing wrong. Tell me exactly how you are performing the test; pretend I'm stupid and you have to walk me through each step. Believe it or not, this actually helps in troubleshooting. I try not to make assumptions.
 

x130844

Dabbler
Joined
Oct 25, 2020
Messages
46
With pleasure. Let me see if I can give you a good little write-up.
I'm actually debugging this (on my day off).
I was about to remove the LSI card and go back to the SATA ports (to remove variables).
 

x130844

Dabbler
Joined
Oct 25, 2020
Messages
46
Latest test:

zpool iostat -v:

capacity operations bandwidth
pool alloc free read write read write
---------------------------------------------- ----- ----- ----- ----- ----- -----
myPool 3.93T 50.6T 167 51 959K 316K
raidz1 3.93T 50.6T 167 51 959K 316K
gptid/fe04a05b-d3bc-11ea-86cf-94de80aeb20b - - 33 10 191K 63.3K
gptid/fde98c3f-d3bc-11ea-86cf-94de80aeb20b - - 33 10 190K 63.4K
gptid/fef0d2bc-d3bc-11ea-86cf-94de80aeb20b - - 33 10 193K 63.3K
gptid/ff0c60c3-d3bc-11ea-86cf-94de80aeb20b - - 33 10 193K 63.0K
gptid/ff230d5a-d3bc-11ea-86cf-94de80aeb20b - - 33 10 192K 63.2K


===

root@Network-Storage[~]# dd if=/dev/urandom of=./test.dat bs=128k count=10000
10000+0 records in
10000+0 records out
1310720000 bytes transferred in 7.629213 secs (171802774 bytes/sec)

root@Network-Storage[~]#

then just to see:

dd if=./test.dat of=./test2.dat bs=128k count=10000
10000+0 records in
10000+0 records out
1310720000 bytes transferred in 0.640176 secs (2047436641 bytes/sec)

That's very fast; it must be cached here, or the methodology of copying one file to another with dd is incorrect. I just wanted to mention it in case it mattered.

====
fio --name=seqwrite \
--rw=write \
--direct=0 \
--iodepth=32 \
--bs=1M \
--numjobs=1 \
--size=32G \
--runtime=30

gives:
seqwrite: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=32
fio-3.19
Starting 1 process
seqwrite: Laying out IO file (1 file / 32768MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=434MiB/s][w=434 IOPS][eta 00m:00s]
seqwrite: (groupid=0, jobs=1): err= 0: pid=2981: Thu Nov 12 11:37:41 2020
write: IOPS=524, BW=524MiB/s (550MB/s)(15.4GiB/30001msec)
clat (usec): min=98, max=1146.3k, avg=1891.47, stdev=9196.80
lat (usec): min=105, max=1146.3k, avg=1905.01, stdev=9196.87
clat percentiles (usec):
| 1.00th=[ 102], 5.00th=[ 106], 10.00th=[ 159], 20.00th=[ 1205],
| 30.00th=[ 1319], 40.00th=[ 2212], 50.00th=[ 2278], 60.00th=[ 2311],
| 70.00th=[ 2343], 80.00th=[ 2376], 90.00th=[ 2442], 95.00th=[ 2474],
| 99.00th=[ 3490], 99.50th=[ 3523], 99.90th=[ 7898], 99.95th=[29754],
| 99.99th=[49021]
bw ( KiB/s): min=403888, max=3155855, per=99.73%, avg=535602.24, stdev=376438.14, samples=58
iops : min= 394, max= 3081, avg=522.48, stdev=367.55, samples=58
lat (usec) : 100=0.10%, 250=16.92%, 500=0.15%, 750=0.02%, 1000=0.33%
lat (msec) : 2=20.42%, 4=61.78%, 10=0.23%, 50=0.05%, 2000=0.01%
cpu : usr=1.24%, sys=8.76%, ctx=62888, majf=0, minf=0
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,15735,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
WRITE: bw=524MiB/s (550MB/s), 524MiB/s-524MiB/s (550MB/s-550MB/s), io=15.4GiB (16.5GB), run=30001-30001msec


Is there anything else I should provide, or anything I should correct if the testing here is wrong?
The reason I started debugging this is that on FreeNAS 11.x the speeds were awesome; everything was smooth, fast, responsive. I even started using iSCSI because everything was so awesome.
Now with TrueNAS everything seems to be broken, so I'm trying to fix one problem at a time. I had some issues with the network; now I'm getting close to 10Gb speeds between my machines (iperf3), but file transfers (iperf3 -f testing, or Samba) are super, super slow. So I started to debug and saw that my drives/HBA seem to be the issue, but I'm unsure. Something is off, though.

Update:

With the HBA out of the picture, direct to SATA ports, I get the exact same results. (within margin of error)
 
Last edited:

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
Just to make sure that we agree: the fio test results looked good to you?
Now run it with --numjobs=4 and see what your results are. Your speed may go up slightly.

As for iostat, use zpool iostat -Td -v 5 and it will report an average every 5 seconds. Now start copying files to see what iostat reports. You can change the -v 5 to -v 1 for once a second, or to -v 10 for a 10-second average (the value I like). To exit, hit CTRL-C.

Let me know what your results are. And if you move files within FreeNAS/TrueNAS rather than over an external network connection, they should be pretty fast; the dd command comes to mind, or even just mv or cp.
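For example, something along these lines (the paths here are just placeholders; adjust them to wherever your pool is actually mounted):

# time a straight copy within the pool
time cp /mnt/myPool/test.dat /mnt/myPool/test_copy.dat

# or push it through dd to get a bytes/sec figure
dd if=/mnt/myPool/test.dat of=/mnt/myPool/test_copy2.dat bs=1M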
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
dd if=./test.dat of=./test2.dat bs=128k count=10000
10000+0 records in
10000+0 records out
1310720000 bytes transferred in 0.640176 secs (2047436641 bytes/sec)

That's very fast; it must be cached here, or the methodology of copying one file to another with dd is incorrect. I just wanted to mention it in case it mattered.
Cached. The file you just wrote is ~1.2GB and fits into your RAM, I suspect.
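If you want to confirm a read was served from ARC rather than from the disks, one rough check (these are the FreeBSD ZFS arcstats sysctls) is to watch the ARC size and hit/miss counters before and after the read:

sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses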
 

x130844

Dabbler
Joined
Oct 25, 2020
Messages
46
Will test shortly. And yeah, the fio results look good to me. I will also try a 64GB file for dd to go over my RAM.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
Will test shortly. And yeah, the fio results look good to me. I will also try a 64GB file for dd to go over my RAM.
Hey, if it returns a super fast value with a 64GB file, you have some speedy drives :)
 

x130844

Dabbler
Joined
Oct 25, 2020
Messages
46
So, I just tested with numjobs=4 and it actually went down, a lot. (Do note that this is with the drives direct to the SATA ports on the mobo, not the HBA.)

root@Network-Storage[~]# fio --name=seqwrite \
--rw=write \
--direct=0 \
--iodepth=32 \
--bs=1M \
--numjobs=4 \
--size=32G \
--runtime=30
seqwrite: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=32
...
fio-3.19
Starting 4 processes
seqwrite: Laying out IO file (1 file / 32768MiB)
seqwrite: Laying out IO file (1 file / 32768MiB)
seqwrite: Laying out IO file (1 file / 32768MiB)
seqwrite: Laying out IO file (1 file / 32768MiB)
Jobs: 4 (f=4): [W(4)][100.0%][w=407MiB/s][w=407 IOPS][eta 00m:00s]
seqwrite: (groupid=0, jobs=1): err= 0: pid=3587: Thu Nov 12 17:29:28 2020
write: IOPS=121, BW=121MiB/s (127MB/s)(3645MiB/30004msec)
clat (usec): min=111, max=411693, avg=8204.84, stdev=7181.83
lat (usec): min=123, max=411711, avg=8229.69, stdev=7181.83
clat percentiles (usec):
| 1.00th=[ 255], 5.00th=[ 996], 10.00th=[ 5669], 20.00th=[ 7767],
| 30.00th=[ 7832], 40.00th=[ 7898], 50.00th=[ 8979], 60.00th=[ 9372],
| 70.00th=[ 9503], 80.00th=[ 9503], 90.00th=[ 9503], 95.00th=[ 9634],
| 99.00th=[ 10028], 99.50th=[ 11338], 99.90th=[ 36439], 99.95th=[ 44303],
| 99.99th=[413139]
bw ( KiB/s): min=100151, max=312214, per=23.12%, avg=123778.90, stdev=36602.32, samples=59
iops : min= 97, max= 304, avg=120.34, stdev=35.74, samples=59
lat (usec) : 250=0.93%, 500=2.94%, 750=0.85%, 1000=0.33%
lat (msec) : 2=1.43%, 4=1.65%, 10=90.92%, 20=0.71%, 50=0.22%
lat (msec) : 500=0.03%
cpu : usr=0.31%, sys=2.39%, ctx=28533, majf=0, minf=0
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,3645,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
seqwrite: (groupid=0, jobs=1): err= 0: pid=3588: Thu Nov 12 17:29:28 2020
write: IOPS=127, BW=127MiB/s (133MB/s)(3814MiB/30005msec)
clat (usec): min=114, max=387558, avg=7841.57, stdev=6887.42
lat (usec): min=125, max=387584, avg=7865.23, stdev=6888.03
clat percentiles (usec):
| 1.00th=[ 147], 5.00th=[ 363], 10.00th=[ 1106], 20.00th=[ 7635],
| 30.00th=[ 7832], 40.00th=[ 7898], 50.00th=[ 8356], 60.00th=[ 9372],
| 70.00th=[ 9503], 80.00th=[ 9503], 90.00th=[ 9503], 95.00th=[ 9634],
| 99.00th=[ 10028], 99.50th=[ 10814], 99.90th=[ 35914], 99.95th=[ 52167],
| 99.99th=[387974]
bw ( KiB/s): min=97912, max=624429, per=23.95%, avg=128241.80, stdev=70343.53, samples=59
iops : min= 95, max= 609, avg=124.69, stdev=68.68, samples=59
lat (usec) : 250=2.57%, 500=5.66%, 750=1.18%, 1000=0.45%
lat (msec) : 2=0.97%, 4=1.49%, 10=86.52%, 20=0.92%, 50=0.18%
lat (msec) : 100=0.03%, 500=0.03%
cpu : usr=0.32%, sys=2.58%, ctx=28317, majf=0, minf=0
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,3814,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
seqwrite: (groupid=0, jobs=1): err= 0: pid=3589: Thu Nov 12 17:29:28 2020
write: IOPS=143, BW=144MiB/s (151MB/s)(4316MiB/30003msec)
clat (usec): min=90, max=264989, avg=6927.47, stdev=5438.61
lat (usec): min=99, max=265007, avg=6949.81, stdev=5440.84
clat percentiles (usec):
| 1.00th=[ 125], 5.00th=[ 225], 10.00th=[ 330], 20.00th=[ 775],
| 30.00th=[ 7701], 40.00th=[ 7832], 50.00th=[ 7898], 60.00th=[ 9241],
| 70.00th=[ 9503], 80.00th=[ 9503], 90.00th=[ 9503], 95.00th=[ 9634],
| 99.00th=[ 9896], 99.50th=[ 10421], 99.90th=[ 35914], 99.95th=[ 44827],
| 99.99th=[265290]
bw ( KiB/s): min=97912, max=1442982, per=26.52%, avg=141978.02, stdev=173961.74, samples=59
iops : min= 95, max= 1409, avg=138.12, stdev=169.94, samples=59
lat (usec) : 100=0.51%, 250=7.55%, 500=10.70%, 750=1.20%, 1000=0.28%
lat (msec) : 2=0.93%, 4=1.44%, 10=76.58%, 20=0.63%, 50=0.14%
lat (msec) : 100=0.02%, 500=0.02%
cpu : usr=0.33%, sys=3.04%, ctx=28394, majf=0, minf=0
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,4316,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
seqwrite: (groupid=0, jobs=1): err= 0: pid=3590: Thu Nov 12 17:29:28 2020
write: IOPS=130, BW=130MiB/s (137MB/s)(3915MiB/30002msec)
clat (usec): min=95, max=392176, avg=7637.90, stdev=6969.66
lat (usec): min=106, max=392215, avg=7661.39, stdev=6970.70
clat percentiles (usec):
| 1.00th=[ 135], 5.00th=[ 322], 10.00th=[ 482], 20.00th=[ 7308],
| 30.00th=[ 7832], 40.00th=[ 7898], 50.00th=[ 8029], 60.00th=[ 9372],
| 70.00th=[ 9503], 80.00th=[ 9503], 90.00th=[ 9503], 95.00th=[ 9634],
| 99.00th=[ 10028], 99.50th=[ 10683], 99.90th=[ 36439], 99.95th=[ 51643],
| 99.99th=[392168]
bw ( KiB/s): min=97912, max=792669, per=24.46%, avg=130989.71, stdev=90972.39, samples=59
iops : min= 95, max= 774, avg=127.39, stdev=88.90, samples=59
lat (usec) : 100=0.05%, 250=3.04%, 500=7.20%, 750=1.48%, 1000=0.41%
lat (msec) : 2=1.12%, 4=1.48%, 10=84.29%, 20=0.69%, 50=0.18%
lat (msec) : 100=0.03%, 500=0.03%
cpu : usr=0.34%, sys=2.67%, ctx=28363, majf=0, minf=0
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,3915,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
WRITE: bw=523MiB/s (548MB/s), 121MiB/s-144MiB/s (127MB/s-151MB/s), io=15.3GiB (16.5GB), run=30002-30005msec
 

x130844

Dabbler
Joined
Oct 25, 2020
Messages
46
For the dd commands:
after a reboot (testing FreeNAS 11.x), I ran dd again to see:

dd if=./test.dat of=./test3.dat bs=128k count=10000
10000+0 records in
10000+0 records out
1310720000 bytes transferred in 3.441758 secs (380828661 bytes/sec)

(380MB/s, not cached, though it would fit in RAM.)

==
Then I re-ran it right after with a new destination filename but the same source file.

dd if=./test.dat of=./test4.dat bs=128k count=10000
10000+0 records in
10000+0 records out
1310720000 bytes transferred in 0.571512 secs (2293425433 bytes/sec)

2300MB/sec like before (the source is cached in memory), as expected.

=

A much bigger file, so RAM caching shouldn't help here:

dd if=/dev/urandom of=./test.dat bs=128k count=400000
400000+0 records in
400000+0 records out
52428800000 bytes transferred in 305.825307 secs (171433818 bytes/sec)

171MB/sec for creation of the file from urandom.
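Side note: I suppose /dev/urandom itself has a maximum rate, so a number like this could also be limited by the generator rather than by the pool. Reading it straight into /dev/null would show that ceiling:

dd if=/dev/urandom of=/dev/null bs=128k count=100000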

While it was doing this, I ran:
zpool iostat -Td -v 5

which gave poor numbers in my opinion. This is representative of all the entries logged while it was doing the dd.

capacity operations bandwidth
pool alloc free read write read write
---------------------------------------------- ----- ----- ----- ----- ----- -----
freenas-boot 59.0G 389G 0 1.50K 1.60K 188M
ada0p2 59.0G 389G 0 1.50K 1.60K 188M
---------------------------------------------- ----- ----- ----- ----- ----- -----
myPool 3.93T 50.6T 64 104 382K 1.05M
raidz1 3.93T 50.6T 64 104 382K 1.05M
gptid/fe04a05b-d3bc-11ea-86cf-94de80aeb20b - - 12 21 76.8K 210K
gptid/fde98c3f-d3bc-11ea-86cf-94de80aeb20b - - 13 19 86.3K 216K
gptid/fef0d2bc-d3bc-11ea-86cf-94de80aeb20b - - 11 20 66.4K 216K
gptid/ff0c60c3-d3bc-11ea-86cf-94de80aeb20b - - 13 20 73.6K 219K
gptid/ff230d5a-d3bc-11ea-86cf-94de80aeb20b - - 12 21 79.2K 216K
=

Then I copied that file with dd locally instead of generating it from random.

dd if=./test.dat of=./test5.dat bs=128k count=400000
400000+0 records in
400000+0 records out
52428800000 bytes transferred in 507.007359 secs (103408361 bytes/sec)

100MB/sec, quite slow.

=

I think those 0s are alarming.

myPool 3.93T 50.6T 5 61 21.6K 393K
raidz1 3.93T 50.6T 5 61 21.6K 393K
gptid/fe04a05b-d3bc-11ea-86cf-94de80aeb20b - - 0 11 1.60K 76.8K
gptid/fde98c3f-d3bc-11ea-86cf-94de80aeb20b - - 0 12 2.40K 79.2K
gptid/fef0d2bc-d3bc-11ea-86cf-94de80aeb20b - - 1 12 6.40K 76.8K
gptid/ff0c60c3-d3bc-11ea-86cf-94de80aeb20b - - 1 12 8.00K 80.0K
gptid/ff230d5a-d3bc-11ea-86cf-94de80aeb20b - - 0 12 3.20K 80.8K
---------------------------------------------- ----- ----- ----- ----- ----- -----
Thu Nov 12 17:55:32 2020
capacity operations bandwidth
pool alloc free read write read write
---------------------------------------------- ----- ----- ----- ----- ----- -----
freenas-boot 80.9G 367G 713 713 88.8M 88.9M
ada0p2 80.9G 367G 713 713 88.8M 88.9M
---------------------------------------------- ----- ----- ----- ----- ----- -----
myPool 3.93T 50.6T 9 40 86.4K 249K
raidz1 3.93T 50.6T 9 40 86.4K 249K
gptid/fe04a05b-d3bc-11ea-86cf-94de80aeb20b - - 1 7 18.4K 48.8K
gptid/fde98c3f-d3bc-11ea-86cf-94de80aeb20b - - 2 8 15.2K 53.6K
gptid/fef0d2bc-d3bc-11ea-86cf-94de80aeb20b - - 1 9 16.8K 52.8K
gptid/ff0c60c3-d3bc-11ea-86cf-94de80aeb20b - - 1 7 14.4K 46.4K
gptid/ff230d5a-d3bc-11ea-86cf-94de80aeb20b - - 2 7 21.6K 48.0K
---------------------------------------------- ----- ----- ----- ----- ----- -----
Thu Nov 12 17:55:37 2020
capacity operations bandwidth
pool alloc free read write read write
---------------------------------------------- ----- ----- ----- ----- ----- -----
freenas-boot 82.1G 366G 740 761 92.1M 92.4M
ada0p2 82.1G 366G 740 761 92.2M 92.4M
---------------------------------------------- ----- ----- ----- ----- ----- -----
myPool 3.93T 50.6T 0 65 3.20K 469K
raidz1 3.93T 50.6T 0 65 3.20K 469K
gptid/fe04a05b-d3bc-11ea-86cf-94de80aeb20b - - 0 12 818 88.8K
gptid/fde98c3f-d3bc-11ea-86cf-94de80aeb20b - - 0 12 818 90.4K
gptid/fef0d2bc-d3bc-11ea-86cf-94de80aeb20b - - 0 13 818 94.4K
gptid/ff0c60c3-d3bc-11ea-86cf-94de80aeb20b - - 0 12 818 100K
gptid/ff230d5a-d3bc-11ea-86cf-94de80aeb20b - - 0 13 0 96.0K
---------------------------------------------- ----- ----- ----- ----- ----- -----
Thu Nov 12 17:55:42 2020
capacity operations bandwidth
pool alloc free read write read write
---------------------------------------------- ----- ----- ----- ----- ----- -----
freenas-boot 82.1G 366G 686 687 85.6M 85.6M
ada0p2 82.1G 366G 686 687 85.5M 85.6M
---------------------------------------------- ----- ----- ----- ----- ----- -----
myPool 3.93T 50.6T 0 39 811 243K
raidz1 3.93T 50.6T 0 39 811 243K
gptid/fe04a05b-d3bc-11ea-86cf-94de80aeb20b - - 0 7 0 49.9K
gptid/fde98c3f-d3bc-11ea-86cf-94de80aeb20b - - 0 9 811 53.1K
gptid/fef0d2bc-d3bc-11ea-86cf-94de80aeb20b - - 0 8 0 47.5K
gptid/ff0c60c3-d3bc-11ea-86cf-94de80aeb20b - - 0 6 0 45.1K
gptid/ff230d5a-d3bc-11ea-86cf-94de80aeb20b - - 0 7 0 47.5K
---------------------------------------------- ----- ----- ----- ----- ----- -----


PS: all the hard drives are new as of 6 months ago.

How do I debug this?
 

x130844

Dabbler
Joined
Oct 25, 2020
Messages
46
I just ran the dd command, but this time on a new USB 3 drive (mechanical).
I got:
dd if=/dev/urandom of=./test.dat bs=128k count=400000
400000+0 records in
400000+0 records out
52428800000 bytes transferred in 396.930561 secs (132085571 bytes/sec)

126MB/s, which is what I expect really (Easystore 14TB enclosure).

A single USB 3 spinning drive is almost the same speed as the huge RAIDZ1 pool :)
Something ain't right.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
Run status group 0 (all jobs):
WRITE: bw=523MiB/s (548MB/s), 121MiB/s-144MiB/s (127MB/s-151MB/s), io=15.3GiB (16.5GB), run=30002-30005msec
So it didn't really go down; it remained basically the same with the 4 jobs added together. That is actually a good thing.
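The per-job numbers sum right back to the single-job result: 121 + 127 + 144 + 130 ≈ 522 MiB/s, which matches the 523 MiB/s aggregate in the run status line.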
Thu Nov 12 17:55:42 2020
capacity operations bandwidth
pool alloc free read write read write
---------------------------------------------- ----- ----- ----- ----- ----- -----
freenas-boot 82.1G 366G 686 687 85.6M 85.6M
ada0p2 82.1G 366G 686 687 85.5M 85.6M
---------------------------------------------- ----- ----- ----- ----- ----- -----
myPool 3.93T 50.6T 0 39 811 243K
raidz1 3.93T 50.6T 0 39 811 243K
gptid/fe04a05b-d3bc-11ea-86cf-94de80aeb20b - - 0 7 0 49.9K
gptid/fde98c3f-d3bc-11ea-86cf-94de80aeb20b - - 0 9 811 53.1K
gptid/fef0d2bc-d3bc-11ea-86cf-94de80aeb20b - - 0 8 0 47.5K
gptid/ff0c60c3-d3bc-11ea-86cf-94de80aeb20b - - 0 6 0 45.1K
gptid/ff230d5a-d3bc-11ea-86cf-94de80aeb20b - - 0 7 0 47.5K
---------------------------------------------- ----- ----- ----- ----- ----- -----
Um, where are you running this dd command from? The freenas-boot drive? It sure seems very active.

I think your system is running fine; you might not agree with me, and that is your right.
 

x130844

Dabbler
Joined
Oct 25, 2020
Messages
46
Hmm, you're bringing up a good point.

Thank you for your input, I really do appreciate it.


On the 2nd test, for the USB, I ran it from the backup-USB pool, which behaved as expected (/mnt/backup-usb).
But now that I'm thinking about it, I ran the first dd tests with ./test going to ~, which for root is /root, which is probably not in my /mnt/mypool. I'm an idiot, and that would explain why iostat would show nothing, or very low values.

Let me test from my real /mnt/mypool, but this is worrying me even more: if ~ (/root) is on the boot drive (which is my SSD), the results are even worse than previously thought.
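A quick way to double-check where a path actually lives (just a generic check; adjust the pool path as needed):

df -h ~ /mnt/mypool
zfs list -o name,mountpoint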

Going to test more.

Meanwhile, while you think my setup is working correctly, I want to say that the only reason I started to dig into this is that since the migration to TrueNAS it has performed really badly.
I used to get 450MB/s to 750MB/s transfers easily; iSCSI was flying, Samba was great, etc.
Since TrueNAS I get anywhere from 70MB/s to 150MB/s at most in real-life situations.
I have fixed my network, and now I get consistent 700 to 900MB/s network transfers (send and receive); it's just that any file transfer is super bad.

I am currently reinstalling everything from scratch and going back to FreeNAS 11.x:
1) to confirm that the issue is the NAS software, not my hardware (which could have gone faulty) or my client machines;
2) to do more performance tests.

Still, I'm unsure how a RAIDZ1 pool of 5 drives (on the LSI HBA or direct SATA on the mobo) is almost the same speed as a single USB 3 mechanical drive (WD Easystore, same HDD but with a USB 3.0 interface).
 

x130844

Dabbler
Joined
Oct 25, 2020
Messages
46
As I'm currently doing a replication from my current pool to the external USB 3.0 drive (WD Easystore), I can see there's a problem.

Here's what zpool iostat shows:

capacity operations bandwidth
pool alloc free read write read write
---------------------------------------------- ----- ----- ----- ----- ----- -----
backup-USB 1.76T 9.11T 0 146 1.60K 102M
gptid/c4237e14-2547-11eb-a37d-0007432cef80 1.76T 9.11T 0 146 1.60K 102M
---------------------------------------------- ----- ----- ----- ----- ----- -----
freenas-boot 122G 326G 1 0 17.3K 16.9K
ada0p2 122G 326G 1 0 17.3K 16.9K
---------------------------------------------- ----- ----- ----- ----- ----- -----
myPool 3.93T 50.6T 1.43K 69 97.3M 821K
raidz1 3.93T 50.6T 1.43K 69 97.3M 821K
gptid/fe04a05b-d3bc-11ea-86cf-94de80aeb20b - - 293 13 19.4M 164K
gptid/fde98c3f-d3bc-11ea-86cf-94de80aeb20b - - 293 13 19.6M 164K
gptid/fef0d2bc-d3bc-11ea-86cf-94de80aeb20b - - 292 13 19.6M 164K
gptid/ff0c60c3-d3bc-11ea-86cf-94de80aeb20b - - 293 13 19.4M 164K
gptid/ff230d5a-d3bc-11ea-86cf-94de80aeb20b - - 292 13 19.3M 164K
---------------------------------------------- ----- ----- ----- ----- ----- -----

Correct me if I'm wrong in my interpretation:
a) The boot drive (SSD) is not moving much, which is expected.
b) The current main pool is super slow: 100MB/sec for the RAIDZ1 (each drive doing 20MB/sec; why are they so slow? Is RAIDZ broken?).
c) The backup-USB is working as I expect it; it's writing 100MB/s (a single HDD via USB 3.0).

The current action is simply a Tasks > Replication task with myPool as the source and backup-USB as the target.

What am I missing?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
I am currently reinstalling everything from scratch and going back to FreeNAS 11.x:
1) to confirm that the issue is the NAS software, not my hardware (which could have gone faulty) or my client machines;
2) to do more performance tests.
I commend you for doing this; verification is important. What I hope you do is be very consistent in your testing. You still may have a problem, but it could be difficult to prove if you are not careful in how you test. And remember that you should turn off LZ4 compression on the dataset you are testing with, and you should not use compressible data. It's best to create a file of random data that is large enough not to be cached.
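As a rough sketch of what I mean (the dataset name, path, and size are just placeholders; adjust them to your pool):

# a throwaway dataset with compression disabled
zfs create -o compression=off myPool/benchtest

# ~64GB of incompressible data, well beyond your 32GB of RAM
# (note: the write figure here can be limited by /dev/urandom itself)
dd if=/dev/urandom of=/mnt/myPool/benchtest/rand.dat bs=1M count=65536

# read it back to check the read side
dd if=/mnt/myPool/benchtest/rand.dat of=/dev/null bs=1M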

Good Luck and I hope that in the end, everything is working perfectly.
 

x130844

Dabbler
Joined
Oct 25, 2020
Messages
46
The plot thickens.
I copied my pool over to FreeNAS 11.
I rebooted with my old FreeNAS 11 USB key (instead of the SSD).

I did a 10GbE disk test to FreeNAS (10GbE) and, sure enough, I was not dreaming: I got great speeds again.

here the 2 screenshots from client, running Aja. (disk cache on Aja client disabled), any caching would be done on the server.

You can see that the 2nd pass is higher, (as expected). 1004MB/s vs 885MB/s

and this is all in the RAM really as my current pool is hosted on the USB 3.0 HDD drive. (only capable to do 130MB/s on a good day)

I'm going to reboot into TrueNAS and do the exact same test.

(Note that I have 0 tunables set on both systems, except max ARC size on TrueNAS.)
Also, iostat doesn't show much activity when doing this test; again, it's probably all RAM.
 

Attachments

  • Screen Shot 2020-11-13 at 11.51.31 AM.png (998.9 KB)
  • Screen Shot 2020-11-13 at 11.50.25 AM.png (1 MB)

x130844

Dabbler
Joined
Oct 25, 2020
Messages
46
OK, I just rebooted into TrueNAS.

Using the real pool (on the SATA ports, RAIDZ1):
217MB/s write, 417MB/s read

2nd test right after, to see any cache influence:
211MB/s write and 408MB/s read

I then tested TrueNAS again, but this time using the USB 3.0 HDD:
243MB/s write and 424MB/s read

So, to me, the pool doesn't seem to matter and the HDs might be OK, but there's something going on.

The only variables at play here are the OS version (FreeNAS 11.x vs TrueNAS 12), and possibly its settings.
The rest remains the same: same drives, same network, same client (the client is on an NVMe drive getting 2700MB/s locally).

If I'm the only one getting slower speeds on TrueNAS, where do you think I messed up? How do I debug this?
(PS: iperf3 for the network shows full 10GbE, send & receive, on both FreeNAS and TrueNAS.)
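For reference, a basic iperf3 check in both directions looks like this (hostname and duration are placeholders):

iperf3 -s                     # on the NAS
iperf3 -c nas.local -t 30     # from the client, client -> NAS
iperf3 -c nas.local -t 30 -R  # reverse direction, NAS -> client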
 

Attachments

  • Screen Shot 2020-11-13 at 12.12.57 PM.png (989 KB)
  • Screen Shot 2020-11-13 at 12.23.25 PM.png (1,010.7 KB)
  • Screen Shot 2020-11-13 at 12.15.08 PM.png (1 MB)

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
It's like you have no write cache; the reads seem to be about the same.
 