truenasuserh
Cadet
Joined: Sep 29, 2020
Messages: 9
I've got the following setup:
Disks here are Seagate IronWolf 12 TB hard disks (ST12000VN0008)
I've got a test folder of representative production data, which I copy over SMB. The folder is 2.2 GB with mixed file sizes, some 12 KB, some 2 MB, split roughly 50/50 (by size, not by file count). During the copy, the smallest files seem to slow things down a lot. My boss is convinced that small or big files shouldn't matter, as small files would get aggregated into transactions, and therefore the hard drives will only see large sequential writes. This copy takes roughly 59.5 seconds.
Copying this folder twice in parallel, I'd expect it to take about two minutes. However, it actually takes about 8 minutes 20 seconds, with transfer speeds dropping to a couple hundred KB/s when the transfers hit the small files.
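To put numbers on the gap, here is a rough sketch of the arithmetic using only the figures quoted above. It assumes "GB" means 10^9 bytes and that both parallel copies move the full 2.2 GB; the names are illustrative only.

```python
# Figures quoted in the post; "GB" is taken as 10**9 bytes (an assumption).
FOLDER_BYTES = 2.2e9
SINGLE_COPY_S = 59.5
PARALLEL_COPY_S = 8 * 60 + 20  # 8 min 20 s for two simultaneous copies


def throughput_mb_s(total_bytes: float, seconds: float) -> float:
    """Average throughput in MB/s (10**6 bytes per second)."""
    return total_bytes / seconds / 1e6


single = throughput_mb_s(FOLDER_BYTES, SINGLE_COPY_S)
parallel = throughput_mb_s(2 * FOLDER_BYTES, PARALLEL_COPY_S)

# Naive expectation: twice the data should take twice the time.
expected_s = 2 * SINGLE_COPY_S
slowdown = PARALLEL_COPY_S / expected_s

print(f"single copy:     {single:.1f} MB/s")
print(f"two in parallel: {parallel:.1f} MB/s aggregate")
print(f"expected {expected_s:.0f} s, measured {PARALLEL_COPY_S} s ({slowdown:.1f}x slower)")
```

So the single copy averages about 37 MB/s, while the two parallel copies together average under 9 MB/s, roughly 4.2 times slower than the "two minutes" expectation.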
I'd assume this is an IOPS issue. But what's killing my IOPS? The fio tests show 2K IOPS, so a folder with 2K files should not be throttled on IOPS, right?
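The reasoning can be written as a small budget calculation: how many I/O operations per file would be available if the pool really sustained that rate for the whole copy. This is only a sketch; it reads "2K files" as roughly 2000 files and "2K IOPS" as 2000 operations per second, both of which are assumptions, and it says nothing about how many operations an SMB file create actually costs.

```python
def io_budget_per_file(iops: float, seconds: float, files: int) -> float:
    """Operations available per file if `iops` were sustained for `seconds`."""
    return iops * seconds / files


ASSUMED_IOPS = 2000   # "2K IOPS" from the post, read as 2000 ops/s
ASSUMED_FILES = 2000  # "2K files" from the post, read as 2000 files

# Single copy: 59.5 s for one folder.
single_budget = io_budget_per_file(ASSUMED_IOPS, 59.5, ASSUMED_FILES)

# Two copies in parallel: 500 s (8 min 20 s) for two folders.
parallel_budget = io_budget_per_file(ASSUMED_IOPS, 500, 2 * ASSUMED_FILES)

print(f"single copy:   {single_budget:.1f} ops available per file")
print(f"parallel copy: {parallel_budget:.1f} ops available per file")
```

Under those assumptions the measured copy times leave dozens to hundreds of operations per file, which is why the slowdown looks surprising if the fio figure is taken at face value.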
System information:
OS Version: TrueNAS-13.0-U5.3
Model: D120-C21
Memory: 32 GiB
CPU: Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz
(CPU MAX according to the reporting is 70%.)
(Mod Edit - Removed the color tags from your post and converted your multilines to codeblocks for readability.)
Code:
fio --randrepeat=1 --direct=1 --gtod_reduce=1 --numjobs=1 --bs=4k --iodepth=64 --size=1G --readwrite=randwrite --ramp_time=4 --group_reporting --name=test --filename=test
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=64
fio-3.28
Starting 1 process
Jobs: 1 (f=1): [w(1)][90.6%][w=344MiB/s][w=88.1k IOPS][eta 00m:03s]
test: (groupid=0, jobs=1): err= 0: pid=4860: Wed Nov 29 15:04:23 2023
write: IOPS=10.4k, BW=40.8MiB/s (42.8MB/s)(1023MiB/25076msec); 0 zone resets
bw ( KiB/s): min= 341, max=579808, per=95.83%, avg=40027.96, stdev=106981.49, samples=49
iops : min= 85, max=144952, avg=10006.61, stdev=26745.43, samples=49
cpu : usr=0.87%, sys=14.31%, ctx=12697, majf=0, minf=1
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,261841,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=40.8MiB/s (42.8MB/s), 40.8MiB/s-40.8MiB/s (42.8MB/s-42.8MB/s), io=1023MiB (1073MB), run=25076-25076msec
Code:
fio --randrepeat=1 --direct=1 --gtod_reduce=1 --numjobs=10 --bs=4k --iodepth=64 --size=1G --readwrite=randwrite --ramp_time=4 --group_reporting --name=test --filename=test
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=64
...
fio-3.28
Starting 10 processes
Jobs: 10 (f=10): [w(10)][46.7%][w=1151MiB/s][w=295k IOPS][eta 00m:08s]
test: (groupid=0, jobs=10): err= 0: pid=4889: Wed Nov 29 15:06:06 2023
write: IOPS=402k, BW=1572MiB/s (1648MB/s)(3773MiB/2400msec); 0 zone resets
bw ( MiB/s): min= 1092, max= 2089, per=89.32%, avg=1404.13, stdev=41.10, samples=40
iops : min=279768, max=535000, avg=359453.75, stdev=10520.42, samples=40
cpu : usr=4.14%, sys=61.84%, ctx=148692, majf=0, minf=1
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,965810,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=1572MiB/s (1648MB/s), 1572MiB/s-1572MiB/s (1648MB/s-1648MB/s), io=3773MiB (3956MB), run=2400-2400msec
Code:
fio --randrepeat=1 --direct=1 --gtod_reduce=1 --numjobs=10 --bs=4k --iodepth=64 --size=4G --readwrite=randwrite --ramp_time=4 --group_reporting --name=test --filename=test
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=64
...
fio-3.28
Starting 10 processes
Jobs: 5 (f=5): [_(1),w(1),_(1),w(3),_(2),w(1),_(1)][99.8%][w=298MiB/s][w=76.2k IOPS][eta 00m:01s]
test: (groupid=0, jobs=10): err= 0: pid=4910: Wed Nov 29 15:17:34 2023
write: IOPS=16.9k, BW=65.8MiB/s (69.0MB/s)(40.0GiB/621485msec); 0 zone resets
bw ( KiB/s): min= 7887, max=427386, per=100.00%, avg=67421.05, stdev=3381.92, samples=12352
iops : min= 1968, max=106843, avg=16852.13, stdev=845.49, samples=12352
cpu : usr=1.09%, sys=16.80%, ctx=8370246, majf=0, minf=1
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,10474974,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=65.8MiB/s (69.0MB/s), 65.8MiB/s-65.8MiB/s (69.0MB/s-69.0MB/s), io=40.0GiB (42.9GB), run=621485-621485msec
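As a cross-check on the three fio runs above, the reported IOPS and bandwidth can be recomputed from the "issued rwts" write counts and the "run=" times. This is a minimal sketch; the numbers are copied from the output, and the 4096-byte block size comes from `--bs=4k`.

```python
BLOCK_BYTES = 4096  # from --bs=4k

# (label, writes issued, runtime in ms), copied from the fio output above.
RUNS = [
    ("numjobs=1,  size=1G", 261_841, 25_076),
    ("numjobs=10, size=1G", 965_810, 2_400),
    ("numjobs=10, size=4G", 10_474_974, 621_485),
]


def summarize(writes: int, runtime_ms: int) -> tuple[float, float]:
    """Return (IOPS, MiB/s) for a run of fixed-size writes."""
    seconds = runtime_ms / 1000
    iops = writes / seconds
    mib_per_s = writes * BLOCK_BYTES / seconds / 2**20
    return iops, mib_per_s


results = {label: summarize(writes, ms) for label, writes, ms in RUNS}

for label, (iops, mib_per_s) in results.items():
    print(f"{label}: {iops / 1000:6.1f}k IOPS, {mib_per_s:7.1f} MiB/s")
```

The recomputed values match fio's summary lines (10.4k, 402k and 16.9k IOPS). Note that the short 10-job run finished in 2.4 seconds, while the 4G run lasted over ten minutes, so the latter is probably the better indication of sustained behaviour. The output also reports "IO depths : 1=100.0%", so with the psync engine the requested `--iodepth=64` does not appear to have taken effect.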
Code:
zpool status pool1
  pool: pool1
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: resilvered 19.9G in 00:12:05 with 0 errors on Wed Nov 29 13:31:05 2023
config:
NAME                                              STATE     READ WRITE CKSUM
pool1                                             ONLINE       0     0     0
  raidz3-0                                        ONLINE       0     0     0
    gptid/96687099-d203-11eb-9e64-e0d55e60faba    ONLINE       0     0     0
    gptid/f7429afe-dc42-11ed-9b7a-e0d55e60faba    ONLINE       0     0     0
    spare-2                                       ONLINE       0     0     0
      gptid/97f3a674-d203-11eb-9e64-e0d55e60faba  ONLINE       0     0     0
      gptid/a6c07145-d203-11eb-9e64-e0d55e60faba  ONLINE       0     0     0
    gptid/99e77ff3-d203-11eb-9e64-e0d55e60faba    ONLINE       0     0     0
    gptid/18d7ebc7-dc43-11ed-9b7a-e0d55e60faba    ONLINE       0     0     0
    gptid/9e3ed33b-d203-11eb-9e64-e0d55e60faba    ONLINE      47     0     0
    gptid/396b46cc-dc43-11ed-9b7a-e0d55e60faba    ONLINE       0     0     0
    gptid/9bf4f4c2-d203-11eb-9e64-e0d55e60faba    ONLINE       0     0     0
    gptid/9eb74444-d203-11eb-9e64-e0d55e60faba    ONLINE       0     0     0
    gptid/97416f1e-d203-11eb-9e64-e0d55e60faba    ONLINE       0     0     0
    gptid/a10904dd-d203-11eb-9e64-e0d55e60faba    ONLINE       0     0     0
    gptid/a2003ccd-d203-11eb-9e64-e0d55e60faba    ONLINE       0     0     0
    gptid/a468b099-d203-11eb-9e64-e0d55e60faba    ONLINE       0     0     0
    gptid/a3ec783a-d203-11eb-9e64-e0d55e60faba    ONLINE       0     0     0
    gptid/a5a34558-d203-11eb-9e64-e0d55e60faba    ONLINE       0     0     0
special
  mirror-2                                        ONLINE       0     0     0
    gptid/a51b321a-d203-11eb-9e64-e0d55e60faba    ONLINE       0     0     0
    gptid/a59ed3da-d203-11eb-9e64-e0d55e60faba    ONLINE       0     0     0
    gptid/a59dcf5d-d203-11eb-9e64-e0d55e60faba    ONLINE       0     0     0
logs
  gptid/a06d9015-d203-11eb-9e64-e0d55e60faba      ONLINE       0     0     0
spares
  gptid/a6c07145-d203-11eb-9e64-e0d55e60faba      INUSE     currently in use
errors: No known data errors
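Since the status line above mentions an unrecoverable error, a small script can pick out which device rows carry non-zero counters. This is a minimal sketch over three rows copied from the output; the regular expression and function name are my own, and it only handles plain integer counters (abbreviated values such as those zpool prints for large counts would need extra handling).

```python
import re

# Three device rows copied from the zpool status output above.
STATUS_ROWS = """\
gptid/99e77ff3-d203-11eb-9e64-e0d55e60faba ONLINE 0 0 0
gptid/9e3ed33b-d203-11eb-9e64-e0d55e60faba ONLINE 47 0 0
gptid/396b46cc-dc43-11ed-9b7a-e0d55e60faba ONLINE 0 0 0
"""

# NAME STATE READ WRITE CKSUM, with integer counters only.
ROW = re.compile(r"^\s*(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def devices_with_errors(status_text: str) -> list[tuple[str, str, int, int, int]]:
    """Return (name, state, read, write, cksum) for rows with any non-zero counter."""
    flagged = []
    for line in status_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header lines, section labels, free text
        name, state = match.group(1), match.group(2)
        read, write, cksum = (int(match.group(i)) for i in (3, 4, 5))
        if read or write or cksum:
            flagged.append((name, state, read, write, cksum))
    return flagged


for name, state, read, write, cksum in devices_with_errors(STATUS_ROWS):
    print(f"{name}: state={state} read={read} write={write} cksum={cksum}")
```

On the rows above this flags only gptid/9e3ed33b-d203-11eb-9e64-e0d55e60faba, the raidz3 member showing 47 read errors.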