Improve read performance for applications (many small files)

Tomaae

Dabbler
Joined
Feb 24, 2021
Messages
12
Hello,
I'm looking to see if it's possible to get viable performance for application use. The problem is that the application loads many small files, and loading is up to 5 times slower than local, which adds up to several minutes. I don't expect the same performance as local storage, but cutting the current slowdown in half would be more than enough.
The network itself is not the problem, as any other load will easily saturate its capacity.
I'm trying to find what is actually causing the slowdown. I don't believe ZFS is the problem here, as this is mostly reads and the NAS is used by a single person.
SMB could be the issue, but I did not find a good way to test a similar load synthetically.

Does anybody have experience with tuning TrueNAS for small random reads?
I have already looked up everything I could find on these forums and adjusted SMB accordingly, which resulted in about a 15% performance increase.

Hardware:
MB: Supermicro X11SCH-LN4F
CPU: i3-8300 CPU
RAM: 128GB ECC
DISKS:
2x WD Red SA500 2TB
2x Samsung SSD 860 EVO 2TB
1x Samsung SSD 860 QVO 2TB
2x INTEL SSD 665p 1TB
2x Crucial BX500 120GB (Boot)

CTRL: Built-in
LAN: Built-in dual i210
This is a single-user NAS, designed for data accessibility, safety and backup. 99% of the load is reads, so QLC/TLC NAND and the write cache on cheaper SSDs are never a problem.
Everything except the backup disk is set up in mirrors.

SMB aux parameters:
socket options = IPTOS_LOWDELAY TCP_NODELAY SO_RCVBUF=65536 SO_SNDBUF=65536
mangled names = illegal
store dos attributes = no
map archive = no
map hidden = no
map system = no
map readonly = no
use sendfile = yes
read raw = yes
write raw = yes
aio max threads = 100
aio read size = 1
aio write size = 1
allocation roundup size = 1048576
max xmit = 65535
getwd cache = yes

The test below was performed on the mirrored M.2 665p pool, both locally from the shell and from a Windows PC over the network. Not sure how helpful these are, since they look way too good even on the client.
fio --bs=16k --direct=1 --directory . --gtod_reduce=1 --iodepth=32 --group_reporting --name=randrw --numjobs=12 --ramp_time=10 --runtime=60 --rw=randread --size=256M --time_based

On NAS:
randrw: (g=0): rw=randread, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=32
...
fio-3.28
Starting 12 processes
Jobs: 6 (f=6): [_(3),E(1),_(1),E(1),r(6)][100.0%][r=13.6GiB/s][r=890k IOPS][eta 00m:00s]
randrw: (groupid=0, jobs=12): err= 0: pid=70972: Mon Oct 10 15:03:56 2022
read: IOPS=878k, BW=13.4GiB/s (14.4GB/s)(805GiB/60046msec)
bw ( MiB/s): min= 5536, max=27652, per=99.94%, avg=13715.35, stdev=300.27, samples=1428
iops : min=354312, max=1769751, avg=877777.79, stdev=19217.48, samples=1428
cpu : usr=3.33%, sys=29.10%, ctx=46259, majf=0, minf=1
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=52736524,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
READ: bw=13.4GiB/s (14.4GB/s), 13.4GiB/s-13.4GiB/s (14.4GB/s-14.4GB/s), io=805GiB (864GB), run=60046-60046msec

Client PC over SMB:
randrw: (g=0): rw=randread, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=windowsaio, iodepth=32
...
fio-3.32
Starting 12 threads
Jobs: 12 (f=12): [r(12)][100.0%][r=112MiB/s][r=7177 IOPS][eta 00m:00s]
randrw: (groupid=0, jobs=12): err= 0: pid=28312: Mon Oct 10 14:52:00 2022
read: IOPS=7125, BW=111MiB/s (117MB/s)(6699MiB/60120msec)
bw ( KiB/s): min=98360, max=121470, per=100.00%, avg=114171.18, stdev=440.91, samples=1434
iops : min= 6140, max= 7588, avg=7131.71, stdev=27.58, samples=1434
cpu : usr=0.00%, sys=0.00%, ctx=0, majf=0, minf=0
IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.4%, 32=99.6%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.1%, 32=0.1%, 64=0.0%, >=64=0.0%
issued rwts: total=428357,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
READ: bw=111MiB/s (117MB/s), 111MiB/s-111MiB/s (117MB/s-117MB/s), io=6699MiB (7024MB), run=60120-60120msec
 

c77dk

Patron
Joined
Nov 27, 2019
Messages
468
Those SMB numbers are the max for your 1Gbps NIC (roughly 117 MB/s of payload after protocol overhead). If you want something faster, you have to upgrade the network.
 

Tomaae

Dabbler
Joined
Feb 24, 2021
Messages
12
Those SMB numbers are the max for your 1Gbps NIC (roughly 117 MB/s of payload after protocol overhead). If you want something faster, you have to upgrade the network.
That is just a synthetic test, not real use.
According to that, it should be fast, but I'm not getting anywhere near that with real data.
 

c77dk

Patron
Joined
Nov 27, 2019
Messages
468
How is the pool with the real data set up? How many files per directory? Any deduplication? Any L2ARC (and if so, any tuning)?
 

Tomaae

Dabbler
Joined
Feb 24, 2021
Messages
12
How is the pool with the real data set up? How many files per directory? Any deduplication? Any L2ARC (and if so, any tuning)?
It's just mirrors, no special vdevs. I only use SSDs, so no L2ARC.
No dedup, 128 KiB record size, LZ4, and both atime and sync disabled for this dataset.
My SMB tuning is in the first post.
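For completeness, those dataset properties can be confirmed from the shell in one go (command sketch only; `tank/projects` is a placeholder for the actual dataset name):

```shell
# Example only -- substitute your actual pool/dataset name.
zfs get recordsize,compression,atime,sync,primarycache,secondarycache tank/projects
```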

For example, one of my projects is 2,887 directories with 139k files. Not everything is loaded at the start, of course, but I have no way of knowing which files are, or what percentage. I know that a significant portion of it is just archive packages and sets that are used only in specific circumstances.

For a better overview, this is how the throughput on the client PC looks when I'm loading a project:
[screenshot: client-side network throughput graph during project load]
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
Have you compared the time of the first and second "load"? Your NAS has a lot of RAM to cache things. Not that SSDs should introduce much latency compared to HDDs, but when it is about many small files/operations, it is all about latency. ZFS tries speculative prefetch within each file to fight this, and that is constantly improving, but it can't do much between files, since it can't predict the future. Though if you are still getting low performance even after having all the active data in ARC, then it must be network or network service latency.
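A quick client-side way to check this is to time two identical passes over the same project tree: if the second pass is much faster, the server side is caching fine and the remaining cost is per-file round-trip latency. A minimal sketch (the directory argument is whatever project folder on the mapped share you want to test):

```python
# Time two consecutive read passes over a directory tree. A large gap
# between the passes points at cold-cache disk latency; identical slow
# times point at per-file network/SMB round-trip latency instead.
import os
import sys
import time

def pass_time(root):
    # Walk the tree and read every file once, returning (seconds, files).
    start = time.perf_counter()
    count = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            with open(os.path.join(dirpath, name), "rb") as f:
                f.read()
            count += 1
    return time.perf_counter() - start, count

if __name__ == "__main__":
    if len(sys.argv) > 1:
        root = sys.argv[1]          # e.g. a project dir on the SMB share
        t1, n = pass_time(root)     # "first load" (cold-ish)
        t2, _ = pass_time(root)     # "second load" (server cache warm)
        print(f"{n} files: first pass {t1:.1f}s, second pass {t2:.1f}s")
    else:
        print("usage: python two_pass.py <directory>")
```

Note that the client's own cache can also warm up between passes, so for a clean server-side comparison the Windows client would need its cache cleared (or a reboot) between runs.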
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I'd be interested to see if there's a large amount of metadata misses showing on arcstat.py or arc_summary.py - this symptom is somewhat similar to the "slow performance on rsync" issue described by some users when there's metadata thrashing in ARC.
 

Tomaae

Dabbler
Joined
Feb 24, 2021
Messages
12
Have you compared the time of the first and second "load"? Your NAS has a lot of RAM to cache things. Not that SSDs should introduce much latency compared to HDDs, but when it is about many small files/operations, it is all about latency. ZFS tries speculative prefetch within each file to fight this, and that is constantly improving, but it can't do much between files, since it can't predict the future. Though if you are still getting low performance even after having all the active data in ARC, then it must be network or network service latency.
I ran it multiple times before posting the results. The synthetic test was actually faster the first time, but that's most likely not relevant, as those are not real data.
Is there a way I can test remote share latency? That sounds like something worth checking, especially since SMB is involved.

But I don't know if I have all the active data in ARC. The server is also used for Plex (I pretty much listen to audiobooks 24/7), so there is probably a lot of useless stuff moved into ARC. Maybe if there were a way to exclude datasets from ARC, it would get much more efficient. But I'm not that knowledgeable in ZFS.
Edit: Found out how, so I changed all Plex and archive datasets to primarycache=metadata. I rebooted the NAS afterwards to get a clean slate and tested everything several times to put it into ARC, but there was no noticeable improvement. Still, it makes more sense not to cache data I use once per week or less.
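The per-dataset change described in the edit looks like this from the shell (dataset names are examples, not the actual pool layout):

```shell
# Keep metadata cached but stop caching file data for rarely-read datasets.
# Dataset names below are examples -- substitute your own.
zfs set primarycache=metadata tank/plex
zfs set primarycache=metadata tank/archive

# Verify the change took effect:
zfs get primarycache tank/plex tank/archive
```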
 
Last edited:

Tomaae

Dabbler
Joined
Feb 24, 2021
Messages
12
I'd be interested to see if there's a large amount of metadata misses showing on arcstat.py or arc_summary.py - this symptom is somewhat similar to the "slow performance on rsync" issue described by some users when there's metadata thrashing in ARC.

Sure, here it is:
Code:
# arcstat -a 10
    time  hits  miss  read  hit%  miss%  dhit  dmis  dh%  dm%  phit  pmis  ph%  pm%  mhit  mmis  mread  mh%  mm%  arcsz  size     c   mfu   mru  mfug  mrug  eskip  el2skip  el2cach  el2el  el2mfu  el2mru  el2inel  mtxmis  dread  pread  grow  need  free  avail  waste
22:31:39     0     0     0     0      0     0     0    0    0     0     0    0    0     0     0      0    0    0    99G   99G  100G     0     0     0     0      0        0        0      0       0       0        0       0      0      0     1     0   12G    10G    79M
22:31:49   71K     0   71K   100      0   71K     0  100    0     0     0    0    0   71K     0    71K  100    0    99G   99G  100G   70K   227     0     0      0        0        0      0       0       0        0       0    71K      0     1     0   12G    10G    79M
22:31:59  144K     0  144K   100      0  144K     0  100    0     0     0    0    0  144K     0   144K  100    0    99G   99G  100G  144K     6     0     0      0        0        0      0       0       0        0       0   144K      0     1     0   12G    10G    79M
22:32:09  144K     0  144K   100      0  144K     0  100    0     0     0    0    0  143K     0   143K  100    0    99G   99G  100G  144K    74     0     0      0        0        0      0       0       0        0       0   144K      0     1     0   12G    10G    79M
22:32:19   47K     0   47K   100      0   47K     0  100    0     0     0    0    0   47K     0    47K  100    0    99G   99G  100G   47K     0     0     0      0        0        0      0       0       0        0       0    47K      0     1     0   12G    10G    79M
22:32:29   46K     0   46K   100      0   46K     0  100    0     0     0    0    0   45K     0    45K  100    0    99G   99G  100G   46K     0     0     0      0        0        0      0       0       0        0       0    46K      0     1     0   12G    10G    79M
22:32:39   47K     0   47K   100      0   47K     0  100    0     0     0    0    0   47K     0    47K  100    0    99G   99G  100G   47K     0     0     0      0        0        0      0       0       0        0       0    47K      0     1     0   12G    10G    79M
22:32:49   39K     0   39K   100      0   39K     0  100    0     0     0    0    0   38K     0    38K  100    0    99G   99G  100G   39K    11     0     0      0        0        0      0       0       0        0       0    39K      0     1     0   12G    10G    79M
22:32:59  139K     0  139K   100      0  139K     0  100    0     0     0    0    0  139K     0   139K  100    0    99G   99G  100G  139K    53     0     0      0        0        0      0       0       0        0       0   139K      0     1     0   12G    10G    79M
22:33:09   56K     1   56K    99      0   56K     1   99    0     0     0    0    0   55K     0    55K  100    0    99G   99G  100G   56K   108     0     1      0        0        0      0       0       0        0       0    56K      0     1     0   12G    10G    79M
22:33:19  183K     0  183K   100      0  183K     0  100    0     0     0    0    0  183K     0   183K  100    0    99G   99G  100G  183K    36     0     0      0        0        0      0       0       0        0       0   183K      0     1     0   12G    10G    79M
22:33:29  147K     0  147K   100      0  147K     0  100    0     0     0    0    0  146K     0   146K  100    0    99G   99G  100G  146K    69     0     0      0        0        0      0       0       0        0       0   147K      0     1     0   12G    10G    79M
22:33:39   85K     0   85K   100      0   85K     0  100    0     0     0  100    0   85K     0    85K  100    0    99G   99G  100G   85K    99     0     0      0        0        0      0       0       0        0       0    85K      0     1     0   12G    10G    79M
22:33:49  152K     0  152K   100      0  152K     0  100    0     0     0    0    0  152K     0   152K  100    0    99G   99G  100G  152K    16     0     0      0        0        0      0       0       0        0       0   152K      0     1     0   12G    10G    79M
22:33:59  126K     0  126K    99      0  126K     0   99    0     0     0    0    0  126K     0   126K  100    0    99G   99G  100G  126K   121     0     0      0        0        0      0       0       0        0       0   126K      0     1     0   12G    10G    79M
22:34:09   39K     0   39K   100      0   39K     0  100    0     0     0    0    0   39K     0    39K  100    0    99G   99G  100G   39K    14     0     0      0        0        0      0       0       0        0       0    39K      0     1     0   12G    10G    79M
22:34:19   25K     0   25K   100      0   25K     0  100    0     0     0    0    0   25K     0    25K  100    0    99G   99G  100G   25K     5     0     0      0        0        0      0       0       0        0       0    25K      0     1     0   12G    10G    79M

 

Tomaae

Dabbler
Joined
Feb 24, 2021
Messages
12
I'd be interested to see if there's a large amount of metadata misses showing on arcstat.py or arc_summary.py - this symptom is somewhat similar to the "slow performance on rsync" issue described by some users when there's metadata thrashing in ARC.
2nd part, because of post character limit
------------------------------------------------------------------------
ZFS Subsystem Report Thu Oct 13 22:36:03 2022
FreeBSD 13.1-RELEASE-p1 zpl version 5
Machine: *** (amd64) spa version 5000

ARC status: HEALTHY
Memory throttle count: 0

ARC size (current): 100.0 % 100.0 GiB
Target size (adaptive): 100.0 % 100.0 GiB
Min size (hard limit): 4.0 % 4.0 GiB
Max size (high water): 25:1 100.0 GiB
Most Frequently Used (MFU) cache size: 90.7 % 89.2 GiB
Most Recently Used (MRU) cache size: 9.3 % 9.2 GiB
Metadata cache size (hard limit): 75.0 % 75.0 GiB
Metadata cache size (current): 3.9 % 2.9 GiB
Dnode cache size (hard limit): 10.0 % 7.5 GiB
Dnode cache size (current): 10.2 % 785.2 MiB

ARC hash breakdown:
Elements max: 1.3M
Elements current: 71.2 % 947.8k
Collisions: 4.4M
Chain max: 5
Chains: 26.0k

ARC misc:
Deleted: 3.1M
Mutex misses: 269
Eviction skips: 11.5k
Eviction skips due to L2 writes: 0
L2 cached evictions: 0 Bytes
L2 eligible evictions: 2.3 TiB
L2 eligible MFU evictions: 19.4 % 461.6 GiB
L2 eligible MRU evictions: 80.6 % 1.9 TiB
L2 ineligible evictions: 78.2 GiB

ARC total accesses (hits + misses): 14.2G
Cache hit ratio: 100.0 % 14.2G
Cache miss ratio: < 0.1 % 5.2M
Actual hit ratio (MFU + MRU hits): 99.9 % 14.2G
Data demand efficiency: 99.8 % 251.4M
Data prefetch efficiency: 19.4 % 1.1M

Cache hits by cache type:
Most frequently used (MFU): 99.3 % 14.1G
Most recently used (MRU): 0.7 % 94.3M
Most frequently used (MFU) ghost: < 0.1 % 2.8M
Most recently used (MRU) ghost: < 0.1 % 501.9k
Anonymously used: < 0.1 % 2.9M

Cache hits by data type:
Demand data: 1.8 % 250.8M
Demand prefetch data: < 0.1 % 217.0k
Demand metadata: 97.5 % 13.9G
Demand prefetch metadata: 0.7 % 102.9M

Cache misses by data type:
Demand data: 11.7 % 604.9k
Demand prefetch data: 17.5 % 902.6k
Demand metadata: 22.0 % 1.1M
Demand prefetch metadata: 48.8 % 2.5M

DMU prefetch efficiency: 147.3M
Hit ratio: 5.8 % 8.5M
Miss ratio: 94.2 % 138.8M

L2ARC not detected, skipping section

Tunables:
abd_scatter_enabled 1
abd_scatter_min_size 4097
allow_redacted_dataset_mount 0
anon_data_esize 0
anon_metadata_esize 0
anon_size 4624384
arc.average_blocksize 8192
arc.dnode_limit 0
arc.dnode_limit_percent 10
arc.dnode_reduce_percent 10
arc.evict_batch_limit 10
arc.eviction_pct 200
arc.grow_retry 0
arc.lotsfree_percent 10
arc.max 107374182400
arc.meta_adjust_restarts 4096
arc.meta_limit 0
arc.meta_limit_percent 75
arc.meta_min 0
arc.meta_prune 10000
arc.meta_strategy 1
arc.min 0
arc.min_prefetch_ms 0
arc.min_prescient_prefetch_ms 0
arc.p_dampener_disable 1
arc.p_min_shift 0
arc.pc_percent 0
arc.prune_task_threads 1
arc.shrink_shift 0
arc.sys_free 0
arc_free_target 695549
arc_max 107374182400
arc_min 0
arc_no_grow_shift 5
async_block_max_blocks 18446744073709551615
autoimport_disable 1
ccw_retry_interval 300
checksum_events_per_second 20
commit_timeout_pct 5
compressed_arc_enabled 1
condense.indirect_commit_entry_delay_ms 0
condense.indirect_obsolete_pct 25
condense.indirect_vdevs_enable 1
condense.max_obsolete_bytes 1073741824
condense.min_mapping_bytes 131072
condense_pct 200
crypt_sessions 0
dbgmsg_enable 1
dbgmsg_maxsize 4194304
dbuf.cache_shift 5
dbuf.metadata_cache_max_bytes 18446744073709551615
dbuf.metadata_cache_shift 6
dbuf_cache.hiwater_pct 10
dbuf_cache.lowater_pct 10
dbuf_cache.max_bytes 18446744073709551615
dbuf_state_index 0
ddt_data_is_special 1
deadman.checktime_ms 60000
deadman.enabled 1
deadman.failmode wait
deadman.synctime_ms 600000
deadman.ziotime_ms 300000
debug 0
debugflags 0
dedup.prefetch 0
default_bs 9
default_ibs 15
delay_min_dirty_percent 60
delay_scale 500000
dirty_data_max 4294967296
dirty_data_max_max 4294967296
dirty_data_max_max_percent 25
dirty_data_max_percent 10
dirty_data_sync_percent 20
disable_ivset_guid_check 0
dmu_object_alloc_chunk_shift 7
dmu_offset_next_sync 1
dmu_prefetch_max 134217728
dtl_sm_blksz 4096
embedded_slog_min_ms 64
flags 0
fletcher_4_impl [fastest] scalar superscalar superscalar4 sse2 ssse3 avx2
free_bpobj_enabled 1
free_leak_on_eio 0
free_min_time_ms 1000
history_output_max 1048576
immediate_write_sz 32768
initialize_chunk_size 1048576
initialize_value 16045690984833335022
keep_log_spacemaps_at_export 0
l2arc.feed_again 1
l2arc.feed_min_ms 200
l2arc.feed_secs 1
l2arc.headroom 2
l2arc.headroom_boost 200
l2arc.meta_percent 33
l2arc.mfuonly 0
l2arc.noprefetch 1
l2arc.norw 0
l2arc.rebuild_blocks_min_l2size 1073741824
l2arc.rebuild_enabled 0
l2arc.trim_ahead 0
l2arc.write_boost 8388608
l2arc.write_max 8388608
l2arc_feed_again 1
l2arc_feed_min_ms 200
l2arc_feed_secs 1
l2arc_headroom 2
l2arc_noprefetch 1
l2arc_norw 0
l2arc_write_boost 8388608
l2arc_write_max 8388608
l2c_only_size 0
livelist.condense.new_alloc 0
livelist.condense.sync_cancel 0
livelist.condense.sync_pause 0
livelist.condense.zthr_cancel 0
livelist.condense.zthr_pause 0
livelist.max_entries 500000
livelist.min_percent_shared 75
lua.max_instrlimit 100000000
lua.max_memlimit 104857600
max_async_dedup_frees 100000
max_auto_ashift 16
max_dataset_nesting 50
max_log_walking 5
max_logsm_summary_length 10
max_missing_tvds 0
max_missing_tvds_cachefile 2
max_missing_tvds_scan 0
max_nvlist_src_size 0
max_recordsize 1048576
metaslab.aliquot 1048576
metaslab.bias_enabled 1
metaslab.debug_load 0
metaslab.debug_unload 0
metaslab.df_alloc_threshold 131072
metaslab.df_free_pct 4
metaslab.df_max_search 16777216
metaslab.df_use_largest_segment 0
metaslab.find_max_tries 100
metaslab.force_ganging 16777217
metaslab.fragmentation_factor_enabled 1
metaslab.fragmentation_threshold 70
metaslab.lba_weighting_enabled 1
metaslab.load_pct 50
metaslab.max_size_cache_sec 3600
metaslab.mem_limit 25
metaslab.preload_enabled 1
metaslab.preload_limit 10
metaslab.segment_weight_enabled 1
metaslab.sm_blksz_no_log 16384
metaslab.sm_blksz_with_log 131072
metaslab.switch_threshold 2
metaslab.try_hard_before_gang 0
metaslab.unload_delay 32
metaslab.unload_delay_ms 600000
mfu_data_esize 93321028608
mfu_ghost_data_esize 56229888
mfu_ghost_metadata_esize 6841636352
mfu_ghost_size 6897866240
mfu_metadata_esize 434672640
mfu_size 95795588608
mg.fragmentation_threshold 95
mg.noalloc_threshold 0
min_auto_ashift 9
min_metaslabs_to_flush 1
mru_data_esize 6619606528
mru_ghost_data_esize 96284251136
mru_ghost_metadata_esize 721069568
mru_ghost_size 97005320704
mru_metadata_esize 10984448
mru_size 9838829056
multihost.fail_intervals 10
multihost.history 0
multihost.import_intervals 20
multihost.interval 1000
multilist_num_sublists 0
no_scrub_io 0
no_scrub_prefetch 0
nocacheflush 0
nopwrite_enabled 1
obsolete_min_time_ms 500
pd_bytes_max 52428800
per_txg_dirty_frees_percent 5
prefetch.array_rd_sz 1048576
prefetch.disable 0
prefetch.max_distance 67108864
prefetch.max_idistance 67108864
prefetch.max_sec_reap 2
prefetch.max_streams 8
prefetch.min_distance 4194304
prefetch.min_sec_reap 1
read_history 0
read_history_hits 0
rebuild_max_segment 1048576
rebuild_scrub_enabled 1
rebuild_vdev_limit 33554432
reconstruct.indirect_combinations_max 4096
recover 0
recv.queue_ff 20
recv.queue_length 16777216
recv.write_batch_size 1048576
removal_suspend_progress 0
remove_max_segment 16777216
resilver_disable_defer 0
resilver_min_time_ms 3000
scan_blkstats 0
scan_checkpoint_intval 7200
scan_fill_weight 3
scan_ignore_errors 0
scan_issue_strategy 0
scan_legacy 0
scan_max_ext_gap 2097152
scan_mem_lim_fact 20
scan_mem_lim_soft_fact 20
scan_strict_mem_lim 0
scan_suspend_progress 0
scan_vdev_limit 4194304
scrub_min_time_ms 1000
send.corrupt_data 0
send.no_prefetch_queue_ff 20
send.no_prefetch_queue_length 1048576
send.override_estimate_recordsize 0
send.queue_ff 20
send.queue_length 16777216
send.unmodified_spill_blocks 1
send_holes_without_birth_time 1
slow_io_events_per_second 20
spa.asize_inflation 24
spa.discard_memory_limit 16777216
spa.load_print_vdev_tree 0
spa.load_verify_data 1
spa.load_verify_metadata 1
spa.load_verify_shift 4
spa.slop_shift 5
space_map_ibs 14
special_class_metadata_reserve_pct 25
standard_sm_blksz 131072
super_owner 0
sync_pass_deferred_free 2
sync_pass_dont_compress 8
sync_pass_rewrite 2
sync_taskq_batch_pct 75
top_maxinflight 1000
traverse_indirect_prefetch_limit 32
trim.extent_bytes_max 134217728
trim.extent_bytes_min 32768
trim.metaslab_skip 0
trim.queue_limit 10
trim.txg_batch 32
txg.history 100
txg.timeout 5
unflushed_log_block_max 131072
unflushed_log_block_min 1000
unflushed_log_block_pct 400
unflushed_log_txg_max 1000
unflushed_max_mem_amt 1073741824
unflushed_max_mem_ppm 1000
user_indirect_is_special 1
validate_skip 0
vdev.aggregate_trim 0
vdev.aggregation_limit 1048576
vdev.aggregation_limit_non_rotating 131072
vdev.async_read_max_active 3
vdev.async_read_min_active 1
vdev.async_write_active_max_dirty_percent 60
vdev.async_write_active_min_dirty_percent 30
vdev.async_write_max_active 5
vdev.async_write_min_active 1
vdev.bio_delete_disable 0
vdev.bio_flush_disable 0
vdev.cache_bshift 16
vdev.cache_max 16384
vdev.cache_size 0
vdev.def_queue_depth 32
vdev.default_ms_count 200
vdev.default_ms_shift 29
vdev.file.logical_ashift 9
vdev.file.physical_ashift 9
vdev.initializing_max_active 1
vdev.initializing_min_active 1
vdev.max_active 1000
vdev.max_auto_ashift 16
vdev.min_auto_ashift 9
vdev.min_ms_count 16
vdev.mirror.non_rotating_inc 0
vdev.mirror.non_rotating_seek_inc 1
vdev.mirror.rotating_inc 0
vdev.mirror.rotating_seek_inc 5
vdev.mirror.rotating_seek_offset 1048576
vdev.ms_count_limit 131072
vdev.nia_credit 5
vdev.nia_delay 5
vdev.queue_depth_pct 1000
vdev.read_gap_limit 32768
vdev.rebuild_max_active 3
vdev.rebuild_min_active 1
vdev.removal_ignore_errors 0
vdev.removal_max_active 2
vdev.removal_max_span 32768
vdev.removal_min_active 1
vdev.removal_suspend_progress 0
vdev.remove_max_segment 16777216
vdev.scrub_max_active 3
vdev.scrub_min_active 1
vdev.sync_read_max_active 10
vdev.sync_read_min_active 10
vdev.sync_write_max_active 10
vdev.sync_write_min_active 10
vdev.trim_max_active 2
vdev.trim_min_active 1
vdev.validate_skip 0
vdev.write_gap_limit 4096
version.acl 1
version.ioctl 15
version.module v2022081800-zfs_27f9f911a
version.spa 5000
version.zpl 5
vnops.read_chunk_size 1048576
vol.mode 2
vol.recursive 0
vol.unmap_enabled 1
xattr_compat 1
zap_iterate_prefetch 1
zevent.len_max 512
zevent.retain_expire_secs 900
zevent.retain_max 2000
zfetch.max_distance 67108864
zfetch.max_idistance 67108864
zil.clean_taskq_maxalloc 1048576
zil.clean_taskq_minalloc 1024
zil.clean_taskq_nthr_pct 100
zil.maxblocksize 131072
zil.nocacheflush 0
zil.replay_disable 0
zil.slog_bulk 786432
zio.deadman_log_all 0
zio.dva_throttle_enabled 1
zio.exclude_metadata 0
zio.requeue_io_start_cut_in_line 1
zio.slow_io_ms 30000
zio.taskq_batch_pct 80
zio.taskq_batch_tpq 0
zio.use_uma 1

VDEV cache disabled, skipping section

ZIL committed transactions: 32.0M
Commit requests: 8.3M
Flushes to stable storage: 8.3M
Transactions to SLOG storage pool: 0 Bytes 0
Transactions to non-SLOG storage pool: 147.1 GiB 8.2M
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
2nd part, because of post character limit
Code:
ARC total accesses (hits + misses): 14.2G
Cache hit ratio: 100.0 % 14.2G
Cache miss ratio: < 0.1 % 5.2M
Actual hit ratio (MFU + MRU hits): 99.9 % 14.2G
Data demand efficiency: 99.8 % 251.4M
Data prefetch efficiency: 19.4 % 1.1M

Cache hits by cache type:
Most frequently used (MFU): 99.3 % 14.1G
Most recently used (MRU): 0.7 % 94.3M
Most frequently used (MFU) ghost: < 0.1 % 2.8M
Most recently used (MRU) ghost: < 0.1 % 501.9k
Anonymously used: < 0.1 % 2.9M

Cache hits by data type:
Demand data: 1.8 % 250.8M
Demand prefetch data: < 0.1 % 217.0k
Demand metadata: 97.5 % 13.9G
Demand prefetch metadata: 0.7 % 102.9M

Cache misses by data type:
Demand data: 11.7 % 604.9k
Demand prefetch data: 17.5 % 902.6k
Demand metadata: 22.0 % 1.1M
Demand prefetch metadata: 48.8 % 2.5M

Looking at those numbers, I can't see it being a metadata problem - or really a disk speed problem at all, given that it's actually recording a 100% cache hit ratio.

Handling lots of small files is a tough workload, especially over SMB - @anodos has catalogued a lot of experiments (albeit on an older build) here:

 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
The auxiliary parameters are unsupported. Some are invalid, some are unsafe for the general use case. The Samba 4.15 symlink safety checks have known performance impacts (we have not internally quantified them yet, but upstream reported that they can be significant in certain workloads) that get amplified when dealing with large numbers of files. Any old benchmarking I did ages ago before I joined iX is woefully out of date and shouldn't be considered authoritative.

Samba 4.17 / BlueFin BETA 1 currently have fixes for the symlink safety regression.
 

Tomaae

Dabbler
Joined
Feb 24, 2021
Messages
12
The auxiliary parameters are unsupported. Some are invalid, some are unsafe for the general use case. The Samba 4.15 symlink safety checks have known performance impacts (we have not internally quantified them yet, but upstream reported that they can be significant in certain workloads) that get amplified when dealing with large numbers of files. Any old benchmarking I did ages ago before I joined iX is woefully out of date and shouldn't be considered authoritative.

Samba 4.17 / BlueFin BETA 1 currently have fixes for the symlink safety regression.
Well, I would not expect iX to officially support tuning via aux params directly (outside of the ones implemented as checkboxes). After all, that's all Samba, and there are a lot of them.
Is there a way I can disable the symlink safety checks to test this on my system?
 
Last edited:

Tomaae

Dabbler
Joined
Feb 24, 2021
Messages
12
I did a test with a really small project.
All loads were from zero, so even the development UI was closed, which adds up quite a bit. But I'm not able to time that part, as it is not possible to open the UI without a project.
Locally it takes 16 seconds to open. From the NAS, 56 seconds after the 5th load (to ensure it's all cached).
 