I've been running my TrueNAS setup for about two years now. It's been great so far, but I've recently hit a bit of a snag and could use your expertise!
It's a pretty basic setup:
VM with 32 GB RAM & 2 cores
4x 4 TB SATA drives in RAIDZ1, passed through from the host to the NAS system
1 pool, 2 datasets (a new one with dedup enabled, the other without)
I'm trying to transfer about 1 TB of backups from the dataset without dedup to the new dataset with dedup enabled, to take advantage of deduplication for the backups. The process goes like this:
1) copy one of the 25 GB daily backup files out of the un-deduped dataset to a linked system that temporarily holds it in RAM
2) then copy the data back into the deduped dataset from the temp system
3) clear the temporary hold location
4) repeat
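The loop is roughly this (a sketch only; the dataset and staging paths are placeholders, not my real mountpoints):

```shell
#!/bin/sh
# Sketch of the transfer loop. The paths below are placeholders for the
# source (un-deduped) dataset, the deduped target, and the RAM-backed
# staging directory on the linked system.

# Move one backup file through the RAM staging area into the deduped dataset.
transfer_one() {
    file=$1 tmp=$2 dst=$3
    cp "$file" "$tmp/"                    # 1) stage the ~25 GB file in RAM
    cp "$tmp/$(basename "$file")" "$dst/" # 2) copy it into the deduped dataset
    rm "$tmp/$(basename "$file")"         # 3) clear the staging area
}

# 4) repeat for every daily backup file:
for f in /mnt/tank/backups/*.bak; do      # placeholder source path
    [ -e "$f" ] || continue               # skip if the glob matched nothing
    transfer_one "$f" /mnt/ramdisk /mnt/tank/backups-dedup
done
```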
For a while (30-60 minutes) it runs great, with fantastic dedup ratios. Then it hits a wall and the read rate tanks hard.
Things I've tried:
updated from 12.0-U8.1 to the most recent version, TrueNAS-13.0-U4
set vfs.zfs.arc.meta_min=8589934592 (I suspected ARC metadata thrashing)
disabled the SMB service
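For reference, the meta_min tunable was applied like this (on TrueNAS Core it should go in System → Tunables so it persists; shown here as the equivalent one-shot sysctl):

```shell
# Raise the floor for ARC metadata to 8 GiB (8589934592 bytes).
sysctl vfs.zfs.arc.meta_min=8589934592

# Verify it took effect (this value also shows up under "Tunables"
# in the arc_summary output below):
sysctl vfs.zfs.arc.meta_min
```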
The only thing that fixes it is a full reboot of the system, after which speeds return to normal.
Here's a visual so you can get the idea. Let me know what other logs or data you want to see. Thanks in advance!!!
Code:
root@NAS[~]# arc_summary
------------------------------------------------------------------------
ZFS Subsystem Report Fri Mar 31 00:07:07 2023
FreeBSD 13.1-RELEASE-p7 zpl version 5
Machine: NAS.landoscloud.com (amd64) spa version 5000
ARC status: HEALTHY
Memory throttle count: 0
ARC size (current): 74.9 % 23.2 GiB
Target size (adaptive): 75.8 % 23.5 GiB
Min size (hard limit): 3.2 % 1022.4 MiB
Max size (high water): 30:1 30.9 GiB
Most Frequently Used (MFU) cache size: 58.2 % 12.2 GiB
Most Recently Used (MRU) cache size: 41.8 % 8.8 GiB
Metadata cache size (hard limit): 75.0 % 23.2 GiB
Metadata cache size (current): 4.0 % 941.6 MiB
Dnode cache size (hard limit): 34.5 % 8.0 GiB
Dnode cache size (current): 1.7 % 136.6 MiB
ARC hash breakdown:
Elements max: 427.3k
Elements current: 100.0 % 427.3k
Collisions: 316.6k
Chain max: 4
Chains: 20.4k
ARC misc:
Deleted: 3.5M
Mutex misses: 3.1k
Eviction skips: 84.5k
Eviction skips due to L2 writes: 0
L2 cached evictions: 0 Bytes
L2 eligible evictions: 189.9 GiB
L2 eligible MFU evictions: 1.1 % 2.1 GiB
L2 eligible MRU evictions: 98.9 % 187.7 GiB
L2 ineligible evictions: 240.5 GiB
ARC total accesses (hits + misses): 35.8M
Cache hit ratio: 94.0 % 33.6M
Cache miss ratio: 6.0 % 2.2M
Actual hit ratio (MFU + MRU hits): 93.9 % 33.6M
Data demand efficiency: 64.0 % 1.7M
Data prefetch efficiency: < 0.1 % 1.4M
Cache hits by cache type:
Most frequently used (MFU): 83.8 % 28.2M
Most recently used (MRU): 16.2 % 5.4M
Most frequently used (MFU) ghost: 0.2 % 59.0k
Most recently used (MRU) ghost: < 0.1 % 16.3k
Cache hits by data type:
Demand data: 3.3 % 1.1M
Prefetch data: < 0.1 % 14
Demand metadata: 96.7 % 32.5M
Prefetch metadata: < 0.1 % 5.7k
Cache misses by data type:
Demand data: 29.1 % 628.7k
Prefetch data: 63.0 % 1.4M
Demand metadata: 7.6 % 164.0k
Prefetch metadata: 0.3 % 6.7k
DMU prefetch efficiency: 1.8M
Hit ratio: 51.6 % 949.9k
Miss ratio: 48.4 % 889.3k
L2ARC not detected, skipping section
Tunables:
abd_scatter_enabled 1
abd_scatter_min_size 4097
allow_redacted_dataset_mount 0
anon_data_esize 0
anon_metadata_esize 0
anon_size 2015012864
arc.average_blocksize 8192
arc.dnode_limit 0
arc.dnode_limit_percent 10
arc.dnode_reduce_percent 10
arc.evict_batch_limit 10
arc.eviction_pct 200
arc.grow_retry 0
arc.lotsfree_percent 10
arc.max 0
arc.meta_adjust_restarts 4096
arc.meta_limit 0
arc.meta_limit_percent 75
arc.meta_min 8589934592
arc.meta_prune 10000
arc.meta_strategy 1
arc.min 0
arc.min_prefetch_ms 0
arc.min_prescient_prefetch_ms 0
arc.p_dampener_disable 1
arc.p_min_shift 0
arc.pc_percent 0
arc.prune_task_threads 1
arc.shrink_shift 0
arc.sys_free 0
arc_free_target 173776
arc_max 0
arc_min 0
arc_no_grow_shift 5
async_block_max_blocks 18446744073709551615
autoimport_disable 1
btree_verify_intensity 0
ccw_retry_interval 300
checksum_events_per_second 20
commit_timeout_pct 5
compressed_arc_enabled 1
condense.indirect_commit_entry_delay_ms 0
condense.indirect_obsolete_pct 25
condense.indirect_vdevs_enable 1
condense.max_obsolete_bytes 1073741824
condense.min_mapping_bytes 131072
condense_pct 200
crypt_sessions 0
dbgmsg_enable 1
dbgmsg_maxsize 4194304
dbuf.cache_shift 5
dbuf.metadata_cache_max_bytes 18446744073709551615
dbuf.metadata_cache_shift 6
dbuf_cache.hiwater_pct 10
dbuf_cache.lowater_pct 10
dbuf_cache.max_bytes 18446744073709551615
dbuf_state_index 0
ddt_data_is_special 1
deadman.checktime_ms 60000
deadman.enabled 1
deadman.failmode wait
deadman.synctime_ms 600000
deadman.ziotime_ms 300000
debug 0
debugflags 0
dedup.prefetch 0
default_bs 9
default_ibs 15
delay_min_dirty_percent 60
delay_scale 500000
dirty_data_max 3430569164
dirty_data_max_max 4294967296
dirty_data_max_max_percent 25
dirty_data_max_percent 10
dirty_data_sync_percent 20
disable_ivset_guid_check 0
dmu_object_alloc_chunk_shift 7
dmu_offset_next_sync 1
dmu_prefetch_max 134217728
dtl_sm_blksz 4096
embedded_slog_min_ms 64
flags 0
fletcher_4_impl [fastest] scalar superscalar superscalar4 sse2 ssse3 avx2 avx512f
free_bpobj_enabled 1
free_leak_on_eio 0
free_min_time_ms 1000
history_output_max 1048576
immediate_write_sz 32768
initialize_chunk_size 1048576
initialize_value 16045690984833335022
keep_log_spacemaps_at_export 0
l2arc.exclude_special 0
l2arc.feed_again 1
l2arc.feed_min_ms 200
l2arc.feed_secs 1
l2arc.headroom 2
l2arc.headroom_boost 200
l2arc.meta_percent 33
l2arc.mfuonly 0
l2arc.noprefetch 1
l2arc.norw 0
l2arc.rebuild_blocks_min_l2size 1073741824
l2arc.rebuild_enabled 0
l2arc.trim_ahead 0
l2arc.write_boost 8388608
l2arc.write_max 8388608
l2arc_feed_again 1
l2arc_feed_min_ms 200
l2arc_feed_secs 1
l2arc_headroom 2
l2arc_noprefetch 1
l2arc_norw 0
l2arc_write_boost 8388608
l2arc_write_max 8388608
l2c_only_size 0
livelist.condense.new_alloc 0
livelist.condense.sync_cancel 0
livelist.condense.sync_pause 0
livelist.condense.zthr_cancel 0
livelist.condense.zthr_pause 0
livelist.max_entries 500000
livelist.min_percent_shared 75
lua.max_instrlimit 100000000
lua.max_memlimit 104857600
max_async_dedup_frees 100000
max_auto_ashift 14
max_dataset_nesting 50
max_log_walking 5
max_logsm_summary_length 10
max_missing_tvds 0
max_missing_tvds_cachefile 2
max_missing_tvds_scan 0
max_nvlist_src_size 0
max_recordsize 1048576
metaslab.aliquot 1048576
metaslab.bias_enabled 1
metaslab.debug_load 0
metaslab.debug_unload 0
metaslab.df_alloc_threshold 131072
metaslab.df_free_pct 4
metaslab.df_max_search 16777216
metaslab.df_use_largest_segment 0
metaslab.find_max_tries 100
metaslab.force_ganging 16777217
metaslab.fragmentation_factor_enabled 1
metaslab.fragmentation_threshold 70
metaslab.lba_weighting_enabled 1
metaslab.load_pct 50
metaslab.max_size_cache_sec 3600
metaslab.mem_limit 25
metaslab.preload_enabled 1
metaslab.preload_limit 10
metaslab.segment_weight_enabled 1
metaslab.sm_blksz_no_log 16384
metaslab.sm_blksz_with_log 131072
metaslab.switch_threshold 2
metaslab.try_hard_before_gang 0
metaslab.unload_delay 32
metaslab.unload_delay_ms 600000
mfu_data_esize 12987881984
mfu_ghost_data_esize 1798595072
mfu_ghost_metadata_esize 25378304
mfu_ghost_size 1823973376
mfu_metadata_esize 40105472
mfu_size 13099472896
mg.fragmentation_threshold 95
mg.noalloc_threshold 0
min_auto_ashift 9
min_metaslabs_to_flush 1
mru_data_esize 7640310784
mru_ghost_data_esize 16270403584
mru_ghost_metadata_esize 90891776
mru_ghost_size 16361295360
mru_metadata_esize 362282496
mru_size 9407368704
multihost.fail_intervals 10
multihost.history 0
multihost.import_intervals 20
multihost.interval 1000
multilist_num_sublists 0
no_scrub_io 0
no_scrub_prefetch 0
nocacheflush 0
nopwrite_enabled 1
obsolete_min_time_ms 500
pd_bytes_max 52428800
per_txg_dirty_frees_percent 30
prefetch.array_rd_sz 1048576
prefetch.disable 0
prefetch.max_distance 67108864
prefetch.max_idistance 67108864
prefetch.max_sec_reap 2
prefetch.max_streams 8
prefetch.min_distance 4194304
prefetch.min_sec_reap 1
read_history 0
read_history_hits 0
rebuild_max_segment 1048576
rebuild_scrub_enabled 1
rebuild_vdev_limit 33554432
reconstruct.indirect_combinations_max 4096
recover 0
recv.queue_ff 20
recv.queue_length 16777216
recv.write_batch_size 1048576
removal_suspend_progress 0
remove_max_segment 16777216
resilver_disable_defer 0
resilver_min_time_ms 3000
scan_blkstats 0
scan_checkpoint_intval 7200
scan_fill_weight 3
scan_ignore_errors 0
scan_issue_strategy 0
scan_legacy 0
scan_max_ext_gap 2097152
scan_mem_lim_fact 20
scan_mem_lim_soft_fact 20
scan_strict_mem_lim 0
scan_suspend_progress 0
scan_vdev_limit 4194304
scrub_min_time_ms 1000
send.corrupt_data 0
send.no_prefetch_queue_ff 20
send.no_prefetch_queue_length 1048576
send.override_estimate_recordsize 0
send.queue_ff 20
send.queue_length 16777216
send.unmodified_spill_blocks 1
send_holes_without_birth_time 1
slow_io_events_per_second 20
spa.asize_inflation 24
spa.discard_memory_limit 16777216
spa.load_print_vdev_tree 0
spa.load_verify_data 1
spa.load_verify_metadata 1
spa.load_verify_shift 4
spa.slop_shift 5
space_map_ibs 14
special_class_metadata_reserve_pct 25
standard_sm_blksz 131072
super_owner 0
sync_pass_deferred_free 2
sync_pass_dont_compress 8
sync_pass_rewrite 2
sync_taskq_batch_pct 75
top_maxinflight 1000
traverse_indirect_prefetch_limit 32
trim.extent_bytes_max 134217728
trim.extent_bytes_min 32768
trim.metaslab_skip 0
trim.queue_limit 10
trim.txg_batch 32
txg.history 100
txg.timeout 5
unflushed_log_block_max 131072
unflushed_log_block_min 1000
unflushed_log_block_pct 400
unflushed_log_txg_max 1000
unflushed_max_mem_amt 1073741824
unflushed_max_mem_ppm 1000
user_indirect_is_special 1
validate_skip 0
vdev.aggregate_trim 0
vdev.aggregation_limit 1048576
vdev.aggregation_limit_non_rotating 131072
vdev.async_read_max_active 3
vdev.async_read_min_active 1
vdev.async_write_active_max_dirty_percent 60
vdev.async_write_active_min_dirty_percent 30
vdev.async_write_max_active 5
vdev.async_write_min_active 1
vdev.bio_delete_disable 0
vdev.bio_flush_disable 0
vdev.cache_bshift 16
vdev.cache_max 16384
vdev.cache_size 0
vdev.def_queue_depth 32
vdev.default_ms_count 200
vdev.default_ms_shift 29
vdev.file.logical_ashift 9
vdev.file.physical_ashift 9
vdev.initializing_max_active 1
vdev.initializing_min_active 1
vdev.max_active 1000
vdev.max_auto_ashift 14
vdev.min_auto_ashift 9
vdev.min_ms_count 16
vdev.mirror.non_rotating_inc 0
vdev.mirror.non_rotating_seek_inc 1
vdev.mirror.rotating_inc 0
vdev.mirror.rotating_seek_inc 5
vdev.mirror.rotating_seek_offset 1048576
vdev.ms_count_limit 131072
vdev.nia_credit 5
vdev.nia_delay 5
vdev.queue_depth_pct 1000
vdev.read_gap_limit 32768
vdev.rebuild_max_active 3
vdev.rebuild_min_active 1
vdev.removal_ignore_errors 0
vdev.removal_max_active 2
vdev.removal_max_span 32768
vdev.removal_min_active 1
vdev.removal_suspend_progress 0
vdev.remove_max_segment 16777216
vdev.scrub_max_active 3
vdev.scrub_min_active 1
vdev.sync_read_max_active 10
vdev.sync_read_min_active 10
vdev.sync_write_max_active 10
vdev.sync_write_min_active 10
vdev.trim_max_active 2
vdev.trim_min_active 1
vdev.validate_skip 0
vdev.write_gap_limit 4096
version.acl 1
version.ioctl 15
version.module v2023012500-zfs_9ef0b67f8
version.spa 5000
version.zpl 5
vnops.read_chunk_size 1048576
vol.mode 2
vol.recursive 0
vol.unmap_enabled 1
wrlog_data_max 6861138328
xattr_compat 1
zap_iterate_prefetch 1
zevent.len_max 512
zevent.retain_expire_secs 900
zevent.retain_max 2000
zfetch.max_distance 67108864
zfetch.max_idistance 67108864
zil.clean_taskq_maxalloc 1048576
zil.clean_taskq_minalloc 1024
zil.clean_taskq_nthr_pct 100
zil.maxblocksize 131072
zil.nocacheflush 0
zil.replay_disable 0
zil.slog_bulk 786432
zio.deadman_log_all 0
zio.dva_throttle_enabled 1
zio.exclude_metadata 0
zio.requeue_io_start_cut_in_line 1
zio.slow_io_ms 30000
zio.taskq_batch_pct 80
zio.taskq_batch_tpq 0
zio.use_uma 1
VDEV cache disabled, skipping section
ZIL committed transactions: 1.3k
Commit requests: 79
Flushes to stable storage: 79
Transactions to SLOG storage pool: 0 Bytes 0
Transactions to non-SLOG storage pool: 2.4 MiB 78