task txg_sync:14870 blocked for more than 120 seconds

kjd

Cadet
Joined
Jan 4, 2024
Messages
4
Hi All

I know this one is super common and it's usually down to I/O; I'm hoping I have a unique issue, though...

Hardware is an S451-3R1 Gigabyte NAS:

Linux version 6.1.63-production+truenas (Cobian)
AMD EPYC 7352 24-Core Processor
GIGABYTE S451-Z30-00
48GB RAM (8GB sticks)
34 x Toshiba 16TB HDD (TOSHIBA_MG08ACA16TE) SAS
2 x Toshiba SSD (boot disks, RAID 1)
Intel 10Gb NICs
I have a 430TB ZFS pool. It originally had dedupe and LZO compression enabled on it; I've since removed those. It stores mostly VERY large video files, but it also has a mix of a lot of small files, sometimes in the 270k+ region in a subtree.

Upon some deletions from the pool I end up with this in syslog:

Code:
[47609.082625] INFO: task txg_sync:14870 blocked for more than 120 seconds.
[47609.090097]       Tainted: P           OE      6.1.63-production+truenas #2
[47609.097721] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47609.106210] task:txg_sync        state:D stack:0     pid:14870 ppid:2      flags:0x00004000
[47609.115234] Call Trace:
[47609.118340]  <TASK>
[47609.121110]  __schedule+0x2ed/0x860
[47609.125265]  schedule+0x5a/0xb0
[47609.129052]  schedule_timeout+0x94/0x150
[47609.133621]  ? __bpf_trace_tick_stop+0x10/0x10
[47609.138716]  io_schedule_timeout+0x4c/0x80
[47609.143278]  __cv_timedwait_common+0x12a/0x160 [spl]
[47609.148708]  ? cpuusage_read+0x10/0x10
[47609.152908]  __cv_timedwait_io+0x15/0x20 [spl]
[47609.157816]  zio_wait+0x10b/0x220 [zfs]
[47609.162343]  spa_sync_frees+0x3a/0x70 [zfs]
[47609.167201]  spa_sync_iterate_to_convergence+0x10a/0x200 [zfs]
[47609.173705]  spa_sync+0x306/0x5d0 [zfs]
[47609.178055]  txg_sync_thread+0x1e4/0x250 [zfs]
[47609.182943]  ? txg_dispatch_callbacks+0xf0/0xf0 [zfs]
[47609.188435]  ? sigorsets+0x10/0x10 [spl]
[47609.192661]  thread_generic_wrapper+0x5a/0x70 [spl]
[47609.197850]  kthread+0xe9/0x110
[47609.201291]  ? kthread_complete_and_exit+0x20/0x20
[47609.206385]  ret_from_fork+0x22/0x30
[47609.210262]  </TASK>
[48954.437910] systemd-journald[1454]: Data hash table of /var/log/journal/e6a7eed86d154d0f8f649912be0a1a89/system.journal has a fill level at 75.0 (8533 of 11377 items, 6553600 file size, 768 bytes per hash table item), suggesting rotation.
[48954.462014] systemd-journald[1454]: /var/log/journal/e6a7eed86d154d0f8f649912be0a1a89/system.journal: Journal header limits reached or header out-of-date, rotating.
[51475.585424] INFO: task rm:307616 blocked for more than 120 seconds.
[51475.592424]       Tainted: P           OE      6.1.63-production+truenas #2
[51475.599998] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[51475.608433] task:rm              state:D stack:0     pid:307616 ppid:1      flags:0x00004006
[51475.617506] Call Trace:
[51475.620555]  <TASK>
[51475.623271]  __schedule+0x2ed/0x860
[51475.627379]  schedule+0x5a/0xb0
[51475.631139]  io_schedule+0x42/0x70
[51475.635168]  cv_wait_common+0xaa/0x130 [spl]
[51475.640084]  ? cpuusage_read+0x10/0x10
[51475.644498]  txg_wait_open+0x89/0xd0 [zfs]
[51475.649326]  dmu_free_long_range_impl+0x1e6/0x340 [zfs]
[51475.655237]  dmu_free_long_range+0x78/0xc0 [zfs]
[51475.660522]  zfs_rmnode+0x34c/0x440 [zfs]
[51475.665211]  zfs_inactive+0x127/0x200 [zfs]
[51475.670063]  zpl_evict_inode+0x36/0x50 [zfs]
[51475.675003]  evict+0xd0/0x1d0
[51475.678279]  do_unlinkat+0x170/0x320
[51475.682152]  __x64_sys_unlinkat+0x33/0x60
[51475.686464]  do_syscall_64+0x5b/0xc0
[51475.690343]  ? ksys_write+0x6b/0xf0
[51475.694128]  ? syscall_exit_to_user_mode+0x27/0x40
[51475.699215]  ? do_syscall_64+0x67/0xc0
[51475.703266]  ? syscall_exit_to_user_mode+0x27/0x40
[51475.708369]  ? do_syscall_64+0x67/0xc0
[51475.712423]  ? do_syscall_64+0x67/0xc0
[51475.716477]  ? syscall_exit_to_user_mode+0x27/0x40
[51475.721570]  ? do_syscall_64+0x67/0xc0
[51475.725616]  ? do_syscall_64+0x67/0xc0
[51475.729650]  entry_SYSCALL_64_after_hwframe+0x64/0xce
[51475.734987] RIP: 0033:0x7ffa28b2a8d7
[51475.738850] RSP: 002b:00007ffc1d642c28 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
[51475.746711] RAX: ffffffffffffffda RBX: 000055f7be946c40 RCX: 00007ffa28b2a8d7
[51475.754133] RDX: 0000000000000000 RSI: 000055f7be946d40 RDI: 0000000000000004
[51475.761561] RBP: 000055f7be93ad00 R08: 0000000000000003 R09: 0000000000000000
[51475.768991] R10: f88dd7b6b1ebeedc R11: 0000000000000246 R12: 0000000000000000
[51475.776416] R13: 00007ffc1d642e10 R14: 0000000000000003 R15: 000055f7be946c40
[51475.783850]  </TASK>
[51596.413619] INFO: task rm:307616 blocked for more than 241 seconds.
[51596.420316]       Tainted: P           OE      6.1.63-production+truenas #2
[51596.427585] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[51596.435708] task:rm              state:D stack:0     pid:307616 ppid:1      flags:0x00004006
[51596.444450] Call Trace:
[51596.447196]  <TASK>
[51596.449599]  __schedule+0x2ed/0x860
[51596.453395]  schedule+0x5a/0xb0
[51596.456832]  io_schedule+0x42/0x70
[51596.460536]  cv_wait_common+0xaa/0x130 [spl]
[51596.465111]  ? cpuusage_read+0x10/0x10
[51596.469162]  txg_wait_open+0x89/0xd0 [zfs]
[51596.473726]  dmu_free_long_range_impl+0x1e6/0x340 [zfs]
[51596.479409]  dmu_free_long_range+0x78/0xc0 [zfs]
[51596.484476]  zfs_rmnode+0x34c/0x440 [zfs]
[51596.488945]  zfs_inactive+0x127/0x200 [zfs]
[51596.493592]  zpl_evict_inode+0x36/0x50 [zfs]
[51596.498309]  evict+0xd0/0x1d0
[51596.501590]  do_unlinkat+0x170/0x320
[51596.505473]  __x64_sys_unlinkat+0x33/0x60
[51596.509800]  do_syscall_64+0x5b/0xc0
[51596.513694]  ? ksys_write+0x6b/0xf0
[51596.517506]  ? syscall_exit_to_user_mode+0x27/0x40
[51596.522606]  ? do_syscall_64+0x67/0xc0
[51596.526655]  ? syscall_exit_to_user_mode+0x27/0x40
[51596.531747]  ? do_syscall_64+0x67/0xc0
[51596.535794]  ? do_syscall_64+0x67/0xc0
[51596.539832]  ? syscall_exit_to_user_mode+0x27/0x40
[51596.544917]  ? do_syscall_64+0x67/0xc0
[51596.548960]  ? do_syscall_64+0x67/0xc0
[51596.552997]  entry_SYSCALL_64_after_hwframe+0x64/0xce
[51596.558342] RIP: 0033:0x7ffa28b2a8d7
[51596.562218] RSP: 002b:00007ffc1d642c28 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
[51596.570080] RAX: ffffffffffffffda RBX: 000055f7be946c40 RCX: 00007ffa28b2a8d7
[51596.577522] RDX: 0000000000000000 RSI: 000055f7be946d40 RDI: 0000000000000004
[51596.584954] RBP: 000055f7be93ad00 R08: 0000000000000003 R09: 0000000000000000
[51596.592383] R10: f88dd7b6b1ebeedc R11: 0000000000000246 R12: 0000000000000000
[51596.599825] R13: 00007ffc1d642e10 R14: 0000000000000003 R15: 000055f7be946c40
[51596.607256]  </TASK>
[51717.245807] INFO: task rm:307616 blocked for more than 362 seconds.
[51717.252514]       Tainted: P           OE      6.1.63-production+truenas #2
[51717.259792] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[51717.267917] task:rm              state:D stack:0     pid:307616 ppid:1      flags:0x00004006
[51717.276656] Call Trace:
[51717.279407]  <TASK>
[51717.281818]  __schedule+0x2ed/0x860
[51717.285613]  schedule+0x5a/0xb0
[51717.289055]  io_schedule+0x42/0x70
[51717.292758]  cv_wait_common+0xaa/0x130 [spl]
[51717.297342]  ? cpuusage_read+0x10/0x10
[51717.301394]  txg_wait_open+0x89/0xd0 [zfs]
[51717.305964]  dmu_free_long_range_impl+0x1e6/0x340 [zfs]
[51717.311647]  dmu_free_long_range+0x78/0xc0 [zfs]
[51717.316708]  zfs_rmnode+0x34c/0x440 [zfs]
[51717.321182]  zfs_inactive+0x127/0x200 [zfs]
[51717.325826]  zpl_evict_inode+0x36/0x50 [zfs]
[51717.330549]  evict+0xd0/0x1d0
[51717.333835]  do_unlinkat+0x170/0x320
[51717.337719]  __x64_sys_unlinkat+0x33/0x60
[51717.342049]  do_syscall_64+0x5b/0xc0
[51717.345948]  ? ksys_write+0x6b/0xf0
[51717.349764]  ? syscall_exit_to_user_mode+0x27/0x40
[51717.354863]  ? do_syscall_64+0x67/0xc0
[51717.358912]  ? syscall_exit_to_user_mode+0x27/0x40
[51717.364003]  ? do_syscall_64+0x67/0xc0
[51717.368051]  ? do_syscall_64+0x67/0xc0
[51717.372087]  ? syscall_exit_to_user_mode+0x27/0x40
[51717.377171]  ? do_syscall_64+0x67/0xc0
[51717.381209]  ? do_syscall_64+0x67/0xc0
[51717.385251]  entry_SYSCALL_64_after_hwframe+0x64/0xce
[51717.390602] RIP: 0033:0x7ffa28b2a8d7
[51717.394473] RSP: 002b:00007ffc1d642c28 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
[51717.402337] RAX: ffffffffffffffda RBX: 000055f7be946c40 RCX: 00007ffa28b2a8d7
[51717.409773] RDX: 0000000000000000 RSI: 000055f7be946d40 RDI: 0000000000000004
[51717.417206] RBP: 000055f7be93ad00 R08: 0000000000000003 R09: 0000000000000000
[51717.424637] R10: f88dd7b6b1ebeedc R11: 0000000000000246 R12: 0000000000000000
[51717.432069] R13: 00007ffc1d642e10 R14: 0000000000000003 R15: 000055f7be946c40
[51717.439512]  </TASK>

You get the idea. Once these start appearing, the system eventually hangs and the space is never freed. Upon reboot, all the deleted data is back from the dead, like something out of Groundhog Day.

Some pool info for you:
Code:
zpool status Data     

  pool: Data
 state: ONLINE
  scan: scrub in progress since Wed Jan  3 19:55:49 2024
        6.15T / 430T scanned at 122M/s, 5.49T / 430T issued at 109M/s
        0B repaired, 1.28% done, 47 days 07:50:45 to go
config:

        NAME                                      STATE     READ WRITE CKSUM
        Data                                      ONLINE       0     0     0
          raidz3-0                                ONLINE       0     0     0
            a5217421-9292-4482-a634-f9bdbf83640f  ONLINE       0     0     0
            036df555-b043-4cd6-b8d7-c8bf1ab68bfa  ONLINE       0     0     0
            661e54d8-b4ea-4dc5-a07a-53ebb484b4cd  ONLINE       0     0     0
            a533bcc7-1c45-4282-b24e-0a8daedc13f9  ONLINE       0     0     0
            faa15b16-9a04-4510-9d87-20723f997b7e  ONLINE       0     0     0
            07e992b0-c90b-4a8b-b75f-389ef1facbe5  ONLINE       0     0     0
            dd2f1c12-d43d-4c13-8e08-c7a6100e5dad  ONLINE       0     0     0
            e6dbad60-ee3c-4c78-8602-715e9a7b065d  ONLINE       0     0     0
            1a0ae9f0-44d9-441d-ad8d-a799157c5305  ONLINE       0     0     0
            b73f6c5c-3a0c-42f6-b717-19d1f432da16  ONLINE       0     0     0
            8fc6ea7f-4b49-45b6-9531-fa43edf9c3f6  ONLINE       0     0     0
            9965c5fe-ade4-4c5c-858f-60cad764c035  ONLINE       0     0     0
            922e7bd7-bfa5-4e2d-99ad-69ce155fd445  ONLINE       0     0     0
            84177458-c1c6-4980-a613-d8ba65b9bd9c  ONLINE       0     0     0
            da75589a-b10f-4fbf-a948-dd34f052a519  ONLINE       0     0     0
            7deb555a-eadb-4b28-9c0c-7474d60352a2  ONLINE       0     0     0
            cd4bbf1f-eb23-4435-8591-dba7fd39c2e0  ONLINE       0     0     0
            b0b91c2d-1889-409f-8939-b88f3cdb491e  ONLINE       0     0     0
            ffc8409c-f867-444e-8c10-63a5440b46a6  ONLINE       0     0     0
            9673ab81-0003-43b0-898f-f0e8f630bfd5  ONLINE       0     0     0
            c330017d-62ce-485d-965f-1dcac77f53eb  ONLINE       0     0     0
            2d6c2228-5031-402f-8557-94a36e791dcf  ONLINE       0     0     0
            efbdae32-a34c-46e8-8801-5d04811f92d7  ONLINE       0     0     0
            5a19087b-7354-4e66-b07e-f298d1c621b2  ONLINE       0     0     0
            fa023577-eb85-44dc-b6c7-fe2c1fb05e72  ONLINE       0     0     0
            23672161-c8e4-4b27-b1b3-ce6506f78bb9  ONLINE       0     0     0
            689c43d7-a9ca-4287-87ce-19750d79c79f  ONLINE       0     0     0
            21c7bb84-a641-4959-9a73-028178e2ab9f  ONLINE       0     0     0
            543addb4-2aa1-4e2a-b8d4-5c4e8a6c60a8  ONLINE       0     0     0
            4feb278b-9af9-493a-a7a3-a2a708d79357  ONLINE       0     0     0
            b31f7595-f3ba-4525-8775-66f600696adb  ONLINE       0     0     0
            74789bb2-a9fd-4fa4-98f1-b33e7594ffb3  ONLINE       0     0     0
            db0fa131-d0aa-4252-af47-accfc25d6dcb  ONLINE       0     0     0
            0cef4b94-4f58-420b-84d7-ac59c5087580  ONLINE       0     0     0
            7a6132c4-c64f-4c16-b148-539effaa6fde  ONLINE       0     0     0
            dbf1d058-15fe-4a92-9903-2f921a63b33e  ONLINE       0     0     0

errors: No known data errors

zpool iostat Data     
              capacity     operations     bandwidth  
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
Data         421T   103T  16.1K    741   258M  16.9M

zpool list Data  
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
Data   524T   421T   103T        -         -    22%    80%  1.54x    ONLINE  /mnt



It is under a little duress at the moment due to a running scrub; however, I cannot cancel it without the system hanging.

Now for the fun bit. Sometimes I can delete a straight TB and it'll be 100% fine. Other times the files look like they've gone, then eventually the dreaded crash hits and, voila, they're back after the reboot.

I don't really want to nuke the pool, and I have a feeling I won't be able to. I have disabled dedupe and compression going forward, as I'm now using it as a Restic target for a NAS1-to-NAS2 backup setup - rsync wasn't a backup.
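For reference, this is roughly how I'm confirming those properties are off (pool name as above) - and from what I've read, dedup=off only applies to new writes, so blocks that were already written stay in the dedup table until they're freed:

Code:
# Check dedupe/compression on the pool and its datasets going forward.
zfs get -r -t filesystem dedup,compression Data | head -n 20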

Any thoughts, ideas, or advice for me?
 

kjd

Cadet
Joined
Jan 4, 2024
Messages
4
Just adding to this. I decided to write a script that would delete slowly, giving the system plenty of time to keep up. I started by monitoring the pool, checking dmesg, etc. This is the script I ended up with:

Code:
#!/bin/bash
# Slow-delete script: removes one file / empty directory at a time, pausing
# between deletions and backing off when the pool looks busy or txg_sync
# shows up in dmesg. Run it from inside the directory tree to be cleaned.

# Delete function with wait
delete_item_and_wait() {
    local item="$1"
    if [ -d "$item" ] && [ -z "$(ls -A "$item")" ]; then
        # It's an empty directory
        rmdir "$item"
        echo "Deleted directory: $item"
    elif [ -f "$item" ]; then
        # It's a file
        rm "$item"
        echo "Deleted file: $item"
    fi
    echo "Waiting for 5 seconds..."
    for i in {5..1}; do
        echo -ne "$i... "
        sleep 1
    done
    echo
}

# Item count to see what's left
total_items=$(find . -depth \( -type d -empty -o -type f \) | wc -l)
echo "Items to be removed: $total_items"

# Main loop
while :; do
    # Check for txg_sync blockage in the last 20 lines of dmesg
    if ! dmesg | tail -n 20 | grep -q "task txg_sync"; then
        # Current write ops over a 1-second sample (without an interval,
        # zpool iostat only reports averages since boot)
        IO=$(zpool iostat -Hp Data 1 2 | tail -n 1 | awk '{print $5}')
        echo "Current Write IO: $IO"

        # Check if write ops are under the threshold
        if [ "$IO" -lt 100 ]; then
            item=$(find . -depth \( -type d -empty -o -type f \) | head -n 1)
            if [ -n "$item" ]; then
                # Call delete function
                delete_item_and_wait "$item"
                total_items=$((total_items - 1))
                echo "Items remaining: $total_items"
            else
                echo "No more files or empty directories to delete."
                break
            fi
        else
            echo "IO is too high, waiting for 10 seconds..."
            for i in {10..1}; do
                echo -ne "$i... "
                sleep 1
            done
            echo
        fi
    else
        echo "txg_sync task is blocked, waiting for 240 seconds..."
        for i in {240..1}; do
            echo -ne "$i... "
            sleep 1
        done
        echo
    fi
done


I ran the script and it was looking good, then it just stopped and hung the terminal. I am using tmux, so I created another pane; dmesg had no errors initially and iostat was really low. Then the errors kicked in:

Code:
[ 4352.373878] INFO: task rm:35628 blocked for more than 483 seconds.
[ 4352.380590]       Tainted: P           OE      6.1.63-production+truenas #2
[ 4352.387882] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4352.396017] task:rm              state:D stack:0     pid:35628 ppid:35548  flags:0x00004006
[ 4352.404683] Call Trace:
[ 4352.407438]  <TASK>
[ 4352.409852]  __schedule+0x2ed/0x860
[ 4352.413661]  schedule+0x5a/0xb0
[ 4352.417105]  io_schedule+0x42/0x70
[ 4352.420818]  cv_wait_common+0xaa/0x130 [spl]
[ 4352.425410]  ? cpuusage_read+0x10/0x10
[ 4352.429462]  txg_wait_open+0x89/0xd0 [zfs]
[ 4352.434050]  dmu_free_long_range_impl+0x1e6/0x340 [zfs]
[ 4352.439736]  dmu_free_long_range+0x78/0xc0 [zfs]
[ 4352.444800]  zfs_rmnode+0x34c/0x440 [zfs]
[ 4352.449260]  zfs_inactive+0x127/0x200 [zfs]
[ 4352.453893]  zpl_evict_inode+0x36/0x50 [zfs]
[ 4352.458614]  evict+0xd0/0x1d0
[ 4352.461895]  do_unlinkat+0x170/0x320
[ 4352.465779]  __x64_sys_unlinkat+0x33/0x60
[ 4352.470095]  do_syscall_64+0x5b/0xc0
[ 4352.473977]  ? preempt_count_add+0x47/0xa0
[ 4352.478380]  ? up_read+0x37/0x70
[ 4352.481923]  ? do_user_addr_fault+0x1bb/0x570
[ 4352.486589]  ? exc_page_fault+0x70/0x170
[ 4352.490827]  entry_SYSCALL_64_after_hwframe+0x64/0xce
[ 4352.496197] RIP: 0033:0x7f70465cb8d7
[ 4352.500088] RSP: 002b:00007ffeead2fc28 EFLAGS: 00000206 ORIG_RAX: 0000000000000107
[ 4352.507967] RAX: ffffffffffffffda RBX: 000055d3c753bfb0 RCX: 00007f70465cb8d7
[ 4352.515409] RDX: 0000000000000000 RSI: 000055d3c753ad90 RDI: 00000000ffffff9c
[ 4352.522856] RBP: 000055d3c753ad00 R08: 0000000000000003 R09: 0000000000000000
[ 4352.530299] R10: 00007f70464eaf60 R11: 0000000000000206 R12: 0000000000000000
[ 4352.537738] R13: 00007ffeead2fe10 R14: 0000000000000003 R15: 000055d3c753bfb0
[ 4352.545172]  </TASK>


I recently upgraded the RAM from 16GB to 48GB to see if that cured the issue; it did not. I have run memtest and it passed, so it's not that. Any other ideas? I'm banging my head against a wall here.
 

mav@

iXsystems
Joined
Sep 29, 2011
Messages
1,428
None of the messages I see indicate actual errors. It looks like the system is waiting for some I/O completion. Do you see active disk I/O when it happens? Your extremely wide single-vdev RAIDZ pool of HDDs must have very low IOPS capability, which may cause poor performance on metadata accesses. If dedup was used for most of the data writes, your dedup table should be pretty big, and likely not fitting into RAM, so an attempt to delete data en masse may require a lot of random I/O, for which your pool is not great. So it may not be an actual bug, but a system misconfiguration.
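As a rough way to check that (the per-entry size below is only a common ballpark - the real numbers come from the histogram itself):

Code:
# Show the dedup table (DDT) histogram, including total entries and the
# in-core size per entry.
zpool status -D Data

# Ballpark RAM need = total DDT entries x in-core bytes per entry.
# For example, ~1 billion entries at ~300 bytes each is ~300GB - far beyond
# 48GB of RAM, so most DDT updates become random reads from the HDDs.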

Such wide RAIDZ pools usually don't make sense, and may be useful only if you store only very large files AND you increase the recordsize on the datasets to the maximum, somewhere around 4-16MB. Dedup with a relatively small amount of RAM also requires blocks as big as possible. And both points would benefit heavily from a special vdev to improve metadata performance.
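For example, something like this (the dataset and device names are only placeholders; recordsize above 1M additionally needs the zfs_max_recordsize module tunable):

Code:
# Larger records for datasets holding very large files (affects new writes only).
zfs set recordsize=1M Data/video

# Mirrored special vdev on SSDs for metadata, and optionally small blocks too.
# Note: a special vdev cannot be removed again from a raidz pool.
zpool add Data special mirror /dev/sdX /dev/sdY
zfs set special_small_blocks=64K Data/video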
 
Last edited:

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Deduplication is likely a major factor, as mentioned by @mav@, as well as the 36-wide RAIDZ3 (you mentioned 34 drives in your initial post, but your zpool status shows 36?).

Can you post the results of zpool status -D Data?

Breaking the pool up into smaller vdevs is highly advised, but it cannot be done without a "nuke and rebuild". Dedup should definitely stay off, and I assume you are using compression upstream in Restic?

2 x Toshiba SSD (boot disks, RAID 1)

What is the storage controller you're using? The use of the term "RAID1" suggests there may be hardware RAID in use as well.
 

kjd

Cadet
Joined
Jan 4, 2024
Messages
4
Hello folks

So, I nuked it. You are correct, I was in a flurry when I wrote 34, it's 36 x 16TB.

I have recreated the pool using 9-wide RAIDZ2 vdevs, and turned off atime, dedupe, and compression. This NAS is going to be used purely as a backup archive and a repo for Restic. I needed as much space as possible with a decent amount of disk-failure protection, as a swap of a failed disk may not be possible for up to 72 hours in the worst case.
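I built it through the GUI, but the CLI equivalent of the new layout is roughly this (device names are placeholders):

Code:
# Four 9-wide RAIDZ2 vdevs from the 36 x 16TB drives.
zpool create Data \
  raidz2 sda sdb sdc sdd sde sdf sdg sdh sdi \
  raidz2 sdj sdk sdl sdm sdn sdo sdp sdq sdr \
  raidz2 sds sdt sdu sdv sdw sdx sdy sdz sdaa \
  raidz2 sdab sdac sdad sdae sdaf sdag sdah sdai sdaj

# Archive-oriented properties.
zfs set atime=off Data
zfs set dedup=off Data
zfs set compression=off Data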

Thank you for all the input I received. I have read up a lot on ZFS now and have realised it was a bad design out of the box, and something that should have been weighed up before being put into production. Never let your third-party vendor tell you how the device should be configured, as they are maximising profits - as I have just learnt the hard way. Also, don't let the junior tech on your team have the final say. Sigh.

To answer your question @HoneyBadger, I am using Restic's built-in compression. Hopefully this is all solved for now. I'll be back if it falls over at the end of the 293TB backup =D
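For the record, the Restic side is set up roughly like this (the host and repository path are placeholders):

Code:
# Repository format v2 supports compression; "auto" is the default level.
restic -r sftp:backup@nas2:/mnt/Data/restic init --repository-version 2
restic -r sftp:backup@nas2:/mnt/Data/restic backup /source/data --compression auto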
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
@kjd Glad to see that you had the freedom to be able to destroy and recreate - the split into a "4x9wZ2" config will help, as will the removal of deduplication.

Quick question about something else mentioned in the original post:

but it also has a mix of a lot of small files, sometimes in the 270k+ region in a subtree.

Is this "270K+ files in a single directory" or just "under a single dataset"? A large number of files in a single directory can make for a lot of metadata I/O if the client system queries each of these files for things like created/modified times, which could still hamper overall system responsiveness.
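As a quick illustration of the difference (the path is hypothetical):

Code:
# A names-only listing just reads the directory, while a long listing stats
# every entry - with 270k+ files that's 270k+ extra metadata lookups.
time ls -f /mnt/Data/photos > /dev/null
time ls -l /mnt/Data/photos > /dev/null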
 

kjd

Cadet
Joined
Jan 4, 2024
Messages
4
270k+ files in multiple sub folders, but a single folder may have 10k+. Think of multiple burst shots on a camera and a sidecar xml being stored alongside.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
270k+ files in multiple sub folders, but a single folder may have 10k+. Think of multiple burst shots on a camera and a sidecar xml being stored alongside.

That will definitely be a much harder workload for RAIDZ to handle vs. the large-block I/O of the Restic backups, especially if there's a thumbnailing process on the client side that wants to read each file or its XML. Looking at your chassis layout, it seems you might have a four-pack of 2.5" slots at the rear. Depending on capacity requirements, and if you find the performance lacking, I'd consider leveraging those as a mirror/RAIDZ1 pool of SSDs for speed, and setting up a periodic job to replicate the contents to the larger, more redundant disk pool.
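Under the hood, a periodic replication job (which you can schedule as a Replication Task in the UI) boils down to something like this - the pool and dataset names here are placeholders:

Code:
# Snapshot the fast SSD pool and send it to the big RAIDZ2 pool.
SNAP="fast/photos@auto-$(date +%Y%m%d-%H%M)"
zfs snapshot -r "$SNAP"
# The first run is a full send; later runs would use an incremental (-i) send
# from the previous snapshot.
zfs send -R "$SNAP" | zfs receive -Fu Data/photos-archive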
 