Sprinklings of "txg_sync blocked" in my logs

Perry The Cynic

Dabbler
Joined
Aug 15, 2023
Messages
34
I'm qualifying a Mini R for service. I side-graded it to TrueNAS-SCALE-22.12.3.3, and it seems to be working fine. (Not as fast as I want it to go, but that's why I'm qualifying...)

Everything seems to be in order, but the kernel log contains messages of this form:

Code:
[100773.105765] INFO: task txg_sync:313897 blocked for more than 241 seconds.
[100773.106618]       Tainted: P           OE     5.15.107+truenas #1
[100773.107462] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[100773.108333] task:txg_sync        state:D stack:    0 pid:313897 ppid:     2 flags:0x00004000
[100773.109256] Call Trace:
[100773.110181]  <TASK> (etc)


They seem to coincide with periods of high ZFS load (replication tasks, mostly). Right now, with a 1TB+ replication task running from one local pool to another, I get this message every few minutes. I've perused the existing forum threads about "txg_sync", but they all come with complaints of various failures or freezes. My system seems to be generally fine.
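
For reference, here's how I've been keeping an eye on it while a replication runs. This is just a sketch: "tank" stands in for my actual pool name, and the kstat path is the standard OpenZFS-on-Linux location.

Code:
# any hung-task reports so far in the kernel ring buffer
dmesg | grep -i "blocked for more than"

# recent transaction-group history for the pool (the *time columns are in nanoseconds)
tail -n 20 /proc/spl/kstat/zfs/tank/txgs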

So... is this an indication of a problem I need to hunt down before putting the system in service, or is it, well, an INFO that I should keep an eye on but can otherwise live with? If it's the former, how do I investigate?

Cheers
-- perry
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Hi @Perry The Cynic

I've edited the RGB codes out of your message and changed the quote to a code snippet, just for ease of readability. Hope you don't mind.

txg_sync blocking for that long would indicate that it took over four minutes to commit a transaction group to disk. How fast is the replication running?
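
If you want a quick number from the shell, something along these lines should show live read/write bandwidth on both pools while the replication is going (substitute your actual pool names; 5-second samples):

Code:
zpool iostat sourcepool destpool 5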

I did see that you've got a Mini R for the hardware; can you tell me how much RAM is installed, along with the disk make/model and vdev layout?
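
The quickest way to grab all of that is a few one-liners from a shell session; these are stock commands on SCALE, nothing assumed beyond the device names:

Code:
free -h                           # installed and used RAM
zpool status -v                   # vdev layout for every pool
lsblk -d -o NAME,MODEL,SIZE,ROTA  # disk make/model per device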
 

Perry The Cynic

Dabbler
Joined
Aug 15, 2023
Messages
34
Thanks for the edit.

64GB of RAM (32GB stock + the official iX upgrade kit). Incremental replication between two pools.
Source: 2x2 18TB WD Red Pro + 1x2 2TB Micron 5300 (as a special vdev). Those are the "real" pool disks under test.
Destination: 1x2 10TB, one Seagate and one HGST disk from my desk drawer, purely for testing.

The ARC seems to be working fine (the RAM is mostly filled, but there's spare). I see no errors, but the transfer seems a bit slow (60-80MB/s average; the destination disks can do 200MB/s flat out, and the pool is mostly empty). FWIW, the destination pool is currently the system pool; that's purely an artifact of how this test setup was built.
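
In case it matters how I'm reading those numbers, I'm just going by the stock tools (nothing exotic):

Code:
# one-shot summary of ARC size, target, and hit rates
arc_summary

# rolling ARC stats every 5 seconds while the replication runs
arcstat 5
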
There is one idle VM running, and the stock Prometheus+Grafana apps are doing nominal things (scraping four sources).

Thanks
-- perry
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I'll see about running some tests against a Mini R with similar settings.

The destination 10TB drives should be outside the major risk of being SMR models, but can you confirm the exact model numbers as well?

What kind of data are you replicating? Even though replication happens at a lower level of ZFS, smaller records carry more overhead per byte copied, so replicating something with a small recordsize (or lots of small files) can be significantly slower than replicating large, multi-megabyte records.
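
You can check what the replicated datasets are actually set to with something like this (the pool name is a placeholder):

Code:
# recordsize and compression for every dataset under the source pool
zfs get -r -o name,property,value recordsize,compression sourcepool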

The system dataset can be relocated to the "source" pool as well through the System -> Advanced -> Storage -> Configure option if that wasn't intended:

[Screenshot: System -> Advanced -> Storage configuration dialog for selecting the system dataset pool]
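
If you'd rather check from the shell first, the system dataset shows up as a .system child of whichever pool currently hosts it:

Code:
zfs list -o name,used | grep "\.system"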
 

Perry The Cynic

Dabbler
Joined
Aug 15, 2023
Messages
34
The two remaindered disks are one ST10000VN0004-1ZD101 and one HGST_HDN721010ALE604. I think they're older than SMR technology. :confused:

The data is (intentionally) a motley mix; its origin is a bunch of semi-randomly chosen items from my current operational pool (running the latest OpenZFS-on-macOS). Time Machine backup images (thousands of 8M band files), a working maildir (lots of small files), and a few zvols. Some got replicated in over the network, some got rsync'ed over. I was basically trying everything once. :smile:

Aaaand... I just rebooted my NAS for unrelated reasons, and the txg_sync messages just... disappeared. I've re-run the replication job that caused them (after wiping out the destination), and - nothing. Everything is clean, message-wise. I'm left to hope that this was a fluke, rather than a gremlin going into hiding until I deploy. :confused:
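
In case it matters how I'm checking: I'm just grepping the kernel log for the current boot after each run:

Code:
# kernel messages from the current boot only
journalctl -k -b | grep -i "blocked for more"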

Thank you for your attention, and if anything I did or said raises concerns, I'm happy to learn more about proper NAS husbandry...

Cheers
-- perry
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Neither of those should be SMR drives; Seagate hasn't made any SMR IronWolfs (IronWolves?), and that HGST line is all CMR as well.
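
If you ever want to double-check a drive from the shell, smartctl (which ships with SCALE) will print the exact model and family strings; adjust the device node to match yours:

Code:
smartctl -i /dev/sda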

I'm trying a local replication right now on a similar configuration (source is 4x 10TB drives in two 2-way mirrors, destination is 2x 10TB in a single mirror) with a similar mix of data from my personal storage, and it's humming along at ~190-200MB/s, with no signs of txg timeouts in the logs.

Normally, excessive time to write a transaction group would be indicative of an underlying hardware problem - a slow disk dragging things down, for example - but that's not something I'd expect to magically resolve itself on a reboot. Did your transfer speeds also improve the second time around?
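
If the messages ever do come back, per-device latency is usually the fastest way to spot a single slow disk dragging a pool down (pool name is a placeholder; -v breaks stats out per disk and -l adds wait/latency columns):

Code:
zpool iostat -v -l sourcepool 5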
 

Perry The Cynic

Dabbler
Joined
Aug 15, 2023
Messages
34
I've recreated both directions of my test after rebooting again. Big pool -> little pool reaches >200MB/s, which is pretty good (those old disks can theoretically go faster, but not by much), though it also goes through periods of much slower transfers. I should note that the (recursive) dataset being replicated contains a variety of compression algorithms, including some zstd-19, and the CPU peaks at >90% at times, coinciding with the slower transfer rates. Clearly I'm not going to use anything above zstd-3 in deployment (for my datasets, the higher levels only improve on zstd-3 by <4%).
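
For the record, compression is just a per-dataset property, so dialing it back only affects newly written data; the dataset names below are placeholders for my own:

Code:
# see what each dataset is currently using
zfs get -r -o name,value compression bigpool

# cheaper level for future writes on one dataset
zfs set compression=zstd-3 bigpool/somedataset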

On the other hand, transfers from the small pool to the big one also seem to cap out at about 200MB/s. I had expected more than that, since the two vdevs should each be capable of >200MB/s (and the small pool's mirror can split reads across both disks, yielding upwards of 400MB/s in theory). In both directions, it looks like the limit is the small pool (no surprise there).

FWIW, I tried disabling the VMs and apps I had running, but that doesn't seem to make much of a difference (except a wiggle while they're turning up or down).

And all along the way, no txg_sync messages in the log at all.

Cheers
-- perry
 