
ZFS Replication to USB drive caused NAS reboot


HenchRat

Member
Joined
Nov 27, 2020
Messages
38
I bought one of the 14TB EasyStores the other day and am trying to use it as USB-attached backup media, but ZFS replication to that device causes my NAS to reboot.

I have the EasyStore drive plugged into a USB3 port, and have created a single-vdev pool on it as a backup target.

I configured a replication task in the GUI, replicating my primary pool's set of datasets to this single backup pool.

That process runs, and eventually, the system spontaneously reboots. I have been able to replicate this behavior reliably.

Every time that replication task runs, the NAS reboots in short order. It seems like it’s encountering something at a particular point in the replication that it does not like, perhaps an issue with a dataset? I am not quite sure where to begin for troubleshooting.

I am running TrueNAS 12.0-RELEASE, which uname -a reports as 12.2-RC3 7c4ec6ff02c (HEAD).

Hardware is:
  • AMD-compatible motherboard with Ryzen 7 processor
  • Onboard SATA controller
  • Seasonic 550W power supply
  • 64GB ECC memory
 

HenchRat

Member
Joined
Nov 27, 2020
Messages
38
The only tunable I have in place is vfs.zfs.arc_max set to 32212254720 bytes.
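
For reference, that value works out to a round number of GiB. A quick shell check, as a minimal sketch (the variable name is just for illustration; on the NAS itself the live value could presumably be read with sysctl -n vfs.zfs.arc_max):

```shell
# Convert the arc_max tunable from bytes to GiB (1 GiB = 1024^3 bytes).
arc_max_bytes=32212254720
echo "$(( arc_max_bytes / 1024 / 1024 / 1024 )) GiB"
```

So the ARC was capped at a bit under half of the 64GB of installed memory.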
 

HenchRat

Member
Joined
Nov 27, 2020
Messages
38
I removed the tunable and split up the pool replication into individual dataset replications running simultaneously and have encountered another reboot while running those replications. I'm down to two suspect jobs and will report back.
 

HenchRat

Member
Joined
Nov 27, 2020
Messages
38
I have narrowed the reboot-related replication task to a single dataset. While the whole pool is encrypted, and most datasets are inheriting encryption, this particular dataset is the only one that I have with differing encryption that is not inherited. This dataset is encrypted with a key passphrase and is not automatically unlocked and mounted. I have been performing the replication with that dataset unmounted and still encrypted. I'll try it mounted/unlocked.
 

HenchRat

Member
Joined
Nov 27, 2020
Messages
38
Unlocking that dataset made no difference: the system crashes immediately upon starting that replication task. I also disabled C6 in the BIOS in case that would help, but it made no difference either.

Of note, this dataset has a space in the dataset name, which I understand isn't advised. However, I have another dataset with a space in the name that isn't exhibiting this same problem.

I've created a new dataset with the same kind of encryption, albeit a different passphrase, that does not have a space. I am syncing the data from the original dataset to this one, and will try to replicate the new dataset.
 

HenchRat

Member
Joined
Nov 27, 2020
Messages
38
As I said above, I created a new dataset with the same kind of encryption, albeit a different passphrase, that does not have a space in the name. I rsynced the data from the first dataset to this one, and set up replication.

This replication task got to 99.9% and, just as I was composing a post proclaiming success... TrueNAS crashed. So the problem does not appear to be the space in the dataset name.

My next test will be to snapshot the original dataset, clone it and promote it, and try replication again.
 

HenchRat

Member
Joined
Nov 27, 2020
Messages
38
Actually, not sure that will tell me anything useful. Think I'll diff the origin and replica filesystems and see where it's dying.
 

HenchRat

Member
Joined
Nov 27, 2020
Messages
38
Well, can't do that. The incomplete state of the snapshot replication makes the replicated dataset unmountable. I suppose I will start halving the amount of data replicated until I find the nasty bit.
 

HenchRat

Member
Joined
Nov 27, 2020
Messages
38
Using the “halve your problem set, test, and halve again” method, I have narrowed about 2TB of data down to exactly two files, either of which reliably crashes TrueNAS when I include it in a replication task.
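
For anyone wanting to apply the same halving approach, here is a rough bash sketch of the idea. The function replication_survives is a hypothetical stand-in for "replicate only these files and see whether the NAS stays up" (here it just flags names containing "bad"), and the sketch assumes a subset crashes exactly when it contains a problem file.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a real replication test of a subset of files.
# Returns 0 if the subset is safe, non-zero if it would trigger the crash.
replication_survives() {
    local f
    for f in "$@"; do
        case "$f" in
            *bad*) return 1 ;;
        esac
    done
    return 0
}

# Print one item from the arguments that makes replication_survives fail.
# Assumes at least one such item is present in the list.
bisect_culprit() {
    local items=("$@")
    while (( ${#items[@]} > 1 )); do
        local half=$(( ${#items[@]} / 2 ))
        local first=("${items[@]:0:half}")
        local second=("${items[@]:half}")
        if replication_survives "${first[@]}"; then
            # First half is clean, so a culprit must be in the second half.
            items=("${second[@]}")
        else
            items=("${first[@]}")
        fi
    done
    printf '%s\n' "${items[0]}"
}

bisect_culprit a b c bad1 d e
```

Each run finds a single culprit in roughly log2(N) tests. With more than one problem file, remove the one found and run it again on the rest.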

Both files are actually (broken) symlinks from an archive of a macOS ~/Library/Saved Application State directory. The archive came from a Carbon Copy Cloner backup of a macOS computer, which ultimately ended up on TrueNAS.

I have confirmed that OTHER broken symlinks in the same directory do NOT crash TrueNAS when part of a replication process. I have confirmed that OTHER macOS backups containing ~/Library/Saved Application State directories ALSO with broken symlinks DO NOT cause a crash. It is definitively these two only.
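
To enumerate broken symlinks like these for comparison, something along the following lines should work with both FreeBSD and GNU find. It is a minimal sketch; the function name is made up for illustration.

```shell
# List symlinks under directory $1 whose targets do not resolve.
# "test -e" follows the link, so it fails for a dangling symlink.
find_broken_symlinks() {
    find "$1" -type l ! -exec test -e {} \; -print
}
```

Usage would be something like find_broken_symlinks "/mnt/tank/dataset", where the path is a placeholder for wherever the dataset is mounted.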

I have ruled out my specific hardware/software by standing up a virtualized TrueNAS 12-R instance with virtualized disks, and I am able to routinely replicate (heh) the crash with either or both of these files. I used dataset names without spaces in them for these tests, so that is definitively not the issue either.

This is looking like a bug of some kind at this point, so I'll open a ticket.
 

HenchRat

Member
Joined
Nov 27, 2020
Messages
38
I thought this might be related to encryption, so here is more information and more test results:

My primary pool is encrypted with native 12-R ZFS AES-256-GCM encryption with a keyfile. The dataset in which these files live resides in that pool.
The dataset in question does not inherit encryption/keys from that pool, but is rather encrypted with AES-256-GCM via a different passphrase.

I created a new dataset inside that pool that DID inherit pool encryption, copied the files to it, set up replication, and TrueNAS crashed.
I created a new dataset inside that pool that did NOT inherit any encryption, copied the files to it, set up replication, and TrueNAS crashed.

I then created a new pool with NO encryption.
I created a new dataset in that new pool, encrypted with a new passphrase, copied the files to it, set up replication, and TrueNAS crashed.
I created a new dataset in that new pool that inherited the lack of encryption, copied the files to it, set up replication, and replication SUCCEEDED without a crash.

Again, all of this on a virtualized TrueNAS instance, not my production NAS.
 

freqlabs

iXsystems
Joined
Jul 18, 2019
Messages
23
Hi, could you please attach a send stream for this dataset to help us reproduce the panic?
 

HenchRat

Member
Joined
Nov 27, 2020
Messages
38
I'm having difficulty generating one.

root@truenas[~]# zfs send -p -L -c tank/dataset@auto-2020-12-09_13-41 > stream
cannot send tank/dataset@auto-2020-12-09_13-41: encrypted dataset tank/dataset may not be sent with properties without the raw flag
warning: cannot send 'tank/dataset@auto-2020-12-09_13-41': backup failed
root@truenas[~]#
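
The error points at the combination of -p (send properties) with a non-raw stream from an encrypted dataset. Two variations should get past that particular error. This is a sketch based on the error text and the documented zfs send options, not something confirmed in this thread, and the output file names are placeholders.

```shell
# Option 1: raw send (-w). Blocks are sent exactly as stored on disk, still
# encrypted, so properties may be included with -p.
zfs send -w -p tank/dataset@auto-2020-12-09_13-41 > stream.raw

# Option 2: drop -p. The data is sent decrypted and without dataset
# properties, which requires the dataset's key to be loaded.
zfs send -L -c tank/dataset@auto-2020-12-09_13-41 > stream.plain
```

Whoever receives a raw stream would still need the passphrase to read its contents, and it is not established here which variant reproduces the panic.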
 

winnielinnie

Senior Member
Joined
Oct 22, 2019
Messages
391
As an update for anyone who is following this, they are working on a fix. It has something to do with a combination of natively encrypted ZFS dataset + symlinks + special "spill over" blocks due to long path names.
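
If you want to check whether your own datasets hold symlinks that might fall into this category, a rough way is to list symlinks by the length of the path they store. This sketch assumes "long path names" refers to the symlink target, and it does not know the actual length at which a spill block gets used, so treat the longest entries only as candidates. The function name is made up for illustration.

```shell
# Print "<target length> <link path>" for every symlink under $1,
# longest target first. File names containing newlines are not handled.
symlinks_by_target_length() {
    find "$1" -type l | while IFS= read -r link; do
        target=$(readlink "$link")
        printf '%d %s\n' "${#target}" "$link"
    done | sort -rn
}
```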
 