"cannot unmount '/var/db/system': Device busy" error while trying to do zfs replication

Gordon Thagard

Dabbler
Joined
Jan 25, 2016
Messages
33
FreeNAS-11.3-U3.2

This is kind of a strange one to me. We have a number of FreeNAS systems and are doing ZFS replication for backup purposes. It's easy to set up and works well, except on one machine. I've even tried switching the system dataset between vol1 and freenas-boot and then rerunning the replication task. It always errors out with "cannot unmount '/var/db/system': Device busy".

Does anyone have any insight into this?

Error
full send of vol1/qa@auto-2020-06-14_00-00 estimated size is 6.96T
full send of vol1/qa@auto-2020-06-14_00-00 estimated size is 6.96T
cannot unmount '/var/db/system': Device busy
warning: cannot send 'vol1/qa@auto-2020-06-14_00-00': signal received.
Logs
[2020/06/15 09:00:15] INFO [Thread-17] [zettarepl.paramiko.replication_task__task_6] Connected (version 2.0, client OpenSSH_8.0-hpn14v15)
[2020/06/15 09:00:15] INFO [Thread-17] [zettarepl.paramiko.replication_task__task_6] Authentication (publickey) successful!
[2020/06/15 09:00:15] INFO [replication_task__task_6] [zettarepl.replication.run] For replication task 'task_6': doing push from 'vol1/qa' to 'vol1' of snapshot='auto-2020-06-14_00-00' incremental_base=None receive_resume_token=None
[2020/06/15 09:00:16] WARNING [replication_task__task_6] [zettarepl.replication.run] For task 'task_6' at attempt 1 recoverable replication error RecoverableReplicationError("full send of vol1/qa@auto-2020-06-14_00-00 estimated size is 6.96T\nfull send of vol1/qa@auto-2020-06-14_00-00 estimated size is 6.96T\ncannot unmount '/var/db/system': Device busy\nwarning: cannot send 'vol1/qa@auto-2020-06-14_00-00': signal received")
[2020/06/15 09:00:16] INFO [replication_task__task_6] [zettarepl.replication.run] After recoverable error sleeping for 1 seconds
[2020/06/15 09:00:17] INFO [replication_task__task_6] [zettarepl.replication.run] For replication task 'task_6': doing push from 'vol1/qa' to 'vol1' of snapshot='auto-2020-06-14_00-00' incremental_base=None receive_resume_token=None
[2020/06/15 09:00:18] WARNING [replication_task__task_6] [zettarepl.replication.run] For task 'task_6' at attempt 2 recoverable replication error RecoverableReplicationError("full send of vol1/qa@auto-2020-06-14_00-00 estimated size is 6.96T\nfull send of vol1/qa@auto-2020-06-14_00-00 estimated size is 6.96T\ncannot unmount '/var/db/system': Device busy\nwarning: cannot send 'vol1/qa@auto-2020-06-14_00-00': signal received")
[2020/06/15 09:00:18] INFO [replication_task__task_6] [zettarepl.replication.run] After recoverable error sleeping for 2 seconds
[2020/06/15 09:00:20] INFO [replication_task__task_6] [zettarepl.replication.run] For replication task 'task_6': doing push from 'vol1/qa' to 'vol1' of snapshot='auto-2020-06-14_00-00' incremental_base=None receive_resume_token=None
[2020/06/15 09:00:20] WARNING [replication_task__task_6] [zettarepl.replication.run] For task 'task_6' at attempt 3 recoverable replication error RecoverableReplicationError("full send of vol1/qa@auto-2020-06-14_00-00 estimated size is 6.96T\nfull send of vol1/qa@auto-2020-06-14_00-00 estimated size is 6.96T\ncannot unmount '/var/db/system': Device busy\nwarning: cannot send 'vol1/qa@auto-2020-06-14_00-00': signal received")
[2020/06/15 09:00:20] INFO [replication_task__task_6] [zettarepl.replication.run] After recoverable error sleeping for 4 seconds
[2020/06/15 09:00:25] INFO [replication_task__task_6] [zettarepl.replication.run] For replication task 'task_6': doing push from 'vol1/qa' to 'vol1' of snapshot='auto-2020-06-14_00-00' incremental_base=None receive_resume_token=None
[2020/06/15 09:00:25] WARNING [replication_task__task_6] [zettarepl.replication.run] For task 'task_6' at attempt 4 recoverable replication error RecoverableReplicationError("full send of vol1/qa@auto-2020-06-14_00-00 estimated size is 6.96T\nfull send of vol1/qa@auto-2020-06-14_00-00 estimated size is 6.96T\ncannot unmount '/var/db/system': Device busy\nwarning: cannot send 'vol1/qa@auto-2020-06-14_00-00': signal received")
[2020/06/15 09:00:25] INFO [replication_task__task_6] [zettarepl.replication.run] After recoverable error sleeping for 8 seconds
[2020/06/15 09:00:34] INFO [replication_task__task_6] [zettarepl.replication.run] For replication task 'task_6': doing push from 'vol1/qa' to 'vol1' of snapshot='auto-2020-06-14_00-00' incremental_base=None receive_resume_token=None
[2020/06/15 09:00:34] WARNING [replication_task__task_6] [zettarepl.replication.run] For task 'task_6' at attempt 5 recoverable replication error RecoverableReplicationError("full send of vol1/qa@auto-2020-06-14_00-00 estimated size is 6.96T\nfull send of vol1/qa@auto-2020-06-14_00-00 estimated size is 6.96T\ncannot unmount '/var/db/system': Device busy\nwarning: cannot send 'vol1/qa@auto-2020-06-14_00-00': signal received")
[2020/06/15 09:00:34] ERROR [replication_task__task_6] [zettarepl.replication.run] Failed replication task 'task_6' after 5 retries
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
How large is your system dataset? Try du -sh /var/db/system. It could be too large to fit on the freenas-boot pool.
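
For scale, you can compare that number against the boot pool's free space with something like:

zfs list -o name,used,avail freenas-boot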
 

Gordon Thagard

Dabbler
Joined
Jan 25, 2016
Messages
33
root@freenas1[~]# df -h
Filesystem                                                       Size    Used   Avail Capacity  Mounted on
freenas-boot/ROOT/11.3-U3.2                                      5.7G    1.0G    4.7G     17%   /
devfs                                                            1.0K    1.0K      0B    100%   /dev
tmpfs                                                             16G    8.5M     16G      0%   /etc
tmpfs                                                            2.0G    8.0K    2.0G      0%   /mnt
tmpfs                                                            5.3T     37M    5.3T      0%   /var
fdescfs                                                          1.0K    1.0K      0B    100%   /dev/fd
vol1                                                             3.5T     96K    3.5T      0%   /mnt/vol1
vol1/qa                                                           11T    7.0T    3.5T     66%   /mnt/vol1/qa
freenas-boot/.system                                             4.7G     51K    4.7G      0%   /var/db/system
freenas-boot/.system/cores                                       4.7G     60M    4.7G      1%   /var/db/system/cores
freenas-boot/.system/samba4                                      4.7G    6.4M    4.7G      0%   /var/db/system/samba4
freenas-boot/.system/syslog-445a1355f3314ba4946bd845d9821e43     4.7G    2.5M    4.7G      0%   /var/db/system/syslog-445a1355f3314ba4946bd845d9821e43
freenas-boot/.system/rrd-445a1355f3314ba4946bd845d9821e43        4.7G     23M    4.7G      0%   /var/db/system/rrd-445a1355f3314ba4946bd845d9821e43
freenas-boot/.system/configs-445a1355f3314ba4946bd845d9821e43    4.7G    2.4M    4.7G      0%   /var/db/system/configs-445a1355f3314ba4946bd845d9821e43
freenas-boot/.system/webui                                       4.7G     23K    4.7G      0%   /var/db/system/webui
root@freenas1[~]# du -sh /var/db/system
 94M    /var/db/system
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
What happens when you run zfs list | grep vol1/.system?
 

Gordon Thagard

Dabbler
Joined
Jan 25, 2016
Messages
33
# zfs list | grep vol1/.system
vol1/.system                                            801M  3.54T   160K  legacy
vol1/.system/configs-445a1355f3314ba4946bd845d9821e43  23.9M  3.54T  23.9M  legacy
vol1/.system/cores                                      538M  3.54T   538M  legacy
vol1/.system/rrd-445a1355f3314ba4946bd845d9821e43       164M  3.54T   164M  legacy
vol1/.system/samba4                                    38.5M  3.54T  38.5M  legacy
vol1/.system/syslog-445a1355f3314ba4946bd845d9821e43   36.0M  3.54T  36.0M  legacy
vol1/.system/webui                                       88K  3.54T    88K  legacy
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
OK, so there are remnants of the system dataset left over on zpool vol1. Try rebooting to make sure the system dataset is running entirely from freenas-boot. Then, after making sure you have a good backup of vol1 and your configuration, try zfs destroy -r vol1/.system.
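
If you want to be cautious, zfs destroy supports a dry run, so something like this (same dataset names as above) will list what would go before anything is actually removed:

zfs destroy -rnv vol1/.system    # -n = dry run, -v = print each dataset that would be destroyed
zfs destroy -rv vol1/.system     # the real removal, only once the dry run looks right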
 

Gordon Thagard

Dabbler
Joined
Jan 25, 2016
Messages
33
When you asked me to look for .system on vol1, I still had the system dataset on freenas-boot. I moved it back to vol1 before I ran the zfs list, only because the system dataset lives on vol1 on all my other healthy FreeNAS systems. There are no remnants on freenas-boot at present:

root@freenas1[~]# zfs list | grep vol1/.system
vol1/.system                                            801M  3.54T   160K  legacy
vol1/.system/configs-445a1355f3314ba4946bd845d9821e43  23.9M  3.54T  23.9M  legacy
vol1/.system/cores                                      538M  3.54T   538M  legacy
vol1/.system/rrd-445a1355f3314ba4946bd845d9821e43       164M  3.54T   164M  legacy
vol1/.system/samba4                                    38.5M  3.54T  38.5M  legacy
vol1/.system/syslog-445a1355f3314ba4946bd845d9821e43   36.0M  3.54T  36.0M  legacy
vol1/.system/webui                                       88K  3.54T    88K  legacy
root@freenas1[~]# zfs list | grep freenas-boot/.system
root@freenas1[~]#
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Actually belay that. After the reboot, run the zfs list again to see if those remnants still exist. Also run mount | grep /var/db to show where the active system dataset lives.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Looking over your original error report, it looks like you're replicating vol1 to itself, and of course, it can't unmount vol1 to receive the send, as it's hosting the system dataset.
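
Roughly speaking, the failing push boils down to something like this on the CLI (remote host name hypothetical, and the exact flags zettarepl uses may differ):

zfs send vol1/qa@auto-2020-06-14_00-00 | ssh backup-host zfs recv -F vol1

A forced receive into the pool root has to unmount /mnt/vol1, which fails while the system dataset lives under it. Receiving into a child dataset, e.g. vol1/qa, avoids touching the root mount.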
 

Gordon Thagard

Dabbler
Joined
Jan 25, 2016
Messages
33
As I look at my snapshot and replication tasks, I don't think I'm replicating the vol1 pool itself; I expressly set it to replicate the qa dataset. But OK, it certainly is acting like I'm doing something to vol1 itself. How can I check and/or undo this?

Here is the output of your other request:
root@freenas1[~]# mount | grep /var/db
vol1/.system on /var/db/system (zfs, local, nfsv4acls)
vol1/.system/cores on /var/db/system/cores (zfs, local, nfsv4acls)
vol1/.system/samba4 on /var/db/system/samba4 (zfs, local, nfsv4acls)
vol1/.system/syslog-445a1355f3314ba4946bd845d9821e43 on /var/db/system/syslog-445a1355f3314ba4946bd845d9821e43 (zfs, local, nfsv4acls)
vol1/.system/rrd-445a1355f3314ba4946bd845d9821e43 on /var/db/system/rrd-445a1355f3314ba4946bd845d9821e43 (zfs, local, nfsv4acls)
vol1/.system/configs-445a1355f3314ba4946bd845d9821e43 on /var/db/system/configs-445a1355f3314ba4946bd845d9821e43 (zfs, local, nfsv4acls)
vol1/.system/webui on /var/db/system/webui (zfs, local, nfsv4acls)
 

Attachments

  • replication.png (27.1 KB)
  • snapshots.png (27.5 KB)

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Try deleting the replication task and recreating it.
 

Gordon Thagard

Dabbler
Joined
Jan 25, 2016
Messages
33
I created a dataset on the receiver side called qa. I then ran the replication task and it worked. It's almost 7 TB, so it took a long while to complete before I could look inside and see the directory structure. I was fully expecting that it would look like /mnt/vol1/qa/qa on the receiver side, but no: it's /mnt/vol1/qa, and all the sub-directories and files are there, the same as on the sender side. Having played with this stuff a bunch in the past, my experience tells me it should look like /mnt/vol1/qa/qa on the receiver side doing what I did. This is a head-scratcher to me.

But anyway, I had been operating under the assumption that it was a sender-side problem, when in fact those errors are from the receiver side. Running another test, I deleted the qa dataset on the receiver side and also moved the system dataset to freenas-boot. I re-ran the replication task and it failed again:

Error
full send of vol1/qa@auto-2020-06-14_00-00 estimated size is 6.96T
full send of vol1/qa@auto-2020-06-14_00-00 estimated size is 6.96T
cannot unmount '/mnt/vol1': Device busy
warning: cannot send 'vol1/qa@auto-2020-06-14_00-00': signal received.
Logs
[2020/06/17 10:02:49] INFO [Thread-55] [zettarepl.paramiko.replication_task__task_10] Connected (version 2.0, client OpenSSH_8.0-hpn14v15)
[2020/06/17 10:02:49] INFO [Thread-55] [zettarepl.paramiko.replication_task__task_10] Authentication (publickey) successful!
[2020/06/17 10:02:49] INFO [replication_task__task_10] [zettarepl.replication.run] For replication task 'task_10': doing push from 'vol1/qa' to 'vol1' of snapshot='auto-2020-06-14_00-00' incremental_base=None receive_resume_token=None
[2020/06/17 10:02:50] WARNING [replication_task__task_10] [zettarepl.replication.run] For task 'task_10' at attempt 1 recoverable replication error RecoverableReplicationError("full send of vol1/qa@auto-2020-06-14_00-00 estimated size is 6.96T\nfull send of vol1/qa@auto-2020-06-14_00-00 estimated size is 6.96T\ncannot unmount '/mnt/vol1': Device busy\nwarning: cannot send 'vol1/qa@auto-2020-06-14_00-00': signal received")
[2020/06/17 10:02:50] INFO [replication_task__task_10] [zettarepl.replication.run] After recoverable error sleeping for 1 seconds
[2020/06/17 10:02:51] INFO [replication_task__task_10] [zettarepl.replication.run] For replication task 'task_10': doing push from 'vol1/qa' to 'vol1' of snapshot='auto-2020-06-14_00-00' incremental_base=None receive_resume_token=None
[2020/06/17 10:02:52] WARNING [replication_task__task_10] [zettarepl.replication.run] For task 'task_10' at attempt 2 recoverable replication error RecoverableReplicationError("full send of vol1/qa@auto-2020-06-14_00-00 estimated size is 6.96T\nfull send of vol1/qa@auto-2020-06-14_00-00 estimated size is 6.96T\ncannot unmount '/mnt/vol1': Device busy\nwarning: cannot send 'vol1/qa@auto-2020-06-14_00-00': signal received")
[2020/06/17 10:02:52] INFO [replication_task__task_10] [zettarepl.replication.run] After recoverable error sleeping for 2 seconds
[2020/06/17 10:02:54] INFO [replication_task__task_10] [zettarepl.replication.run] For replication task 'task_10': doing push from 'vol1/qa' to 'vol1' of snapshot='auto-2020-06-14_00-00' incremental_base=None receive_resume_token=None
[2020/06/17 10:02:55] WARNING [replication_task__task_10] [zettarepl.replication.run] For task 'task_10' at attempt 3 recoverable replication error RecoverableReplicationError("full send of vol1/qa@auto-2020-06-14_00-00 estimated size is 6.96T\nfull send of vol1/qa@auto-2020-06-14_00-00 estimated size is 6.96T\ncannot unmount '/mnt/vol1': Device busy\nwarning: cannot send 'vol1/qa@auto-2020-06-14_00-00': signal received")
[2020/06/17 10:02:55] INFO [replication_task__task_10] [zettarepl.replication.run] After recoverable error sleeping for 4 seconds
[2020/06/17 10:02:59] INFO [replication_task__task_10] [zettarepl.replication.run] For replication task 'task_10': doing push from 'vol1/qa' to 'vol1' of snapshot='auto-2020-06-14_00-00' incremental_base=None receive_resume_token=None
[2020/06/17 10:03:00] WARNING [replication_task__task_10] [zettarepl.replication.run] For task 'task_10' at attempt 4 recoverable replication error RecoverableReplicationError("full send of vol1/qa@auto-2020-06-14_00-00 estimated size is 6.96T\nfull send of vol1/qa@auto-2020-06-14_00-00 estimated size is 6.96T\ncannot unmount '/mnt/vol1': Device busy\nwarning: cannot send 'vol1/qa@auto-2020-06-14_00-00': signal received")
[2020/06/17 10:03:00] INFO [replication_task__task_10] [zettarepl.replication.run] After recoverable error sleeping for 8 seconds
[2020/06/17 10:03:08] INFO [replication_task__task_10] [zettarepl.replication.run] For replication task 'task_10': doing push from 'vol1/qa' to 'vol1' of snapshot='auto-2020-06-14_00-00' incremental_base=None receive_resume_token=None
[2020/06/17 10:03:08] WARNING [replication_task__task_10] [zettarepl.replication.run] For task 'task_10' at attempt 5 recoverable replication error RecoverableReplicationError("full send of vol1/qa@auto-2020-06-14_00-00 estimated size is 6.96T\nfull send of vol1/qa@auto-2020-06-14_00-00 estimated size is 6.96T\ncannot unmount '/mnt/vol1': Device busy\nwarning: cannot send 'vol1/qa@auto-2020-06-14_00-00': signal received")
[2020/06/17 10:03:08] ERROR [replication_task__task_10] [zettarepl.replication.run] Failed replication task 'task_10' after 5 retries

The fun never ends. I may just give up on this game and use what works, i.e. make a dataset under vol1 on the receiver machine and tell it to replicate there.
 

yags

Cadet
Joined
Feb 23, 2018
Messages
4
I had a similar situation with /var/db/system.

I backed up volume my_pool to volume bu.
Then I deleted volume my_pool, added 2 additional disks, and recreated my_pool.
I tried zfs send bu | zfs recv my_pool and received the same error.

It turns out that after I recreated my_pool, I had imported that volume first, then imported the bu volume.
That order of imports was the issue.
To solve the problem, I detached everything, imported bu, then imported my_pool.

After this, I was able to zfs send | zfs recv to restore the data.
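
For reference, the restore was roughly this (snapshot name hypothetical):

zfs snapshot -r bu@restore                      # recursive snapshot of the backup pool
zfs send -R bu@restore | zfs recv -F my_pool    # replication stream back into the recreated pool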
 

PackElend

Explorer
Joined
Sep 23, 2020
Messages
60
I had the same issue when I restarted the system (the restart took ages).

In my case, it turned out that the system had moved the system dataset from the boot pool to the HDD pool when the latter was created (it was the first pool to be created). This, maybe in combination with the issues in these threads, caused the problem:
  1. Stuck job alerts.process_alerts | TrueNAS Community
  2. Stuck at alert.process_alerts | TrueNAS Community

I moved it to the SSD pool and restarted the system, which resolved it.
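
After a move like that, mount | grep /var/db/system (as earlier in this thread) should confirm the system dataset is on the pool you expect.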
 