SOLVED Replication between FreeNAS

adrianwi · Mar 2, 2015

I applied the latest update to my backup FreeNAS box over the weekend and have now noticed the following in the console:

Code:

Mar  2 02:01:56 freenas2 cnid_dbd[3988]: error committing transaction: Input/output error
Mar  2 02:01:56 freenas2 cnid_dbd[3988]: error closing database name.db: Input/output error
Mar  2 02:01:56 freenas2 cnid_dbd[3988]: error closing database didname.db: Input/output error
Mar  2 02:01:56 freenas2 cnid_dbd[3988]: error closing database devino.db: Input/output error
Mar  2 02:01:56 freenas2 cnid_dbd[3988]: error closing database cnid2.db: Input/output error
Mar  2 02:01:56 freenas2 cnid_dbd[3988]: error closing DB environment: Bad file descriptor
Mar  2 02:01:56 freenas2 cnid_dbd[3988]: error closing DB logfile: Input/output error
Mar  2 02:01:56 freenas2 cnid_dbd[3988]: error opening cwd: Input/output error
Mar  2 02:18:23 freenas2 alert.py: [freenasOS.Update:801] Cached sequence, FreeNAS-9.3-STABLE-201502210408, does not match system sequence, FreeNAS-9.3-STABLE-201502271818
Mar  2 02:18:23 freenas2 alert.py: [freenasOS.Update:591] Cached sequence does not match system sequence

The periodic snapshot is created at 02:00 on my main FreeNAS box, so assuming this is related to that. I didn't update the main box as everything was working OK but thought I'd test on the backup. Do they need to be running the same version?

And just tried to connect to the AFP share on backup and have lots of these appearing in the console:

Code:

Mar 2 17:29:36 freenas2 cnid_metad[2494]: error in sendmsg: Broken pipe
Mar 2 17:29:36 freenas2 afpd[3987]: transmit: Request to dbd daemon (volume APE_pool2) timed out.
Mar 2 17:29:36 freenas2 afpd[3987]: enumerate(vid:1, did:2, name:'.'): error adding dir: 'TM_imac'
Mar 2 17:29:36 freenas2 cnid_metad[2494]: error in sendmsg: Broken pipe
Mar 2 17:29:36 freenas2 cnid_metad[2494]: error in sendmsg: Broken pipe
Mar 2 17:29:37 freenas2 cnid_metad[2494]: error in sendmsg: Broken pipe
Mar 2 17:29:38 freenas2 cnid_metad[2494]: error in sendmsg: Broken pipe
Mar 2 17:29:39 freenas2 cnid_metad[2494]: error in sendmsg: Broken pipe
Mar 2 17:29:40 freenas2 cnid_metad[2494]: error in sendmsg: Broken pipe
Mar 2 17:29:41 freenas2 cnid_metad[2494]: error in sendmsg: Broken pipe
Mar 2 17:29:42 freenas2 cnid_metad[2494]: error in sendmsg: Broken pipe
Mar 2 17:29:43 freenas2 cnid_metad[2494]: error in sendmsg: Broken pipe
.......

dlavigne · Mar 6, 2015

Were you able to figure out the cause of this?

adrianwi · Mar 6, 2015

Not really, but after some further investigation the replication task appears to be working OK and the changes on main FreeNAS box are copying across to the backup.

I'm still getting this message every day in the console, which given the time and the version references make me think it must be related to the replication task (auto snapshot taken at 02:00 on 201502210408):

Code:

Mar  6 00:00:01 freenas2 syslog-ng[1676]: Configuration reload request received, reloading configuration;
Mar  6 02:37:49 freenas2 alert.py: [freenasOS.Update:801] Cached sequence, FreeNAS-9.3-STABLE-201502210408, does not match system sequence, FreeNAS-9.3-STABLE-201502271818
Mar  6 02:37:49 freenas2 alert.py: [freenasOS.Update:591] Cached sequence does not match system sequence

Haven't really tried accessing the backup via AFP as it's just that, a backup, and I've randomly checked files each day through the CLI and new files have copied across as expected.

I was going to update the main box over the weekend, but when everything is working I'm tempted to just leave it!

dlavigne · Mar 6, 2015

Regarding the "cached sequence" error, it's worth creating a bug report at bugs.freenas.org as that sounds like a config upgrade issue or something in replication not taking upgrades into account. Please post the issue number here after creating the report.

adrianwi · Mar 8, 2015

Logged as #8459

adrianwi · Mar 11, 2015

So after some discussion in the bug thread, I understand what was causing this a little better.

The FreeNAS team are going to add a checkbox to the Replication Task GUI so the .system dataset can be excluded if you're replicating a recursive top-level snapshot to the top-level of another machine.

Another solution (which I wasn't aware of) is that you can just point the replication task to a dataset which would also avoid this error message :)

rogerh · Mar 11, 2015

When I first tried replication of my whole pool to another ZFS machine (actually into a dataset, but just out of neatness) I was rather surprised not to see a new dataset representing the 'root' of the pool. I.E. if the pool were named 'tank' and the dataset I put it in was "othertank/dataset" I expected to see a new dataset "othertank/dataset/tank" rather than just the top level datasets of the replicated pool. So I could have run into this problem without any warning. I am reassured that apparently the ".system" dataset will do no harm if not in the top level pool. I hope the same is true of jails, someone complained of them booting in the replication receiving machine.

adrianwi · Mar 12, 2015

I don't know if it would generally cause a problem replicating the .system or whether I just struck lucky as I'm running 2 almost identical HP Microservers.

I did wonder about the jails (I'm running 4 with Plex Media Server, PlexConnect, ownCloud and openVPN) but even though they appear at the top level pool (e.g. /tank/jails) the configuration isn't there so they didn't interfere with anything.

adrianwi · May 11, 2015

After raising this as an issue, and the development team making some changes to restrict the replication of the system file when replicating to the top-level destination, I thought I'd should test it.

I removed the snapshots and replication task on my main machine (freenas1) and then destroyed the pool on my backup machine (freenas2). I then created a new pool on freenas2 and recreated the periodic snapshot (with the Exclude System Dataset checkbox ticked) and replication task on freenas1 (including the Initialise remote side once checkbox)

Replication is from /mnt/APEpool1 on freenas1 to /mnt/APEpool2 on freenas2.

First replication ran without any issues and showed as successful, but when I looked this morning I have red flashing alert and the following message on freenas1:

Code:

CRITICAL: Replication APEpool1 -> 192.168.168.65:APEpool2 failed: cannot open 'APEpool2/.system/rrd-30a263b0273743feb8ee7748ebbdf9af': dataset does not exist cannot open 'APEpool2/.system/cores': dataset does not exist cannot open 'APEpool2/.system/syslog-30a263b0273743feb8ee7748ebbdf9af': dataset does not exist cannot open 'APEpool2/.system/samba4': dataset does not exist Succeeded

Not entirely sure what the message means, but I suspect it's not working as intended?

Will add this to the incident log.

adrianwi · May 12, 2015

How bizarre! Last night's replication worked fine and is showing and "succeeded".

rogerh · May 12, 2015

My replication gives a 'critical' error every time to the effect that aclmode cannot be set. This is because the receiving system is zfsonlinux and doesn't know what aclmode is, though it possibly has an equivalent mode it doesn't understand the synonym. However, as in post #9, this error is followed by "Succeeded" and it does seem to have worked. So clearly some 'critical' errors don't stop it working. Also, in the GUI, the last error to happen (for instance "SSH failed" when the receiving host was offline) seems to be repeated after every snapshot is replicated until it is replaced by a new error.

If you can read the files and clone the snapshots on the receiving system it may be reasonable to ignore some errors, especially ones you know are resolved. Apparently, the GUI replication scheme is going to be rewritten for FreeNAS 10.

cyberjock · May 12, 2015

The ACLmode being lost is a critical error as it can drastically change the behavior of how ACLs and Unix permissions work together to not conflict. The replication will still succeed for this error, but you are at risk of corrupting the permissions that were replicated. Unless something changes the permissions on your linux box, your permissions are correct.

If the aclmode is the only thing lost, that's good. If others are lost, their consequences would obviously have to be determined based on what that parameter does.

rogerh · May 12, 2015

cyberjock said:
The ACLmode being lost is a critical error as it can drastically change the behavior of how ACLs and Unix permissions work together to not conflict. The replication will still succeed for this error, but you are at risk of corrupting the permissions that were replicated. Unless something changes the permissions on your linux box, your permissions are correct.

If the aclmode is the only thing lost, that's good. If others are lost, their consequences would obviously have to be determined based on what that parameter does.

Point taken, but the SSH error is a better example. It is constantly repeated in the GUI although it is no longer happening. I am not convinced the aclmode error is a recurring one after the initial replication, unless it is also in the contents of each new snapshot, as it occurs when the datasets are initially created. In any case, the Linux server is purely for backup. If needed the backups would be replicated back to the FreeNAS box if it still exists or to a new one. Hopefully if it is a new one I will remember what the dataset permissions should be.

My basic point is the critical error message followed by "Succeeded" may not be a disaster. BICBW

Edit: I am sure this is the wrong place to ask if "aclinherit" in zfsonlinux usefully maps to "aclmode" in FreeNAS, and whether I have solved the problem by changing the receiving datasets aclinherit value to "restricted" as the aclmode value of the originals are "restricted"?
Edit2: no, it isn't. There doesn't seem to be any useful equivalent of aclmode in zfsonlinux, so I suppose I need to discover what its behaviour in this respect actually is. Not really anything to do with FreeNAS.

cyberjock · May 12, 2015

aclinherit is different from aclmode. ;)

adrianwi · May 16, 2015

AdrianWilliamson said:
After raising this as an issue, and the development team making some changes to restrict the replication of the system file when replicating to the top-level destination, I thought I'd should test it.

I removed the snapshots and replication task on my main machine (freenas1) and then destroyed the pool on my backup machine (freenas2). I then created a new pool on freenas2 and recreated the periodic snapshot (with the Exclude System Dataset checkbox ticked) and replication task on freenas1 (including the Initialise remote side once checkbox)

Replication is from /mnt/APEpool1 on freenas1 to /mnt/APEpool2 on freenas2.

First replication ran without any issues and showed as successful, but when I looked this morning I have red flashing alert and the following message on freenas1:

Code:
CRITICAL: Replication APEpool1 -> 192.168.168.65:APEpool2 failed: cannot open 'APEpool2/.system/rrd-30a263b0273743feb8ee7748ebbdf9af': dataset does not exist cannot open 'APEpool2/.system/cores': dataset does not exist cannot open 'APEpool2/.system/syslog-30a263b0273743feb8ee7748ebbdf9af': dataset does not exist cannot open 'APEpool2/.system/samba4': dataset does not exist Succeeded

Not entirely sure what the message means, but I suspect it's not working as intended?

Will add this to the incident log.

AdrianWilliamson said:
How bizarre! Last night's replication worked fine and is showing and "succeeded".

And after 4 replications with a "succeeded" message I'm back to the same error message as first reported, although as far as I can see everything copied across successfully.

Important Announcement for the TrueNAS Community.

SOLVED Replication between FreeNAS

adrianwi

Guru

dlavigne

Guest

adrianwi

Guru

dlavigne

Guest

adrianwi

Guru

adrianwi

Guru

rogerh

Guru

adrianwi

Guru

adrianwi

Guru

adrianwi

Guru

rogerh

Guru

cyberjock

Inactive Account

rogerh

Guru

cyberjock

Inactive Account

adrianwi

Guru

Similar threads

Important Announcement for the TrueNAS Community.

SOLVED Replication between FreeNAS

Guru

dlavigne

Guest

Guru

dlavigne

Guest

Guru

Guru

Guru

Guru

Guru

Guru

Guru

Inactive Account

Guru

Inactive Account

Guru

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "Replication between FreeNAS"

Similar threads