SOLVED Replication between FreeNAS boxes


adrianwi

Guru
Joined
Oct 15, 2013
Messages
1,231
I applied the latest update to my backup FreeNAS box over the weekend and have now noticed the following in the console:

Code:
Mar  2 02:01:56 freenas2 cnid_dbd[3988]: error committing transaction: Input/output error
Mar  2 02:01:56 freenas2 cnid_dbd[3988]: error closing database name.db: Input/output error
Mar  2 02:01:56 freenas2 cnid_dbd[3988]: error closing database didname.db: Input/output error
Mar  2 02:01:56 freenas2 cnid_dbd[3988]: error closing database devino.db: Input/output error
Mar  2 02:01:56 freenas2 cnid_dbd[3988]: error closing database cnid2.db: Input/output error
Mar  2 02:01:56 freenas2 cnid_dbd[3988]: error closing DB environment: Bad file descriptor
Mar  2 02:01:56 freenas2 cnid_dbd[3988]: error closing DB logfile: Input/output error
Mar  2 02:01:56 freenas2 cnid_dbd[3988]: error opening cwd: Input/output error
Mar  2 02:18:23 freenas2 alert.py: [freenasOS.Update:801] Cached sequence, FreeNAS-9.3-STABLE-201502210408, does not match system sequence, FreeNAS-9.3-STABLE-201502271818
Mar  2 02:18:23 freenas2 alert.py: [freenasOS.Update:591] Cached sequence does not match system sequence


The periodic snapshot is created at 02:00 on my main FreeNAS box, so I'm assuming this is related to that. I didn't update the main box, as everything was working OK, but thought I'd test the update on the backup first. Do they need to be running the same version?

And I've just tried to connect to the AFP share on the backup, and lots of these are appearing in the console:

Code:
Mar 2 17:29:36 freenas2 cnid_metad[2494]: error in sendmsg: Broken pipe
Mar 2 17:29:36 freenas2 afpd[3987]: transmit: Request to dbd daemon (volume APE_pool2) timed out.
Mar 2 17:29:36 freenas2 afpd[3987]: enumerate(vid:1, did:2, name:'.'): error adding dir: 'TM_imac'
Mar 2 17:29:36 freenas2 cnid_metad[2494]: error in sendmsg: Broken pipe
Mar 2 17:29:36 freenas2 cnid_metad[2494]: error in sendmsg: Broken pipe
Mar 2 17:29:37 freenas2 cnid_metad[2494]: error in sendmsg: Broken pipe
Mar 2 17:29:38 freenas2 cnid_metad[2494]: error in sendmsg: Broken pipe
Mar 2 17:29:39 freenas2 cnid_metad[2494]: error in sendmsg: Broken pipe
Mar 2 17:29:40 freenas2 cnid_metad[2494]: error in sendmsg: Broken pipe
Mar 2 17:29:41 freenas2 cnid_metad[2494]: error in sendmsg: Broken pipe
Mar 2 17:29:42 freenas2 cnid_metad[2494]: error in sendmsg: Broken pipe
Mar 2 17:29:43 freenas2 cnid_metad[2494]: error in sendmsg: Broken pipe
.......
 

adrianwi

Guru
Joined
Oct 15, 2013
Messages
1,231
Not really, but after some further investigation the replication task appears to be working OK and the changes on the main FreeNAS box are copying across to the backup.

I'm still getting this message every day in the console, which, given the time and the version references, makes me think it must be related to the replication task (the auto snapshot is taken at 02:00 on 201502210408):

Code:
Mar  6 00:00:01 freenas2 syslog-ng[1676]: Configuration reload request received, reloading configuration;
Mar  6 02:37:49 freenas2 alert.py: [freenasOS.Update:801] Cached sequence, FreeNAS-9.3-STABLE-201502210408, does not match system sequence, FreeNAS-9.3-STABLE-201502271818
Mar  6 02:37:49 freenas2 alert.py: [freenasOS.Update:591] Cached sequence does not match system sequence


I haven't really tried accessing the backup via AFP as it's just that, a backup. I've randomly checked files each day through the CLI and new files have copied across as expected.
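For anyone doing the same sort of spot check, this is roughly what I mean (a rough sketch; the dataset and file paths below are just placeholders rather than my actual layout):

Code:
# confirm the latest auto snapshots have arrived on the backup pool
zfs list -t snapshot -r -o name,creation -s creation APE_pool2 | tail -5
# spot-check a file on the backup and compare the hash against the same file on the main box
sha256 /mnt/APE_pool2/some/dataset/somefile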

I was going to update the main box over the weekend, but when everything is working I'm tempted to just leave it!
 
D

dlavigne

Guest
Regarding the "cached sequence" error, it's worth creating a bug report at bugs.freenas.org as that sounds like a config upgrade issue or something in replication not taking upgrades into account. Please post the issue number here after creating the report.
 


adrianwi

Guru
Joined
Oct 15, 2013
Messages
1,231
So after some discussion in the bug thread, I understand what was causing this a little better.

The FreeNAS team are going to add a checkbox to the Replication Task GUI so the .system dataset can be excluded if you're replicating a recursive top-level snapshot to the top-level of another machine.

Another solution (which I wasn't aware of) is that you can just point the replication task at a dataset, which would also avoid this error message :)
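For anyone else reading, the difference at the ZFS level is roughly this (a sketch, not my actual task; the dataset and snapshot names are just examples following the usual auto-snapshot naming): a recursive replication from the pool root drags .system along with it, whereas pointing the task at a dataset below the root never touches it.

Code:
# .system and its children live directly under the pool root,
# so a recursive snapshot of the root picks them up
zfs list -r -o name APEpool1 | grep '\.system'
# replicating from a dataset below the pool root leaves .system out entirely
zfs send -R APEpool1/data@auto-20150307.0200-2w | \
    ssh 192.168.168.65 zfs recv -F APEpool2/data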
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
When I first tried replication of my whole pool to another ZFS machine (actually into a dataset, but just for neatness) I was rather surprised not to see a new dataset representing the 'root' of the pool. That is, if the pool were named 'tank' and the dataset I put it in was 'othertank/dataset', I expected to see a new dataset 'othertank/dataset/tank' rather than just the top-level datasets of the replicated pool. So I could have run into this problem without any warning. I am reassured that apparently the '.system' dataset will do no harm if it's not in the top-level pool. I hope the same is true of jails; someone complained of them booting on the replication-receiving machine.
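To put that in concrete terms (a sketch using the names above, not real output), what turns up on the receiving side is the source pool's children directly under the target dataset:

Code:
# on the receiving machine, list what the replication created
zfs list -r -o name othertank/dataset
# shows something like:
#   othertank/dataset
#   othertank/dataset/media
#   othertank/dataset/.system    <- only when the pool root was replicated
# rather than the extra othertank/dataset/tank level I had expected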
 

adrianwi

Guru
Joined
Oct 15, 2013
Messages
1,231
I don't know whether replicating the .system dataset would generally cause a problem, or whether I just got lucky because I'm running two almost identical HP Microservers.

I did wonder about the jails (I'm running 4, with Plex Media Server, PlexConnect, ownCloud and OpenVPN), but even though they appear at the top level of the pool (e.g. /tank/jails), the configuration isn't there, so they didn't interfere with anything.
 

adrianwi

Guru
Joined
Oct 15, 2013
Messages
1,231
After raising this as an issue, and the development team making some changes to restrict replication of the .system dataset when replicating to a top-level destination, I thought I should test it.

I removed the snapshots and replication task on my main machine (freenas1) and then destroyed the pool on my backup machine (freenas2). I then created a new pool on freenas2 and recreated the periodic snapshot (with the Exclude System Dataset checkbox ticked) and the replication task on freenas1 (including the Initialise remote side once checkbox).

Replication is from /mnt/APEpool1 on freenas1 to /mnt/APEpool2 on freenas2.

The first replication ran without any issues and showed as successful, but when I looked this morning I had a red flashing alert and the following message on freenas1:

Code:
CRITICAL: Replication APEpool1 -> 192.168.168.65:APEpool2 failed: cannot open 'APEpool2/.system/rrd-30a263b0273743feb8ee7748ebbdf9af': dataset does not exist cannot open 'APEpool2/.system/cores': dataset does not exist cannot open 'APEpool2/.system/syslog-30a263b0273743feb8ee7748ebbdf9af': dataset does not exist cannot open 'APEpool2/.system/samba4': dataset does not exist Succeeded


Not entirely sure what the message means, but I suspect it's not working as intended?
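If anyone else sees this, the quickest way to work out what the message is complaining about seems to be checking whether those .system children actually exist on each side (just a sketch; the GUID suffixes in the dataset names will differ on every system):

Code:
# on freenas2: which .system child datasets exist on the destination?
zfs list -r -o name APEpool2/.system
# on freenas1: what the source side has, for comparison
zfs list -r -o name APEpool1/.system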

Will add this to the incident log.
 

adrianwi

Guru
Joined
Oct 15, 2013
Messages
1,231
How bizarre! Last night's replication worked fine and is showing as "succeeded".
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
My replication gives a 'critical' error every time, to the effect that aclmode cannot be set. This is because the receiving system is zfsonlinux, which doesn't know what aclmode is; even if it has an equivalent property, it doesn't understand the synonym. However, as in post #9, this error is followed by "Succeeded" and it does seem to have worked, so clearly some 'critical' errors don't stop it working. Also, in the GUI, the last error to happen (for instance "SSH failed" when the receiving host was offline) seems to be repeated after every snapshot is replicated, until it is replaced by a new error.

If you can read the files and clone the snapshots on the receiving system it may be reasonable to ignore some errors, especially ones you know are resolved. Apparently, the GUI replication scheme is going to be rewritten for FreeNAS 10.
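Checking what each side actually recognises is straightforward; this is roughly what I looked at (the dataset names here are placeholders, not my real ones):

Code:
# on the FreeNAS sender: both properties exist
zfs get aclmode,aclinherit tank/dataset
# on the zfsonlinux receiver: aclmode is not a recognised property there,
# so only aclinherit can be queried or set
zfs get aclinherit othertank/dataset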
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
The aclmode being lost is a critical error, as it can drastically change the behavior of how ACLs and Unix permissions work together without conflicting. The replication will still succeed despite this error, but you are at risk of corrupting the permissions that were replicated. Unless something changes the permissions on your Linux box, your permissions are correct.

If aclmode is the only thing lost, that's good. If other properties are lost, their consequences would obviously have to be determined based on what each parameter does.
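A rough way to check is to compare the locally-set properties on each side and see which ones didn't survive the trip (a sketch; the dataset names below are placeholders):

Code:
# on the sending FreeNAS box: every property set locally on the dataset
zfs get -s local all tank/dataset
# run the same against the receiving dataset and compare the two lists
zfs get -s local all othertank/dataset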
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111

Point taken, but the SSH error is a better example: it is constantly repeated in the GUI although it is no longer happening. I am not convinced the aclmode error recurs after the initial replication, unless it is also in the contents of each new snapshot, as it occurs when the datasets are initially created. In any case, the Linux server is purely for backup. If needed, the backups would be replicated back to the FreeNAS box if it still exists, or to a new one. Hopefully, if it is a new one, I will remember what the dataset permissions should be.

My basic point is that a critical error message followed by "Succeeded" may not be a disaster. BICBW.

Edit: I am sure this is the wrong place to ask whether "aclinherit" in zfsonlinux usefully maps to "aclmode" in FreeNAS, and whether I have solved the problem by changing the receiving datasets' aclinherit value to "restricted", since the aclmode value of the originals is "restricted".
Edit 2: no, it isn't. There doesn't seem to be any useful equivalent of aclmode in zfsonlinux, so I suppose I need to discover what its behaviour in this respect actually is. Not really anything to do with FreeNAS.
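For completeness, the change mentioned in the first edit amounted to just this (the receiving dataset name is a placeholder), before I concluded it doesn't really map to aclmode anyway:

Code:
# on the zfsonlinux receiver: set aclinherit to match the source's aclmode value
zfs set aclinherit=restricted othertank/dataset
zfs get aclinherit othertank/dataset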
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
aclinherit is different from aclmode. ;)
 

adrianwi

Guru
Joined
Oct 15, 2013
Messages
1,231
And after four replications with a "succeeded" message, I'm back to the same error message I first reported, although as far as I can see everything has copied across successfully.
 