cannot receive: invalid stream (bad magic number)

Status
Not open for further replies.

Josep

Cadet
Joined
Apr 3, 2014
Messages
8
Hi,

I've updated my 9.2 to 9.2.1.3 and replication to another FreeNAS box (9.2.1.3) stopped working.
I tried recreating the task, but no success. I checked SSH, the ports, everything; it all looks fine.

The main log doesn't have anything about this issue.
I then found someone on the forum who said they found relevant information in /tmp.

So I looked there and found a few hundred files named repl-xxxxx, all of them containing the same thing: cannot receive: invalid stream (bad magic number)

Does anyone have a suggestion on what the cause is and how to fix this?
I've actually deleted all the snapshots already replicated to the backup FreeNAS, because I thought that might help.

Thanks.
 

unwired

Cadet
Joined
Oct 21, 2013
Messages
9
I am also having this problem since the 9.2.1.2 to 9.2.1.3 upgrade and have been unable to resolve it so far.

Code:
[root@freenas] /tmp# cat .repl-result
(dp1
I1
S'cannot receive: invalid stream (bad magic number)'
p2
sI2
S'Succeeded'
p3
sI3
S'Succeeded'
p4
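
For what it's worth, .repl-result looks like a Python pickle of a dict keyed by replication task id; that is what the (dp1 ... dump above is. As a rough sketch (assuming the stock Python 2 on the box), this prints it as a readable dict of {task id: last result}:

Code:
[root@freenas] /tmp# python -c "import pickle, pprint; pprint.pprint(pickle.load(open('.repl-result')))"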
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Well, tell the random number generator to start giving you a good magic number! Geez. Bunch of whiny babies!

On a serious note, is something actually not working and causing that error? I see nothing that identifies a broken service or process, so without much more info I can't help.

Hardware for your server might be helpful too...
 

unwired

Cadet
Joined
Oct 21, 2013
Messages
9
Well, I don't know what is being sent by autorepl.py to generate the bad magic number error. I am able to push snapshots to the remote box manually, but autorepl.py seems to be unable to do so.

NEW snapshots just keep piling up:

Code:
Westmoorland@auto-20140403.1200-2w      LATEST
Westmoorland@auto-20140403.1300-2w      NEW
Westmoorland@auto-20140403.1400-2w      NEW
Westmoorland@auto-20140403.1500-2w      NEW


Debug logging from autorepl.py is not very helpful either:
Code:
Apr  3 15:03:01 freenas autorepl.py: [tools.autorepl:117] Autosnap replication started
Apr  3 15:03:01 freenas autorepl.py: [tools.autorepl:118] temp log file: /tmp/repl-75573
Apr  3 15:03:01 freenas autorepl.py: [tools.autorepl:189] Checking dataset Westmoorland
Apr  3 15:03:01 freenas autorepl.py: [tools.autorepl:216] Snapshot: Westmoorland@auto-20140403.1500-2w State: NEW
Apr  3 15:03:01 freenas autorepl.py: [tools.autorepl:223] Snapshot Westmoorland@auto-20140403.1500-2w added to wanted list
Apr  3 15:03:01 freenas autorepl.py: [tools.autorepl:216] Snapshot: Westmoorland@auto-20140403.1400-2w State: NEW
Apr  3 15:03:01 freenas autorepl.py: [tools.autorepl:223] Snapshot Westmoorland@auto-20140403.1400-2w added to wanted list
Apr  3 15:03:01 freenas autorepl.py: [tools.autorepl:216] Snapshot: Westmoorland@auto-20140403.1300-2w State: NEW
Apr  3 15:03:01 freenas autorepl.py: [tools.autorepl:223] Snapshot Westmoorland@auto-20140403.1300-2w added to wanted list
Apr  3 15:03:01 freenas autorepl.py: [tools.autorepl:216] Snapshot: Westmoorland@auto-20140403.1200-2w State: LATEST
Apr  3 15:03:01 freenas autorepl.py: [tools.autorepl:220] Snapshot Westmoorland@auto-20140403.1200-2w is the recorded latest snapshot
Apr  3 15:03:02 freenas autorepl.py: [tools.autorepl:255] Found matching latest snapshot Westmoorland@auto-20140403.1200-2w remotely (but not the recorded one)
Apr  3 15:03:04 freenas autorepl.py: [tools.autorepl:347] Replication result: cannot receive: invalid stream (bad magic number)


It would be nice to know the specific ZFS command that is generating the error, but I have not figured out how to see it.
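
"bad magic number" is zfs receive complaining that whatever arrived on its stdin does not start with a valid ZFS stream header, so something other than (or in addition to) the zfs send output is reaching the remote side. Below is a hedged approximation of one incremental step, built from the snapshot names in the log above and the Troubleshooting Replication section of the docs; it is not a copy of what autorepl.py actually runs, and the remote hostname is made up:

Code:
# sketch only -- hostname is an example; /data/ssh/replication is the usual replication key path
zfs send -i Westmoorland@auto-20140403.1200-2w Westmoorland@auto-20140403.1300-2w | \
    ssh -i /data/ssh/replication root@pull.example.com "zfs receive -F -d tank"

If that works by hand but autorepl.py still logs the bad magic number, the breakage is in whatever the script wraps around the stream rather than in zfs send/receive themselves.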
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Aha! You have replication set up? Try disabling that temporarily to see if the errors stop. Then you can at least prove it's the ZFS replication tasks that are to blame. ;)
 

unwired

Cadet
Joined
Oct 21, 2013
Messages
9
This is caused by the replication task, that much is clear. It happens when autorepl.py does something while sending the missing snapshot(s).
 

Josep

Cadet
Joined
Apr 3, 2014
Messages
8
I don't have anything in that file. I just have a few hundred files called repl-XXXX, where XXXX are some digits.
Sorry, but hardware is irrelevant in this case. Everything worked perfectly until the update to 9.2.1.3, so it's something related to the newer version.

If I disable the replication task, no more repl-XXXX files are generated.
 

Josep

Cadet
Joined
Apr 3, 2014
Messages
8
Well, I've replaced autorepl.py with the one from Git, and now at least I get an actual error:
cannot unmount '/mnt/xcd/.system/syslog': Device busy

There is a bug filed for this already; unfortunately, there is no solution yet.
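
That "Device busy" makes sense if zfs receive -F on PULL is trying to roll back and unmount datasets under the target while syslogd is still writing to the system dataset there. A rough diagnostic sketch, run on the PULL box (dataset and mountpoint names follow the error message above):

Code:
# is the syslog dataset mounted under the replication target?
zfs list -r -o name,mounted,mountpoint xcd/.system
# what is holding files open on it? (syslogd is the usual suspect when
# "Use system dataset for syslog" is enabled)
fstat -f /mnt/xcd/.system/syslog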
 

ser_rhaegar

Patron
Joined
Feb 2, 2014
Messages
358
The temporary solution is to replicate into a dataset on the PULL machine. Then replication won't even touch the remote .system/syslog dataset.
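
As a minimal sketch of that workaround (pool and dataset names are just examples), on the PULL box:

Code:
# create a dedicated landing dataset so received streams never overlap tank/.system
zfs create tank/repl

Then, on PUSH, edit the replication task and point "Remote ZFS Volume/Dataset" at tank/repl instead of the pool root.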
 

alexg

Contributor
Joined
Nov 29, 2013
Messages
197
I'm getting the same error, "invalid stream (bad magic number)", after upgrading to 9.2.1.3. Is there a workaround for it?
 

ser_rhaegar

Patron
Joined
Feb 2, 2014
Messages
358
Quoting: "@ser_rhaegar I don't have any other dataset on PULL. Currently it's empty."

.system is hidden and not visible in the GUI. Check Settings -> Advanced and look at which pool .system resides on there.
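
If you'd rather check from the shell, the hidden dataset shows up in plain zfs list output; a quick sketch (pool name "tank" is just an example):

Code:
zfs list -r -o name,used,mountpoint tank | grep '\.system'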
 

alexg

Contributor
Joined
Nov 29, 2013
Messages
197
".system" is on a PULL server pool "tank", but I only replicate specific(not ".system") datasets from "tank". However, on a PUSH pool "bk-tank" (on localhost) I also have an old ".system" dataset because it was used in prior server testing.
 

alexg

Contributor
Joined
Nov 29, 2013
Messages
197
Manual replication worked. I ran it for each snapshot and it looks fine. Is there some other step I need to do so the automated task treats the latest one as replicated?
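
Heavily hedged, since this pokes at FreeNAS internals: the 9.2.x scripts appear to track replication state in a freenas:state user property on each snapshot (NEW / LATEST / -), which is what the "State: NEW" lines in the earlier log are reporting. Read autorepl.py on your build before touching anything, but the idea would look roughly like this (dataset and snapshot names are examples only):

Code:
# see what the script currently considers NEW vs LATEST on the PUSH side
zfs get -r -t snapshot freenas:state tank/data
# if a manually sent snapshot should become the new baseline, demote the old
# LATEST and promote the new one -- again, verify against autorepl.py first
zfs set freenas:state=- tank/data@auto-20140403.1200-2w
zfs set freenas:state=LATEST tank/data@auto-20140403.1500-2w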
 

Josep

Cadet
Joined
Apr 3, 2014
Messages
8
I'm somewhat confused. In the settings, I have System dataset pool: BACKUP, which is the name of the Volume. I also have Use system dataset for syslog enabled. On my replication task on PUSH I already had BACKUP under Remote ZFS Volume/Dataset, so I don't understand what I'm supposed to do.
 

Chewie198

Cadet
Joined
Jan 15, 2014
Messages
8
I'm seeing the same error on my system as well. I can also get manual replication to work, and have followed every step in the documentation section 'Troubleshooting Replication.' When performing those steps from the command prompt, everything works fine, but the autorepl.py script seems to be having problems accomplishing the same thing.

Alex, I noticed that you filed https://bugs.freenas.org/issues/4735 to address this issue, and in the comments, Josh Paetzel noted "Setting target version to future on the assumption this is due to a pool mismatch." Do you or anybody else have any idea what he means by this? Pool layout mismatch? Version mismatch? Or something else entirely?

I'm seeing this issue when replicating locally from one pool to another on the same server, and I can easily delete/recreate the replicated pool. If there's something that I can do to eliminate this 'pool mismatch' and get it running, I'd be more than willing to give it a shot.
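
For reference, the local pool-to-pool case can be exercised by hand without ssh at all; a rough sketch with made-up pool names:

Code:
# full recursive send of the snapshot chain into a scratch dataset on the
# destination pool, forcing a rollback on the receive side
zfs send -R sourcepool/data@auto-20140403.1500-2w | zfs receive -F destpool/data

If that succeeds while the scheduled task still fails, it points the same way as the remote case: the scripted invocation, not zfs itself.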
 

alexg

Contributor
Joined
Nov 29, 2013
Messages
197
There was another post on a similar topic, and someone suggested pulling the new version of autorepl.py from GitHub or waiting for 9.2.1.4 (which is available as a beta). I'll take a look at what changed in autorepl.py and, if I feel comfortable, I may try it out. Otherwise, I'll wait for the official 9.2.1.4 and just do manual replications periodically.
 