cannot receive: invalid stream (bad magic number)

Status
Not open for further replies.

Josep

Cadet
Joined
Apr 3, 2014
Messages
8
Hi,

I've updated my 9.2 to 9.2.1.3 and replication to another FreeNAS box (9.2.1.3) stopped working.
I tried recreating the task, but no success. I checked SSH, the ports, everything; it all looks fine.

The main log doesn't have anything about this issue.
I then found someone on the forum who said they found relevant information in /tmp.

So I looked there and found a few hundred files named repl-xxxxx, all of them containing the same thing: cannot receive: invalid stream (bad magic number)

Does anyone have a suggestion on what the cause is and how to fix this?
I've actually deleted all the snapshots already replicated to the backup FreeNAS, because I thought that might help.

Thanks.
 

unwired

Cadet
Joined
Oct 21, 2013
Messages
9
I am also having this problem since the 9.2.1.2 to 9.2.1.3 upgrade and have been unable to resolve it so far.

Code:
[root@freenas] /tmp# cat .repl-result
(dp1
I1
S'cannot receive: invalid stream (bad magic number)'
p2
sI2
S'Succeeded'
p3
sI3
S'Succeeded'
p4
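
For what it's worth, .repl-result looks like a Python pickle of a dict keyed by replication task id; that is what the (dp1 ... dump above is. As a rough sketch (assuming the stock Python 2 on the box), this prints it as a readable dict of {task id: last result}:

Code:
[root@freenas] /tmp# python -c "import pickle, pprint; pprint.pprint(pickle.load(open('.repl-result')))"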
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Well, tell the random number generator to start giving you a good magic number! Geez. Bunch of whiny babies!

On a serious note, is something actually not working and causing that error? I see nothing that identifies a broken service or process, so without much more info I can't help.

Hardware for your server might be helpful too...
 

unwired

Cadet
Joined
Oct 21, 2013
Messages
9
Well, I don't know what is being sent by autorepl.py to generate the bad magic number error. I am able to push snapshots to the remote box manually, but autorepl.py seems to be unable to do so.

NEW snapshots just keep piling up:

Code:
Westmoorland@auto-20140403.1200-2w      LATEST
Westmoorland@auto-20140403.1300-2w      NEW
Westmoorland@auto-20140403.1400-2w      NEW
Westmoorland@auto-20140403.1500-2w      NEW


Debug logging from autorepl.py is not very helpful either:
Code:
Apr  3 15:03:01 freenas autorepl.py: [tools.autorepl:117] Autosnap replication started
Apr  3 15:03:01 freenas autorepl.py: [tools.autorepl:118] temp log file: /tmp/repl-75573
Apr  3 15:03:01 freenas autorepl.py: [tools.autorepl:189] Checking dataset Westmoorland
Apr  3 15:03:01 freenas autorepl.py: [tools.autorepl:216] Snapshot: Westmoorland@auto-20140403.1500-2w State: NEW
Apr  3 15:03:01 freenas autorepl.py: [tools.autorepl:223] Snapshot Westmoorland@auto-20140403.1500-2w added to wanted list
Apr  3 15:03:01 freenas autorepl.py: [tools.autorepl:216] Snapshot: Westmoorland@auto-20140403.1400-2w State: NEW
Apr  3 15:03:01 freenas autorepl.py: [tools.autorepl:223] Snapshot Westmoorland@auto-20140403.1400-2w added to wanted list
Apr  3 15:03:01 freenas autorepl.py: [tools.autorepl:216] Snapshot: Westmoorland@auto-20140403.1300-2w State: NEW
Apr  3 15:03:01 freenas autorepl.py: [tools.autorepl:223] Snapshot Westmoorland@auto-20140403.1300-2w added to wanted list
Apr  3 15:03:01 freenas autorepl.py: [tools.autorepl:216] Snapshot: Westmoorland@auto-20140403.1200-2w State: LATEST
Apr  3 15:03:01 freenas autorepl.py: [tools.autorepl:220] Snapshot Westmoorland@auto-20140403.1200-2w is the recorded latest snapshot
Apr  3 15:03:02 freenas autorepl.py: [tools.autorepl:255] Found matching latest snapshot Westmoorland@auto-20140403.1200-2w remotely (but not the recorded one)
Apr  3 15:03:04 freenas autorepl.py: [tools.autorepl:347] Replication result: cannot receive: invalid stream (bad magic number)


It would be nice to know the specific ZFS command that is generating the error, but I have not figured out how to see it.
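
"bad magic number" is zfs receive complaining that whatever arrived on its stdin does not start with a valid ZFS stream header, so something other than (or in addition to) the zfs send output is reaching the remote side. Below is a hedged approximation of one incremental step, built from the snapshot names in the log above and the Troubleshooting Replication section of the docs; it is not a copy of what autorepl.py actually runs, and the remote hostname is made up:

Code:
# sketch only -- hostname is an example; /data/ssh/replication is the usual replication key path
zfs send -i Westmoorland@auto-20140403.1200-2w Westmoorland@auto-20140403.1300-2w | \
    ssh -i /data/ssh/replication root@pull.example.com "zfs receive -F -d tank"

If that works by hand but autorepl.py still logs the bad magic number, the breakage is in whatever the script wraps around the stream rather than in zfs send/receive themselves.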
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Aha! You have replication set up? Try disabling that temporarily to see if the errors stop. Then you can at least prove it's the ZFS replication tasks that are to blame. ;)
 

unwired

Cadet
Joined
Oct 21, 2013
Messages
9
This is caused by the replication task, that much is clear. It happens when autorepl.py does something while sending the missing snapshot(s).
 

Josep

Cadet
Joined
Apr 3, 2014
Messages
8
I don't have anything in that file. I just have a few hundred files called repl-XXXX, where XXXX are some digits.
Sorry, but hardware is irrelevant in this case. Everything worked perfectly until the update to 9.2.1.3, so it's something related to the newer version.

If I disable the replication task, no more repl-XXXX files are generated.
 

Josep

Cadet
Joined
Apr 3, 2014
Messages
8
Well, I've replaced autorepl.py with the one from Git, and now at least I get an actual error:
cannot unmount '/mnt/xcd/.system/syslog': Device busy

There is a bug filed for this already; unfortunately, there is no solution yet.
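
That "Device busy" makes sense if zfs receive -F on PULL is trying to roll back and unmount datasets under the target while syslogd is still writing to the system dataset there. A rough diagnostic sketch, run on the PULL box (dataset and mountpoint names follow the error message above):

Code:
# is the syslog dataset mounted under the replication target?
zfs list -r -o name,mounted,mountpoint xcd/.system
# what is holding files open on it? (syslogd is the usual suspect when
# "Use system dataset for syslog" is enabled)
fstat -f /mnt/xcd/.system/syslog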
 

ser_rhaegar

Patron
Joined
Feb 2, 2014
Messages
358
The temporary solution is to replicate into a dataset on the PULL machine. Then replication won't even touch the remote .system/syslog dataset.
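
As a minimal sketch of that workaround (pool and dataset names are just examples), on the PULL box:

Code:
# create a dedicated landing dataset so received streams never overlap tank/.system
zfs create tank/repl

Then, on PUSH, edit the replication task and point "Remote ZFS Volume/Dataset" at tank/repl instead of the pool root.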
 

alexg

Contributor
Joined
Nov 29, 2013
Messages
197
I'm getting the same error, "invalid stream (bad magic number)", after upgrading to 9.2.1.3. Is there a workaround for it?
 

ser_rhaegar

Patron
Joined
Feb 2, 2014
Messages
358
Quoting: "@ser_rhaegar I don't have any other dataset on PULL. Currently it's empty."

.system is hidden and not visible in the GUI. Check Settings -> Advanced and look at which pool .system resides on there.
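
If you'd rather check from the shell, the hidden dataset shows up in plain zfs list output; a quick sketch (pool name "tank" is just an example):

Code:
zfs list -r -o name,used,mountpoint tank | grep '\.system'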
 

alexg

Contributor
Joined
Nov 29, 2013
Messages
197
".system" is on a PULL server pool "tank", but I only replicate specific(not ".system") datasets from "tank". However, on a PUSH pool "bk-tank" (on localhost) I also have an old ".system" dataset because it was used in prior server testing.
 

alexg

Contributor
Joined
Nov 29, 2013
Messages
197
Manual replication worked. I ran it for each snapshot and it looks fine. Is there some other step I need to do so the automated task treats the latest one as replicated?
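
Heavily hedged, since this pokes at FreeNAS internals: the 9.2.x scripts appear to track replication state in a freenas:state user property on each snapshot (NEW / LATEST / -), which is what the "State: NEW" lines in the earlier log are reporting. Read autorepl.py on your build before touching anything, but the idea would look roughly like this (dataset and snapshot names are examples only):

Code:
# see what the script currently considers NEW vs LATEST on the PUSH side
zfs get -r -t snapshot freenas:state tank/data
# if a manually sent snapshot should become the new baseline, demote the old
# LATEST and promote the new one -- again, verify against autorepl.py first
zfs set freenas:state=- tank/data@auto-20140403.1200-2w
zfs set freenas:state=LATEST tank/data@auto-20140403.1500-2w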
 

Josep

Cadet
Joined
Apr 3, 2014
Messages
8
I'm somewhat confused. In the settings, I have System dataset pool: BACKUP, which is the name of the Volume. I also have Use system dataset for syslog enabled. On my replication task on PUSH I already had BACKUP under Remote ZFS Volume/Dataset, so I don't understand what I'm supposed to do.
 

Chewie198

Cadet
Joined
Jan 15, 2014
Messages
8
I'm seeing the same error on my system as well. I can also get manual replication to work, and have followed every step in the documentation section 'Troubleshooting Replication.' When performing those steps from the command prompt, everything works fine, but the autorepl.py script seems to be having problems accomplishing the same thing.

Alex, I noticed that you filed https://bugs.freenas.org/issues/4735 to address this issue, and in the comments, Josh Paetzel noted "Setting target version to future on the assumption this is due to a pool mismatch." Do you or anybody else have any idea what he means by this? Pool layout mismatch? Version mismatch? Or something else entirely?

I'm seeing this issue when replicating locally from one pool to another on the same server, and I can easily delete/recreate the replicated pool. If there's something that I can do to eliminate this 'pool mismatch' and get it running, I'd be more than willing to give it a shot.
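
For reference, the local pool-to-pool case can be exercised by hand without ssh at all; a rough sketch with made-up pool names:

Code:
# full recursive send of the snapshot chain into a scratch dataset on the
# destination pool, forcing a rollback on the receive side
zfs send -R sourcepool/data@auto-20140403.1500-2w | zfs receive -F destpool/data

If that succeeds while the scheduled task still fails, it points the same way as the remote case: the scripted invocation, not zfs itself.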
 

alexg

Contributor
Joined
Nov 29, 2013
Messages
197
There was another post on a similar topic, and someone suggested pulling the new version of autorepl.py from GitHub or waiting for 9.2.1.4 (which is available as a beta). I'll take a look at what changed in autorepl.py and, if I feel comfortable, I may try it out. Otherwise, I'll wait for the official 9.2.1.4 and just do manual replications periodically.
 