Replication ZFS issues

Status
Not open for further replies.

shnurov

Explorer
Joined
Jul 22, 2015
Messages
74
Hello,

I've just built a second FreeNAS server after about 7 months of positive experience with my first production one. So far I have been able to replace two old dual-bay QNAPs, and I'm really happy with the resulting speed. As a little background - I have a wedding photo & video studio with over 50 weddings a year. My main need for this system was to have 2-3 users going through photo/video selection and then editing at the same time without slowdowns. Altogether it comes down to lots of small files (10-25 MB), but high I/O.

Now I built a second system to complete my offsite backup to protect myself against fire, theft, etc.

The new system has an ASRock C2550D4I board with 16 GB of Crucial RAM (certified compatible with the board) and is temporarily hosting a single 8TB Seagate Archive drive; I will get a second one shortly to back up my second vdev.

I started using replication yesterday on my 8TB array (4x4TB), and it has given me quite a few errors while connecting and starting up. I was able to resolve most of them by searching here and doing a few reconfigurations - at this point I believe my setup is correct.

The snapshots are configured so that every 30 minutes there's a new one on the 'set4tb' vdev that is kept for 2 weeks - this is to protect against employees accidentally deleting files that were just copied over. The current usage of this vdev is 4.37TB, yet the replication task that's been running overnight is already at 5.12TB and, based on the CPU/HDD usage graphs, is still working.
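
As a rough sanity check on that schedule, the number of snapshots alive at any one time follows from the interval and the retention period. This is only a minimal sketch: the function name is made up for illustration, and it assumes snapshots are taken around the clock, which may not match the begin/end times set on the actual snapshot task.

```python
from datetime import timedelta


def retained_snapshots(interval: timedelta, retention: timedelta) -> int:
    """Number of snapshots alive at once for a fixed interval and lifetime."""
    # Dividing one timedelta by another with // yields a plain integer.
    return retention // interval


# Every 30 minutes, kept for 2 weeks.
print(retained_snapshots(timedelta(minutes=30), timedelta(weeks=2)))  # 672
```

Each of those snapshots has to be sent to the backup server, which is worth keeping in mind when the schedule is tightened.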

The main server is 192.168.0.199 and the secondary is 192.168.0.198.

Now to the questions:

My current error that is still hanging is the following:
Replication set4tb/Photo -> 192.168.0.198:bkp-p1 failed: Failed: set4tb/Photo (auto-20151202.0900-2w)
Not sure what it's supposed to mean.

The other question is - I had made my share in CIFS and everything looks good, mapped the drive, seeing the usage, but once I open it - it says it's empty. I think I just missed something...

Now from my understanding I will be able to access all the files once the copy is done from a regular CIFS share? The reason is I often need to review some files from home and I would much rather do it this way instead of connecting via TeamViewer to the office.
Is there a guide I can look into for setting that up? I can't figure out whether I need a static IP or a Dynamic DNS setup?
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Which version are you on?

What is the datastore usage on the destination? Look at the dataset reporting graph and zoom out a little to see if it's still filling up. Does it look sawtooth? (A screenshot will help; you can copy and paste it directly into this post.)

Let's solve the replication issues first, then the CIFS share. But along those lines, are these both part of an AD domain?

And then 3rd, you implied that you are going to have your backup machine at home on a second network. They will need to have SSH connectivity for the replication to work (I presume your initial replication is done on the same office network).
 

shnurov

Explorer
Joined
Jul 22, 2015
Messages
74
I'm running on FreeNAS-9.3-STABLE-201512121950 on both of them - regular TRAIN version - no experiments here.

Here's the report for the destination:
https://dl.dropboxusercontent.com/u/316021/FreeNAS/dataset on remote.png
There was a reboot done hence 10-15 minutes downtime.

No AD configuration here - all computers log in as 'guest.'

The network will be on the same ISP, but at a different location. I mention it because I had issues with cross-ISP QNAP replications last year. I set up SSH as described in the user guide for replication.
Currently I am doing all this in the office, as I want to make sure everything works and is fully copied, so that only small 10-50 GB changes have to be copied over afterwards.
 

shnurov

Explorer
Joined
Jul 22, 2015
Messages
74
I think I fixed something - there was something stuck. I deleted my vdev on my remote server and re-created it. Now copying is complete, CPU reports went to regular 'idle' usage (1-2%) and I can access the CIFS share and see things in it.

Still a few questions left:
- Can I see the progress of a copy? Can I pause it?
- Will I see snapshots on the remote server when I log into the management console?
- Anything I need to do in terms of opening ports, DNS, etc for them to connect to each other? Given that now my replication IP won't be 192.168.0.198?
 

shnurov

Explorer
Joined
Jul 22, 2015
Messages
74
Small bump regarding the previous post :)

The third question is a bit different now: does it have to go through a VPN or is there a way for them to simply talk to each other?
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
"Can I see the progress of a copy? Can I pause it?"
The push server will show that it's running, but that's all. I'm not aware of a way to pause it.
"Will I see snapshots on the remote server when I log into the management console?"
Yes you will.
"The third question is a bit different now: does it have to go through a VPN or is there a way for them to simply talk to each other?"
No need for a VPN. You do need to configure/allow SSH, since that is what the replication uses.
 

shnurov

Explorer
Joined
Jul 22, 2015
Messages
74
"No need for a VPN. You do need to configure/allow SSH, since that is what the replication uses."

I think I'm missing something here. I followed the guide as per the doc; everything worked when they were on the same LAN, but I cannot get it to sync anymore. I'm getting this error:
Code:
Replication set4tb/Photo -> 192.168.0.198:remote failed: Failed: ssh: connect to host 192.168.0.198 port 22: Operation timed out
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Did you move that server somewhere off your LAN? If so, you have to make sure the destination is accessible. That probably means you need to change the destination IP and port forward on the far end.
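
One way to check that the destination is reachable before blaming the replication task is to try opening a plain TCP connection to the forwarded SSH port from the push side. The following is a minimal sketch using only the Python standard library; the function name is made up for illustration, and the address in the usage note is a documentation placeholder, not a real host from this thread.

```python
import socket


def port_is_reachable(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name-resolution failures.
        return False
```

For example, `port_is_reachable("203.0.113.10", 22)` would return False if the router at the far end is not forwarding that port. A True result only shows that something is listening there; it does not prove that SSH key authentication for replication will succeed.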
 

shnurov

Explorer
Joined
Jul 22, 2015
Messages
74
"Did you move that server somewhere off your LAN? If so, you have to make sure the destination is accessible. That probably means you need to change the destination IP and port forward on the far end."
Yes, I moved it to my house.

I just changed the IP it was pointing to (to the WAN one) AND the port, and it's running! I'm so happy!

As a wedding photographer, my biggest fear is losing what my studio has shot since 2006!

Huge thanks to you :)
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
So it's running now?
 

shnurov

Explorer
Joined
Jul 22, 2015
Messages
74
Running, failing, running, failing. Not sure what to do about this; I'll let it be for a bit. I have snapshots every 15 minutes on this vdev.

Code:
CRITICAL: Replication set4tb/Photo -> ***.***.***.***:remote failed: Failed: set4tb/Photo (auto-20151221.1915-2w)
Code:
Jan 4 19:43:49 freenas manage.py: [system.alert:219] Alert module '<replication_status.ReplicationStatusAlert object at 0x80f229c10>' failed: UNIQUE constraint failed: system_alert.node, system_alert.message_id
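
The snapshot name in the alert appears to encode when the snapshot was taken and how long it is kept. Below is a minimal sketch that decodes such a name, assuming the pattern `auto-YYYYMMDD.HHMM-<count><unit>` and that the trailing `2w` means "keep for two weeks", which matches the retention described in the first post. The function name is hypothetical, and only the hour, day, and week units are handled here.

```python
import re
from datetime import datetime, timedelta

# Assumed layout: auto-<date>.<time>-<count><unit>, e.g. auto-20151221.1915-2w
SNAP_RE = re.compile(r"^auto-(\d{8}\.\d{4})-(\d+)([hdw])$")
UNIT_TO_KWARG = {"h": "hours", "d": "days", "w": "weeks"}


def parse_auto_snapshot(name: str):
    """Return (taken, expires) datetimes decoded from a periodic snapshot name."""
    match = SNAP_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised snapshot name: {name!r}")
    taken = datetime.strptime(match.group(1), "%Y%m%d.%H%M")
    lifetime = timedelta(**{UNIT_TO_KWARG[match.group(3)]: int(match.group(2))})
    return taken, taken + lifetime


taken, expires = parse_auto_snapshot("auto-20151221.1915-2w")
print(taken, expires)  # 2015-12-21 19:15:00 2016-01-04 19:15:00
```

Comparing the computed expiry with the time of a failure can help tell whether the snapshot named in an alert was still expected to exist when the replication ran.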
 

shnurov

Explorer
Joined
Jul 22, 2015
Messages
74
So today I came to the office and I'm still seeing it having issues pushing the snapshot to the remote.
I'm out of errors that I can read/diagnose; now it just says 'failed'. I've deleted all the empty snapshots, so now it only has to replicate ~200KB of changes.
The interesting thing is that the CPU usage is quite high, as if it's encoding something, not sure what though... I put the 'Replication Stream Compression' to 'plzip' to minimize bandwidth - would it try to compress everything before starting?

The other error I started getting is
Code:
Jan 5 09:27:36 freenas manage.py: [py.warnings:205] /usr/local/www/freenasUI/../freenasUI/freeadmin/middleware.py:205: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
else unicode(excp.message)

And
Code:
Jan 5 09:44:41 freenas notifier: Error: near line 1: database is locked
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Deleting "empty snapshots" doesn't really affect anything. It's the large snapshots that take time. My guess is that the 2 systems are looking at the snapshot info on each of their systems to compare and figure out what's needed.

I would disable new snapshots on pull until this gets resolved (I ran into an issue where the snapshot would expire and disappear while the replication task was trying to finish).

Are those errors on Push or Pull?
Is the CPU usage high on push or pull?

How did you configure port forwarding on Pull (that's the system at home, right?)? Are you just forwarding port 22 to your Pull freenas (not all ports, right?)?

You might want to restart the machine showing the middleware and database warnings.
 

shnurov

Explorer
Joined
Jul 22, 2015
Messages
74
Well there haven't been many changes over the holiday, I believe at most a few hundred MB here and there.

I did not set up snapshots on the PULL server for this vdev.

The CPU usage is on the PUSH - it went back to 'normal' after I removed PLZIP compression. But errors remain.

The port forwarding at home was set up to send port 9585 (UDP/TCP) to freenas2's IP (the replication one), and I have also changed the SSH settings to use that port.
I have also set up a DNS forwarding server so my dynamic IP doesn't cause issues.

I had restarted it yesterday evening and it still gave errors.
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
I meant disable snapshot on push (not pull). My bad.

Just forward port 9585 on your home router to internal port 22, so you don't have to futz with the freenas2 ssh settings.
 

shnurov

Explorer
Joined
Jul 22, 2015
Messages
74
Disabled - no help yet.

Forwarded the port to internal 22.

As a test, I created another replication dataset, copied a 1MB file to it, and it succeeded!

The 4.6TB one is having lots of problems though - just failing straight up :(

My router's QoS monitoring does mention that ~1GB was transferred via SSH to my offsite server. Somehow.
 

shnurov

Explorer
Joined
Jul 22, 2015
Messages
74
Small update. I think the board just toasted itself - it smells a bit burned, the two Ethernet ports show 100Mbit instead of 1Gbit, and IPMI is giving me a purely black screen.
RMA time!
 
Joined
Oct 19, 2015
Messages
8
Do you understand that the zfs send communication cannot be interrupted? If there's an interruption in the link, it will fail. There's a new ZFS feature, resumable send & receive, but it's not yet in FreeNAS. You can expect it in FreeBSD 11, which hasn't even been released yet. FreeNAS 10 hasn't even been released, so it's probably a couple years out for FreeNAS... Hope this explains some of the above problems for you, albeit I'm late to this party.
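
To put that in perspective: without resumable send, the link has to stay up for the whole duration of a stream, so it helps to estimate how long that is. A minimal back-of-the-envelope sketch follows; the function name and the 10 Mbit/s uplink are hypothetical (no link speed is given in this thread), and it ignores SSH overhead and stream compression.

```python
def hours_to_send(size_tb: float, link_mbit_s: float) -> float:
    """Hours needed to push `size_tb` terabytes over a `link_mbit_s` Mbit/s link.

    Uses decimal units (1 TB = 10**12 bytes, 1 Mbit = 10**6 bits).
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_mbit_s * 1e6)
    return seconds / 3600


# 10 Mbit/s is an assumed uplink speed, not a figure from this thread.
print(round(hours_to_send(4.6, 10)))  # 1022
```

At that assumed rate a full 4.6TB stream would need roughly six weeks of uninterrupted connectivity, whereas a few hundred MB of incremental changes would need only minutes. That is one reason to seed the first replication over the LAN and send only incrementals across the WAN.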
 

shnurov

Explorer
Joined
Jul 22, 2015
Messages
74
No I did not understand the underlying tech - but I had a different issue at the end of the day. So all is good now - other than the random error emails that say they can't reach the SSH client (pull server).
 