First replication task always fails at around 66% of 4.6TB

fritoss007

Cadet
Joined
Jan 26, 2019
Messages
6
Hi, I'm trying to replace our current FreeNAS, which was built on a hardware RAID controller, with a new system that complies (I think) with all the best practices I know. In order to swap out our system I need to move the data around... I've been able to use snapshot replication for our smaller datasets, but our big one (around 14TB, filled to around 4.6TB) won't replicate past 66%; it has failed the last 5 times I tried. What am I missing?

old system on FreeNAS 9.10.2
new system on FreeNAS 11.2-RELEASE

Is it possible to do the first snapshot replication manually with some kind of "resume" function? Reaching 66% of 4.6TB over gigabit Ethernet takes a few hours each time I try something different... not to mention I can only do this at full gigabit over the weekend while the office is closed, otherwise the disk usage goes wild on that old hardware RAID system.

Thanks in advance for any help!
 

dlavigne

Guest
Resumable replication isn't coming until 12...

On the 9.10.2 system, is Fast selected for the Encryption Cipher? Also, trying plzip for the Compression will reduce the size to its smallest (but will also be slower, so a bit of a gamble).
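
If you end up doing the send by hand instead of through the UI, roughly the same idea applies by compressing the stream in the pipe. A minimal sketch only, assuming plzip is available on both boxes and using placeholder pool/dataset names and IP:
Code:
# compress the stream on the sender, decompress on the receiver
zfs send -v tank/bigset@snap1 | plzip | ssh 10.0.0.2 "plzip -d | zfs recv -F backup/bigset"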
 

saurav

Contributor
Joined
Jul 29, 2012
Messages
139
When this happened to me once, I found that I had quotas set on the destination dataset. The pool had enough space, but the quota on the destination was smaller than the source dataset's size. Relaxing the quota restriction fixed the issue for me.
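
If you want to rule that out on your setup, checking the space-related properties on the destination dataset from the CLI should show it; a quick sketch with a placeholder dataset name:
Code:
# a quota/refquota/reservation smaller than the source's used size can make the receive fail mid-stream
zfs get quota,refquota,reservation,refreservation,available backup/bigset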
 

fritoss007

Cadet
Joined
Jan 26, 2019
Messages
6
Thanks saurav, this was almost a good call :) but I have no quotas on the remote FreeNAS whatsoever.

Are there any logs available to see what's going on around that famous 66% of my replication? Maybe I could find an exit code with errors?

When I use "zfs send / zfs recv" to manually replicate a snapshot to the remote FreeNAS, it goes all the way to the end... so size should not be an issue, I guess?

PS. Maybe I should change my title... because now it's more like 4.8TB :) :)
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
Resuming replication has been available since FreeNAS 10 or 11. It requires the use of a resume token and must be done over the CLI.
It is not supported by FreeNAS 9, however.
Are you doing recursive replication?
Use the -vv option on both send and receive to get output in the CLI.
Make sure you use "screen" in the CLI to run your replication.
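
A rough sketch of what that looks like, assuming both ends support resume tokens and using placeholder dataset names and IP:
Code:
# run the transfer inside screen so it survives a dropped SSH session
screen -S repl

# initial attempt: -s on the receive keeps the partial state if the stream dies
zfs send -v tank/bigset@snap1 | ssh 10.0.0.2 "zfs recv -s -F -v backup/bigset"

# after an interruption, read the token on the DESTINATION box...
zfs get -H -o value receive_resume_token backup/bigset

# ...then resume from the SOURCE box with that token
zfs send -v -t <token> | ssh 10.0.0.2 "zfs recv -s -v backup/bigset"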
 
Joined
Jul 3, 2015
Messages
926
I second what @saurav says. In my experience, the only time replication fails is when the destination doesn't have enough space, be it because of quotas or simply not enough room on the destination pool. Sometimes you also find incremental snapshots have run out of sync and therefore fail, however FreeNAS is pretty good at automatically sorting this out.

How many snapshots are you trying to send? Could you disable the replication, delete the dataset and corresponding snapshots on the destination side, and then try again from scratch?
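
One quick way to see whether the two sides still share a common snapshot is to list them on both boxes and compare; a sketch with placeholder dataset names:
Code:
# run on the source and on the destination, then compare the snapshot names in common
zfs list -t snapshot -r -o name,used,creation tank/bigset
zfs list -t snapshot -r -o name,used,creation backup/bigset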
 

fritoss007

Cadet
Joined
Jan 26, 2019
Messages
6
Alright! Thanks guys... it might be a space issue... so @saurav was partly right; at least he (she) pointed me in the right direction. Yesterday evening I ran this command:
Code:
zfs send -R -P -v Stripe/users@auto-20190127.2334-2w | ssh 10.74.0.15 zfs recv -v -s -F RaidZ3/users


The destination RaidZ3/users already exists, and here is the output of that verbose (-v) command:
Code:
cannot receive new filesystem stream: out of space
warning: cannot send 'Stripe/users@auto-20190127.2334-2w': signal received



Well, now I need help on how the hell to decode the "Used/Available" columns in the "Volumes" tab... here is what I have on the source and on the receiving server:

Source:
[screenshot: Volumes tab used/available on the source]

Destination:
[screenshot: Volumes tab used/available on the destination]

And this is the snapshot I want to move around:
[screenshot: snapshot list]
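
The same numbers might be easier to read from the CLI than from the Volumes tab; something like this (run on each box, using the pool names from the command above) breaks "used" down into data, snapshots, reservations and children:
Code:
zfs list -o space -r Stripe
zfs list -o space -r RaidZ3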
 
Joined
Jul 3, 2015
Messages
926
I wonder if you are sending an incremental and the destination snapshot has expired, so it's having to send the data all over again, all 14.8TB of it. I would be inclined to delete the users dataset on the destination and start afresh.
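
If you go that route, the CLI version is a one-liner, but it is destructive, so double-check you are on the destination box and that the name is right:
Code:
# removes the dataset, its children and all of its snapshots on the DESTINATION
zfs destroy -r RaidZ3/users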
 
Joined
Jul 3, 2015
Messages
926
PS: why are you sending via the CLI and not the UI?
 
Joined
Jul 3, 2015
Messages
926
So your users dataset is 14.8TB in size, but your snapshot is referring to 4.7TB. Something's not quite right; it looks like your snapshots are out of sync. Start again.
 

fritoss007

Cadet
Joined
Jan 26, 2019
Messages
6
I'm sending from the CLI just to get some kind of feedback on what was going on while troubleshooting.

On this dataset, there is a "Mapped RAW LUN" drive from a virtual ESXi Windows 2012 R2 based file server pointing into it. The data size reported by this Windows server is exactly the same as the snapshot size (around 4.7TB), but the virtual raw disk and the partition on this server are 10TB. We shrank it from 14.3TB to 10TB just before Christmas, because the new server is on RAIDZ3 as opposed to HW RAID5 on the source FreeNAS, and we lose usable space going from HW RAID5 to RAIDZ3. The shrink was done following best practices... I think :)

It shows 14.4TB on the main screen but 10T in the dataset settings window:
[screenshot: dataset size in the UI]
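
It might also be worth looking at the source dataset's space properties from the CLI; referenced vs. used plus the snapshot and reservation numbers usually explain that kind of gap:
Code:
zfs get used,referenced,usedbysnapshots,refreservation,quota Stripe/users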
 

fritoss007

Cadet
Joined
Jan 26, 2019
Messages
6
Here's an update: it worked... well, kind of!

I deleted every sub-dataset and started the transfer manually again, and this morning it went well past the 3.7TB where it failed yesterday!

But sadly I had to stop the transfer this morning at 7:50... so I'm going to delete my dataset again and use the GUI to set up a snapshot replication this time, but with some throttling so I can run it 24/7... I'll let you know guys!
 