Volume not imported properly, or so it seems?

Status
Not open for further replies.

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
I have been replicating my main volume to a backup drive attached to one of the available SATA ports.

I am doing it as follows (a rough command sketch is shown after the list):

1: Creating a manual recursive snapshot of the volume.
2: I delete the snapshots of the datasets I do not want to replicate.
3: I set the backup drive as read-only: "zfs set readonly=on backup_vol".
4: I run the zfs send -vR ... | zfs receive -vFd ... command.
5: Replication takes place, and all the necessary datasets seem to have been replicated properly.
6: Running a scrub on the backup drive. Everything is fine.
7: The Reporting section doesn't show the datasets on the backup drive. Doing "ls" doesn't return the dataset contents, maybe because the replicated datasets have "root" ownership. Ownership was not replicated, but that may be a normal condition.
8: Detaching the backup drive without deleting snapshots.
9: Importing the drive. Everything seems to work fine. I can list dataset contents within the backup drive.
10: Adding a share to the backup volume and accessing the files via SMB. They are not all visible, due to ownership (I think).
11: Replicating some more datasets to the backup drive.
12: Detaching the drive without removing the SMB share.
13: Importing the drive again fails. The GUI tells me to check the ZFS status. Everything is fine; all 3 volumes are available and online (USB boot, main volume and backup). The backup volume doesn't show under "Storage" in the GUI. The "Reporting" section lists various datasets but the graphs are not filled with data.
14: A scrub has been launched again and is still underway with no issue so far.
15: Some of the dataset contents can be listed, some can't.
16: From the main volume, I can still replicate snapshots for the datasets that cannot be listed on the backup. Newly created datasets can be listed without any problem.
17: I have rebooted FreeNAS a few times and the drive remains in the same state.
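A rough sketch of steps 1 to 4 as shell commands; the pool, dataset and snapshot names here are placeholders rather than my actual ones:

Code:
# Step 1: recursive snapshot of the whole volume (names are placeholders)
zfs snapshot -r tank@manual-20150915
# Step 2: drop the snapshot for each dataset I do not want replicated
zfs destroy tank/skip_me@manual-20150915
# Step 3: keep the backup pool read-only between replications
zfs set readonly=on backup_vol
# Step 4: recursive send into the backup pool
zfs send -vR tank@manual-20150915 | zfs receive -vFd backup_vol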


I don't think this is an issue with the new drive, as I had a very similar scenario a few months back with a different backup drive. I think, when it occurred previously, I did an update after I replicated the volume, and when it rebooted, I couldn't see the drive anymore. I think I reverted to the previous boot environment.

Has anyone experienced a similar scenario?
I am second-guessing my process, but I don't rule out an underlying bug either.
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
Wow, that's a complex process...

If you know of a simpler process, let me know.

I am going to break down the process to explain why it is so complex:

Code:
1: Creating a manual recursive snapshot of the volume.
2: I delete the snapshots of the datasets I do not want to replicate 

I want to replicate all the datasets within the volume except a selected few. But I want to replicate ownership and not have some folders or files left un-replicated because they are not part of a dataset. Nothing obvious tells you whether a folder is or isn't a dataset.
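One way to tell them apart, for what it's worth: only datasets show up in "zfs list", while a plain folder inside a dataset does not (the pool name below is a placeholder):

Code:
# Only real datasets appear in this output; any folder not listed is just a directory
zfs list -r -t filesystem tank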

Code:
3: I set the backup drive as read only "zfs set readonly=on backup_vol"

If I don't set it as read-only, the next time I access the content of the backup drive it will modify its content (metadata, I think) and then it will not be able to accept incremental snapshots anymore.
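A possible alternative (not something verified here) would be to leave the pool writable and roll each modified backup dataset back to its last replicated snapshot before the next incremental receive; a sketch with placeholder names:

Code:
# Discard whatever changed on the backup copy since the last replicated snapshot
zfs rollback backup_vol/dataset_A@manual-20150915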

Code:
4: I run the zfs send -vR ... | zfs receive -vFd ... command.
5: Replication takes place, and all the necessary datasets seem to have been replicated properly.

Performing recursive replication, destroying existing data on the backup drive.

Note: Last night I still couldn't fix the import issue with my backup drive, so I decided to perform replication again. This was not straightforward, because I couldn't initialize the drive as it was improperly mounted (mounted under ZFS, but not under the GUI), so I had to reboot several times without the drive plugged in. Then, inserting it again, I was able to recreate a new volume through Volume Manager.
I ran the replication command again, and this time it didn't complete replication of all the datasets. Barely 2 TB out of the 5 TB were sent.
I got those errors:
Code:
cannot mount ....: failed to create mountpoint
cannot receive new filesystem stream: failed to mount ancestor ....
warning: cannot send ...: Broken pipe

I edited the output; the "...." would normally be filled with the snapshot name.

At this point I was able to see the volume and datasets under the "Storage" tab, but not under Reporting.
I detached the volume, tried to import it again, and now the problems start. The volume cannot be mounted under the GUI and I get this GUI error message:
Code:
 Error: The volume "backup" failed to import, for further details check pool status


Running "zpool status" doens't return any errors.

I will be experimenting a bit further. I suspect the root cause is related to the deleted snapshot of a dataset that itself contains more datasets (for which I have not deleted the corresponding snapshots), and maybe this is the cause of the problem.
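A quick check before the next attempt is to list exactly which datasets still carry the manual snapshot, to confirm nothing under the excluded branch was missed (pool and snapshot names below are placeholders):

Code:
# Every dataset that still holds the manual snapshot
zfs list -H -r -t snapshot -o name tank | grep '@manual-20150915'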
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
In your '5' did you replicate to the root of the receiving drive? If you did, then somewhere there is a snapshot of your sending pool occupying the root of the receiving pool. Is it possible that by replicating into that pool in your step '11' you are making the original pool snapshot inconsistent?

I may be talking nonsense, but it certainly seems to reduce problems if you replicate into a dataset on the receiving system rather than into the root of the pool.
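A minimal sketch of that idea, with placeholder pool and dataset names (untested here, just to illustrate):

Code:
# Receive under a dedicated dataset instead of the pool root
zfs create backup_vol/replica
zfs send -vR tank@manual-20150915 | zfs receive -vFd backup_vol/replica
# With -d, tank/dataset_A ends up as backup_vol/replica/dataset_A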
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
In your '5' did you replicate to the root of the receiving drive? If you did, then somewhere there is a snapshot of your sending pool occupying the root of the receiving pool. Is it possible that by replicating into that pool in your step '11' you are making the original pool snapshot inconsistent?

I may be talking nonsense, but it certainly seems to reduce problems if you replicate into a dataset on the receiving system rather than into the root of the pool.
I haven't tried, but I definitely will.
I have one more TB or so to go before I have replicated some of my datasets.
But if I replicate into a dataset, then what about the permissions?
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
I haven't tried, but I definitely will.
I have one more TB or so to go before I have replicated some of my datasets.
But if I replicate into a dataset, then what about the permissions?
Again, I don't really know what happens to permissions if you try to replicate a whole pool recursively into a dataset. But if you set the receiving dataset not to apply permissions to its contents, and you only replicate datasets to it, not the whole pool, then I don't think there will be any problem and the replicated data sets will have the same permissions (at least in terms of UIDs and GIDs) as their original sources.
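A quick way to check, if in doubt: ZFS replication carries numeric UIDs/GIDs, so comparing numeric ownership on the source and the replica should show them matching (paths below are placeholders):

Code:
# Numeric owners should match between source and replica
ls -ln /mnt/tank/dataset_A | head
ls -ln /mnt/backup_vol/replica/dataset_A | head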
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
You should NOT be setting the backup volume as read-only. What you SHOULD be doing is forcing the incremental (there's a parameter for it but I forget what it is); it basically allows you to replicate to a second system while accepting that any "live data" written since the snapshot will be discarded. I promise you that LOTS of servers I've worked with have had replication going (some are A->B for a couple of datasets, then B->A for a couple of other datasets) and they do not have to set read-only. I'm pretty sure that's where your problems are.
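The parameter referred to here is most likely -F on zfs receive, which rolls the destination back to its most recent snapshot before receiving and discards anything written there since; a sketch with placeholder names:

Code:
# -F discards changes made on the destination since its newest common snapshot
zfs send -vR -I @manual-20150915 tank@manual-20151015 | zfs receive -vFd backup_vol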
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
I would certainly like to know which option will force an update to the backup drive without touching already-replicated snapshots.

Here is a link to the Oracle ZFS documentation explaining why it is good to keep the backup drive read-only.


https://docs.oracle.com/cd/E17952_01/refman-5.5-en/ha-zfs-config.html

I strongly believe the real issue happens in this specific scenario; remember, I am trying to replicate an entire pool except for a few datasets:

In this case I have the following pool structure:

Pool_source
Pool_source/dataset_A
Pool_source/dataset_B
Pool_source/dataset_1st_level/dataset_2nd_level_A
Pool_source/dataset_1st_level/dataset_2nd_level_B
Pool_source/dataset_1st_level/dataset_2nd_level_C
Pool_source/dataset_2


When I create a recursive snapshot of Pool_source named manual-201509015, I get the following:

Pool_source@manual-201509015
Pool_source/dataset_A@manual-201509015
Pool_source/dataset_B@manual-201509015
Pool_source/dataset_1st_level@manual-201509015
Pool_source/dataset_1st_level/dataset_2nd_level_A@manual-201509015
Pool_source/dataset_1st_level/dataset_2nd_level_B@manual-201509015
Pool_source/dataset_1st_level/dataset_2nd_level_C@manual-201509015
Pool_source/dataset_2@manual-201509015

If automatic snapshots have been taken, they will be present too.

Now, let's say I want to replicate the entire pool except for dataset_1st_level and the datasets below it.
The datasets I want to replicate are these:

Pool_source
Pool_source/dataset_A
Pool_source/dataset_B
Pool_source/dataset_2

In order to achieve this, I will remove the following snapshot:

Pool_source/dataset_1st_level@manual-201509015

This will leave on the source:

Pool_source@manual-201509015
Pool_source/dataset_A@manual-201509015
Pool_source/dataset_B@manual-201509015
Pool_source/dataset_1st_level/dataset_2nd_level_A@manual-201509015
Pool_source/dataset_1st_level/dataset_2nd_level_B@manual-201509015
Pool_source/dataset_1st_level/dataset_2nd_level_C@manual-201509015
Pool_source/dataset_2@manual-201509015


To perform the replication I would write the following command:

zfs send -vR Pool_source@manual-201509015 | zfs recv -vF Pool_destination

Along the way I will get replication errors complaining that "dataset_1st_level" doesn't exist, and every remaining snapshot within that dataset will fail to replicate.
All the other snapshots will complete, however, but ZFS will not be able to recover from it entirely and the GUI will end up in a mess.

If I were to remove all the "@manual-201509015" snapshots within "dataset_1st_level", then I will be fine, even though there may be other snapshots taken prior to this one.
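Assuming the layout above, that recursive snapshot removal is a single command:

Code:
# Removes @manual-201509015 from dataset_1st_level and every dataset beneath it
zfs destroy -r Pool_source/dataset_1st_level@manual-201509015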

I would then get:

Pool_source@manual-201509015
Pool_source/dataset_A@manual-201509015
Pool_source/dataset_B@manual-201509015
Pool_source/dataset_2@manual-201509015

Running the replication command as above, replication should now run without issue.

The backup drive will now look like this:

Pool_destination@manual-201509015
Pool_destination/dataset_A@manual-201509015
Pool_destination/dataset_B@manual-201509015
Pool_destination/dataset_2@manual-201509015

The backup drive named Pool_destination should be an exact copy of Pool_source. If I physically replaced Pool_source with the backup drive and restarted FreeNAS, I wouldn't see the difference, except for the missing snapshots of course.
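A quick way to sanity-check that is to compare the dataset and snapshot layout of the two pools side by side:

Code:
# The listings should line up, apart from the deliberately excluded branch
zfs list -r -t all -o name,used Pool_source
zfs list -r -t all -o name,used Pool_destination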

Does it make sense to you?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Sorry, but I'm not going to read that article. Oracle's documentation only truly applies to Oracle's ZFS implementation. Yes, much of it does apply to OpenZFS, but unless you are a pro and know what is and isn't relevant, it's better not to rely on it any more than you have to.

Here's the thing: you are doing manual snapshots. Those aren't easily supported on FreeNAS for replication using the replication system that FreeNAS uses. If you are a ZFS wizard you can absolutely do it (I do it regularly for TrueNAS customers). Obviously your replication-fu needs a little work (nothing wrong with that, btw).

I can guarantee you that if you set up scheduled snapshots and replication from the WebGUI you will NOT have to set your destination zpool read-only, and you will NOT have the problems you are having (to be honest, I wasn't even sure you could replicate to a dataset or zpool that is marked as read-only).

Unfortunately, you are doing so many steps that even if I followed them, that doesn't mean I would do it exactly the way you do, because I wouldn't do it your way.

I think your best bet is to give up the manual snapshots and replication and set up a more automated process using FreeNAS' built-in snapshot and replication system.

Sorry, but that's all the "good" advice I can give you at this time. :/
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
The issue is that I cannot afford a remote FreeNAS replication server to act as a backup, mostly because I have nobody near me who can take care of it, nor do I have a decent internet provider (here in Canada, it's like a third-world country): low speed and capped capacity, which makes remote replication a lengthy and hazardous process. So far the most reliable solution, and I quote "reliable", is local replication to a hard drive. The hard drive is then stored off-site, such as in a safe deposit box. I can run the backup, or a series of backups, on different drives to improve reliability, but I do agree this is far from the most efficient or trouble-free solution.
Sometime last year I did try replication across the local network to a second computer, but that spare computer is not ECC-compliant. I was able to do replication, but there were many other issues. As the backup drives were not permanently part of the system, automatic replication would throw errors or warnings due to the inability to perform replication. I did complain about the hundreds of e-mails a day, thrown every minute or so, for every failing replication task. There was also the issue of not being able to do volume replication while excluding specific datasets.
It is also my understanding that FreeNAS is not able to perform replication to multiple systems; for that, each system would have to replicate itself onto the next server. When automatic snapshots are enabled, the settings and snapshot retention at the pool level may not be the same as for the various datasets within the pool, so replication is once more limited.
So in a nutshell, FreeNAS GUI automatic replication couldn't do what I intended, and became more of a burden than anything else. Also, when replication is not done automatically on the fly as a snapshot is taken, snapshots awaiting replication add to the queue, and once replication is able to take place, each snapshot is sent at roughly a one-minute interval. I'm not sure whether that is a time-based event or just how FreeNAS handles the job, as it may be looking for existing snapshots.
For those many reasons, I have given up on automatic replication.
If there were an automatic replication process that took care of the issues above, then yes, I would rather use the FreeNAS GUI (I guess).
All I really want is a process that handles automatic replication when an external drive is dedicated to storing those snapshots. Ideally, FreeNAS should be able to work out which snapshots are missing on the backup drive and send all the missing snapshots to it.

Has there been significant improvement in this matter?
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
You should NOT be setting the backup volume as read-only. What you SHOULD be doing is forcing the incremental (there's a parameter for it but I forget what it is); it basically allows you to replicate to a second system while accepting that any "live data" written since the snapshot will be discarded. I promise you that LOTS of servers I've worked with have had replication going (some are A->B for a couple of datasets, then B->A for a couple of other datasets) and they do not have to set read-only. I'm pretty sure that's where your problems are.
Hi Cyberjock,

After performing troubleshooting steps, I have come to the conclusion that the issue is the volume being set read-only. It is as you suggested.

A normal GUI import of the volume is impossible, with no specifics about the issue, especially since the pool shows as ONLINE.
A manual import through the CLI returned a series of errors about several datasets that could not be mounted, with no reason why.
All the snapshots seemed present, but doing "ls" within a dataset would return nothing.

Setting the volume back to writable, as follows, fixes the issue:

zfs set readonly=off Pool_destination

Then import the pool via the GUI (if it is not imported already; if it is already imported, first run "zpool export Pool_destination". By the way, in that situation the GUI is not capable of detaching the volume, as it doesn't appear in the GUI storage area, so this has to be done via the CLI).

Now the GUI imports the pool as it should, making all the datasets available, both within the Storage tab and under the partitions tab.
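Put together, the recovery sequence looks like this (Pool_destination as above):

Code:
zfs set readonly=off Pool_destination
zpool export Pool_destination    # only if the pool is currently imported from the CLI
# then import the volume from the FreeNAS GUI (Storage -> Import Volume)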


By the way, last week I tried to use GUI replication, as I did once before, with FreeNAS-9.3-STABLE-201509220011, and I still was not able to perform replication that fits my needs (as described earlier in this post, due to GUI/script limitations), so for now I will stay with CLI replication.

I would be very grateful if you could provide me with the option that will allow me to perform forced incremental replication.

If reading Oracle ZFS documentation is to be discouraged, then what reading material can I rely on to find my way around ZFS?
 