Nothing showing in Active Volumes of GUI, CIFS share not showing

Status
Not open for further replies.

JayG30

Contributor
Joined
Jun 26, 2013
Messages
158
So last night I made some changes to my FreeNAS server and a bunch of stuff somehow got messed up in the process. It seems to be related to replication and/or a read-only CIFS share. The machine is running FreeNAS-9.2.1.7-RELEASE-x64 (fdbe9a0), which I know is outdated and needs to be updated.

The HOST machine was replicating recursively at Level 1 for a dataset structure like this:
- Level 1 (dataset)
  - Level 2 (dataset)
    - Level 3 (dataset)
    - Level 3 (dataset)
Yesterday I added another Level 2 dataset and wanted replication to happen at Level 2 instead of Level 1, so I disabled the old replication task and added new ones. Something like this:
- Level 1 (dataset)
  - Level 2 (dataset) (replication task)
  - Level 2 (dataset) (replication task)
    - Level 3 (dataset)
    - Level 3 (dataset)

On the TARGET side I wanted to make these replicated datasets read-only CIFS shares. The new Level 2 dataset had not been replicated yet at this point, so I only did this for the Level 2 dataset that had previously been replicated via the OLD Level 1 replication task.

Now, I wake up this morning and strange things have happened.
First, the FreeNAS GUI shows NO volumes or datasets in the Active Volumes tab. However, I can still browse the datasets and change settings in the left-hand tree navigation! I must also note that THE DATA IS ALL STILL THERE, accessible from both the CIFS shares and the CLI.
[Screenshot: no datasets shown in Active Volumes]

Second, the read-only share that I created yesterday is GONE from the GUI, but I can still browse to it on the network! I looked at the smb4.conf file and the share wasn't listed in there either.
[Screenshot: read-only share missing from the shares list]

Third, something is not right in the snapshots section of the GUI either. It won't list certain snapshots that were replicated over, and if I try to filter to show them it returns a "sorry, error" message. You can see in the screenshot how there are snapshots at the end that never load, no matter how long you wait. If I check at the CLI, the snapshots are all there!
[Screenshot: snapshot list failing to load]
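For anyone comparing notes, this is roughly how I cross-checked the GUI against what ZFS actually has (the pool and dataset names here are placeholders, not my real ones):

```
# List every snapshot under the replicated dataset, with creation time.
# "tank/level1/level2" is a made-up path; substitute your own.
zfs list -t snapshot -r -o name,used,creation tank/level1/level2
```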

I restarted the web GUI for good measure, just in case the problem was GUI-only, but the issue still exists.


Is there an issue with what I was trying to do? Can anyone explain any of this behavior?
 

JayG30

Contributor
Joined
Jun 26, 2013
Messages
158
So, I think this might be related.
I'm deleting the replicated datasets through the CLI (since I can't delete through the GUI) and one is stuck with "device busy".
It is the Level 2 dataset that I had created the read-only CIFS share for.

I'm guessing the CIFS share is keeping it busy. I can't see or delete the share through the web GUI; perhaps I could stop CIFS?
smbcontrol and smbclient don't show the CIFS share either, so I'm not sure how else I could "remove" it.
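For reference, this is roughly how I was poking at it from the CLI (the dataset path is made up, and the Samba rc script name varies between FreeBSD/FreeNAS releases, so check `service -l` before trusting the last line):

```
# Show which processes have files open on the stuck dataset's filesystem
fstat -f /mnt/tank/level1/level2

# Ask Samba which shares it is actually serving right now (anonymous query)
smbclient -N -L localhost

# Restart CIFS; the script name here is a guess, confirm it with `service -l`
service samba_server restart
```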
 

JayG30

Contributor
Joined
Jun 26, 2013
Messages
158
Well, I seem to have fixed it.
After restarting CIFS, the read-only share that wasn't showing in the GUI (but that I could still browse to) went away completely. I still have no clue why it disappeared from the GUI in the first place, though.

Are you not able to share out replicated data?

After that I was able to delete some of the other replicated datasets. The NEW Level 2 dataset I created seemed to be the issue for some reason; not sure why. But once I deleted it, the Active Volumes display started working and the snapshots listed correctly.
 

JayG30

Contributor
Joined
Jun 26, 2013
Messages
158
This is crazy. Every time I replicate this new Level 2 dataset, everything in Active Volumes disappears and the snapshot doesn't show in the GUI.
From the CLI everything looks normal, and all the datasets and CIFS shares on the TARGET server I'm replicating to continue to work fine.

Also, the snapshots are getting held by replication on the HOST server, so I can't delete them or the dataset without first releasing the hold. The replication task shows completed, and the data is on the other TARGET FreeNAS server.
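For anyone else stuck here, this is the general shape of it at the CLI (snapshot name invented; I believe the FreeNAS replicator tags its holds with something like freenas:repl, but use whatever tag `zfs holds` actually reports):

```
# See what holds exist on a snapshot and under which tag
zfs holds tank/level1/level2@auto-20150101.0000-1w

# Release the hold by its reported tag, then the snapshot can be destroyed
zfs release freenas:repl tank/level1/level2@auto-20150101.0000-1w
zfs destroy tank/level1/level2@auto-20150101.0000-1w
```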
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Well, you shouldn't have two replication tasks for the same dataset....
 

JayG30

Contributor
Joined
Jun 26, 2013
Messages
158
I don't.
I removed the existing snapshot and replication task at Level 1 and replaced them with individual snapshot and replication tasks on the Level 2 datasets.

However, it might be that FreeNAS was not able to handle this alteration without first removing all the old replicated datasets and snapshots from the TARGET.

I say that because I've been working my way through this, and it didn't work when I deleted all the snapshots. It didn't work after deleting everything up to the Level 2 datasets. But I just deleted the Level 1 datasets that were originally replicated to the TARGET machine by the now-removed snapshot/replication task, performed a manual snapshot on the HOST machine, and forced a replication. Now things appear to be working in the GUI.
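In plain-ZFS terms, the sequence that finally worked looked roughly like this (names are hypothetical, and I actually drove the snapshot and replication from the GUI, so treat this as a sketch of the equivalent commands):

```
# On the TARGET: remove the datasets left over from the old Level 1 task
zfs destroy -r tank/level1

# On the HOST: take a fresh recursive snapshot of one Level 2 dataset
zfs snapshot -r tank/level1/level2a@manual-reseed

# Re-seed the target; roughly what the FreeNAS replication task does over SSH
zfs send -R tank/level1/level2a@manual-reseed | \
    ssh target zfs receive -F tank/level1/level2a
```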
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yeah, sounds like you hit the nail on the head. Glad you figured it out though. :)
 

JayG30

Contributor
Joined
Jun 26, 2013
Messages
158
Thanks. For some reason I tend to figure things out when I'm posting them. I think seeing my thoughts written out helps me work through it. Even if others don't know what I'm saying, lol. :)
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
I feel like being able to modify the replication task in this way should be a supported feature, since none of the data is actually changing; you are just moving the point in the hierarchy where the replication happens (I understand the difficulties in this). Does this fail with plain ZFS replication, or is this a FreeNAS gotcha?
 

JayG30

Contributor
Joined
Jun 26, 2013
Messages
158
Well, like I said, it does appear to be functioning fine at the CLI: the replication occurs, and both the old and new snapshots are all available there.

I thought about bringing it up as a bug/feature request, but as you mentioned I can understand that it might be rather difficult.

I would also like to see someone else replicate the behavior before I go too far, just to make sure it wasn't something else entirely (or perhaps the outdated FreeNAS version). All I was doing was a daily snapshot and replication at the top level with 2 sub-levels, with one week of retention. Once that had run a few weeks, I changed my mind and wanted to instead snapshot at Level 2 (the first sub-level) so that I could potentially have more versatility in how the two datasets are snapshotted.

Now, this is where I'm a bit hazy (it was late last night), but I THINK I might have edited the existing snapshot task to point to the different dataset. I did NOT, however, change the frequency or retention. I then altered the replication task to use the updated snapshot task. If I did not edit the task, then I must have deleted and recreated it. I'm not sure if one method would cause the issue and the other wouldn't; it would require more testing, which I don't have the time to do right now.
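To be concrete about the change (dataset names hypothetical), the old scheme was one recursive snapshot at the top, and the new one is an independent snapshot per Level 2 dataset:

```
# Old scheme: one recursive snapshot task at Level 1 covers everything below
zfs snapshot -r tank/level1@auto-daily

# New scheme: a separate task per Level 2 dataset, so each can eventually
# get its own frequency and retention
zfs snapshot -r tank/level1/level2a@auto-daily
zfs snapshot -r tank/level1/level2b@auto-daily
```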
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
I think I need to restart my replication tasks because they have got so far behind with the receiving server being offline that they don't seem to be catching up. What I shall do is restart the replication to a totally new dataset on the receiving server, and leave the old ones until the new ones are complete. Of course one can only do this if there is room on the receiving system for two copies of the data (including snapshots) but it seems so much safer than trying to back up to the same datasets at a different level.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
No matter how far behind things get, they should catch up (assuming, of course, it is physically possible to transfer the data fast enough). Holds are placed on snapshots when created if they are intended to be replicated later, so unless you remove those holds yourself, they shouldn't get out of sync on the push side.

On the pull side, snapshots aren't deleted when expired except when it is safe to do so.

So each side has its own protection that should make it impossible to get "so far out of sync". I've seen people more than 2 months out of sync on snapshots that expire within just a couple of days. They got upset because their server hit 90% full and got the email; that was their hint that something wasn't quite right with their server.
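If you want to check that protection from the push side, something like this works (dataset name is a placeholder); held snapshots are the ones replication hasn't released yet:

```
# List the dataset's snapshots, oldest first
zfs list -H -t snapshot -o name -s creation -r tank/level1/level2a

# Check each one for holds; anything still held is pending replication
zfs list -H -t snapshot -o name -r tank/level1/level2a | xargs -n1 zfs holds
```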
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
> No matter how far behind things get, they should catch up (assuming, of course, it is physically possible to transfer the data fast enough). Holds are placed on snapshots when created if they are intended to be replicated later, so unless you remove those holds yourself, they shouldn't get out of sync on the push side.

If I do half-hourly snapshots they get progressively further behind; hourly ones can catch up, slowly. Forum threads on this seem to have ended with people looking forward to a replication rewrite. Most of the snapshots are empty, so transfer speed doesn't enter into it.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
OK, if they are getting further behind, then your replication window is probably too small for the server to keep up with the workload. You need to expand the replication window so that the replication can complete.

This isn't a "FreeNAS issue" per se. This is an issue with the push server not being able to push enough data to the pull server in the required time. As for why that is happening, there are a bunch of reasons it *could* happen, so I'm only speculating that the replication schedule is too short. There are no doubt dozens of ways I could cause problems for replication inadvertently (or deliberately). You'll have to find out for yourself why it is happening for you.
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
> OK, if they are getting further behind, then your replication window is probably too small for the server to keep up with the workload. You need to expand the replication window so that the replication can complete.
>
> This isn't a "FreeNAS issue" per se. This is an issue with the push server not being able to push enough data to the pull server in the required time. As for why that is happening, there are a bunch of reasons it *could* happen, so I'm only speculating that the replication schedule is too short. There are no doubt dozens of ways I could cause problems for replication inadvertently (or deliberately). You'll have to find out for yourself why it is happening for you.
It's nothing to do with data transfer; it is at most a few hundred megabytes in 24 hours, usually less, and the window is already at maximum. It is something to do with snapshot handling. However, you may be right that it's not a FreeNAS issue; it could be the receiving (non-FreeNAS) server.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Oh, you are on 9.2.1.7. It has a nasty bug related to snapshots and scripting that makes the scripts perform horrendously slowly. Upgrade to the latest 9.2.1.x or 9.3 to resolve the issue!

That's what you get for being a year out of date. :P
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
> Oh, you are on 9.2.1.7.

If you got that from my signature you must have an old cached copy; I have been on 9.3 since the beta release.
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
> Oh, you are on 9.2.1.7. It has a nasty bug related to snapshots and scripting that makes the scripts perform horrendously slowly. Upgrade to the latest 9.2.1.x or 9.3 to resolve the issue!
>
> That's what you get for being a year out of date. :p

BTW, I realize you weren't talking to me, and I apologise, but bug #5611 is still open, so I don't think the developers consider the problem of slow replication solved in 9.3. As I said, I work around it, but it does limit what one can do, especially in terms of the total number of snapshots.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
> BTW, I realize you weren't talking to me, and I apologise, but bug #5611 is still open, so I don't think the developers consider the problem of slow replication solved in 9.3. As I said, I work around it, but it does limit what one can do, especially in terms of the total number of snapshots.

OK, sure, the ticket is open. In my book that means nothing, though. It's totally unconfirmed except by a single person. There's literally no debug file attached and no confirmation from a dev that there is a problem.

But do you know what there is?

> Snapshots every 15 minutes for a year

You have to be an utter fool (I'd use harsher language, but I'll just say that) to try to keep 35,000 snapshots for each and every dataset you have. In fact, most ZFS guidance recommends keeping the total number of snapshots to 2,000 or fewer where possible.
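If you're not sure where you stand, counting them takes two commands (the pool name is a placeholder):

```
# Total snapshots on the pool
zfs list -H -t snapshot -o name -r tank | wc -l

# Snapshots per dataset, to spot the worst offenders
zfs list -H -t snapshot -o name -r tank | sed 's/@.*//' | sort | uniq -c | sort -rn
```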

Plus, the fact that it was logged 10 months ago and has only 2 people saying they have the problem (one of whom is crazy for trying to keep that many snapshots) doesn't really mean it's confirmed. If the ticket were a week old and had a dozen people complaining of the same problem, *then* I'd say we had a problem.

So I do NOT take that bug as any kind of validation that there is a problem. It screams of someone who doesn't have a clue what he's doing, has over 100k snapshots, and is somehow surprised that the server is slow. Well, no duh! To me, it just means the developers haven't closed the bug as "cannot reproduce" or "user configuration issue".

I can tell you that just 2 weeks ago I helped someone replicate multiple TBs over a 100Mb VPN connection to a remote site, they had about 500 snapshots to replicate, and it all finished in less than 48 hours. It basically saturated their VPN connection the whole weekend and performed exactly as expected.

So you are welcome to disagree with my assessment of the evidence presented. But I've seen, firsthand, that this problem is not as bad as it sounds. In my experience, most people who have slow replication/snapshot problems are dealing with self-inflicted problems (for example, too many snapshots), something misconfigured at a seriously obvious level (like broken network settings), or the bug that existed in 2 or 3 versions in the middle of the 9.2.1.x series.

I recommended the upgrade option because it rules out one of the 3 most common problems. If you've stabbed yourself with a sharp knife by creating 50k+ snapshots, then that's a problem I'm not going to be able to resolve. And if you've got really broken network settings or something else misconfigured, it's again very unlikely I'd be able to identify that in a forum setting. So I provided the only information that I have:

1. I know this works. I know this can work. I've *seen* it work.
2. I know which problems I can help you rule out extremely easily (upgrade and see if it fixes it).

Sorry if that doesn't help. All I'm trying to do is help. ;)
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
So we are in violent agreement on this! It works well with a low enough number of snapshots. I wonder if it is practical to run a snapshot regime with exponentially increasing intervals into the past, presumably by pruning snapshots (e.g. hourly for a day, daily for a week, and so on)? With the GUI you can do it with multiple snapshot tasks, but then, AIUI, you can't replicate the multiple tasks. Or can you?
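Something like this would be the manual version of that pruning; entirely a sketch, with the snapshot naming scheme and cutoffs invented, and it only echoes the destroy commands so one can dry-run it first:

```
#!/bin/sh
# Prune hourly snapshots older than a day and daily snapshots older than
# a week. Assumes names like dataset@hourly-YYYYmmdd.HHMM and
# dataset@daily-YYYYmmdd; adjust the patterns to your own scheme.
DATASET="tank/level1/level2a"
NOW=$(date +%s)

zfs list -H -p -t snapshot -o name,creation -r "$DATASET" |
while read NAME CREATED; do
    AGE=$(( (NOW - CREATED) / 86400 ))   # age in whole days
    case "$NAME" in
        *@hourly-*) [ "$AGE" -ge 1 ] && echo zfs destroy "$NAME" ;;
        *@daily-*)  [ "$AGE" -ge 7 ] && echo zfs destroy "$NAME" ;;
    esac
done
# Drop the leading "echo" once the dry-run output looks right.
```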
 