"Mounting Local File System" Hang

Status
Not open for further replies.

typeVOID

Cadet
Joined
Jul 6, 2015
Messages
3
all-

First off, thanks for all of the help in the past, and I apologize that this is my first noob post after 1.5 years of using the forum as a great resource for my home DIY NAS server.

Here is a quick rundown of the system:
ASRock H67M-ITX mobo
Intel i3 2.6GHz
2x4GB RAM
6x2TB WD Red drives in RAIDZ1

Recently I upgraded from 9.2.1.2 to 9.2.1.9, and that went without issue. I then got an alert that I was at 90% capacity, so I tried destroying a 2TB dataset of older Time Machine backups that had dedup turned on. The destroy ran for a really long time and the server eventually became unresponsive. I rebooted, and now the system hangs at the "Mounting local file systems" prompt and stays there indefinitely.

I tried a fresh install on an 8GB USB stick, and it will not auto-import the volume (it says an error has occurred). I then tried restoring the config, which reproduces the same "Mounting local file systems" hang.

There is no access to the NAS from the network, and the only response I can get out of it is a system process status line when I press Ctrl+T.
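
For reference, the Ctrl+T status line looks something like this (the numbers here are illustrative, not my actual output):

Code:
load: 0.52  cmd: zpool 1234 [tx->tx_sync_done_cv] 5032.10r 0.00u 0.15s 0% 3520k

The name in square brackets is the kernel wait channel; something ZFS-related there (tx_*, zio_*, spa_*) at least means the process is blocked waiting on the pool rather than dead.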

Are there any troubleshooting techniques I should use or log files I should be drilling through? I have some Linux experience, but not enough to really diagnose an unresponsive system.

Any insight would be awesome, as there is some data I would like to preserve. I have searched the forums, but nothing I have tried has worked, and I do not want to destroy the data.

Anthony
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
I have a feeling you should have let the delete finish.

After the reboot it went back to deleting the dataset, but after the reinstall I'm not sure what is happening.

Curious to see if anyone else has info.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
So, assuming you aren't using the latest ZFS feature flags (in particular async_destroy), the dataset has to be destroyed synchronously. Unfortunately, since you enabled dedup, you may have to buy more RAM to get out of this pickle you are in.

If the data is important, upgrade the system to 32GB of RAM and try to import the zpool. Let it run for a few days, even if it seems locked up or unresponsive (this is normal for a very large sync destroy in ZFS). Synchronously destroying a dataset requires ZFS to find every block that needs to be freed and clear them all in one transaction. This can take time... a loooong time. It also takes a lot of resources, something that is in extremely short supply since you chose to use dedup with the minimum of RAM.
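
For reference, here is roughly what that looks like from a shell rather than the GUI auto-import ("tank" is a placeholder for your pool name):

Code:
# list pools that are visible to the system but not yet imported
zpool import
# force-import it under /mnt so nothing mounts over the running OS
zpool import -f -R /mnt tank

Expect the second command to sit there for a very long time while ZFS finishes the destroy.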

When sync-destroying a dataset that contains data, ZFS must find all the blocks and clear them all at once. This is very time consuming and will lock up the system while ZFS does its magic. Unfortunately, users who make this mistake often hit the power button when the system goes unresponsive for a long time. That doesn't help at all, because ZFS must still complete the open transaction (the dataset destruction), so all you really did was force ZFS to start the job over again. I have personally seen users wait 4-5 days for a dataset destroy to complete (without dedup, of course). There are documented situations where it has taken 3+ weeks to recover from extremely large dataset destroys because of all the work involved. The only solution is to wait it out, unfortunately.
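
One caveat to the above: if your pool does have the async_destroy feature flag active, the destroy continues in the background after import, and you can watch the remaining work drain through the pool's freeing property ("tank" again a placeholder):

Code:
# bytes still waiting to be reclaimed by a background (async) destroy
zpool get freeing tank

When freeing reaches 0, the destroy is done. On an older pool doing a synchronous destroy there is nothing to query; the import simply blocks until the work finishes.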

If your problem isn't from a synchronous destroy, then you're left with two other likely situations:

1. You need more RAM because of the deduplication. How much is anyone's guess, since dedup can require up to 800GB of RAM per 1TB of data... in theory. No, that's not a typo. I really meant 800 gigabytes of RAM per 1 terabyte of data (see the zdb sketch after this list if you'd rather measure your actual dedup table than guess). We didn't put up warnings about how dangerous dedup is for nothing. :P

2. Your zpool potentially has corruption that you have been unaware of (or maybe aware of but have ignored), and now that you are trying to do something that runs into that corruption, it's bad news for you and your data. RAIDZ1 isn't very "safe" in the bigger picture. It can cause problems when doing things like resilvering a failed disk, and corruption from RAIDZ1 may not be apparent until it is too late. There's a reason why I have the "RAIDZ1/RAID5 is dead" link in my signature.
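
If you ever get the pool imported (zdb can also examine an exported pool with -e), you can measure the dedup table instead of guessing. A rough sketch, with "tank" standing in for your pool name:

Code:
# dump dedup table (DDT) statistics, including the total entry count
zdb -DD tank

Multiply the total entry count by roughly 320 bytes per entry to estimate how much RAM the DDT wants to keep in core.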

There is a chance that a transaction rollback just *might* work. Note that I've never tried one during a dataset destroy, nor on a zpool with deduplication enabled (both add major variables that are not in your favor right now). The downside is that rolling back transactions could be suicide for your zpool, so I wouldn't try it until you've exhausted all other options: rollback is a one-way street, and if it damages your zpool there is no "undo".
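
For completeness, the transaction rollback I mean is the rewind import. You can at least ask ZFS whether a rewind looks feasible without committing to anything ("tank" again standing in for your pool name):

Code:
# dry run: report whether discarding the last few transactions would make the pool importable
zpool import -F -n tank
# the real thing: one-way and potentially destructive, absolute last resort
zpool import -F tank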

If you get it working it would be nice to update the forum with what you did. ;)
 

typeVOID

Cadet
Joined
Jul 6, 2015
Messages
3
cyberjock - thanks for the quick reply.

I am going to install some new RAM and then let it run for days. Is there any way at all to check whether the destroy is actually happening, or a log I can access that might tell me what's going on?

I was scrubbing the pool regularly (every 30 days), and it had just finished a scrub with no errors reported before I updated. Is there a log I can check to see whether there is a problem there?

In the deduped dataset there was probably only 400GB of data, but I take it from your answer that deleting the dataset does not depend on the amount of data in it, but on the space allocated.

I will look at the PDF in your signature and figure out what the next setup should be after, *fingers crossed*, I get the data off my current system.

Many thanks again for the help.

Anthony
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
No logs or anything will be accessible because of the nature of the problem. There are ways you could go into ZFS debug mode and do some weird and wacky things, but that's not something I'm keen on discussing in an open environment, because plenty of people would say "hey, I know what I'm doing" when they don't. It's one of those things where, if you have to ask, you shouldn't be playing with it.

The dataset deletion is based on the entire zpool's used space.
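
If you want to see the number that matters once the pool is back ("tank" as a placeholder), the ALLOC column from the first command below is what the destroy has to walk, however small the dataset itself was:

Code:
# pool-wide capacity and allocated space
zpool list tank
# per-dataset breakdown of used space
zfs list -o space tank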
 