How to recover from a completely full pool?

max322223

Dabbler
Joined
Nov 23, 2015
Messages
13
I have a FreeNAS 11.2 box which seems to have run out of space on the main drive pool. The problem is that the machine auto-reboots about half a minute after starting up. It looks like it tries to run some background stuff, has no space to write logs, no space to write crash dumps, and so it gives up and reboots. I tried booting in single-user mode, but the same thing happened. Is it possible to boot into a "safe mode" that would give me time to delete some snapshots and free up space?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
You could try making a fresh install of FreeNAS on a temporary stick, with the system dataset on the boot stick. Then import your pool... maybe you will be able to copy everything off to another location without the crashing you see on your current setup.
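If you get that far, the copy-off step could be as simple as the sketch below; the pool name 'tank' and the destination host are assumptions, adjust them to your setup.

Code:
# the imported pool will mount under /mnt on FreeNAS; names are placeholders
rsync -avP /mnt/tank/ backupuser@backuphost:/backup/tank/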
 

max322223

Dabbler
Joined
Nov 23, 2015
Messages
13
You could try making a fresh install of FreeNAS on a temporary stick, with the system dataset on the boot stick. Then import your pool... maybe you will be able to copy everything off to another location without the crashing you see on your current setup.
Hmmm... Would it work to make a live-boot Ubuntu USB stick, boot the server from that, install the ZFS tools, import the main pool, delete the snapshots, and then boot FreeNAS back? Basically, will importing the pool into Linux corrupt it / make it unusable by FreeNAS?
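In case it helps, the Linux-side sequence I have in mind would look roughly like this; the pool and snapshot names are placeholders, and exporting the pool before booting FreeNAS again seems like the important part.

Code:
# on the Ubuntu live system; names are placeholders
sudo apt install zfsutils-linux
sudo zpool import -f -R /mnt tank                  # -f because the pool was last used by FreeNAS
sudo zfs list -t snapshot -r tank                  # find snapshots to delete
sudo zfs destroy tank/dataset@auto-2019-01-01      # placeholder snapshot name
sudo zpool export tank                             # export cleanly before rebooting into FreeNAS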
 

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401
Hi Max,

I doubt a separate boot as suggested by Sretalla will be of any help. It does not cost much to try, but I do not expect any improvement.

As of now, the goal is to get more space in that pool. For that, either you add space or free existing space.

Adding space can be achieved with autoexpand, by replacing all the drives in the pool with bigger ones. Unfortunately, that requires a resilver for each drive, and with the 30-minute deadline you are talking about, I do not consider this a good option.

Adding an extra vdev to the pool is possible and is another way to increase space. If you can plug more drives into that server, create a new vdev and add it to your pool, and you will get that much more space. Beware not to compromise your pool by adding a vdev with a lower redundancy level than your current one. If you add a single-drive vdev to your pool, your entire pool will be destroyed if that single drive is lost.

Even with some redundancy, this would be a band-aid solution. For example: your pool is RAIDZ2 and you add a pair of drives configured as a mirror. The mirror itself has some redundancy, and so does the RAIDZ2, but the pool is no longer uniform, and neither is its protection. This would be just good enough for your NAS to go back online; you then move your data off the server and rebuild a completely new pool at least three times as big as your current one.
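For reference, here is a minimal sketch of adding such a vdev from the command line; the pool name and disk names are assumptions, and zpool will refuse a vdev whose redundancy does not match the existing layout unless you force it with -f.

Code:
# placeholders for your actual pool and new disks
zpool status tank                    # check the existing layout first
zpool add tank mirror ada5 ada6      # add a two-way mirror as a new vdev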

If you cannot add capacity, then you must clean up your data. For that, I do not recommend deleting actual data from the exposed filesystem. The reason is simply that doing so will not free any space if there are snapshots: deleted files are not actually released, because their blocks remain referenced by the snapshots.
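As a quick check before deleting anything, ZFS can show how much space each dataset's snapshots are holding compared to the live data; 'tank' below is a placeholder for your pool name.

Code:
# space held by snapshots versus live data, per dataset
zfs list -r -o name,used,usedbydataset,usedbysnapshots tank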

So my suggestion to you would be to:
--Boot your system once for a first 30 minutes
--During these 30 minutes, explore your snapshots and work out how many you can delete. You can sort them by date and size (see the sketch after this list)
--Note their names on paper (or in a file on a client computer connected to the NAS)
--Once done, reboot yourself, or the system may crash / reboot by itself

--During the second 30 minutes, go back to the snapshots and delete as many as you can
--Once you have managed to delete a few snapshots, reboot cleanly to protect against a new crash and the associated risk of corruption
--Keep deleting snapshots / rebooting manually until you have freed enough space for the server to stop crashing

--Once the server stops crashing, you can migrate your data off that server to a new one with the required resources
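In case you prefer the shell over the web UI, here is a rough sketch of the listing and deleting; the pool, dataset and snapshot names are placeholders for your own.

Code:
# list snapshots with their creation date and the space they hold, smallest first
zfs list -t snapshot -r -o name,creation,used -s used tank

# destroy one you no longer need (placeholder name)
zfs destroy tank/dataset@auto-2018-12-01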

Good luck,
 

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401
My bad on that one. I am not sure how, but I indeed read your 30 seconds as 30 minutes....

In that case, a fresh FreeNAS install would help, since it would not mount the pool by default. So the bootable USB stick is a much better idea. The question would then be how to mount the pool in a safe way.
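One cautious option, if it works, would be to import the pool read-only first, just to look around before doing anything destructive; 'tank' below is a placeholder for your pool name.

Code:
# import read-only under an alternate root
zpool import -o readonly=on -R /mnt tank
# ... inspect datasets and snapshots ...
zpool export tank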

Unfortunately, I have never had to do something like that, so I hope others will help you more than just saying that every system must be monitored and taken care of...

Good luck,
 

max322223

Dabbler
Joined
Nov 23, 2015
Messages
13
that every system must be monitored and taken care of...
Fair point...

On the other hand, I was under the impression that FreeNAS is fairly "appliancy" and "just works", at least for simple use cases. I would understand if, after running out of space, the system got stuck and I had to SSH in to clear some space. But auto-rebooting in a loop because some cron job wants to run and can't seems uncool. :(
 

max322223

Dabbler
Joined
Nov 23, 2015
Messages
13
OK, it turns out single-user mode is what I needed, I was just not entering it correctly. Here's the right way: https://forums.freenas.org/index.php?threads/zfs-has-failed-you.11951/ Once the system boots, I can switch the freenas-boot filesystem into read-write mode with

Code:
zfs set readonly=off [dataset]


Then I am able to import the pool. When I try to free up the space, though, I get various errors. Most notably, when trying to destroy some snapshots I see a kernel panic:
KernelPanicScreenshot.png
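For reference, the import and the destroy that panics look roughly like this; the pool and snapshot names below are placeholders, not my actual ones.

Code:
zpool import -f -R /mnt tank                   # placeholder pool name
zfs destroy tank/dataset@auto-2019-01-01       # placeholder snapshot; this is the step that panics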
 

max322223

Dabbler
Joined
Nov 23, 2015
Messages
13
Interestingly, truncate -s 0 <LargeFileName> also fails with "no space left on device". And attempting to zfs destroy a dataset with snapshots kernel panics. Hm... Let me try to destroy a dataset that does not have snapshots...
 

max322223

Dabbler
Joined
Nov 23, 2015
Messages
13
OK. A couple more things I tried which, together, worked. The system is back with all the data intact, as far as I can tell so far. We shall never know whether these things would have worked individually:

1. Boot into an older boot environment. Specifically, a 9.10 one. (The pool was created by FreeNAS 9.3 and never "upgraded".)
2. "zfs destroy" a dataset with no snapshots.

After I did both #1 and #2, I was able to zfs destroy snapshots without causing a kernel panic...
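In case it helps someone else, switching boot environments can be done from the boot loader menu or, roughly, from the shell; the environment name below is a placeholder for whatever beadm list shows.

Code:
beadm list                        # show available boot environments
beadm activate 9.10-STABLE-foo    # placeholder name; activate it, then reboot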
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
I suspect that with a 100% full pool and ZFS being copy-on-write, even removing snapshots would have been unable to complete due to a lack of free blocks to write to.

Looks like destroying a dataset gets around that.
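For anyone landing here later, a simple way to keep an eye on how close a pool is getting to that state (pool name is a placeholder):

Code:
zpool list -o name,size,allocated,free,capacity tank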
 