ZFS is full and system is now unusable

Status
Not open for further replies.

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
zfs destroy Main_Storage/bil@auto-20140514.0706-10y

And continue with the other entries.

But these would be the newest ones, so you may want to go 70,000 lines earlier... Unless you know what you had deleted three weeks ago.

You do not have to destroy all the snapshots!
 

dovaka

Dabbler
Joined
Apr 2, 2013
Messages
31
Is there a way to just delete all my snapshots? I don't need them, as the server is replicated elsewhere.
 

krikboh

Patron
Joined
Sep 21, 2013
Messages
209
The command is zfs destroy snapshot_name, where snapshot_name is one of the snapshots you would like to delete.
 

krikboh

Patron
Joined
Sep 21, 2013
Messages
209
Oh too slow.
 

dovaka

Dabbler
Joined
Apr 2, 2013
Messages
31
I mean, is there an equivalent of zfs destroy *.* to just wipe out all of them? I have approximately 70,000 of them and no idea which ones are big enough to make any real difference.
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
In the shell or SSH session:
Code:
bash
for std in `zfs list -H -o name -t snapshot | grep -- '@auto-2014'`
do
    zfs destroy "$std"
done
exit


That destroys all the automatic snapshots from year 2014. Then repeat with 2013 and 2012. I am afraid that holding the names of all your snapshots at once might be too much for the shell to handle, which is why we go year by year.

std = snapshot to destroy (the variable name could be anything; I just chose a meaningful one)


P.S. Try to destroy a few manually first, using names from the list you posted earlier, so the pool is no longer 100% full.
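For what it's worth, here is a sketch of an alternative that avoids collecting every name into one huge shell word list: stream the names into xargs one at a time. Shown here with a dummy list file and echo standing in for zfs destroy, so nobody pastes a live destroy loop blindly; on the real system you would pipe the zfs list output in and drop the echo.

```shell
# Dummy list standing in for real 'zfs list -H -o name -t snapshot' output
printf 'Main_Storage@auto-20140101.0000-10y\nMain_Storage@auto-20140101.0015-10y\n' > /tmp/demo_list

# Feed matching names to xargs one at a time; 'echo' makes this a dry run.
# Real use: zfs list -H -o name -t snapshot | grep -- '@auto-2014' | xargs -n 1 zfs destroy
grep -- '@auto-2014' /tmp/demo_list | xargs -n 1 echo zfs destroy
```

Because xargs starts a new zfs destroy per name, the full list never has to fit into a single command line.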
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
That would not destroy all your snapshots, only the automatic ones. After you are done destroying automatic snapshots, execute
zfs list -H -o name -t snapshot
to see the remaining snapshots; those could be needed by the system.
 

dovaka

Dabbler
Joined
Apr 2, 2013
Messages
31
And alas, after running that command for 2014 and 2013, it still says:
Code:
[root@studionas ~]# echo > /mnt/Main_Storage/Movies/9.avi                     
bash: /mnt/Main_Storage/Movies/9.avi: No space left on device                 
[root@studionas ~]#    

Also, on the onboard monitor I noticed it says this several times when I reboot:
Code:
sed: stdout: No space left on device
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
Code:
zfs list -H -o name -t snapshot > /tmp/snapshot_list
 
grep @auto /tmp/snapshot_list | wc -l
 
wc -l /tmp/snapshot_list
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
1. Since the list could be very long, let's write it to a file.

2. Let's count the number of automatic snapshots.

3. Let's count the number of all the snapshots.

If both numbers are the same, we can remove all the snapshots by
Code:
bash
for std in `cat /tmp/snapshot_list`
do
    zfs destroy "$std"
done
exit
And yes, there is an explanation for the dataset still being full after you removed some snapshots: a snapshot does not need to occupy any space of its own.
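The comparison of the two counts can also be automated. A sketch, using a dummy list file so it is safe to try anywhere (on the real system the list comes from zfs list -H -o name -t snapshot):

```shell
# Dummy stand-in for the snapshot list
printf 'tank@auto-1\ntank@auto-2\ntank@manual-1\n' > /tmp/demo_list

auto=$(grep -c '@auto' /tmp/demo_list)   # automatic snapshots
total=$(wc -l < /tmp/demo_list)          # all snapshots

if [ "$auto" -eq "$total" ]; then
    echo "all snapshots are automatic"
else
    echo "$((total - auto)) non-automatic snapshot(s) remain"
fi
```

If the counts differ, the remaining names are exactly what grep -v @auto would show.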
 

dovaka

Dabbler
Joined
Apr 2, 2013
Messages
31
The numbers are ever so slightly different, and the total is nearly triple what I thought it was going to be.
Should I still run that second set of commands even though the counts are slightly different?
Code:
[root@studionas ~]# zfs list -H -o name -t snapshot > /tmp/snapshot_list       
[root@studionas ~]# grep @auto /tmp/snapshot_list | wc -l                     
  191558                                                                       
[root@studionas ~]# wc -l /tmp/snapshot_list                                   
  191562 /tmp/snapshot_list                                                   
[root@studionas ~]#                                                           
                          
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
The discrepancy (four snapshots) is probably from jails (do you have any?) or just a few manual snapshots that you or some other process created.

I'd still stick with:

Code:
for SNAP in `grep '@auto' /tmp/snapshot_list`
do
    zfs destroy "$SNAP"
done


You could run this instead to watch them be deleted:

Code:
for SNAP in `grep '@auto' /tmp/snapshot_list`
do
    echo "zfs destroy $SNAP"
    zfs destroy "$SNAP"
done


Even if this doesn't fix your issue, do you really need snapshots at 15-minute intervals for two or more years? Have those expire and set a higher interval to keep for a longer period (once a day, expiring after a month?), or use something like my rollup script to keep things manageable. Two hundred thousand snapshots is insane; how would you expect to find the one you want to restore?
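To illustrate the rollup idea (this is only a rough sketch of the concept, not the actual script): walk the list in order and keep just the first auto-snapshot per day, printing a destroy command for every duplicate. Dummy names following the Main_Storage@auto-YYYYMMDD.HHMM-10y pattern above, with echo standing in for a real zfs destroy.

```shell
# Dummy list: two snapshots on 2013-04-05, one on 2013-04-06
printf '%s\n' \
  'Main_Storage@auto-20130405.0000-10y' \
  'Main_Storage@auto-20130405.0015-10y' \
  'Main_Storage@auto-20130406.0000-10y' > /tmp/demo_list

prev_day=''
while read -r snap; do
    day=${snap%.*}            # strip the .HHMM-10y suffix -> pool@auto-YYYYMMDD
    if [ "$day" = "$prev_day" ]; then
        echo zfs destroy "$snap"   # later snapshot on the same day: prune it
    fi
    prev_day=$day
done < /tmp/demo_list
```

Here only the 00:15 snapshot gets pruned; the first snapshot of each day survives.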
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
Please follow fracai's advice.

In the meantime, in another SSH session, execute grep -v @auto /tmp/snapshot_list to see which snapshots are not automatic ones.
 

dovaka

Dabbler
Joined
Apr 2, 2013
Messages
31
I do have a few jails, so I'm sure that's what it is. It was set up that way because at the time my understanding of how the snapshots worked was not how they actually work. I had no idea I had that many of them, but I'm sure this explains a lot of the instability issues I've been having for quite some time.
I went with the visual command and it now seems to be deleting what it should be. But at the rate it's going (one every 2 seconds or so), it will take about 100 hours to finish. Is this something I can close my browser on and it will still finish, or could I restart it if that did stop it?
Code:
zfs destroy Main_Storage@auto-20130405.0000-10y                             
zfs destroy Main_Storage@auto-20130405.0015-10y                             
zfs destroy Main_Storage@auto-20130405.0030-10y                             
zfs destroy Main_Storage@auto-20130405.0045-10y                             
zfs destroy Main_Storage@auto-20130405.0100-10y                             
zfs destroy Main_Storage@auto-20130405.0115-10y                             
zfs destroy Main_Storage@auto-20130405.0130-10y                             
zfs destroy Main_Storage@auto-20130405.0145-10y                             
zfs destroy Main_Storage@auto-20130405.0200-10y                             
zfs destroy Main_Storage@auto-20130405.0215-10y                             
zfs destroy Main_Storage@auto-20130405.0230-10y                             
zfs destroy Main_Storage@auto-20130405.0245-10y                             
zfs destroy Main_Storage@auto-20130405.0300-10y                             
zfs destroy Main_Storage@auto-20130405.0315-10y                             
zfs destroy Main_Storage@auto-20130405.0330-10y                             
zfs destroy Main_Storage@auto-20130405.0345-10y                             
zfs destroy Main_Storage@auto-20130405.0400-10y                             
zfs destroy Main_Storage@auto-20130405.0415-10y                             
zfs destroy Main_Storage@auto-20130405.0430-10y                             
zfs destroy Main_Storage@auto-20130405.0445-10y
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
While it's running you could also execute the following in another session, but with hundreds of thousands of snapshots, it's going to take a while to run and just slow down the destroys.

Code:
zfs list -H -o name -t snapshot | grep @auto | wc -l


Instead, you can have a counter keep track of how many have been destroyed.

Code:
COUNTER=0
for SNAP in `grep '@auto' /tmp/snapshot_list`
do
    echo "${COUNTER}: zfs destroy ${SNAP}"
    zfs destroy "$SNAP"
    COUNTER=$((COUNTER + 1))
done


Might not be a bad idea to request a "delete all snapshots" button in the GUI, or at least a script that can be run from the shell. This shouldn't be a common problem, but providing a few tools to help users recover would be nice.
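Such a helper could look something like this hypothetical sketch: read snapshot names on stdin and destroy them, with a dry-run mode that only prints what would happen. Demonstrated with dummy names piped in; the real call site would be zfs list -H -o name -t snapshot piped into the function with DRY_RUN=0.

```shell
# Hypothetical "delete all snapshots" helper (sketch, not a shipped tool).
# Reads snapshot names from stdin; DRY_RUN=1 (the default) only prints.
destroy_all() {
    while read -r snap; do
        if [ "${DRY_RUN:-1}" -eq 1 ]; then
            echo "would destroy: $snap"
        else
            zfs destroy "$snap"
        fi
    done
}

# Dry-run demonstration with dummy names:
printf 'tank@auto-1\ntank@auto-2\n' | destroy_all
```

Defaulting to dry-run means a user has to opt in explicitly before anything is actually destroyed.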
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
To keep it running you can start a tmux session and then close the window.

"tmux" will start a new session. And you can use "tmux -a" to reattach to the session to check the status.

If the loop does die, you can safely start it again at any time. "zfs destroy" on an already deleted snapshot will just fail without causing any problems.
 

dovaka

Dabbler
Joined
Apr 2, 2013
Messages
31
So now, because this wasn't interesting enough: the power in my office went out for a couple of hours today and my UPS died, so the server restarted, and now when it boots it says
Code:
studionas kernel: NLM: failed to contact remote rpcbind, stat = 5, port = 28416
studionas kernel: Can't start NLM - unable to contact NSM

and it just hangs there
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
It stops booting at the next step, not because of these messages...
  • Reinstall FreeNAS on your USB device or if you have a spare, install on a spare
  • Get one more USB device, insert it into the system (so you have two plugged at the same time)
  • Start FreeNAS, create a new ZFS volume on the USB device (do not worry, only the non-system one would be shown as available)
  • Now you have .system dataset on the USB device
  • Import your pool, continue removing snapshots
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
The old list of snapshots is gone. You have to create a new one:
Code:
zfs list -H -o name -t snapshot > /tmp/snapshot_list
 
grep @auto /tmp/snapshot_list | wc -l
Tell us how many you still have... Please also give us the output of
Code:
zpool status -v
zpool list -v


Continue removing automatic snapshots:
Code:
bash
for std in `grep @auto /tmp/snapshot_list`
do
    zfs destroy "$std"
done
exit
 

dovaka

Dabbler
Joined
Apr 2, 2013
Messages
31
I made a new stick and got the pool running. Now I'm recreating the list, which took quite a while last time. I'll update when I can run the count and start the destroy command again.
 