ZFS is full and system is now unusable

Status
Not open for further replies.

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
zfs destroy Main_Storage/bil@auto-20140514.0706-10y

And continue with the other entries.

But these would be the newest ones, so you may want to go 70,000 lines earlier... Unless you know what you had deleted three weeks ago.

You do not have to destroy all the snapshots!
 

dovaka

Dabbler
Joined
Apr 2, 2013
Messages
31
Is there a way to just delete all my snapshots? I don't need them, as the server is replicated elsewhere.
 

krikboh

Patron
Joined
Sep 21, 2013
Messages
209
The command is zfs destroy snapshot_name, where snapshot_name is one of the snapshots you would like to delete.
 

krikboh

Patron
Joined
Sep 21, 2013
Messages
209
Oh too slow.
 

dovaka

Dabbler
Joined
Apr 2, 2013
Messages
31
I mean, is there an equivalent of zfs destroy *.* to just wipe out all of them? I have approximately 70,000 of them and no idea which ones are big enough to make any real difference.
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
In the shell or SSH session:
Code:
bash
for std in `zfs list -H -o name -t snapshot | grep -- '@auto-2014'`
do
    zfs destroy "$std"
done
exit


That destroys all the automatic snapshots from year 2014. Then repeat with 2013 and 2012. I am afraid that holding the names of all your snapshots at once might be too much for the shell to handle, which is why we go year by year.

std = snapshot to destroy (the variable name could be anything; I just chose a meaningful one)


P.S. Try to destroy a few manually first, using names from the list you posted earlier, so the pool is no longer 100% full.
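For what it's worth, here is a sketch of an alternative that avoids collecting every name into one huge shell word list: stream the names into xargs one at a time. Shown here with a dummy list file and echo standing in for zfs destroy, so nobody pastes a live destroy loop blindly; on the real system you would pipe the zfs list output in and drop the echo.

```shell
# Dummy list standing in for real 'zfs list -H -o name -t snapshot' output
printf 'Main_Storage@auto-20140101.0000-10y\nMain_Storage@auto-20140101.0015-10y\n' > /tmp/demo_list

# Feed matching names to xargs one at a time; 'echo' makes this a dry run.
# Real use: zfs list -H -o name -t snapshot | grep -- '@auto-2014' | xargs -n 1 zfs destroy
grep -- '@auto-2014' /tmp/demo_list | xargs -n 1 echo zfs destroy
```

Because xargs starts a new zfs destroy per name, the full list never has to fit into a single command line.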
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
That would not destroy all your snapshots, only the automatic ones. After you are done destroying automatic snapshots, execute
zfs list -H -o name -t snapshot
to see the remaining snapshots; those could be needed by the system.
 

dovaka

Dabbler
Joined
Apr 2, 2013
Messages
31
And alas, after running that command for 2014 and 2013, it still says:
Code:
[root@studionas ~]# echo > /mnt/Main_Storage/Movies/9.avi                     
bash: /mnt/Main_Storage/Movies/9.avi: No space left on device                 
[root@studionas ~]#    

Also, on the onboard monitor I noticed it says this several times when I reboot:
Code:
sed: stdout: No space left on device
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
Code:
zfs list -H -o name -t snapshot > /tmp/snapshot_list
 
grep @auto /tmp/snapshot_list | wc -l
 
wc -l /tmp/snapshot_list
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
1. Since the list could be very long, let's write it to a file.

2. Let's count the number of automatic snapshots.

3. Let's count the number of all the snapshots.

If both numbers are the same, we can remove all the snapshots by
Code:
bash
for std in `cat /tmp/snapshot_list`
do
    zfs destroy "$std"
done
exit
And yes, there is an explanation for the dataset still being full after you removed some snapshots: a snapshot does not need to occupy any space of its own.
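The comparison of the two counts can also be automated. A sketch, using a dummy list file so it is safe to try anywhere (on the real system the list comes from zfs list -H -o name -t snapshot):

```shell
# Dummy stand-in for the snapshot list
printf 'tank@auto-1\ntank@auto-2\ntank@manual-1\n' > /tmp/demo_list

auto=$(grep -c '@auto' /tmp/demo_list)   # automatic snapshots
total=$(wc -l < /tmp/demo_list)          # all snapshots

if [ "$auto" -eq "$total" ]; then
    echo "all snapshots are automatic"
else
    echo "$((total - auto)) non-automatic snapshot(s) remain"
fi
```

If the counts differ, the remaining names are exactly what grep -v @auto would show.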
 

dovaka

Dabbler
Joined
Apr 2, 2013
Messages
31
The numbers are ever so slightly different, and the total is nearly triple what I thought it was going to be.
Should I still run that second set of commands even though the counts are slightly different?
Code:
[root@studionas ~]# zfs list -H -o name -t snapshot > /tmp/snapshot_list       
[root@studionas ~]# grep @auto /tmp/snapshot_list | wc -l                     
  191558                                                                       
[root@studionas ~]# wc -l /tmp/snapshot_list                                   
  191562 /tmp/snapshot_list                                                   
[root@studionas ~]#                                                           
                          
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
The discrepancy (four snapshots) is probably from jails (do you have any?) or just a few manual snapshots that you or some other process created.

I'd still stick with:

Code:
for SNAP in `grep '@auto' /tmp/snapshot_list`
do
    zfs destroy "$SNAP"
done


You could run this instead to watch them be deleted:

Code:
for SNAP in `grep '@auto' /tmp/snapshot_list`
do
    echo "zfs destroy $SNAP"
    zfs destroy "$SNAP"
done


Even if this doesn't fix your issue, do you really need snapshots at 15-minute intervals for two or more years? Have those expire and set a higher interval to keep for a longer period (once a day, expiring after a month?), or use something like my rollup script to keep things manageable. Two hundred thousand snapshots is insane; how would you expect to find the one you want to restore?
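To illustrate the rollup idea (this is only a rough sketch of the concept, not the actual script): walk the list in order and keep just the first auto-snapshot per day, printing a destroy command for every duplicate. Dummy names following the Main_Storage@auto-YYYYMMDD.HHMM-10y pattern above, with echo standing in for a real zfs destroy.

```shell
# Dummy list: two snapshots on 2013-04-05, one on 2013-04-06
printf '%s\n' \
  'Main_Storage@auto-20130405.0000-10y' \
  'Main_Storage@auto-20130405.0015-10y' \
  'Main_Storage@auto-20130406.0000-10y' > /tmp/demo_list

prev_day=''
while read -r snap; do
    day=${snap%.*}            # strip the .HHMM-10y suffix -> pool@auto-YYYYMMDD
    if [ "$day" = "$prev_day" ]; then
        echo zfs destroy "$snap"   # later snapshot on the same day: prune it
    fi
    prev_day=$day
done < /tmp/demo_list
```

Here only the 00:15 snapshot gets pruned; the first snapshot of each day survives.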
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
Please follow fracai's advice.

In the meantime, in another SSH session, execute grep -v @auto /tmp/snapshot_list to see which snapshots are not automatic ones.
 

dovaka

Dabbler
Joined
Apr 2, 2013
Messages
31
I do have a few jails, so I'm sure that's what it is. It was set up that way because at the time my understanding of how the snapshots worked was not how they actually work. I had no idea I had that many of them, but I'm sure this explains a lot of the instability issues I've been having for quite some time.
I went with the visual command and it now seems to be deleting what it should be. But at the rate it's going (one every 2 seconds or so), it will take about 100 hours to finish. Is this something I can close my browser on and it will still finish, or could I restart it if that did stop it?
Code:
zfs destroy Main_Storage@auto-20130405.0000-10y                             
zfs destroy Main_Storage@auto-20130405.0015-10y                             
zfs destroy Main_Storage@auto-20130405.0030-10y                             
zfs destroy Main_Storage@auto-20130405.0045-10y                             
zfs destroy Main_Storage@auto-20130405.0100-10y                             
zfs destroy Main_Storage@auto-20130405.0115-10y                             
zfs destroy Main_Storage@auto-20130405.0130-10y                             
zfs destroy Main_Storage@auto-20130405.0145-10y                             
zfs destroy Main_Storage@auto-20130405.0200-10y                             
zfs destroy Main_Storage@auto-20130405.0215-10y                             
zfs destroy Main_Storage@auto-20130405.0230-10y                             
zfs destroy Main_Storage@auto-20130405.0245-10y                             
zfs destroy Main_Storage@auto-20130405.0300-10y                             
zfs destroy Main_Storage@auto-20130405.0315-10y                             
zfs destroy Main_Storage@auto-20130405.0330-10y                             
zfs destroy Main_Storage@auto-20130405.0345-10y                             
zfs destroy Main_Storage@auto-20130405.0400-10y                             
zfs destroy Main_Storage@auto-20130405.0415-10y                             
zfs destroy Main_Storage@auto-20130405.0430-10y                             
zfs destroy Main_Storage@auto-20130405.0445-10y
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
While it's running you could also execute the following in another session, but with hundreds of thousands of snapshots, it's going to take a while to run and just slow down the destroys.

Code:
zfs list -H -o name -t snapshot | grep @auto | wc -l


Instead, you can have a counter keep track of how many have been destroyed.

Code:
COUNTER=0
for SNAP in `grep '@auto' /tmp/snapshot_list`
do
    echo "${COUNTER}: zfs destroy ${SNAP}"
    zfs destroy "$SNAP"
    COUNTER=$((COUNTER + 1))
done


Might not be a bad idea to request a "delete all snapshots" button in the GUI, or at least a script that can be run from the shell. This shouldn't be a common problem, but providing a few tools to help users recover would be nice.
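Such a helper could look something like this hypothetical sketch: read snapshot names on stdin and destroy them, with a dry-run mode that only prints what would happen. Demonstrated with dummy names piped in; the real call site would be zfs list -H -o name -t snapshot piped into the function with DRY_RUN=0.

```shell
# Hypothetical "delete all snapshots" helper (sketch, not a shipped tool).
# Reads snapshot names from stdin; DRY_RUN=1 (the default) only prints.
destroy_all() {
    while read -r snap; do
        if [ "${DRY_RUN:-1}" -eq 1 ]; then
            echo "would destroy: $snap"
        else
            zfs destroy "$snap"
        fi
    done
}

# Dry-run demonstration with dummy names:
printf 'tank@auto-1\ntank@auto-2\n' | destroy_all
```

Defaulting to dry-run means a user has to opt in explicitly before anything is actually destroyed.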
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
To keep it running you can start a tmux session and then close the window.

"tmux" will start a new session. And you can use "tmux -a" to reattach to the session to check the status.

If the loop does die, you can safely start it again at any time. "zfs destroy" on an already deleted snapshot will just fail without causing any problems.
 

dovaka

Dabbler
Joined
Apr 2, 2013
Messages
31
So now, because this wasn't interesting enough: the power in my office went out for a couple of hours today and my UPS died, so the server restarted, and now when it boots it says
Code:
studionas kernel: NLM: failed to contact remote rpcbind, stat = 5, port = 28416
studionas kernel: Can't start NLM - unable to contact NSM

and it just hangs there
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
It stops booting at the next step, not because of these messages...
  • Reinstall FreeNAS on your USB device or if you have a spare, install on a spare
  • Get one more USB device, insert it into the system (so you have two plugged at the same time)
  • Start FreeNAS, create a new ZFS volume on the USB device (do not worry, only the non-system one would be shown as available)
  • Now you have .system dataset on the USB device
  • Import your pool, continue removing snapshots
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
The old list of snapshots is gone. You have to create a new one:
Code:
zfs list -H -o name -t snapshot > /tmp/snapshot_list
 
grep @auto /tmp/snapshot_list | wc -l
Tell us how many you still have... Please also give us the output of
Code:
zpool status -v
zpool list -v


Continue removing automatic snapshots:
Code:
bash
for std in `grep @auto /tmp/snapshot_list`
do
    zfs destroy "$std"
done
exit
 

dovaka

Dabbler
Joined
Apr 2, 2013
Messages
31
I made a new stick and got the pool running. Now I'm recreating the list, which took quite a while last time. I'll update when I can run the count and start the destroy command again.
 