Trying to delete lots of snapshots results in system hang

kotani

Cadet
Joined
Sep 8, 2019
Messages
5
I set up the snapshots on my FreeNAS box wrong, and I now have about 30,000 snapshots that need to be deleted.
Even just listing the snapshots on the CLI takes forever.

It's on a production system, and I ran the following command to try to clear all snapshots:
zfs destroy -r pool/dataset@%

It seemed to be working for a while, but after about an hour, the SMB service and the web UI went down. I couldn't type anything into the console, and the only thing the server would do was respond to pings. I left it in that state for about 6 hours, but with the office opening in the morning, I had to do a hard reset, which thankfully worked without issue.

Is there any way I can delete all of the snapshots and start fresh without the system services going down?
I'm at a total loss. Please help!
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
zfs destroy -r pool/dataset@%
Maybe consider naming the oldest snapshot on one side of the % and another one taken some time after it on the other side, then repeat.

You can probably do hundreds at a time, but not many thousands.
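
For what it's worth, here is a rough sketch of how those ranges could be generated. It assumes all snapshots belong to a single dataset, that the names contain no whitespace, and that the list is sorted oldest first. The function names, the default batch size of 200 and the RUN switch are made up for illustration. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: turn an oldest-first list of snapshot names into
# "dataset@first%last" ranges of at most BATCH snapshots each.

# Reads full snapshot names (dataset@snap) on stdin, one per line,
# and prints one range argument per batch.
snapshot_ranges() {
    batch=${1:-200}
    awk -v n="$batch" '
        {
            split($0, parts, "@")
            if (count == 0) { first = $0; first_snap = parts[2] }
            last_snap = parts[2]        # range end is the bare snapshot name
            count++
            if (count == n) { emit(); count = 0 }
        }
        END { if (count > 0) emit() }
        function emit() {
            # A batch of one is passed as a plain snapshot name.
            if (first_snap == last_snap) print first
            else print first "%" last_snap
        }
    '
}

# Hypothetical driver: prints the commands unless RUN=1 is set.
destroy_in_batches() {
    dataset=$1
    batch=${2:-200}
    zfs list -H -t snapshot -o name -s creation -d 1 "$dataset" |
        snapshot_ranges "$batch" |
        while IFS= read -r range; do
            if [ "${RUN:-0}" = "1" ]; then
                zfs destroy "$range"
            else
                echo "zfs destroy $range"
            fi
        done
}
```

Something like `destroy_in_batches pool/dataset 200` would print the batches for review first; `RUN=1 destroy_in_batches pool/dataset 200` would actually destroy them.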
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Another suggestion is to write a script that deletes one snapshot at a time and sleeps until the pool's "freeing" count is zero. To be on the safe side, maybe sleep a few seconds between snapshot deletions as well.

Also, you can start with the snapshots that use the least amount of data, just to get them out of the way. This may not free up much space, but it does free up metadata space.

Another thing you can add to the processing loop is a way to change the sleep time between deletions, or to pause the process in case your users notice things being slow for long periods of time. Give them faster access for 5 or 10 minutes, then delete some more snapshots.

In essence, engineer a solution that is flexible enough to run slowly during the day, when users are working, and faster during the night, when you can assume more snapshot deletions can be done. With 30,000 to delete, if an average deletion takes 1 minute, that's about 21 days without any other breaks. So this solution is worth thinking about before you implement it.
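
A minimal sketch of the throttling part, to show the idea rather than a finished tool. The office hours (07:00 to 19:00), the two delays, the pause-file path and all function names are placeholders I made up; adjust them to your site.

```shell
#!/bin/sh
# Sketch only: hours, delays and the pause-file path are placeholders.

PAUSE_FILE=${PAUSE_FILE:-/tmp/snapshot-cleanup.pause}

# Print the delay in seconds to use after a deletion at the given hour (0-23).
delay_for_hour() {
    hour=$1
    if [ "$hour" -ge 7 ] && [ "$hour" -lt 19 ]; then
        echo 60     # office hours: go gently
    else
        echo 5      # overnight: go faster
    fi
}

# Block for as long as the pause file exists.
wait_while_paused() {
    while [ -e "$PAUSE_FILE" ]; do
        sleep 10
    done
}

# Rough estimate, rounded up to whole days, for COUNT deletions
# that take SECONDS each.
estimate_days() {
    count=$1
    seconds=$2
    echo $(( (count * seconds + 86399) / 86400 ))
}

# Call once after every "zfs destroy" in the deletion loop.
throttle() {
    wait_while_paused
    # Strip a leading zero so "08" is treated as the number 8.
    sleep "$(delay_for_hour "$(date +%H | sed 's/^0//')")"
}
```

With this, `touch /tmp/snapshot-cleanup.pause` would pause the loop when users complain, and removing the file would resume it. `estimate_days 30000 60` reproduces the 21-day figure above.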
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Such a script could run zfs list -H -r -t snapshot -o name <your pool name>| head -1 to get the top-most snapshot, and save it as NEXT_SNAP_TO_DEL. Then, while $NEXT_SNAP_TO_DEL != '', it runs zfs destroy $NEXT_SNAP_TO_DEL and then refreshes NEXT_SNAP_TO_DEL=`zfs list -H -r -t snapshot -o name <your pool name> | head -1`.
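
Written out, that loop might look roughly like this. The function names are mine, and stopping on the first failed destroy is an addition so the loop cannot spin forever on a snapshot it cannot remove (for example one with a hold).

```shell
#!/bin/sh
# Sketch of the loop described above; the pool name is passed as an argument.

# Print the first snapshot name under the pool, or nothing if none remain.
next_snapshot() {
    zfs list -H -r -t snapshot -o name "$1" | head -1
}

# Destroy snapshots one at a time, re-listing after each deletion.
destroy_one_by_one() {
    pool=$1
    next=$(next_snapshot "$pool")
    while [ -n "$next" ]; do
        echo "Destroying $next"
        zfs destroy "$next" || return 1   # stop instead of retrying forever
        next=$(next_snapshot "$pool")
    done
}
```

One trade-off to keep in mind: this re-runs the full snapshot listing after every deletion, which will be slow while listing 30,000 snapshots still "takes forever". Listing once and reading the names from that output avoids the repeated listing.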
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Gee, if we are going to get fancy (replace "pool" with your data pool's name, if it is not "pool"):
Code:
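#!/bin/bash
MY_POOL=pool
# List every snapshot in the pool and destroy them one at a time.
zfs list -H -r -t snapshot -o name "${MY_POOL}" | \
while read -r MY_SNAP
do
    echo "Destroying snapshot - ${MY_SNAP}"
    zfs destroy "${MY_SNAP}"
    sleep 2
    # -H -p -o value prints only the number of bytes still being freed.
    while [ "$(zpool get -H -p -o value freeing "${MY_POOL}")" -gt 0 ]; do
        sleep 5
    done
done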

Note that the first sleep lets things calm down, so if it's a small snapshot and gone in 2 seconds, we don't run the loop with 5 seconds sleep.
 

Dan Tudora

Patron
Joined
Jul 6, 2017
Messages
276
hello
is a problem of "timing post, I am BAN and must to accept the punish"
The OP must stop the process that creates the snapshots for now, and after that recreate the cron job.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Just delete them serially.
zfs list -H -t snap -r pool/dataset | awk '{ printf "zfs destroy %s\n", $1 }' | sh

Provided you don't have spaces in those dataset or snapshot names.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
@Patrick M. Hausen Deleting the snapshots serially does not work quite the way you think. One of the first OpenZFS features added (which Sun ZFS did not have until later) is async destroy, meaning the "zfs destroy" command can return BEFORE the snapshot's space is completely freed. So starting them your way might end up with hundreds or thousands of snapshot deletes trying to free up space at once, almost the same as the recursive option.
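
To make that concrete, here is a small sketch of how a script could wait for the background freeing to finish before issuing the next destroy. The function names and the 5-second default interval are placeholders; the pool's "freeing" property is the value being polled.

```shell
#!/bin/sh
# Sketch only: poll the pool's "freeing" property until it reaches zero.

# Print the pool's "freeing" value as a bare number of bytes.
pool_freeing() {
    zpool get -H -p -o value freeing "$1"
}

# Wait until nothing is left to free, checking every INTERVAL seconds.
wait_until_freed() {
    pool=$1
    interval=${2:-5}
    while [ "$(pool_freeing "$pool")" -gt 0 ]; do
        sleep "$interval"
    done
}
```

Calling `wait_until_freed pool` after each `zfs destroy` keeps the deletes from piling up behind each other.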
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Oops. I had no idea such a feature existed. Thanks.
 

kotani

Cadet
Joined
Sep 8, 2019
Messages
5
Wow. Thank you everyone for your input and thoughtful examples. The more I learn about ZFS, the more I admire the amount of work that went into this system. I will give it a shot. I have stopped creating snapshots for the time being until I can clear all of the old ones.

Thank you so much!
 