How to recover from 100% used space?

sjtuross

Cadet
Joined
May 7, 2013
Messages
6
I created a replication task for the entire pool, including all datasets and zvols recursively. It automatically creates snapshots, and now the pool has 0 space left. I can't delete any snapshot; it fails with an "I/O error". I can't delete any dataset or zvol either, although I can still access the files. How can I force delete those auto snapshots to reclaim space and recover from this situation?
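For example, what I'm trying looks roughly like this (the pool, dataset and snapshot names below are placeholders, not my real ones):

  # list the auto snapshots created by the replication task, recursively
  zfs list -t snapshot -r tank

  # trying to remove one of them fails with an I/O error
  zfs destroy tank/mydata@auto-2023-01-01_00-00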


sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
How can I force delete those auto snapshots to reclaim space and recover from this situation?
If it's really 100% full, you can't. Any change, even a deletion, needs some free space to be committed, because ZFS is copy-on-write.

You can copy your data off, rebuild the pool, and then copy the data back.

If you're even a little under 100%, maybe something can be done, but from what you've shared so far that doesn't seem to be the case.
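If you want to double-check exactly how full it is and where the space has gone, something along these lines will show it (replace tank with your pool name):

  # pool-level capacity and free space
  zpool list tank

  # per-dataset breakdown: snapshots, refreservation, children
  zfs list -o space -r tank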
 

sjtuross

Cadet
Joined
May 7, 2013
Messages
6
Thank you. It's really 100% full. I'm disappointed that rebuilding the pool is the only option.

Before the snapshot was created, I still had around 40% free space. Shouldn't only the changes made since the snapshot occupy space?

This seems like a bug to me. If the estimated space needed for a snapshot exceeds the remaining space, TrueNAS shouldn't proceed; otherwise it locks up or kills itself. I feel stupid for doing this.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Before the snapshot was created, I still had around 40% free space. Shouldn't only the changes made since the snapshot occupy space?

This seems like a bug to me. If the estimated space needed for a snapshot exceeds the remaining space, TrueNAS shouldn't proceed; otherwise it locks up or kills itself.
I'm not sure you understand what a snapshot is...

If you take a snapshot, it doesn't add any data to the disk; it just takes a copy of the allocation table at that point in time and ensures that the blocks currently in use aren't overwritten while the snapshot is retained.

So if you had 40% free space and the only thing you did was take a snapshot, then you still had 40% free space until you did something that caused new blocks to be written.

If your replication target was also in the same pool, the data would actually be duplicated into the target location, so that may be what has happened.
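You can see that behaviour for yourself: a brand new snapshot reports almost nothing as used, and that number only grows as the live data diverges from it (the dataset name here is just an example):

  # take a snapshot of an example dataset
  zfs snapshot tank/mydata@before

  # right afterwards the snapshot's USED is (close to) zero
  zfs list -t snapshot -o name,used,referenced tank/mydata@before

  # USED only grows once blocks referenced by the snapshot are
  # overwritten or deleted in the live dataset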
 

sjtuross

Cadet
Joined
May 7, 2013
Messages
6
I have the same understanding of snapshots. The replication target is a remote server. I didn't pay attention to the remaining space when creating the snapshots because I thought I had plenty, so I blamed the snapshot mechanism, assuming it might reserve the same amount of space as was already used.

As a best practice, how can I prevent this from happening again? Should I reserve some space at the pool level?
 

sjtuross

Cadet
Joined
May 7, 2013
Messages
6
I finally understand what happened after reading this blog post - Reservation & Ref Reservation - An Explanation (Attempt).

Basically, I took a snapshot of a non-sparse zvol (thick provisioned, which is the default), but I didn't realize the space implications of taking a snapshot of a zvol with a refreservation, so what the blog post describes below is exactly what happened.

"I've seen many users hit this 'no space' message, or even worse, have just barely sufficient space in their pool to take that snapshot, and then all their other datasets quickly started running out of space, even though their pool may have had tons of space left from a physical perspective. A proper understanding of refreserv would have saved them a lot of headache."

To prevent such an out-of-space situation from happening again, I set a quota on the root dataset. Alternatively, if you want to snapshot a zvol, don't set a refreservation (i.e. use a sparse zvol), or turn it off with this command: zfs set refreservation=none tank/zvol
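For reference, these are roughly the commands involved (the pool/zvol names and sizes are placeholders, adjust them to your own setup):

  # cap the root dataset to leave some headroom in the pool
  zfs set quota=9T tank

  # create a sparse (thin provisioned) zvol instead of a thick one
  zfs create -s -V 100G tank/newzvol

  # or drop the refreservation on an existing thick zvol
  zfs set refreservation=none tank/zvol

  # see how much space the refreservation is currently holding back
  zfs get refreservation,usedbyrefreservation tank/zvol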
 