SOLVED System slowly crashes when deleting zvol or dataset

Joined
Jul 27, 2017
Messages
16
When deleting a zvol or dataset from my system, the system slowly crashes.
  1. First the WebUI starts being unable to retrieve pool data.
  2. The WebUI then fails to load completely.
  3. At this point the SMB service also fails.
  4. Then virtual machines freeze and are inaccessible remotely. Their services stop being accessible. Jails also stop and crash around this point. The jls command returns nothing.
SSH continues to work, but if I try any zfs commands, they freeze.
I feel like the "delete process" is essentially shutting down pool access.

I do have frequent snapshots (taken every 5 minutes) that are supposed to be deleted after 2 hours, but I recently discovered that they weren't being deleted automatically. There were over 420,000 snapshots on the system. I've stopped the snapshot task and removed most of them, but I'm still having this problem.
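For anyone checking whether their own snapshot retention has silently stopped working, here is a minimal Python sketch. It assumes the listing comes from `zfs list -H -p -t snapshot -o name,creation` (tab-separated, creation time as epoch seconds). The helper names and the 2-hour default are illustrative only and are not part of FreeNAS.

```python
import subprocess
from collections import Counter


def parse_snapshots(listing):
    """Parse tab-separated `name<TAB>creation_epoch` lines into (name, epoch) pairs."""
    snaps = []
    for line in listing.splitlines():
        if not line.strip():
            continue
        name, creation = line.split("\t")
        snaps.append((name, int(creation)))
    return snaps


def expired(snaps, now, max_age_seconds=2 * 60 * 60):
    """Return names of snapshots older than the retention window (assumed 2 hours)."""
    return [name for name, created in snaps if now - created > max_age_seconds]


def count_per_dataset(snaps):
    """Count snapshots per dataset or zvol (the part of the name before '@')."""
    return Counter(name.split("@", 1)[0] for name, _ in snaps)


def list_snapshots():
    """Run zfs on the local host; requires the zfs CLI and sufficient privileges."""
    out = subprocess.run(
        ["zfs", "list", "-H", "-p", "-t", "snapshot", "-o", "name,creation"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_snapshots(out)
```

Comparing `len(expired(snaps, time.time()))` against zero on a system with a 2-hour retention shows quickly whether old snapshots are piling up, and `count_per_dataset` shows which datasets hold most of them.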

If I restart from SSH, the reboot takes about an hour. When the system comes back online, the zvol/dataset is still there; it has not been deleted.

This has now happened on 4 attempts, with both zvols and datasets.

My system specs:
160GB RAM
7x10TB HDD (Raidz2)
256GB L2ARC
16-core 2GHz Xeon Processor
 

Alecmascot

Guru
Joined
Mar 18, 2014
Messages
1,177
How full is your pool?
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Can you report it as a bug? We need to know which version of the software you are running.

420,000 snapshots is a lot. How many have you removed? Removing those first might be the best strategy. It might be that the metadata no longer fits in your RAM. Do you see high disk utilization?
 
Joined
Jul 27, 2017
Messages
16
When the system freezes, I'm unable to determine disk utilization.

This snapshot issue was resolved in 11.3-U4 (https://jira.ixsystems.com/browse/NAS-105966). I had a clone based on a snapshot that was supposed to expire but couldn't because of the clone, so I think that issue is fixed. Whether this is the cause of the crash is yet to be determined. I will have to wait until the weekend, when I can afford to slow the system down, before I try again.
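A snapshot that has a clone cannot be destroyed until the clone is destroyed or promoted, so such snapshots block expiry. As a rough sketch, the following Python helper finds them, assuming input from `zfs list -H -t snapshot -o name,clones` (tab-separated). Whether a snapshot without clones prints `-` or an empty field may vary, so the code accepts both; the function name is hypothetical.

```python
def snapshots_with_clones(listing):
    """Map each snapshot name to the list of datasets cloned from it.

    Expects tab-separated `name<TAB>clones` lines, where clones is a
    comma-separated list. Snapshots with no clones ('-' or empty) are skipped.
    """
    result = {}
    for line in listing.splitlines():
        if not line.strip():
            continue
        name, _, clones = line.partition("\t")
        clones = clones.strip()
        if clones and clones != "-":
            result[name] = clones.split(",")
    return result
```

Any snapshot returned here will survive an automatic retention policy until its clones are dealt with.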
 
Joined
Jul 27, 2017
Messages
16
I upgraded to 11.3-U4 and I am down to fewer than 4,000 snapshots. I have also confirmed that the snapshot count was the issue, because deleting volumes is now almost instant.

Thank you all for your help.
 

morganL

It's useful to know that 420,000 snapshots with 160GB RAM and 70TB of drives is too much.
 