FreeNAS VMs randomly go down with "blocked for more than 120 seconds" errors.

d3vnu77

Cadet
Joined
May 4, 2020
Messages
5
I have 5 VMs running on a 32-core AMD Threadripper 2990WX with 128 GB of RAM.

Seemingly at random, these VMs go down: I can't access them via SSH, and when I use the VNC console built into FreeNAS, I see the errors in the attached screenshot.

The only way to fix the problem (temporarily) seems to be rebooting the server, until it happens again.

Not sure where to look on this matter.
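For anyone who wants the text rather than a screenshot, the same messages can be pulled from a guest's kernel log with something like this (assuming the guests are Linux; exact commands may differ per distro):

Code:
# Inside an affected VM: show recent hung-task warnings from the kernel ring buffer
dmesg -T | grep -i "blocked for more than"

# On a systemd-based guest, the persistent journal carries the same messages
journalctl -k | grep -i "blocked for more than"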
 

Attachments

  • blocked_vm.PNG (175.3 KB)
Joined
Dec 29, 2014
Messages
1,135
What kind of drives and HBA are you using? Also, what is your network setup?
 

d3vnu77

Cadet
Joined
May 4, 2020
Messages
5
OK, it seems I'm also getting this error on the actual physical server. Is a drive bad? The front end seems OK, as the drives show "Green/Healthy". Not sure if this is related.

No HBA, just using the SATA ports on the motherboard to connect 6 2TB Seagate Barracuda Compute hard drives in a RAID-Z2 pool.

The VMs all run on 3 Samsung 970 EVO NVMe M.2 1TB SSDs in a RAID-Z1 pool.

The server has two network connections.

1. The first connection is a LAN connection with an internal IP of 192.168.1.210.
2. The second NIC has an outside static IP. The VMs are web servers, and each has its own static external IP.
 

Attachments

  • IMG_5191.JPG (200.5 KB)
  • IMG_5190.JPG (369.9 KB)

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
The second NIC has an outside static IP.
Outside as in "public IPv4 address"? Don't do that. FreeNAS is not a hardened appliance and really should not be directly accessible from the Internet.

Regarding your errors, it definitely looks like your ada0 device is failing, but you mentioned the VMs all run from the RAID-Z1 of NVMe devices? It could be that ZFS as a whole is getting hung up on a bad drive/vdev and blocking on it.
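If you want to confirm which device is stalling before pulling anything, something along these lines from the FreeNAS shell should show it (the ada0 device node here is just an example; match it to your layout):

Code:
# Per-device read/write/checksum error counters and any scrub/resilver activity
zpool status -v

# Full SMART report for the suspect disk (reallocated/pending sectors are the usual red flags)
smartctl -a /dev/ada0

# Live per-disk busy % and latency; a dying or reshingling drive tends to sit pegged near 100%
gstat -p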

6 2TB Seagate Barracuda Compute harddrives in a raid Z2 pool.
Post the model numbers of these drives, please. A regular 6-drive RAID-Z2 will be slow but shouldn't choke on a little bit of I/O - but if those are secretly SMR, then you might be experiencing the wonderful world of reshingling.
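If the shell is easier than pulling them from the GUI, a quick loop over the SATA disks will print the models and serials (this assumes they enumerate as ada0 through ada5):

Code:
# Print model and serial for each disk in the Z2 pool
for d in ada0 ada1 ada2 ada3 ada4 ada5; do
  echo "== $d =="
  smartctl -i /dev/$d | grep -E "Device Model|Serial Number"
done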
 

d3vnu77

Cadet
Joined
May 4, 2020
Messages
5
Yes, all VMs run on the NVMe drives. Attached is an image of the drives in question. Pretty much the only real write activity on those drives at the moment is the gradual syncing of the Bitcoin and Litecoin blockchains.

To replace the drive, do I just pull it and insert another, or is there another process?
 

Attachments

  • IMG_5192.JPG (191.6 KB)

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Yes, all VMs run on the NVMe drives. Attached is an image of the drives in question. Pretty much the only real write activity on those drives at the moment is the gradual syncing of the Bitcoin and Litecoin blockchains.

To replace the drive, do I just pull it and insert another, or is there another process?

Those models (ST2000DM008) are indeed shingled (SMR) drives, but unless you are putting constant random I/O to them (BTC and LTC chain updates shouldn't be that heavy), it shouldn't be enough to overwhelm them to the point of timing out. A bad drive, though, could be causing problems.
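On the replacement question: don't just pull the disk cold. The supported path is through the GUI (offline the disk from the pool's status page, swap it, then use Replace), but for orientation the CLI sequence is roughly the following - the pool name "tank" and the device names are placeholders, and note that the GUI also takes care of partitioning and gptid labels, which this skips:

Code:
# Take the failing disk offline so ZFS stops issuing I/O to it
zpool offline tank ada0

# Physically swap the disk, then resilver onto the new one
zpool replace tank ada0 /dev/ada0

# Watch resilver progress until the pool is healthy again
zpool status -v tank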

Here is the documentation page specifically on replacing a failed disk:

 