Every VM on shared storage crashed this morning.

pnunn

Dabbler
Joined
Jan 31, 2015
Messages
39
Hi guys,

I'm running a setup with TrueNAS as the backing storage for two XCP-NG hosts and was bitten by the recent ZFS bug to do with the async writes.

I'm currently running TrueNAS-12.0-U1.1 on a 10Gb network.

This morning, every machine on the shared storage had crashed.

All of the consoles are showing something like the attached image.

However, I can see no errors either on the console or in /var/log/messages, so I really have NO clue where to start looking.

Any ideas?
 

Attachments

  • Crash1.PNG

pnunn

XCP-ng support have just had a look at the logs on the two hosts and can see the following on both, at almost exactly the same time:


From cwp-xcp1:

[Wed Feb 24 06:34:00 2021] nfs: server 192.168.22.10 not responding, still trying
[Wed Feb 24 06:34:00 2021] nfs: server 192.168.22.10 not responding, still trying
[Wed Feb 24 06:36:41 2021] nfs: server 192.168.22.10 not responding, still trying
[Wed Feb 24 06:36:42 2021] nfs: server 192.168.22.10 not responding, still trying
[Wed Feb 24 06:36:43 2021] nfs: server 192.168.22.10 not responding, still trying
[Wed Feb 24 06:42:46 2021] nfs: server 192.168.22.10 not responding, still trying
[Wed Feb 24 06:42:47 2021] nfs: server 192.168.22.10 not responding, still trying
[Wed Feb 24 06:42:57 2021] nfs: server 192.168.22.10 not responding, still trying


From cwp-xcp2:

[Wed Feb 24 06:33:40 2021] nfs: server 192.168.22.10 not responding, still trying
[Wed Feb 24 06:33:40 2021] nfs: server 192.168.22.10 not responding, still trying
[Wed Feb 24 06:33:40 2021] nfs: server 192.168.22.10 not responding, still trying
[Wed Feb 24 06:36:22 2021] nfs: server 192.168.22.10 not responding, still trying
[Wed Feb 24 06:36:23 2021] nfs: server 192.168.22.10 not responding, still trying
[Wed Feb 24 06:36:44 2021] nfs: server 192.168.22.10 not responding, still trying
[Wed Feb 24 06:36:44 2021] nfs: server 192.168.22.10 not responding, still trying
[Wed Feb 24 06:39:24 2021] nfs: server 192.168.22.10 not responding, still trying
[Wed Feb 24 06:39:25 2021] nfs: server 192.168.22.10 not responding, still trying
[Wed Feb 24 06:39:25 2021] nfs: server 192.168.22.10 not responding, still trying

The server came back again 2 hours later, but the mounts had failed by then.

So it is clear that the NFS mount went away. Now the question is why?

We've looked in dmesg and /var/log/messages on the TrueNAS box and can't see anything, so would this suggest the network is at fault?
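
To put a number on how closely the two hosts lost the share, here is a minimal Python sketch that parses lines in the format quoted above and compares the first timeout seen on each host. The function name `first_timeout` and the two sample lists are just for illustration (each list holds only the first line from the excerpts above), and it assumes the clocks on both hosts are in sync, which I haven't verified.

```python
import re
from datetime import datetime

# Matches lines in the format quoted above, e.g.
# [Wed Feb 24 06:34:00 2021] nfs: server 192.168.22.10 not responding, still trying
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] nfs: server (?P<server>\S+) not responding")


def first_timeout(lines):
    """Return the earliest 'not responding' timestamp in the given lines, or None."""
    stamps = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            # Timestamp format as it appears in the excerpts (English day/month names).
            stamps.append(datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y"))
    return min(stamps) if stamps else None


# First line from each host's excerpt; in practice, feed in the full log output.
xcp1 = ["[Wed Feb 24 06:34:00 2021] nfs: server 192.168.22.10 not responding, still trying"]
xcp2 = ["[Wed Feb 24 06:33:40 2021] nfs: server 192.168.22.10 not responding, still trying"]

skew = abs(first_timeout(xcp1) - first_timeout(xcp2))
print(f"First timeouts are {skew.total_seconds():.0f} seconds apart")
```

Both hosts losing the same server within seconds of each other only narrows it down to something they share (the NAS itself or the network path to it); it doesn't say which.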
 