Hard Reset Caused by Failed Mirror

Popolou · Mar 11, 2024

I ran into a situation where a failing controller chip on a single NVMe drive in a pool of two striped mirror vdevs (4 drives) brought the whole system down. TrueNAS Core v13.0-U5.3 is running bare metal on a Dell R730, and the NVMe vdevs are installed on a quad NVMe PCIe adaptor (Bus ID 128). It appears that the controller was failing gradually, and this was reported in the server log:

A fatal error was detected on a component at bus 128 device 2 function 0
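For anyone wanting to match that message against the PCI device list, here is a minimal Python sketch that pulls the bus/device/function numbers out of the log line and prints them in two common notations. The helper names are my own (hypothetical), PCI domain 0 is an assumption, and the `pciD:B:S:F` selector form is the one FreeBSD's `pciconf` uses as far as I know; treat it as a lookup aid, not a diagnosis.

```python
import re

# The message exactly as it appeared in the server log.
LOG_LINE = "A fatal error was detected on a component at bus 128 device 2 function 0"


def parse_pci_location(message):
    """Extract (bus, device, function) as integers from the log message."""
    match = re.search(r"bus (\d+) device (\d+) function (\d+)", message)
    if match is None:
        raise ValueError("no bus/device/function triple found in message")
    bus, device, function = (int(group) for group in match.groups())
    return bus, device, function


def to_bdf_hex(bus, device, function):
    """Format as hexadecimal bus:device.function (lspci-style notation)."""
    return f"{bus:02x}:{device:02x}.{function:x}"


def to_pciconf_selector(bus, device, function, domain=0):
    """Format as a decimal pciD:B:S:F selector; domain 0 is an assumption."""
    return f"pci{domain}:{bus}:{device}:{function}"


if __name__ == "__main__":
    bus, device, function = parse_pci_location(LOG_LINE)
    print(to_bdf_hex(bus, device, function))           # 80:02.0
    print(to_pciconf_selector(bus, device, function))  # pci0:128:2:0
```

The decimal selector can then be compared against the entries listed by `pciconf -l` to see which device sits at that address.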

The console messages did record that one of the NVMe drives became read-only, but later testing showed that the drive's controller froze up whenever any significant load was applied (the drive is a WD Black SN750). The main problem, however, is that instead of the zpool carrying on in a degraded state, the system did a hard reset. My assumption is that the reset is the result of a watchdog timer somewhere and is not related to the OS. Does that sound plausible?

Has anyone experienced this before, and is it perhaps specific to Dell?

Cheers
