So I came in this morning and none of the network drives were accessible. I traced the problem to the pool in the TrueNAS box being offline, and then found all the messages TrueNAS had sent to my email last night about the situation:
At 5:08 PM yesterday:
New alert:
* Pool VIPER_AFA state is UNAVAIL: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state.
The following devices are not healthy:
- Disk STEC S842E400M2 STM000187ABF is UNAVAIL
- Disk SanDisk LT0400MO 42654968 is FAULTED
- Disk SanDisk LT0400MO 42655780 is FAULTED
- Disk SanDisk LT0400MO 42653224 is FAULTED
- Disk SanDisk LT0400MO 42660276 is FAULTED
- Disk SanDisk LT0400MO 42652372 is FAULTED
At 6:09 PM this:
New alert:
* Pool VIPER_AFA state is UNAVAIL: One or more devices are faulted in response to persistent errors. There are insufficient replicas for the pool to continue functioning.
The following devices are not healthy:
- Disk STEC S842E400M2 STM000187ABF is UNAVAIL
- Disk SanDisk LT0400MO 42654968 is FAULTED
- Disk SanDisk LT0400MO 42655780 is FAULTED
- Disk SanDisk LT0400MO 42653224 is FAULTED
- Disk SanDisk LT0400MO 42660276 is FAULTED
- Disk SanDisk LT0400MO 42652372 is FAULTED
At 6:29 PM this:
New alert:
* Pool VIPER_AFA state is UNAVAIL: One or more devices are faulted in response to persistent errors. There are insufficient replicas for the pool to continue functioning.
The following devices are not healthy:
- Disk STEC S842E400M2 STM000187ABF is UNAVAIL
- Disk SanDisk LT0400MO 42653820 is DEGRADED
- Disk SanDisk LT0400MO 42653312 is DEGRADED
- Disk SanDisk LT0400MO 42654968 is FAULTED
- Disk SanDisk LT0400MO 42659844 is DEGRADED
- Disk SanDisk LT0400MO 42653456 is DEGRADED
- Disk SanDisk LT0400MO 42654368 is DEGRADED
- Disk SanDisk LT0400MO 42655780 is FAULTED
- Disk SanDisk LT0400MO 42654764 is DEGRADED
- Disk SanDisk LT0400MO 42656792 is DEGRADED
- Disk SanDisk LT0400MO 42653224 is FAULTED
- Disk SanDisk LT0400MO 42652144 is DEGRADED
- Disk SanDisk LT0400MO 42660276 is FAULTED
- Disk SanDisk LT0400MO 42652372 is FAULTED
At 9:03 PM this:
New alert:
* Pool VIPER_AFA state is UNAVAIL: One or more devices are faulted in response to persistent errors. There are insufficient replicas for the pool to continue functioning.
The following devices are not healthy:
- Disk STEC S842E400M2 STM000187ABF is UNAVAIL
- Disk SanDisk LT0400MO 42653820 is DEGRADED
- Disk SanDisk LT0400MO 42653312 is DEGRADED
- Disk SanDisk LT0400MO 42654968 is FAULTED
- Disk SanDisk LT0400MO 42659844 is DEGRADED
- Disk SanDisk LT0400MO 42653456 is DEGRADED
- Disk SanDisk LT0400MO 42654368 is FAULTED
- Disk SanDisk LT0400MO 42655780 is FAULTED
- Disk SanDisk LT0400MO 42654764 is FAULTED
- Disk SanDisk LT0400MO 42656792 is DEGRADED
- Disk SanDisk LT0400MO 42653224 is FAULTED
- Disk SanDisk LT0400MO 42652144 is DEGRADED
- Disk SanDisk LT0400MO 42660276 is FAULTED
- Disk SanDisk LT0400MO 42652372 is FAULTED
By 9:10 PM this:
New alert:
* Pool VIPER_AFA state is UNAVAIL: One or more devices are faulted in response to persistent errors. There are insufficient replicas for the pool to continue functioning.
The following devices are not healthy:
- Disk STEC S842E400M2 STM000187ABF is UNAVAIL
- Disk SanDisk LT0400MO 42653820 is DEGRADED
- Disk SanDisk LT0400MO 42653312 is FAULTED
- Disk SanDisk LT0400MO 42654968 is FAULTED
- Disk SanDisk LT0400MO 42659844 is DEGRADED
- Disk SanDisk LT0400MO 42653456 is DEGRADED
- Disk SanDisk LT0400MO 42654368 is FAULTED
- Disk SanDisk LT0400MO 42655780 is FAULTED
- Disk SanDisk LT0400MO 42654764 is FAULTED
- Disk SanDisk LT0400MO 42656792 is DEGRADED
- Disk SanDisk LT0400MO 42653224 is FAULTED
- Disk SanDisk LT0400MO 42652144 is FAULTED
- Disk SanDisk LT0400MO 42657568 is FAULTED
- Disk SanDisk LT0400MO 42654316 is FAULTED
- Disk SanDisk LT0400MO 42654540 is FAULTED
- Disk SanDisk LT0400MO 42652260 is FAULTED
- Disk SanDisk LT0400MO 42660276 is FAULTED
- Disk SanDisk LT0400MO 42652372 is FAULTED
* Device: /dev/da0, failed to read SMART values.
* Device: /dev/da0, Read SMART Self-Test Log Failed.
* Device: /dev/da4, failed to read SMART values.
* Device: /dev/da4, Read SMART Self-Test Log Failed.
At 9:11 PM this:
These alerts have been cleared:
* Device: /dev/da4, failed to read SMART values.
* Device: /dev/da4, Read SMART Self-Test Log Failed.
Current alerts:
* Snapshot Task For Dataset "VIPER_AFA/viperiscsiafa" failed: cannot create snapshot 'VIPER_AFA/viperiscsiafa@auto-2023-01-22_00-00': out of space
no snapshots were created..
* Pool VIPER_AFA state is UNAVAIL: One or more devices are faulted in response to persistent errors. There are insufficient replicas for the pool to continue functioning.
The following devices are not healthy:
- Disk STEC S842E400M2 STM000187ABF is UNAVAIL
- Disk SanDisk LT0400MO 42653820 is DEGRADED
- Disk SanDisk LT0400MO 42653312 is FAULTED
- Disk SanDisk LT0400MO 42654968 is FAULTED
- Disk SanDisk LT0400MO 42659844 is DEGRADED
- Disk SanDisk LT0400MO 42653456 is DEGRADED
- Disk SanDisk LT0400MO 42654368 is FAULTED
- Disk SanDisk LT0400MO 42655780 is FAULTED
- Disk SanDisk LT0400MO 42654764 is FAULTED
- Disk SanDisk LT0400MO 42656792 is DEGRADED
- Disk SanDisk LT0400MO 42653224 is FAULTED
- Disk SanDisk LT0400MO 42652144 is FAULTED
- Disk SanDisk LT0400MO 42657568 is FAULTED
- Disk SanDisk LT0400MO 42654316 is FAULTED
- Disk SanDisk LT0400MO 42654540 is FAULTED
- Disk SanDisk LT0400MO 42652260 is FAULTED
- Disk SanDisk LT0400MO 42660276 is FAULTED
- Disk SanDisk LT0400MO 42652372 is FAULTED
* Device: /dev/da0, failed to read SMART values.
* Device: /dev/da0, Read SMART Self-Test Log Failed.
At MIDNIGHT:
Current alerts:
* Device: /dev/da0, failed to read SMART values.
* Device: /dev/da0, Read SMART Self-Test Log Failed.
* Snapshot Task For Dataset "VIPER_AFA/viperiscsiafa" failed: cannot create snapshot 'VIPER_AFA/viperiscsiafa@auto-2023-01-23_00-00': out of space
no snapshots were created..
* Pool VIPER_AFA state is UNAVAIL: One or more devices are faulted in response to persistent errors. There are insufficient replicas for the pool to continue functioning.
The following devices are not healthy:
- Disk STEC S842E400M2 STM000187ABF is UNAVAIL
- Disk SanDisk LT0400MO 42653820 is DEGRADED
- Disk SanDisk LT0400MO 42653312 is FAULTED
- Disk SanDisk LT0400MO 42654968 is FAULTED
- Disk SanDisk LT0400MO 42659844 is DEGRADED
- Disk SanDisk LT0400MO 42653456 is DEGRADED
- Disk SanDisk LT0400MO 42654368 is FAULTED
- Disk SanDisk LT0400MO 42655780 is FAULTED
- Disk SanDisk LT0400MO 42654764 is FAULTED
- Disk SanDisk LT0400MO 42656792 is DEGRADED
- Disk SanDisk LT0400MO 42653224 is FAULTED
- Disk SanDisk LT0400MO 42652144 is FAULTED
- Disk SanDisk LT0400MO 42657568 is FAULTED
- Disk SanDisk LT0400MO 42654316 is FAULTED
- Disk SanDisk LT0400MO 42654540 is FAULTED
- Disk SanDisk LT0400MO 42652260 is FAULTED
- Disk SanDisk LT0400MO 42660276 is FAULTED
- Disk SanDisk LT0400MO 42652372 is FAULTED
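In case it helps anyone compare the emails above, here is a rough sketch (not something TrueNAS provides) for tallying how many disks are in each state in one pasted alert. The function name `tally_states` and the regular expression are my own and only assume the "- Disk <model> <serial> is <STATE>" line format seen in these emails; the sample text is the 5:08 PM list.

```python
import re
from collections import Counter

# Assumed line format, taken from the alert emails above:
#   "- Disk SanDisk LT0400MO 42654968 is FAULTED"
DISK_LINE = re.compile(
    r"^\s*-\s*Disk\s+(?P<model>.+?)\s+(?P<serial>\S+)\s+is\s+(?P<state>[A-Z]+)\s*$"
)

def tally_states(alert_text):
    """Count how many disks are reported in each state in one alert email."""
    counts = Counter()
    for line in alert_text.splitlines():
        match = DISK_LINE.match(line)
        if match:
            counts[match.group("state")] += 1
    return counts

# The disk list from the 5:08 PM alert.
sample = """\
- Disk STEC S842E400M2 STM000187ABF is UNAVAIL
- Disk SanDisk LT0400MO 42654968 is FAULTED
- Disk SanDisk LT0400MO 42655780 is FAULTED
- Disk SanDisk LT0400MO 42653224 is FAULTED
- Disk SanDisk LT0400MO 42660276 is FAULTED
- Disk SanDisk LT0400MO 42652372 is FAULTED
"""

for state, count in sorted(tally_states(sample).items()):
    print(state, count)
```

Running the same function over each later email shows the progression: the list grows from 6 unhealthy disks at 5:08 PM to 18 by 9:10 PM.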
I have to be honest that I do not know as much about TrueNAS as I should. How the heck do I go about troubleshooting this? I am filling in at my company for the empty IT role and we are, of course, totally down. I can't believe all these drives failed at once!
The hardware is a Supermicro 4U X10DRH-iT 72x 2.5" 2xE5-2667v3 64GB SAS9300-8i 12Gbps SAS3 server. It's actually a former Ciara storage box. It's been running fine with no issues for a year. Can someone please tell me roughly where to look? Is this really a drive issue, or something else? Thank you!!