Alright, long day but things are ok... for now.
It started when I rebooted my server while troubleshooting Plex and port forwarding. Several minutes after fixing that, I got a critical alarm email. When I logged in, I got the output below and suspected that my drive with SN Z1E5674Y had failed. No problem, I thought, that's why I'm using ZFS; I'll just replace the drive, but let's reboot first to see if this is just a fluke.
Code:
--------------------------------After reboot-----------------
root@NAS:~ # zpool status -v
  pool: NAS
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: scrub repaired 0 in 0 days 03:44:25 with 0 errors on Sun Mar 10 04:44:26 2019
config:

        NAME                                            STATE     READ WRITE CKSUM
        NAS                                             DEGRADED     0     0     0
          raidz1-0                                      DEGRADED     0     0     0
            16998016746606625039                        UNAVAIL      0     0     0  was /dev/gptid/5d9c2204-7588-11e3-b46b-000c29403d9a
            gptid/5e351433-7588-11e3-b46b-000c29403d9a  ONLINE       0     0     0
            gptid/5ec15c36-7588-11e3-b46b-000c29403d9a  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:28 with 0 errors on Tue Mar 19 03:45:28 2019
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2       ONLINE       0     0     0

errors: No known data errors
root@NAS:~ # smartctl -a /dev/ada0 | grep ^Serial
Serial Number:    Z1E5676K
root@NAS:~ # smartctl -a /dev/ada1 | grep ^Serial
Serial Number:    Z1E57PJS
root@NAS:~ # smartctl -a /dev/ada2 | grep ^Serial
root@NAS:~ # glabel status
                                      Name  Status  Components
gptid/796bbfc9-b651-11e5-a3e6-000c29403d9a     N/A  da0p1
gptid/5e351433-7588-11e3-b46b-000c29403d9a     N/A  ada0p2
gptid/5ec15c36-7588-11e3-b46b-000c29403d9a     N/A  ada1p2
Then I hit the below. I was very confused about how the volume could have failed with 2 active drives. However, it now looked like Z1E5676K had failed. So I panicked: I rebooted several times and tried different cables, SATA ports, etc., until...
Code:
---------------AFTER Reboot Troubleshooting Drives-----------------------------------
                                      Name  Status  Components
gptid/796bbfc9-b651-11e5-a3e6-000c29403d9a     N/A  da0p1
gptid/5d9c2204-7588-11e3-b46b-000c29403d9a     N/A  ada0p2
gptid/5ec15c36-7588-11e3-b46b-000c29403d9a     N/A  ada1p2
root@NAS:~ # zpool import
   pool: NAS
     id: 393758721156783558
  state: FAULTED
 status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
        devices and try again.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://illumos.org/msg/ZFS-8000-3C
 config:

        NAS                                             FAULTED  corrupted data
          raidz1-0                                      FAULTED  corrupted data
            gptid/5d9c2204-7588-11e3-b46b-000c29403d9a  ONLINE
            17347583837171412057                        UNAVAIL  cannot open
            gptid/5ec15c36-7588-11e3-b46b-000c29403d9a  ONLINE
root@NAS:~ # zpool status -v
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:28 with 0 errors on Tue Mar 19 03:45:28 2019
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2       ONLINE       0     0     0

errors: No known data errors
root@NAS:~ #
root@NAS:~ # zpool status -v
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:28 with 0 errors on Tue Mar 19 03:45:28 2019
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2       ONLINE       0     0     0

errors: No known data errors
It looks like it is working. But the question remains, what happened, and should I start to take some sort of action?
I'm pretty worried about what's going to happen when I power cycle the box again.
Do you think I should get a 4TB drive and either do a full backup or just add it to the pool before any power cycle?
Code:
---------------NOW-------------------------------------------
root@NAS:~ # zpool status -v
  pool: NAS
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: resilvered 879M in 0 days 00:01:34 with 0 errors on Fri Mar 22 13:49:59 2019
config:

        NAME                                            STATE     READ WRITE CKSUM
        NAS                                             ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/5d9c2204-7588-11e3-b46b-000c29403d9a  ONLINE       0     0     0
            gptid/5e351433-7588-11e3-b46b-000c29403d9a  ONLINE       0     0     0
            gptid/5ec15c36-7588-11e3-b46b-000c29403d9a  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:28 with 0 errors on Tue Mar 19 03:45:28 2019
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2       ONLINE       0     0     0

errors: No known data errors
root@NAS:~ # zpool import
root@NAS:~ # glabel status
                                      Name  Status  Components
gptid/796bbfc9-b651-11e5-a3e6-000c29403d9a     N/A  da0p1
gptid/5ec15c36-7588-11e3-b46b-000c29403d9a     N/A  ada0p2
gptid/5e351433-7588-11e3-b46b-000c29403d9a     N/A  ada1p2
gptid/5d9c2204-7588-11e3-b46b-000c29403d9a     N/A  ada2p2
gptid/5eaaea3c-7588-11e3-b46b-000c29403d9a     N/A  ada0p1
root@NAS:~ #
root@NAS:~ # smartctl -a /dev/ada0 | grep ^Serial
Serial Number:    Z1E57PJS
root@NAS:~ # smartctl -a /dev/ada1 | grep ^Serial
Serial Number:    Z1E5676K
root@NAS:~ # smartctl -a /dev/ada2 | grep ^Serial
Serial Number:    Z1E5674Y
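Since the adaX numbers moved around between reboots, I kept having to match gptid labels to serial numbers by hand. Below is a rough sketch of how that matching could be scripted. It assumes `glabel status` and `smartctl -i` print in the same format as the output above; the function names (`gptid_to_disk`, `serial_from_smartctl`, `report`) are just names I made up for illustration, and I have not run this on anything other than this kind of output.

```shell
#!/bin/sh
# Sketch: pair every gptid label on an adaX disk with that disk's serial number.

# Reads `glabel status` output on stdin and prints "label partition disk"
# for each gptid label that sits on an adaX partition (da0 boot device is skipped).
gptid_to_disk() {
    awk '$1 ~ /^gptid\// && $3 ~ /^ada[0-9]+p[0-9]+$/ {
        disk = $3
        sub(/p[0-9]+$/, "", disk)   # strip the partition suffix: ada1p2 -> ada1
        print $1, $3, disk
    }'
}

# Reads `smartctl -i` output on stdin and prints only the serial number.
serial_from_smartctl() {
    awk '/^Serial Number:/ { print $3 }'
}

# Combines the two helpers on a live system (needs root for smartctl).
report() {
    glabel status | gptid_to_disk | while read -r label part disk; do
        serial=$(smartctl -i "/dev/$disk" | serial_from_smartctl)
        printf '%s  %s  %s\n' "$label" "$part" "${serial:-unknown}"
    done
}

# Only run the live report where both tools are actually installed.
if command -v glabel >/dev/null 2>&1 && command -v smartctl >/dev/null 2>&1; then
    report
fi
```

Run as root on the NAS, this should print one line per gptid label with its partition and drive serial, so the labels in `zpool status` can be tied to physical drives no matter which adaX number a drive gets on a given boot.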
Other relevant data. Thanks in advance for the help.
Code:
root@NAS:~ # smartctl -a /dev/ada0 | grep self-assessment
SMART overall-health self-assessment test result: PASSED
root@NAS:~ # smartctl -a /dev/ada1 | grep self-assessment
SMART overall-health self-assessment test result: PASSED
root@NAS:~ # smartctl -a /dev/ada2 | grep self-assessment
SMART overall-health self-assessment test result: PASSED
root@NAS:~ # camcontrol devlist
<NECVMWar VMware IDE CDR10 1.00>   at scbus1 target 0 lun 0 (cd0,pass0)
<VMware Virtual disk 1.0>          at scbus2 target 0 lun 0 (pass1,da0)
<ST2000DM001-1CH164 CC27>          at scbus4 target 0 lun 0 (pass2,ada0)
<ST2000DM001-1CH164 CC27>          at scbus5 target 0 lun 0 (pass3,ada1)
<ST2000DM001-1CH164 CC27>          at scbus6 target 0 lun 0 (pass4,ada2)