Alright, long day but things are ok... for now.
It started when I rebooted my server while troubleshooting Plex and port forwarding. After fixing that, I got a critical alarm email several minutes later. Upon logging in, I saw the output below and suspected that my drive with SN Z1E5674Y had failed. No problem, I thought, that's why I'm using ZFS. I'll just replace the drive, but let's reboot first to see if this is just a fluke.
Code:
--------------------------------After reboot-----------------
root@NAS:~ # zpool status -v
pool: NAS
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
see: http://illumos.org/msg/ZFS-8000-2Q
scan: scrub repaired 0 in 0 days 03:44:25 with 0 errors on Sun Mar 10 04:44:26 2019
config:
NAME STATE READ WRITE CKSUM
NAS DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
16998016746606625039 UNAVAIL 0 0 0 was /dev/gptid/5d9c2204-7588-11e3-b46b-000c29403d9a
gptid/5e351433-7588-11e3-b46b-000c29403d9a ONLINE 0 0 0
gptid/5ec15c36-7588-11e3-b46b-000c29403d9a ONLINE 0 0 0
errors: No known data errors
pool: freenas-boot
state: ONLINE
scan: scrub repaired 0 in 0 days 00:00:28 with 0 errors on Tue Mar 19 03:45:28 2019
config:
NAME STATE READ WRITE CKSUM
freenas-boot ONLINE 0 0 0
da0p2 ONLINE 0 0 0
errors: No known data errors
root@NAS:~ # smartctl -a /dev/ada0 | grep ^Serial
Serial Number: Z1E5676K
root@NAS:~ # smartctl -a /dev/ada1 | grep ^Serial
Serial Number: Z1E57PJS
root@NAS:~ # smartctl -a /dev/ada2 | grep ^Serial
root@NAS:~ # glabel status
Name Status Components
gptid/796bbfc9-b651-11e5-a3e6-000c29403d9a N/A da0p1
gptid/5e351433-7588-11e3-b46b-000c29403d9a N/A ada0p2
gptid/5ec15c36-7588-11e3-b46b-000c29403d9a N/A ada1p2
Then I hit the below. I was very confused about how the volume had failed with 2 active drives. However, it now looked like Z1E5676K had failed. So I panicked. I rebooted several times, tried different cables, SATA ports, etc., until...
Code:
---------------AFTER Reboot Troubleshooting Drives-----------------------------------
Name Status Components
gptid/796bbfc9-b651-11e5-a3e6-000c29403d9a N/A da0p1
gptid/5d9c2204-7588-11e3-b46b-000c29403d9a N/A ada0p2
gptid/5ec15c36-7588-11e3-b46b-000c29403d9a N/A ada1p2
root@NAS:~ # zpool import
pool: NAS
id: 393758721156783558
state: FAULTED
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
The pool may be active on another system, but can be imported using
the '-f' flag.
see: http://illumos.org/msg/ZFS-8000-3C
config:
NAS FAULTED corrupted data
raidz1-0 FAULTED corrupted data
gptid/5d9c2204-7588-11e3-b46b-000c29403d9a ONLINE
17347583837171412057 UNAVAIL cannot open
gptid/5ec15c36-7588-11e3-b46b-000c29403d9a ONLINE
root@NAS:~ # zpool status -v
pool: freenas-boot
state: ONLINE
scan: scrub repaired 0 in 0 days 00:00:28 with 0 errors on Tue Mar 19 03:45:28 2019
config:
NAME STATE READ WRITE CKSUM
freenas-boot ONLINE 0 0 0
da0p2 ONLINE 0 0 0
errors: No known data errors
root@NAS:~ #
root@NAS:~ # zpool status -v
pool: freenas-boot
state: ONLINE
scan: scrub repaired 0 in 0 days 00:00:28 with 0 errors on Tue Mar 19 03:45:28 2019
config:
NAME STATE READ WRITE CKSUM
freenas-boot ONLINE 0 0 0
da0p2 ONLINE 0 0 0
errors: No known data errors
It looks like it is working. But the question remains: what happened, and should I start to take some sort of action?
I'm pretty worried about what's going to happen when I power cycle the box again.
Do you think I should get a 4TB drive and either do a full backup or just add it to the pool before any power cycle?
Code:
---------------NOW-------------------------------------------
root@NAS:~ # zpool status -v
pool: NAS
state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(7) for details.
scan: resilvered 879M in 0 days 00:01:34 with 0 errors on Fri Mar 22 13:49:59 2019
config:
NAME STATE READ WRITE CKSUM
NAS ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
gptid/5d9c2204-7588-11e3-b46b-000c29403d9a ONLINE 0 0 0
gptid/5e351433-7588-11e3-b46b-000c29403d9a ONLINE 0 0 0
gptid/5ec15c36-7588-11e3-b46b-000c29403d9a ONLINE 0 0 0
errors: No known data errors
pool: freenas-boot
state: ONLINE
scan: scrub repaired 0 in 0 days 00:00:28 with 0 errors on Tue Mar 19 03:45:28 2019
config:
NAME STATE READ WRITE CKSUM
freenas-boot ONLINE 0 0 0
da0p2 ONLINE 0 0 0
errors: No known data errors
root@NAS:~ # zpool import
root@NAS:~ # glabel status
Name Status Components
gptid/796bbfc9-b651-11e5-a3e6-000c29403d9a N/A da0p1
gptid/5ec15c36-7588-11e3-b46b-000c29403d9a N/A ada0p2
gptid/5e351433-7588-11e3-b46b-000c29403d9a N/A ada1p2
gptid/5d9c2204-7588-11e3-b46b-000c29403d9a N/A ada2p2
gptid/5eaaea3c-7588-11e3-b46b-000c29403d9a N/A ada0p1
root@NAS:~ #
root@NAS:~ # smartctl -a /dev/ada0 | grep ^Serial
Serial Number: Z1E57PJS
root@NAS:~ # smartctl -a /dev/ada1 | grep ^Serial
Serial Number: Z1E5676K
root@NAS:~ # smartctl -a /dev/ada2 | grep ^Serial
Serial Number: Z1E5674Y
Other relevant data. Thanks in advance for the help.
Code:
root@NAS:~ # smartctl -a /dev/ada0 | grep self-assessment
SMART overall-health self-assessment test result: PASSED
root@NAS:~ # smartctl -a /dev/ada1 | grep self-assessment
SMART overall-health self-assessment test result: PASSED
root@NAS:~ # smartctl -a /dev/ada2 | grep self-assessment
SMART overall-health self-assessment test result: PASSED
root@NAS:~ # camcontrol devlist
<NECVMWar VMware IDE CDR10 1.00> at scbus1 target 0 lun 0 (cd0,pass0)
<VMware Virtual disk 1.0> at scbus2 target 0 lun 0 (pass1,da0)
<ST2000DM001-1CH164 CC27> at scbus4 target 0 lun 0 (pass2,ada0)
<ST2000DM001-1CH164 CC27> at scbus5 target 0 lun 0 (pass3,ada1)
<ST2000DM001-1CH164 CC27> at scbus6 target 0 lun 0 (pass4,ada2)
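One thing worth noting from the outputs above: the adaN names moved between boots (ada0 reported Z1E5676K after the first reboot and reports Z1E57PJS now), so matching gptid to serial by hand is easy to get wrong. Below is a rough sketch of a script that prints gptid, device, and serial on one line each. It assumes `glabel status` keeps the three-column layout shown above and that `smartctl -i` prints a `Serial Number:` line; `map_labels` and `serial_of` are hypothetical helper names, not part of FreeNAS.

```shell
#!/bin/sh
# Sketch only: list "gptid <tab> device <tab> serial" for each gptid label.
# Assumption: glabel status prints Name, Status, Components columns.

# map_labels (hypothetical helper): reads `glabel status` text on stdin and
# prints "label device", stripping the partition suffix (ada0p2 -> ada0).
map_labels() {
    awk '$1 ~ /^gptid\// { dev = $3; sub(/p[0-9]+$/, "", dev); print $1, dev }'
}

# serial_of (hypothetical helper): prints the serial smartctl reports for a
# device name such as ada0; prints nothing if smartctl cannot read the device.
serial_of() {
    smartctl -i "/dev/$1" 2>/dev/null |
        awk -F': *' '/^Serial Number/ { print $2 }'
}

# Only query the live system when glabel is actually available.
if command -v glabel >/dev/null 2>&1; then
    glabel status | map_labels | while read -r label dev; do
        printf '%s\t%s\t%s\n' "$label" "$dev" "$(serial_of "$dev")"
    done
fi
```

Running it once before and once after a power cycle and comparing the two listings should show whether a gptid has disappeared or has merely moved to a different adaN.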