Hi everyone,
Returning from a work trip of a few days, I found one of my pools offline.
zpool status for the pool shows:
pool: ESXIVOL
state: UNAVAIL
status: One or more devices are faulted in response to persistent errors. There are insufficient replicas for the pool to
continue functioning.
action: Destroy and re-create the pool from a backup source. Manually marking the device
repaired using 'zpool clear' may allow some data to be recovered.
scan: resilvered 34.6G in 0 days 00:19:57 with 0 errors on Mon May 27 08:17:09 2019
config:
NAME                                            STATE     READ WRITE CKSUM
ESXIVOL                                         UNAVAIL      0    22     0
  raidz1-0                                      DEGRADED     0     0     0
    gptid/04d61697-53d7-11e9-bdc1-5065f366e21a  ONLINE       0     0     0
    gptid/c1f6ae17-7ccc-11e9-95f2-5065f366e21a  ONLINE       0     0     0
    gptid/c779e732-682c-11e9-ace4-5065f366e21a  FAULTED      3   188     0  too many errors
    gptid/9a22345f-683d-11e9-b8ac-5065f366e21a  ONLINE       0     0     0
  raidz1-1                                      UNAVAIL      0    44     0
    gptid/8e505027-7ccd-11e9-95f2-5065f366e21a  ONLINE       0     0     0
    gptid/3cd86ea5-6829-11e9-ace4-5065f366e21a  FAULTED      3   144     0  too many errors
    gptid/c7b0545f-67fd-11e9-ace4-5065f366e21a  FAULTED      3    60     0  too many errors
    gptid/4dea18b1-6f03-11e9-b9c9-5065f366e21a  FAULTED      0     4     0  too many errors
  raidz1-2                                      DEGRADED     0     0     0
    gptid/dd2cd7a2-803b-11e9-95f2-5065f366e21a  ONLINE       0     0     0
    gptid/26137960-7ccd-11e9-95f2-5065f366e21a  ONLINE       0     0     0
    gptid/a5814bbe-680f-11e9-ace4-5065f366e21a  ONLINE       0     0     0
    gptid/fba7b20e-6838-11e9-b8ac-5065f366e21a  FAULTED      3   119     0  too many errors
  raidz1-3                                      DEGRADED     0     0     0
    gptid/31f55316-6836-11e9-b8ac-5065f366e21a  ONLINE       0     0     0
    gptid/13124900-680c-11e9-ace4-5065f366e21a  ONLINE       0     0     0
    gptid/9a038ea8-6801-11e9-ace4-5065f366e21a  FAULTED      3    20     0  too many errors
    gptid/fe82980a-549e-11e9-bdc1-5065f366e21a  ONLINE       0     0     0
errors: 2413 data errors, use '-v' for a list
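To make the status output easier to read at a glance, here is a minimal Python sketch that counts FAULTED members per raidz vdev. The function name and the parsing rules are my own assumptions and not part of any ZFS tool; it only relies on the vdev and gptid lines as pasted in this post. A raidz1 vdev tolerates the loss of one member, so a vdev with more than one faulted disk is what takes the whole pool to UNAVAIL.

```python
def faulted_per_vdev(status_text):
    """Count FAULTED members under each raidz vdev in `zpool status` text.

    Hypothetical helper: assumes vdev lines start with "raidz" and member
    lines start with "gptid/", as in the output pasted in this post.
    """
    counts = {}
    current = None
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        name, state = fields[0], fields[1]
        if name.startswith("raidz"):
            current = name
            counts[current] = 0
        elif name.startswith("gptid/") and current is not None:
            if state == "FAULTED":
                counts[current] += 1
    return counts


# First two vdevs copied from the zpool status output in this post.
SAMPLE = """\
raidz1-0 DEGRADED 0 0 0
gptid/04d61697-53d7-11e9-bdc1-5065f366e21a ONLINE 0 0 0
gptid/c1f6ae17-7ccc-11e9-95f2-5065f366e21a ONLINE 0 0 0
gptid/c779e732-682c-11e9-ace4-5065f366e21a FAULTED 3 188 0 too many errors
gptid/9a22345f-683d-11e9-b8ac-5065f366e21a ONLINE 0 0 0
raidz1-1 UNAVAIL 0 44 0
gptid/8e505027-7ccd-11e9-95f2-5065f366e21a ONLINE 0 0 0
gptid/3cd86ea5-6829-11e9-ace4-5065f366e21a FAULTED 3 144 0 too many errors
gptid/c7b0545f-67fd-11e9-ace4-5065f366e21a FAULTED 3 60 0 too many errors
gptid/4dea18b1-6f03-11e9-b9c9-5065f366e21a FAULTED 0 4 0 too many errors
"""

for vdev, n in faulted_per_vdev(SAMPLE).items():
    note = "within raidz1 redundancy" if n <= 1 else "exceeds raidz1 redundancy"
    print(vdev, n, note)
```

In this pool raidz1-1 has three faulted members, while the other three vdevs have one each.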
However, SMART for the individual disks looks OK, e.g. for the 3 faulted drives in the same vdev:
=== START OF INFORMATION SECTION ===
Vendor: SmrtStor
Product: DOPA0920S5xnNMRI
Revision: 3P00
Compliance: SPC-4
User Capacity: 915,954,950,144 bytes [915 GB]
Logical block size: 512 bytes
Physical block size: 4096 bytes
LU is resource provisioned, LBPRZ=0
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Logical Unit id: 0x500117310020b3ac
Serial number: FG007T45
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Wed May 29 23:22:30 2019 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Percentage used endurance indicator: 0%
Current Drive Temperature: 29 C
Drive Trip Temperature: 70 C
Manufactured in week 27 of year 2013
Specified cycle count over device lifetime: 0
Accumulated start-stop cycles: 146
Elements in grown defect list: 0
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0       9900.620           0
write:         0        0         0         0          0      11396.555           0
Non-medium error count: 0
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Default           Completed                   -    3592                 - [-   -    -]
Long (extended) Self Test duration: 2880 seconds [48.0 minutes]
=== START OF INFORMATION SECTION ===
Vendor: SmrtStor
Product: DOPA0920S5xnNMRI
Revision: 3P00
Compliance: SPC-4
User Capacity: 915,954,950,144 bytes [915 GB]
Logical block size: 512 bytes
Physical block size: 4096 bytes
LU is resource provisioned, LBPRZ=0
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Logical Unit id: 0x500117310020b4f4
Serial number: FG007T6M
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Wed May 29 23:23:08 2019 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Percentage used endurance indicator: 0%
Current Drive Temperature: 30 C
Drive Trip Temperature: 70 C
Manufactured in week 27 of year 2013
Specified cycle count over device lifetime: 0
Accumulated start-stop cycles: 125
Elements in grown defect list: 0
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0       3410.039           0
write:         0        0         0         0          0       9640.794           0
Non-medium error count: 0
No self-tests have been logged
=== START OF INFORMATION SECTION ===
Vendor: SmrtStor
Product: DOPA0920S5xnNMRI
Revision: 3P00
Compliance: SPC-4
User Capacity: 915,954,950,144 bytes [915 GB]
Logical block size: 512 bytes
Physical block size: 4096 bytes
LU is resource provisioned, LBPRZ=0
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Logical Unit id: 0x500117310020b6b8
Serial number: FG007TA2
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Wed May 29 23:23:31 2019 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Percentage used endurance indicator: 0%
Current Drive Temperature: 30 C
Drive Trip Temperature: 70 C
Manufactured in week 27 of year 2013
Specified cycle count over device lifetime: 0
Accumulated start-stop cycles: 117
Elements in grown defect list: 0
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0       3437.090           0
write:         0        0         0         0          0       9462.567           0
Non-medium error count: 0
No self-tests have been logged
The FreeNAS message log is full of errors like this:
May 29 02:50:50 freenas (da1:mps0:0:33:0): WRITE(10). CDB: 2a 00 6a a1 92 70 00 00 08 00
May 29 02:50:50 freenas (da1:mps0:0:33:0): CAM status: SCSI Status Error
May 29 02:50:50 freenas (da1:mps0:0:33:0): SCSI status: Check Condition
May 29 02:50:50 freenas (da1:mps0:0:33:0): SCSI sense: MEDIUM ERROR asc:31,0 (Medium format corrupted)
May 29 02:50:50 freenas (da1:mps0:0:33:0): Field Replaceable Unit: 24
May 29 02:50:50 freenas (da1:mps0:0:33:0): Actual Retry Count: 0
May 29 02:50:50 freenas (da1:mps0:0:33:0): Retrying command (per sense data)
May 29 02:50:50 freenas (da1:mps0:0:33:0): WRITE(10). CDB: 2a 00 00 40 02 70 00 00 08 00
May 29 02:50:50 freenas (da1:mps0:0:33:0): CAM status: SCSI Status Error
May 29 02:50:50 freenas (da1:mps0:0:33:0): SCSI status: Check Condition
May 29 02:50:50 freenas (da1:mps0:0:33:0): SCSI sense: MEDIUM ERROR asc:31,0 (Medium format corrupted)
May 29 02:50:50 freenas (da1:mps0:0:33:0): Field Replaceable Unit: 24
May 29 02:50:50 freenas (da1:mps0:0:33:0): Actual Retry Count: 0
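For anyone reading the CDB bytes in these log lines, here is a minimal Python sketch that decodes them, assuming the standard 10-byte WRITE(10) layout (opcode 0x2A, a 4-byte big-endian LBA in bytes 2-5, a 2-byte big-endian transfer length in bytes 7-8). The function name is hypothetical, and the 512-byte block size is taken from the "Logical block size" reported in the SMART output.

```python
LOGICAL_BLOCK_SIZE = 512  # bytes, from the SMART "Logical block size" line


def decode_write10(cdb_hex):
    """Return (lba, blocks) for a WRITE(10) CDB given as hex byte pairs.

    Hypothetical helper: assumes the standard WRITE(10) field layout.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2..5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length
    return lba, blocks


# The two CDBs from the log excerpt in this post.
for cdb_hex in ("2a 00 6a a1 92 70 00 00 08 00",
                "2a 00 00 40 02 70 00 00 08 00"):
    lba, blocks = decode_write10(cdb_hex)
    print(f"LBA {lba:#x}, {blocks} blocks, {blocks * LOGICAL_BLOCK_SIZE} bytes")
```

Both failing commands are 8-block (4096-byte) writes, one at LBA 0x6aa19270 and one at LBA 0x400270, so the errors are not confined to a single region of the drive.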
I have tried to look up Field Replaceable Unit 24, but in vain.
Can anyone provide insight into this, including how to troubleshoot it and which components to check?
Regards,
Thomas