hardwarejunky009
Cadet
Joined: Dec 22, 2018
Messages: 4
Hello,
HW:
KALEA-INFORMATIQUE PCI Express card - 4x SATA 6Gb/s, with mini-SAS cable. PCIe 2.0. Marvell 88SE9215
and SilverStone SST-FS304B
5 drives in a RAIDZ2 array (all 6TB drives)
3xST6000VX001-2BD186
1xWDC WD6002FRYZ-01WD5B1
1xWDC WD60EFZX-68B3FN0
I check the status of my NAS weekly and found the array status degraded.
One Seagate ST6000VX001-2BD186 shows an increased ATA Error Count of 167:
Code:
root@freenas[~]# smartctl -a /dev/ada8
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Skyhawk
Device Model: ST6000VX001-2BD186
Serial Number: XXXXXXXXXXXXXX
LU WWN Device Id: 5 000c50 0dbaa881f
Firmware Version: CV12
User Capacity: 6,001,175,126,016 bytes [6.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5425 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Oct 3 17:09:22 2021 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 249) Self-test routine in progress...
90% of test remaining.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x73) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 724) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x70bd) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   076   064   006    Pre-fail  Always       -       35548372
  3 Spin_Up_Time            0x0003   091   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       107
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   070   060   045    Pre-fail  Always       -       9856939
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1779h+00m+00.000s
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       3
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       167
188 Command_Timeout         0x0032   098   098   000    Old_age   Always       -       4295032834
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   067   040    Old_age   Always       -       31 (Min/Max 23/33)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       2
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       530
194 Temperature_Celsius     0x0022   031   040   000    Old_age   Always       -       31 (0 23 0 0 0)
195 Hardware_ECC_Recovered  0x001a   076   064   000    Old_age   Always       -       35548372
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       1
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       746h+54m+56.288s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       11714087042
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       7192183930
SMART Error Log Version: 1
ATA Error Count: 167 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 167 occurred at disk power-on lifetime: 1759 hours (73 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 b0 ff ff ff 4f 00 23d+14:02:57.719 READ DMA EXT
61 00 00 ff ff ff 4f 00 23d+14:02:57.717 WRITE FPDMA QUEUED
b0 d5 01 09 4f c2 40 00 23d+14:02:57.689 SMART READ LOG
25 00 00 ff ff ff 4f 00 23d+14:02:54.022 READ DMA EXT
b0 d5 01 06 4f c2 40 00 23d+14:02:53.886 SMART READ LOG
Error 166 occurred at disk power-on lifetime: 1759 hours (73 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 ff ff ff 4f 00 23d+14:02:54.022 READ DMA EXT
b0 d5 01 06 4f c2 40 00 23d+14:02:53.886 SMART READ LOG
25 00 00 ff ff ff 4f 00 23d+14:02:50.223 READ DMA EXT
b0 d5 01 01 4f c2 40 00 23d+14:02:50.183 SMART READ LOG
25 00 00 ff ff ff 4f 00 23d+14:02:46.515 READ DMA EXT
Error 165 occurred at disk power-on lifetime: 1759 hours (73 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 ff ff ff 4f 00 23d+14:02:50.223 READ DMA EXT
b0 d5 01 01 4f c2 40 00 23d+14:02:50.183 SMART READ LOG
25 00 00 ff ff ff 4f 00 23d+14:02:46.515 READ DMA EXT
b0 d5 01 00 4f c2 40 00 23d+14:02:46.513 SMART READ LOG
25 00 00 ff ff ff 4f 00 23d+14:02:42.801 READ DMA EXT
Error 164 occurred at disk power-on lifetime: 1759 hours (73 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 ff ff ff 4f 00 23d+14:02:46.515 READ DMA EXT
b0 d5 01 00 4f c2 40 00 23d+14:02:46.513 SMART READ LOG
25 00 00 ff ff ff 4f 00 23d+14:02:42.801 READ DMA EXT
b0 da 00 00 4f c2 40 00 23d+14:02:42.494 SMART RETURN STATUS
25 00 00 ff ff ff 4f 00 23d+14:02:38.202 READ DMA EXT
Error 163 occurred at disk power-on lifetime: 1759 hours (73 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 ff ff ff 4f 00 23d+14:02:42.801 READ DMA EXT
b0 da 00 00 4f c2 40 00 23d+14:02:42.494 SMART RETURN STATUS
25 00 00 ff ff ff 4f 00 23d+14:02:38.202 READ DMA EXT
b0 d1 01 01 4f c2 40 00 23d+14:02:38.188 SMART READ ATTRIBUTE THRESHOLDS [OBS-4]
2f 00 01 10 00 00 00 00 23d+14:02:38.188 READ LOG EXT
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Self-test routine in progress 90% 1779 -
# 2 Short offline Completed without error 00% 1762 -
# 3 Short offline Completed without error 00% 1594 -
# 4 Short offline Completed without error 00% 1426 -
# 5 Short offline Completed without error 00% 1258 -
# 6 Short offline Completed without error 00% 1098 -
# 7 Short offline Completed without error 00% 922 -
# 8 Short offline Completed without error 00% 754 -
# 9 Short offline Completed without error 00% 586 -
#10 Short offline Completed without error 00% 418 -
#11 Short offline Completed without error 00% 250 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
root@freenas[~]# smartctl -l selftest /dev/ada8
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Self-test routine in progress 90% 1779 -
# 2 Short offline Completed without error 00% 1762 -
# 3 Short offline Completed without error 00% 1594 -
# 4 Short offline Completed without error 00% 1426 -
# 5 Short offline Completed without error 00% 1258 -
# 6 Short offline Completed without error 00% 1098 -
# 7 Short offline Completed without error 00% 922 -
# 8 Short offline Completed without error 00% 754 -
# 9 Short offline Completed without error 00% 586 -
#10 Short offline Completed without error 00% 418 -
#11 Short offline Completed without error 00% 250 -
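Two of the numbers in the output above look odd at first sight, so here is a rough sketch of how they might be read. It assumes, as is commonly described but not confirmed by anything in this post, that the raw value of 188 Command_Timeout packs three 16-bit counters into one 48-bit number. The function name `decode_raw48` is made up for this example. The last line only shows that the LBA in the error log, 0x0fffffff, is the largest value that fits in 28 bits, which may mean it is a placeholder rather than the real failing sector.

```shell
#!/bin/sh
# Split a 48-bit SMART raw value into three 16-bit words (high, middle, low).
# ASSUMPTION: this packed layout is a common interpretation for some
# attributes on these drives; it is not stated anywhere in the post.
decode_raw48() {
    high=$(( ($1 >> 32) & 0xFFFF ))
    mid=$(( ($1 >> 16) & 0xFFFF ))
    low=$(( $1 & 0xFFFF ))
    echo "$high $mid $low"
}

# Raw value of attribute 188 Command_Timeout, copied from the output above.
decode_raw48 4295032834

# Largest LBA that fits in 28 bits, to compare with the LBA in the error log.
echo $(( (1 << 28) - 1 ))
```

Under that assumption, the large Command_Timeout number would break down into three small counters instead of billions of timeouts.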
I then started an extended offline self-test, but I am not sure whether the drive is failing or the SAS cable is faulty. I have never had any problems with SAS cables, so I'm not sure how to approach this kind of error. Should I order a new SAS cable and see whether the ATA error count stays stable, or should I run more SMART tests on the drive?
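To check whether the count stays stable, one option might be to note two counters now and compare them again later, for example after a cable swap. The sketch below assumes that 199 UDMA_CRC_Error_Count mostly reflects link or cable problems and that 187 Reported_Uncorrect reflects read errors reported by the drive itself. That is the usual reading, not something this post proves. The helper names `smart_raw` and `report_change` are hypothetical, and the sample lines are copied from the output above.

```shell
#!/bin/sh
# Print the first RAW_VALUE field for one attribute ID, reading
# `smartctl -A` style output from standard input.
smart_raw() {
    awk -v id="$1" '$1 == id { print $10 }'
}

# Report whether a counter grew between an old and a new reading.
report_change() {
    if [ "$2" -gt "$1" ]; then
        echo "increased by $(( $2 - $1 ))"
    else
        echo "stable"
    fi
}

# Two attribute lines from the output in this post, used as sample input.
sample='187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 167
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 1'

printf '%s\n' "$sample" | smart_raw 187
printf '%s\n' "$sample" | smart_raw 199

# On the live system the input would come from the drive instead, e.g.:
#   smartctl -A /dev/ada8 | smart_raw 187
```

A later reading could then be compared with `report_change 167 "$new_value"`, where `new_value` is whatever `smart_raw 187` returns at that time.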
best regards
hardwarejunky