SOLVED: What info do I need to RMA a hard drive?

Grantp

Contributor
Joined
Feb 26, 2013
Messages
111
Hi, I have a drive that is showing errors. What info do I need to take it back to the shop and get an exchange? This is what I have so far.

Sunday 3:00am I got this E-Mail

Checking status of zfs pools:
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
freenas-boot 55.5G 1.19G 54.3G - - 2% 1.00x ONLINE -
tank 27.2T 7.88T 19.4T - 6% 28% 1.00x ONLINE /mnt
vol_1 21.8T 2.95T 18.8T - 5% 13% 1.00x DEGRADED /mnt
pool: vol_1
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
repaired.
scan: scrub repaired 0 in 1h57m with 0 errors on Fri Apr 1 05:57:02 2016
config:
NAME STATE READ WRITE CKSUM
vol_1 DEGRADED 0 0 0
raidz2-0 DEGRADED 0 0 0
gptid/79f2d25c-e578-11e5-8e14-00074305cc80 ONLINE 0 0 0
gptid/7cffbe32-e578-11e5-8e14-00074305cc80 ONLINE 0 0 0
gptid/800f7676-e578-11e5-8e14-00074305cc80 FAULTED 6 149 0 too many errors
gptid/831e891c-e578-11e5-8e14-00074305cc80 ONLINE 0 0 0
gptid/863c075e-e578-11e5-8e14-00074305cc80 ONLINE 0 0 0
gptid/89555826-e578-11e5-8e14-00074305cc80 ONLINE 0 0 0
errors: No known data errors

Sunday 8:33 I got this E-Mail

The volume vol_1 (ZFS) state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.

Sunday 8:35 I got this E-Mail

The volume vol_1 (ZFS) state is DEGRADED: One or more devices are faulted in response to persistent errors. Sufficient replicas exist for the pool to continue functioning in a degraded state.

Sunday 9:03 I got this E-Mail

Device: /dev/da11 [SAT], ATA error count increased from 5 to 10 The volume vol_1 (ZFS) state is DEGRADED: One or more devices are faulted in response to persistent errors. Sufficient replicas exist for the pool to continue functioning in a degraded state.

Monday 3:00 I got this E-Mail (I had done a Re-Boot Sunday Evening)

freenas.local changes in mounted filesystems:
8c8
< freenas-boot/ROOT/FreeNAS-9.3-STABLE-201602031011 / zfs rw,noatime,nfsv4acls 0 0
---
> freenas-boot/ROOT/FreeNAS-9.3-STABLE-201604041648 / zfs rw,noatime,nfsv4acls 0 0
freenas.local kernel log messages:
> (da11:mps1:0:13:0): WRITE(10). CDB: 2a 00 4a ea cb 90 00 00 18 00 length 12288 SMID 931 terminated ioc 804b scsi 0 state c xfer 0
> (da11:mps1:0:13:0): WRITE(10). CDB: 2a 00 4a ea cb 90 00 00 18 00
> (da11:mps1:0:13:0): WRITE(16). CDB: 8a 00 00 00 00 01 d1 c0 bc a0 00 00 00 e0 00 00 length 114688 SMID 520 terminated ioc 804b scsi 0 state c xfer 0
> (da11:mps1:0:13:0): WRITE(16). CDB: 8a 00 00 00 00 01 d1 c0 bc a0 00 00 00 e0 00 00
> (da11:mps1:0:13:0): WRITE(10). CDB: 2a 00 4a ea cc 98 00 00 50 00 length 40960 SMID 562 terminated ioc 804b scsi 0 state c xfer 0
> (da11:mps1:0:13:0): WRITE(10). CDB: 2a 00 4a ea cc 98 00 00 50 00
> (da11:mps1:0:13:0): WRITE(16). CDB: 8a 00 00 00 00 01 d1 c0 bc a0 00 00 00 e0 00 00 length 114688 SMID 979 terminated ioc 804b scsi 0 state c xfer 0
> (da11:mps1:0:13:0): WRITE(16). CDB: 8a 00 00 00 00 01 d1 c0 bc a0 00 00 00 e0 00 00
> (da11:mps1:0:13:0): WRITE(10). CDB: 2a 00 50 88 1f 18 00 00 08 00
> (da11:mps1:0:13:0): CAM status: SCSI Status Error
> (da11:mps1:0:13:0): SCSI status: Check Condition
> (da11:mps1:0:13:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da11:mps1:0:13:0): Retrying command (per sense data)
> ipmi0: <IPMI System Interface> port 0xca2,0xca3 on acpi0
> ipmi0: KCS mode found at io 0xca2 on acpi
> ipmi0: IPMI device rev. 1, firmware rev. 3.38, version 2.0
> ipmi0: Number of channels 2
> ipmi0: Attached watchdog
> (da11:mps1:0:13:0): WRITE(10). CDB: 2a 00 98 40 24 a8 00 00 28 00 length 20480 SMID 339 terminated ioc 804b scsi 0 state c xfer 0
> (da11:mps1:0:13:0): WRITE(10). CDB: 2a 00 98 40 24 a8 00 00 28 00
> (da11:mps1:0:13:0): CAM status: SCSI Status Error
> (da11:mps1:0:13:0): SCSI status: Check Condition
> (da11:mps1:0:13:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da11:mps1:0:13:0): Retrying command (per sense data)
> GEOM_ELI: Device da8p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 256
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da0p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 256
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da1p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 256
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da7p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 256
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da3p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 256
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da4p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 256
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da5p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 256
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da6p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 256
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device ada1p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 256
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da2p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 256
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da9p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 256
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da10p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 256
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da11p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 256
> GEOM_ELI: Crypto: hardware
> (da11:mps1:0:13:0): READ(10). CDB: 28 00 00 40 00 78 00 00 08 00
> (da11:mps1:0:13:0): CAM status: SCSI Status Error
> (da11:mps1:0:13:0): SCSI status: Check Condition
> (da11:mps1:0:13:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da11:mps1:0:13:0): Retrying command (per sense data)
> GEOM_ELI: Device da12p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 256
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da13p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 256
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da14p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 256
> GEOM_ELI: Crypto: hardware
> vboxdrv: fAsync=0 offMin=0x312 offMax=0xfe7
> bridge0: Ethernet address: 02:3e:c4:50:92:00
> bridge0: link state changed to UP
> em0: promiscuous mode enabled
> epair0a: Ethernet address: 02:02:29:00:09:0a
> epair0b: Ethernet address: 02:02:29:00:0a:0b
> epair0a: link state changed to UP
> epair0b: link state changed to UP
> em0: link state changed to DOWN
> epair0a: promiscuous mode enabled
> ng_ether_ifnet_arrival_event: can't re-name node epair0b
> em0: link state changed to UP
> epair1a: Ethernet address: 02:f9:85:00:0a:0a
> epair1b: Ethernet address: 02:f9:85:00:0b:0b
> epair1a: link state changed to UP
> epair1b: link state changed to UP
> epair1a: promiscuous mode enabled
> ng_ether_ifnet_arrival_event: can't re-name node epair1b
> (da11:mps1:0:13:0): WRITE(10). CDB: 2a 00 50 98 35 18 00 00 08 00 length 4096 SMID 360 terminated ioc 804b scsi 0 state c xfer 0
> (da11:mps1:0:13:0): WRITE(10). CDB: 2a 00 50 98 35 18 00 00 08 00
> (da11:mps1:0:13:0): CAM status: SCSI Status Error
> (da11:mps1:0:13:0): SCSI status: Check Condition
> (da11:mps1:0:13:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da11:mps1:0:13:0): Retrying command (per sense data)
> (da11:mps1:0:13:0): WRITE(10). CDB: 2a 00 50 98 35 18 00 00 08 00 length 4096 SMID 352 terminated ioc 804b scsi 0 state c xfer 0
> (da11:mps1:0:13:0): WRITE(10). CDB: 2a 00 50 98 35 18 00 00 08 00
> (da11:mps1:0:13:0): CAM status: SCSI Status Error
> (da11:mps1:0:13:0): SCSI status: Check Condition
> (da11:mps1:0:13:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da11:mps1:0:13:0): Retrying command (per sense data)
> (da11:mps1:0:13:0): WRITE(10). CDB: 2a 00 50 98 35 18 00 00 08 00 length 4096 SMID 348 terminated ioc 804b scsi 0 state c xfer 0
> (da11:mps1:0:13:0): WRITE(10). CDB: 2a 00 50 98 35 18 00 00 08 00
> (da11:mps1:0:13:0): CAM status: SCSI Status Error
> (da11:mps1:0:13:0): SCSI status: Check Condition
> (da11:mps1:0:13:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da11:mps1:0:13:0): Retrying command (per sense data)
> (da11:mps1:0:13:0): WRITE(10). CDB: 2a 00 50 98 35 18 00 00 08 00 length 4096 SMID 359 terminated ioc 804b scsi 0 state c xfer 0
> (da11:mps1:0:13:0): WRITE(10). CDB: 2a 00 50 98 35 18 00 00 08 00
> (da11:mps1:0:13:0): CAM status: SCSI Status Error
> (da11:mps1:0:13:0): SCSI status: Check Condition
> (da11:mps1:0:13:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da11:mps1:0:13:0): Retrying command (per sense data)
> (da11:mps1:0:13:0): WRITE(10). CDB: 2a 00 50 98 35 18 00 00 08 00 length 4096 SMID 368 terminated ioc 804b scsi 0 state c xfer 0
> (da11:mps1:0:13:0): WRITE(10). CDB: 2a 00 50 98 35 18 00 00 08 00
> (da11:mps1:0:13:0): CAM status: SCSI Status Error
> (da11:mps1:0:13:0): SCSI status: Check Condition
> (da11:mps1:0:13:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da11:mps1:0:13:0): Error 6, Retries exhausted
> (da11:mps1:0:13:0): Invalidating pack
-- End of security output --
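As a reading aid for the kernel messages above, here is a minimal Python sketch that decodes the WRITE(10) CDB bytes into a starting LBA and a block count. The function name `decode_rw10` is made up for illustration, and the 512-byte sector size is an assumption taken from the "512 bytes logical" line in the smartctl output later in this post.

```python
# Assumption: 512-byte logical sectors, as smartctl reports for this drive.
SECTOR_BYTES = 512


def decode_rw10(cdb_hex):
    """Decode a READ(10)/WRITE(10) CDB given as space-separated hex bytes.

    Returns (lba, block_count). Layout: byte 0 is the opcode,
    bytes 2-5 are the big-endian LBA, bytes 7-8 the big-endian
    transfer length in blocks.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] not in (0x28, 0x2A):
        raise ValueError("not a READ(10)/WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")
    blocks = int.from_bytes(cdb[7:9], "big")
    return lba, blocks


# First WRITE(10) from the kernel log above.
lba, blocks = decode_rw10("2a 00 4a ea cb 90 00 00 18 00")
print(hex(lba), blocks, blocks * SECTOR_BYTES)
```

For the first logged command this gives 24 blocks, and 24 x 512 = 12288, which agrees with the "length 12288" the driver printed. The repeatedly retried write (CDB `2a 00 50 98 35 18 00 00 08 00`) decodes to 8 blocks starting at an LBA whose low 24 bits are 0x983518, the same `18 35 98` bytes that appear in the SMART error log registers.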

Monday 3:00 I got this E-Mail

Checking status of zfs pools:
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
freenas-boot 55.5G 1.23G 54.3G - - 2% 1.00x ONLINE -
tank 27.2T 7.75T 19.5T - 6% 28% 1.00x ONLINE /mnt
vol_1 21.8T 2.95T 18.8T - 5% 13% 1.00x DEGRADED /mnt
pool: vol_1
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
repaired.
scan: scrub repaired 0 in 1h57m with 0 errors on Fri Apr 1 05:57:02 2016
config:
NAME STATE READ WRITE CKSUM
vol_1 DEGRADED 0 0 0
raidz2-0 DEGRADED 0 0 0
gptid/79f2d25c-e578-11e5-8e14-00074305cc80 ONLINE 0 0 0
gptid/7cffbe32-e578-11e5-8e14-00074305cc80 ONLINE 0 0 0
gptid/800f7676-e578-11e5-8e14-00074305cc80 FAULTED 6 1 29 too many errors
gptid/831e891c-e578-11e5-8e14-00074305cc80 ONLINE 0 0 0
gptid/863c075e-e578-11e5-8e14-00074305cc80 ONLINE 0 0 0
gptid/89555826-e578-11e5-8e14-00074305cc80 ONLINE 0 0 0
errors: No known data errors
-- End of daily output --
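To pull the faulted member out of a status e-mail like the ones above without reading every row, something along these lines may help. It is only a sketch: `unhealthy_devices` is a hypothetical helper, the sample is a subset of the rows quoted above, and it assumes the error counters are plain integers (some ZFS versions abbreviate large counts, which this pattern would skip).

```python
import re

# A few config rows copied from the first status e-mail above.
SAMPLE = """\
NAME STATE READ WRITE CKSUM
vol_1 DEGRADED 0 0 0
raidz2-0 DEGRADED 0 0 0
gptid/7cffbe32-e578-11e5-8e14-00074305cc80 ONLINE 0 0 0
gptid/800f7676-e578-11e5-8e14-00074305cc80 FAULTED 6 149 0 too many errors
gptid/831e891c-e578-11e5-8e14-00074305cc80 ONLINE 0 0 0
"""

# name, state, three integer counters, optional trailing note
ROW = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)(?:\s+(.*))?$")


def unhealthy_devices(text):
    """Return the gptid/... rows whose state is anything other than ONLINE."""
    found = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header or unrelated line
        name, state, read, write, cksum, note = m.groups()
        if name.startswith("gptid/") and state != "ONLINE":
            found.append({
                "name": name,
                "state": state,
                "read": int(read),
                "write": int(write),
                "cksum": int(cksum),
                "note": note or "",
            })
    return found


for dev in unhealthy_devices(SAMPLE):
    print(dev["name"], dev["state"], dev["read"], dev["write"], dev["cksum"], dev["note"])
```

The pool and vdev rows (vol_1, raidz2-0) are skipped on purpose so that only the individual disk is reported.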


Output from smartctl Monday 11:45

[root@freenas] ~/scripts# smartctl -a /dev/da11
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p31 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: HGST Deskstar NAS
Device Model: HGST HDN724040ALE640
Serial Number: PK1334PCK3VBES
LU WWN Device Id: 5 000cca 24cec069c
Firmware Version: MJAOA5E0
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Apr 11 11:10:40 2016 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 24) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 574) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
2 Throughput_Performance 0x0005 137 137 054 Pre-fail Offline - 78
3 Spin_Up_Time 0x0007 126 126 024 Pre-fail Always - 613 (Average 609)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 19
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 119 119 020 Pre-fail Offline - 35
9 Power_On_Hours 0x0012 100 100 000 Old_age Always - 1653
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 19
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 19
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 19
194 Temperature_Celsius 0x0002 130 130 000 Old_age Always - 46 (Min/Max 18/57)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 10

SMART Error Log Version: 1
ATA Error Count: 10 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 10 occurred at disk power-on lifetime: 1627 hours (67 days + 19 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 1f 35 98 00 Error: ICRC, ABRT at LBA = 0x0098351f = 9975071

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 08 00 18 35 98 40 00 4d+11:15:59.743 WRITE FPDMA QUEUED
ef 10 02 00 00 00 00 00 4d+11:15:59.560 SET FEATURES [Enable SATA feature]
ef 02 00 00 00 00 00 00 4d+11:15:59.560 SET FEATURES [Enable write cache]
ef aa 00 00 00 00 00 00 4d+11:15:59.560 SET FEATURES [Enable read look-ahead]
ef 03 46 00 00 00 00 00 4d+11:15:59.559 SET FEATURES [Set transfer mode]

Error 9 occurred at disk power-on lifetime: 1627 hours (67 days + 19 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 1f 35 98 00 Error: ICRC, ABRT at LBA = 0x0098351f = 9975071

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 08 00 18 35 98 40 00 4d+11:15:59.487 WRITE FPDMA QUEUED
ef 10 02 00 00 00 00 00 4d+11:15:59.166 SET FEATURES [Enable SATA feature]
ef 02 00 00 00 00 00 00 4d+11:15:59.166 SET FEATURES [Enable write cache]
ef aa 00 00 00 00 00 00 4d+11:15:59.165 SET FEATURES [Enable read look-ahead]
ef 03 46 00 00 00 00 00 4d+11:15:59.165 SET FEATURES [Set transfer mode]

Error 8 occurred at disk power-on lifetime: 1627 hours (67 days + 19 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 1f 35 98 00 Error: ICRC, ABRT at LBA = 0x0098351f = 9975071

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 08 00 18 35 98 40 00 4d+11:15:58.747 WRITE FPDMA QUEUED
ef 10 02 00 00 00 00 00 4d+11:15:58.564 SET FEATURES [Enable SATA feature]
ef 02 00 00 00 00 00 00 4d+11:15:58.563 SET FEATURES [Enable write cache]
ef aa 00 00 00 00 00 00 4d+11:15:58.563 SET FEATURES [Enable read look-ahead]
ef 03 46 00 00 00 00 00 4d+11:15:58.563 SET FEATURES [Set transfer mode]

Error 7 occurred at disk power-on lifetime: 1627 hours (67 days + 19 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 1f 35 98 00 Error: ICRC, ABRT at LBA = 0x0098351f = 9975071

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 08 00 18 35 98 40 00 4d+11:15:58.491 WRITE FPDMA QUEUED
ef 10 02 00 00 00 00 00 4d+11:15:58.251 SET FEATURES [Enable SATA feature]
ef 02 00 00 00 00 00 00 4d+11:15:58.251 SET FEATURES [Enable write cache]
ef aa 00 00 00 00 00 00 4d+11:15:58.250 SET FEATURES [Enable read look-ahead]
ef 03 46 00 00 00 00 00 4d+11:15:58.250 SET FEATURES [Set transfer mode]

Error 6 occurred at disk power-on lifetime: 1627 hours (67 days + 19 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 1f 35 98 00 Error: ICRC, ABRT at LBA = 0x0098351f = 9975071

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 08 00 18 35 98 40 00 4d+11:15:58.184 WRITE FPDMA QUEUED
60 08 00 18 35 98 40 00 4d+11:15:58.156 READ FPDMA QUEUED
60 08 00 28 fc 79 40 00 4d+11:15:58.138 READ FPDMA QUEUED
60 08 00 28 e7 79 40 00 4d+11:15:58.134 READ FPDMA QUEUED
60 08 00 00 02 7a 40 00 4d+11:15:58.110 READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 1552 -
# 2 Extended offline Completed without error 00% 1439 -
# 3 Short offline Completed without error 00% 1290 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

[root@freenas] ~/scripts#

Sorry for the long post I wanted to put as much info as I had.

Any help/pointers would be appreciated
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
It would be very helpful, particularly for things like the SMART output, to have it in code tags--that just makes the post easier to read. But here's what I'm seeing:
  • Your drive is too hot--it's at 46C, and it's seen 57C. For best life, drives should be kept under 40C.
  • You don't seem to have regular SMART self-tests running.
If this drive is too hot, there's a good chance others are as well, so check your cooling to get those temperatures down. That's not what's causing your problem, though. Try running a long SMART self-test (smartctl -t long /dev/da11), and see what the SMART output looks like once that finishes.
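The temperature check described above can be sketched in Python against the pasted attribute table. This is an illustration only: `parse_attributes` and `temperature_report` are made-up names, the 40C limit is the rule of thumb from this post rather than a vendor figure, and the sample rows are copied from the smartctl output earlier in the thread.

```python
import re

# Three rows copied from the SMART attribute table posted above.
SAMPLE = """\
194 Temperature_Celsius 0x0002 130 130 000 Old_age Always - 46 (Min/Max 18/57)
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 10
"""


def parse_attributes(text):
    """Map attribute name -> raw value string for each table row."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split(None, 9)  # the 10th field keeps the whole raw value
        if len(fields) == 10 and fields[0].isdigit():
            attrs[fields[1]] = fields[9].strip()
    return attrs


def temperature_report(raw, limit_c=40):
    """Return (current, peak, too_hot) from a raw value like '46 (Min/Max 18/57)'."""
    m = re.match(r"(\d+)(?: \(Min/Max (\d+)/(\d+)\))?", raw)
    if not m:
        raise ValueError("unrecognised temperature value: %r" % raw)
    current = int(m.group(1))
    peak = int(m.group(3)) if m.group(3) else current
    return current, peak, current > limit_c


attrs = parse_attributes(SAMPLE)
print(temperature_report(attrs["Temperature_Celsius"]))
print("CRC errors:", int(attrs["UDMA_CRC_Error_Count"]))
```

Running the same check over every disk in the box would show whether the other drives are running as warm as this one.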
 

Grantp

Hi danb35,

Thanks for the reply. Sorry about the way I posted above; I'd forgotten about using code tags and will use them in future. I thought I was running regular SMART self-tests. Here is how I have them set up:

Smart Test.png

Is this too infrequent?

I have started the long test you suggested (smartctl -t long /dev/da11). It says it will take just under 10 hours to run. Silly question, but where does the output go once it's completed (how do I view it)?

With the output I have so far, I assume I need to replace the HDD. As it's only 2 months old, I assume I can get it exchanged.

Thanks for help
 

danb35

Did you just set up that test schedule? Because your drive shows it's only seen three tests total. The schedule is probably frequent enough, though the short schedule looks pretty irregular.
silly question but where does the output go once it's completed (how do I view it).
Run smartctl -a /dev/da11 again and it will show in the SMART self-test log.
With the output I have so far I assume I need to replace the HDD as it's only 2 months old I assume I can get it exchanged.
Not sure. It's showing a number of errors in the error log, but it isn't showing any real problems in the SMART attributes. The error log can indicate cabling problems, but all the errors seem to be at the same block, which wouldn't seem consistent with that theory.
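The "same block" observation can be checked mechanically by tallying the error-log entries per LBA. The sketch below is an assumption-laden illustration: `errors_by_lba` is a hypothetical helper and the sample holds two of the five logged entries from the output posted earlier.

```python
import re
from collections import Counter

# Two of the five entries from the SMART error log posted above.
SAMPLE = """\
Error 10 occurred at disk power-on lifetime: 1627 hours (67 days + 19 hours)
84 51 01 1f 35 98 00 Error: ICRC, ABRT at LBA = 0x0098351f = 9975071
Error 9 occurred at disk power-on lifetime: 1627 hours (67 days + 19 hours)
84 51 01 1f 35 98 00 Error: ICRC, ABRT at LBA = 0x0098351f = 9975071
"""

ERR = re.compile(r"Error: ([A-Z, ]+?) at LBA = 0x[0-9a-f]+ = (\d+)")


def errors_by_lba(text):
    """Count error-log entries per (LBA, error flags) pair."""
    counts = Counter()
    for m in ERR.finditer(text):
        flags = tuple(flag.strip() for flag in m.group(1).split(","))
        counts[(int(m.group(2)), flags)] += 1
    return counts


for (lba, flags), n in errors_by_lba(SAMPLE).items():
    print(lba, "/".join(flags), n)
```

In the log as posted, all five recorded entries carry the ICRC (interface CRC) flag at LBA 9975071 and fall within roughly two seconds of powered-up time, so one possible reading is a single write being retried rather than five separate faults.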
 

Grantp

Did you just set up that test schedule? Because your drive shows it's only seen three tests total. The schedule is probably frequent enough, though the short schedule looks pretty irregular.

No, I have had it set like that since I built this box just over 2 months ago. I set it up so that a SMART test (long or short) runs every 5 days.

Run smartctl -a /dev/da11 again and it will show in the SMART self-test log.

OK, thanks. Another daft question (it shows my total lack of experience): how do I know when the test has actually completed?

Not sure. It's showing a number of errors in the error log, but it isn't showing any real problems in the SMART attributes. The error log can indicate cabling problems, but all the errors seem to be at the same block, which wouldn't seem consistent with that theory.

I will try changing the SFF-8087 cable to see if that helps
 

danb35

how do I know when test has actually completed.
You guess. Unfortunately there really isn't a better answer. So it says it will take 10 hours (which is an estimate), so check back in 10 hours. If it's complete, it will show in the self-test log; if not, wait a while and try again.
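The "just under 10 hours" figure corresponds to the 574-minute extended self-test polling time in the smartctl output earlier in the thread, so the guess can at least be turned into a clock time. A minimal sketch follows; the helper names are invented here and the start time is a made-up example, not something stated in the thread.

```python
import re
from datetime import datetime, timedelta

# Two lines copied from the smartctl output posted earlier in the thread.
SAMPLE = """\
Extended self-test routine
recommended polling time: ( 574) minutes.
"""


def extended_test_minutes(text):
    """Return the drive's own estimate for the extended test, in minutes."""
    m = re.search(
        r"Extended self-test routine\s+recommended polling time:\s*\(\s*(\d+)\) minutes",
        text,
    )
    return int(m.group(1)) if m else None


def estimated_finish(started, minutes):
    """Add the estimate to the time the test was started."""
    return started + timedelta(minutes=minutes)


minutes = extended_test_minutes(SAMPLE)
# Hypothetical start time, chosen only for illustration.
started = datetime(2016, 4, 11, 11, 45)
print(minutes, "minutes; check back around", estimated_finish(started, minutes))
```

The result is only as good as the drive's estimate, so the self-test log is still the place to confirm that the test actually finished.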
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
what info do I need to take it back to shop and get an exchange this is what I have so far.
Depends on where you are taking it back. I know WD and Seagate have accepted when I told them it failed a SMART test, and gave them the error code. I've never tried to return one to a local shop, but I imagine they should accept the same info.
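Whatever the shop or manufacturer asks for, the identifying details are already in the smartctl information section. As a rough sketch (the choice of fields is an assumption about what a returns form might want, and `rma_details` is a hypothetical helper), they can be pulled out like this:

```python
# Information section lines copied from the smartctl output earlier in the thread.
SAMPLE = """\
Model Family: HGST Deskstar NAS
Device Model: HGST HDN724040ALE640
Serial Number: PK1334PCK3VBES
LU WWN Device Id: 5 000cca 24cec069c
Firmware Version: MJAOA5E0
"""

# Assumption: these are the fields a returns form is likely to ask for.
WANTED = ("Device Model", "Serial Number", "Firmware Version")


def rma_details(text):
    """Collect the wanted 'Key: value' pairs from smartctl's information section."""
    details = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in WANTED:
            details[key.strip()] = value.strip()
    return details


for key, value in rma_details(SAMPLE).items():
    print(key + ":", value)
```

Keeping a saved copy of the full smartctl output and the ZFS status e-mails alongside these details gives the retailer the error evidence as well.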
 

Grantp

You guess. Unfortunately there really isn't a better answer. So it says it will take 10 hours (which is an estimate), so check back in 10 hours. If it's complete, it will show in the self-test log; if not, wait a while and try again.

OK thanks.

Depends on where you are taking it back. I know WD and Seagate have accepted when I told them it failed a SMART test, and gave them the error code. I've never tried to return one to a local shop, but I imagine they should accept the same info.

I bought it from SCAN Computers, a major UK retailer. They are only 20 minutes away, so I'll call them and see what they say.
 

Grantp

It would be very helpful, particularly for things like the SMART output, to have it in code tags--that just makes the post easier to read. But here's what I'm seeing:
  • Your drive is too hot--it's at 46C, and it's seen 57C. For best life, drives should be kept under 40C.
  • You don't seem to have regular SMART self-tests running.
If this drive is too hot, there's a good chance others are as well, so check your cooling to get those temperatures down. That's not what's causing your problem, though. Try running a long SMART self-test (smartctl -t long /dev/da11), and see what the SMART output looks like once that finishes.

This is the output from the long test. Do I need to replace my HDD?

Code:
[root@freenas] ~# smartctl -a /dev/da11
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p31 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     HGST Deskstar NAS
Device Model:     HGST HDN724040ALE640
Serial Number:    PK1334PCK3VBES
LU WWN Device Id: 5 000cca 24cec069c
Firmware Version: MJAOA5E0
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Apr 11 21:22:00 2016 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (   24) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 574) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   137   137   054    Pre-fail  Offline      -       78
  3 Spin_Up_Time            0x0007   126   126   024    Pre-fail  Always       -       613 (Average 609)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       19
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   119   119   020    Pre-fail  Offline      -       35
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       1663
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       19
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       19
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       19
194 Temperature_Celsius     0x0002   125   125   000    Old_age   Always       -       48 (Min/Max 18/57)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       10

SMART Error Log Version: 1
ATA Error Count: 10 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 10 occurred at disk power-on lifetime: 1627 hours (67 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 01 1f 35 98 00  Error: ICRC, ABRT at LBA = 0x0098351f = 9975071

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 08 00 18 35 98 40 00   4d+11:15:59.743  WRITE FPDMA QUEUED
  ef 10 02 00 00 00 00 00   4d+11:15:59.560  SET FEATURES [Enable SATA feature]
  ef 02 00 00 00 00 00 00   4d+11:15:59.560  SET FEATURES [Enable write cache]
  ef aa 00 00 00 00 00 00   4d+11:15:59.560  SET FEATURES [Enable read look-ahead]
  ef 03 46 00 00 00 00 00   4d+11:15:59.559  SET FEATURES [Set transfer mode]

Error 9 occurred at disk power-on lifetime: 1627 hours (67 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 01 1f 35 98 00  Error: ICRC, ABRT at LBA = 0x0098351f = 9975071

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 08 00 18 35 98 40 00   4d+11:15:59.487  WRITE FPDMA QUEUED
  ef 10 02 00 00 00 00 00   4d+11:15:59.166  SET FEATURES [Enable SATA feature]
  ef 02 00 00 00 00 00 00   4d+11:15:59.166  SET FEATURES [Enable write cache]
  ef aa 00 00 00 00 00 00   4d+11:15:59.165  SET FEATURES [Enable read look-ahead]
  ef 03 46 00 00 00 00 00   4d+11:15:59.165  SET FEATURES [Set transfer mode]

Error 8 occurred at disk power-on lifetime: 1627 hours (67 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 01 1f 35 98 00  Error: ICRC, ABRT at LBA = 0x0098351f = 9975071

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 08 00 18 35 98 40 00   4d+11:15:58.747  WRITE FPDMA QUEUED
  ef 10 02 00 00 00 00 00   4d+11:15:58.564  SET FEATURES [Enable SATA feature]
  ef 02 00 00 00 00 00 00   4d+11:15:58.563  SET FEATURES [Enable write cache]
  ef aa 00 00 00 00 00 00   4d+11:15:58.563  SET FEATURES [Enable read look-ahead]
  ef 03 46 00 00 00 00 00   4d+11:15:58.563  SET FEATURES [Set transfer mode]

Error 7 occurred at disk power-on lifetime: 1627 hours (67 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 01 1f 35 98 00  Error: ICRC, ABRT at LBA = 0x0098351f = 9975071

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 08 00 18 35 98 40 00   4d+11:15:58.491  WRITE FPDMA QUEUED
  ef 10 02 00 00 00 00 00   4d+11:15:58.251  SET FEATURES [Enable SATA feature]
  ef 02 00 00 00 00 00 00   4d+11:15:58.251  SET FEATURES [Enable write cache]
  ef aa 00 00 00 00 00 00   4d+11:15:58.250  SET FEATURES [Enable read look-ahead]
  ef 03 46 00 00 00 00 00   4d+11:15:58.250  SET FEATURES [Set transfer mode]

Error 6 occurred at disk power-on lifetime: 1627 hours (67 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 01 1f 35 98 00  Error: ICRC, ABRT at LBA = 0x0098351f = 9975071

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 08 00 18 35 98 40 00   4d+11:15:58.184  WRITE FPDMA QUEUED
  60 08 00 18 35 98 40 00   4d+11:15:58.156  READ FPDMA QUEUED
  60 08 00 28 fc 79 40 00   4d+11:15:58.138  READ FPDMA QUEUED
  60 08 00 28 e7 79 40 00   4d+11:15:58.134  READ FPDMA QUEUED
  60 08 00 00 02 7a 40 00   4d+11:15:58.110  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1663         -
# 2  Short offline       Completed without error       00%      1552         -
# 3  Extended offline    Completed without error       00%      1439         -
# 4  Short offline       Completed without error       00%      1290         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

[root@freenas] ~#
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
This is the output from the long test. Do I need to replace my HDD?

No, it's an interface issue. Probably bad cables.
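
For anyone wondering how to read that from the error log: the register line of each entry can be decoded by hand. Below is a rough Python sketch, not anything smartctl itself ships. The function names are made up for illustration, and only the error-register bits I'm sure of are listed. It decodes ER and rebuilds the 28-bit LBA from the SN/CL/CH/DH registers. ICRC is an interface CRC error, and attribute 199 (UDMA_CRC_Error_Count) has a raw value of 10, matching the ATA error count of 10.

```python
# Hypothetical helper names; a sketch for reading one smartctl register line.
# Only a subset of ATA error-register bits is listed here.
ERROR_BITS = {
    0x80: "ICRC",  # interface CRC error during a DMA transfer
    0x40: "UNC",   # uncorrectable data error
    0x10: "IDNF",  # requested sector ID not found
    0x04: "ABRT",  # command aborted
}


def decode_error_register(er):
    """Return the names of the known bits set in the error register."""
    return [name for bit, name in ERROR_BITS.items() if er & bit]


def lba_from_registers(sn, cl, ch, dh):
    """Rebuild a 28-bit LBA: low nibble of DH, then CH, CL, SN."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def parse_register_line(line):
    """Parse 'ER ST SC SN CL CH DH ...' (hex) into (error names, LBA)."""
    er, st, sc, sn, cl, ch, dh = (int(tok, 16) for tok in line.split()[:7])
    return decode_error_register(er), lba_from_registers(sn, cl, ch, dh)


line = "84 51 01 1f 35 98 00  Error: ICRC, ABRT at LBA = 0x0098351f = 9975071"
names, lba = parse_register_line(line)
print(names, hex(lba), lba)
```

For the line above this gives ICRC and ABRT at LBA 9975071, the same as smartctl prints. None of the media-related attributes (5, 197, 198) have moved off zero, which is why the drive itself isn't the suspect.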
 

Grantp

Contributor
Joined
Feb 26, 2013
Messages
111
No, it's an interface issue. Probably bad cables.

So how do I get my zpool back to a non-degraded state if I don't need to replace the HDD and resilver?

This was this morning's e-mail:

Code:
Checking status of zfs pools:
NAME           SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
freenas-boot  55.5G  1.23G  54.3G         -      -     2%  1.00x  ONLINE  -
tank          27.2T  7.65T  19.6T         -     5%    28%  1.00x  ONLINE  /mnt
vol_1         21.8T  2.95T  18.8T         -     5%    13%  1.00x  DEGRADED  /mnt

  pool: vol_1
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
    Sufficient replicas exist for the pool to continue functioning in a
    degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
    repaired.
  scan: scrub repaired 0 in 1h57m with 0 errors on Fri Apr  1 05:57:02 2016
config:

    NAME                                            STATE     READ WRITE CKSUM
    vol_1                                           DEGRADED     0     0     0
      raidz2-0                                      DEGRADED     0     0     0
        gptid/79f2d25c-e578-11e5-8e14-00074305cc80  ONLINE       0     0     0
        gptid/7cffbe32-e578-11e5-8e14-00074305cc80  ONLINE       0     0     0
        gptid/800f7676-e578-11e5-8e14-00074305cc80  FAULTED      6     1    29  too many errors
        gptid/831e891c-e578-11e5-8e14-00074305cc80  ONLINE       0     0     0
        gptid/863c075e-e578-11e5-8e14-00074305cc80  ONLINE       0     0     0
        gptid/89555826-e578-11e5-8e14-00074305cc80  ONLINE       0     0     0

errors: No known data errors
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Fix the interface issue and scrub the pool.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Your hard drive has also gotten as hot as 57C. That's really hot. Disks I've seen at those temps before had errors because of their temperatures.

After you fix the issue you'll need to reboot to get the zpool mounted properly. Then do a scrub. :D
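
The 57C figure comes from the raw value of attribute 194 in the smartctl output. If you want to keep an eye on it, here is a small Python sketch that pulls the numbers out of that row. It assumes the `(Min/Max a/b)` raw format this drive reports, which is vendor-specific, and `temperature_range` is just a name picked for the example.

```python
import re


def temperature_range(attr_line):
    """Return (current, minimum, maximum) in Celsius from an attribute 194 row.

    Assumes the raw value looks like '48 (Min/Max 18/57)'; returns None
    if the row does not use that format.
    """
    match = re.search(r"(\d+) \(Min/Max (\d+)/(\d+)\)", attr_line)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


row = ("194 Temperature_Celsius     0x0002   125   125   000    "
       "Old_age   Always       -       48 (Min/Max 18/57)")
current, low, high = temperature_range(row)
print(f"current {current}C, lifetime min {low}C, lifetime max {high}C")
```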
 

Grantp

Contributor
Joined
Feb 26, 2013
Messages
111
Thanks Ericloewe & cyberjock
 

Grantp

Contributor
Joined
Feb 26, 2013
Messages
111
Your hard drive has also gotten as hot as 57C. That's really hot. Disks I've seen at those temps before had errors because of their temperatures.
After you fix the issue you'll need to reboot to get the zpool mounted properly. Then do a scrub.

So I've replaced the cable and got the temps down; all drives are now between 33C and 37C. I've rebooted and done a scrub. The volume is still degraded. Here's the output of zpool status:

Code:
[root@freenas] ~/scripts# zpool status vol_1
  pool: vol_1
state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 1.27G in 1h41m with 0 errors on Wed Apr 13 10:46:11 2016
config:

        NAME                                            STATE     READ WRITE CKSUM
        vol_1                                           DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/79f2d25c-e578-11e5-8e14-00074305cc80  ONLINE       0     0     0
            gptid/7cffbe32-e578-11e5-8e14-00074305cc80  ONLINE       0     0     0
            gptid/800f7676-e578-11e5-8e14-00074305cc80  DEGRADED     0     0 41.9K  too many errors
            gptid/831e891c-e578-11e5-8e14-00074305cc80  ONLINE       0     0     0
            gptid/863c075e-e578-11e5-8e14-00074305cc80  ONLINE       0     0     0
            gptid/89555826-e578-11e5-8e14-00074305cc80  ONLINE       0     0     0

errors: No known data errors
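
To spot which device is still carrying error counts without reading the table by eye, something like this rough Python sketch works on the text above. It is only an illustration: the function names are my own, and it assumes the `K`/`M`/`G`/`T` suffixes on the counters are powers of 1024, so the converted numbers are approximate.

```python
# Assumption: zpool abbreviates large counters with 1024-based suffixes.
SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_count(text):
    """Convert a counter such as '149' or '41.9K' to an approximate integer."""
    if text[-1] in SUFFIXES:
        return int(round(float(text[:-1]) * SUFFIXES[text[-1]]))
    return int(text)


def devices_with_errors(status_text):
    """Return (name, state, read, write, cksum) for rows with non-zero counters."""
    flagged = []
    for line in status_text.splitlines():
        parts = line.split()
        # Device rows look like: NAME STATE READ WRITE CKSUM [note...]
        if len(parts) < 5 or parts[0] == "NAME":
            continue
        try:
            read, write, cksum = (parse_count(p) for p in parts[2:5])
        except ValueError:
            continue  # not a device row (status text, scan line, etc.)
        if read or write or cksum:
            flagged.append((parts[0], parts[1], read, write, cksum))
    return flagged


sample = """
  scan: scrub repaired 1.27G in 1h41m with 0 errors on Wed Apr 13 10:46:11 2016
        NAME                                            STATE     READ WRITE CKSUM
        vol_1                                           DEGRADED     0     0     0
            gptid/800f7676-e578-11e5-8e14-00074305cc80  DEGRADED     0     0 41.9K  too many errors
            gptid/831e891c-e578-11e5-8e14-00074305cc80  ONLINE       0     0     0
"""
for row in devices_with_errors(sample):
    print(row)
```

Here it flags only gptid/800f7676-e578-11e5-8e14-00074305cc80, with checksum errors and no read or write errors.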
 

Grantp

Contributor
Joined
Feb 26, 2013
Messages
111
I did another reboot, a scrub, then another reboot, and all appears OK now. Thanks for all the above help, much appreciated.
 