Possible Disk Failure

Status
Not open for further replies.

eye3

Dabbler
Joined
Feb 20, 2016
Messages
16
Two days ago I started seeing the following error in /var/log/messages.

(ada4:ahcich4:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 58 30 4d 40 60 00 00 00 00 00
(ada4:ahcich4:0:0:0): CAM status: ATA Status Error
(ada4:ahcich4:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )
(ada4:ahcich4:0:0:0): RES: 41 10 58 30 4d 40 60 00 00 00 00
(ada4:ahcich4:0:0:0): Retrying command
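
For reference, the status and error bytes in that log can be decoded by hand. Below is a minimal Python sketch, assuming the bit names that FreeBSD's CAM layer prints for the ATA status and error registers. The lookup tables and the helper names `decode_bits` and `decode_ata_line` are illustrative only and not part of any tool.

```python
import re

# Bit names for bit 0..7, as printed in FreeBSD CAM messages (assumption).
STATUS_BITS = ["ERR", "IDX", "CORR", "DRQ", "SERV", "DF", "DRDY", "BSY"]
ERROR_BITS = ["ILI", "NM", "ABRT", "MCR", "IDNF", "MC", "UNC", "ICRC"]


def decode_bits(value, names):
    """Return the names of the bits set in value, highest bit first."""
    return [names[bit] for bit in range(7, -1, -1) if value & (1 << bit)]


def decode_ata_line(line):
    """Pull the hex status/error bytes out of an 'ATA status:' log line."""
    match = re.search(
        r"ATA status: ([0-9a-fA-F]{2}).*?error: ([0-9a-fA-F]{2})", line
    )
    if match is None:
        return None
    status = int(match.group(1), 16)
    error = int(match.group(2), 16)
    return decode_bits(status, STATUS_BITS), decode_bits(error, ERROR_BITS)


if __name__ == "__main__":
    sample = "(ada4:ahcich4:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )"
    print(decode_ata_line(sample))
```

Status 41 is DRDY plus ERR, and error 10 is IDNF (ID not found): the drive reported that it could not find the sector address it was asked to write.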

This error occurred about 15 times and I haven't seen it since. After I saw the message, I ran a SMART long self-test on ada4. Here is the output.

Code:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   178   178   021    Pre-fail  Always       -       8075
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       10
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       344
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       10
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       3
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       52
194 Temperature_Celsius     0x0022   122   120   000    Old_age   Always       -       30
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       8

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       10%       343         2407105248
# 2  Extended offline    Completed: read failure       10%       315         2407105248
# 3  Short offline       Completed without error       00%       308         -
# 4  Extended offline    Completed without error       00%       228         -
# 5  Extended offline    Completed without error       00%        96         -
# 6  Extended offline    Completed without error       00%         9         -
# 7  Conveyance offline  Completed without error       00%         0         -
# 8  Short offline       Completed without error       00%         0         -


Do any of you believe that this disk is dying? I read one of the SMART results stickies, and ID 200 wasn't listed as one of the attributes to keep an eye on. This is a brand new system and I'm still within my 30-day return period, but I won't return the disk if this turns out not to be a real problem. If it is, I'll definitely return it.

Thanks.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Based on your posting, the drive is failing, although Multi-zone errors on their own are not always a hard error condition. Since the Extended test should be entirely internal to the drive, I wouldn't say a SATA cable is the issue. However, for completeness, replace the SATA cable and run the SMART Long test once again. Make sure you power down your system for the cable change, and report back the SMART results once done. I'm on vacation so I may not get back to you, but someone else will.

Also, you can RMA the drive based on the failed Extended test results. Unfortunately, you cut off the header of the SMART data, so I don't know what drive you have and can't offer any additional help.

Good Luck.
 

eye3

Dabbler
Joined
Feb 20, 2016
Messages
16
Sorry about that. They are WD Red 4TB drives. I have eight of them installed.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I saw those in your signature but I wasn't certain it was the drive in question. I hope replacing the SATA cable fixes your problem, but if not, RMA the drive. Some do fail prematurely; you may just be one of those statistics.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
I would regard two consecutive SMART extended tests failing at 10% as more than enough reason to return the drive. In my experience, drives can be sick without any SMART failures, but drives with failing SMART tests are never healthy.
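
A rough way to check for that pattern automatically is to parse the self-test log section of the `smartctl -a` output. The Python sketch below assumes the column layout shown in the first post and treats any status containing "failure" as a failed test. The function names are made up for illustration.

```python
import re

# One row of the "SMART Self-test log", e.g.
# "# 1  Extended offline    Completed: read failure       10%       343         2407105248"
SELFTEST_ROW = re.compile(
    r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s{2,}(\d+)%\s+(\d+)\s+(\S+)\s*$"
)


def parse_selftest_log(text):
    """Return one dict per self-test row found in smartctl output."""
    rows = []
    for line in text.splitlines():
        match = SELFTEST_ROW.match(line.strip())
        if match is None:
            continue  # header lines and unrelated output are skipped
        num, desc, status, remaining, hours, lba = match.groups()
        rows.append({
            "num": int(num),
            "test": desc.strip(),
            "status": status.strip(),
            "remaining_pct": int(remaining),
            "lifetime_hours": int(hours),
            # "-" in the LBA column means no error location was recorded
            "first_error_lba": None if lba == "-" else int(lba),
        })
    return rows


def failed_tests(rows):
    """Rows whose status text reports a failure (assumed wording)."""
    return [row for row in rows if "failure" in row["status"].lower()]


if __name__ == "__main__":
    sample = """\
# 1  Extended offline    Completed: read failure       10%       343         2407105248
# 2  Extended offline    Completed: read failure       10%       315         2407105248
# 3  Short offline       Completed without error       00%       308         -
"""
    for row in failed_tests(parse_selftest_log(sample)):
        print(row["num"], row["test"], row["status"], row["first_error_lba"])
```

Two failures at the same LBA, as in this thread, suggest a repeatable problem on the disk rather than a one-off glitch.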
 

eye3

Dabbler
Joined
Feb 20, 2016
Messages
16
I replaced the SATA cable and ran another long SMART test. The SMART test failed again at 10%. I'll go ahead and get the drive replaced with another one. Thanks everyone for your help.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Yup, I was thinking that would be the outcome.
 