Another "is my drive dead?" question

Status
Not open for further replies.

Agi

Dabbler
Joined
Feb 26, 2016
Messages
14
Howdy folks,
I've been running FreeNAS problem-free for nearly a year, until earlier this week.
I started getting the errors below in my daily email reports. After searching, I've tried different cables (3 so far) and two different ports on the motherboard (Supermicro X11SSH).

Below are also the SMART results, which look clean to me. I have a spare drive I can pop in and resilver onto, but am I missing anything obvious before trying that?

RAIDZ2 pool consisting of 6x 3TB WD Reds
Xeon E3-1240 v5
X11SSH
32GB ECC Samsung RAM (straight off the QVL)
Seasonic X850

(Using the onboard SATA ports, but I do have a couple of HBAs here if needed.)

Thanks in advance.

Code:
Jan  4 17:41:50 freenas (ada1:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 d8 dc 98 40 fd 00 00 00 00 00
Jan  4 17:41:50 freenas (ada1:ahcich2:0:0:0): CAM status: ATA Status Error
Jan  4 17:41:50 freenas (ada1:ahcich2:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )
Jan  4 17:41:50 freenas (ada1:ahcich2:0:0:0): RES: 41 10 d8 dc 98 40 fd 00 00 00 00
Jan  4 17:41:50 freenas (ada1:ahcich2:0:0:0): Retrying command
Jan  4 17:41:57 freenas (ada1:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 18 20 1a 05 40 83 00 00 00 00 00
Jan  4 17:41:57 freenas (ada1:ahcich2:0:0:0): CAM status: ATA Status Error
Jan  4 17:41:57 freenas (ada1:ahcich2:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )
Jan  4 17:41:57 freenas (ada1:ahcich2:0:0:0): RES: 41 10 20 1a 05 40 83 00 00 00 00
Jan  4 17:41:57 freenas (ada1:ahcich2:0:0:0): Retrying command
Jan  4 18:55:50 freenas (ada1:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 70 00 42 40 20 00 00 00 00 00
Jan  4 18:55:50 freenas (ada1:ahcich2:0:0:0): CAM status: ATA Status Error
Jan  4 18:55:50 freenas (ada1:ahcich2:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )
Jan  4 18:55:50 freenas (ada1:ahcich2:0:0:0): RES: 41 10 70 00 42 40 20 00 00 00 00
Jan  4 18:55:50 freenas (ada1:ahcich2:0:0:0): Retrying command
Jan  4 18:55:57 freenas (ada1:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 18 18 9e a2 40 fd 00 00 00 00 00
Jan  4 18:55:57 freenas (ada1:ahcich2:0:0:0): CAM status: ATA Status Error
Jan  4 18:55:57 freenas (ada1:ahcich2:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )
Jan  4 18:55:57 freenas (ada1:ahcich2:0:0:0): RES: 41 10 18 9e a2 40 fd 00 00 00 00
Jan  4 18:55:57 freenas (ada1:ahcich2:0:0:0): Retrying command
Jan  4 21:19:00 freenas (ada1:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 10 38 ca 1e 40 fb 00 00 00 00 00
Jan  4 21:19:00 freenas (ada1:ahcich2:0:0:0): CAM status: ATA Status Error
Jan  4 21:19:00 freenas (ada1:ahcich2:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )
Jan  4 21:19:00 freenas (ada1:ahcich2:0:0:0): RES: 41 10 38 ca 1e 40 fb 00 00 00 00
Jan  4 21:19:00 freenas (ada1:ahcich2:0:0:0): Retrying command
Jan  4 21:43:15 freenas (ada1:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 10 df 24 40 fb 00 00 01 00 00
Jan  4 21:43:15 freenas (ada1:ahcich2:0:0:0): CAM status: ATA Status Error
Jan  4 21:43:15 freenas (ada1:ahcich2:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )
Jan  4 21:43:15 freenas (ada1:ahcich2:0:0:0): RES: 41 10 10 df 24 40 fb 00 00 00 00
Jan  4 21:43:15 freenas (ada1:ahcich2:0:0:0): Retrying command
Jan  4 21:45:24 freenas (ada1:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 b8 01 40 40 00 00 00 00 00 00
Jan  4 21:45:24 freenas (ada1:ahcich2:0:0:0): CAM status: ATA Status Error
Jan  4 21:45:24 freenas (ada1:ahcich2:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )
Jan  4 21:45:24 freenas (ada1:ahcich2:0:0:0): RES: 41 10 b8 01 40 40 00 00 00 00 00
Jan  4 21:45:24 freenas (ada1:ahcich2:0:0:0): Retrying command
Jan  4 21:45:31 freenas (ada1:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 48 80 25 40 fb 00 00 00 00 00
Jan  4 21:45:31 freenas (ada1:ahcich2:0:0:0): CAM status: ATA Status Error
Jan  4 21:45:31 freenas (ada1:ahcich2:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )
Jan  4 21:45:31 freenas (ada1:ahcich2:0:0:0): RES: 41 10 48 80 25 40 fb 00 00 00 00
Jan  4 21:45:31 freenas (ada1:ahcich2:0:0:0): Retrying command


Code:
[root@freenas] ~# smartctl -a /dev/ada1
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:  Western Digital Red
Device Model:  WDC WD30EFRX-68EUZN0
Serial Number:  WD-WCC4N5VPZL04
LU WWN Device Id: 5 0014ee 26204ddaf
Firmware Version: 82.00A82
User Capacity:  3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:  512 bytes logical, 4096 bytes physical
Rotation Rate:  5400 rpm
Device is:  In smartctl database [for details use: -P show]
ATA Version is:  ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:  Fri Jan  6 13:53:22 2017 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
  was never started.
  Auto Offline Data Collection: Disabled.
Self-test execution status:  (  0) The previous self-test routine completed
  without error or no self-test has ever
  been run.
Total time to complete Offline
data collection:  (39360) seconds.
Offline data collection
capabilities:  (0x7b) SMART execute Offline immediate.
  Auto Offline data collection on/off support.
  Suspend Offline collection upon new
  command.
  Offline surface scan supported.
  Self-test supported.
  Conveyance Self-test supported.
  Selective Self-test supported.
SMART capabilities:  (0x0003) Saves SMART data before entering
  power-saving mode.
  Supports SMART auto save timer.
Error logging capability:  (0x01) Error logging supported.
  General Purpose Logging supported.
Short self-test routine
recommended polling time:  (  2) minutes.
Extended self-test routine
recommended polling time:  ( 395) minutes.
Conveyance self-test routine
recommended polling time:  (  5) minutes.
SCT capabilities:  (0x703d) SCT Status supported.
  SCT Error Recovery Control supported.
  SCT Feature Control supported.
  SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   176   176   021    Pre-fail  Always       -       6166
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       19
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       7573
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       19
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       8
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       136
194 Temperature_Celsius     0x0022   135   117   000    Old_age   Always       -       15
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description  Status  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline  Completed without error  00%  7537  -
# 2  Short offline  Completed without error  00%  7301  -
# 3  Extended offline  Completed without error  00%  7213  -
# 4  Short offline  Completed without error  00%  7133  -
# 5  Short offline  Completed without error  00%  6965  -
# 6  Extended offline  Completed without error  00%  6878  -
# 7  Short offline  Completed without error  00%  6797  -
# 8  Short offline  Completed without error  00%  6581  -
# 9  Extended offline  Completed without error  00%  6494  -
#10  Short offline  Completed without error  00%  6413  -
#11  Short offline  Completed without error  00%  6246  -
#12  Extended offline  Completed without error  00%  6160  -
#13  Short offline  Completed without error  00%  6079  -
#14  Short offline  Completed without error  00%  5838  -
#15  Extended offline  Completed without error  00%  5751  -
#16  Short offline  Completed without error  00%  5671  -
#17  Short offline  Completed without error  00%  5503  -
#18  Extended offline  Completed without error  00%  5415  -
#19  Short offline  Completed without error  00%  5335  -
#20  Short offline  Completed without error  00%  5119  -
#21  Extended offline  Completed without error  00%  5031  -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
  1  0  0  Not_testing
  2  0  0  Not_testing
  3  0  0  Not_testing
  4  0  0  Not_testing
  5  0  0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
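To double-check the "looks clean" reading of the SMART output, here is a minimal Python sketch that parses `smartctl -A`-style attribute rows and flags the usual warning signs: any normalized VALUE at or below a nonzero THRESH (the condition smartctl reports as a failure), and a nonzero RAW_VALUE on a small watch list (5, 196, 197, 198, 199). The watch list and the function names are illustrative choices, not anything smartctl itself provides, and the parser assumes the plain integer raw values seen in this output. Attribute 199 (UDMA_CRC_Error_Count) counts interface CRC errors, which is where a bad cable would usually show up.

```python
# Flag SMART attributes that commonly indicate media or cabling trouble.

# Attribute IDs whose RAW_VALUE normally stays at 0 on a healthy drive
# (an illustrative selection; many other attributes are vendor-specific).
WATCH = {
    5: "reallocated sectors",
    196: "reallocation events",
    197: "sectors pending reallocation",
    198: "offline uncorrectable sectors",
    199: "interface CRC errors (often cabling)",
}


def parse_attributes(text):
    """Parse smartctl attribute rows into a dict keyed by attribute ID."""
    attrs = {}
    for line in text.splitlines():
        tok = line.split()
        # Data rows start with a numeric ID and have at least 10 columns.
        if len(tok) < 10 or not tok[0].isdigit():
            continue
        attrs[int(tok[0])] = {
            "name": tok[1],
            "value": int(tok[3]),
            "thresh": int(tok[5]),
            "raw": int(tok[9]),  # assumes a plain integer RAW_VALUE
        }
    return attrs


def flag_attributes(attrs):
    """Return warning strings; an empty list means nothing stood out."""
    warnings = []
    for aid, a in sorted(attrs.items()):
        if a["thresh"] > 0 and a["value"] <= a["thresh"]:
            warnings.append(f"{aid} {a['name']}: normalized value at or below threshold")
        if aid in WATCH and a["raw"] > 0:
            warnings.append(f"{aid} {a['name']}: {a['raw']} {WATCH[aid]}")
    return warnings


SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    print(flag_attributes(parse_attributes(SAMPLE)) or "nothing flagged")
```

As the thread goes on to show, a clean SMART report does not rule out a misbehaving drive; this only checks the counters the drive chooses to report.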
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Looks like a cable problem. Are you sure you've got the right drive?

Also, why are your drive temps so low? 15C is probably too cold for disks. Is this server outside in winter?

 

Agi

Dabbler
Joined
Feb 26, 2016
Messages
14
I pulled the drive and double-checked the serial. Also, "ada1:ahcich2:0:0:0" changed to "ada1:ahcich3:0:0:0" etc. when I tried different ports on the motherboard.
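Since the CAM path in each log line names both the device (`ada1`) and the controller channel (`ahcich2`), a quick tally per path shows which port the errors were logged on. Below is a minimal Python sketch, assuming the FreeBSD log line format quoted in the first post; the function name is made up, and the `ahcich3` sample line is hypothetical. Device names and channels can change when a disk moves ports, so tying errors to a physical disk still needs the serial number check described above.

```python
import re
from collections import Counter

# Matches CAM lines such as
#   (ada1:ahcich2:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )
# The path is peripheral (ada1), SIM/channel (ahcich2), bus, target, LUN.
STATUS_RE = re.compile(
    r"\((?P<periph>[a-z]+\d+):(?P<sim>[a-z]+\d+):\d+:\d+:\d+\): "
    r"ATA status: (?P<status>[0-9a-f]{2}) \(.*?\), error: (?P<error>[0-9a-f]{2})"
)
IDNF = 0x10  # bit 4 of the ATA Error register


def idnf_by_path(lines):
    """Count IDNF errors per (device, channel) pair."""
    counts = Counter()
    for line in lines:
        m = STATUS_RE.search(line)
        if m and int(m.group("error"), 16) & IDNF:
            counts[(m.group("periph"), m.group("sim"))] += 1
    return counts


SAMPLE_LOG = [
    "Jan  4 17:41:50 freenas (ada1:ahcich2:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )",
    "Jan  4 17:41:50 freenas (ada1:ahcich2:0:0:0): Retrying command",
    "Jan  4 18:55:50 freenas (ada1:ahcich2:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )",
    # Hypothetical line after moving the disk to another port:
    "Jan  5 09:00:00 freenas (ada1:ahcich3:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )",
]

if __name__ == "__main__":
    for (periph, sim), n in sorted(idnf_by_path(SAMPLE_LOG).items()):
        print(f"{periph} on {sim}: {n} IDNF error(s)")
```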

The server is in the garage: humidity 65%, and it stays between 12 and 18C all year round in there. Is this too cold? I can move it, but I thought it would be alright, considering it's inside a brick, insulated garage (only one external wall) and I have a hygrometer in there which I check periodically.

I'll pull it again, triple-check, and try another make of cable.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
At this point it might be interesting to try an HBA. Maybe something is messed up on your motherboard, although it's a little strange that you're only having issues with one drive. Make sure the HBA is flashed to the correct firmware before you start.

 

Agi

Dabbler
Joined
Feb 26, 2016
Messages
14
OK, thanks.

Would it be worth swapping the connectors over, i.e. trying ada1 in place of ada2 on the board, to see whether the problem stays with the port or follows the drive?
Or is that a bad idea given the fault above? I don't understand how serious the 'failure' it's reporting is, so I don't want to go blindly swapping ports and risk the pool's integrity.

The HBA is a Dell H200, wiped clean and then flashed with the v20 LSI 9211 firmware in IT mode. I need breakout cables, so I'll update on progress in the next few days when they arrive (finding stuff like this at a local store is impossible in rural England o_O).
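On how serious the reported failure is: the hex fields in the error log from the first post can be decoded. Below is a minimal Python sketch, assuming the byte order FreeBSD's CAM layer uses when printing ATA commands (ACB: command, features, LBA low/mid/high, device, extended LBA low/mid/high, extended features, count, extended count; RES: status, error, then the rest); the helper names are made up for illustration. Status 41 decodes to DRDY plus ERR, and error 10 is IDNF (ID Not Found), i.e. the drive reported that it could not find or access the requested address. Every failing command is a WRITE_FPDMA_QUEUED (an NCQ write), and for NCQ commands the transfer length sits in the features field rather than the count field.

```python
# Decode the hex fields FreeBSD CAM prints for a failed ATA command.
# Assumed ACB byte order: command, features, LBA low, LBA mid, LBA high,
# device, LBA low (exp), LBA mid (exp), LBA high (exp), features (exp),
# count, count (exp). RES begins with the status byte, then the error byte.

# ATA Status register bits (obsolete bits omitted).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}

# ATA Error register bits, named as CAM prints them.
ERROR_BITS = {
    0x80: "ICRC",  # interface CRC error on the SATA link
    0x40: "UNC",   # uncorrectable data error
    0x20: "MC",
    0x10: "IDNF",  # ID not found: requested address not found/accessible
    0x08: "MCR",
    0x04: "ABRT",  # command aborted
    0x02: "NM",
    0x01: "ILI",
}

NCQ_COMMANDS = {0x60: "READ_FPDMA_QUEUED", 0x61: "WRITE_FPDMA_QUEUED"}

# Capacity from smartctl: 3,000,592,982,016 bytes in 512-byte logical sectors.
TOTAL_SECTORS = 3_000_592_982_016 // 512


def decode_bits(value, table):
    """Names of all bits set in value, highest bit first."""
    return [name for bit, name in table.items() if value & bit]


def decode_acb(acb_hex):
    """Split a 12-byte ACB string into command name, LBA, length and NCQ tag."""
    b = [int(x, 16) for x in acb_hex.split()]
    if len(b) != 12:
        raise ValueError("expected 12 ACB bytes")
    (cmd, feat, lba0, lba1, lba2, _device,
     lba3, lba4, lba5, feat_exp, count, count_exp) = b
    lba = (lba5 << 40) | (lba4 << 32) | (lba3 << 24) | (lba2 << 16) | (lba1 << 8) | lba0
    if cmd in NCQ_COMMANDS:
        # NCQ: transfer length lives in FEATURES, the queue tag in COUNT bits 7:3.
        sectors = ((feat_exp << 8) | feat) or 65536
        tag = count >> 3
    else:
        # Other 48-bit commands carry the length in COUNT.
        sectors = ((count_exp << 8) | count) or 65536
        tag = None
    return {"command": NCQ_COMMANDS.get(cmd, hex(cmd)), "lba": lba,
            "sectors": sectors, "tag": tag}


if __name__ == "__main__":
    res = [int(x, 16) for x in "41 10 d8 dc 98 40 fd 00 00 00 00".split()]
    print("status:", decode_bits(res[0], STATUS_BITS), "error:", decode_bits(res[1], ERROR_BITS))
    acb = decode_acb("61 08 d8 dc 98 40 fd 00 00 00 00 00")
    print(acb, "in range:", acb["lba"] + acb["sectors"] <= TOTAL_SECTORS)
```

By this decoding, every write in the log targets an address inside the drive's 5,860,533,168 logical sectors, and the interface CRC bit (ICRC) is never set in the error byte.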
 

Agi

Dabbler
Joined
Feb 26, 2016
Messages
14
Alright, so I swapped the drives over (kept the cables on the board and swapped the drive ends), and the problem follows the drive: within a couple of days I had one error. I'm going to pull it apart again tomorrow, swap them back and try a different power connector before writing the drive off.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
problem is persistent to the drive
Good to know. I have seen a drive behave poorly under load without any sign of trouble in SMART attributes or SMART self-tests (it was an old Hitachi Deskstar).
 

wblock

Documentation Engineer
Joined
Nov 14, 2014
Messages
1,506
From the spec sheet, the operating temperature for the Red drives is 0C to 65C (32F to 149F), so 12C is fine.
 