SMART error Failed Open Device

Status
Not open for further replies.

mazdajai

Dabbler
Joined
Jul 15, 2011
Messages
30
Since upgrading from 8.0.3 to 8.2.0 and then 8.3.1 RC1, my disks have been dropping off the array at random, and I need to reboot the box to rediscover them. I have used smartctl to check all drives, and all appear to be healthy.

I suspect this has something to do with smartd. I have currently shut off smartd on the remaining two disks, and I am unable to bring smartd back up. Is there anything I can do to collect logs or bring the disks up?

dmesg
(ada0:ata0:0:0:0): lost device
(ada1:ata0:0:1:0): lost device

Mar 28 15:17:24 nas notifier: smartd not running? (check /var/run/smartd.pid).
Mar 28 15:17:24 nas notifier: Starting smartd.
Mar 28 15:17:24 nas smartd[76298]: Configuration file /usr/local/etc/smartd.conf parsed but has no entries (like /dev/hda)
Mar 28 15:17:24 nas root: /usr/local/etc/rc.d/smartd: WARNING: failed to start smartd
Mar 28 15:17:24 nas notifier: /usr/local/etc/rc.d/smartd: WARNING: failed to start smartd
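
For anyone collecting logs for a problem like this, here is a minimal Python sketch that pulls the "lost device" lines out of dmesg text and groups them by controller channel. It assumes the bracketed prefix follows the FreeBSD CAM layout (peripheral:controller:bus:target:lun); the function name and the regular expression are illustrative and do not come from any FreeNAS tool.

```python
import re
from collections import defaultdict

# Matches messages such as "(ada0:ata0:0:0:0): lost device".
# Assumption: the fields are peripheral:controller:bus:target:lun.
LOST_RE = re.compile(r"\(([a-z]+\d+):([a-z]+\d+):\d+:\d+:\d+\): lost device")

def lost_devices_by_channel(log_text):
    """Group devices reported as lost by the controller channel they sit on."""
    by_channel = defaultdict(list)
    for match in LOST_RE.finditer(log_text):
        device, channel = match.group(1), match.group(2)
        by_channel[channel].append(device)
    return dict(by_channel)

# The two dmesg lines quoted in the post above.
sample = """(ada0:ata0:0:0:0): lost device
(ada1:ata0:0:1:0): lost device"""

for channel, devices in lost_devices_by_channel(sample).items():
    print(channel, devices)
```

On the two lines quoted above, both drives are grouped under the same channel (ata0), which may be worth mentioning when asking for help.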
 

warri

Guru
Joined
Jun 6, 2011
Messages
1,193
This doesn't sound like a problem with smartd to me.

Can you please post your complete system specs?
Also you could upgrade to 8.3.1 Final and see if that fixes anything.

Do you remember at which FreeNAS version the hdds started to drop?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Devices being "lost" means that the hard drive was no longer responding to requests from the SATA controller. The hard drive may not be the problem; just about anything that prevents the controller from receiving the response from the hard drive can be the culprit. But 99% of the time it's a cable issue or the hard drives themselves. Many hard drives don't have good firmware for recovering from a bad sector (google TLER for more info). If I were a betting man I'd assume you have 2 hard drives failing and (less likely) a cable issue. Have you checked out the SMART data for ada0 and ada1? Can you post the output for those two drives?
 

mazdajai

Dabbler
Joined
Jul 15, 2011
Messages
30
warri,

Intel(R) Atom(TM) CPU D525 @ 1.80GHz
4GB
Jetway NF99
Hard drive info
ataidle ada2
Model: ST1500DL003-9VT16L
Serial: 5YD1HS0A
Firmware Rev: CC32
ATA revision: ATA-8
LBA 48: yes
Geometry: 16383 cyls, 16 heads, 63 spt
Capacity: 1397GB
SMART Supported: yes
SMART Enabled: yes
Write Cache Supported: yes
Write Cache Enabled: yes
APM Supported: no
AAM Supported: yes
AAM Enabled: no
Vendor Recommended AAM: 127

Also you could upgrade to 8.3.1 Final and see if that fixes anything.
Sure.

Do you remember at which FreeNAS version the hdds started to drop?
It had been stable on 8.0.3. I think this started on 8.2.0 and after.

cyberjock,
...
Many hard drives don't have good firmware for recovering from a bad sector (google TLER for more info).
If I were a betting man I'd assume you have 2 hard drives failing and (less likely) a cable issue.

I wish the problem was the hard drives. One time 4 of them dropped out together, and naturally I ran SMART on them; all came up clean.

Have you checked out the SMART data for ada0 and ada1? Can you post the output for those two drives?
Going to give it a shot, will post them in a bit.
 

mazdajai

Dabbler
Joined
Jul 15, 2011
Messages
30
Code:
0
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.3-RELEASE-p6 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda Green (Adv. Format)
Device Model:     ST1500DL003-9VT16L
Serial Number:    5YD1KR9T
LU WWN Device Id: 5 000c50 02f5e4bfe
Firmware Version: CC32
User Capacity:    1,500,301,910,016 bytes [1.50 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Fri Mar 29 14:22:32 2013 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   106   099   006    Pre-fail  Always       -       12144168
  3 Spin_Up_Time            0x0003   093   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       67
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       34585654
  9 Power_On_Hours          0x0032   087   087   000    Old_age   Always       -       12189
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       67
183 Runtime_Bad_Block       0x0032   009   009   000    Old_age   Always       -       91
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   098   000    Old_age   Always       -       15
189 High_Fly_Writes         0x003a   099   099   000    Old_age   Always       -       1
190 Airflow_Temperature_Cel 0x0022   063   060   045    Old_age   Always       -       37 (Min/Max 33/37)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       60
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       67
194 Temperature_Celsius     0x0022   037   040   000    Old_age   Always       -       37 (0 23 0 0 0)
195 Hardware_ECC_Recovered  0x001a   015   003   000    Old_age   Always       -       12144168
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       199810468556701
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       3823517526
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       4079010037

SMART Error Log Version: 1
No Errors Logged

1
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.3-RELEASE-p6 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda Green (Adv. Format)
Device Model:     ST1500DL003-9VT16L
Serial Number:    5YD1CMAP
LU WWN Device Id: 5 000c50 02f3ca216
Firmware Version: CC32
User Capacity:    1,500,301,910,016 bytes [1.50 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Fri Mar 29 14:22:32 2013 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   106   099   006    Pre-fail  Always       -       11122848
  3 Spin_Up_Time            0x0003   093   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       67
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       34526134
  9 Power_On_Hours          0x0032   087   087   000    Old_age   Always       -       12197
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       67
183 Runtime_Bad_Block       0x0032   051   051   000    Old_age   Always       -       49
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   098   000    Old_age   Always       -       7
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   064   060   045    Old_age   Always       -       36 (Min/Max 32/36)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       60
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       67
194 Temperature_Celsius     0x0022   036   040   000    Old_age   Always       -       36 (0 22 0 0 0)
195 Hardware_ECC_Recovered  0x001a   022   003   000    Old_age   Always       -       11122848
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       190567698935717
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       3973249487
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       3408737776

SMART Error Log Version: 1
No Errors Logged

2
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.3-RELEASE-p6 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda Green (Adv. Format)
Device Model:     ST1500DL003-9VT16L
Serial Number:    5YD1HS0A
LU WWN Device Id: 5 000c50 02f570c50
Firmware Version: CC32
User Capacity:    1,500,301,910,016 bytes [1.50 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Fri Mar 29 14:22:32 2013 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   110   099   006    Pre-fail  Always       -       29068904
  3 Spin_Up_Time            0x0003   093   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       67
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       34554254
  9 Power_On_Hours          0x0032   087   087   000    Old_age   Always       -       12233
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       67
183 Runtime_Bad_Block       0x0032   017   017   000    Old_age   Always       -       83
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   097   000    Old_age   Always       -       16
189 High_Fly_Writes         0x003a   099   099   000    Old_age   Always       -       1
190 Airflow_Temperature_Cel 0x0022   064   061   045    Old_age   Always       -       36 (Min/Max 31/36)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       60
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       67
194 Temperature_Celsius     0x0022   036   040   000    Old_age   Always       -       36 (0 22 0 0 0)
195 Hardware_ECC_Recovered  0x001a   019   003   000    Old_age   Always       -       29068904
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       181277684674505
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       3806709404
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       2451167761

SMART Error Log Version: 1
No Errors Logged

3
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.3-RELEASE-p6 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda Green (Adv. Format)
Device Model:     ST1500DL003-9VT16L
Serial Number:    5YD1FSK1
LU WWN Device Id: 5 000c50 02f52f301
Firmware Version: CC32
User Capacity:    1,500,301,910,016 bytes [1.50 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Fri Mar 29 14:22:32 2013 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   111   099   006    Pre-fail  Always       -       29903784
  3 Spin_Up_Time            0x0003   093   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       69
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       34575321
  9 Power_On_Hours          0x0032   087   087   000    Old_age   Always       -       12163
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       69
183 Runtime_Bad_Block       0x0032   011   011   000    Old_age   Always       -       89
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   098   000    Old_age   Always       -       9
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   065   061   045    Old_age   Always       -       35 (Min/Max 31/35)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       62
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       69
194 Temperature_Celsius     0x0022   035   040   000    Old_age   Always       -       35 (0 21 0 0 0)
195 Hardware_ECC_Recovered  0x001a   020   003   000    Old_age   Always       -       29903784
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       117329916604291
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       3803464655
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       3384281527

SMART Error Log Version: 1
No Errors Logged
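
To compare the four reports above without reading every row by eye, a small Python sketch along these lines could parse the attribute table and pull out a few counters. It assumes the ten-column layout shown in this output; the helper names and the choice of watched attributes are illustrative, not a smartmontools feature.

```python
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
           "Command_Timeout", "UDMA_CRC_Error_Count")

def parse_attributes(smartctl_text):
    """Return {name: (value, worst, thresh, raw)} from smartctl attribute rows."""
    attrs = {}
    for line in smartctl_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            value, worst, thresh = (int(f) for f in fields[3:6])
            attrs[fields[1]] = (value, worst, thresh, " ".join(fields[9:]))
    return attrs

def summarize(attrs):
    """List attributes at or below a non-zero threshold, plus watched raw values."""
    at_or_below = [n for n, (v, _w, t, _r) in attrs.items() if t > 0 and v <= t]
    watched = {n: attrs[n][3] for n in WATCHED if n in attrs}
    return at_or_below, watched

# Four rows copied from the first report above.
sample = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
188 Command_Timeout         0x0032   100   098   000    Old_age   Always       -       15
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

at_or_below, watched = summarize(parse_attributes(sample))
print("at or below threshold:", at_or_below)
print("watched raw values:", watched)
```

In the reports above, no attribute is at or below its threshold, while Command_Timeout has a non-zero raw value on all four drives (15, 7, 16 and 9) and UDMA_CRC_Error_Count is 0 on each.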
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
SMART data looks good to me. I've got no good ideas, but I will tell you that if a drive is "lost" then any data written to the zpool will be out of sync with the "lost" drive. So as soon as the drive is detected and being used in the zpool again, you would have to do a scrub of the zpool to restore complete redundancy. If you aren't doing the scrubs and a drive actually fails, you may find you are in serious trouble because the drives aren't all in sync with each other. The whole purpose of the scrub is to verify that all drives are in sync and the data is valid.
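
As a rough illustration of the scrub step, the Python sketch below builds the two zpool commands involved (start a scrub, then check its status). The pool name "tank" is hypothetical, the helper names are made up for this example, and by default nothing is executed; the commands are only printed.

```python
import subprocess

def scrub_commands(pool):
    """Return the commands that start a scrub and then report pool status."""
    return [["zpool", "scrub", pool], ["zpool", "status", pool]]

def run_scrub(pool, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in scrub_commands(pool):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)

run_scrub("tank")  # "tank" is a hypothetical pool name; substitute your own
```

Calling run_scrub with dry_run=False would actually start the scrub, so it only makes sense on the NAS itself with the real pool name.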

You are in a situation I wouldn't want to be in, and I'd definitely be trying to figure out what is going on. I had Seagates a few years ago that would drop out as part of their normal function (according to Seagate), and they won't work in a RAID no matter what you try to do. There is no fix, and Seagate's answer to the forum thread (it was over 30 pages long with complaints) was that the drives are not designed for RAID and that the issue is our own problem for not buying the "RAID compatible" drives. Naturally, those drives are more expensive. In my situation, when a drive dropped out and came back, a rebuild would auto-initiate, and sometimes a second drive would drop out while the rebuild was in progress. I was not happy and had to ditch those drives that were only 70 days old. They have sat in a box ever since, never to be used again. Too bad too, since I had spent over $2000 on the drives and used them for less than 3 months. For this reason I'll never buy Seagates again.
 

mazdajai

Dabbler
Joined
Jul 15, 2011
Messages
30
Yes cyberjock, I am doing a scrub as I am typing.

Earlier, I tried to do a graceful shutdown but I got a kernel panic! Not sure if it is related, and the BIOS did not detect the drives unless I did a hard reboot. (This happens every time the HDDs drop off the array.)

Long story short, I have updated to 8.3.1. My next plan is to disable SMART (I turned it back on because of the new release; it is the only piece I can think of that talks to the chip on the hard drive). If the problem persists, I will consider replacing the drives with Momentus XT or something.

Am I the only one that is experiencing this issue?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Generally, other people experiencing the issue have failing hard drives. Since you aren't in that arena, it's really hard to say. Kernel panics are usually a sign of insufficient hardware (such as not enough RAM), incompatible hardware, or bad hardware.
 

mazdajai

Dabbler
Joined
Jul 15, 2011
Messages
30
Just got two errors, 'SMART error (FailedOpenDevice) detected on host:', after upgrading to 8.3.1. The two errors happened at the exact same time.

Thoughts?
 