Strange behavior, possible disk issues, expert advice needed.


leoj3n

Dabbler
Joined
Jan 10, 2014
Messages
18
A FreeNAS machine was not responding to keyboard input and there were some error messages on the screen. I hit the reset button and then the machine began turning on and off again in a loop. I shut the power off from the back, then back on, and booted the machine. It seems to be working now but I'm worried there may be something wrong with one of the disks.

I got the disks on Amazon, and they turned out not to be the enterprise drives I had hoped for. They were advertised as:


4 of Seagate Constellation ES.2 3 TB 7200RPM SATA 6Gb/s 64MB Cache 3.5 Inch Internal Bare Drive ST33000650NS

When I got them they had both HP and Seagate labels on the drives. Here's what the seller said when I contacted him:


These HP drives are the Seagates, but labeled for HP. The Seagate model number is there. I have a mixed batch of these drives, some came with Seagate and some came with HP labeling. I spent some time looking online too and am quite sure that they are the same, HOWEVER, if you are not completely confident, please do put in a return, and get your money back. I believe all of the lower priced ES.2s currently bouncing around are in fact these HP labeled drives. So if these are not right for you, and you do the return, check carefully with the next vendor as well! HP does not make their own hardware. These are Seagate ST33000650NS made for HP. I mentioned the labeling in the listing, but do return if you are not comfortable with these.



So I'm not really sure about them. Here's what "smartctl" says:

Code:
[root@beta-nas] ~# smartctl -i /dev/ada0
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Model Family:    Seagate Barracuda ATA IV
Device Model:    ST380021A
Serial Number:    3HV15Z4H
Firmware Version: 3.10
User Capacity:    80,025,280,000 bytes [80.0 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:  ATA/ATAPI-5 (minor revision not indicated)
Local Time is:    Fri Jan 10 16:29:58 2014 CST
SMART support is: Available - device has SMART capability.
SMART support is: Disabled
 
[root@beta-nas] ~# smartctl -i /dev/ada1
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Device Model:    ���S���S���S���S���S���S���S���S���S���S
Serial Number:    �S���S���S���S���S��
Firmware Version: ���S���S
Rotation Rate:    52462 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:  Unknown(0xd953) (unknown minor revision code: 0xccee)
Local Time is:    Fri Jan 10 16:30:16 2014 CST
SMART support is: Unavailable - Packet Interface Devices [this device: Reserved] don't support ATA SMART
 
[root@beta-nas] ~# smartctl -i /dev/ada2
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Device Model:    MB3000EBKAB
Serial Number:    Z290E0SL
LU WWN Device Id: 5 000c50 0355a7b88
Firmware Version: HPG2
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:  ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Jan 10 16:30:28 2014 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
[root@beta-nas] ~# smartctl -i /dev/ada3
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Device Model:    MB3000EBKAB
Serial Number:    Z290NZXA
LU WWN Device Id: 5 000c50 035a2cdf9
Firmware Version: HPG2
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:  ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Jan 10 16:30:41 2014 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
[root@beta-nas] ~# smartctl -i /dev/ada4
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Device Model:    MB3000EBKAB
Serial Number:    Z290FZSJ
LU WWN Device Id: 5 000c50 0357b8ed8
Firmware Version: HPG2
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:  ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Jan 10 16:30:45 2014 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled


This thread talks about Seagate disks and false-positives, which may concern these drives:

http://forums.freenas.org/threads/after-9-1-0-gui-upgrade-bad-message-in-logs.14475/#post-70068

Here's some info about the machine:

Code:
Build    FreeNAS-9.2.0-RC-x64 (93440a9)
Platform    Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
Memory    4074MB
System Time    Fri Jan 10 16:20:08 CST 2014
Uptime    4:20PM up 2:26, 1 user
Load Average    0.69, 0.80, 0.81


Here is the output of "zpool status":

Code:
[root@beta-nas] ~# zpool status -x
  pool: beta
state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
    attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
    using 'zpool clear' or replace the device with 'zpool replace'.
  see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 0 in 3h42m with 0 errors on Sun Dec  8 03:42:38 2013
config:
 
    NAME                                            STATE    READ WRITE CKSUM
    beta                                            ONLINE      0    0    0
      mirror-0                                      ONLINE      0    0    0
        gptid/962c6551-2451-11e3-823b-001d7d06876f  ONLINE      0    0  304
        gptid/967895ce-2451-11e3-823b-001d7d06876f  ONLINE      0    0    0
      mirror-1                                      ONLINE      0    0    0
        gptid/96c60e24-2451-11e3-823b-001d7d06876f  ONLINE      0    0    0
        gptid/97127c36-2451-11e3-823b-001d7d06876f  ONLINE      0    0  294
 
errors: No known data errors


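The example output at http://illumos.org/msg/ZFS-8000-9P looks very different from mine above: there the affected device is marked as something other than ONLINE, whereas here every device is ONLINE and only the CKSUM counts are non-zero.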
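
For readers skimming the "zpool status" output above, the detail that matters is the CKSUM column: two mirror members carry non-zero counts (304 and 294) even though every device is ONLINE. The following is a minimal sketch, not part of the original post, that pulls such lines out of the text. The function name nonzero_cksum is made up for illustration, and it assumes the five-column layout (NAME STATE READ WRITE CKSUM) shown above with plain integer counts.

```shell
# Print "device cksum_count" for every device line whose CKSUM column is non-zero.
# Reads `zpool status` text on stdin.
# Assumption: device lines have exactly five fields (NAME STATE READ WRITE CKSUM)
# and the three counters are plain integers, as in the output above.
nonzero_cksum() {
    awk 'NF == 5 && $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ && $5 != 0 { print $1, $5 }'
}

# Intended use on the NAS itself (not executed here):
#   zpool status beta | nonzero_cksum
```

Lines with abbreviated counts or a trailing note after the CKSUM column would be skipped by this filter, so treat an empty result as "nothing matched", not as proof that the pool is clean.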

Here's the contents of "smartd.conf":

Code:
[root@beta-nas] ~# cat /usr/local/etc/smartd.conf
################################################
# smartd.conf generated by /etc/rc.d/ix-smartd
################################################
/dev/ada1 -a -n never -W 0,0,0 -m xxx@gmail.com
/dev/ada2 -a -n never -W 0,0,0 -m xxx@gmail.com
/dev/ada3 -a -n never -W 0,0,0 -m xxx@gmail.com
/dev/ada4 -a -n never -W 0,0,0 -m xxx@gmail.com


The command "smartctl -l selftest" returns immediately for the following device IDs:

Code:
[root@beta-nas] ~# smartctl -l selftest /dev/ada1
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
SMART support is: Unavailable - Packet Interface Devices [this device: Reserved] don't support ATA SMART
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
[root@beta-nas] ~# smartctl -l selftest /dev/ada4
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline      Completed without error      00%        0        -
 
[root@beta-nas] ~# smartctl -l selftest /dev/ada2
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline      Completed without error      00%        0        -
 
[root@beta-nas] ~# smartctl -l selftest /dev/ada3
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline      Completed without error      00%        0        -


Is there cause for concern? Are these disks going to have issues going forward? The warranty has expired for them (February 2013).

I just got an email about 9.2 Stable. Is it safe to install it on this machine at this point?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Well you didn't get what you bought. The first drive is clearly an 80GB drive, not 3TB. The 3TB drives look to be server grade. Did you buy used hard drives? And the warranty expires next month? Hope you didn't pay much for them. Run 'smartctl -a /dev/ada0' and post the results for each drive. We can tell you if it was a good deal or not.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Money says these are used drives. The self-test logs also show that no test has been run on the drives since power-on hour 0. So my guess, if they are used, is that the seller pulled them from a server and sold them as-is with no testing!
 

leoj3n

Dabbler
Joined
Jan 10, 2014
Messages
18
I received these "Seagate/HP" drives on 10/16/2012. I tested them using Seatools on a Windows box before keeping them. No issues and they were definitely new. They were reasonably priced and, at the time, I couldn't wait for a return to go through before shopping again.

I forgot to mention the 80GB is an old IDE drive from a long-discarded Dell machine. It's what's running the FreeNAS OS.

The fourth "Seagate/HP" is in a ReadyNAS unit which WAS reporting problems and forcing rebuilds because of the drive, but I've had that machine powered off for a couple of months. Inside the ReadyNAS machine are two other LEGITIMATE Seagate Constellations, which have never caused issues (which is why I bought these "Seagate/HP" drives in the first place).

Here's "smartctl" with the "-a" flag:

Code:
[root@beta-nas] ~# smartctl -a /dev/ada0
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Model Family:    Seagate Barracuda ATA IV
Device Model:    ST380021A
Serial Number:    3HV15Z4H
Firmware Version: 3.10
User Capacity:    80,025,280,000 bytes [80.0 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:  ATA/ATAPI-5 (minor revision not indicated)
Local Time is:    Fri Jan 10 21:24:05 2014 CST
SMART support is: Available - device has SMART capability.
SMART support is: Disabled
 
SMART Disabled. Use option -s with argument 'on' to enable it.
(override with '-T permissive' option)
 
 
 
[root@beta-nas] ~# smartctl -a /dev/ada1
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Device Model:    ���S���S���S���S���S���S���S���S���S���S
Serial Number:    �S���S���S���S���S��
Firmware Version: ���S���S
Rotation Rate:    52462 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:  Unknown(0xd953) (unknown minor revision code: 0xccee)
Local Time is:    Fri Jan 10 21:24:18 2014 CST
SMART support is: Unavailable - Packet Interface Devices [this device: Reserved] don't support ATA SMART
 
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
 
 
 
[root@beta-nas] ~# smartctl -a /dev/ada2
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Device Model:    MB3000EBKAB
Serial Number:    Z290E0SL
LU WWN Device Id: 5 000c50 0355a7b88
Firmware Version: HPG2
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:  ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Jan 10 21:24:24 2014 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
 
General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (  0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:        (  609) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (  2) minutes.
Extended self-test routine
recommended polling time:      ( 443) minutes.
Conveyance self-test routine
recommended polling time:      (  3) minutes.
SCT capabilities:            (0x103d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.
 
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x000f  082  063  044    Pre-fail  Always      -      176846932
  3 Spin_Up_Time            0x0003  092  091  070    Pre-fail  Always      -      0
  4 Start_Stop_Count        0x0032  100  100  020    Old_age  Always      -      16
  5 Reallocated_Sector_Ct  0x0033  100  100  036    Pre-fail  Always      -      0
  7 Seek_Error_Rate        0x000f  069  060  030    Pre-fail  Always      -      4305107144
  9 Power_On_Hours          0x0032  098  098  000    Old_age  Always      -      2505
10 Spin_Retry_Count        0x0013  100  100  097    Pre-fail  Always      -      0
12 Power_Cycle_Count      0x0032  100  100  020    Old_age  Always      -      22
180 Unknown_HDD_Attribute  0x003b  100  100  030    Pre-fail  Always      -      1235605971
184 End-to-End_Error        0x0032  100  100  003    Old_age  Always      -      0
187 Reported_Uncorrect      0x0032  100  100  000    Old_age  Always      -      0
188 Command_Timeout        0x0032  100  100  000    Old_age  Always      -      0
189 High_Fly_Writes        0x003a  100  100  000    Old_age  Always      -      0
190 Airflow_Temperature_Cel 0x0022  071  056  045    Old_age  Always      -      29 (Min/Max 27/31)
191 G-Sense_Error_Rate      0x0032  100  100  000    Old_age  Always      -      0
192 Power-Off_Retract_Count 0x0032  100  100  000    Old_age  Always      -      10
193 Load_Cycle_Count        0x0032  100  100  000    Old_age  Always      -      22
194 Temperature_Celsius    0x0022  029  044  000    Old_age  Always      -      29 (0 18 0 0 0)
195 Hardware_ECC_Recovered  0x001a  022  008  000    Old_age  Always      -      176846932
196 Reallocated_Event_Count 0x0033  100  100  036    Pre-fail  Always      -      0
197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0010  100  100  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x003e  200  200  000    Old_age  Always      -      3
 
SMART Error Log Version: 1
No Errors Logged
 
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline      Completed without error      00%        0        -
 
SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 
 
 
[root@beta-nas] ~# smartctl -a /dev/ada3
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Device Model:    MB3000EBKAB
Serial Number:    Z290NZXA
LU WWN Device Id: 5 000c50 035a2cdf9
Firmware Version: HPG2
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:  ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Jan 10 21:25:18 2014 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
 
General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (  0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:        (  609) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (  2) minutes.
Extended self-test routine
recommended polling time:      ( 459) minutes.
Conveyance self-test routine
recommended polling time:      (  3) minutes.
SCT capabilities:            (0x103d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.
 
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x000f  079  063  044    Pre-fail  Always      -      91044045
  3 Spin_Up_Time            0x0003  093  091  070    Pre-fail  Always      -      0
  4 Start_Stop_Count        0x0032  100  100  020    Old_age  Always      -      19
  5 Reallocated_Sector_Ct  0x0033  100  100  036    Pre-fail  Always      -      0
  7 Seek_Error_Rate        0x000f  070  060  030    Pre-fail  Always      -      4305789350
  9 Power_On_Hours          0x0032  098  098  000    Old_age  Always      -      2510
10 Spin_Retry_Count        0x0013  100  100  097    Pre-fail  Always      -      0
12 Power_Cycle_Count      0x0032  100  100  020    Old_age  Always      -      25
180 Unknown_HDD_Attribute  0x003b  100  100  030    Pre-fail  Always      -      154949758
184 End-to-End_Error        0x0032  100  100  003    Old_age  Always      -      0
187 Reported_Uncorrect      0x0032  100  100  000    Old_age  Always      -      0
188 Command_Timeout        0x0032  100  100  000    Old_age  Always      -      0
189 High_Fly_Writes        0x003a  100  100  000    Old_age  Always      -      0
190 Airflow_Temperature_Cel 0x0022  068  058  045    Old_age  Always      -      32 (Min/Max 29/33)
191 G-Sense_Error_Rate      0x0032  100  100  000    Old_age  Always      -      0
192 Power-Off_Retract_Count 0x0032  100  100  000    Old_age  Always      -      13
193 Load_Cycle_Count        0x0032  100  100  000    Old_age  Always      -      25
194 Temperature_Celsius    0x0022  032  042  000    Old_age  Always      -      32 (0 20 0 0 0)
195 Hardware_ECC_Recovered  0x001a  019  009  000    Old_age  Always      -      91044045
196 Reallocated_Event_Count 0x0033  100  100  036    Pre-fail  Always      -      0
197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0010  100  100  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x003e  200  200  000    Old_age  Always      -      0
 
SMART Error Log Version: 1
No Errors Logged
 
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline      Completed without error      00%        0        -
 
SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 
 
 
[root@beta-nas] ~# smartctl -a /dev/ada4
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Device Model:    MB3000EBKAB
Serial Number:    Z290FZSJ
LU WWN Device Id: 5 000c50 0357b8ed8
Firmware Version: HPG2
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:  ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Jan 10 21:25:21 2014 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
 
General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (  0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:        (  609) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (  2) minutes.
Extended self-test routine
recommended polling time:      ( 450) minutes.
Conveyance self-test routine
recommended polling time:      (  3) minutes.
SCT capabilities:            (0x103d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.
 
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x000f  082  063  044    Pre-fail  Always      -      201194706
  3 Spin_Up_Time            0x0003  092  091  070    Pre-fail  Always      -      0
  4 Start_Stop_Count        0x0032  100  100  020    Old_age  Always      -      18
  5 Reallocated_Sector_Ct  0x0033  100  100  036    Pre-fail  Always      -      0
  7 Seek_Error_Rate        0x000f  069  060  030    Pre-fail  Always      -      4305390107
  9 Power_On_Hours          0x0032  098  098  000    Old_age  Always      -      2510
10 Spin_Retry_Count        0x0013  100  100  097    Pre-fail  Always      -      0
12 Power_Cycle_Count      0x0032  100  100  020    Old_age  Always      -      24
180 Unknown_HDD_Attribute  0x003b  100  100  030    Pre-fail  Always      -      74037671
184 End-to-End_Error        0x0032  100  100  003    Old_age  Always      -      0
187 Reported_Uncorrect      0x0032  100  100  000    Old_age  Always      -      0
188 Command_Timeout        0x0032  099  088  000    Old_age  Always      -      292062232644
189 High_Fly_Writes        0x003a  095  095  000    Old_age  Always      -      5
190 Airflow_Temperature_Cel 0x0022  069  055  045    Old_age  Always      -      31 (Min/Max 29/32)
191 G-Sense_Error_Rate      0x0032  100  100  000    Old_age  Always      -      0
192 Power-Off_Retract_Count 0x0032  100  100  000    Old_age  Always      -      12
193 Load_Cycle_Count        0x0032  100  100  000    Old_age  Always      -      24
194 Temperature_Celsius    0x0022  031  045  000    Old_age  Always      -      31 (0 20 0 0 0)
195 Hardware_ECC_Recovered  0x001a  022  008  000    Old_age  Always      -      201194706
196 Reallocated_Event_Count 0x0033  100  100  036    Pre-fail  Always      -      0
197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0010  100  100  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x003e  200  197  000    Old_age  Always      -      3432
 
SMART Error Log Version: 1
No Errors Logged
 
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline      Completed without error      00%        0        -
 
SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 

leoj3n

Dabbler
Joined
Jan 10, 2014
Messages
18
Actually, I just physically checked: I have FIVE drives in the FreeNAS box.

So, FOUR "Seagate/HP" drives in addition to ONE 80GB IDE.

That's why I was able to run "smartctl" on ada0 through ada4.

So, ada1 seems to be reporting all "?" marks. What exactly does that mean? I'm not getting any scary warnings through the FreeNAS web GUI. Is ada1 dead and do I need to replace it?

Here's Storage → Volumes → View Volumes:

[Attachment: volumes.png]


And here's Storage → Volumes → View Disks:

[Attachment: disks.png]
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
ada1 I'd definitely not use. Something is wrong if a drive is providing trash characters like yours.

ada2 through ada4 look good. ada4 probably has a bad SATA cable (indicated by its UDMA CRC error count of 3432), and I'd replace the cable at your next convenience.
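
The two observations above (an unbelievable identity block on ada1, CRC errors on ada4) can be checked mechanically. This is a rough sketch, not something posted in the thread: the function names are invented for illustration, the 15000 rpm ceiling is an assumed upper bound for spinning disks, and the field positions assume the smartctl 6.2 attribute table layout shown earlier.

```shell
# Both helpers read smartctl text on stdin. Names are illustrative only.

# Print the RAW_VALUE (10th column) of attribute 199, UDMA_CRC_Error_Count,
# from `smartctl -A` or `smartctl -a` output.
crc_error_count() {
    awk '$1 == 199 { print $10 }'
}

# Fail (exit 1) if `smartctl -i` output reports a numeric rotation rate above
# 15000 rpm. The ceiling is an assumption, not a smartctl rule. Output with no
# numeric "Rotation Rate:" line (such as ada0 above) is not flagged.
rotation_rate_plausible() {
    awk '/^Rotation Rate:/ && $3 ~ /^[0-9]+$/ { if ($3 + 0 > 15000) bad = 1 }
         END { exit bad + 0 }'
}

# Intended use on the NAS itself (not executed here):
#   smartctl -A /dev/ada4 | crc_error_count
#   smartctl -i /dev/ada1 | rotation_rate_plausible || echo "ada1: identity data looks corrupt"
```

The CRC counter is cumulative, so it will not drop back to zero after a cable swap. Note the value before replacing the cable and check later that it has stopped climbing.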

I'd do a SMART short and long test on all of the disks. Here's how...

smartctl -t short /dev/adaX
(wait 5 minutes before doing next test on the same drive again)
smartctl -t long /dev/adaX
(wait about 459 minutes before checking results)

Don't power down, reboot, or do a pool scrub during these tests. You can use the server as you normally do, just try not to put a lot of load on the server because it'll slow down the test.

Then do smartctl -a /dev/adaX on the drives and look at the SMART self-test log portion of the output. If both tests show "Completed without error", your drives passed.
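
The routine above could be scripted roughly as follows. This is only a sketch under stated assumptions: latest_selftest_ok is a made-up helper name, and it relies on the self-test log layout shown earlier in the thread, where the newest entry is the line starting with "# 1".

```shell
# Succeed if the newest self-test log entry ("# 1") finished cleanly.
# Reads `smartctl -l selftest` output on stdin.
latest_selftest_ok() {
    awk '$1 == "#" && $2 == "1" { found = 1; ok = ($0 ~ /Completed without error/) }
         END { exit !(found && ok) }'
}

# Intended use on the NAS itself (not executed here), one drive at a time:
#   smartctl -t short /dev/ada2        # starts the test and returns immediately
#   sleep 300                          # the 5 minute wait suggested above
#   smartctl -l selftest /dev/ada2 | latest_selftest_ok && echo "ada2: short test OK"
#   smartctl -t long /dev/ada2         # then wait the recommended polling time
```

The drives in this thread already have a passing "Short offline" entry logged at LifeTime 0 hours, so also compare the entry's LifeTime(hours) with the drive's Power_On_Hours to make sure you are looking at the new result and not the old one.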
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
btw, ZFS on FreeNAS requires a minimum of 8 GB of RAM. You only have 4.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Unfortunately, I assumed you had recently purchased these drives from a scam artist.

I agree with Cyberjock, you need to pull the ada1 drive. If it's still under warranty then I'd pursue that immediately.
 

JohnK

Patron
Joined
Nov 7, 2013
Messages
256
Rotation Rate: 52462 rpm - That is one seriously fast drive
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
JohnK said:
Rotation Rate: 52462 rpm - That is one seriously fast drive

It's branded as "Barracuda Bull-S" or the "Barracuda Cheatcha". I'm sure there are other appropriate names.
 

leoj3n

Dabbler
Joined
Jan 10, 2014
Messages
18
gpsguy said:
btw, ZFS on FreeNAS requires a minimum of 8 GB of RAM. You only have 4.

Sh!t. Thank you for noticing. I THOUGHT I had 8GB total memory. Turns out I misunderstood the "2x2" labeling: http://www.avadirect.com/product_details_parts.asp?PRID=11578

I've added two more sticks for a total of "8170MB" memory. Thanks again for noticing!

Regarding the failing disk:

I left the machine off for a night and am still getting bad readings from ada1. I have an equal size [3.0 TB] spare ready to go, but I'm having trouble following the wiki:

http://doc.freenas.org/index.php/Volumes#Replacing_a_Failed_Drive

When I check Volume Status in the web GUI, all rows under the Status column say ONLINE. Does this have anything to do with being "AHCI capable"?

[Attachment: Volume_Status.png]


Sorry for the ignorance, but how do I go about replacing ada1?
 

leoj3n

Dabbler
Joined
Jan 10, 2014
Messages
18
I just received this message on the monitor connected to the FreeNAS box:

Code:
Jan 13 18:21:37 beta-nas smartd[2793]: Device: /dev/ada1, FAILED SMART self-check. BACK UP NOW!


In the Web GUI, I still see no UNHEALTHY, DEGRADED, or OFFLINE messages, and no drives/devices are missing from the GUI.

Is it possible to use the command-line instead of the Web GUI to replace ada1?
 

leoj3n

Dabbler
Joined
Jan 10, 2014
Messages
18
Ffffk, just found that clicking to highlight "ada1p2" shows the "Offline" button under the Volume Status screen. That was not intuitive at all! >:]
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Ffffk, just found that clicking to highlight "ada1p2" shows the "Offline" button under the Volume Status screen. That was not intuitive at all! >:]

No, but the manual is very intuitive! It guides you step by step! Check it out: 6.3.12.

Yes, I know it by heart because so many people don't want to be inconvenienced with reading the manual!
 

leoj3n

Dabbler
Joined
Jan 10, 2014
Messages
18
I doubt this is the place for feature suggestions, but it really could say something like "Please click a disk's entry".

I'm REALLY glad I posted here, and grateful for everyone's help. I was oblivious to the fact that I was running half the minimum RAM. I've added two more sticks for a total of 8GB.

Resilvering is in progress! After that I'll update the OS and have a drink.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
:D
 

leoj3n

Dabbler
Joined
Jan 10, 2014
Messages
18
After resilvering finished, I checked the status of zpool:

Code:
[root@beta-nas] ~# zpool status
  pool: beta
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
    attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
    using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: resilvered 818G in 3h23m with 0 errors on Mon Jan 13 23:32:39 2014
config:
 
    NAME                                            STATE    READ WRITE CKSUM
    beta                                            ONLINE      0    0    0
      mirror-0                                      ONLINE      0    0    0
        gptid/dd44cd8f-7cc0-11e3-ab3a-001d7d06876f  ONLINE      0    0    0
        gptid/967895ce-2451-11e3-823b-001d7d06876f  ONLINE      0    0    0
      mirror-1                                      ONLINE      0    0    0
        gptid/96c60e24-2451-11e3-823b-001d7d06876f  ONLINE      0    0    0
        gptid/97127c36-2451-11e3-823b-001d7d06876f  ONLINE      0    0    83
 
errors: No known data errors


The CKSUM column shows 83 checksum errors on one disk in mirror-1.
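
For anyone who wants to pull just the rows with non-zero counters out of output like the above, here is a rough sketch. The function name is my own invention, and it assumes the column order shown here (NAME, STATE, READ, WRITE, CKSUM).

```shell
#!/bin/sh
# Hypothetical helper (not something FreeNAS or ZFS ships).
# Reads `zpool status` output on stdin and prints each pool, vdev, or disk
# row whose READ, WRITE, or CKSUM counter is anything other than "0".
zpool_error_rows() {
    awk '
        # Only config rows have a device state in the second column.
        $2 ~ /^(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ &&
        NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            print $1, "READ=" $3, "WRITE=" $4, "CKSUM=" $5
        }
    '
}
```

Run as `zpool status | zpool_error_rows`. Against the output above it would print a single line naming gptid/97127c36-2451-11e3-823b-001d7d06876f with CKSUM=83; no output means all counters are zero.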

I then restarted from the Web GUI. The machine failed to get past the BIOS screen. It showed it had detected Master 1, Master 2, Slave 1, but not Slave 2 (the one that just resilvered). It just sat there blinking. I tried turning the box on and off but more of the same. Then I powered off, disconnected the newly resilvered drive, and it booted successfully. I turned it off again, replaced the SATA cable with a spare, and now it booted.

However, I think that on this last boot it resilvered a small amount of data (124K), and 3 CKSUM errors appeared on "Master 1":

Code:
[root@beta-nas] ~# zpool status
  pool: beta
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
    attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
    using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: resilvered 124K in 0h0m with 0 errors on Tue Jan 14 02:21:13 2014
config:
 
    NAME                                            STATE    READ WRITE CKSUM
    beta                                            ONLINE      0    0    0
      mirror-0                                      ONLINE      0    0    0
        gptid/dd44cd8f-7cc0-11e3-ab3a-001d7d06876f  ONLINE      0    0    3
        gptid/967895ce-2451-11e3-823b-001d7d06876f  ONLINE      0    0    0
      mirror-1                                      ONLINE      0    0    0
        gptid/96c60e24-2451-11e3-823b-001d7d06876f  ONLINE      0    0    0
        gptid/97127c36-2451-11e3-823b-001d7d06876f  ONLINE      0    0    0
 
errors: No known data errors


In the Web GUI there was an orange "Warning" about resilvering. Now the latest alert is "OK: The volume beta (ZFS) status is HEALTHY".

Running smartctl on ada0-4 passes for all drives.

I'm currently setting up my old ReadyNAS to back up all the FreeNAS data, and am not going to power down the FreeNAS until then.

Hopefully that's the last of the hiccups, and I can finally update to the latest FreeNAS OS and run "zpool upgrade".
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
You "ran smartctl" on all 4 drives. What does that even mean? You checked their parameters? You ran SMART tests?

I'd say you should run short and long tests on all of your disks. You appear to have more serious problems if you have CKSUM errors.
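
A minimal sketch of kicking off the same test on several disks at once. The function name is hypothetical and the device names in the usage line are only an assumption about how the disks are numbered on this box. It prints the commands instead of running them, so you can check them first.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the smartctl command that starts a
# self-test on each disk. $1 is the test type, the rest are device names.
smart_test_cmds() {
    test_type=$1
    # Only the two test types discussed in this thread are accepted.
    case $test_type in
        short|long) ;;
        *) echo "usage: smart_test_cmds short|long disk..." >&2; return 1 ;;
    esac
    shift
    for disk in "$@"; do
        printf 'smartctl -t %s /dev/%s\n' "$test_type" "$disk"
    done
}
```

For example, `smart_test_cmds short ada0 ada1 ada2 ada3` lists the four commands, and piping that into `sh` runs them. Let the short tests finish before starting the long ones.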
 

leoj3n

Dabbler
Joined
Jan 10, 2014
Messages
18
I did "smartctl" with the "-a" flag, and they returned with:
SMART overall-health self-assessment test result: PASSED

Anyway, all the data is now backed up to the ReadyNAS after a very long rsync.

Now feeling safe, I upgraded the firmware to FreeNAS-9.2.0-RELEASE-x64 (ab098f4).

After that, I finally ran "zpool upgrade". Here's the result of "zpool status":

Code:
  pool: beta
 state: ONLINE
  scan: resilvered 124K in 0h0m with 0 errors on Tue Jan 14 02:21:13 2014
config:
 
    NAME                                            STATE    READ WRITE CKSUM
    beta                                            ONLINE      0    0    0
      mirror-0                                      ONLINE      0    0    0
        gptid/dd44cd8f-7cc0-11e3-ab3a-001d7d06876f  ONLINE      0    0    0
        gptid/967895ce-2451-11e3-823b-001d7d06876f  ONLINE      0    0    0
      mirror-1                                      ONLINE      0    0    0
        gptid/96c60e24-2451-11e3-823b-001d7d06876f  ONLINE      0    0    0
        gptid/97127c36-2451-11e3-823b-001d7d06876f  ONLINE      0    0    0
 
errors: No known data errors


Thanks again for help in previous posts.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I did "smartctl" with the "-a" flag, and they returned with:

SMART overall-health self-assessment test result: PASSED

That doesn't mean what you think it means. It means that of all of the SMART tests you've run none have failed. Well, guess what? If you don't do any SMART tests then none failed.

You should do -a and check out the parameters and raw values to figure out the health of the disk. That, or do a SMART short and long test and THEN look at that value again. Long tests can take 6+ hours but the -a will tell you the estimated time for the test under idle conditions. Even then, it is possible to have a bad condition on your disks that will pass short and long tests. So the -a and parameter review is the best way to determine your drive's health.
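
As a rough sketch of that parameter review, the filter below picks out three attributes from the -a output and prints them only if their raw value is non-zero. Which attributes to watch (IDs 5, 197, and 198) is a common rule of thumb and my assumption here, not a complete health check, and the function name is made up.

```shell
#!/bin/sh
# Hypothetical helper (not part of smartmontools).
# Reads `smartctl -a` or `smartctl -A` output on stdin and prints the name
# and raw value of attributes 5, 197, and 198 when the raw value is above 0.
# Assumes the standard ATA attribute table, where the raw value is column 10.
smart_bad_sectors() {
    awk '
        ($1 == 5 || $1 == 197 || $1 == 198) &&
        $3 ~ /^0x/ && NF >= 10 && $10 + 0 > 0 {
            print $2, $10
        }
    '
}
```

Usage: `smartctl -a /dev/adaX | smart_bad_sectors`. Empty output only means those three counters are zero; the rest of the table still deserves a look.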

And the fact that 124K was resilvered recently suggests that something is not quite right. You shouldn't have to resilver any data on scrubs.
 