DEGRADED storage pool


Paul Bearne

Dabbler
Joined
Jun 29, 2015
Messages
21
Hi All
01_storage_pool DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
gptid/3cd0d0b1-f380-11e4-9ca0-d050993846a8 ONLINE 0 0 0
gptid/3da87d6e-f380-11e4-9ca0-d050993846a8 DEGRADED 0 0 861 too many errors
mirror-1 ONLINE 0 0 0
gptid/3e7f6ea0-f380-11e4-9ca0-d050993846a8 ONLINE 0 0 0
gptid/3f60316b-f380-11e4-9ca0-d050993846a8 ONLINE 0 0 0

This is in a new FreeNAS Mini with 6 TB Reds.

It looks to me like one of the drives has problems and is creating CKSUM errors.

What tests should I run, if any, to confirm this, or do I just return it?

Has anybody had problems sending Reds back?

If I return it, how do I work out which drive I need to remove (short of pulling one and checking whether it is the right drive)?

Returning it will mean that I will be down a drive for at least a couple of weeks. I am assuming that I will have to just live with FreeNAS sending emails each day until I replace the drive, or can I set it to ignore this for a bit?

Many thanks for answers to the questions and any other advice that I need but didn't know to ask :smile:
 

ethereal

Guru
Joined
Sep 10, 2012
Messages
762
i take the bad drive out and put it in my windows box (or an external hdd case), then run western digital data lifeguard diagnostics on it. the extended test checks every sector on the hdd. if it fails and the drive is in warranty, return it to wd. virtually all of my hdds are wd and i have never had any problems with returns; in my experience wd is one of the best.

to identify the bad drive - in the gui - hit storage and then view disks - this will give you all the information you need to see which one has the problem (the name and serial especially)
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Before you pull the disk, you could try checking out its SMART status. Go into Storage -> view disks to see which disk it is, then from the CLI, do "smartctl -a /dev/ada#", where # corresponds with the disk that's showing the checksum errors. Post the output here (in code tags--click the Insert... button in the editor toolbar, then select Code) and we can see what it shows.
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Also, it's a good idea to set up a cron job to run @Bidule0hm's SMART script daily. I'm going through an RMA right now. Pretty painless. I just went to WDC.com, entered the serial number, chose the "experiencing SMART errors" option and chose the advance replacement option. I gave them a CC (they place a hold on the card), they sent me a new drive, I then sent the old one back, and they released the CC hold.

As for identifying the drive, I use dd. Once you have run the SMART test and identified the device, just run:

Code:
dd if=/dev/ada## of=/dev/null bs=4096


And then look for the solid drive activity light.
 

Paul Bearne

Dabbler
Joined
Jun 29, 2015
Messages
21
Hi guys

Thanks for the feedback.
The report lists the bad drive as gptid/3da87d6e-f380-11e4-9ca0-d050993846a8

All the commands / View Disks list the drives as ada0-4

Do I assume that the second drive listed in the report is ada1?

The dd will help.

Paul
 

Paul Bearne

Dabbler
Joined
Jun 29, 2015
Messages
21
Here are ada0 and ada1.

I am running tests on ada3 and ada4.

Code:
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD60EFRX-68MYMN1
Serial Number:    WD-WX21DA4D1JHA
LU WWN Device Id: 5 0014ee 20b55c1d2
Firmware Version: 82.00A82
User Capacity:    6,001,175,126,016 bytes [6.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5700 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Jun 29 10:57:11 2015 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                ( 7784) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 731) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   197   197   021    Pre-fail  Always       -       9125
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       10
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1309
10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       10
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       33
194 Temperature_Celsius     0x0022   120   114   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1285         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

[pbearne@bearne-nas ~]$ sudo smartctl -a /dev/ada0
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p16 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD60EFRX-68MYMN1
Serial Number:    WD-WX21DC42EEUP
LU WWN Device Id: 5 0014ee 261226bf5
Firmware Version: 82.00A82
User Capacity:    6,001,175,126,016 bytes [6.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5700 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Jun 29 10:57:54 2015 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                ( 4784) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 702) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   197   196   021    Pre-fail  Always       -       9150
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       10
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1309
10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       10
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       81
194 Temperature_Celsius     0x0022   119   113   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1285         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

[pbearne@bearne-nas ~]$ sudo smartctl -a /dev/ada2
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p16 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Apacer SSD
Device Model:     16GB SATA Flash Drive
Serial Number:    B051448130070000027F
Firmware Version: SFDE001A
User Capacity:    16,013,942,784 bytes [16.0 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Jun 29 10:58:51 2015 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 255) Self-test routine in progress...
                                        150% of test remaining.
Total time to complete Offline
data collection:                (   32) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (   1) minutes.
SCT capabilities:              (0x0039) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       1489
12 Power_Cycle_Count       0x0012   100   100   000    Old_age   Always       -       20
167 SSD_Protect_Mode        0x0022   100   100   000    Old_age   Always       -       0
168 SATA_PHY_Err_Ct         0x0012   100   100   000    Old_age   Always       -       0
175 Program_Fail_Count_Chip 0x0013   100   100   010    Pre-fail  Always       -       0
192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       -       11
194 Temperature_Celsius     0x0023   100   100   000    Pre-fail  Always       -       40 (Min/Max 30/60)
163 Max_Erase_Count         0x0000   100   100   001    Old_age   Offline      -       345
164 Average_Erase_Count     0x0000   100   100   001    Old_age   Offline      -       299
166 Later_Bad_Block_Count   0x0000   100   100   010    Old_age   Offline      -       0
241 Total_LBAs_Written      0x0000   100   100   000    Old_age   Offline      -       3487039847

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported
 

Paul Bearne

Dabbler
Joined
Jun 29, 2015
Messages
21
and ada3 and 4

Code:
sudo smartctl -a /dev/ada3
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p16 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD60EFRX-68MYMN1
Serial Number:    WD-WX31DA43KK1C
LU WWN Device Id: 5 0014ee 20b5604a6
Firmware Version: 82.00A82
User Capacity:    6,001,175,126,016 bytes [6.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5700 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Jun 29 11:03:24 2015 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (  41) The self-test routine was interrupted
                                        by the host with a hard or soft reset.
Total time to complete Offline
data collection:                ( 5024) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 703) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   199   199   021    Pre-fail  Always       -       9033
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       10
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1309
10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       10
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       34
194 Temperature_Celsius     0x0022   120   114   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Interrupted (host reset)      90%      1309         -
# 2  Short offline       Completed without error       00%      1285         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

[pbearne@bearne-nas ~]$ sudo smartctl -a /dev/ada4
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p16 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD60EFRX-68MYMN1
Serial Number:    WD-WX31DC4CKJZS
LU WWN Device Id: 5 0014ee 2b67481db
Firmware Version: 82.00A82
User Capacity:    6,001,175,126,016 bytes [6.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5700 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Jun 29 11:03:28 2015 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                ( 3344) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 687) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   191   191   021    Pre-fail  Always       -       9416
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       10
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1309
10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       10
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       33
194 Temperature_Celsius     0x0022   119   113   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1309         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 

Paul Bearne

Dabbler
Joined
Jun 29, 2015
Messages
21
I have spotted that ada2 hasn't been tested.

I tried to test it:
Code:
smartctl -t short /dev/ada2
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p16 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Can't start self-test without aborting current test (150% remaining),
add '-t force' option to override, or run 'smartctl -X' to abort test.


and the -X didn't work

Code:
[pbearne@bearne-nas ~]$ sudo smartctl -X /dev/ada2
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p16 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Abort SMART off-line mode self-test routine".
Self-testing aborted!
[pbearne@bearne-nas ~]$ sudo smartctl -t short /dev/ada2
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p16 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Can't start self-test without aborting current test (150% remaining),
add '-t force' option to override, or run 'smartctl -X' to abort test.
 

Paul Bearne

Dabbler
Joined
Jun 29, 2015
Messages
21
Code:
 zpool status
  pool: bearne_01_storage_pool
state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 0 in 3h14m with 0 errors on Sun Jun 14 03:14:18 2015
config:

        NAME                                            STATE     READ WRITE CKSUM
        bearne_01_storage_pool                          DEGRADED     0     0     0
          mirror-0                                      DEGRADED     0     0     0
            gptid/3cd0d0b1-f380-11e4-9ca0-d050993846a8  ONLINE       0     0     0
            gptid/3da87d6e-f380-11e4-9ca0-d050993846a8  DEGRADED     0     0  1020  too many errors
          mirror-1                                      ONLINE       0     0     0
            gptid/3e7f6ea0-f380-11e4-9ca0-d050993846a8  ONLINE       0     0     0
            gptid/3f60316b-f380-11e4-9ca0-d050993846a8  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Thu Jun 11 03:45:31 2015
config:

        NAME                                          STATE     READ WRITE CKSUM
        freenas-boot                                  ONLINE       0     0     0
          gptid/41aa1718-c73c-11e4-944f-d050993846a8  ONLINE       0     0     0

errors: No known data errors
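
For anyone reading along: on a pool with many disks it can help to filter the status output down to the devices that actually have errors. Below is a minimal sketch, assuming the usual `NAME STATE READ WRITE CKSUM` column layout shown above. The function name `zpool_error_devices` is made up for this example and is not part of ZFS or FreeNAS.

```shell
#!/bin/sh
# Hypothetical helper: read `zpool status` output on stdin and print every
# device line whose READ, WRITE or CKSUM counter is non-zero.
# Assumes the column layout: NAME STATE READ WRITE CKSUM [notes...]
zpool_error_devices() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }  # header row of a config table
        $1 == "errors:"               { in_config = 0; next }  # end of that pool section
        in_config && NF >= 5 && ($3 + $4 + $5) > 0 {
            printf "%s read=%s write=%s cksum=%s\n", $1, $3, $4, $5
        }
    '
}

# Usage on a live system:
#   zpool status | zpool_error_devices
```

With the output above, this should leave only the one gptid line that shows the checksum errors.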
 

ethereal

Guru
Joined
Sep 10, 2012
Messages
762
okay - you know that
gptid/3da87d6e-f380-11e4-9ca0-d050993846a8
is the problem

in the cli type

glabel status

this gives you the name, status and components of each label

find gptid/3da87d6e-f380-11e4-9ca0-d050993846a8 - under components it will say something like da0p2 or ada0p2 or ada0p1 - ignore the p2 or p1, that is just the partition number

type smartctl -i /dev/ada0 (or whichever device is the bad one)

this will give you the serial number, which is also printed on the drive label, and now you know which one to remove
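
The lookup described above can be sketched as a small shell function. This is only an illustration: `gptid_to_disk` is a made-up name, and it assumes `glabel status` prints three columns (Name, Status, Components) with the partition in the third.

```shell
#!/bin/sh
# Hypothetical helper: given a gptid label as $1, print the disk device behind it
# with the partition suffix (p1, p2, ...) removed.
# Reads `glabel status` output on stdin; assumed columns: Name Status Components.
gptid_to_disk() {
    awk -v label="$1" '$1 == label { sub(/p[0-9]+$/, "", $3); print $3 }'
}

# Usage on a live system:
#   disk=$(glabel status | gptid_to_disk gptid/3da87d6e-f380-11e4-9ca0-d050993846a8)
#   smartctl -i "/dev/$disk" | grep "Serial Number"
```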
 

ethereal

Guru
Joined
Sep 10, 2012
Messages
762
this is Bidule0hm's script - edit the drives part so it suits your setup

Code:
#!/bin/sh
# List each drive's GPTID and serial number in a table.

drives="da0 da1 da2 da3 da4 da5 da6 da7"   # edit to match your system

echo ""
echo "+========+============================================+=================+"
echo "| Device | GPTID                                      | Serial          |"
echo "+========+============================================+=================+"
for drive in $drives
do
    # GPTID label of the drive's second partition
    gptid=$(glabel status -s "${drive}p2" | awk '{print $1}')
    # Serial number as reported by SMART
    serial=$(smartctl -i "/dev/${drive}" | grep "Serial Number" | awk '{print $3}')
    printf "| %-6s | %-42s | %-15s |\n" "$drive" "$gptid" "$serial"
    echo "+--------+--------------------------------------------+-----------------+"
done
echo ""
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Nothing looks obviously wrong with your SMART data, except that you aren't running SMART self-tests regularly. You should schedule them, with a long self-test at least every couple of weeks. Don't worry about ada2; that's your boot device and flash devices don't typically support SMART self-tests.

Signs are pointing toward a cabling issue--are you sure all your SATA cables are securely fastened on both ends?
 

Paul Bearne

Dabbler
Joined
Jun 29, 2015
Messages
21
Yes, I had just worked out that ada2 is the boot device.

Code:
+========+============================================+=================+
| Device | GPTID                                      | Serial          |
+========+============================================+=================+
| ada0   | gptid/3da87d6e-f380-11e4-9ca0-d050993846a8 | WD-WX21DC42EEUP |
+--------+--------------------------------------------+-----------------+
| ada1   | gptid/3e7f6ea0-f380-11e4-9ca0-d050993846a8 | WD-WX21DA4D1JHA |
+--------+--------------------------------------------+-----------------+
| ada2   | gptid/41aa1718-c73c-11e4-944f-d050993846a8 | B051448130070000027F |
+--------+--------------------------------------------+-----------------+
| ada3   | gptid/3cd0d0b1-f380-11e4-9ca0-d050993846a8 | WD-WX31DA43KK1C |
+--------+--------------------------------------------+-----------------+
| ada4   | gptid/3f60316b-f380-11e4-9ca0-d050993846a8 | WD-WX31DC4CKJZS |
+--------+--------------------------------------------+-----------------+


I will reseat the cables.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Run a SMART long test; you can do this from the GUI or the CLI. More information can be found in the FreeNAS manual.
 

Paul Bearne

Dabbler
Joined
Jun 29, 2015
Messages
21
3da87d6e-f380-11e4-9ca0-d050993846a8 maps to ada0

going to run a long test on that drive

Testing has begun.
Please wait 702 minutes for test to complete.
Test will complete after Mon Jun 29 23:31:14 2015
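
A long test like this is typically started with `smartctl -t long /dev/ada0`, and its progress shows up as the newest entry in the self-test log. Here is a rough sketch for pulling out just that entry. `latest_selftest` is a made-up helper name, and it assumes the log format that `smartctl -l selftest` prints, where the newest entry is numbered `# 1`.

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -l selftest` output on stdin and print
# the newest log entry (the one numbered "# 1") without its number.
latest_selftest() {
    awk '/^# *1 / { sub(/^# *1 +/, ""); print; exit }'
}

# Usage on a live system:
#   smartctl -t long /dev/ada0                        # start the extended test
#   smartctl -l selftest /dev/ada0 | latest_selftest  # check on it later
```

While the test is still running, the entry should report it as in progress with a percentage remaining. Once it finishes, the same line shows the result.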
 

Blasm12

Cadet
Joined
Dec 11, 2014
Messages
7
This thread couldn't have been better timed.

If there is a cabling problem... a SATA cable disconnect (or in my case a suspected bad hot-swap SATA bay latch)... can a person reseat/reattach/recable the drive back into the pool and reboot?
Or would it be better to take a clean drive and rebuild the pool?

The question: if a drive accidentally gets detached from the pool, can you/should you reattach it to the pool knowing it's been out?
Or should you wipe it and rebuild the drive (assuming it checks out ok)?

(I'm running raid-z2 so with one parity drive... I turned the system off)

Basic knowledge only, appreciate responses!
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
You can just add it back in. It will resilver, because it might be out of sync, and then it's ready for action.
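
From the CLI, that sequence might look roughly like the sketch below. It only prints the commands (a dry run) so nothing is changed by accident. `reattach_plan`, the pool name and the device label are placeholders for this example. On FreeNAS the GUI is normally the preferred way to do this, and a reconnected disk is often picked up on its own after a reboot.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the commands for bringing a reconnected
# disk back into its pool. $1 = pool name, $2 = device label (both placeholders).
reattach_plan() {
    pool=$1
    dev=$2
    echo "zpool online $pool $dev"   # bring the device back; ZFS resilvers what it missed
    echo "zpool status $pool"        # watch the resilver progress
    echo "zpool clear $pool $dev"    # reset the error counters once the resilver is clean
}

# Usage:
#   reattach_plan tank gptid/xxxxxxxx
```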
 

Paul Bearne

Dabbler
Joined
Jun 29, 2015
Messages
21
Code:
 sudo smartctl -a /dev/ada0

We trust you have received the usual lecture from the local System
Administrator. It usually boils down to these three things:

    #1) Respect the privacy of others.
    #2) Think before you type.
    #3) With great power comes great responsibility.

Password:
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p16 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD60EFRX-68MYMN1
Serial Number:    WD-WX21DC42EEUP
LU WWN Device Id: 5 0014ee 261226bf5
Firmware Version: 82.00A82
User Capacity:    6,001,175,126,016 bytes [6.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5700 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Jun 30 09:41:04 2015 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                ( 4784) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 702) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   205   196   021    Pre-fail  Always       -       8708
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       11
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1332
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       81
194 Temperature_Celsius     0x0022   104   103   000    Old_age   Always       -       48
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Interrupted (host reset)      90%      1310         -
# 2  Short offline       Completed without error       00%      1285         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
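
When reading a dump like this, a handful of attributes usually get looked at first: reallocated and pending sectors (IDs 5, 197, 198) for media problems, and UDMA_CRC_Error_Count (ID 199), which is commonly associated with cable or link trouble. Here is a small sketch to pull out just those raw values. `smart_key_attrs` is a made-up name, and the choice of IDs is an assumption about what matters most, not an official list.

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -A` (or `-a`) output on stdin and print
# NAME=RAW_VALUE for attribute IDs 5, 197, 198 and 199.
# The NF check skips other tables that also start with a number,
# such as the selective self-test spans.
smart_key_attrs() {
    awk 'NF >= 10 && ($1 == 5 || $1 == 197 || $1 == 198 || $1 == 199) { print $2 "=" $10 }'
}

# Usage on a live system:
#   smartctl -A /dev/ada0 | smart_key_attrs
```

In the output above all four of these raw values are 0.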
 