Replaced dead drive in RAIDZ1, now what?

Status
Not open for further replies.

darrenbest

Dabbler
Joined
Apr 7, 2012
Messages
33
Folks, I have a 5-disk RAIDZ1 (all 2TB SATA drives), and one drive died. This is my first experience with this. I replaced the drive with a new one, and FreeNAS recognized the new drive (all 5 drives show up correctly under "Storage -> Volumes -> View Disks", whereas previously only the 4 functioning drives were visible).

When I went to view status, I clicked "Replace" on the dead volume, selected the new drive (the only choice), and clicked OK. Nothing more appeared to happen. I went to the command line and entered "zpool status", and saw that the pool had started a scrub, which would end up taking about 12 hours. The scrub is now complete, but I have no other change.

I now have:
  • one pool (state "DEGRADED")
  • 4 existing volumes (named "ada0p2", "ada1p2", "ada2p2", and "ada4p2") whose state is "ONLINE" (the missing "ada3p2" is the dead drive)
  • 1 new volume (named "gptid/772dfdc9-ef57-11e2-bbc4-001cc05665fb"), state "UNAVAIL", and an additional comment of "cannot open".
Have I missed an obvious step? Does the new drive have to be "formatted" first? I didn't see any steps to this effect in any of the documentation. Or (hopefully not) is there a problem with the drive?
Thanks in advance.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
You did. Go to the manual and look at the section regarding replacing a failed disk. It's also possible that your new disk has failed or another disk just failed.
 

darrenbest

cyberjock said: "You did. Go to the manual and look at the section regarding replacing a failed disk. See what you did wrong."

OK, I've looked again at Section "6.3.11 Replacing a Failed Drive". I still don't know what to do.

From the manual:

1) This step is irrelevant: the volume in question was already offline.
2) I did shut down the machine and replaced the failed disk.
3) I clicked the "Replace Disk" button after selecting the new drive (verified from the "Storage -> Volumes -> View Disks" screen). Then comes this from the manual: "If the disk is being added to a ZFS pool, it will start to resilver." Obviously, this is what I want, but it did not happen.
4) I have not yet got to this step.

So, sorry if I'm asking the same question again, but what did I miss? Your response implies it was simple and obvious (judging from your RTFM non-reply), but I'm sorry, I don't see it.

cyberjock said: "It's also possible that your new disk has failed or another disk just failed."

As my first post shows, the rest of the original disks are fine. Yes, the new drive may be bad, and I'm willing to pull it back out to verify it on another computer if need be, but I just wanted to know if I had just missed something simple, first.
 

cyberjock

At step 3 you said the resilver didn't happen. But at the first post you said that a scrub started. A scrub is a resilver, except that with a resilver the expectation is that you are restoring a disk.

So did the scrub/resilver happen or not? Now I'm confused.... LOL
 

darrenbest

cyberjock said: "At step 3 you said the resilver didn't happen. But at the first post you said that a scrub started. A scrub is a resilver, except that with a resilver the expectation is that you are restoring a disk. So did the scrub/resilver happen or not? Now I'm confused.... LOL"

That's correct: a 12-hour scrub started after I hit the "Replace Disk" button. It finished, apparently without issue. However, I'm still left with a new volume that "zpool status" lists as "UNAVAIL".

Am I supposed to "resilver" manually? If so, how?
 

cyberjock

The resilver happens automatically when you add the drive.

So are you SURE you aren't on step 4? Most people that ask your question haven't done step 4.

Can you post the output of zpool status and put it in code tags? The code tags will protect the formatting, which is important.
 

darrenbest

Here you go...

Code:
[root@freenas8] ~# zpool status
  pool: FREENAS-8
state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
  see: http://www.sun.com/msg/ZFS-8000-2Q
scrub: scrub completed after 12h26m with 0 errors on Thu Jul 18 08:36:20 2013
config:
 
        NAME                                            STATE    READ WRITE CKSUM
        FREENAS-8                                      DEGRADED    0    0    0
          raidz1                                        DEGRADED    0    0    0
            ada0p2                                      ONLINE      0    0    0
            ada1p2                                      ONLINE      0    0    0
            ada2p2                                      ONLINE      0    0    0
            gptid/772dfdc9-ef57-11e2-bbc4-001cc05665fb  UNAVAIL      0    0    0  cannot open
            ada4p2                                      ONLINE      0    0    0
 
errors: No known data errors


Here's what I see in the "Volume Status" page in the GUI.
freenas.png


Hope you can help, thanks.
 

cyberjock

To be honest, it looks like another disk has failed....

Post the output of smartctl -a /dev/ada3 in code please. :)
 

darrenbest

Here you go...

Code:
[root@freenas8] ~# smartctl -a /dev/ada3
smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.2-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
 
=== START OF INFORMATION SECTION ===
Device Model:    WDC WD20EFRX-68AX9N0
Serial Number:    WD-WMC1T2982309
LU WWN Device Id: 5 0014ee 65891ef97
Firmware Version: 80.00A80
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:    512 bytes logical, 4096 bytes physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:  8
ATA Standard is:  ACS-2 (revision not indicated)
Local Time is:    Thu Jul 18 17:44:46 2013 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
 
General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (  0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (26940) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (  2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (  5) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
 
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x002f  100  253  051    Pre-fail  Always      -      0
  3 Spin_Up_Time            0x0027  175  175  021    Pre-fail  Always      -      6241
  4 Start_Stop_Count        0x0032  100  100  000    Old_age  Always      -      8
  5 Reallocated_Sector_Ct  0x0033  200  200  140    Pre-fail  Always      -      0
  7 Seek_Error_Rate        0x002e  100  253  000    Old_age  Always      -      0
  9 Power_On_Hours          0x0032  100  100  000    Old_age  Always      -      21
10 Spin_Retry_Count        0x0032  100  253  000    Old_age  Always      -      0
11 Calibration_Retry_Count 0x0032  100  253  000    Old_age  Always      -      0
12 Power_Cycle_Count      0x0032  100  100  000    Old_age  Always      -      8
192 Power-Off_Retract_Count 0x0032  200  200  000    Old_age  Always      -      7
193 Load_Cycle_Count        0x0032  200  200  000    Old_age  Always      -      0
194 Temperature_Celsius    0x0022  113  111  000    Old_age  Always      -      37
196 Reallocated_Event_Count 0x0032  200  200  000    Old_age  Always      -      0
197 Current_Pending_Sector  0x0032  200  200  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0030  100  253  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x0032  200  200  000    Old_age  Always      -      0
200 Multi_Zone_Error_Rate  0x0008  100  253  000    Old_age  Offline      -      0
 
SMART Error Log Version: 1
No Errors Logged
 
SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]
 
 
SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


This is a new drive, a WD Caviar Red, taken out of the bag about 24 hours ago. By comparison, here is one of the other drives (one of the four Caviar Greens: I couldn't buy Caviar Reds 16 months ago).

Code:
[root@freenas8] ~# smartctl -a /dev/ada4
smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.2-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
 
=== START OF INFORMATION SECTION ===
Model Family:    Western Digital Caviar Green (Adv. Format)
Device Model:    WDC WD20EARS-00MVWB0
Serial Number:    WD-WMAZA3180257
LU WWN Device Id: 5 0014ee 656108046
Firmware Version: 51.0AB51
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:  8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Jul 18 17:48:48 2013 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
 
General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (  0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (37200) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (  2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (  5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
 
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x002f  200  200  051    Pre-fail  Always      -      0
  3 Spin_Up_Time            0x0027  253  166  021    Pre-fail  Always      -      1216
  4 Start_Stop_Count        0x0032  100  100  000    Old_age  Always      -      696
  5 Reallocated_Sector_Ct  0x0033  200  200  140    Pre-fail  Always      -      0
  7 Seek_Error_Rate        0x002e  200  200  000    Old_age  Always      -      0
  9 Power_On_Hours          0x0032  077  077  000    Old_age  Always      -      17494
10 Spin_Retry_Count        0x0032  100  100  000    Old_age  Always      -      0
11 Calibration_Retry_Count 0x0032  100  253  000    Old_age  Always      -      0
12 Power_Cycle_Count      0x0032  100  100  000    Old_age  Always      -      79
192 Power-Off_Retract_Count 0x0032  200  200  000    Old_age  Always      -      58
193 Load_Cycle_Count        0x0032  091  091  000    Old_age  Always      -      329959
194 Temperature_Celsius    0x0022  117  109  000    Old_age  Always      -      33
196 Reallocated_Event_Count 0x0032  200  200  000    Old_age  Always      -      0
197 Current_Pending_Sector  0x0032  200  200  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0030  200  200  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x0032  200  180  000    Old_age  Always      -      6422
200 Multi_Zone_Error_Rate  0x0008  200  200  000    Old_age  Offline      -      0
 
SMART Error Log Version: 1
No Errors Logged
 
SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]
 
 
SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


The biggest differences appear to be:
  • smartctl database
    • old drives are there, new is not
  • offline data collection status
    • new drive - Auto offline data collection disabled
    • old drives - Auto offline data collection enabled
  • SCT capabilities
    • new drive supports Error Recovery Control, old drives do not
Does any of this tell you anything?
 

cyberjock

The smartctl database is created on bootup (and I think if you cycle the SMART service in the UI).
The offline data collection won't work if it's not in the smartctl database.
The SCT capabilities shouldn't matter.

Have you tried an old fashioned reboot? Everything actually looks fine....

ada4 has a lot of UDMA CRC errors, but those wouldn't cause you any problems. Kind of perplexed...
 

darrenbest

Hmm, reboot! Why didn't I think of that?

Here's "zpool status" after the reboot:
Code:
[root@freenas8] ~# zpool status
  pool: FREENAS-8
state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
  see: http://www.sun.com/msg/ZFS-8000-9P
scrub: none requested
config:
 
        NAME                                            STATE    READ WRITE CKSUM
        FREENAS-8                                      ONLINE      0    0    0
          raidz1                                        ONLINE      0    0    0
            ada0p2                                      ONLINE      0    0    0
            ada1p2                                      ONLINE      0    0    0
            ada2p2                                      ONLINE      0    0    0
            gptid/772dfdc9-ef57-11e2-bbc4-001cc05665fb  ONLINE      0    0    12
            ada4p2                                      ONLINE      0    0    0
 
errors: No known data errors


So, does that mean that the RAID is already re-established? Could any one of my drives crap out on me now, and I'll still be safe?

I see the 12 Checksum errors. Sure there's nothing to be concerned about?

Also, why does "zpool status" refer to the new drive as "gptid/772dfdc9-ef57-11e2-bbc4-001cc05665fb", while the Volume Status page in the GUI refers to it as "ada3p2"? Any reason for worry?

Thanks again for your help.
 

cyberjock

Here's my pool...
Code:
  pool: tank
state: ONLINE
  scan: scrub repaired 0 in 45h1m with 0 errors on Wed Jul 17 15:52:57 2013
config:
 
    NAME                                            STATE    READ WRITE CKSUM
    tank                                            ONLINE      0    0    0
      raidz3-0                                      ONLINE      0    0    0
        gptid/6fbb91d5-4a95-11e2-bca4-0015171496ae  ONLINE      0    0    0
        gptid/70448fd2-4a95-11e2-bca4-0015171496ae  ONLINE      0    0    0
        gptid/70c0c7b3-4a95-11e2-bca4-0015171496ae  ONLINE      0    0    0
        gptid/713de0d5-4a95-11e2-bca4-0015171496ae  ONLINE      0    0    0
        gptid/71e3eea1-4a95-11e2-bca4-0015171496ae  ONLINE      0    0    0
        gptid/728458d2-4a95-11e2-bca4-0015171496ae  ONLINE      0    0    0
        gptid/7326aebc-4a95-11e2-bca4-0015171496ae  ONLINE      0    0    0
        gptid/73c64f27-4a95-11e2-bca4-0015171496ae  ONLINE      0    0    0
        gptid/7468c69a-4a95-11e2-bca4-0015171496ae  ONLINE      0    0    0
        gptid/75045f96-4a95-11e2-bca4-0015171496ae  ONLINE      0    0    0
        gptid/75a0096a-4a95-11e2-bca4-0015171496ae  ONLINE      0    0    0
        gptid/8dd1d140-ca02-11e2-bdf7-0015171496ae  ONLINE      0    0    0
        gptid/76d701fa-4a95-11e2-bca4-0015171496ae  ONLINE      0    0    0
        gptid/77759c5c-4a95-11e2-bca4-0015171496ae  ONLINE      0    0    0
        gptid/78190bd3-4a95-11e2-bca4-0015171496ae  ONLINE      0    0    0
        gptid/78bb9173-4a95-11e2-bca4-0015171496ae  ONLINE      0    0    0
        gptid/795a7052-4a95-11e2-bca4-0015171496ae  ONLINE      0    0    0
        gptid/79fbc7b0-4a95-11e2-bca4-0015171496ae  ONLINE      0    0    0
 
errors: No known data errors


Your zpool was set up by MBR partition (I believe, anyway... anyone is welcome to correct me).

GPT is the current "recommended" way.

Those checksum errors are something to be concerned about. Lemme look at the thread again...
 

cyberjock

Post the output of gpart list.

If you look at the gpart list you'll see that one of the drives has a rawuuid that matches that gptid from your zpool status. Post the output of smartctl -a /dev/daXX substituting that disk unless you provided it above.

Edit: Your data is safe. Don't do anything crazy that might change this fact. :P
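
The rawuuid lookup described above can also be scripted. What follows is only a minimal sketch: the sample text is an abbreviated, illustrative imitation of `gpart list` output (only the gptid is taken from this thread; the other values, including the swap partition's rawuuid, are placeholders), and the function names are hypothetical. On a real system you would feed it the actual output of gpart list.

```python
import re

# Illustrative, abbreviated imitation of `gpart list` output. Only the
# rawuuid of ada3p2 comes from this thread; every other value is a placeholder.
SAMPLE_GPART_LIST = """\
Geom name: ada3
scheme: GPT
Providers:
1. Name: ada3p1
   Mediasize: 2147483648 (2.0G)
   rawuuid: 00000000-0000-0000-0000-000000000001
   type: freebsd-swap
2. Name: ada3p2
   Mediasize: 1998251364352 (1.8T)
   rawuuid: 772dfdc9-ef57-11e2-bbc4-001cc05665fb
   type: freebsd-zfs
Consumers:
1. Name: ada3
   Mediasize: 2000398934016 (1.8T)
"""


def map_rawuuid_to_partition(gpart_text):
    """Return {rawuuid: partition name} for every entry that has a rawuuid."""
    mapping = {}
    current_name = None
    for line in gpart_text.splitlines():
        name_match = re.match(r"\s*\d+\.\s+Name:\s+(\S+)", line)
        if name_match:
            # Remember the most recent "N. Name: ..." entry.
            current_name = name_match.group(1)
            continue
        uuid_match = re.match(r"\s*rawuuid:\s+(\S+)", line)
        if uuid_match and current_name is not None:
            mapping[uuid_match.group(1)] = current_name
    return mapping


def partition_for_gptid(gpart_text, gptid):
    """Accept 'gptid/<uuid>' or a bare uuid; return the partition or None."""
    uuid = gptid.split("/", 1)[-1]
    return map_rawuuid_to_partition(gpart_text).get(uuid)


if __name__ == "__main__":
    wanted = "gptid/772dfdc9-ef57-11e2-bbc4-001cc05665fb"
    print(partition_for_gptid(SAMPLE_GPART_LIST, wanted))  # prints ada3p2
```

Since device names such as ada3 can change between boots, it is still worth confirming the match against the serial number that smartctl reports.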
 

darrenbest

Hey, thanks again. I trust you on the gptid naming.

However, I don't know if this is worthy of a different thread or not, but the checksum errors on the new drive are escalating:

Code:
[root@freenas8] ~# zpool status
  pool: FREENAS-8
state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
  see: http://www.sun.com/msg/ZFS-8000-9P
scrub: none requested
config:
 
        NAME                                            STATE    READ WRITE CKSUM
        FREENAS-8                                      ONLINE      0    0    0
          raidz1                                        ONLINE      0    0    0
            ada0p2                                      ONLINE      0    0    0
            ada1p2                                      ONLINE      0    0    0
            ada2p2                                      ONLINE      0    0    0
            gptid/772dfdc9-ef57-11e2-bbc4-001cc05665fb  ONLINE      0    0    94
            ada4p2                                      ONLINE      0    0    0
 
errors: No known data errors


How concerned should I be? Should I try to RMA the drive?
 

cyberjock

Look at my last post... can you post the smartctl output for that drive, or tell me it's the same one? I'd assume it's ada3, but those names can change, so compare serial numbers in the output of smartctl.
 

darrenbest

Code:
[root@freenas8] ~# smartctl -a /dev/ada3
smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.2-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
 
=== START OF INFORMATION SECTION ===
Device Model:    WDC WD20EFRX-68AX9N0
Serial Number:    WD-WMC1T2982309
LU WWN Device Id: 5 0014ee 65891ef97
Firmware Version: 80.00A80
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:    512 bytes logical, 4096 bytes physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:  8
ATA Standard is:  ACS-2 (revision not indicated)
Local Time is:    Sat Jul 20 07:57:34 2013 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
 
General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (  0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (26940) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (  2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (  5) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
 
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x002f  100  253  051    Pre-fail  Always      -      0
  3 Spin_Up_Time            0x0027  175  175  021    Pre-fail  Always      -      6241
  4 Start_Stop_Count        0x0032  100  100  000    Old_age  Always      -      8
  5 Reallocated_Sector_Ct  0x0033  200  200  140    Pre-fail  Always      -      0
  7 Seek_Error_Rate        0x002e  100  253  000    Old_age  Always      -      0
  9 Power_On_Hours          0x0032  100  100  000    Old_age  Always      -      59
10 Spin_Retry_Count        0x0032  100  253  000    Old_age  Always      -      0
11 Calibration_Retry_Count 0x0032  100  253  000    Old_age  Always      -      0
12 Power_Cycle_Count      0x0032  100  100  000    Old_age  Always      -      8
192 Power-Off_Retract_Count 0x0032  200  200  000    Old_age  Always      -      7
193 Load_Cycle_Count        0x0032  200  200  000    Old_age  Always      -      0
194 Temperature_Celsius    0x0022  112  111  000    Old_age  Always      -      38
196 Reallocated_Event_Count 0x0032  200  200  000    Old_age  Always      -      0
197 Current_Pending_Sector  0x0032  200  200  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0030  100  253  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x0032  200  200  000    Old_age  Always      -      0
200 Multi_Zone_Error_Rate  0x0008  100  253  000    Old_age  Offline      -      0
 
SMART Error Log Version: 1
No Errors Logged
 
SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]
 
 
SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


Yes, "ada3" is definitely the new WD Caviar Red drive. And checksum errors are now up to 104.
 

cyberjock

I'd do a scrub of the pool. Checksum errors mean the data on that disk is not matching the pool. That disk is "out of sync" with the rest of your disks.

You can start a scrub with the command zpool scrub poolname.
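
To see whether the checksum count keeps climbing before and after a scrub, the CKSUM column can be pulled out of the zpool status text and compared between runs. This is a rough sketch, not a FreeNAS tool: the sample is abridged from the output posted earlier in this thread, the function names are hypothetical, and it assumes the counters are plain integers as they are here.

```python
# Abridged from the `zpool status` output posted earlier in this thread.
SAMPLE_ZPOOL_STATUS = """\
  pool: FREENAS-8
state: ONLINE
scrub: none requested
config:

        NAME                                            STATE    READ WRITE CKSUM
        FREENAS-8                                      ONLINE      0    0    0
          raidz1                                        ONLINE      0    0    0
            ada0p2                                      ONLINE      0    0    0
            ada1p2                                      ONLINE      0    0    0
            ada2p2                                      ONLINE      0    0    0
            gptid/772dfdc9-ef57-11e2-bbc4-001cc05665fb  ONLINE      0    0    94
            ada4p2                                      ONLINE      0    0    0

errors: No known data errors
"""


def parse_error_counters(status_text):
    """Return {name: (read, write, cksum)} for each row of the config table."""
    counters = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True
            continue
        if not in_table:
            continue
        if not fields:
            # The first blank line after the header ends the table.
            in_table = False
            continue
        # Assumption: counters are plain integers; other rows are skipped.
        if len(fields) >= 5 and all(f.isdigit() for f in fields[2:5]):
            counters[fields[0]] = tuple(int(f) for f in fields[2:5])
    return counters


def devices_with_checksum_errors(status_text):
    """Return {name: cksum} for rows whose CKSUM counter is above zero."""
    return {
        name: cksum
        for name, (_read, _write, cksum) in parse_error_counters(status_text).items()
        if cksum > 0
    }


if __name__ == "__main__":
    print(devices_with_checksum_errors(SAMPLE_ZPOOL_STATUS))
```

Running it on two captures taken some hours apart and comparing the numbers shows whether the errors are still accumulating.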
 

darrenbest

Ok, here you go....

Code:
[root@freenas8] ~# zpool scrub FREENAS-8

This completed in less than a second.
Code:
[root@freenas8] ~# zpool status
  pool: FREENAS-8
state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
  see: http://www.sun.com/msg/ZFS-8000-9P
scrub: resilver completed after 0h0m with 0 errors on Sat Jul 20 14:56:03 2013
config:
 
        NAME                                            STATE    READ WRITE CKSUM
        FREENAS-8                                      ONLINE      0    0    0
          raidz1                                        ONLINE      0    0    0
            ada0p2                                      ONLINE      0    0    0
            ada1p2                                      ONLINE      0    0    0
            ada2p2                                      ONLINE      0    0    0
            gptid/772dfdc9-ef57-11e2-bbc4-001cc05665fb  ONLINE      0    0  104  1.41M resilvered
            ada4p2                                      ONLINE      0    0    0
 
errors: No known data errors


So no apparent difference, other than the message "1.41M resilvered" appended to the new drive. On the plus side, where the checksum errors had been increasing, the value has now been unchanged at 104. Of course, that could also be attributable to the fact I haven't used any data off the NAS since the previous time I checked :tongue:

Any advice? Think the drive is alright?
 

purduephotog

Explorer
Joined
Jan 14, 2013
Messages
73
Whenever I have mysterious errors or checksum problems with drives (and I am not talking about FreeNAS) I replace the SATA cable. I don't know if you have a spare, but I'd try it.
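
One hint as to whether a cable is involved is SMART attribute 199 (UDMA_CRC_Error_Count), which counts CRC errors on the link between drive and controller and is commonly associated with cabling. Below is a small sketch that scans the smartctl attribute table for nonzero raw values. The sample rows are copied from the ada4 output earlier in this thread (the new drive on ada3 reported 0 for this attribute), while the list of watched IDs and the function names are my own assumptions rather than anything official.

```python
# Rows copied from the `smartctl -a /dev/ada4` output earlier in this thread.
SAMPLE_SMART_ATTRIBUTES = """\
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct  0x0033  200  200  140    Pre-fail  Always      -      0
197 Current_Pending_Sector  0x0032  200  200  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0030  200  200  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x0032  200  180  000    Old_age  Always      -      6422
"""

# Assumption: a commonly watched set of attribute IDs, not an official list.
WATCHED_IDS = (5, 197, 198, 199)


def parse_smart_attributes(text):
    """Return {attribute id: (name, raw value)} for rows with an integer raw value."""
    attributes = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least ten columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attributes[int(fields[0])] = (fields[1], int(fields[9]))
    return attributes


def nonzero_watched(text, watched_ids=WATCHED_IDS):
    """Return {name: raw value} for watched attributes whose raw value is above zero."""
    attributes = parse_smart_attributes(text)
    return {
        attributes[i][0]: attributes[i][1]
        for i in watched_ids
        if i in attributes and attributes[i][1] > 0
    }


if __name__ == "__main__":
    print(nonzero_watched(SAMPLE_SMART_ATTRIBUTES))
```

If that counter keeps rising on a drive after its cable has been swapped, the cable is probably not the whole story.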
 

darrenbest

I think it'd be a pretty big coincidence if the lone dead hard drive (confirmed dead, BTW: it couldn't even be recognized in another machine with different cabling and hardware) was also saddled with a faulty SATA cable.

I suppose that stranger things have happened, though. And, I also have no idea if the old dead drive was throwing checksum errors *before* it died, so who knows, you may be on the right track. Won't hurt to try it in any case.
 