New Drive Resilvering Multiple Times (Write no Reads)

evlvd · Oct 27, 2015

One of my drives started giving me errors and doing small resilvers (a few KB to a few MB) so I replaced it with a new 3tb WD Red drive (ada1). The resilver completed, then the next day I noticed it was resilvering again. Strange.

I was going to replace another drive that had an increased error count, but I was waiting for the resilvering to finish first. It never finishes; it just keeps going. I also noticed that the RED drive (ada1) has only been writing, not reading.

I'm not sure if there is something wrong with the new RED drive as I haven't seen this before.

I'm running 2 raidz1 arrays (media only) and have another RED drive arriving tomorrow, but I don't know at what point I should change the drive out.

m0nkey_ · Oct 27, 2015

Have you performed a SMART test on the drives? Could you run 'smartctl' on the degraded drives? I'd also be looking to back up your data as soon as possible, just in case your pool does decide to disappear on you.
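A minimal sketch of how those smartctl runs could be scripted, assuming smartmontools is installed and the script runs as root. `/dev/ada1` is the device name mentioned for the new RED drive; the helper names `smartctl_argv` and `run_smartctl` are made up for illustration.

```python
import subprocess


def smartctl_argv(device, self_test=None):
    """Build a smartctl command line for one device.

    With self_test set (for example "short" or "long") the command starts a
    self-test; otherwise it prints the full SMART report with -a.
    """
    if self_test is not None:
        return ["smartctl", "-t", self_test, device]
    return ["smartctl", "-a", device]


def run_smartctl(device):
    """Run `smartctl -a` and return its text output (needs root and smartmontools)."""
    result = subprocess.run(
        smartctl_argv(device), capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    # Only prints the commands; call run_smartctl() on the NAS itself.
    print(" ".join(smartctl_argv("/dev/ada1")))
    print(" ".join(smartctl_argv("/dev/ada1", self_test="short")))
```

The report from `-a` is what gets pasted into the thread; a self-test started with `-t` runs in the background and shows up in the self-test log of a later report.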

Robert Trevellyan · Oct 28, 2015

m0nkey_ said:
I'd also be looking to back up your data as soon as possible, just in case your pool does decide to disappear on you.

+1

With RAIDZ1 vdevs and >1TB drives, your entire pool is in a precarious state.

evlvd said:
I replaced it with a new 3tb WD Red drive (ada1). The resilver completed, then the next day I noticed it was resilvering again. Strange.

Strange indeed, and maybe indicative of a different hardware problem.

Please take a look at the forum rules and post the relevant info to give folks a better chance of helping you.

evlvd · Oct 28, 2015

Thanks for the quick response. Data is backed up, looking to switch to raidz2 or 3 soon once I can migrate data.
Signature updated to show system.
FreeNAS shows my pool as healthy but still resilvering. It also looks like the drive I'm trying to replace is spitting back lots of errors.
Here are the smartctl outputs for both drives: the RED first, then the 3 TB Seagate.

Code:

smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p26 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WCC4N1YE760S
LU WWN Device Id: 5 0014ee 261673553
Firmware Version: 82.00A82
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Oct 28 09:09:14 2015 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (41520) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      ( 417) minutes.
Conveyance self-test routine
recommended polling time:      (   5) minutes.
SCT capabilities:            (0x703d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   100   253   021    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       2
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       366
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       2
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       39
194 Temperature_Celsius     0x0022   119   111   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Code:

smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p26 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST3000DM001-9YN166
Serial Number:    W1F0QFJR
LU WWN Device Id: 5 000c50 0529ac31b
Firmware Version: CC4B
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Oct 28 09:08:53 2015 PDT

==> WARNING: A firmware update for this drive may be available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/223651en

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (  575) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      ( 332) minutes.
Conveyance self-test routine
recommended polling time:      (   2) minutes.
SCT capabilities:            (0x3085)    SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   097   006    Pre-fail  Always       -       153647096
  3 Spin_Up_Time            0x0003   097   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       210
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   050   050   030    Pre-fail  Always       -       12210893547295
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       10381
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       56
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       516
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       2 2 6
189 High_Fly_Writes         0x003a   098   098   000    Old_age   Always       -       2
190 Airflow_Temperature_Cel 0x0022   062   053   045    Old_age   Always       -       38 (Min/Max 24/44)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       35
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       270
194 Temperature_Celsius     0x0022   038   047   000    Old_age   Always       -       38 (0 18 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       16
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       16
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       3
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       10064h+58m+51.392s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       121203013777523
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       167662569814704

SMART Error Log Version: 1
ATA Error Count: 514 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 514 occurred at disk power-on lifetime: 10380 hours (432 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 40 ff ff ff 4f 00  12d+12:13:27.723  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00  12d+12:13:27.723  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00  12d+12:13:27.723  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00  12d+12:13:27.584  READ LOG EXT
  60 00 40 ff ff ff 4f 00  12d+12:13:24.761  READ FPDMA QUEUED

Error 513 occurred at disk power-on lifetime: 10380 hours (432 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 40 ff ff ff 4f 00  12d+12:13:24.761  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  12d+12:13:24.760  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00  12d+12:13:24.760  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00  12d+12:13:24.760  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00  12d+12:13:24.760  READ FPDMA QUEUED

Error 512 occurred at disk power-on lifetime: 10380 hours (432 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 40 ff ff ff 4f 00  12d+12:13:21.876  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  12d+12:13:21.876  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00  12d+12:13:21.876  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00  12d+12:13:21.876  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00  12d+12:13:21.875  READ FPDMA QUEUED

Error 511 occurred at disk power-on lifetime: 10380 hours (432 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00  12d+12:13:18.855  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  12d+12:13:18.854  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00  12d+12:13:18.842  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00  12d+12:13:18.842  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00  12d+12:13:18.842  READ FPDMA QUEUED

Error 510 occurred at disk power-on lifetime: 10380 hours (432 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: WP at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 40 ff ff ff 4f 00  12d+12:13:15.844  WRITE FPDMA QUEUED
  61 00 40 ff ff ff 4f 00  12d+12:13:15.844  WRITE FPDMA QUEUED
  60 00 40 ff ff ff 4f 00  12d+12:13:15.838  READ FPDMA QUEUED
  61 00 40 ff ff ff 4f 00  12d+12:13:15.838  WRITE FPDMA QUEUED
  61 00 10 ff ff ff 4f 00  12d+12:13:15.807  WRITE FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      8357         -
# 2  Short offline       Completed without error       00%      8308         -
# 3  Short offline       Completed without error       00%      8260         -
# 4  Short offline       Completed without error       00%      8212         -
# 5  Extended offline    Completed without error       00%      8195         -
# 6  Short offline       Completed without error       00%      8164         -
# 7  Short offline       Completed without error       00%      8116         -
# 8  Short offline       Completed without error       00%      8068         -
# 9  Short offline       Completed without error       00%      8020         -
#10  Short offline       Completed without error       00%      7972         -
#11  Short offline       Completed without error       00%      7924         -
#12  Short offline       Completed without error       00%      7876         -
#13  Extended offline    Completed without error       00%      7861         -
#14  Short offline       Completed without error       00%      7828         -
#15  Short offline       Completed without error       00%      7780         -
#16  Short offline       Completed without error       00%      7732         -
#17  Short offline       Completed without error       00%      7684         -
#18  Short offline       Completed without error       00%      7636         -
#19  Short offline       Completed without error       00%      7588         -
#20  Short offline       Completed without error       00%      7540         -
#21  Short offline       Completed without error       00%      7504         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

And here is a screenshot of the zpool status again showing the resilvering is still going, making no real progress.

jgreco · Oct 28, 2015

Two drives are resilvering but you only have RAIDZ1? ... (*little puzzled*)

Is the scan progress showing any progress being made? i.e. run it every few minutes and look for changes in the amount completed.
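For anyone who wants to automate that check, here is a minimal Python sketch that pulls the progress figures out of the `scan:` section of `zpool status`. The sample line is copied from the status output evlvd posts further down the thread. The names `parse_scan` and `made_progress` are made up for illustration, and treating the size suffixes as binary multiples is an assumption about how the numbers are printed.

```python
import re

# Assumption: K/M/G/T/P in zpool status are binary multiples.
UNIT = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}

# Matches e.g. "2.28T scanned out of 16.5T at 268M/s, 15h26m to go"
SCAN_RE = re.compile(
    r"(?P<done>[\d.]+)(?P<du>[KMGTP])\s+scanned out of\s+"
    r"(?P<total>[\d.]+)(?P<tu>[KMGTP])\s+at\s+"
    r"(?P<rate>[\d.]+)(?P<ru>[KMGTP])/s"
)


def parse_scan(status_text):
    """Return scanned bytes, total bytes, fraction done and a rough ETA."""
    m = SCAN_RE.search(status_text)
    if m is None:
        return None
    done = float(m.group("done")) * UNIT[m.group("du")]
    total = float(m.group("total")) * UNIT[m.group("tu")]
    rate = float(m.group("rate")) * UNIT[m.group("ru")]
    return {
        "done": done,
        "total": total,
        "fraction": done / total,
        "eta_seconds": (total - done) / rate if rate > 0 else None,
    }


def made_progress(previous, current):
    """True if the later sample has scanned more than the earlier one."""
    return current["done"] > previous["done"]


SAMPLE = "        2.28T scanned out of 16.5T at 268M/s, 15h26m to go"

if __name__ == "__main__":
    info = parse_scan(SAMPLE)
    print(f"{info['fraction']:.2%} scanned, "
          f"about {info['eta_seconds'] / 3600:.1f} h left")
```

Run against two samples taken a few minutes apart, `made_progress` answers the question directly. The fraction and ETA it computes from the rounded figures land close to the 13.85% and 15h26m that zpool itself reported.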

Robert Trevellyan · Oct 28, 2015

The WD Red looks OK but the Seagate looks a bit sick. Unfortunately the ST3000DM001 has a history of early failure. This thread came up when I Googled "RAIDZ1 two drives resilvering" and it's the exact same revision (9YN166).
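To make "a bit sick" concrete, here is a rough Python sketch that reads the attribute table from a `smartctl` report and lists a few error-related attributes with non-zero raw values. The rows in `SAMPLE` are copied from the Seagate output above. The watch list is an assumption chosen for illustration, not an official failure criterion, and the function names are hypothetical.

```python
# Assumption: these four attributes are the ones worth flagging when non-zero.
WATCH = (
    "Reallocated_Sector_Ct",
    "Reported_Uncorrect",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def parse_attributes(report_text):
    """Map attribute name -> raw value string from a smartctl attribute table."""
    attrs = {}
    for line in report_text.splitlines():
        # Ten columns; the last (RAW_VALUE) may itself contain spaces.
        fields = line.split(None, 9)
        if len(fields) == 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            attrs[fields[1]] = fields[9].strip()
    return attrs


def nonzero_watched(attrs):
    """Return watched attributes whose raw value starts with a non-zero integer."""
    flagged = {}
    for name in WATCH:
        raw = attrs.get(name)
        if raw is None:
            continue
        first = raw.split()[0]
        if first.isdigit() and int(first) != 0:
            flagged[name] = int(first)
    return flagged


SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       516
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       2 2 6
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       16
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       16
"""

if __name__ == "__main__":
    for name, value in nonzero_watched(parse_attributes(SAMPLE)).items():
        print(f"{name}: {value}")
```

On the Seagate rows this flags Reported_Uncorrect (516), Current_Pending_Sector (16) and Offline_Uncorrectable (16). The same rows in the WD Red report are all zero.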

I too am puzzled by your zpool status. I'm starting to wonder if there's an edge case bug involved.

evlvd said:
resilvering is still going, making no real progress

If a disk is failing, as seems likely, resilvering could take a very long time, and may never actually complete.

evlvd said:
Data is backed up, looking to switch to raidz2 or 3 soon once I can migrate data.

Might make sense to start on this project now rather than sinking more time into the current problem.

evlvd · Oct 28, 2015

jgreco said:
Two drives are resilvering but you only have RAIDZ1? ... (*little puzzled*)

I figured they were probably resilvering different pieces of data blocks, and could continue on grabbing data from each other in different places. Plus, the RED drive should be complete, so I don't know why it is resilvering.

jgreco said:
Is the scan progress showing any progress being made? i.e. run it every few minutes and look for changes in the amount completed.

It does progress fairly quickly (as quickly as my occasional resilvers took before I replaced a drive) but I never see it finish.

I don't know if it is related, but I can't open the Plugins -> Installed page in the web GUI, and one of my plugins fails to load my settings each time.
All my other data is intact and works fine, and performance is maxed out over CIFS. The only lost file is an old history revision from October 1st.

Robert Trevellyan said:
Might make sense to start on this project now rather than sinking more time into the current problem.

I'd love to, but I need to figure out which setup (raidz2 or raidz3) I'd like, and I need drives to migrate the data to. Even though the data is backed up, it's offsite, and I'd like to move everything at once.
I was hoping I could just swap out this failing drive for my new RED drive, then swap out the rest next. Now I don't know if I can replace the drive while it is still resilvering.

jgreco · Oct 28, 2015

It just slows down when it doesn't finish? If so, the pool might be damaged. If so, back up the pool and start over.

Robert Trevellyan · Oct 28, 2015

evlvd said:
I figured they were probably resilvering different pieces of data blocks, and could continue on grabbing data from each other in different places.

Intriguing hypothesis.

evlvd said:
I don't know if I can replace the drive while it is still resilvering.

Seems like asking for trouble to me.

evlvd · Oct 28, 2015

jgreco said:
It just slows down when it doesn't finish? If so, the pool might be damaged. If so, back up the pool and start over.

It never slows down. Each morning when I check, it is still resilvering, but the start time shown is always recent (10/28, or that morning). So perhaps it is finishing and then starting again shortly after.
It never freezes, and the time remaining always decreases, but it is usually around 10-14 hours when I check it.
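One way to confirm the "finishes, then starts again" theory is to record the start time that `zpool status` prints and see whether it moves. The sketch below is only an illustration: the first line is the real one from this thread, the second timestamp is invented to show a restart, and the function names are hypothetical. It assumes an English/C locale so that `strptime` recognises the day and month abbreviations.

```python
import re
from datetime import datetime

# Matches e.g. "resilver in progress since Wed Oct 28 15:22:49 2015"
SINCE_RE = re.compile(
    r"resilver in progress since "
    r"(?P<when>\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4})"
)


def resilver_start(status_text):
    """Return the resilver start time as a datetime, or None if not resilvering."""
    m = SINCE_RE.search(status_text)
    if m is None:
        return None
    return datetime.strptime(m.group("when"), "%a %b %d %H:%M:%S %Y")


def restarted(earlier_status, later_status):
    """True if the later sample reports a newer resilver start time."""
    before = resilver_start(earlier_status)
    after = resilver_start(later_status)
    return before is not None and after is not None and after > before


FIRST = "  scan: resilver in progress since Wed Oct 28 15:22:49 2015"
# Hypothetical later sample, made up to demonstrate a restart.
SECOND = "  scan: resilver in progress since Thu Oct 29 03:10:00 2015"

if __name__ == "__main__":
    print(resilver_start(FIRST))
    print("restarted:", restarted(FIRST, SECOND))
```

If the start time keeps jumping forward between samples, the resilver is completing (or aborting) and being kicked off again rather than running as one long pass.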

What I'm mostly concerned about is this new RED drive only writing data, and not reading anything while it is resilvering, as if none of the data written is taking and it just keeps starting over.

Here is the zpool status taken just now, showing the progress and speed:

Code:

[root@freenas] ~# zpool status
  pool: Media
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Oct 28 15:22:49 2015
        2.28T scanned out of 16.5T at 268M/s, 15h26m to go
        467G resilvered, 13.85% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        Media                                           ONLINE       0     0     8
          raidz1-0                                      ONLINE       0     0    16
            gptid/b2487721-696e-11e4-8228-0cc47a40699d  ONLINE       0     0     0
            gptid/3de3a420-712a-11e5-9363-0cc47a40699d  ONLINE       0     0     0  (resilvering)
            gptid/b36c7fb6-696e-11e4-8228-0cc47a40699d  ONLINE       0     0     0
            gptid/b4636191-696e-11e4-8228-0cc47a40699d  ONLINE       8     0     0  (resilvering)
            gptid/b5485367-696e-11e4-8228-0cc47a40699d  ONLINE       0     0     0
          raidz1-1                                      ONLINE       0     0     0
            gptid/146e9344-080f-11e5-b77a-0cc47a40699d  ONLINE       0     0     0
            gptid/14d76f68-080f-11e5-b77a-0cc47a40699d  ONLINE       0     0     0
            gptid/152f5a34-080f-11e5-b77a-0cc47a40699d  ONLINE       0     0     0
            gptid/158bd979-080f-11e5-b77a-0cc47a40699d  ONLINE       0     0     0
            gptid/15f20b06-080f-11e5-b77a-0cc47a40699d  ONLINE       0     0     0

Robert Trevellyan · Oct 28, 2015

jgreco said:
the pool might be damaged

+1

jgreco · Oct 28, 2015

I think your pool's busted, but I haven't seen enough failures of this sort to know for sure. I definitely strongly emphatically suggest migrating the data elsewhere immediately or even sooner.

evlvd · Oct 28, 2015

jgreco said:
I think your pool's busted, but I haven't seen enough failures of this sort to know for sure. I definitely strongly emphatically suggest migrating the data elsewhere immediately or even sooner.

Will buy some drives tonight so I can rearrange.

Thanks everyone for the input. I'm still not sure what to think about the new RED drive that isn't reading.

jgreco · Oct 29, 2015

evlvd said:
Will buy some drives tonight so I can rearrange.

Thanks everyone for the input. I'm still not sure what to think about the new RED drive that isn't reading.

We're thinking that it's reading just fine but that the pool's corrupt or something like that.

cyberjock · Oct 29, 2015

evlvd said:

Code:

[root@freenas] ~# zpool status
  pool: Media
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Oct 28 15:22:49 2015
        2.28T scanned out of 16.5T at 268M/s, 15h26m to go
        467G resilvered, 13.85% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        Media                                           ONLINE       0     0     8 <---- Very bad
          raidz1-0                                      ONLINE       0     0    16 <---- Very bad
            gptid/b2487721-696e-11e4-8228-0cc47a40699d  ONLINE       0     0     0
            gptid/3de3a420-712a-11e5-9363-0cc47a40699d  ONLINE       0     0     0  (resilvering)
            gptid/b36c7fb6-696e-11e4-8228-0cc47a40699d  ONLINE       0     0     0
            gptid/b4636191-696e-11e4-8228-0cc47a40699d  ONLINE       8     0     0  (resilvering)
            gptid/b5485367-696e-11e4-8228-0cc47a40699d  ONLINE       0     0     0
          raidz1-1                                      ONLINE       0     0     0
            gptid/146e9344-080f-11e5-b77a-0cc47a40699d  ONLINE       0     0     0
            gptid/14d76f68-080f-11e5-b77a-0cc47a40699d  ONLINE       0     0     0
            gptid/152f5a34-080f-11e5-b77a-0cc47a40699d  ONLINE       0     0     0
            gptid/158bd979-080f-11e5-b77a-0cc47a40699d  ONLINE       0     0     0
            gptid/15f20b06-080f-11e5-b77a-0cc47a40699d  ONLINE       0     0     0

See the two lines I marked "Very bad" above: checksum errors at the pool and vdev level. Your zpool *is* damaged. You need to destroy and recreate it from backups.
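The same check can be scripted. The following Python sketch, offered only as an illustration, scans the `config:` table of `zpool status` and reports every row with a non-zero READ, WRITE or CKSUM counter. The rows in `SAMPLE` are taken from the output above; `error_rows` is a made-up name, and the parser assumes plain integer counters (it would skip abbreviated values such as `1.2K`).

```python
import re

# One config row: NAME STATE READ WRITE CKSUM, optionally followed by notes.
ROW_RE = re.compile(
    r"^[ \t]*(?P<name>\S+)[ \t]+(?P<state>[A-Z]+)"
    r"[ \t]+(?P<read>\d+)[ \t]+(?P<write>\d+)[ \t]+(?P<cksum>\d+)",
    re.MULTILINE,
)


def error_rows(status_text):
    """Return (name, read, write, cksum) for rows with any non-zero counter."""
    rows = []
    for m in ROW_RE.finditer(status_text):
        counts = (int(m.group("read")), int(m.group("write")), int(m.group("cksum")))
        if any(counts):
            rows.append((m.group("name"),) + counts)
    return rows


SAMPLE = """\
        NAME                                            STATE     READ WRITE CKSUM
        Media                                           ONLINE       0     0     8
          raidz1-0                                      ONLINE       0     0    16
            gptid/b2487721-696e-11e4-8228-0cc47a40699d  ONLINE       0     0     0
            gptid/3de3a420-712a-11e5-9363-0cc47a40699d  ONLINE       0     0     0  (resilvering)
            gptid/b4636191-696e-11e4-8228-0cc47a40699d  ONLINE       8     0     0  (resilvering)
          raidz1-1                                      ONLINE       0     0     0
"""

if __name__ == "__main__":
    for name, read, write, cksum in error_rows(SAMPLE):
        print(f"{name}: read={read} write={write} cksum={cksum}")
```

On this output it reports the pool (`Media`, 8 checksum errors), the `raidz1-0` vdev (16 checksum errors) and the one disk with 8 read errors. Errors on a single disk are what redundancy is there to absorb; the counts on the vdev and pool rows are the ones marked "Very bad" above.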
