Recover

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,079
Code:
SMART Extended Self-test Log Version: 1 (1 sectors)

No self-tests have been logged.  [To run self-tests, use: smartctl -t]
You have not been running SMART tests on any sort of schedule. It is a good idea to run a short test daily and a long test weekly. This allows the system to find errors before the errors are so bad that something like this happens.
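As a rough sketch of what starting those tests by hand looks like: `smartctl -t short` and `smartctl -t long` are the standard invocations, while the helper name `start_selftests`, the `DRY_RUN` switch, and the drive list below are assumptions made for illustration. On FreeNAS the recurring schedule would normally be set up in the web UI rather than in a script, so treat this only as a way to see which commands are involved.

```sh
#!/bin/sh
# Hypothetical helper: print (or run) a SMART self-test command per drive.
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

start_selftests() {
    test_type="$1"; shift            # "short" or "long"
    for dev in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "smartctl -t $test_type $dev"
        else
            smartctl -t "$test_type" "$dev"
        fi
    done
}

# Assumed drive list; adjust to the devices actually present.
start_selftests short /dev/ada0 /dev/ada1
```

Running it with `DRY_RUN=0` as root would start the tests for real; the drive keeps working while a test runs.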
Code:
gptid/6a45ab6d-1140-11e6-980e-d0509913e8a9     N/A  ada0p2
gptid/695ab06c-1140-11e6-980e-d0509913e8a9     N/A  ada1p2
These are the two drives that are not showing as being in the pool and they are also the two drives that look to have errors based on the smartctl output.
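To make the gptid-to-disk mapping explicit, here is a small sketch that turns the two lines above into whole-disk device names. It assumes the three-column layout (label, status, partition) shown in the listing, which resembles `glabel status` output; the function name `gptid_to_disk` is made up for this example.

```sh
#!/bin/sh
# Lines copied from the listing above (three-column layout assumed).
sample='gptid/6a45ab6d-1140-11e6-980e-d0509913e8a9     N/A  ada0p2
gptid/695ab06c-1140-11e6-980e-d0509913e8a9     N/A  ada1p2'

gptid_to_disk() {
    # Drop the partition suffix (e.g. p2) to get the whole-disk device node.
    awk '$1 ~ /^gptid\// { sub(/p[0-9]+$/, "", $3); print $1, "/dev/" $3 }'
}

printf '%s\n' "$sample" | gptid_to_disk
```

The resulting `/dev/ada0` and `/dev/ada1` are the names to pass to smartctl.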
I agree with @SweetAndLow that we need to run tests on these drives. The pool being encrypted limits your options.
 
Joined
May 10, 2017
Messages
838
OP should run the SMART tests, but the IDNF at LBA errors usually aren't a disk problem. They are also from a long time ago, so they are likely unrelated to the current issues.
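To put "a long time ago" into numbers: the smartctl output posted in this thread logs the IDNF error at 14122 power-on hours, and the same drives report 21564 Power_On_Hours now. The sketch below just does that subtraction; the function name is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: how long ago (in power-on time) was a logged error?
hours_since_error() {
    now_hours="$1"      # Power_On_Hours raw value
    error_hours="$2"    # "occurred at disk power-on lifetime" value
    delta=$(( now_hours - error_hours ))
    echo "$delta hours ($(( delta / 24 )) days + $(( delta % 24 )) hours)"
}

hours_since_error 21564 14122   # prints: 7442 hours (310 days + 2 hours)
```

That is power-on time, not calendar time, so the real gap is at least that long.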
 

nightowl

Dabbler
Joined
Apr 16, 2019
Messages
23
Here is what smartctl -x /dev/adaX returns after running smartctl -t long /dev/adaX.

Four posts coming up, one for each drive...
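When reading the dumps, the quickest check is the newest entry in the self-test log (the line starting with `# 1`). The sketch below tests that line for the "Completed without error" status; the sample line is copied from the ada0 output in this thread, and the function name `selftest_ok` is an assumption for illustration.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the newest self-test entry reports no error.
selftest_ok() {
    grep -E '^# *1 ' | grep -q 'Completed without error'
}

# Sample self-test log line copied from the ada0 output in this thread.
sample='# 1  Extended offline    Completed without error       00%     21561         -'

if printf '%s\n' "$sample" | selftest_ok; then
    echo "latest self-test passed"
else
    echo "latest self-test did not pass"
fi
```

In practice the input would be the output of `smartctl -x /dev/adaX` piped into the function.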
 

nightowl

Dabbler
Joined
Apr 16, 2019
Messages
23
Code:
root@freenas:~ # smartctl -x /dev/ada0
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WCC4N6TYTX45
LU WWN Device Id: 5 0014ee 20af897c8
Firmware Version: 82.00A82
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Apr 17 22:40:11 2019 -03
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (39840) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      ( 399) minutes.
Conveyance self-test routine
recommended polling time:      (   5) minutes.
SCT capabilities:            (0x703d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
  3 Spin_Up_Time            POS--K   178   177   021    -    6066
  4 Start_Stop_Count        -O--CK   100   100   000    -    141
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   071   071   000    -    21564
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    141
192 Power-Off_Retract_Count -O--CK   200   200   000    -    69
193 Load_Cycle_Count        -O--CK   200   200   000    -    388
194 Temperature_Celsius     -O---K   119   103   000    -    31
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb7  GPL,SL  VS       1  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 1
    CR     = Command Register
    FEATR  = Features Register
    COUNT  = Count (was: Sector Count) Register
    LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
    LH     = LBA High (was: Cylinder High) Register    ]   LBA
    LM     = LBA Mid (was: Cylinder Low) Register      ] Register
    LL     = LBA Low (was: Sector Number) Register     ]
    DV     = Device (was: Device/Head) Register
    DC     = Device Control Register
    ER     = Error register
    ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 [0] occurred at disk power-on lifetime: 14122 hours (588 days + 10 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  10 -- 51 00 00 00 01 4e 40 88 60 40 00  Error: IDNF at LBA = 0x14e408860 = 5607819360

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 08 00 60 00 01 4e 40 88 60 40 08     00:21:42.613  WRITE FPDMA QUEUED
  ea 00 00 00 00 00 00 00 00 00 00 40 08     00:21:37.733  FLUSH CACHE EXT
  61 00 08 00 50 00 01 5d 50 a2 70 40 08     00:21:37.733  WRITE FPDMA QUEUED
  61 00 08 00 48 00 01 5d 50 a0 70 40 08     00:21:37.733  WRITE FPDMA QUEUED
  61 00 08 00 40 00 00 00 40 04 70 40 08     00:21:37.733  WRITE FPDMA QUEUED

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     21561         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    31 Celsius
Power Cycle Min/Max Temperature:     25/40 Celsius
Lifetime    Min/Max Temperature:      2/47 Celsius
Under/Over Temperature Limit Count:   0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    478 (23)

Index    Estimated Time   Temperature Celsius
  24    2019-04-17 14:43    40  *********************
 ...    ..(112 skipped).    ..  *********************
 137    2019-04-17 16:36    40  *********************
 138    2019-04-17 16:37    39  ********************
 ...    ..( 50 skipped).    ..  ********************
 189    2019-04-17 17:28    39  ********************
 190    2019-04-17 17:29    38  *******************
 ...    ..( 34 skipped).    ..  *******************
 225    2019-04-17 18:04    38  *******************
 226    2019-04-17 18:05    37  ******************
 ...    ..( 95 skipped).    ..  ******************
 322    2019-04-17 19:41    37  ******************
 323    2019-04-17 19:42    38  *******************
 324    2019-04-17 19:43    38  *******************
 325    2019-04-17 19:44    37  ******************
 ...    ..(  4 skipped).    ..  ******************
 330    2019-04-17 19:49    37  ******************
 331    2019-04-17 19:50    36  *****************
 332    2019-04-17 19:51    37  ******************
 333    2019-04-17 19:52    36  *****************
 ...    ..(  4 skipped).    ..  *****************
 338    2019-04-17 19:57    36  *****************
 339    2019-04-17 19:58    35  ****************
 ...    ..(  6 skipped).    ..  ****************
 346    2019-04-17 20:05    35  ****************
 347    2019-04-17 20:06    34  ***************
 ...    ..(  7 skipped).    ..  ***************
 355    2019-04-17 20:14    34  ***************
 356    2019-04-17 20:15    33  **************
 ...    ..( 13 skipped).    ..  **************
 370    2019-04-17 20:29    33  **************
 371    2019-04-17 20:30    32  *************
 ...    ..(113 skipped).    ..  *************
   7    2019-04-17 22:24    32  *************
   8    2019-04-17 22:25    31  ************
 ...    ..(  8 skipped).    ..  ************
  17    2019-04-17 22:34    31  ************
  18    2019-04-17 22:35    40  *********************
 ...    ..(  4 skipped).    ..  *********************
  23    2019-04-17 22:40    40  *********************

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

Device Statistics (GP/SMART Log 0x04) not supported

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2            3  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            3  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4        38656  Vendor specific
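
A rough way to scan an attribute table like the one above is to look only at the raw values of the counters usually tied to media or cabling trouble (IDs 5, 197, 198 and 199). The sketch below does that for rows copied from the ada0 table; the choice of those four IDs and the function name `nonzero_attrs` are assumptions for this example, not a complete health check.

```sh
#!/bin/sh
# Rows copied from the ada0 attribute table above.
attrs='  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0'

nonzero_attrs() {
    # Field 1 is the ID, field 2 the name, field 8 the raw value.
    awk '($1==5 || $1==197 || $1==198 || $1==199) && $8 != 0 { print $2 "=" $8 }'
}

flagged=$(printf '%s\n' "$attrs" | nonzero_attrs)
if [ -z "$flagged" ]; then
    echo "no nonzero counters among IDs 5, 197, 198, 199"
else
    echo "$flagged"
fi
```

For ada0 all four raw values are 0, so nothing is flagged.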
 

nightowl

Dabbler
Joined
Apr 16, 2019
Messages
23
Code:
root@freenas:~ # smartctl -x /dev/ada1
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WCC4N0SN0YZP
LU WWN Device Id: 5 0014ee 2604dd258
Firmware Version: 82.00A82
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Apr 17 22:43:00 2019 -03
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:         (39540) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      ( 397) minutes.
Conveyance self-test routine
recommended polling time:      (   5) minutes.
SCT capabilities:            (0x703d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    50
  3 Spin_Up_Time            POS--K   180   177   021    -    5958
  4 Start_Stop_Count        -O--CK   100   100   000    -    146
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   071   071   000    -    21564
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    146
192 Power-Off_Retract_Count -O--CK   200   200   000    -    74
193 Load_Cycle_Count        -O--CK   200   200   000    -    439
194 Temperature_Celsius     -O---K   118   102   000    -    32
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    6
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb7  GPL,SL  VS       1  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 1
    CR     = Command Register
    FEATR  = Features Register
    COUNT  = Count (was: Sector Count) Register
    LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
    LH     = LBA High (was: Cylinder High) Register    ]   LBA
    LM     = LBA Mid (was: Cylinder Low) Register      ] Register
    LL     = LBA Low (was: Sector Number) Register     ]
    DV     = Device (was: Device/Head) Register
    DC     = Device Control Register
    ER     = Error register
    ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 [0] occurred at disk power-on lifetime: 14122 hours (588 days + 10 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  10 -- 51 00 00 00 01 4e 40 88 60 40 00  Error: IDNF at LBA = 0x14e408860 = 5607819360

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 08 00 c8 00 01 4e 40 88 60 40 08     00:21:42.862  WRITE FPDMA QUEUED
  ea 00 00 00 00 00 00 00 00 00 00 40 08     00:21:37.982  FLUSH CACHE EXT
  61 00 08 00 b8 00 01 5d 50 a2 70 40 08     00:21:37.981  WRITE FPDMA QUEUED
  61 00 08 00 b0 00 01 5d 50 a0 70 40 08     00:21:37.981  WRITE FPDMA QUEUED
  61 00 08 00 a8 00 00 00 40 04 70 40 08     00:21:37.981  WRITE FPDMA QUEUED

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     21561         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    32 Celsius
Power Cycle Min/Max Temperature:     25/41 Celsius
Lifetime    Min/Max Temperature:      2/48 Celsius
Under/Over Temperature Limit Count:   0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    478 (31)

Index    Estimated Time   Temperature Celsius
  32    2019-04-17 14:46    41  **********************
 ...    ..(102 skipped).    ..  **********************
 135    2019-04-17 16:29    41  **********************
 136    2019-04-17 16:30    40  *********************
 ...    ..( 51 skipped).    ..  *********************
 188    2019-04-17 17:22    40  *********************
 189    2019-04-17 17:23    39  ********************
 ...    ..( 26 skipped).    ..  ********************
 216    2019-04-17 17:50    39  ********************
 217    2019-04-17 17:51    38  *******************
 ...    ..( 52 skipped).    ..  *******************
 270    2019-04-17 18:44    38  *******************
 271    2019-04-17 18:45    37  ******************
 ...    ..( 51 skipped).    ..  ******************
 323    2019-04-17 19:37    37  ******************
 324    2019-04-17 19:38    38  *******************
 325    2019-04-17 19:39    38  *******************
 326    2019-04-17 19:40    38  *******************
 327    2019-04-17 19:41    39  ********************
 328    2019-04-17 19:42    38  *******************
 329    2019-04-17 19:43    38  *******************
 330    2019-04-17 19:44    38  *******************
 331    2019-04-17 19:45    37  ******************
 ...    ..( 11 skipped).    ..  ******************
 343    2019-04-17 19:57    37  ******************
 344    2019-04-17 19:58    36  *****************
 ...    ..(  6 skipped).    ..  *****************
 351    2019-04-17 20:05    36  *****************
 352    2019-04-17 20:06    35  ****************
 ...    ..(  5 skipped).    ..  ****************
 358    2019-04-17 20:12    35  ****************
 359    2019-04-17 20:13    34  ***************
 ...    ..( 10 skipped).    ..  ***************
 370    2019-04-17 20:24    34  ***************
 371    2019-04-17 20:25    33  **************
 ...    ..( 19 skipped).    ..  **************
 391    2019-04-17 20:45    33  **************
 392    2019-04-17 20:46    32  *************
 ...    ..(  3 skipped).    ..  *************
 396    2019-04-17 20:50    32  *************
 397    2019-04-17 20:51    33  **************
 ...    ..(  9 skipped).    ..  **************
 407    2019-04-17 21:01    33  **************
 408    2019-04-17 21:02    32  *************
 ...    ..( 94 skipped).    ..  *************
  25    2019-04-17 22:37    32  *************
  26    2019-04-17 22:38    41  **********************
 ...    ..(  4 skipped).    ..  **********************
  31    2019-04-17 22:43    41  **********************

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

Device Statistics (GP/SMART Log 0x04) not supported

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2            3  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            3  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4        38825  Vendor specific
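
Both the ada0 and ada1 error logs show the same IDNF entry: the same LBA, at the same power-on hour (14122), within a fraction of a second of each other in powered-up time. That may point to a shared cause rather than two independent media faults, though this is an inference and not something the logs state. The sketch below decodes the LBA from the register bytes and checks that it lies inside the drive; the variable names are made up, and the arithmetic assumes a shell with 64-bit integers.

```sh
#!/bin/sh
# LBA_48 and LH LM LL bytes from the "Error: IDNF" register line.
lba_bytes="00 01 4e 40 88 60"
# User Capacity and logical sector size from the information section.
capacity_bytes=3000592982016
sector_size=512

# Join the bytes into one hex number and let the shell convert it.
lba=$(( 0x$(printf '%s' "$lba_bytes" | tr -d ' ') ))
sectors=$(( capacity_bytes / sector_size ))

if [ "$lba" -lt "$sectors" ]; then
    echo "LBA $lba is inside the drive ($sectors sectors)"
else
    echo "LBA $lba is beyond the drive ($sectors sectors)"
fi
```

The decoded value matches the 5607819360 that smartctl prints, and it is a valid address on a 3 TB drive, so the drive rejected a write to a sector that should exist.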

 

nightowl

Dabbler
Joined
Apr 16, 2019
Messages
23
Code:
root@freenas:~ # smartctl -x /dev/ada2
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WCC4N0SN0VL1
LU WWN Device Id: 5 0014ee 2604dfade
Firmware Version: 82.00A82
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Apr 17 22:43:23 2019 -03
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (40980) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      ( 411) minutes.
Conveyance self-test routine
recommended polling time:      (   5) minutes.
SCT capabilities:            (0x703d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    1
  3 Spin_Up_Time            POS--K   177   175   021    -    6116
  4 Start_Stop_Count        -O--CK   100   100   000    -    138
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   071   071   000    -    21564
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    138
192 Power-Off_Retract_Count -O--CK   200   200   000    -    66
193 Load_Cycle_Count        -O--CK   200   200   000    -    386
194 Temperature_Celsius     -O---K   118   101   000    -    32
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    1
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb7  GPL,SL  VS       1  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     21562         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    32 Celsius
Power Cycle Min/Max Temperature:     25/41 Celsius
Lifetime    Min/Max Temperature:      2/49 Celsius
Under/Over Temperature Limit Count:   0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    478 (41)

Index    Estimated Time   Temperature Celsius
  42    2019-04-17 14:46    41  **********************
 ...    ..( 21 skipped).    ..  **********************
  64    2019-04-17 15:08    41  **********************
  65    2019-04-17 15:09    40  *********************
 ...    ..(118 skipped).    ..  *********************
 184    2019-04-17 17:08    40  *********************
 185    2019-04-17 17:09    39  ********************
 ...    ..( 26 skipped).    ..  ********************
 212    2019-04-17 17:36    39  ********************
 213    2019-04-17 17:37    38  *******************
 ...    ..( 53 skipped).    ..  *******************
 267    2019-04-17 18:31    38  *******************
 268    2019-04-17 18:32    37  ******************
 ...    ..( 82 skipped).    ..  ******************
 351    2019-04-17 19:55    37  ******************
 352    2019-04-17 19:56    38  *******************
 353    2019-04-17 19:57    38  *******************
 354    2019-04-17 19:58    38  *******************
 355    2019-04-17 19:59    37  ******************
 ...    ..(  4 skipped).    ..  ******************
 360    2019-04-17 20:04    37  ******************
 361    2019-04-17 20:05    36  *****************
 ...    ..(  3 skipped).    ..  *****************
 365    2019-04-17 20:09    36  *****************
 366    2019-04-17 20:10    35  ****************
 ...    ..(  5 skipped).    ..  ****************
 372    2019-04-17 20:16    35  ****************
 373    2019-04-17 20:17    34  ***************
 ...    ..(  8 skipped).    ..  ***************
 382    2019-04-17 20:26    34  ***************
 383    2019-04-17 20:27    33  **************
 ...    ..( 18 skipped).    ..  **************
 402    2019-04-17 20:46    33  **************
 403    2019-04-17 20:47    32  *************
 404    2019-04-17 20:48    32  *************
 405    2019-04-17 20:49    32  *************
 406    2019-04-17 20:50    33  **************
 ...    ..(  9 skipped).    ..  **************
 416    2019-04-17 21:00    33  **************
 417    2019-04-17 21:01    32  *************
 ...    ..( 95 skipped).    ..  *************
  35    2019-04-17 22:37    32  *************
  36    2019-04-17 22:38    41  **********************
 ...    ..(  4 skipped).    ..  **********************
  41    2019-04-17 22:43    41  **********************

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

Device Statistics (GP/SMART Log 0x04) not supported

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2            2  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            3  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4        38848  Vendor specific
 

nightowl

Dabbler
Joined
Apr 16, 2019
Messages
23
Code:
root@freenas:~ # smartctl -x /dev/ada3
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WCC4NLPERC39
LU WWN Device Id: 5 0014ee 2b58c09b4
Firmware Version: 82.00A82
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Apr 17 22:43:36 2019 -03
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (40980) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      ( 411) minutes.
Conveyance self-test routine
recommended polling time:      (   5) minutes.
SCT capabilities:            (0x703d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
  3 Spin_Up_Time            POS--K   179   178   021    -    6033
  4 Start_Stop_Count        -O--CK   100   100   000    -    137
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   071   071   000    -    21564
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    137
192 Power-Off_Retract_Count -O--CK   200   200   000    -    67
193 Load_Cycle_Count        -O--CK   200   200   000    -    387
194 Temperature_Celsius     -O---K   119   103   000    -    31
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb7  GPL,SL  VS       1  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     21562         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    31 Celsius
Power Cycle Min/Max Temperature:     24/39 Celsius
Lifetime    Min/Max Temperature:      2/47 Celsius
Under/Over Temperature Limit Count:   0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    478 (42)

Index    Estimated Time   Temperature Celsius
  43    2019-04-17 14:46    39  ********************
 ...    ..(108 skipped).    ..  ********************
 152    2019-04-17 16:35    39  ********************
 153    2019-04-17 16:36    38  *******************
 ...    ..( 48 skipped).    ..  *******************
 202    2019-04-17 17:25    38  *******************
 203    2019-04-17 17:26    37  ******************
 ...    ..(145 skipped).    ..  ******************
 349    2019-04-17 19:52    37  ******************
 350    2019-04-17 19:53    36  *****************
 351    2019-04-17 19:54    36  *****************
 352    2019-04-17 19:55    37  ******************
 ...    ..(  5 skipped).    ..  ******************
 358    2019-04-17 20:01    37  ******************
 359    2019-04-17 20:02    36  *****************
 ...    ..(  2 skipped).    ..  *****************
 362    2019-04-17 20:05    36  *****************
 363    2019-04-17 20:06    35  ****************
 ...    ..(  3 skipped).    ..  ****************
 367    2019-04-17 20:10    35  ****************
 368    2019-04-17 20:11    34  ***************
 ...    ..(  5 skipped).    ..  ***************
 374    2019-04-17 20:17    34  ***************
 375    2019-04-17 20:18    33  **************
 ...    ..(  7 skipped).    ..  **************
 383    2019-04-17 20:26    33  **************
 384    2019-04-17 20:27    32  *************
 ...    ..( 67 skipped).    ..  *************
 452    2019-04-17 21:35    32  *************
 453    2019-04-17 21:36    31  ************
 ...    ..( 12 skipped).    ..  ************
 466    2019-04-17 21:49    31  ************
 467    2019-04-17 21:50    32  *************
 ...    ..( 13 skipped).    ..  *************
   3    2019-04-17 22:04    32  *************
   4    2019-04-17 22:05    31  ************
 ...    ..( 31 skipped).    ..  ************
  36    2019-04-17 22:37    31  ************
  37    2019-04-17 22:38    39  ********************
 ...    ..(  4 skipped).    ..  ********************
  42    2019-04-17 22:43    39  ********************

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

Device Statistics (GP/SMART Log 0x04) not supported

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2            2  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            3  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4        38861  Vendor specific
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Passing SMART tests, that's good. It means the drives are not broken enough to fail a test. So now it's a matter of figuring out why those 2 drives do not show up in your pool. Maybe they didn't get unlocked, or maybe they got their partitions deleted. Just ideas.
 

nightowl

Dabbler
Joined
Apr 16, 2019
Messages
23
Maybe they didn't get unlocked, or maybe they got their partitions deleted.

@SweetAndLow - Thanks for your reply and pointers.

Right now, I am trying to figure out my next steps towards recovering the pool. I am starting by looking into getting the two drives unlocked. If that does not work, then I will look into whether the partitions are missing.

After searching on-line, I came across this iX Community forum post.

Here is what I get when I list the contents of /dev/gptid

Code:
root@freenas:/dev/gptid # ls
695ab06c-1140-11e6-980e-d0509913e8a9        6c49e955-1140-11e6-980e-d0509913e8a9
6a45ab6d-1140-11e6-980e-d0509913e8a9        6c49e955-1140-11e6-980e-d0509913e8a9.eli
6b64892f-1140-11e6-980e-d0509913e8a9        8f3569c9-7489-11e5-8705-d0509913e8a9
6b64892f-1140-11e6-980e-d0509913e8a9.eli


The '.eli' files are pointing to the same device as the ones we are missing.

Since there is data in this pool that I would like to recover, there are several things that I have questions about...

  1. I understand that the user mentioned in an earlier post was using a different RAID configuration - RAID-Z2 vdevs. Since I am using a different configuration (from what I understand, 'striped mirrored vdevs'), will that make any difference in the approach?
  2. I have shut down my NAS to move it to a more accessible location to work on it and check hardware. Was there a necessary key in /tmp/geli.key? If so, that would be gone, correct?
  3. In his solution, he mentions his drives resilvering. Since my pool is not just degraded as his was, but is 'state unknown', will I need to avoid resilvering to avoid losing data? If so, is there a way to unlock the drives without needing to resilver?
Or to put it into one question, "Is there a good reference on how to unlock drives?"

Thanks again for all of y'all's patience and help.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Well, normally the drives are all unlocked at once when you unlock them in the GUI.

There are ways to unlock each drive individually, but I don't know how to do that, and it's not standard operation.
 

nightowl

Dabbler
Joined
Apr 16, 2019
Messages
23
it's not standard operation

Got it. Since I am in recovery mode, I understand that we are outside standard operations.

What I am hoping to be able to do is to first recover some missing data and then regroup on getting this NAS fixed and back on-line.

So is there someone out there that can help me with this question: What kind of risk am I putting my data in if I rename those *.eli files in /dev/gptid and then try to import the pool again?

Or is there something I should do first to reduce the risk of data loss?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
What kind of risk am I putting my data in if I rename those *.eli files in /dev/gptid and then try to import the pool again?
The name of the devices, as such, isn't the issue. The issue is the fact that they're encrypted with geli, and need to be decrypted/unlocked. The GUI should ordinarily do that for you, but isn't in this case. I don't have any advice on how to fix this (I don't use pool encryption), but I seriously doubt that "rename the /dev/ entries" will be helpful.
 

nightowl

Dabbler
Joined
Apr 16, 2019
Messages
23
From what I can tell, the partitions look good.

Code:
root@freenas:/dev/gptid # gpart show
=>      34  15633341  da0  GPT  (7.5G)
        34      1024    1  bios-boot  (512K)
      1058         6       - free -  (3.0K)
      1064  15631360    2  freebsd-zfs  (7.5G)
  15632424       951       - free -  (476K)

=>        34  5860533101  ada0  GPT  (2.7T)
          34          94        - free -  (47K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  5856338696     2  freebsd-zfs  (2.7T)
  5860533128           7        - free -  (3.5K)

=>        34  5860533101  ada1  GPT  (2.7T)
          34          94        - free -  (47K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  5856338696     2  freebsd-zfs  (2.7T)
  5860533128           7        - free -  (3.5K)

=>        34  5860533101  ada2  GPT  (2.7T)
          34          94        - free -  (47K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  5856338696     2  freebsd-zfs  (2.7T)
  5860533128           7        - free -  (3.5K)

=>        34  5860533101  ada3  GPT  (2.7T)
          34          94        - free -  (47K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  5856338696     2  freebsd-zfs  (2.7T)
  5860533128           7        - free -  (3.5K)


What I am trying to do now is figure out how to re-attach the missing devices to the pool...

Code:
root@freenas:~ # zpool online VolumeName gptid/6a45ab6d-1140-11e6-980e-d0509913e8a9
cannot open 'VolumeName': no such pool


Another thing I tried was clearing errors on the pool (without discarding the transactions), which gave a similar error...

Code:
root@freenas:~ # zpool clear -Fn VolumeName
cannot open 'VolumeName': no such pool


So my question now is how do I go about making the pool available so that I can re-attach the devices?
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
If the pool does not appear in zpool status, it's not imported and you can't do anything to it.

To find pools that could possibly be imported, you use zpool import.

To import a pool you use the GUI, and if that fails you use the CLI.
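
A rough sketch of that order of operations from a shell, using the pool name VolumeName from this thread. The run helper and the DRY_RUN variable are made up for this example (they are not part of ZFS); by default the script only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch only: prints each command instead of running it unless DRY_RUN=0.
# run() and DRY_RUN are illustrative names, not part of ZFS or FreeNAS.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

pool_steps() {
    pool="$1"
    run zpool status            # lists pools that are already imported
    run zpool import            # lists pools that could be imported
    run zpool import "$pool"    # CLI import, only if the GUI import fails
}

pool_steps VolumeName
```

Running it with DRY_RUN=0 would execute the three commands for real, so that is something to do only after the GUI import has been tried.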
 

nightowl

Dabbler
Joined
Apr 16, 2019
Messages
23
Here is the complete output of zpool import

Code:
root@freenas:~ # zpool import
   pool: VolumeName
     id: 5057694033267979932
  state: UNAVAIL
 status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
    devices and try again.
   see: http://illumos.org/msg/ZFS-8000-6X
 config:

    VolumeName                                     UNAVAIL  missing device
      mirror-1                                          ONLINE
        gptid/6b64892f-1140-11e6-980e-d0509913e8a9.eli  ONLINE
        gptid/6c49e955-1140-11e6-980e-d0509913e8a9.eli  ONLINE

    Additional devices are known to be part of this pool, though their
    exact configuration cannot be determined.
root@freenas:~ #


Does that mean that I have no import to work off of?

Or should I try to import that? (Right now, I am trying to leave the disks in such a state that I can clone them as soon as I have another set of drives and check out other means of rebuilding and recovering the data on them.)
 

nightowl

Dabbler
Joined
Apr 16, 2019
Messages
23
Success: Recovered

Since the two drives were not showing up in the volume but were showing up in the operating system, I figured it might have something to do with the devices not being decrypted, and therefore being unreadable.

Another thing that pushed me in that direction was that when I looked in /dev/gptid, the volumes that were showing up in the pool appeared in this directory with the *.eli suffix. Again, this made me think that the other drives must not be decrypting.
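
That check can be sketched as a small /bin/sh helper: feed it a listing of /dev/gptid and it prints every entry that has no .eli twin. The function name not_attached is hypothetical, and the sample input is the listing posted earlier in this thread. Unencrypted partitions (such as one on the boot device) have no .eli twin either, so treat the output as candidates only.

```shell
#!/bin/sh
# Sketch: read a /dev/gptid listing on stdin (names separated by spaces or
# newlines) and print each name that has no matching "<name>.eli" entry.
# not_attached is a hypothetical helper name, not a FreeBSD tool.
not_attached() {
    awk '
        { for (i = 1; i <= NF; i++) seen[$i] = 1 }
        END {
            for (name in seen) {
                if (name ~ /\.eli$/) continue   # skip the decrypted providers
                key = name ".eli"
                if (!(key in seen)) print name  # no decrypted twin found
            }
        }
    ' | sort
}

# Sample input: the listing from earlier in this thread.
not_attached <<'EOF'
695ab06c-1140-11e6-980e-d0509913e8a9        6c49e955-1140-11e6-980e-d0509913e8a9
6a45ab6d-1140-11e6-980e-d0509913e8a9        6c49e955-1140-11e6-980e-d0509913e8a9.eli
6b64892f-1140-11e6-980e-d0509913e8a9        8f3569c9-7489-11e5-8705-d0509913e8a9
6b64892f-1140-11e6-980e-d0509913e8a9.eli
EOF
```

On the live system the equivalent would be ls /dev/gptid | not_attached. For the sample listing it prints the 695ab06c and 6a45ab6d entries, plus 8f3569c9, which has no .eli twin but is not one of the 1140-11e6 pool members.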

So I decided to try to figure out how to use the 'geli key' found in /data/geli/keys_to_the_kingdom.key to decrypt the drives. (For newbies, 'keys_to_the_kingdom.key' is a placeholder: the actual file name will be a UUID, 32 hexadecimal characters plus some hyphens. E.g., mine was b300eff1-4f42-4869-a6fd-6bbce473dad5.key)

So here is the code I used to "get 'er done."

I did a little sleuthing to determine which two drives were off-line. The following shows the drives that ARE on-line, from which I can deduce which ones are NOT on-line.

Code:
root@freenas:/dev/gptid # glabel status
                                      Name  Status  Components
gptid/8f3569c9-7489-11e5-8705-d0509913e8a9     N/A  da0p1
gptid/6c49e955-1140-11e6-980e-d0509913e8a9     N/A  ada0p2
gptid/6b64892f-1140-11e6-980e-d0509913e8a9     N/A  ada1p2


Now I attach and decrypt the two drives that are not showing up. This post on how to "Unlock Geli-encrypted ZFS Volume - FreeNAS" by Kai Wagner at OpenAttic.org was helpful in sorting out some of the details.

Code:
root@freenas:/dev/gptid # geli attach -pk /data/geli/b300eff1-4f42-4869-a6fd-6bbce473dad5.key /dev/ada3p2
root@freenas:/dev/gptid # geli attach -pk /data/geli/b300eff1-4f42-4869-a6fd-6bbce473dad5.key /dev/ada2p2
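
For more than a couple of partitions, the same two commands could be wrapped in a loop. A minimal sketch, assuming /bin/sh and the key path and partitions from this post; attach_all and DRY_RUN are names invented for the example. In geli attach, -p says that no passphrase is part of the key and -k names the key file, which is what the combined -pk above does.

```shell
#!/bin/sh
# Sketch: attach several geli providers with one key file.
# attach_all and DRY_RUN are illustrative names. By default the commands are
# only printed; set DRY_RUN=0 to actually run geli.
attach_all() {
    key="$1"
    shift
    for part in "$@"; do
        if [ "${DRY_RUN:-1}" = "0" ]; then
            # Stop at the first failure so the problem partition is obvious.
            geli attach -p -k "$key" "$part" || return 1
        else
            echo "geli attach -p -k $key $part"
        fi
    done
}

attach_all /data/geli/b300eff1-4f42-4869-a6fd-6bbce473dad5.key /dev/ada3p2 /dev/ada2p2
```

Attaching only creates the decrypted .eli providers; nothing is written to the pool until it is imported.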


Let's see what pools are available.

Code:
root@freenas:/dev/gptid # zpool import
   pool: VolumeName
     id: 5057694033267979932
  state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

    VolumeName                                     ONLINE
      mirror-0                                          ONLINE
        ada3p2.eli                                      ONLINE
        ada2p2.eli                                      ONLINE
      mirror-1                                          ONLINE
        gptid/6b64892f-1140-11e6-980e-d0509913e8a9.eli  ONLINE
        gptid/6c49e955-1140-11e6-980e-d0509913e8a9.eli  ONLINE


Making progress. Let's try to import the pool now.

Code:
root@freenas:/dev/gptid # zpool import VolumeName


Great, no errors. Let's check status now.

Code:
root@freenas:/mnt # zpool status
  pool: VolumeName
state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
    attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
    using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: resilvered 280M in 0 days 00:00:05 with 0 errors on Fri Apr 19 18:41:09 2019
config:

    NAME                                                STATE     READ WRITE CKSUM
    VolumeName                                     ONLINE       0     0     0
      mirror-0                                          ONLINE       0     0     0
        ada3p2.eli                                      ONLINE       0     0     6
        ada2p2.eli                                      ONLINE       0     0     0
      mirror-1                                          ONLINE       0     0     0
        gptid/6b64892f-1140-11e6-980e-d0509913e8a9.eli  ONLINE       0     0     0
        gptid/6c49e955-1140-11e6-980e-d0509913e8a9.eli  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
state: ONLINE
  scan: scrub repaired 0 in 0 days 00:06:22 with 0 errors on Thu Apr 18 03:51:22 2019
config:

    NAME        STATE     READ WRITE CKSUM
    freenas-boot  ONLINE       0     0     0
      da0p2     ONLINE       0     0     0

errors: No known data errors


Take a deep breath and move on to recovering the needed data. But first we will need to set a mount point.

Code:
root@freenas:~ # zfs set mountpoint=/mnt/VolumeName VolumeName
root@freenas:~ # zfs mount -a


Switched to a client on the network, connected via NFS, and recovered data.

Now it is time to write up a report on what I learned and how to avoid this situation again. Coming in another post.
 

nightowl

Dabbler
Joined
Apr 16, 2019
Messages
23
Reflections:

For many years, I have religiously used the 3-2-1 principle for data backup: 3 copies, 2 local, 1 off-site.

For the last few months, I have been in the process of restructuring, deduping, and cleaning up about 15 years' worth of data.

Strike one: Last week during the process, to make space I wiped my local back-up leaving me with two copies of data: one local, live, and working; and one off-site.

Strike two: Within hours the off-site goes down violently and catastrophically: Think fire, acts of nature, theft. Now we are down to one copy and I am on pins and needles because this is not a good situation.

Moving carefully over the weekend, we make a second local copy of a good bit of the legacy data, getting up to May 2018 plus the major important records, and push the most important records out to a few cloud storage shares we have as fast as our uplink will let us.

Monday morning - all the backups I started over the weekend are completed, but I do not have time to start any new ones before leaving for a meeting. The NAS is idle.

Strike three: Get back from the meeting and the NAS is down without warning with a Red Alert. That was one of those moments when you get cold and sweat at the same time!

From the discussion on this thread, it is obvious that I should have done some things differently: watching drive SMART data better, retaining more configuration information, and possibly not using encryption.

Lesson one: Never have less than three copies.

Lesson two: More aggressively monitor my local FreeNAS setup. I plan to peruse the FreeNAS documents again and update my processes to hopefully avoid this kind of unexpected (and nearly catastrophic) failure again.
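
On the SMART side, the suggestion earlier in the thread was a short test daily and a long test weekly. FreeNAS can schedule these from the GUI, which is the normal route; purely as an illustration of the logic, a /bin/sh sketch might look like the following. The function name smart_test_kind, the choice of Sunday, and the disk list ada0 to ada3 are assumptions, and the script only prints the smartctl commands rather than running them.

```shell
#!/bin/sh
# Sketch: pick a long SMART self-test once a week and a short one otherwise.
# smart_test_kind is a hypothetical helper; Sunday is an arbitrary choice.
smart_test_kind() {
    # $1 is the ISO day of week as printed by `date +%u` (1 = Mon ... 7 = Sun)
    if [ "$1" -eq 7 ]; then
        echo long
    else
        echo short
    fi
}

kind=$(smart_test_kind "$(date +%u)")

# Disk names are assumed from the gpart output in this thread.
for disk in ada0 ada1 ada2 ada3; do
    echo "smartctl -t $kind /dev/$disk"
done
```

The results of each test show up afterwards in the self-test log section of smartctl -x, as in the outputs posted above.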

If anyone has any pointers, feel free to speak up. I am listening for lesson three.

And thanks again for all of your help, especially @Chris Moore, @danb35, and @SweetAndLow

All of y'all have a good weekend.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,079
Switched to a client on the network, connected via NFS, and recovered data.

Now it is time to write up a report on what I learned and how to avoid this situation again. Coming in another post.
You should export the pool and import it through the GUI. You should always use the GUI when you can, and now that the pool can be imported, it will import through the GUI. The pool also needs to be rekeyed. The most likely reason that it didn't import automatically is that the keys did not all match. Once you rekey, all the drives should be keyed the same.
 