UFS volume degraded

poldas

Contributor
Joined
Sep 18, 2012
Messages
104
Hi

I have a mirrored UFS volume, and today I noticed that it is degraded.

Code:
[root@freenas /]# gmirror list -a
Geom name: Volume1
State: DEGRADED
Components: 2
Balance: load
Slice: 4096
Flags: NONE
GenID: 1
SyncID: 1
ID: 2748955271
Providers:
1. Name: mirror/Volume1
   Mediasize: 500107861504 (465G)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e2
Consumers:
1. Name: ada1
   Mediasize: 500107862016 (465G)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1
   State: ACTIVE
   Priority: 1
   Flags: NONE
   GenID: 1
   SyncID: 1
   ID: 2404720640

Geom name: Volume1.sync

[root@freenas /]# gmirror status
          Name    Status  Components
mirror/Volume1  DEGRADED  ada1 (ACTIVE)


The second disk, ada0, is still available. I checked both disks and SMART didn't report any errors. I don't know why ada0 was dropped from the volume...
How can I add ada0 back to the degraded volume?
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
What is the output of "camcontrol devlist"? And UFS?!?! How OLD is this system?
 

poldas

Contributor
Joined
Sep 18, 2012
Messages
104
<WDC WD5000AZRX-00A8LB0 01.01A01> at scbus2 target 0 lun 0 (pass0,ada0)
<WDC WD5000AZRX-00A8LB0 01.01A01> at scbus2 target 1 lun 0 (pass1,ada1)
<Kingston DataTraveler G3 PMAP> at scbus5 target 0 lun 0 (da0,pass2)

How old? About 7 years
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
OK, well, the disk is still there, it's just no longer part of the mirror. The likely cause is a hardware problem.

Can you pastebin the output of
Code:
smartctl -x -qnoserial /dev/ada0
?

Also, please confirm: are you using a version of FreeNAS from 5 years ago? And can you tell us what your motherboard, CPU, and RAM are?
 

poldas

Contributor
Joined
Sep 18, 2012
Messages
104
Hi

It is an old Dell GX280, Pentium 4 2.0 GHz, 2 GB DDR2 RAM.
FreeNAS-9.2.1.9-RELEASE-x86 (2bbba09)

Code:
[root@freenas /]# smartctl -x -qnoserial /dev/ada0
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p15 i386] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD5000AZRX-00A8LB0
Firmware Version: 01.01A01
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 1.5 Gb/s)
Local Time is:    Mon Jan 14 08:36:58 2019 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 121) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                ( 6240) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  74) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x30b5) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    2972
  3 Spin_Up_Time            POS--K   141   136   021    -    3925
  4 Start_Stop_Count        -O--CK   100   100   000    -    90
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   035   035   000    -    47493
 10 Spin_Retry_Count        -O--CK   100   253   000    -    0
 11 Calibration_Retry_Count -O--CK   100   253   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    89
192 Power-Off_Retract_Count -O--CK   200   200   000    -    171
193 Load_Cycle_Count        -O--CK   108   108   000    -    276115
194 Temperature_Celsius     -O---K   109   098   000    -    34
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   192   192   000    -    720
198 Offline_Uncorrectable   ----CK   198   196   000    -    178
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   198   191   000    -    1018
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

ATA_READ_LOG_EXT (addr=0x00:0x00, page=0, n=1) failed: 48-bit ATA commands not implemented for legacy controllers
Read GP Log Directory failed

SMART Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00           SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x09           SL  R/W      1  Selective self-test log
0x80-0x9f      SL  R/W     16  Host vendor specific log
0xa0-0xa7      SL  VS      16  Device vendor specific log
0xa8-0xb7      SL  VS       1  Device vendor specific log
0xbd           SL  VS       1  Device vendor specific log
0xc0           SL  VS       1  Device vendor specific log
0xe0           SL  R/W      1  SCT Command/Status
0xe1           SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log (GP Log 0x03) not supported

SMART Error Log Version: 1
No Errors Logged

SMART Extended Self-test Log (GP Log 0x07) not supported

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     47256         546597832
# 2  Short offline       Completed: read failure       10%     46727         949053624
# 3  Short offline       Completed: read failure       10%     46726         949053624
# 4  Short offline       Completed without error       00%     22904         -
# 5  Short offline       Completed: read failure       90%     22904         972140632
# 6  Short offline       Completed: read failure       90%     16395         952896640
# 7  Short offline       Completed: read failure       90%     16395         952896640
# 8  Short offline       Completed: read failure       90%     16395         972140632

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    34 Celsius
Power Cycle Min/Max Temperature:     30/40 Celsius
Lifetime    Min/Max Temperature:     17/45 Celsius
Under/Over Temperature Limit Count:   0/0
SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    478 (146)

Index    Estimated Time   Temperature Celsius
 147    2019-01-14 00:39    35  ****************
 ...    ..(167 skipped).    ..  ****************
 315    2019-01-14 03:27    35  ****************
 316    2019-01-14 03:28    34  ***************
 ...    ..( 67 skipped).    ..  ***************
 384    2019-01-14 04:36    34  ***************
 385    2019-01-14 04:37    35  ****************
 ...    ..( 33 skipped).    ..  ****************
 419    2019-01-14 05:11    35  ****************
 420    2019-01-14 05:12    36  *****************
 421    2019-01-14 05:13    35  ****************
 ...    ..( 87 skipped).    ..  ****************
  31    2019-01-14 06:41    35  ****************
  32    2019-01-14 06:42    36  *****************
  33    2019-01-14 06:43    35  ****************
 ...    ..( 27 skipped).    ..  ****************
  61    2019-01-14 07:11    35  ****************
  62    2019-01-14 07:12    36  *****************
  63    2019-01-14 07:13    35  ****************
 ...    ..( 82 skipped).    ..  ****************
 146    2019-01-14 08:36    35  ****************

SCT Error Recovery Control command not supported

Device Statistics (GP Log 0x04) not supported

ATA_READ_LOG_EXT (addr=0x11:0x00, page=0, n=1) failed: 48-bit ATA commands not implemented for legacy controllers
Read SATA Phy Event Counters failed
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Yeah so this drive is bad.

Note first the drive's age: 47493 power-on hours. That's about 5.4 years, which is already "elderly" for any bottom-tier consumer-grade drive, which this is.

First of all, you are using a WD Green 500GB. These have design considerations which make them, out of the box, very poor choices for a NAS. You'll note on attribute #193, you have 276115 load cycles. A proper NAS drive would have a number like "100" here, and in any case, any number over a few tens of thousands is probably too high for this drive.
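For anyone who wants to check the arithmetic behind those two numbers, here is a rough sketch. It only uses the raw values of attributes 9 and 193 from the output above; the variable names are made up for illustration, and a year is taken as 8760 hours.

```shell
# Figures copied from the smartctl output in this thread.
power_on_hours=47493   # attribute 9 (Power_On_Hours), raw value
load_cycles=276115     # attribute 193 (Load_Cycle_Count), raw value

# awk does the floating-point division that plain sh arithmetic cannot.
age_years=$(awk -v h="$power_on_hours" 'BEGIN { printf "%.1f", h / 8760 }')
cycles_per_hour=$(awk -v c="$load_cycles" -v h="$power_on_hours" \
    'BEGIN { printf "%.1f", c / h }')

echo "age: ${age_years} years, load cycles per hour: ${cycles_per_hour}"
```

That works out to roughly 5.4 years of power-on time and close to six head load cycles for every hour the drive has been running.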

Second of all, you have 720 pending sectors (attribute #197). This is instantaneous garbage can. This means your drive is failing now.

Third of all, you have 178 uncorrectable errors (attribute #198). Instant garbage can for me if this were even 1.

And then you have multizone errors (attribute #200) on top of it.
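If you want to keep an eye on these attributes yourself, something like the following could work. It is only a sketch: check_smart_attrs is a hypothetical helper name, and it assumes the 8-column attribute table that "smartctl -x" prints in this thread (the plain "smartctl -A" layout has more columns and would need a different field number).

```shell
# check_smart_attrs: read a `smartctl -x` attribute table on stdin and print
# every watched attribute (5, 197, 198) whose raw value is non-zero.
# Assumes RAW_VALUE is field 8, as in the output posted in this thread.
check_smart_attrs() {
    awk '($1 == 5 || $1 == 197 || $1 == 198) && $8 + 0 > 0 {
        printf "%s %s raw=%s\n", $1, $2, $8
    }'
}

# Sample rows copied from the output above.
sample='  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
197 Current_Pending_Sector  -O--CK   192   192   000    -    720
198 Offline_Uncorrectable   ----CK   198   196   000    -    178
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0'

report=$(printf '%s\n' "$sample" | check_smart_attrs)
echo "$report"
```

On the real system the same function would be fed with "smartctl -x /dev/ada0 | check_smart_attrs". For this drive it flags attributes 197 and 198 and stays quiet about 5.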

Also, your SMART tests have been failing for years. (And you have hardly been running any.) You're lucky you got this far with this drive.
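To put a number on that, the self-test log can be tallied with a few lines of awk. This is a sketch only: count_failed_selftests is a hypothetical helper, and the sample rows are the eight log entries from the output above with each LBA written on one line.

```shell
# count_failed_selftests: read a smartctl self-test log on stdin and print
# "<failed> of <total>" for the entries it finds.
count_failed_selftests() {
    awk '/^# *[0-9]+ / { total++; if (/read failure/) failed++ }
         END { printf "%d of %d\n", failed, total }'
}

# The eight entries from this drive's self-test log.
selftest_log='# 1  Short offline       Completed: read failure       90%     47256         546597832
# 2  Short offline       Completed: read failure       10%     46727         949053624
# 3  Short offline       Completed: read failure       10%     46726         949053624
# 4  Short offline       Completed without error       00%     22904         -
# 5  Short offline       Completed: read failure       90%     22904         972140632
# 6  Short offline       Completed: read failure       90%     16395         952896640
# 7  Short offline       Completed: read failure       90%     16395         952896640
# 8  Short offline       Completed: read failure       90%     16395         972140632'

summary=$(printf '%s\n' "$selftest_log" | count_failed_selftests)
echo "failed self-tests: $summary"
```

Seven of the eight logged tests ended in a read failure, and the oldest of them is at 16395 power-on hours, roughly three and a half years of run time before the current 47493.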

So the diagnosis is clear, sir: your drive has fallen out of your pool because it has major hardware problems, due to a combination of being older than its useful life and being exposed to the rigors of a NAS when its head-parking behavior was not designed for that.

I will say, for the record, the drive's temperature record shows that it was well-ventilated and in a decent place. So that's good.

You must replace this drive, immediately, so that you don't lose your pool. Of course, finding a new 500GB drive is not easy in 2019.
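For the swap itself, the usual gmirror sequence is "forget" the missing component, "insert" the new disk, then watch "status" while it rebuilds. The sketch below only prints those commands unless RUN=1 is set. replace_mirror_disk is a hypothetical wrapper, and it assumes the mirror is still called Volume1, that the replacement shows up as ada0 again, and that it is at least as large as the surviving disk. Check the device name with "camcontrol devlist" before running anything for real.

```shell
# replace_mirror_disk MIRROR NEW_DISK
# Prints the gmirror commands for swapping in a replacement disk. Nothing is
# executed unless RUN=1 is set, so the steps can be reviewed first.
replace_mirror_disk() {
    mirror=$1
    new_disk=$2
    for cmd in \
        "gmirror forget $mirror" \
        "gmirror insert $mirror /dev/$new_disk" \
        "gmirror status $mirror"
    do
        if [ "${RUN:-0}" = "1" ]; then
            $cmd            # run the step for real
        else
            echo "$cmd"     # dry run: just show the step
        fi
    done
}

# Dry run with the names from this thread (ada0 is an assumption about how
# the replacement disk will be named after the swap).
plan=$(replace_mirror_disk Volume1 ada0)
echo "$plan"
```

The "forget" step drops the record of the component that is no longer connected, so the mirror stops waiting for the old ada0. After "insert", the status output should show the new disk synchronizing.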
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Also, having a UFS pool on 9.2.1.9 is very, very, very, very out of date. I'm guessing if you told me your hardware you were using, it would be pretty scary.

You know FreeNAS does not even do UFS pools anymore, right?
 

SeaFox

Explorer
Joined
Aug 6, 2013
Messages
98
Also, having a UFS pool on 9.2.1.9 is very, very, very, very out of date. I'm guessing if you told me your hardware you were using, it would be pretty scary. You know FreeNAS does not even do UFS pools any more, right?

It is an old Dell GX280, Pentium 4 2.0 GHz, 2 GB DDR2 RAM.

He's using an old Dell workstation. The reason he is using it is likely that he got it cheap/free from a business.
It does not have the proper hardware for a "correct" implementation of ZFS, so he is using UFS, same as me.
(Note: I am actually building a new NAS right now, so I'll be upgrading soon, though.)

I imagine there are a lot of people out there doing the same thing. Running old FreeNAS because the setup they have generally suits their needs and there is a rather high cost-of-entry to upgrade. They just don't come on this forum very much. There is an uncanny valley of people who want something more powerful (or more open) than a Synology, but don't need enterprise-level data protection. That's where you get the "repurposed this old Core i5 desktop" group. Not everyone can afford to plop down several hundred dollars on a system that supports ECC RAM and 4+ hard drives at the same time.

A few weeks ago I added additional capacity to my (current) FreeNAS: an almost 10-year-old WD Caviar Black 640. It was just sitting on a shelf gathering dust after being retired from my main desktop PC, and it still worked fine. Since I'm using UFS it was a breeze to add it to my available storage, and now I have more storage for my Plex install.

When running things like this, the main thing is not to keep your critical data only on the NAS. Treat it as a tool for high availability. The data on my FreeNAS now is 95% already on other computers in my home (the last 5% being something that may be torrenting, or maybe playlists/library setups on plugins -- nothing really important). If that old 640GB fails tomorrow -- it will be a fitting end of its duties. I won't care: I haven't lost anything in reality except that capacity.
 