Alert/Error message help

Status
Not open for further replies.

takkischitt

Explorer
Joined
Jan 20, 2014
Messages
70
I just updated FreeNAS from 9.2 stable to 9.3 stable, and now that the system has fully restarted I've been presented with a couple of alerts:

  • CRITICAL: Device: /dev/ada1, 3 Currently unreadable (pending) sectors
  • WARNING: New feature flags are available for volume Files. Refer to the "Upgrading a ZFS Pool" section of the User Guide for instructions.
Can anyone advise?
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Well, you have a drive which has maybe started to fail. 3 pending sectors isn't a big deal, but you want to keep an eye on this drive to see if the numbers of pending, reallocated and uncorrectable sectors rise. Can you post the outputs of smartctl -a /dev/ada1 and zpool status (between code tags for readability) please?

The warning isn't a big deal either; read the quoted section of the manual to see why and how to upgrade the pool ;)
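
For reference, both commands are run from the FreeNAS shell (the device name here is the one from the alert; substitute your own):

```shell
# Full SMART report for the suspect drive
smartctl -a /dev/ada1

# Health and layout of all ZFS pools
zpool status
```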
 

takkischitt

Explorer
Joined
Jan 20, 2014
Messages
70
Many thanks for your swift reply. I have successfully upgraded the ZFS pool.

I'm not 100% sure about the format you want the other information in, or if this is even the correct information! Sorry if this isn't right! I'm quite a novice when it comes to this sort of thing!

Output of smartctl -a /dev/ada1

Code:
=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar P7K500
Device Model:     Hitachi HDP725050GLA360
Serial Number:    *******************
LU WWN Device Id: ************
Firmware Version: GM4OA5CA
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Mon Oct 19 12:16:34 2015 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)    Offline data collection activity
                    was suspended by an interrupting command from host.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         ( 7890) seconds.
Offline data collection
capabilities:              (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      ( 131) minutes.
SCT capabilities:            (0x003d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   132   132   054    Pre-fail  Offline      -       140
  3 Spin_Up_Time            0x0007   119   119   024    Pre-fail  Always       -       313 (Average 333)
  4 Start_Stop_Count        0x0012   096   096   000    Old_age   Always       -       18321
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       2
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   131   131   020    Pre-fail  Offline      -       29
  9 Power_On_Hours          0x0012   093   093   000    Old_age   Always       -       49342
10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       106
192 Power-Off_Retract_Count 0x0032   084   084   000    Old_age   Always       -       20059
193 Load_Cycle_Count        0x0012   084   084   000    Old_age   Always       -       20059
194 Temperature_Celsius     0x0002   176   176   000    Old_age   Always       -       34 (Min/Max 18/47)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       2
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


Output of zpool status

Code:
 
  pool: Files
state: ONLINE
  scan: scrub repaired 0 in 0h31m with 0 errors on Sun Oct  4 00:31:59 2015
config:

    NAME                                          STATE     READ WRITE CKSUM
    Files                                         ONLINE       0     0     0
      gptid/955741bb-7ec5-11e3-b5bb-38eaa7ab5e10  ONLINE       0     0     0
      gptid/95d28a46-7ec5-11e3-b5bb-38eaa7ab5e10  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
state: ONLINE
  scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    freenas-boot  ONLINE       0     0     0
      da0p2     ONLINE       0     0     0

errors: No known data errors
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Well, you do have a big deal now that you posted the 'zpool status'. You have no redundancy from disk failures in the zpool named "Files". Since you already have 3 bad sectors, you need to consider a backup strategy NOW. Your pool is likely to just suddenly die without warning...
 

takkischitt

Explorer
Joined
Jan 20, 2014
Messages
70
Well, you do have a big deal now that you posted the 'zpool status'. You have no redundancy from disk failures in the zpool named "Files". Since you already have 3 bad sectors, you need to consider a backup strategy NOW. Your pool is likely to just suddenly die without warning...

Yeah, I do a regular backup of the system and keep the portable hard-drive offsite.

So it's just a matter of time before a hard drive goes down? Might be better to sort this out before it gets worse?
 
Last edited:

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
You have no redundancy from disk failures in the zpool named "Files"
Just curious how you can tell? Is it because "Files" doesn't have "raidz" or "mirror" under it, which tells you it's striped?
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Well, given that this drive has 49,342 hours of use (more than 5.5 years; five years is usually considered a very good life for a drive) and it has seen temperatures well above 40 °C (you really should keep your drives below 40 °C), I don't think it'll last very long from now.
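
As a sanity check, the Power_On_Hours value from the SMART output above converts to years like this:

```shell
# 49342 power-on hours divided by 8760 hours per year ≈ 5.6 years
awk 'BEGIN { printf "%.1f years\n", 49342 / 8760 }'
# → 5.6 years
```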
 

takkischitt

Explorer
Joined
Jan 20, 2014
Messages
70
Could you recommend a setup which would be more robust in future?

I was thinking (albeit not in a very knowledgeable fashion): if I was to get another new 1 TB drive and mirror it with this pool (2 x 500 GB drives), then if one of the 500 GB drives gives up, I can just remove the other one and go back to doing backups and keeping the backup disk offsite? Would that be wise? I don't really want to buy more than I have to, if possible.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
You can't mirror a vdev. But you can mirror the two 500 GB drives and add another mirror of two 1 TB drives for example ;)

Offsite backups are always a good idea ;)
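
As a rough sketch only (placeholder pool and device names; on FreeNAS you would build this through the GUI, which uses GPT partitions rather than raw devices), a pool made of two mirrored vdevs looks like:

```shell
# Hypothetical example: a pool of two mirror vdevs
# ada1/ada2 are the 500 GB pair, ada3/ada4 a 1 TB pair (placeholder names)
zpool create tank mirror ada1 ada2 mirror ada3 ada4
```

Each vdev can then lose one drive without the pool losing data.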
 

takkischitt

Explorer
Joined
Jan 20, 2014
Messages
70
I wouldn't really need that much space, to be honest. As you can see below, I'm not even close to 1 TB; 2 TB would go unused.

[attached screenshot: space.png]


Is there any way to move all the files onto the 500GB drive that isn't showing any bad sectors? Then I could just remove the dodgy one?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
In addition to the other problems noted, /dev/ada1 has never had a SMART self-test run. It should. @takkischitt, you can (and should) schedule regular SMART self-tests through the FreeNAS web GUI--short tests every 1-3 days, long tests every 1-3 weeks.
Is there any way to move all the files onto the 500GB drive that isn't showing any bad sectors?
Unfortunately, no, there isn't. Once you add a disk to a pool, it (or its replacement) is there forever. If the data is backed up, you could destroy the pool and recreate it as a mirror. This would, of course, cut your capacity in half, but it would protect you when (not if) a drive fails. Since you're using < 200 GB, a 500 GB mirror would still have plenty of space.
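
The destroy-and-recreate step would amount to something like the following sketch (placeholder device names; in practice you'd do this through the FreeNAS Volume Manager, and you must verify the backup first, as this wipes the pool):

```shell
# DESTRUCTIVE -- only after the backup is verified!
zpool destroy Files

# Recreate the same pool name as a two-way mirror (placeholder device names)
zpool create Files mirror ada1 ada2
```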
 

takkischitt

Explorer
Joined
Jan 20, 2014
Messages
70
In addition to the other problems noted, /dev/ada1 has never had a SMART self-test run. It should. @takkischitt, you can (and should) schedule regular SMART self-tests through the FreeNAS web GUI--short tests every 1-3 days, long tests every 1-3 weeks.

I have set it to perform a short test on each drive every 3 days and a long test twice a month. Thanks for the advice. I think the following settings are correct...

[attached screenshot: smart.png]


Unfortunately, no, there isn't. Once you add a disk to a pool, it (or its replacement) is there forever. If the data is backed up, you could destroy the pool and recreate it as a mirror. This would, of course, cut your capacity in half, but it would protect you when (not if) a drive fails. Since you're using < 200 GB, a 500 GB mirror would still have plenty of space.

That sounds wise. I think what I will do is get a completely up-to-date backup on the external drive which I normally keep offsite. Then I will destroy the pool and set up a mirror, and just run the two 500 GB drives as local fail protection (as you suggested) with the offsite drive kept as a second backup. I think this would probably be the best way to utilise what I've currently got, without spending what I don't need to.
 
Last edited:

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yeah, your schedule is not correct. You're going to run a long test on the 1st at 0000, the 2nd at 0500, the 19th at 0000, and the 20th at 0500. You have a similar error for the short SMART tests. You should only need one line for Long and one line for Short.
 

takkischitt

Explorer
Joined
Jan 20, 2014
Messages
70
Yeah, your schedule is not correct. You're going to run a long test on the 1st at 0000, the 2nd at 0500, the 19th at 0000, and the 20th at 0500. You have a similar error for the short SMART tests. You should only need one line for Long and one line for Short.

Even though there are two physical hard drives?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
I presume that he's trying to offset the tests so that ada0 and ada1 aren't running at the same time. @takkischitt, that isn't necessary, and it's quite a bit simpler to include all the disks in one schedule. So you'd have one entry for the long test for both drives, running on whatever days at whatever time you wanted, and another entry for the short test.

For now, kick off a long test manually from the command line. Run 'smartctl -t long /dev/adaX' for ada0 and ada1. It will give you an estimate of when the test will complete. A while after that estimated time, run 'smartctl -a /dev/adaX' for both drives and post the results here between code tags.
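
Spelled out for both drives, that is:

```shell
# Start a long self-test on each drive (runs in the background on the drive)
smartctl -t long /dev/ada0
smartctl -t long /dev/ada1

# After the estimated completion time has passed, review the results
smartctl -a /dev/ada0
smartctl -a /dev/ada1
```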
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
You can Ctrl + click in the drives list of the config pop-up to select more than one drive ;)
 

takkischitt

Explorer
Joined
Jan 20, 2014
Messages
70
Ah, thanks folks. Didn't realise you could multi-select the drives! Should be set up correctly now, I think...

[attached screenshot: smart.png]


For now, kick off a long test manually from the command line. Run 'smartctl -t long /dev/adaX' for ada0 and ada1. It will give you an estimate of when the test will complete. A while after that estimated time, run 'smartctl -a /dev/adaX' for both drives and post the results here between code tags.

Can the drive still be used whilst running a long test?
 

takkischitt

Explorer
Joined
Jan 20, 2014
Messages
70
Any way to tell when it is complete, or is the completion time usually pretty accurate?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
I'd probably add about 10% to the estimate, but near the top of the smartctl -a output, it will tell you if it's complete or if it's still running. Yours says this right now:
Code:
Self-test execution status:      (   0)    The previous self-test routine completed
                 without error or no self-test has ever
                 been run.


If it's still running, it will give a percentage remaining, in 10% increments.
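
One way to check on it without reading the whole report (the grep pattern assumes the wording shown in the output above):

```shell
# Prints the self-test execution status: either "completed" or a
# percentage of the test remaining, in 10% steps
smartctl -a /dev/ada1 | grep -A 2 "Self-test execution status"
```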
 