Critical Errors - Not capable of SMART Self Check & Currently unreadable (pending) sectors

magicchicken

Dabbler
Joined
Apr 5, 2019
Messages
24
Hi all,

I am running TrueNAS-12.0-RELEASE and have started getting the following errors, and I'm not sure how to fix them.

Device: /dev/ada0, not capable of SMART self-check.
Device: /dev/ada0, 18664 Currently unreadable (pending) sectors.
Device: /dev/ada0, 18664 Offline uncorrectable sectors.

I have spent quite a few hours trying to find solutions via Google, but I'm not having much luck. This is the most helpful I could find; however, I'm not sure whether it will actually solve the issue, and I'm also a bit stuck on how to interpret the results.


I'm not exactly advanced with TrueNAS, so some step-by-step instructions would really help me out, if that's not too much to ask.
 

Pitfrr

Wizard
Joined
Feb 10, 2014
Messages
1,531
Hello,

Device: /dev/ada0, not capable of SMART self-check.
What kind of drive is this? Are you using some kind of disk controller? How is it connected?
What is the output of smartctl -a /dev/ada0 (in a terminal)?
Could you give your hardware details?

Generally, when facing Offline uncorrectable sectors, the best way to fix it is to change the disk! :smile:
18664 offline uncorrectable sectors is a lot. If the disk is still under warranty, you should RMA it.

To change the disk, the best way is to follow the user manual. OK... TrueNAS doesn't have such a manual available yet and the documentation still needs to be improved. But there is an article about it. You can also have a look at the documentation for FreeNAS 11.3; there shouldn't be much difference in how to change the disk.
 

magicchicken

Dabbler
Joined
Apr 5, 2019
Messages
24
Thanks for taking the time to help.

This is an older 3.5" SATA 1TB drive I had and only use it for my jails. Attached are the results from the scan. The disk is definitely not under warranty, it's probably 10 years old or more.

So the drive is rooted in your opinion? I'm fine with changing the hardware, it's just the shell commands I get stuck with.
 

Attachments

  • results.txt
    12 KB

Pitfrr

Wizard
Joined
Feb 10, 2014
Messages
1,531
There are lots of errors logged by the drive:
Code:
Error 7 occurred at disk power-on lifetime: 44513 hours (1854 days + 17 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: WP at LBA = 0x0fffffff = 268435455

I don't know this error specifically. I've read that WP means write protected, and it always happens at the same LBA.
I don't really know how to interpret that.
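
As a side note, the numbers in that log entry can be checked with a few lines of Python. This is only a sketch. The one thing it shows beyond the log itself is that 0x0fffffff is the largest value a 28-bit LBA field can hold, which might mean the logged address is a saturated placeholder rather than a real sector. That interpretation is an assumption, not something the log confirms.

```python
# Sanity-check two numbers from the smartctl error log entry above.
lba = 0x0FFFFFFF
print(lba)                 # decimal form of the logged LBA
print(lba == 2**28 - 1)    # True if this is the largest 28-bit value

# Power-on lifetime: convert hours to days + remaining hours.
hours = 44513
days, rest = divmod(hours, 24)
print(f"{days} days + {rest} hours")
```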

But the long SMART tests are failing:
Code:
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     45067         226655648
# 2  Extended offline    Completed: read failure       90%     45063         222317048
# 3  Short offline       Completed without error       00%     45063         -
# 4  Extended offline    Interrupted (host reset)      90%     45062         -
# 5  Extended offline    Interrupted (host reset)      90%     45062         -
# 6  Extended offline    Completed: read failure       90%     45043         222698000
# 7  Extended offline    Completed: read failure       70%     45019         650830840
...


And that is definitely a bad sign.

I would back up all the data on this drive as soon as possible.
Once all the data is backed up, you could optionally run badblocks on the drive, but I guess that would only confirm the SMART data.
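
For anyone who wants to pick the failures out of a longer log, here is a minimal sketch in Python. It assumes the column layout shown in the self-test log above (columns separated by two or more spaces); the function name `parse_selftest_log` and the dictionary keys are made up for illustration and are not part of smartctl.

```python
import re

# Sample rows copied from the self-test log above.
LOG = """\
# 1  Extended offline    Completed: read failure       90%     45067         226655648
# 2  Extended offline    Completed: read failure       90%     45063         222317048
# 3  Short offline       Completed without error       00%     45063         -
# 4  Extended offline    Interrupted (host reset)      90%     45062         -
# 5  Extended offline    Interrupted (host reset)      90%     45062         -
# 6  Extended offline    Completed: read failure       90%     45043         222698000
# 7  Extended offline    Completed: read failure       70%     45019         650830840
"""

# Assumption: description and status columns are separated by 2+ spaces.
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s{2,}(?P<test>.+?)\s{2,}(?P<status>.+?)\s{2,}"
    r"(?P<remaining>\d+)%\s+(?P<hours>\d+)\s+(?P<lba>\S+)\s*$"
)


def parse_selftest_log(text):
    """Return one dict per self-test row; non-matching lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "num": int(m["num"]),
                "test": m["test"],
                "status": m["status"],
                "remaining_pct": int(m["remaining"]),
                "hours": int(m["hours"]),
                # "-" means the test logged no failing LBA.
                "first_error_lba": None if m["lba"] == "-" else int(m["lba"]),
            })
    return rows


rows = parse_selftest_log(LOG)
failed = [r for r in rows if "read failure" in r["status"]]
for r in failed:
    print(f"test #{r['num']} ({r['test']}) failed at LBA {r['first_error_lba']}")
print(f"{len(failed)} of {len(rows)} logged tests ended in a read failure")
```

On the rows above, this reports tests 1, 2, 6 and 7 as read failures, each at a different LBA, which fits a surface problem rather than one isolated bad sector.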
 

magicchicken

Dabbler
Joined
Apr 5, 2019
Messages
24
Right, I see. I think it might be best just to replace the drive given how old it is anyway and I'm better off getting something larger to maximise space. Out of curiosity, what will running badblocks do?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Additionally, look at these IDs for key failure data:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   089   089   010    Pre-fail  Always       -       14504
197 Current_Pending_Sector  0x0012   001   001   000    Old_age   Always       -       18664
198 Offline_Uncorrectable   0x0010   001   001   000    Old_age   Offline      -       18664

And the fact that it cannot complete a surface scan as @Pitfrr indicated.

Your drive is dying. Replace it using the User Guide instructions; the steps are pretty clear on how to do it properly. If you want to know more, check out the link in my signature. It will help you the next time you have a question about a hard drive failure.
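
If you want to check those three attributes without reading the whole report, a rough sketch in Python could look like this. It assumes the usual ten-column layout of the smartctl attribute table and treats any nonzero raw value on IDs 5, 197 and 198 as a warning, which is a common rule of thumb rather than a vendor specification. The function name `nonzero_watched` is hypothetical.

```python
# The three attribute rows quoted above, in smartctl's ten-column layout.
ATTRS = """\
  5 Reallocated_Sector_Ct   0x0033   089   089   010    Pre-fail  Always       -       14504
197 Current_Pending_Sector  0x0012   001   001   000    Old_age   Always       -       18664
198 Offline_Uncorrectable   0x0010   001   001   000    Old_age   Offline      -       18664
"""

# Assumption: these IDs are the ones worth watching for surface damage.
WATCHED = {5, 197, 198}


def nonzero_watched(text):
    """Map attribute name -> raw value for watched IDs with a nonzero raw value."""
    hits = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a full attribute row.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, name, raw = int(fields[0]), fields[1], fields[9]
        if attr_id in WATCHED and raw.isdigit() and int(raw) > 0:
            hits[name] = int(raw)
    return hits


for name, raw in nonzero_watched(ATTRS).items():
    print(f"{name}: raw value {raw}")
```

The same function can be fed the full text of `smartctl -a /dev/ada0`; lines that are not attribute rows are ignored.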
 

Pitfrr

Wizard
Joined
Feb 10, 2014
Messages
1,531
badblocks is similar to a long SMART test, but a long SMART test only tests the disk's surface in reading mode (i.e. non-destructive). badblocks can also do it in writing mode (i.e. destructive).
It is also used to burn in new drives.
It is a command line tool and is included natively in FreeNAS.

So if you get a new drive, put it through a few passes with badblocks to burn it in, to make sure it is fit for use. I would refer you to this resource about burn-in.
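
To make the two modes concrete, here is a small Python sketch that only builds and prints the badblocks command lines; it does not run anything. The flags used (`-b` block size, `-s` progress, `-v` verbose, `-w` write-mode test) are standard badblocks options, but `/dev/adaX` is a placeholder device name, the 4096 block size is an assumption, and the helper `badblocks_cmd` is made up for this example. The `-w` mode overwrites the whole disk, so it is only for a drive with no data on it.

```python
import shlex


def badblocks_cmd(device, destructive=False, block_size=4096):
    """Build (but do not run) a badblocks command line as an argv list."""
    cmd = ["badblocks", "-b", str(block_size), "-s", "-v"]
    if destructive:
        # Write-mode test: erases everything on the device.
        cmd.append("-w")
    # Without -w, badblocks performs its default read-only test.
    cmd.append(device)
    return cmd


# /dev/adaX is a placeholder; substitute the real device yourself.
print(shlex.join(badblocks_cmd("/dev/adaX")))
print(shlex.join(badblocks_cmd("/dev/adaX", destructive=True)))
```

Printing the command first and pasting it into a terminal yourself is deliberate: it gives you a chance to double-check the device name before anything destructive happens.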
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Use of badblocks is also covered in my little troubleshooting guide, but if a SMART Extended test fails, there is no need to take it any further: the drive is failing from surface defects.
 

magicchicken

Dabbler
Joined
Apr 5, 2019
Messages
24
Thanks again for the help with this. I ended up buying a new HDD today to replace the drive with the problems. Installed and all working perfectly now.
 

Pitfrr

Wizard
Joined
Feb 10, 2014
Messages
1,531
Before you start using your new drive, you should burn it in (check the resources)...
It takes some time (a few days), but it is worth it.
 