After 9.10.1 update, getting 8 unreadable sectors error

Status
Not open for further replies.

Jr922

Explorer
Joined
Apr 22, 2016
Messages
58
I updated to FreeNAS-9.10.1 (d989edd) and am now getting the red light with this error:
CRITICAL: Aug. 11, 2016, 3:39 p.m. - Device: /dev/ada1, 8 Currently unreadable (pending) sectors

But the volumes show as healthy and zpool status shows everything online. A scrub is running.

This drive is in a 6-drive RAID-Z2 pool.
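
For reference, a minimal sketch of the checks behind that statement (device name taken from the alert; adjust to your system):

# pool and vdev state as seen by ZFS
zpool status -v

# SMART counters for the flagged drive; attribute 197 is Current_Pending_Sector
smartctl -a /dev/ada1 | grep -E 'Current_Pending_Sector|Reallocated_Sector_Ct|Offline_Uncorrectable'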
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Well, 8 pending sectors isn't critical, but if the number rises then the drive is probably going to die soon, so just keep an eye on it ;)
 

Jr922

Explorer
Joined
Apr 22, 2016
Messages
58
That's what I figured, but there are a couple of weird things: this is a less than 6-month-old drive, and I did a shutdown and reboot to move my power cord right before this update with no errors thrown; then right after the update it throws this error. Also, shouldn't this error cause the volume to show up as degraded or something other than healthy?

Do I need to run a SMART long test to check for pending sectors?
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
The volume shows up as degraded only if there's a checksum mismatch (or a device drop, but that's extreme), not if a drive has SMART warnings.

If I were you I'd probably do a long SMART test, then a scrub, then a long SMART test again ;)
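
A rough sketch of that sequence from the shell; the pool name "tank" is a placeholder, and each step should only start after the previous one has finished:

# 1. long SMART self-test on the flagged drive (runs in the background, takes hours)
smartctl -t long /dev/ada1
# check progress/result later with:
smartctl -a /dev/ada1 | grep -A1 'Self-test execution'

# 2. scrub the pool once the self-test is done
zpool scrub tank
zpool status tank     # shows scrub progress and any repairs

# 3. run the long self-test again and compare the pending sector count before and after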
 

Jr922

Explorer
Joined
Apr 22, 2016
Messages
58
The scrub repaired 0 errors and the long test completed without error, but it still shows the 8 pending sectors error, and smartctl -a /dev/ada1 still reports 8 pending sectors. The error comes up every 30 minutes in /var/log/messages.

Shouldn't 8 pending sectors throw a read error on a SMART long test? Is there anything else I can check to verify this and/or find where the sectors are so I can try to rewrite them?
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
OK, that's good news then, but keep an eye on this drive nonetheless.

IIRC, smartctl -x /dev/ada1 will give you more info than -a, and you may get the LBAs of those sectors ;)
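
For example (whether the LBAs actually show up depends on the drive's firmware; many drives only record an LBA once a read has actually failed):

# extended "everything" SMART report
smartctl -x /dev/ada1

# the sections most likely to contain LBAs:
smartctl -l xerror /dev/ada1     # extended comprehensive error log
smartctl -l selftest /dev/ada1   # LBA_of_first_error column of the self-test log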
 

Lars Jensen

Explorer
Joined
Feb 5, 2013
Messages
63
FYI, after upgrading to FreeNAS-9.10.1 I have the exact same SMART error on a drive that is less than 6 months old, except mine is on ada9, also in a 6-drive RAID-Z2. I ran a long self-test and the number increased to 16. The pool is still good, but I have ordered a new disk just to have one ready for exchange.

Maybe it's just a coincidence of almost-new drives dying.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
What are the temps of those drives?
 

Lars Jensen

Explorer
Joined
Feb 5, 2013
Messages
63
What are the temps of those drives?

Around 23 degrees Celsius, see:

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 119 099 006 Pre-fail Always - 223522992
3 Spin_Up_Time 0x0003 096 096 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 4
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 071 060 030 Pre-fail Always - 12886077
9 Power_On_Hours 0x0032 097 097 000 Old_age Always - 3286
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 4
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 077 069 045 Old_age Always - 23 (Min/Max 20/31)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 4
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 4
194 Temperature_Celsius 0x0022 023 040 000 Old_age Always - 23 (0 20 0 0 0)
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 16
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 16
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 90% 3285 -
# 2 Short offline Completed: read failure 90% 3273 -
# 3 Short offline Completed: read failure 90% 3261 -
# 4 Short offline Completed: read failure 90% 3249 -
# 5 Extended offline Completed: read failure 10% 3247 -
# 6 Short offline Completed: read failure 70% 3238 -
# 7 Short offline Completed: read failure 50% 3238 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Maybe it's just a coincidence of almost-new drives dying.
I think it's more likely that a known bug that resulted in email notifications not working was fixed in the update, and the drive has been failing for a while.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
a known bug that resulted in email notifications not working was fixed in the update,
...and lest it be unclear, this is the case: there was such a bug, and it was fixed in 9.10.1. It may or may not be the case that the bad sectors had been there for some time.
 

Lars Jensen

Explorer
Joined
Feb 5, 2013
Messages
63
Great bugfix :smile: FYI, the disk actually died today, so in this case it was a "soon-to-die" SMART error. And now it's resilvering:
scan: resilver in progress since Mon Aug 29 12:23:10 2016
1.58T scanned out of 15.9T at 925M/s, 4h30m to go
387M resilvered, 9.95% done
 

Jr922

Explorer
Joined
Apr 22, 2016
Messages
58
To update this: my drive was fine. The pending sectors went to 0 after a 5 TB write and there were no reallocated sectors.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
That just means the sectors were successfully written. It doesn't mean the underlying medium is good.
 

Jr922

Explorer
Joined
Apr 22, 2016
Messages
58
That just means the sectors were successfully written. It doesn't mean the underlying medium is good.

Yes, it could still be bad even though SMART tests show successful writes on those sectors. However, it doesn't mean it is bad either, and it's certainly a good sign that they were not reallocated, right?
I believe the pending sectors were due to a local brownout that occurred shortly after the update. I'm on a UPS, but it's old and might not have kicked in fast enough during a brownout. Unless I'm missing something else, I'm just going to keep an eye on this drive until there is something to indicate a bad drive.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
If it were my disk, I would prefer to see the sectors reallocated. Take a look at what Wikipedia has to say about SMART attribute #197: https://en.wikipedia.org/wiki/S.M.A.R.T.

Any reason you can't run badblocks destructive on the disk? That way you get four write passes and four read passes with different patterns.
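
For reference, a destructive run along those lines looks roughly like this (exact flags from memory, so double-check the man page; -w wipes every sector on the target disk, so only do this on a drive that has been removed or offlined from the pool):

# WARNING: destroys all data on the disk
# writes and reads back four patterns (0xaa, 0x55, 0xff, 0x00) across the whole drive
badblocks -b 4096 -ws /dev/ada1

# then see whether anything was reallocated or left pending
smartctl -a /dev/ada1 | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector'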
 

Jr922

Explorer
Joined
Apr 22, 2016
Messages
58
If it were my disk, I would prefer to see the sectors reallocated. Take a look at what Wikipedia has to say about SMART attribute #197: https://en.wikipedia.org/wiki/S.M.A.R.T.

Any reason you can't run badblocks destructive on the disk? That way you get four write passes and four read passes with different patterns.

Why would you want them to be reallocated? That would indicate the sectors were in fact bad, right? In my case I had 8 pending, 0 reallocated, and 0 uncorrectable. After the rewrite I now have 0 pending, 0 reallocated, and 0 uncorrectable. At the least we know those sectors are good enough to be rewritten successfully. If these sectors pop up again then there is for sure an issue, but I don't agree with the wiki that attempting to rewrite the pending sectors rather than always reallocating them is "a serious shortcoming". That seems like the more intelligent approach, since a pending sector could be caused by a power failure, for instance, and the sectors might be fine. What is the advantage of reallocating them without testing whether they can be written? You should be able to see if the problem keeps popping up, and there is a very low chance of recurring pending sectors that can still be written successfully.

I'm still not understanding what you're basing the idea that it might be bad on. The wiki indicates that drives with their first pending sectors, reallocations, and uncorrectables are statistically much more likely to fail, and I get that; but that doesn't give me anything to do or check except to preemptively replace the drive. And in this case I'm on Z2 with a full backup on another PC, so I'm probably going to run the drive until it either dies without warning or starts throwing more errors.

The only reason I can't run badblocks is that I don't have a spare drive and I don't want to drop the pool to one disk of redundancy. I could buy another, but they're running around $250, and at that point why not just replace the drive and be done with it.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
The sectors were pending reallocation due to read failure. They were subsequently written without an error being detected. This does not tell us whether they can be read successfully in the future. Being able to write data without detecting an error is of no benefit if that data can't be read later - think untested backups. Reallocation to known good sectors would ensure that potentially flaky sectors were no longer in use. That would make me more comfortable, but that's just me.
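
A lighter, non-destructive option (not a substitute for badblocks, just a sketch) is a full read pass of the raw device, which forces every sector, including the rewritten ones, to be read back:

# read the whole disk and discard the data; unreadable sectors show up as I/O errors
# (takes many hours on a large drive; best done while the pool is otherwise idle)
dd if=/dev/ada1 of=/dev/null bs=1m

# afterwards, re-check the SMART counters
smartctl -a /dev/ada1 | grep -E 'Current_Pending_Sector|Reallocated_Sector_Ct'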

Can you explain why you think the pending reallocation was related to a brownout?

Please note that I'm focusing on this specific disk and its potentially bad sectors. I'm not suggesting you're about to lose data.
 

Jr922

Explorer
Joined
Apr 22, 2016
Messages
58
The sectors were pending reallocation due to read failure. They were subsequently written without an error being detected. This does not tell us whether they can be read successfully in the future. Being able to write data without detecting an error is of no benefit if that data can't be read later - think untested backups. Reallocation to known good sectors would ensure that potentially flaky sectors were no longer in use. That would make me more comfortable, but that's just me.

Can you explain why you think the pending reallocation was related to a brownout?

Please note that I'm focusing on this specific disk and its potentially bad sectors. I'm not suggesting you're about to lose data.

But if it couldn't be read after the write, wouldn't that come back up in a long SMART test?

I'm not sure, that's just a theory. I had run SMART tests regularly and checked the SMART info on all the drives not long before the update, and there was no red light in the web GUI right before the update, so it is likely that this occurred after the update. I updated, let it reboot, and walked away since everything was working fine: my CIFS shares were up, etc. Later that night I went to the web GUI and noticed I had a red light for 8 pending sectors, so I thought it was too much of a coincidence that it happened right after the update, and I posted here. But after thinking about it, we have had quite a few intermittent power losses/brownouts lately with everyone running so much AC, and one of them happened to be on that day. At the time I was on my PC, and the UPS for my PC kicked on for a second before power came back almost instantly, so I didn't think much of it. But it's possible that my other UPS for my FreeNAS box failed in some way or got confused by the intermittent power. Other than that, I can't think of what could have happened between the update and when I checked the web GUI again. There was no real activity at all on this drive during that time; the system dataset, jails, etc. are all on other drives, not on this pool.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
But if it couldn't be read after the write, wouldn't that come back up in a long SMART test?
Maybe; to be honest, I'm not sure.
just a theory
OK, well, I guess anything is possible, but I'm inclined to believe it's either a coincidence or a result of updating to a version that eliminated a reporting bug.
 