Is there any way of dealing with pending sectors?

NASbox · Jul 16, 2020

I have a drive that has developed a few "pending sectors". Is there a way to force ZFS to test/remap these sectors if they have hard errors? Since ZFS identified the errors and resilvered the drive in question, I don't think a scrub will even access the affected sectors.

Alecmascot · Jul 16, 2020

You are now leaving the forum

NASbox · Jul 16, 2020

Alecmascot said:
You are now leaving the forum

Thanks for passing this along. I have done a bit of reading since I originally posted, and I'm not sure whether I should attempt to fix it. I originally had 60 "Pending Sectors"; after a scrub I was down to 49, with NO reallocation. Does that mean 11 were corrected (and the other 49 weren't touched)?

Alecmascot · Jul 17, 2020

NASbox said:
Does that mean 11 were corrected (and the other 49 weren't touched)?

A Scrub does not usually resolve Pending Sectors as there is no writing going on.
I would say that you have a large number there and I would think about changing the drive.
You could resilver the drive and see what happens.

NASbox · Jul 19, 2020

Alecmascot said:
A Scrub does not usually resolve Pending Sectors as there is no writing going on.
I would say that you have a large number there and I would think about changing the drive.
You could resilver the drive and see what happens.

When I started a scrub I had 60 pending sectors. As part of the scrub it did a partial resilver, and the number dropped to 49.
How do you resilver a disk on demand?

Alecmascot · Jul 19, 2020

NASbox said:
How do you resilver a disk on demand?

Take it offline for a little while then add it back.

Jailer · Jul 19, 2020

NASbox said:
as part of the scrub it did a partial resilver

This is not normal. You need to check the SMART data from the drive in question to verify whether it is on its way out.

NASbox · Jul 23, 2020

I've got a replacement drive doing its burn-in with badblocks as I write this, but I find this behaviour interesting. Why does the number of pending sectors drop without a reallocation event or any reallocated sectors? That would seem to imply that bad sectors are correcting themselves automatically.

Once I get the drive safely out of the pool, I think I'll give it a good going-over with badblocks to see if it falls apart, fixes itself completely, or ends up somewhere in between.

Does anyone have any comments about this weird behaviour?

Code:

20200714_050000~da2:ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
20200714_050000~da2:  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
20200714_050000~da2:196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
20200714_050000~da2:197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
20200714_050000~da2:198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
20200715_050000~da2:ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
20200715_050000~da2:  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
20200715_050000~da2:196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
20200715_050000~da2:197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       60
20200715_050000~da2:198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
20200716_050000~da2:ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
20200716_050000~da2:  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
20200716_050000~da2:196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
20200716_050000~da2:197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       60
20200716_050000~da2:198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
20200717_050000~da2:ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
20200717_050000~da2:  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
20200717_050000~da2:196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
20200717_050000~da2:197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       49
20200717_050000~da2:198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
20200718_050000~da2:ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
20200718_050000~da2:  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
20200718_050000~da2:196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
20200718_050000~da2:197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       49
20200718_050000~da2:198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
20200719_050000~da2:ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
20200719_050000~da2:  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
20200719_050000~da2:196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
20200719_050000~da2:197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       130
20200719_050000~da2:198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
20200720_050000~da2:ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
20200720_050000~da2:  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
20200720_050000~da2:196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
20200720_050000~da2:197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       81
20200720_050000~da2:198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
20200721_050000~da2:ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
20200721_050000~da2:  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
20200721_050000~da2:196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
20200721_050000~da2:197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       81
20200721_050000~da2:198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
20200722_050000~da2:ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
20200722_050000~da2:  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
20200722_050000~da2:196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
20200722_050000~da2:197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       81
20200722_050000~da2:198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
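
For anyone who would rather track these numbers than eyeball them, here is a minimal Python sketch that pulls one attribute out of a log in the format above and shows the day-to-day change in its raw value. The function names (`raw_history`, `daily_changes`) and the `SAMPLE` list are made up for illustration; the sample lines are copied from the log above, and the regular expression assumes exactly that `date_time~device:` prefix.

```python
import re

# Matches one data line of the daily log format shown above:
# "<YYYYMMDD>_<HHMMSS>~<device>:<ID> <ATTRIBUTE_NAME> ... <RAW_VALUE>"
# Header lines ("ID# ATTRIBUTE_NAME ...") do not match and are skipped.
LINE_RE = re.compile(
    r"^(?P<date>\d{8})_\d{6}~(?P<dev>\w+):\s*(?P<id>\d+)\s+(?P<name>\S+)\s.*\s(?P<raw>\d+)\s*$"
)

def raw_history(lines, attr_id):
    """Return [(date, raw_value), ...] for one SMART attribute ID."""
    history = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and int(m.group("id")) == attr_id:
            history.append((m.group("date"), int(m.group("raw"))))
    return history

def daily_changes(history):
    """Return [(date, raw_value, change_since_previous_entry), ...]."""
    changes = []
    previous = None
    for date, raw in history:
        delta = 0 if previous is None else raw - previous
        changes.append((date, raw, delta))
        previous = raw
    return changes

# Sample lines copied from the log above (hypothetical variable name).
SAMPLE = """\
20200714_050000~da2:ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
20200714_050000~da2:  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
20200714_050000~da2:197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
20200715_050000~da2:197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       60
20200716_050000~da2:197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       60
20200717_050000~da2:197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       49
20200718_050000~da2:197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       49
20200719_050000~da2:197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       130
20200720_050000~da2:197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       81
20200721_050000~da2:197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       81
20200722_050000~da2:197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       81
""".splitlines()

if __name__ == "__main__":
    for date, raw, delta in daily_changes(raw_history(SAMPLE, 197)):
        print(f"{date}  pending={raw:4d}  change={delta:+d}")
```

Run against the full log, this makes the pattern easier to see: the count goes 0, 60, 49, 130, 81 while attribute 5 stays at 0 the whole time.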

Fredda · Jul 23, 2020

NASbox said:
I've got a replacement drive doing its burn-in with badblocks as I write this, but I find this behaviour interesting. Why does the number of pending sectors drop without a reallocation event or any reallocated sectors? That would seem to imply that bad sectors are correcting themselves automatically.

Pending sectors can happen when the power is turned off during a write. In that case the block is not necessarily bad.
badblocks will write to the sector. If the write works, the sector becomes good again; if it does not, the sector gets reallocated.
In either case the pending sector count decreases, but the reallocated sector count does not necessarily increase.
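
To make the bookkeeping concrete, here is a toy Python model of the two counters as described in this post. It is only a sketch of the description above, not of how any particular drive firmware behaves; the class and method names (`ToyDrive`, `failed_read`, `write_pending_sector`) are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class ToyDrive:
    """Toy model of two SMART counters; not real firmware behaviour."""
    pending: int = 0      # stands in for attribute 197, Current_Pending_Sector
    reallocated: int = 0  # stands in for attribute 5, Reallocated_Sector_Ct

    def failed_read(self):
        """A sector that cannot be read is flagged as pending."""
        self.pending += 1

    def write_pending_sector(self, write_succeeds):
        """Rewriting a pending sector resolves it one way or the other."""
        if self.pending == 0:
            raise ValueError("no pending sectors to rewrite")
        self.pending -= 1
        if not write_succeeds:
            # The medium really is bad: the sector is remapped to a spare.
            self.reallocated += 1

if __name__ == "__main__":
    drive = ToyDrive()
    for _ in range(60):
        drive.failed_read()
    # Eleven of the pending sectors get rewritten and the writes succeed.
    for _ in range(11):
        drive.write_pending_sector(write_succeeds=True)
    print(drive.pending, drive.reallocated)  # prints: 49 0
```

In this model, going from 60 pending to 49 with no reallocations is exactly what you would see if 11 sectors were rewritten and every write succeeded.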

NASbox · Jul 24, 2020

Fredda said:
Pending sectors can happen when the power is turned off during a write. In that case the block is not necessarily bad.
badblocks will write to the sector. If the write works, the sector becomes good again; if it does not, the sector gets reallocated.
In either case the pending sector count decreases, but the reallocated sector count does not necessarily increase.

It is possible that it could be power damage as there was a point when I had a bad UPS battery.

My current plan is to replace the drive with a new one after I finish burning it in.

Once I don't need the old drive anymore, I am going to run the same burn-in on it. If the damage is temporary, that should clear things up, and if the drive is flaky it will make it worse. Either way I should know what to do.

NASbox · Jul 27, 2020

I replaced the disk, and after removing it from the pool I did a full zero wipe from the Storage / Disks FreeNAS menu.

This is the output from smartctl after the wipe, which appears to have cleared any errors:

Code:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   167   051    Pre-fail  Always       -       38
  3 Spin_Up_Time            0x0027   211   182   021    Pre-fail  Always       -       8425
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       313
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   046   046   000    Old_age   Always       -       39912
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       96
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       66
193 Load_Cycle_Count        0x0032   188   188   000    Old_age   Always       -       38807
194 Temperature_Celsius     0x0022   112   096   000    Old_age   Always       -       40
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       2

Given the nature of the data on the drive, it is quite likely that the surface has not been written to since the pool was created about 5 years ago: the content is largely library material that is rarely, if ever, updated. Does recorded info fade with time?

Am I correct in assuming that scrub is read only unless an error is found? No errors = no writes during scrub.
If an error is found, then the data is reconstructed and written to another part of the disk?

Any thoughts about this behaviour?

Now that the drive is out of the pool, I am going to run one pass of badblocks followed by a self-test to see what happens.

If the drive is clean after that, would it be safe to assume that I can be as confident in it as in any drive of the same type with a similar number of hours?

NASbox · Aug 11, 2020

I decided to retire the disk when I found an area in the upper TB that started throwing a ton of errors and putting huge numbers on 'Raw_Read_Error_Rate'.

As a matter of intellectual curiosity (since the drive is being scrapped), I am confused by the behaviour of SMART and was hoping someone might be able to explain what I have been observing.

I was able to use dd to write the full drive without errors. Writing cleared 'Current_Pending_Sector' to zero. When I read the bad area back with dd, I got an error, and issuing multiple reads on the bad spot didn't cause the sector to be reallocated. I could write the sector that caused the read error repeatedly without any errors, yet reading it always failed. There didn't seem to be any way to force the bad sector to be replaced, and the drive electronics didn't seem to catch the write error. I thought modern drives attempted to read back after writing to make sure the write was completed accurately.

Can someone explain this weird behaviour and what might force reallocation?

Code:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   001   001   051    Pre-fail  Always   FAILING_NOW 14644
  3 Spin_Up_Time            0x0027   211   182   021    Pre-fail  Always       -       8450
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       314
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   045   045   000    Old_age   Always       -       40259
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       97
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       67
193 Load_Cycle_Count        0x0032   188   188   000    Old_age   Always       -       38811
194 Temperature_Celsius     0x0022   115   096   000    Old_age   Always       -       37
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   198   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       2

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: unknown failure    90%     40220         -
# 2  Extended offline    Completed: unknown failure    90%     40220         -
# 3  Extended offline    Completed: read failure       10%     39625         3072421288
# 4  Extended offline    Completed without error       00%     17380         -
# 5  Extended offline    Completed without error       00%        92         -
# 6  Extended offline    Completed without error       00%        57         -
# 7  Short offline       Completed without error       00%        13         -
# 8  Extended offline    Completed without error       00%        12         -
# 9  Short offline       Completed without error       00%         0         -
#10  Conveyance offline  Completed without error       00%         0         -
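
For reference, the read test I was doing with dd can be sketched in Python roughly as follows. This is only an illustration under a few assumptions: 512-byte logical sectors, a Unix-like OS where `os.pread` is available, and enough privileges to open the raw device read-only. The function names are invented for the example, and depending on the OS a repeated read may be answered from cache rather than from the platter.

```python
import os

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors

def lba_to_offset(lba, sector_size=SECTOR_SIZE):
    """Convert a logical block address to a byte offset on the device."""
    return lba * sector_size

def find_unreadable_sectors(path, first_lba, count, sector_size=SECTOR_SIZE):
    """Read `count` sectors one at a time and return the LBAs that fail.

    The device (or file) is opened read-only, so nothing is written.
    """
    bad = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for lba in range(first_lba, first_lba + count):
            try:
                data = os.pread(fd, sector_size, lba_to_offset(lba, sector_size))
            except OSError:
                # The OS reported an I/O error for this sector.
                bad.append(lba)
                continue
            if len(data) < sector_size:
                break  # reached the end of the device or file
    finally:
        os.close(fd)
    return bad
```

With the LBA_of_first_error from the self-test log above, `lba_to_offset(3072421288)` gives 1573079699456 bytes. A call such as `find_unreadable_sectors("/dev/da2", 3072421288, 2048)` would then list the failing sectors from that point on; the device path and the count of 2048 are just example values.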
