Refreshing the value of LBA_of_first_error

Status: Not open for further replies.

neveragain (Cadet; joined Oct 20, 2013; 2 messages)

Hi,

Yesterday my FreeNAS server started reporting the following:

SMART error (CurrentPendingSector) detected
Device: /dev/ada0, 8 Currently unreadable (pending) sectors
SMART error (OfflineUncorrectableSector) detected
Device: /dev/ada0, 8 Offline uncorrectable sectors

I researched the error and found this guide, referenced in one of the forum threads here: http://daemon-notes.com/articles/system/smartmontools/current-pending

I followed the procedure and was able to refresh one sector with the following command:

Code:
dd if=/dev/ada0 of=/dev/ada0 bs=512 count=1 iseek=2643817536 oseek=2643817536 conv=noerror,sync
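
The dd line above rewrites a single 512-byte LBA in place. With conv=noerror,sync, dd does not abort if the read fails; it pads the unreadable block with zeros and writes it back, which lets the drive either rewrite the sector successfully or reallocate it. Below is a minimal sketch assuming 512-byte logical sectors and allowing for a larger physical sector (4096 bytes on Advanced Format drives; the thread does not say which kind ada0 is). The helper name refresh_cmd is hypothetical, and it only prints the command rather than running it.

```shell
# Hypothetical helper: print (do NOT run) a dd command that rewrites the
# physical sector containing a given 512-byte LBA.
# Assumption: 512-byte logical sectors; the physical sector size is passed
# in bytes (512 for legacy drives, 4096 for Advanced Format drives).
refresh_cmd() {
    dev=$1          # device node, e.g. /dev/ada0
    lba=$2          # 512-byte LBA, e.g. LBA_of_first_error from the self-test log
    phys=${3:-512}  # physical sector size in bytes (assumed; 512 if omitted)

    per=$((phys / 512))   # logical sectors per physical sector
    blk=$((lba / per))    # physical sector holding that LBA, in units of bs

    # conv=noerror,sync: don't abort on a read error; pad the unreadable
    # block with zeros and write it back so the drive rewrites or remaps it.
    echo "dd if=$dev of=$dev bs=$phys count=1 iseek=$blk oseek=$blk conv=noerror,sync"
}
```

For example, `refresh_cmd /dev/ada0 2643817536` prints exactly the command used above, while `refresh_cmd /dev/ada0 2643817536 4096` widens it to the whole 4 KiB physical sector (bs=4096, iseek=oseek=330477192). Printing instead of executing is deliberate: the command overwrites whatever is stored in that sector.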


After that, I re-ran a long test on the disk to get the next sector ID. However, after the test completed, I got the following:

Code:
smartctl -l selftest /dev/ada0
 
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error      00%      8550        -
# 2  Extended offline    Completed: read failure      70%      8541        2643817536
# 3  Short offline      Completed without error      00%      8539        -
# 4  Extended offline    Completed without error      00%      8434        -
# 5  Short offline      Completed without error      00%      8261        -
# 6  Short offline      Completed without error      00%      7997        -
# 7  Short offline      Completed without error      00%      7877        -
# 8  Extended offline    Completed without error      00%      7714        -
# 9  Short offline      Completed without error      00%      7541        -
#10  Short offline      Completed without error      00%      7277        -
#11  Short offline      Completed without error      00%      7157        -
#12  Short offline      Completed without error      00%      6882        -
#13  Short offline      Completed without error      00%      6762        -
#14  Short offline      Completed without error      00%      6724        -
1 of 1 failed self-tests are outdated by newer successful extended offline self-test # 1


The LBA_of_first_error is still showing the original sector ID. However, /var/log/messages is now reporting the following, which tells me that the sector refresh was successful:

Code:
Oct 20 12:09:59 nas01 smartd[2485]: Device: /dev/ada0, 7 Currently unreadable (pending) sectors
Oct 20 12:09:59 nas01 smartd[2485]: Device: /dev/ada0, 7 Offline uncorrectable sectors
Oct 20 12:09:59 nas01 smartd[2485]: Device: /dev/ada0, 7 Currently unreadable (pending) sectors
Oct 20 12:09:59 nas01 smartd[2485]: Device: /dev/ada0, 7 Offline uncorrectable sectors


I would like to get the remaining sector IDs. What am I missing here?

Thanks in advance.
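On the question of the remaining sector IDs: in the smartctl self-test log, LBA_of_first_error is only filled in for a test that fails, and then only with the first bad LBA the test hit before stopping (note the 70% remaining on the failed run). Entry # 1 completed without error, so it records "-", and the 2643817536 still shown belongs to the older entry # 2. As a sketch, the hypothetical helper below reads `smartctl -l selftest` output on stdin and prints the LBA from the newest entry that recorded one.

```shell
# Hypothetical helper: print LBA_of_first_error of the NEWEST self-test
# entry that recorded one. Entries are listed newest first (# 1 is the most
# recent), and a test that completes without error shows "-" in that column.
newest_failed_lba() {
    awk '
        /^# *[0-9]+ / && $NF != "-" {   # a self-test entry with an LBA recorded
            print $NF                    # last column = LBA_of_first_error
            exit
        }
    '
}
```

Usage: `smartctl -l selftest /dev/ada0 | newest_failed_lba` prints 2643817536 for the log above. An empty result after a fresh long test only means that test found no unreadable sector; it does not mean the pending count is zero.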
 

survive (Moderator, "Behold the Wumpus"; joined May 28, 2011; 875 messages)

Hi neveragain,

I think you need to scrub the volume, then run another long SMART test.

-Will
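The order suggested above (scrub first, then a long SMART test) can be scripted roughly as follows. This is a sketch that assumes the pool and device names from this thread (nas01_pool, /dev/ada0) and that `zpool status` prints "scrub in progress" while a scrub is running; scrub_running and scrub_then_long_test are hypothetical names.

```shell
# True while the `zpool status` output on stdin reports a running scrub.
scrub_running() {
    grep -q 'scrub in progress'
}

# Scrub the pool, wait for the scrub to finish, then start a long SMART test.
scrub_then_long_test() {
    pool=$1 dev=$2                   # e.g. nas01_pool /dev/ada0
    zpool scrub "$pool" || return 1
    while zpool status "$pool" | scrub_running; do
        sleep 300                    # poll every 5 minutes
    done
    smartctl -t long "$dev"          # starts the test in the background and returns
}
```

Because `smartctl -t long` returns as soon as the test starts, check the result later with `smartctl -l selftest /dev/ada0`.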
 

cyberjock (Inactive Account; joined Mar 25, 2012; 19,526 messages)

Long tests don't always compare the sector data with its ECC. When a sector's data and its ECC don't match (or the firmware otherwise determines that an error occurred on read), that sector is added to the Current Pending Sector count. But if your drive's manufacturer doesn't do that comparison during the long test, the drive may still pass it. Unfortunately, which manufacturers do and don't do the comparison is a secret. :(

Do a scrub. If that doesn't fix it, the only other real option is to run something like badblocks in write mode on the disk. That writes to the whole drive, and you should then see the pending count back at zero.
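To check whether a scrub or a full write pass actually brought the counts back to zero, the raw values can be read from `smartctl -A`. A sketch, assuming the standard smartmontools attribute table in which RAW_VALUE is the tenth column; pending_counts is a hypothetical name.

```shell
# Hypothetical helper: print the raw Current_Pending_Sector (ID 197) and
# Offline_Uncorrectable (ID 198) values from `smartctl -A` output on stdin.
# Columns: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
pending_counts() {
    awk '$2 == "Current_Pending_Sector" || $2 == "Offline_Uncorrectable" { print $2, $10 }'
}
```

Usage: `smartctl -A /dev/ada0 | pending_counts`. Per the advice above, both values should read 0 once every pending sector has been rewritten.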
 

neveragain (Cadet; joined Oct 20, 2013; 2 messages)

Thank you very much for your responses. I did as suggested: I manually started a scrub on the pool and, once it completed, ran a long SMART test on /dev/ada0. Unfortunately, there is no change in the self-test output, and the unreadable/uncorrectable sector messages also persist in /var/log/messages.

Code:
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.3-RELEASE-p5 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
 
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error      00%      8569        -
# 2  Extended offline    Completed without error      00%      8550        -
# 3  Extended offline    Completed: read failure      70%      8541        2643817536
# 4  Short offline      Completed without error      00%      8539        -
# 5  Extended offline    Completed without error      00%      8434        -
# 6  Short offline      Completed without error      00%      8261        -
# 7  Short offline      Completed without error      00%      7997        -
# 8  Short offline      Completed without error      00%      7877        -
# 9  Extended offline    Completed without error      00%      7714        -
#10  Short offline      Completed without error      00%      7541        -
#11  Short offline      Completed without error      00%      7277        -
#12  Short offline      Completed without error      00%      7157        -
#13  Short offline      Completed without error      00%      6882        -
#14  Short offline      Completed without error      00%      6762        -
#15  Short offline      Completed without error      00%      6724        -
1 of 1 failed self-tests are outdated by newer successful extended offline self-test # 1


A new, worse development: checksum errors on another device. I had scrubs scheduled to run every 35 days, and I'm not sure when the last one ran.

Code:
pool: nas01_pool
state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
  see: http://www.sun.com/msg/ZFS-8000-9P
  scan: scrub repaired 132G in 3h43m with 0 errors on Mon Oct 21 12:19:23 2013
config:
 
        NAME                                            STATE    READ WRITE CKSUM
        nas01_pool                                      ONLINE      0    0    0
          raidz2-0                                      ONLINE      0    0    0
            gptid/927df9ce-0b5f-11e2-80e1-902b3498f036  ONLINE      0    0    0
            gptid/932211a8-0b5f-11e2-80e1-902b3498f036  ONLINE      0    0    0
            gptid/93bf9dc6-0b5f-11e2-80e1-902b3498f036  ONLINE      0    0    0
            gptid/9464dae0-0b5f-11e2-80e1-902b3498f036  ONLINE      0    0 4.19M
            gptid/9506d3ce-0b5f-11e2-80e1-902b3498f036  ONLINE      0    0    0
            gptid/95aa6903-0b5f-11e2-80e1-902b3498f036  ONLINE      0    0    0
 
errors: No known data errors

I've bought a new set of drives and am going to set up a new file server and migrate the data. After that, I think I'll run more tests on the disks of the original pool.
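To pick the failing member out of a config table like the one in the zpool status output above, the hypothetical helper below prints only the devices whose READ, WRITE or CKSUM counters are non-zero. It is a sketch that assumes the column layout shown in this thread (name, state, then the three counters); ZFS abbreviates large counts (e.g. 4.19M), so the counters are compared as strings rather than numbers.

```shell
# Hypothetical helper: from `zpool status` output on stdin, print each
# pool/vdev/device line whose READ, WRITE or CKSUM counter is not "0".
devices_with_errors() {
    awk '
        NF >= 5 && $2 ~ /^(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ &&
        ($3 != "0" || $4 != "0" || $5 != "0") {
            print $1, "read=" $3, "write=" $4, "cksum=" $5
        }
    '
}
```

Usage: `zpool status nas01_pool | devices_with_errors` would list only gptid/9464dae0-0b5f-11e2-80e1-902b3498f036 here. On FreeBSD, `glabel status` shows which adaN partition each gptid label refers to, which identifies the physical disk to test or replace.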
 