I can't get Current Pending Sector from 1 back to 0

Status
Not open for further replies.

drwoodcomb

Explorer
Joined
Sep 15, 2016
Messages
74
I'm hoping someone can help point out what I might be doing wrong here. I tried fixing this problem by following a forum post I found on this website, but it didn't seem to work.

It all started when I received the following Critical Error:

Device: /dev/da3 [SAT], 1 Offline uncorrectable sectors
Device: /dev/da3 [SAT], 1 Currently unreadable (pending) sectors
Device: /dev/da3 [SAT], Self-Test Log error count increased from 0 to 1


Since I have my FreeNAS server set up to run SMART long tests every month, I checked the SMART output for da3 to find out where the bad sector is:


root@freenas:~ # smartctl -a /dev/da3
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Se
Device Model: WDC WD3000F9YZ-09N20L1
Serial Number: WD-WCC5D0078019
LU WWN Device Id: 5 0014ee 25fb52ce4
Firmware Version: 01.01A02
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Jul 15 14:17:30 2018 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 121) The previous self-test completed having
the read element of the test failed.
Total time to complete Offline
data collection: (30720) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 333) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x70bd) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 156 153 021 Pre-fail Always - 11175
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 24
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 086 086 000 Old_age Always - 10790
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 24
16 Unknown_Attribute 0x0022 006 194 000 Old_age Always - 113281321832
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 22
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 1
194 Temperature_Celsius 0x0022 117 114 000 Old_age Always - 35
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 1
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 1

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 90% 10775 544874240
# 2 Short offline Completed: read failure 90% 10774 544874240
# 3 Short offline Completed: read failure 90% 10704 544874240
# 4 Short offline Completed: read failure 90% 10536 544874240
# 5 Extended offline Completed: read failure 90% 10373 544874240
# 6 Short offline Completed: read failure 90% 10321 544874240
# 7 Short offline Completed: read failure 90% 10153 544874240
#8 Short offline Completed without error 00% 9987 -
#9 Short offline Completed without error 00% 9819 -
#10 Extended offline Completed without error 00% 9637 -
#11 Short offline Completed without error 00% 9579 -
#12 Short offline Completed without error 00% 9411 -
#13 Short offline Completed without error 00% 9244 -
#14 Short offline Completed without error 00% 9075 -
#15 Extended offline Completed without error 00% 8917 -
#16 Short offline Completed without error 00% 8860 -
#17 Short offline Completed without error 00% 8692 -
#18 Short offline Completed without error 00% 8524 -
#19 Short offline Completed without error 00% 8356 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


After looking at the output, I see that the LBA_of_first_error is 544874240. I also see that my sector size is 512 bytes logical, 4096 bytes physical, which I assumed meant I should use bs=512 with the dd command.

I then used the following commands:


root@freenas:~ # sysctl kern.geom.debugflags=16
root@freenas:~ # dd if=/dev/zero of=/dev/da3 bs=512 count=1 seek=544874240 conv=noerror,sync
1+0 records in
1+0 records out
512 bytes transferred in 0.000300 secs (1709396 bytes/sec)


I then checked the SMART output once again, but nothing had changed. I wasn't entirely sure which block size to use: since the SMART output said the sector size was 512 bytes logical, 4096 bytes physical, I thought maybe it should have been bs=4096.


root@freenas:~ # dd if=/dev/zero of=/dev/da3 bs=4096 count=1 seek=544874240 conv=noerror,sync
1+0 records in
1+0 records out
4096 bytes transferred in 0.000337 secs (12167445 bytes/sec)
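
A note on the two dd commands above: seek= is counted in blocks of bs bytes, not in LBAs, so the same seek value with a different bs lands on a different part of the disk. The sketch below only does the arithmetic. It assumes the LBA reported by smartctl is in 512-byte logical sectors and that the shell has 64-bit arithmetic; the function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; only the arithmetic is the point here.

# Byte offset that dd will seek to: seek= is counted in units of bs.
dd_byte_offset() {   # usage: dd_byte_offset <bs> <seek>
    echo $(( $1 * $2 ))
}

# seek= value that lands on the block containing a given LBA.
# Assumes the LBA is expressed in logical sectors of <logical_size> bytes.
lba_to_seek() {      # usage: lba_to_seek <lba> <logical_size> <bs>
    echo $(( $1 * $2 / $3 ))
}

LBA=544874240
echo "bs=512  seek=$LBA -> byte $(dd_byte_offset 512 "$LBA")"
echo "bs=4096 seek=$LBA -> byte $(dd_byte_offset 4096 "$LBA")"
echo "bs=4096 needs seek=$(lba_to_seek "$LBA" 512 4096)"
```

With bs=512 the write lands at byte 278975610880, which is the reported LBA. With bs=4096 and the same seek value it lands at byte 2231804887040, eight times further into the disk, so the second command zeroed a different 4 KiB rather than the failing sector. To cover the whole 4096-byte physical sector in one write, the matching value would be seek=68109280. On a 512e drive like this one, a single 512-byte write covers only one eighth of the physical sector, so the drive may have to read the damaged sector before it can write it.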


After checking the SMART output once again, still nothing had changed. I even tried running a short and a long SMART test, but still no change.


root@freenas:~ # smartctl -a /dev/da3
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Se
Device Model: WDC WD3000F9YZ-09N20L1
Serial Number: WD-WCC5D0078019
LU WWN Device Id: 5 0014ee 25fb52ce4
Firmware Version: 01.01A02
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Jul 15 14:20:09 2018 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 121) The previous self-test completed having
the read element of the test failed.
Total time to complete Offline
data collection: (30720) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 333) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x70bd) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 156 153 021 Pre-fail Always - 11175
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 24
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 086 086 000 Old_age Always - 10790
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 24
16 Unknown_Attribute 0x0022 006 194 000 Old_age Always - 113281322705
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 22
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 1
194 Temperature_Celsius 0x0022 117 114 000 Old_age Always - 35
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 1
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 1

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 10775 544874240
# 2 Short offline Completed: read failure 90% 10775 544874240
# 3 Short offline Completed: read failure 90% 10775 544874240
# 4 Short offline Completed: read failure 90% 10774 544874240
# 5 Short offline Completed: read failure 90% 10704 544874240
# 6 Short offline Completed: read failure 90% 10536 544874240
# 7 Extended offline Completed: read failure 90% 10373 544874240
# 8 Short offline Completed: read failure 90% 10321 544874240
# 9 Short offline Completed: read failure 90% 10153 544874240
#10 Short offline Completed without error 00% 9987 -
#11 Short offline Completed without error 00% 9819 -
#12 Extended offline Completed without error 00% 9637 -
#13 Short offline Completed without error 00% 9579 -
#14 Short offline Completed without error 00% 9411 -
#15 Short offline Completed without error 00% 9244 -
#16 Short offline Completed without error 00% 9075 -
#17 Extended offline Completed without error 00% 8917 -
#18 Short offline Completed without error 00% 8860 -
#19 Short offline Completed without error 00% 8692 -
#20 Short offline Completed without error 00% 8524 -
#21 Short offline Completed without error 00% 8356 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


If anyone has any advice, I would greatly appreciate it.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
"Since I have my FreeNAS server set up so that it does SMART long tests every month I checked the SMART output for da3 to find out where the bad sector is"
I would test more often. It doesn't hurt anything to do a short test every day and a long test once a week. Testing once a month lets it go too long before a fault might be detected.
"After checking the SMART output once again it still had not changed anything. I even tried running a short and a long SMART test but still no change."
That is a lot of failed tests:
Code:
SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline	Completed: read failure	   90%	 10775		 544874240
# 2  Short offline	   Completed: read failure	   90%	 10775		 544874240
# 3  Short offline	   Completed: read failure	   90%	 10775		 544874240
# 4  Short offline	   Completed: read failure	   90%	 10774		 544874240
# 5  Short offline	   Completed: read failure	   90%	 10704		 544874240
# 6  Short offline	   Completed: read failure	   90%	 10536		 544874240
# 7  Extended offline	Completed: read failure	   90%	 10373		 544874240
# 8  Short offline	   Completed: read failure	   90%	 10321		 544874240
# 9  Short offline	   Completed: read failure	   90%	 10153		 544874240
I would say it is just a bad drive. With only 10k hours, it should still be in warranty. Just contact Western Digital; they will take it back for an exchange. Do you have some cold spares on the shelf ready to go? I like to keep a couple in each of the sizes I use.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
PS. You really shouldn't try to 'fix' a bad sector. Usually, if it works at all, it is just a temporary measure and the drive will show more bad sectors later.
You should use a burn-in process to test all your drives before you start using them. There is a good script for that here:

Github repository for FreeNAS scripts, including disk burnin
https://forums.freenas.org/index.ph...for-freenas-scripts-including-disk-burnin.28/

There is also some general guidance on setup and configuration of FreeNAS here and one of the things it goes over is disk prep prior to first use:

Uncle Fester's Basic FreeNAS Configuration Guide
https://www.familybrown.org/dokuwiki/doku.php?id=fester:intro
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
PPS. Having 1 drive out of 10 give you bad sectors isn't such a bad thing. I have a server at work with 60 of the WD Red Pro drives in it and I had 4 drives that needed to be replaced in the first 9 months.

If you really feel that it is a false error, you could pull it from the array and install it in a computer that will let you do a "Full" or "Long" format, something that will actually write every block on the drive. Then you need to delete the partition table. If you have a Windows computer, you would use a utility called diskpart from the command line to do that. Once you have done a full format and wiped the partition table, you could put it back in the FreeNAS and resilver it back into the pool. If it still shows as having a bad sector at that point, the only answer would be to send it to WD for warranty. Since it is one of the Se 'Data Center' drives with a 5 year warranty, you could wait it out and see if it has any more errors. They sometimes don't accumulate very fast, but sometimes they do. I had a drive go from 1 to 10k in a day one time.
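
The "write every block" step can also be done from a Unix shell instead of a Windows full format. This is a rough sketch of that alternative, not the procedure described above. It only prints the commands so nothing is touched by accident, /dev/daX is a placeholder, and it assumes the disk has already been removed from the pool, because the dd line destroys everything on it.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them.
# The device name is a placeholder; double-check it before running anything by hand.

wipe_plan() {   # usage: wipe_plan <device>
    disk=$1
    # Overwrite every block so the drive gets a chance to remap a pending sector.
    echo "dd if=/dev/zero of=$disk bs=1048576"
    # Start a long self-test afterwards.
    echo "smartctl -t long $disk"
    # Once the test has finished, re-check attributes 5, 197 and 198.
    echo "smartctl -A $disk"
}

wipe_plan /dev/daX
```

Zeroing the whole device also overwrites the partition table, so in this variant there is no separate step for removing partitions. If attribute 197 is still non-zero after the write, or attribute 5 starts climbing, the drive is not recovering.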
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Bad sector? Failing SMART tests (which I concur should be run more often)? In warranty? RMA the drive, no question.
 

drwoodcomb

Explorer
Joined
Sep 15, 2016
Messages
74

Hi guys, I appreciate the input. I will definitely run SMART tests more often from now on. As for the drive warranty, I bought a bunch of "refurbished" drives on eBay. I originally bought 12 drives: 10 for the server and 2 extra cold spares for exactly this kind of situation. I've only had the drives for a little over a year, and the drive we are talking about now is the second one to have problems. Unfortunately, the warranty period was only 6 months.

As for the errors you mentioned:

Code:
SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline	Completed: read failure	   90%	 10775		 544874240
# 2  Short offline	   Completed: read failure	   90%	 10775		 544874240
# 3  Short offline	   Completed: read failure	   90%	 10775		 544874240
# 4  Short offline	   Completed: read failure	   90%	 10774		 544874240
# 5  Short offline	   Completed: read failure	   90%	 10704		 544874240
# 6  Short offline	   Completed: read failure	   90%	 10536		 544874240
# 7  Extended offline	Completed: read failure	   90%	 10373		 544874240
# 8  Short offline	   Completed: read failure	   90%	 10321		 544874240
# 9  Short offline	   Completed: read failure	   90%	 10153		 544874240


Are these not the same error?

Since I have one cold spare left I think I will take this one out and take your advice and try writing to the entire drive. You mentioned writing to every block with diskpart and then removing all the partitions. Should I do it in that order or do I remove the partitions first and then write to the entire drive? Does the entire drive
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
"I bought a bunch of 'refurbished' drives on eBay."
Not quite the same thing, but I bought "white label" drives off eBay about two years ago. Out of six drives, I've had seven (or maybe eight) failures--they've all been replaced, and at least one of the replacements has failed too.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
"Unfortunately the warranty period was only 6 months."
Sorry. If you only had two refurbished drives go out in a year, that isn't so bad either. The refurbished ones usually fail more often than regular new drives. If you have not done it yet, you might want to buy a couple more spares.
"Are these not the same error?"
They are separate tests, so the same thing is failing repeatedly, and that tells me it is a hard, mechanical failure.
If it fails self-test, it isn't likely to recover from that.
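
To see at a glance whether the failures all point at the same sector, the self-test log can be tallied per LBA. A small sketch, assuming awk is available and the log has the column layout shown in this thread; count_read_failures is a made-up name, and the sample rows are copied from the log posted earlier.

```shell
#!/bin/sh
# Hypothetical helper: tally failed self-tests per LBA from a smartctl log on stdin.
count_read_failures() {
    # Log rows start with "#"; the LBA of the first error is the last column.
    awk '/^#/ && /read failure/ { n[$NF]++ }
         END { for (lba in n) print lba, n[lba] }'
}

# On a live system the input could come from:
#   smartctl -l selftest /dev/da3 | count_read_failures
count_read_failures <<'EOF'
# 1  Extended offline    Completed: read failure       90%     10775         544874240
# 2  Short offline       Completed: read failure       90%     10775         544874240
# 9  Short offline       Completed: read failure       90%     10153         544874240
#10  Short offline       Completed without error       00%      9987         -
EOF
```

For the sample rows this prints `544874240 3`: one LBA, failing every time. Keep in mind that each test stops at its first read error, so the log cannot rule out more bad sectors further along the disk.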
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I have said this in other threads but I will try to give a unique take on it so it isn't so repetitive. A few years back, around 2013 or 14, I built a tower system from parts and I sourced all the drives from eBay, six WD drives and six Hitachi drives, all with some use on them.
[Photo: 20160124_090832.png]

The system didn't have hot-swap bays and it made me very irritated for a number of reasons.
- I had enabled deduplication, which made performance abysmal.
- I had used RAIDz3, and I blamed the poor performance (in part) on that.
- I couldn't get by for more than a month (maybe two) without needing to disassemble the thing to swap out a drive.
It was because of my experience with this system that all of my systems since have had hot-swap bays.
I paid less than half as much for those drives as I would have paid for a new drive, but I paid for that savings with hard labor.
This experience is part of the reason that I keep two (or more) spares for each size drive that I use, even now.
I bought a whole case of drives around the end of last year and some of those drives were dedicated to being spares.
This is 12 of them as I was getting ready to put them in my server:
[Photo: 20180701_134046.jpg]
I use DBAN and do a DOD short wipe with verify between each pass as a test for my drives.
It does a very similar thing to the burn-in script on the forum, but I find it a bit easier to use.
I am not saying to only buy new drives though. I have had problems even with new ones.
I bought a batch of 8 Toshiba drives in the early part of 2017 and within six months 3 of them had hard failed.
Any drive can be a dud. I have had pretty good results from the Seagate Desktop drives.
If I remember to do it, I will post a photo when I get home with a side by side of the Seagate Desktop and the Seagate Barracuda.
They are similar but very different drives and I am not sure I trust the new Barracuda drives, but I have two of them I plan to test...
 

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377
"I can't get Current Pending Sector from 1 back to 0"


You don't.

You replace the disk. It isn't fun, and it's expensive :( but you always replace the disk.
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
da3 is failing both the long and short SMART tests. Regardless of what else you see, the drive is failing and needs to be replaced.
 