smartctl vs Windows SMART analysis programs

turment

Dabbler
Joined
Feb 3, 2020
Messages
46
I have found discrepancies between smartctl results and the analysis from Windows programs for the same "failing" disks.

I had a couple of disks with pending sectors. smartctl was able to run a long test on one disk, but on the other it always ended with a read error.

Before throwing the disk in the bin, I decided to try Windows-based programs. There I was able both to run a SMART long test with NO errors and to scan the entire surface with remap on, again with NO errors.

This was a bit strange. I tested twice and used different Windows-based programs (Victoria + CrystalDiskInfo + others), always getting the same results and SMART raw values.

Output from smartctl:
Code:
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       1
  3 Spin_Up_Time            0x0027   171   167   021    Pre-fail  Always       -       6433
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1835
  5 Reallocated_Sector_Ct   0x0033   199   199   140    Pre-fail  Always       -       1
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       8057
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1296
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       83
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       1835
194 Temperature_Celsius     0x0022   129   105   000    Old_age   Always       -       21
196 Reallocated_Event_Count 0x0032   199   199   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0032   200   195   000    Old_age   Always       -       9
198 Offline_Uncorrectable   0x0030   200   196   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       1
200 Multi_Zone_Error_Rate   0x0008   200   195   000    Old_age   Offline      -       0

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      8057         776006193
# 2  Extended offline    Completed: read failure       90%      8056         776006193
# 3  Extended offline    Completed: read failure       90%      8056         776006193
# 4  Extended offline    Completed: read failure       90%      8055         776006193
# 5  Extended offline    Completed: read failure       90%      8045         776016665
# 6  Extended offline    Completed: read failure       90%      8045         776016665
# 7  Extended offline    Completed: read failure       90%      8045         776016665
# 8  Extended offline    Completed: read failure       90%      8045         776016665
# 9  Extended offline    Completed: read failure       90%      8045         776016664
#10  Extended offline    Completed: read failure       90%      8045         776016664
#11  Extended offline    Completed: read failure       90%      8045         776016664
#12  Extended offline    Completed: read failure       90%      8045         776016664
#13  Extended offline    Completed: read failure       90%      8045         776016664
#14  Extended offline    Completed: read failure       90%      8045         776006457
#15  Extended offline    Completed: read failure       90%      8040         776006457
#16  Extended offline    Completed: read failure       90%      8040         776006457
#17  Extended offline    Completed: read failure       90%      8040         776006457
#18  Extended offline    Completed: read failure       90%      8037         776006192
#19  Extended offline    Completed: read failure       90%      8032         776006192
#20  Extended offline    Completed: read failure       90%      8030         776006192
#21  Extended offline    Completed: read failure       90%      8030         776006192

Output from Victoria:
Code:
WDC WD10EACS-00D6B1   WD-WCAU46011083
-----------------------------------------------------------------------------------
  ID      Name                   Value  Worst  Tresh       Raw    Health
-----------------------------------------------------------------------------------
  1 Raw read error rate                 200    200     51        0   •••••
  3 Spin-up time                        160    158     21     6975   •••••
  4 Number of spin-up times              99     99      0     1850   •••• 
  5 Reallocated sector count            200    200    140        0   •••••
  7 Seek error rate                     100    253      0        0   •••••
  9 Power-on time                        88     88      0     8817   •••• 
 10 Spin-up retries                     100    100      0        0   •••••
 11 Recalibration retries               100    100      0        0   •••••
 12 Power cycle count                    99     99      0     1310   •••• 
192 Power-off retract count             200    200      0       76   •••••
193 Load/unload cycle count             200    200      0     1850   •••••
194 HDA Temperature                     115    105      0 35°C/95°F   •••• 
196 Reallocated event count             200    200      0        0   •••••
197 Current pending sectors             200    200      0        0   •••••
198 Offline scan UNC sectors            200    200      0        0   •••••
199 Ultra DMA CRC errors                200    200      0        0   •••••
200 Multi zone error rate               200    200      0        0   •••••

00:22:10 : Starting Ext off-line routine SMART Test (2)... OK
04:21:25 : *** Scan results: no warnings, no errors. Last block at 1953525167 (1,0 TB), time 3 hours 57 minutes 16 seconds. 

Does smartctl have some bugs, or is it extremely picky? On FreeNAS I get no errors on scrub, and the pool is always in good shape. I went through the 11.2 updates and am now on the 11.3 release.
 

AlexGG

Contributor
Joined
Dec 13, 2018
Messages
171
Looks like you tested two similar, but different disks. Have both tools report disk serial numbers and compare.
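
The comparison can also be scripted. Below is a minimal Python sketch, assuming the `Serial Number:` line that `smartctl -i` prints and the Victoria header line as pasted above (model followed by serial). The function names are made up for illustration, and the smartctl sample text in the usage note is reconstructed, not copied from a real run.

```python
import re

def serial_from_smartctl(info_text):
    """Return the serial from `smartctl -i` style text, or None if absent."""
    m = re.search(r"^Serial Number:\s*(\S+)", info_text, re.MULTILINE)
    return m.group(1) if m else None

def serial_from_victoria_header(header_line):
    """Return the last whitespace-separated token of a Victoria header line.

    Assumes the layout 'MODEL   SERIAL' seen in this thread.
    """
    tokens = header_line.split()
    return tokens[-1] if tokens else None

def same_drive(smartctl_text, victoria_header):
    """True only if both serials were found and they match exactly."""
    a = serial_from_smartctl(smartctl_text)
    b = serial_from_victoria_header(victoria_header)
    return a is not None and a == b
```

For example, `same_drive("Serial Number:    WD-WCAU45982613\n", "WDC WD10EACS-00D6B1   WD-WCAU46011083")` returns `False`, which is the mismatch to look for before comparing any attribute values.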
 

turment

AlexGG said: Looks like you tested two similar, but different disks. Have both tools report disk serial numbers and compare.

You are right. :(

Unfortunately, Windows and FreeBSD order disks differently, and I pulled the wrong one from the bay.

I will redo the tests and post the results here.
 

turment

Ok, I got the right drive.

Before surface test:
Code:
WDC WD10EACS-00D6B1   WD-WCAU45982613
-----------------------------------------------------------------------------------
  ID      Name                   Value  Worst  Tresh       Raw    Health
-----------------------------------------------------------------------------------
  1 Raw read error rate                 200    200     51                1   •••••
  3 Spin-up time                        170    167     21             6475   •••••
  4 Number of spin-up times              99     99      0             1836   •••• 
  5 Reallocated sector count            199    199    140                1   •••••
  7 Seek error rate                     100    253      0                0   •••••
  9 Power-on time                        89     89      0             8058   •••• 
 10 Spin-up retries                     100    100      0                0   •••••
 11 Recalibration retries               100    100      0                0   •••••
 12 Power cycle count                    99     99      0             1297   •••• 
192 Power-off retract count             200    200      0               83   •••••
193 Load/unload cycle count             200    200      0             1836   •••••
194 HDA Temperature                     120    105      0        30°C/86°F   •••• 
196 Reallocated event count             199    199      0                1   •••••
197 Current pending sectors             200    195      0                9   •••••
198 Offline scan UNC sectors            200    196      0                1   •••••
199 Ultra DMA CRC errors                200    200      0                1   •••••
200 Multi zone error rate               200    195      0                0   •••••

After a surface scan with remap in Victoria under Windows:
Code:
12:38:57 : Model: WDC WD10EACS-00D6B1; Capacity: 1953525168 LBAs; SN: WD-WCAU45982613; FW: 01.01A01
12:47:10 : Starting Reading, LBA=0..1953525167, FULL, sequential access w. REMAP, tio 2000ms
14:01:15 : LBA 776'006'193 try REMAP... complete
14:01:24 : LBA 776'006'194 try REMAP... complete
14:01:34 : LBA 776'006'195 try REMAP... complete
14:01:44 : LBA 776'006'196 try REMAP... complete
14:01:53 : LBA 776'006'197 try REMAP... complete
14:02:03 : LBA 776'006'198 try REMAP... complete
14:02:13 : LBA 776'006'199 try REMAP... complete
14:02:22 : LBA 776'006'280 try REMAP... complete
14:02:28 : LBA 776'006'281 try REMAP... complete
14:02:35 : LBA 776'006'282 try REMAP... complete
14:02:42 : LBA 776'006'283 try REMAP... complete
14:02:48 : LBA 776'006'284 try REMAP... complete
14:02:55 : LBA 776'006'285 try REMAP... complete
14:03:02 : LBA 776'006'286 try REMAP... complete
14:03:08 : LBA 776'006'287 try REMAP... complete
14:03:15 : LBA 776'006'458 try REMAP... complete
14:03:21 : LBA 776'006'459 try REMAP... complete
14:03:28 : LBA 776'006'460 try REMAP... complete
14:03:34 : LBA 776'006'461 try REMAP... complete
14:03:41 : LBA 776'006'462 try REMAP... complete
14:03:47 : LBA 776'006'463 try REMAP... complete
14:03:54 : LBA 776'016'665 try REMAP... complete
14:04:00 : LBA 776'016'666 try REMAP... complete
14:04:06 : LBA 776'016'667 try REMAP... complete
14:04:13 : LBA 776'016'668 try REMAP... complete
14:04:20 : LBA 776'016'669 try REMAP... complete
14:04:26 : LBA 776'016'670 try REMAP... complete
14:04:33 : LBA 776'016'671 try REMAP... complete
16:29:04 : Program terminated.

16:35:25 : *** Scan results: Warnings - 28, errors - 0. Last block at 1'953'525'167 (1,0 TB), time 3 hours 48 minutes 15 seconds. 
WDC WD10EACS-00D6B1   WD-WCAU45982613
-----------------------------------------------------------------------------------
  ID      Name                   Value  Worst  Tresh       Raw    Health
-----------------------------------------------------------------------------------
  1 Raw read error rate                 200    200     51                1   •••••
  3 Spin-up time                        188    167     21             5591   •••••
  4 Number of spin-up times              99     99      0             1837   •••• 
  5 Reallocated sector count            199    199    140                1   •••••
  7 Seek error rate                     100    253      0                0   •••••
  9 Power-on time                        89     89      0             8061   •••• 
 10 Spin-up retries                     100    100      0                0   •••••
 11 Recalibration retries               100    100      0                0   •••••
 12 Power cycle count                    99     99      0             1298   •••• 
192 Power-off retract count             200    200      0               84   •••••
193 Load/unload cycle count             200    200      0             1837   •••••
194 HDA Temperature                     114    105      0        36°C/96°F   •••• 
196 Reallocated event count             199    199      0                1   •••••
197 Current pending sectors             200    195      0                0   •••••
198 Offline scan UNC sectors            200    196      0                1   •••••
199 Ultra DMA CRC errors                200    200      0                1   •••••
200 Multi zone error rate               200    195      0                0   •••••

I am now running a SMART long test under Victoria to get further results.

As I said before, perhaps smartctl is a bit too picky.
 

AlexGG

smartctl is not picky; it is a reporting tool. It reads SMART attributes and test logs from the drive and displays them. The interpretation is up to you. In addition to reading the same attributes, Victoria writes to the disk in an attempt to get bad sectors remapped or transient write errors rectified. smartctl does not do this. Any other difference is really in the eye of the beholder.
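
Since the interpretation is left to the reader, a small script can do a first pass over the attribute table. This is a sketch in Python, assuming the ten-column layout of the smartctl table posted earlier in this thread. Which attributes to watch (5, 197 and 198 here) is a choice made for this example, not something smartctl decides, and the function name is hypothetical.

```python
# Attribute IDs picked for illustration; adjust to taste.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}

def nonzero_watched(table_text, watched=WATCHED):
    """Return {attribute id: raw value} for watched attributes with raw > 0."""
    found = {}
    for line in table_text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID;
        # this skips the header row and the self-test log rows.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        raw = fields[9]
        if attr_id in watched and raw.isdigit() and int(raw) > 0:
            found[attr_id] = int(raw)
    return found
```

Fed the smartctl table from the first post, this flags attributes 5, 197 and 198. It only restates what the drive reported; deciding whether those numbers justify replacing the disk is still up to the operator.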
 

turment

AlexGG said: smartctl is not picky
I called it picky because it gave up at the first error.

I tried to launch the long test at least 20 times on the very same disk I reported above, and each time it exited after a couple of seconds with the infamous read error.

I had to shut FreeNAS down, pull the disk, and mount it in a Windows machine. There the surface was corrected and the SMART long test passed without a glitch.

Now smartctl works fine on a long test with that disk, but I think it is a bit useless if it aborts the long test without telling you which sectors to correct.
 

AlexGG

Oh, I see. That's correct. What happens is this:

smartctl asks the drive, "would you please do a (long or short) self-test". The drive starts reading itself and, on encountering the first read error, comes back with "hey smartctl, here is your test result, I am faulty", and that concludes the test. The test routine is wholly internal to the drive and controlled by the drive firmware. smartctl really has no say in how the test is performed.

Victoria works differently. It says "read me sector 0", and the drive comes back with "here is your sector 0 content". The process repeats until at some point the drive comes back with "sorry, I can't read sector 1234, have this UNC (uncorrectable) error instead". Victoria will then ask "write these zeros into 1234, and if unable, remap it to a spare". If some kind of transient previously resulted in a half-written sector, or if there was a weak spot or something similar, the write itself corrects the problem. If it does not, then hopefully a remap happens, and the process moves on. Victoria controls how each sector is treated during its surface scan.

This is the difference between the two: smartctl is a diagnostic tool, and Victoria is a repair/recovery tool. I'm not sure these details are really important, but I decided to put the explanation here for the sake of completeness; maybe someone will need it later.
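
The two behaviours can be modelled in a few lines. This is a toy Python sketch, not how the drive firmware or Victoria is actually implemented: the disk is just a range of LBA numbers plus a set of unreadable ones, and "repair" simply removes an LBA from that set. All names here are invented for the illustration.

```python
def drive_self_test(bad_lbas, total_lbas):
    """Model of a drive-internal self-test: stop at the first unreadable LBA.

    Returns that LBA (what the log shows as LBA_of_first_error),
    or None if every LBA was readable.
    """
    for lba in range(total_lbas):
        if lba in bad_lbas:
            return lba
    return None

def scan_with_rewrite(bad_lbas, total_lbas):
    """Model of a host-driven scan: read each LBA, rewrite failures, keep going.

    Mutates bad_lbas: discarding an entry stands in for "write zeros,
    and remap to a spare if the write does not fix it".
    Returns the list of LBAs that were repaired, in scan order.
    """
    repaired = []
    for lba in range(total_lbas):
        if lba in bad_lbas:
            bad_lbas.discard(lba)
            repaired.append(lba)
    return repaired
```

With `bad = {3, 4, 9}` on a 12-sector toy disk, `drive_self_test(bad, 12)` returns only `3`, while `scan_with_rewrite(bad, 12)` returns `[3, 4, 9]`. A second `drive_self_test(bad, 12)` then returns `None`, which mirrors the long test passing after the Victoria run.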
 

turment

Yep, you are right too. It would be nice to have a tool in FreeNAS that reads ALL the bad sectors without having to run a 6+ hour test for every failing LBA on the disk. Remap would be even nicer.
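
One possible way to avoid restarting from LBA 0 each time is smartctl's selective self-test (`-t select,N-M`), which tests only a given LBA range. The Python sketch below just builds the command line to resume right after the last reported `LBA_of_first_error`. It is untested on this drive, it assumes the drive supports selective self-tests, and the helper name is hypothetical.

```python
def selective_test_cmd(device, first_error_lba, last_lba):
    """Build a smartctl command that tests from the LBA after the error to last_lba.

    Only constructs the argument list; running it (for example with
    subprocess.run) is left to the caller.
    """
    if not 0 <= first_error_lba < last_lba:
        raise ValueError("first_error_lba must be >= 0 and below last_lba")
    start = first_error_lba + 1
    return ["smartctl", "-t", f"select,{start}-{last_lba}", device]
```

Using the numbers from this thread, `selective_test_cmd("/dev/ada2", 776006193, 1953525167)` yields `smartctl -t select,776006194-1953525167 /dev/ada2`. Each run would still stop at the next unreadable sector, so this shortens the loop rather than listing every bad LBA in one pass, and it does not remap anything.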
 

turment

I am having problems after reallocating sectors with Victoria on Windows.

FreeNAS keeps alerting me at every boot that the ada2 drive still has sector problems, but when I run the smartctl long test, it can't find any.

Is there any way to tell it "shut up, everything is fine now" and reset that alert?

Thanks.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
turment said: Any way to tell it "shut up, everything is fine now" and reset that alert?
Once your disk has reallocated a sector, it's just a fact. There's no erasing it.

For most of us, if the process of sector reallocation begins, it's time to send that disk to the dud-pile.

Trying to nurse a disk through the last years (or maybe days) of its life with errors becoming more and more frequent before final expiration is not the business of a server that intends to uphold data integrity.

Other OS options (not using ZFS) are better candidates for that job.
 

turment

sretalla said: Once your disk has reallocated a sector, it's just a fact. There's no erasing it.
FreeNAS is no longer complaining about the sectors I fixed from the BSD shell. I'd like to do the same for the sectors I fixed in Windows. As I said, smartctl doesn't report any pending or unallocated sectors, but FreeNAS is convinced they still exist...
 