Need help interpreting badblocks and SMART test results on older drive

Status
Not open for further replies.

tromba
Dabbler · Joined: Jul 13, 2014 · Messages: 15

Hi,

So I just finished testing my hard drives as qwertymodo's guide here explains, and I need some help figuring out what to do with the results.

A bit of background first:

I started backing up my massive music and movie collection a couple of years ago, and after getting to 500 backup DVDs I got fed up and wanted the media digitally available at home, so I decided to start storing it locally on a hard drive. I started by nearly filling one 3TB WD Red and moved on to filling a 4TB one, with more media still to go. After a scare with a nearly dead hard drive (the partition table got overwritten by something I was installing, duh) I started considering a NAS solution. I considered multiple off-the-shelf systems before a friend mentioned FreeNAS, and I was hooked.
The plan was to have a 5-6 drive setup in RAIDZ2, reusing one of my existing Reds. I needed at least 8-9TB of working space plus the recommended 20% free space on top, which landed me at around 10-10.5 usable TB (a nominal 12TB of storage) for data, plus double parity. As I already had one 3TB drive and one 4TB drive, I needed either 5 × 4TB (2 parity drives + 3 data drives = 12TB of storage) or 6 × 3TB (2 parity drives + 4 data drives = 12TB of storage). 3TB drives are so much cheaper over here that buying five 3TB drives cost me almost a hundred bucks less than buying four 4TB ones.
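To make the capacity arithmetic above concrete, here is a minimal Python sketch. It assumes nominal drive sizes and ignores ZFS metadata, padding, and the TB-vs-TiB difference, so real usable space will be somewhat lower. The function names are illustrative only, not part of FreeNAS or ZFS.

```python
# Back-of-the-envelope RAIDZ2 capacity check (nominal TB only; ignores
# ZFS metadata/padding overhead and the TB-vs-TiB difference).

PARITY_DRIVES = 2  # RAIDZ2 spends two drives' worth of space on parity


def raidz2_data_tb(num_drives: int, drive_tb: float) -> float:
    """Nominal data capacity: (drives - parity drives) * drive size."""
    if num_drives <= PARITY_DRIVES:
        raise ValueError("RAIDZ2 needs more than two drives")
    return (num_drives - PARITY_DRIVES) * drive_tb


def needed_tb(working_tb: float, free_fraction: float = 0.20) -> float:
    """Pool size needed so that working_tb still leaves free_fraction of it empty."""
    return working_tb / (1.0 - free_fraction)


if __name__ == "__main__":
    for drives, size in [(5, 4), (6, 3)]:
        print(f"{drives} x {size}TB RAIDZ2 -> {raidz2_data_tb(drives, size):.0f}TB for data")
    for working in (8, 9):
        print(f"{working}TB of data + 20% free -> {needed_tb(working):.2f}TB")
```

Reading "20% free" as 20% of the pool (data divided by 0.8), 8-9TB of data calls for roughly 10-11.25TB, which either 12TB layout covers on paper.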

Which brings me to my setup and the impending question. I have 5 new drives and one that is about 16 months old.

After running the SMART tests, I moved on to badblocks. The old drive is the only one that had errors (only read errors) on the first pass, and none on any other pass. The result of the first pass was (36/0/0), where, AFAIK, badblocks reports (read errors/write errors/comparison errors). I searched the forum here but found nothing that seemed relevant, so I googled. The most coherent hits I found said this was the "least concerning" type of error, as it "most likely" points to a "logical" read error rather than anything being "physically" wrong with the disk. They recommended repairing it with a zero write (overwriting the whole drive with zeros) followed by another badblocks run, which seems to have cleared the issue for the people in question.
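The three numbers badblocks prints can be pulled apart programmatically. This is a minimal Python sketch that assumes the status text contains a "(R/W/C errors)" group, which is how badblocks -s shows its running tally; the field names are illustrative labels for read, write, and corruption (comparison) errors.

```python
import re
from typing import NamedTuple, Optional


class BadblocksErrors(NamedTuple):
    read: int        # blocks that could not be read back
    write: int       # blocks that could not be written
    corruption: int  # blocks read back whose data did not match the written pattern


# Assumed format: the tally appears as "(R/W/C errors)" in the status text.
_TRIPLE = re.compile(r"\((\d+)/(\d+)/(\d+) errors\)")


def parse_error_triple(text: str) -> Optional[BadblocksErrors]:
    """Return the last read/write/corruption tally found in text, or None."""
    matches = _TRIPLE.findall(text)
    if not matches:
        return None
    return BadblocksErrors(*(int(n) for n in matches[-1]))


print(parse_error_triple("(36/0/0 errors)"))
```

With the figures from this post, only the read counter is nonzero, which matches the "only read errors" description.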

My question: any opinions on this? No offence, but I mean informed ones, not just some banal adage along the lines of "I'd just buy a new one". As far as I can see, the SMART test results confirm the drive is OK apart from the raw read errors, unless I'm missing something. Should I try the zero write and rerun the tests (which took 75 hours per drive!)?

I would love to reuse this drive and save the 100 euros it would cost me to buy a sixth new one, as well as avoid ending up with two spare Red drives, instead of just one, that will become utterly useless the minute my FreeNAS box is up and running.

Here are the readings after the final long SMART test:
Code:
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       198
  3 Spin_Up_Time            0x0027   181   171   021    Pre-fail  Always       -       5925
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       659
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       3882
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       545
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       429
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       229
194 Temperature_Celsius     0x0022   122   107   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       1


Thanks.
M

EDIT: Apparently it doesn't matter how I format the table; it just becomes unreadable when I post. Sorry for that, and any ideas/solutions are welcome.
EDIT#2: I put the results between CODE tags and still no joy.
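For checking a table like the one above without eyeballing it, a minimal Python sketch can parse the attribute rows and flag the ones commonly watched for media problems (reallocated, pending, and offline-uncorrectable sectors), plus any attribute whose normalized VALUE has dropped to its THRESH. The watched-ID set is a common rule of thumb, not a vendor pass/fail rule, and the sketch deliberately does not judge Raw_Read_Error_Rate's raw value, since raw-value meaning is vendor-specific. All names are illustrative.

```python
from typing import Dict, List, NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int
    worst: int
    thresh: int
    raw: str


# Rule of thumb: nonzero raw counts here usually deserve a closer look.
WATCHED_IDS = {5, 196, 197, 198}


def parse_attributes(text: str) -> List[SmartAttr]:
    """Parse smartctl attribute rows; header and other lines are skipped."""
    attrs = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs.append(SmartAttr(
            attr_id=int(fields[0]),
            name=fields[1],
            value=int(fields[3]),
            worst=int(fields[4]),
            thresh=int(fields[5]),
            raw=" ".join(fields[9:]),  # some raw values contain spaces
        ))
    return attrs


def concerns(attrs: List[SmartAttr]) -> Dict[str, str]:
    """Map attribute name -> reason, for attributes worth a closer look."""
    found = {}
    for a in attrs:
        if a.thresh > 0 and a.value <= a.thresh:
            found[a.name] = f"normalized value {a.value} at or below threshold {a.thresh}"
        elif a.attr_id in WATCHED_IDS and a.raw.split()[0] != "0":
            found[a.name] = f"raw value {a.raw}"
    return found


# Three rows copied from the table above.
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       198
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
"""

print(concerns(parse_attributes(SAMPLE)))  # {}
```

Feeding it the full table from this post gives the same empty result: no reallocated, pending, or offline-uncorrectable sectors, and every normalized value is above its threshold.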
 

cyberjock
Inactive Account · Joined: Mar 25, 2012 · Messages: 19,526

First, if you had put the table in CODE tags, it would have come out okay. After you've pasted and submitted it, the formatting is trashed by the forum software and there is no going back.

Second, aside from the read error rate being 128, I don't really see any cause for concern. If you want to validate this drive, I'd do a dd write of all zeros to the disk and then another badblocks test. I'm not sure what took 75 hours; I've done SMART long tests, badblocks, and a dd test sequentially on 4TB drives in less than 24 hours, so I'm really confused as to why it would take you so long.

Anyway, here's what I'd do, in order:

1. dd if=/dev/zero of=/dev/XXX bs=1M
(let that finish)
2. badblocks -svw -t 0xFE /dev/XXX (you can do any pattern you want, I just always do 0xFE for some reason. ;)
(let that finish)
3. Do a long SMART test: smartctl -t long /dev/XXX
(let that finish)
4. Check the SMART test results (smartctl -a /dev/XXX). If it shows no errors, I'd use the drive.
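Step 4 can be partly automated. The Python sketch below assumes you have saved the output of smartctl -a /dev/XXX (or smartctl -l selftest /dev/XXX) as text, and that the ATA self-test log lists the most recent test as entry "# 1" with a status such as "Completed without error". It only checks that one status line; it does not replace reading the attribute table. The log excerpts are made up for illustration, and the function names are hypothetical.

```python
import re
from typing import Optional

# Assumption: entry "# 1" of the ATA self-test log is the most recent test, e.g.
# "# 1  Extended offline    Completed without error       00%      1234         -"
_ENTRY_1 = re.compile(r"^#\s*1\s+(.+)$")


def latest_self_test(smartctl_output: str) -> Optional[str]:
    """Return the text of self-test log entry #1 (the most recent), or None."""
    for line in smartctl_output.splitlines():
        match = _ENTRY_1.match(line.strip())
        if match:
            return match.group(1)
    return None


def latest_self_test_passed(smartctl_output: str) -> bool:
    """True only if the most recent self-test reports no error."""
    entry = latest_self_test(smartctl_output)
    return entry is not None and "Completed without error" in entry


# Made-up log excerpts for illustration only.
GOOD = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1234         -
"""
BAD = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      1240         123456
# 2  Extended offline    Completed without error       00%      1234         -
"""

print(latest_self_test_passed(GOOD))  # True
print(latest_self_test_passed(BAD))   # False
```

Only entry #1 matters here: an older passing test further down the log (as in the BAD excerpt) does not rescue a failed most-recent one.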
 

tromba
Dabbler · Joined: Jul 13, 2014 · Messages: 15

Thanks for responding so quickly, cyberjock.

The SMART long tests took between 390 and 412 minutes (almost seven hours), and qwertymodo's guide had me run two of those. His badblocks run also has FOUR passes, which took between 57 and 60 hours on my drives. That's how I ended up with almost 75 hours of testing per drive.
I guess if I run only one badblocks pass followed by a SMART long test, I should be done in 24 hours too.
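The totals above can be reproduced with a little arithmetic. This Python sketch uses the reported figures (390-412 minutes per SMART long test, 57-60 hours for the four-pattern badblocks run, i.e. the "FOUR passes") and assumes badblocks time scales linearly with the number of patterns, since each pattern is one full write plus one full read. That linear-scaling assumption is mine, not something measured in this thread.

```python
def hours_for_plan(long_tests: int, patterns: int,
                   long_test_min: float, four_pattern_hours: float) -> float:
    """Estimated total hours for a test plan.

    Assumes badblocks time scales linearly with the number of patterns,
    i.e. each pattern costs a quarter of the measured 4-pattern run."""
    per_pattern_hours = four_pattern_hours / 4
    return long_tests * long_test_min / 60 + patterns * per_pattern_hours


PLANS = [
    ("guide: 2 long tests + 4 patterns", 2, 4),
    ("1 pattern + 1 long test", 1, 1),
]

for label, long_tests, patterns in PLANS:
    low = hours_for_plan(long_tests, patterns, long_test_min=390, four_pattern_hours=57)
    high = hours_for_plan(long_tests, patterns, long_test_min=412, four_pattern_hours=60)
    print(f"{label}: {low:.1f}-{high:.1f} hours")
```

Under that assumption the guide's plan comes to roughly 70-74 hours, and one pattern plus one long test to roughly 21-22 hours, in line with the 24-hour guess.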

I'll run badblocks with 0xaa, though, because that's the pattern where I got errors. I know there isn't necessarily a correlation, but it would make me feel better, and you know logic is overrated. :)

Thanks again.
 

titan_rw
Guru · Joined: Sep 1, 2012 · Messages: 586

If you leave off the -t for badblocks, it seems to default to 4 tests: 0x55, 0xaa, 0xff, 0x00. This kinda makes sense, as 0x55 is "01010101", 0xaa is "10101010", and of course 0xff is "11111111" and 0x00 is "00000000". About as good as it gets unless you start writing multiple passes of random data, and we all know how slow generating random data is. I let it do all 4 tests, and since it does a write/read set for each, it passes over the disk 8 times.

I then run a long SMART test, and if it passes and the SMART attributes look good, I'm done.
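The bit patterns are easy to check. This minimal Python sketch prints the four default patterns as 8-bit strings, in the order tromba gives in a later reply (0xaa, 0x55, 0xff, 0x00), and counts passes on the assumption stated above that each pattern is written across the disk once and read back once. The helper name is illustrative.

```python
# The four badblocks write-mode patterns discussed in this thread.
DEFAULT_PATTERNS = [0xAA, 0x55, 0xFF, 0x00]


def as_bits(byte: int) -> str:
    """Render one byte as an 8-character binary string."""
    return format(byte, "08b")


for p in DEFAULT_PATTERNS:
    print(f"0x{p:02x} -> {as_bits(p)}")

# 0xaa and 0x55 are bitwise complements, so together they flip every bit.
assert 0xAA ^ 0x55 == 0xFF

# One write pass plus one read-and-compare pass per pattern.
passes = 2 * len(DEFAULT_PATTERNS)
print(f"{len(DEFAULT_PATTERNS)} patterns -> {passes} passes over the disk")
```

The same helper shows cyberjock's single 0xFE pattern as 11111110.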
 

tromba
Dabbler · Joined: Jul 13, 2014 · Messages: 15

Yes, that is the default badblocks pattern set if -t is left off (albeit in this order: 0xaa, 0x55, 0xff, 0x00). Following it with a SMART long test is exactly what I did, as explained in my first post. I found a good overview of badblocks and its arguments here.
 

titan_rw
Guru · Joined: Sep 1, 2012 · Messages: 586

Sorry, yeah, I wasn't totally sure of the order. I was going by memory.

Definitely not a short process on large drives. I just did some new WD 3TB Reds a few days ago: just over 48 hours for badblocks to do its thing, then the long SMART test after that.
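Those timings imply an average transfer rate. Assuming titan_rw's run used all four default patterns (8 passes, as he describes earlier in the thread) and decimal units (1 TB = 10^12 bytes), a quick Python sketch gives the figure. Because the run took "just over" 48 hours, the real average is slightly lower. The function name is illustrative.

```python
def avg_mb_per_s(capacity_tb: float, passes: int, hours: float) -> float:
    """Average throughput implied by `passes` full-disk passes in `hours`.

    Decimal units assumed: 1 TB = 10**12 bytes, 1 MB = 10**6 bytes."""
    total_bytes = capacity_tb * 10**12 * passes
    seconds = hours * 3600
    return total_bytes / seconds / 10**6


# 3TB drive, default 4 patterns x (write + read) = 8 passes, ~48 hours.
rate = avg_mb_per_s(capacity_tb=3, passes=8, hours=48)
print(f"about {rate:.0f} MB/s averaged over the whole run")
```

The same function works for any drive: pass its capacity, the number of full passes, and the elapsed hours.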
 

tromba
Dabbler · Joined: Jul 13, 2014 · Messages: 15

So I did the dd zero write and reran badblocks with 0xaa, which ran without errors. I then ran the SMART long test, and the results are unchanged, i.e. apart from the (relatively) small raw value for Raw_Read_Error_Rate, everything looks good. I'm assuming this is a good thing.
I've already set up my RAIDZ2 volume and some CIFS shares, and copied about 3.5 TB of data. Yaaaaaaaaay!

Thanks, cyberjock.
 

alheim
Dabbler · Joined: Nov 19, 2014 · Messages: 22

Edit: Moved to another thread.
 