smartctl vs badblocks

Status
Not open for further replies.

Plato

Contributor
Joined
Mar 24, 2016
Messages
101
Well, I have a dilemma.

Recently I had a big problem with my WD60EFRX disks (WD Red 6TB). I had a 9-disk array and replaced about 7 of the disks due to failures. I tested every replacement with badblocks before accepting it as safe.

The last two disks arrived from WDC Support (I had to send the failed ones to them directly because the local warranty had expired).

I put them in my server and ran badblocks on both of them. One of them didn't even start counting, which I thought was a bad sign. The other started counting from 0.01 and so on. After about 20 hours I saw that one of them had finished the first phase and started the second, while the other was still at about 3 percent of the first phase. When I checked the logs I saw lots of unretriable errors on that disk. After a long while I stopped the test on the bad(?) disk, and by then the good disk had completed all the tests.

Then I thought maybe the disk is not bad (it just came from WDC, so it's new or certified to be good, right?) and the cable could be the problem. So I swapped the good disk's slot with the bad disk's and started the test again on both disks. Same result again: the bad disk failed the test while the good disk completed it.

I opened a case with WDC reporting my findings, and they suggested running the WD Disk Diag Tool on Windows first, before sending the disk back again for replacement.

I connected the disk to a Windows machine and started the tool, first with the quick test (which I didn't think for a minute would find any errors), and it passed. Then I started the long test, which lasted about 9.5 hours. I thought this one should find something at least, but it also completed without any errors. I think these are the same tests as "smartctl -t short" and "smartctl -t long". I did start "smartctl -t long" on this disk in the NAS system when badblocks failed, but that test was interrupted at 70% due to some failure on the disk.

So as you can see, I have a disk which refuses to run when connected to my NAS but works perfectly (as far as the SMART tests go) on Windows. Tonight I'll install a Linux system on the desktop computer and test with badblocks again.

If it passes that test, like it passed the extended SMART test, what should I do?

In my NAS I'm using an LSI 2008 controller and an Intel expander to connect the disks. There are 12 disks already working in the array, while 7 disks are tested and currently on standby, so I'm sure enough that there's nothing wrong with the controller. I also checked the disk slot for errors, as I wrote above: if a good disk passed in both slots, then there should be nothing wrong with the slot.

Why do you think this disk refuses to run with my NAS system?!? :)
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Could be a power supply issue. Perhaps it’s marginal? You could try disconnecting power from the other disks and see if the bad disks start working.

Badblocks writes data. If the drives can’t be written to, they’re bad. Or there is another hw problem.

And in my experience, refurbished disks from manufacturers are often the ones with the dodgy problems, i.e. if a disk works with the diagnostics, they send it out as working fine.

And since SMART tests are non-destructive, they obviously don't test the full surface's ability to be written...
 

Plato

Contributor
Joined
Mar 24, 2016
Messages
101
I think the PSU (600W) is more than enough for this system.

The CPU is a low-power Intel C2750 on an ASRock C2750D4I. 7 disks are on standby, and 2 USB boot devices, 12 WD Red disks (tank) plus 1 SSD (jail) are active. I previously ran badblocks on 6 of the disks that are now on standby, all at the same time. I think the PSU would have failed then, before this.

I still think a smartctl test is not enough to confirm the disk is good. I'll test it tonight with badblocks and report the findings.
 

styno

Patron
Joined
Apr 11, 2016
Messages
466
Plato said:
If it passes that test, like it passed the extended SMART test, what should I do?
If they offer to replace it again, why would you hold on to it?
It's supposed to be (as good as) new when it comes back from the vendor and it should work out of the box (same as the other disks).
 

Plato

Contributor
Joined
Mar 24, 2016
Messages
101
Yes, but it takes almost a month for the replacement to arrive, and I still haven't been able to prove that the disk is defective. What if the badblocks test on Linux doesn't give any errors? That would mean something is wrong only when the disk is attached to the NAS, but I need to be sure of that first.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
You said:
I did start "smartctl -t long" on this disk in the NAS system when badblocks failed, but that test was interrupted at 70% due to some failure on the disk.
If the drive will not successfully complete a SMART long test, it is a failed drive and needs to be replaced. That is all there is to it.
The WD diagnostic does some strange proprietary testing that is virtually meaningless. I stopped using that years ago because it would say a drive was good when it was actually unusable.
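
The "did the long test complete" rule can be checked from the self-test log rather than by eye. Below is a minimal Python sketch that scans text in the layout `smartctl -l selftest` usually prints for ATA drives and reports whether the newest extended test finished cleanly. The function name, the regular expression and the sample text are made up for illustration (the sample is not a log from the disk in this thread), and the column layout is an assumption, so compare it with what your own drive prints.

```python
import re

# One row of the ATA self-test log:
# "# <num>  <test>  <status>  <remaining>%  <hours>  <LBA or ->"
ROW = re.compile(r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$")


def latest_long_test_ok(selftest_log):
    """Return True/False for the newest extended test, or None if none is logged.

    smartctl lists the most recent test first, so the first "Extended" row wins.
    """
    for line in selftest_log.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # header lines and anything else that is not a log row
        description, status = match.group(2), match.group(3)
        if description.lower().startswith("extended"):
            return status == "Completed without error"
    return None


# Illustrative text only, typed in the usual layout; not real output.
SAMPLE = """\
# 1  Extended offline    Completed: read failure       30%      1234         123456789
# 2  Short offline       Completed without error       00%      1230         -
"""
print(latest_long_test_ok(SAMPLE))
```

Anything other than "Completed without error" on the newest extended test (a read failure, an interruption, an abort) is treated as a fail here, which matches the advice above.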
 

pschatz100

Guru
Joined
Mar 30, 2014
Messages
1,184
7 disks out of 9 is an incredibly high failure rate. Any idea why?

What are the parameters you are using to run badblocks?
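
For context on why the parameters matter with 6TB disks: as far as I know, badblocks counts blocks in a 32-bit value and defaults to 1024-byte blocks, so a disk this size needs a larger `-b` to be tested in one pass. The Python sketch below works out the smallest power-of-two block size that fits and builds a destructive write-mode command line from it. The 32-bit limit, the nominal 6 * 10**12 byte capacity, the helper names and the `/dev/sdX` device are assumptions for illustration, not the poster's actual command.

```python
from typing import List


def min_block_size(disk_bytes: int, start: int = 1024, limit: int = 2**32) -> int:
    """Smallest power-of-two block size (>= start) keeping the block count below limit."""
    size = start
    while disk_bytes // size >= limit:
        size *= 2
    return size


def badblocks_command(device: str, disk_bytes: int) -> List[str]:
    """Build an argument list for a destructive badblocks run (it erases the disk)."""
    # -b: block size in bytes, -w: write-mode test, -s: show progress, -v: verbose
    return ["badblocks", "-b", str(min_block_size(disk_bytes)), "-w", "-s", "-v", device]


# Nominal 6 TB; the exact capacity of a real drive differs slightly.
print(" ".join(badblocks_command("/dev/sdX", 6 * 10**12)))
```

In practice many people just pass `-b 4096`, which also fits under the limit and matches the 4K physical sectors of most large drives.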
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
pschatz100 said:
7 disks out of 9 is an incredibly high failure rate. Any idea why?

What are the parameters you are using to run badblocks?

Because a 600W PSU is not sufficient for 20 disks.
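
A back-of-the-envelope version of this argument: for a PSU feeding many spinning disks, what tends to matter is the 12 V rail at spin-up rather than the wattage on the label. The Python sketch below adds up per-drive spin-up current and compares it with a rail rating. The 2 A per drive, the 38 A rail, the 5 A for everything else and the function names are placeholders for illustration, not figures from the WD datasheet or from the PSU in this thread.

```python
def spinup_12v_amps(n_drives: int, amps_per_drive: float) -> float:
    """Worst-case 12 V current if all drives spin up at the same moment."""
    return n_drives * amps_per_drive


def fits_rail(n_drives: int, amps_per_drive: float, rail_amps: float,
              other_load_amps: float = 0.0) -> bool:
    """True if simultaneous spin-up plus other 12 V load stays within the rail rating."""
    return spinup_12v_amps(n_drives, amps_per_drive) + other_load_amps <= rail_amps


# Assumed numbers only: 20 drives at 2 A each, a 38 A rail, 5 A for board and fans.
peak = spinup_12v_amps(20, 2.0)
print(peak, "A on 12 V, about", peak * 12, "W")
print("fits:", fits_rail(20, 2.0, rail_amps=38.0, other_load_amps=5.0))
```

With those assumed numbers the drives alone would ask for most of a 600W budget on one rail. Staggered spin-up lowers the peak, so this is a reason to check the PSU's 12 V rating and the drive datasheet, not proof of a fault.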
 