New Build, new drives have over a BILLION bad blocks

Status
Not open for further replies.

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
The only thing I would do at this point is send back all the new drives that will
not run a conveyance test. As for that SMART error, that's one I've never seen or
even heard of, so I'm of no more use to you, I'm afraid. A word of advice though:
don't plug any more drives into that machine before it has been gone over with a magnifying glass...
 

MrToddsFriends

Documentation Browser
Joined
Jan 12, 2015
Messages
1,338
Code:
root@APPA:~ # smartctl –t conveyance /dev/da0
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.0-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

ERROR: smartctl takes ONE device name as the final command-line argument.
You have provided 3 device names:
–t
conveyance
/dev/da0

Use smartctl -h to get a usage summary

I'm not sure how this error message comes about. If a drive doesn't support SMART conveyance tests, the error message would be:

Code:
root@blunzn:~ # smartctl -t conveyance /dev/ada5
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.0-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Conveyance Self-test functions not supported

Sending command: "Execute SMART Conveyance self-test routine immediately in off-line mode".
Command "Execute SMART Conveyance self-test routine immediately in off-line mode" failed: Input/output error


The output shown comes from a SanDisk X300s SSD. Apparently, helium-filled WD Reds do not support SMART conveyance tests either. According to the smartctl man page, conveyance tests can only be run on ATA drives. Rough guess: might backplanes get in the way here?
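
The two outputs above fail in different ways: in the first, smartctl rejects its own command line; in the second, the drive reports that it lacks the feature. As a minimal sketch, assuming the two message strings quoted above are stable across smartctl versions (not verified here), a small shell function can tell the cases apart. The name classify_smartctl_output is made up for illustration and is not part of smartmontools.

```shell
#!/bin/sh
# Hypothetical helper: classify captured smartctl output read from stdin.
# The matched strings are copied from the two outputs quoted in this thread.
classify_smartctl_output() {
    out=$(cat)
    case $out in
        # smartctl could not parse its arguments (nothing was sent to the drive)
        *"smartctl takes ONE device name"*) echo "usage-error" ;;
        # the drive was reached but does not offer a conveyance self-test
        *"Conveyance Self-test functions not supported"*) echo "unsupported" ;;
        # anything else needs a human to read it
        *) echo "other" ;;
    esac
}
```

It could be used as `smartctl -t conveyance /dev/ada5 2>&1 | classify_smartctl_output`. A "usage-error" result points at the typed command rather than at the drive, HBA, or backplane.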
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Take two that look OK and two of the worst ones, stick them in another box and run the tests again. Then compare results. Might give you a better understanding of where the issue really is?

Or swap the locations of good/bad drives and see if the issue follows the drives.
 

DGenerateKane

Explorer
Joined
Sep 4, 2014
Messages
95
Or swap the locations of good/bad drives and see if the issue follows the drives.
The issue does follow the drives. No conveyance or bad block tests are possible on any drive in any machine now. Before, they completed conveyance tests and at least tried to run bad block tests.

 

Evi Vanoost

Explorer
Joined
Aug 4, 2016
Messages
91
First thing to try would be to put the drives in a new machine on their own SATA bus and test them one by one.

It is entirely possible that one drive (or the backplane/controller itself) is acting up and sending out data that makes other drives appear corrupt.

I suppose those are new drives, so they shouldn't all be dead yet: one or two if you're unlucky, but not all of them. If the drives are secondhand, I would chuck them and get new ones.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Wow, what a thread. I agree with others who have suggested several times now that the drives should be tested in a different machine.

You cannot troubleshoot this problem using only the FreeNAS system you are having problems with. My initial thoughts are you have an HBA or Backplane issue, not a hard drive issue, unless you bought used hard drives and they are all in serious condition.

My guide is:
1) Get a computer or USB-to-SATA adapter so you can test the hard drives.
2) You can use Windows if you like; install Smartmontools for Windows and you can run smartctl commands against the drive.
3) Run a conveyance test if you can, or at a minimum a SMART long test, and then look at the results to ensure there are no failures.
4) If you have no failures, start looking at your FreeNAS hardware. I'd start by removing the HBA, connecting a drive directly to a motherboard SATA port, and running a test to see if the drives work this way. If they pass, move on to the HBA.
5) Connect the HBA and connect a drive directly to it, with no backplane. Run your tests. Does it pass? If yes, hook up the backplane.
6) Connect a few drives to the backplane and test again. Does it pass?
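
The per-drive test in step 3 can be sketched as a short command sequence. This is only a rough outline, assuming a POSIX shell; smart_plan is a hypothetical helper that prints the smartctl commands for one device instead of running them, because each self-test runs in the background on the drive and has to finish before the next one is started.

```shell
#!/bin/sh
# Hypothetical helper: print the smartctl commands to run, in order, for one drive.
# Nothing is executed; run each printed command by hand and wait for the
# self-test to complete before moving on to the next one.
smart_plan() {
    dev=$1
    echo "smartctl -c $dev"              # show capabilities, including supported self-tests
    echo "smartctl -t conveyance $dev"   # start the conveyance self-test, if supported
    echo "smartctl -t long $dev"         # start the long (extended) self-test
    echo "smartctl -l selftest $dev"     # read the self-test log once a test has finished
    echo "smartctl -a $dev"              # full SMART report for posting
}
```

For example, `smart_plan /dev/ada0` prints five commands for that device. Repeating the same plan at each stage (motherboard SATA port, HBA alone, HBA plus backplane) keeps the results comparable.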

I suspect you have issues with the HBA or Backplane, unless you bought bad hard drives.
 

DGenerateKane

Explorer
Joined
Sep 4, 2014
Messages
95
I already tried testing them in another machine; they won't run the tests there either. I'll try booting it into Windows and testing there. I doubt it is the drives, since it's affecting old drives and new ones. There is no way 8 new drives manufactured a month apart and 3 drives manufactured years apart all die at once unless the hardware they were connected to did it. Or software, I guess.

 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Could have been a power supply issue. But that is just a stab in the dark.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Could have been a power supply issue.
Power supply issue doesn't explain this:
root@APPA:~ # smartctl –t conveyance /dev/da0
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.0-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

ERROR: smartctl takes ONE device name as the final command-line argument.
You have provided 3 device names:
–t
conveyance
/dev/da0

Use smartctl -h to get a usage summary
 

DGenerateKane

Explorer
Joined
Sep 4, 2014
Messages
95
Well, if they were OK before, they sure aren't now. Last night, right before bed, I started creating a RaidZ1 pool. Less than 2 hours later I got a critical alert stating my pool was degraded because one drive faulted in response to persistent errors. So I'm shipping them all back, and I'll have to contact the rackmount seller and let them know they sent me a very expensive lemon. Ugh.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Power supply issue doesn't explain this:
root@APPA:~ # smartctl –t conveyance /dev/da0
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.0-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

ERROR: smartctl takes ONE device name as the final command-line argument.
You have provided 3 device names:
–t
conveyance
/dev/da0

Use smartctl -h to get a usage summary

Nope, it doesn't, but I work in a world where you cannot assume a single failure; I assume there are two or more at work.
So let's look at this for a second...

1) Was the command entered properly, using single spaces, etc.?
2) The next attempt should be smartctl -a /dev/da0 to see if it responds, because I don't recall reading whether this worked or not.
3) If the result is similar, stating you provided xx names, then try smartctl -a "/dev/da0", as sometimes the quotation marks will make a difference, although I don't know why they would here.
4) If this still fails, then I'd boot either an Ubuntu Live or FreeBSD Live DVD and try the commands there.
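
On item 1: in the pasted output the option appears as "–t" (an en dash, U+2013) rather than "-t" with an ASCII hyphen, and smartctl lists it among the device names. Whether that came from the original typing or from pasting into the forum cannot be told from the thread. A rough sketch, assuming a POSIX shell and grep, for flagging such characters in a command before running it; has_non_ascii is a hypothetical helper, not part of smartmontools.

```shell
#!/bin/sh
# Hypothetical helper: flag a command line that contains any byte outside
# printable ASCII (space through tilde), such as a UTF-8 en dash.
has_non_ascii() {
    # LC_ALL=C makes the bracket range byte-based rather than locale-based.
    if printf '%s' "$1" | LC_ALL=C grep -q '[^ -~]'; then
        echo "suspect"
    else
        echo "clean"
    fi
}

# Example: the first command uses an en dash (bytes E2 80 93), the second a hyphen.
has_non_ascii "$(printf 'smartctl \342\200\223t conveyance /dev/da0')"
has_non_ascii 'smartctl -t conveyance /dev/da0'
```

A "suspect" result means the command should be retyped by hand in the console rather than pasted from a word processor or web page.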

What I feel is wrong here are a few things, and I'm not trying to be mean here...
1) I still have no idea if the system has been tested properly (MemTest86 + CPU Burn-in)
2) There appears to be an issue with the HBA or Backplane.
3) Not being there myself, I don't understand how the drives are failing on a different computer system.


@DGenerateKane Here is something else to try: place those drives in a Windows machine, then partition and format them. Do they appear to work? Then, as previously stated, grab a copy of Smartmontools for Windows and try smartctl -a X: to see if you can get some results out of the drive. If you can, run smartctl -t long X: and wait until it's finished (10TB drives are going to take a while; ensure your computer does not sleep), then fetch the results with the first command. Post those results. Don't get greedy and try to do more than a few drives for testing; just pick two drives that you think would fail.

You can also just boot up Ubuntu Live and then run the drive tests from there.
 

DGenerateKane

Explorer
Joined
Sep 4, 2014
Messages
95
I already returned them. The rackmount seller is trying to blame me for using raidz1, and linked me to a reddit post and two other sites, which I didn't bother to read, stating raidz1 is bad and not to use it. For those who forgot or didn't read: I had already determined the drives had gone bad before creating a pool. I also never created a pool on the three drives that were known good before they touched the server. Those good drives are now showing multiple errors signaling pre-failure. Coincidence? Nope.

 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
The rackmount seller is trying to blame me for using raidz1,
Oh, ffs. The seller's an idiot. RAIDZ1 would mean you're at increased risk of data loss, but is going to be completely unrelated to the fact that every single disk you plug in is bad.
 

DGenerateKane

Explorer
Joined
Sep 4, 2014
Messages
95
Oh, ffs. The seller's an idiot. RAIDZ1 would mean you're at increased risk of data loss, but is going to be completely unrelated to the fact that every single disk you plug in is bad.
Exactly. After a few more messages back and forth, he decided to send me a drive to test with. So I guess I'll plug it in, do all the tests again, and see how it goes. I might run all the tests on another machine first, to verify nothing is already wrong with the drive. I can see him trying to claim it was a good drive prior to me handling it and still blaming me for the dead drives at this point.

 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
If this is a big-time seller, please post the name. I certainly do not want to purchase from them. You had a lot of bad drives; it is unrealistic to have that many. If you are buying used drives then I'd say you get what you pay for. That RAIDZ1 comment is laughable.
 

DGenerateKane

Explorer
Joined
Sep 4, 2014
Messages
95
It's mrrackables, a supposedly reputable seller. The replacement drive he sent was a SAS drive, so I couldn't test it on another machine. I was able to start the badblocks test this time, but the console spammed me with status errors regarding that drive. So he is now sending me a new HBA card. Hopefully that fixes it. I don't expect it until next week though unfortunately.

By the way, these are the links he sent me about not using RaidZ1.
https://serverfault.com/questions/634197/zfs-is-raidz-1-really-that-bad
http://jrs-s.net/2015/02/06/zfs-you-should-use-mirror-vdevs-not-raidz/
https://www.reddit.com/r/freenas/comments/1ubebu/stop_using_raidz_seriously_just_stop_it/
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
It's mrrackables, a supposedly reputable seller. The replacement drive he sent was a SAS drive, so I couldn't test it on another machine. I was able to start the badblocks test this time, but the console spammed me with status errors regarding that drive. So he is now sending me a new HBA card. Hopefully that fixes it. I don't expect it until next week though unfortunately.

By the way, these are the links he sent me about not using RaidZ1.
https://serverfault.com/questions/634197/zfs-is-raidz-1-really-that-bad
http://jrs-s.net/2015/02/06/zfs-you-should-use-mirror-vdevs-not-raidz/
https://www.reddit.com/r/freenas/comments/1ubebu/stop_using_raidz_seriously_just_stop_it/
He's correct that RAIDZ1 pools are more vulnerable to failure... but using drives in a RAIDZ1 configuration is not going to destroy them! That's just crazy talk!
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
Assuming the vendor isn't making a big deal about raidz1, I wouldn't hold the raidz1 stuff against the vendor; expecting them to know the details of raidz1 is unreasonable. Nobody knows everything. And when they don't know, they use rules of thumb or misapply what they read elsewhere.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Yes. RaidZ1 does not cause badblocks tests to fail or physical failure, it just means you have very little chance of recovering from said events :)

Sounds like he is doing the right thing sending you parts to isolate.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
expecting them to know the details of raidz1 is unreasonable.
True as far as it goes, but to suggest that the RAID level had anything to do with the issues reported by OP is, to be kind, just plain silly. The seller might not know anything about ZFS, but should have some basic knowledge of RAID.
 