Zpool Migration Nightmare

Status
Not open for further replies.

0x4161726f6e

Dabbler
Joined
Jul 3, 2016
Messages
19
Wanting more storage (and not needing IOPS), I migrated from a RAID 1+0 zpool with 6 HDDs to a RAIDZ2 with 7 HDDs. About a week later, while I was on vacation, 2 HDDs came up as failed and all drives showed read and write errors. When I connected over VPN, the system told me something along the lines of not being able to communicate with the drives. In a panic I shut down and prayed to the storage gods that everything was fine. The shutdown had to be forced.

I got home from vacation and booted the server up; all drives registered. There were some checksum errors, but all read/write error counts were 0. I ran a scrub: 2 HDDs had checksum errors and were marked “failed”. I noted which drives (da2 & da4), cleared the errors, and scrubbed again; everything was clean, with 0 read/write errors. Maybe it was a false positive, so I trusted the drives to be OK.

The next day (about 12-14 hours later), a third HDD (da3) showed 9 read, 96 write, and 0 checksum errors and was marked “failed”. Maybe the drives are OK but something else is wrong. I tried turning off spindown by setting advanced power management to 128 for all drives, even though this was never a problem before. I scrubbed again because I couldn't do much remotely; it came back clean. But almost immediately the same HDD failed with 3 read errors.

TL;DR: 3 out of 5 new drives are showing as failed, and the server was rock solid before I changed/added drives and changed the zpool configuration.

System:
FreeNAS-11.1-U4
Intel Xeon E3-1220 v3 @ 3.1GHz
ASRock E3C224D4I-14S
32 GB DDR3

Old Zpool: stripe across 3 mirrors
Mirror of 2x 2 TB WD Reds
Mirror of 2x 3 TB WD Reds
Mirror of 2x 3 TB WD Greens

New Zpool:
RAIDZ2 with 5x 4 TB WD Reds (all new drives) and 2x 3 TB WD Reds from the old pool

All of the failures noted so far are on new drives.

Are my new drives failing or is something else wrong? What should I be checking?
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Can we see a complete
Code:
smartctl -x -qnoserial /dev/da3
(please use that command exactly), posted to, say, pastebin?

Also, that board has at least 4 regular SATA ports on it. You should be using those ports before you attempt to use the other ports. One test you might perform is to move the drive(s) in question to the other set of ports and see whether the problem follows the drives or not.
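
For the port-swap test it helps to know which physical drive sits behind each device name before and after the move, since da numbers can change when drives change ports. A minimal sketch, assuming smartctl is installed and run as root; the helper names serial_of and map_drives are made up for illustration. The serials are for your own notes, which is separate from the -qnoserial output meant for posting.

```shell
#!/bin/sh
# Extract the serial number from `smartctl -i` output read on stdin.
serial_of() {
    awk -F': *' '/^Serial Number:/ { print $2; exit }'
}

# Print "device serial" for each device name given (e.g. da0 da1 ...).
# Hypothetical helper: needs smartctl and root, and is only defined here,
# not run automatically.
map_drives() {
    for dev in "$@"; do
        printf '%s %s\n' "$dev" "$(smartctl -i "/dev/$dev" | serial_of)"
    done
}
```

Something like `map_drives da0 da1 da2 da3 da4 da5 da6 > before.txt`, repeated into a second file after the move, gives two lists to compare: if the errors stay with a serial number, suspect the drive; if they stay with a port, suspect the port, cable, or controller.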
 

0x4161726f6e

Dabbler
Joined
Jul 3, 2016
Messages
19
That provided a lot of information. I have attached the output as a text file.

Should I be leaving the system up or shutting down when not troubleshooting?

Thanks
 

Attachments

  • smartctl_da3.txt
    12.7 KB · Views: 235

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
You added a lot of disks to your system (7, or is it 5?... I guess in parallel while migrating the pool data) before the problem started... have you considered power supply problems? (I don't see your PSU listed above)
 

0x4161726f6e

Dabbler
Joined
Jul 3, 2016
Messages
19
My power supply is a "SILVERSTONE ST45SF 450W SFX12V 80 PLUS BRONZE Certified Active PFC". Currently I only have seven 3.5" HDDs and two 2.5" SSDs. During the migration I had twelve 3.5" HDDs and two 2.5" SSDs, and didn't seem to have any problems. I used http://www.coolermaster.com/power-supply-calculator/ to check whether I had enough power for the migration; it estimated about 325W.

I've had that problem before so good point.
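
A calculator total like the one above may not capture the short surge on the 12 V rail when all drives spin up at once, so a rough separate check can be useful. A back-of-the-envelope sketch: the function name spinup_watts is hypothetical, and the 1.75 A per-drive figure used in the example is an assumed placeholder, not a measured value; take the real peak current from each drive's datasheet and compare the result against the PSU's 12 V rail rating.

```shell
#!/bin/sh
# spinup_watts COUNT AMPS_12V
# Peak 12 V power (watts) if COUNT drives spin up simultaneously,
# each drawing AMPS_12V at 12 V. The per-drive current is an input the
# caller must supply from the datasheet; nothing here is measured.
spinup_watts() {
    awk -v n="$1" -v a="$2" 'BEGIN { printf "%.1f\n", n * a * 12 }'
}
```

For example, `spinup_watts 7 1.75` prints 147.0 and `spinup_watts 12 1.75` prints 252.0. That covers the drives alone; the CPU, board, and SSDs come on top.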
 

0x4161726f6e

Dabbler
Joined
Jul 3, 2016
Messages
19
Update:
4 HDDs have disconnected; da0, da1, and da5 are the only HDDs still connected. I guess I'll be restoring from backup. da0 and da1 are from the old pool, so 4 out of 5 new disks have failed. Maybe I should blame FedEx?
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
For the record,

That drive you provided the smartctl information for, by all appearances, looks outstanding and in perfect shape with no errors.

I don't think there's anything wrong with the drive, sir.
 

0x4161726f6e

Dabbler
Joined
Jul 3, 2016
Messages
19
DrKK,

Thank you, I was thinking the same but I wasn't fully confident that I was understanding what I was reading.

Update:
Rebooted my NAS to find the pool functional, so I have updated my backup.

I'll try moving drives to other ports. If anyone has ideas on what to test, let me know. Maybe I have damaged my SATA cables somehow?
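
When moving drives between ports or swapping cables, it can help to record which devices show nonzero error counters after each change. A small sketch that filters `zpool status` output; the function name pool_errors and the pool name tank in the usage note are hypothetical, and the parsing assumes the usual NAME STATE READ WRITE CKSUM column layout.

```shell
#!/bin/sh
# Read `zpool status` output on stdin and print every device line with a
# nonzero READ, WRITE or CKSUM counter as "name read write cksum".
# Counters may be abbreviated by zpool (e.g. 1.2K), so those are accepted too.
pool_errors() {
    awk 'NF >= 5 &&
         $3 ~ /^[0-9.]+[KMGT]?$/ &&
         $4 ~ /^[0-9.]+[KMGT]?$/ &&
         $5 ~ /^[0-9.]+[KMGT]?$/ &&
         ($3 != "0" || $4 != "0" || $5 != "0") { print $1, $3, $4, $5 }'
}
```

Run as `zpool status tank | pool_errors` after each scrub; an empty result means no device is reporting errors at that moment.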
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
That's very possible. We get a lot of people with loose cables, cables that get squashed by the case panel, loose SATA power cables, etc.

Or cheap cables they bought for 40 cents.
 