Zpool Migration Nightmare

0x4161726f6e · Jun 19, 2018

Wanting more storage (and not needing IOPS), I migrated from a Zpool RAID 1+0 (three striped mirrors, 6 HDDs) to a RAIDZ2 with 7 HDDs. About a week later, while I was on vacation, 2 HDDs came up as failed and all drives showed read and write errors. When I connected over VPN, the system reported something along the lines of not being able to communicate with the drives. In a panic, I shut the server down and prayed to the storage gods that everything was fine. The shutdown had to be forced.

When I got home, I booted the server and all drives registered. There were some checksum errors, but all read/write error counts were 0. I ran a scrub: 2 HDDs had checksum errors and were marked “failed”. I noted which drives (da2 & da4), cleared the errors, and scrubbed again; everything came back clean with 0 read/write errors. Maybe it was a false positive, so I trusted the drives to be OK.

The next day (about 12-14 hours later), a third HDD (da3) showed 9 read, 96 write, and 0 checksum errors and was marked “failed”. Maybe the drives are OK and something else is wrong. I tried turning off spindown under advanced power management (APM set to 128) for all drives, even though this had never been a problem before. I scrubbed again, since I can’t do much remotely, and it came back clean. But the same HDD immediately failed again with 3 read errors.

TLDR: 3 out of 5 new drives are failing, and the server was rock solid before I changed/added drives and changed the Zpool configuration.

System:
FreeNAS-11.1-U4
Intel Xeon E3-1220 v3 @ 3.1GHz
ASRock E3C224D4I-14S
32 GB DDR3

Old Zpool: stripe across 3 mirrors
Mirror of two 2 TB WD Reds
Mirror of two 3 TB WD Reds
Mirror of two 3 TB WD Greens

New Zpool:
RAIDZ2 with five 4 TB WD Reds (all new drives) and two 3 TB WD Reds from the old pool

Of the noted failures, only new drives have failed.

Are my new drives failing or is something else wrong? What should I be checking?

DrKK · Jun 19, 2018

Can we see a complete

Code:

smartctl -x -qnoserial /dev/da3

(please use that command exactly), posted to, say, pastebin?

Also, that board has at least 4 regular SATA ports on it. You should be using those ports before you attempt to use the other ports. One test you might perform is to move the drive(s) in question to the other set of ports and see whether the problem follows the drives.

0x4161726f6e · Jun 20, 2018

That provided a lot of information. I have attached the output as a text file.

Should I be leaving the system up or shutting down when not troubleshooting?

Thanks

sretalla · Jun 20, 2018

You added a lot of disks to your system (7, or is it 5? I guess in parallel while migrating the pool data) before the problem started. Have you considered power supply problems? (I don't see your PSU listed above.)

0x4161726f6e · Jun 20, 2018

My power supply is a "SILVERSTONE ST45SF 450W SFX12V 80 PLUS BRONZE Certified Active PFC". Currently I only have seven 3.5" HDDs and two 2.5" SSDs. During the migration I had twelve 3.5" HDDs and two 2.5" SSDs, and didn't seem to have any problems. I used http://www.coolermaster.com/power-supply-calculator/ to check whether I had enough power for the migration; it estimated about 325W.

I've had that problem before, so good point.

0x4161726f6e · Jun 20, 2018

Update:
4 HDDs disconnected; da0, da1, and da5 were the only HDDs still connected. I guess I'll be restoring from backup. da0 and da1 were from the old pool, so 4 out of 5 new disks failed. Maybe I should blame FedEx?

DrKK · Jun 20, 2018

For the record,

By all appearances, the drive you provided the smartctl information for looks outstanding and in perfect shape, with no errors.

I don't think there's anything wrong with the drive, sir.

0x4161726f6e · Jun 20, 2018

DrKK,

Thank you, I was thinking the same but I wasn't fully confident that I was understanding what I was reading.

Update:
I rebooted my NAS and found the pool functional, so I have updated my backup.

I'll try moving drives to other ports. If anyone has ideas on what to test, let me know. Maybe I have damaged my SATA cables somehow?

DrKK · Jun 20, 2018

That's very possible. We get a lot of people with loose cables, cables that get squashed by the case panel, or loose SATA power cables, etc.

Or cheap cables they bought for 40 cents.
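One way to test the cable theory without opening the case: SMART attribute 199 (UDMA_CRC_Error_Count) counts CRC errors on the link between the drive and the controller, and a count that keeps climbing usually points at a cable, backplane, or port rather than the drive itself. The sketch below is a minimal example, not a definitive diagnostic; it assumes the pool disks are named da0 through da6 (check `zpool status` and adjust), that the drives report attribute 199 under that name, and that it is saved to a file and run with `sh` as root.

```shell
#!/bin/sh
# Print the raw value of SMART attribute 199 (UDMA_CRC_Error_Count)
# from `smartctl -A` output read on stdin; prints nothing if absent.
crc_count() {
    awk '$1 == "199" && $2 == "UDMA_CRC_Error_Count" { print $10 }'
}

# Assumed device names; adjust to match the disks in your pool.
for dev in da0 da1 da2 da3 da4 da5 da6; do
    # Skip device nodes that do not exist on this system.
    [ -e "/dev/$dev" ] || continue
    count=$(smartctl -A "/dev/$dev" | crc_count)
    # Show "n/a" when the drive does not report attribute 199.
    printf '%s: %s\n' "$dev" "${count:-n/a}"
done
```

Running it a few times over a day and comparing the numbers is the useful part: if the counts on da2, da3, and da4 grow while the others stay flat, that would support a cable or port problem over bad drives.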
