Disks dropping from array during scrub

rs225 · Nov 25, 2014

cyberjock said:
It's not bizarre at all. When multiple drives keep dropping out of the pool at random, with no chance to reconcile them with a zpool scrub, you *will* get corruption. There's nothing shocking about it. In fact, when I saw his first output I figured the pool would be done for; the only question was whether it would even mount or not. :p

I also thought it would be wiped out. But seriously, I do think it is bizarre. Look at the math: He had to get 6 out of 12 drives to, in that (proximate) instant, write corrupt data for that particular metadata block. Remember, metadata is written twice, and with two vdevs, one copy goes to each vdev. His metadata corruption is absolutely astonishing, and to still have a semi-functional system? Amazing.

That's why I tend toward running the memory test on ECC RAM. It's more believable!

cyberjock · Nov 25, 2014

rs225 said:
I also thought it would be wiped out. But seriously, I do think it is bizarre. Look at the math: He had to get 6 out of 12 drives to, in that (proximate) instant, write corrupt data for that particular metadata block. Remember, metadata is written twice, and with two vdevs, one copy goes to each vdev. His metadata corruption is absolutely astonishing, and to still have a semi-functional system? Amazing.

That's why I tend toward running the memory test on ECC RAM. It's more believable!

The underlined is my emphasis. That's not completely true. All he needed was for the disks to get out of sync. Remember, ZFS writes data to the drive (which technically goes to the hard drive's on-disk cache). ZFS would normally also issue a command to the drive to flush the drive's write cache. BUT, this all falls apart when the disk receives the data but ZFS' flush command doesn't get performed because the disk suddenly goes offline. ZFS also *will* initiate the flush command to the other drives (which may or may not be attached still).

Next thing you know, the different drives in the vdevs (and the pool) aren't in sync, and ZFS is looking for copies that aren't bad wherever it can. But once it can't get good copies, because it can't figure out whether the allegedly newer transactions are good or bad, things get ugly.

You haven't been a member of the forums very long, rs225, but we've had users whose out-of-sync pools were corrected by pulling out one or two bad disks (obviously you can't pull more disks than the vdev has redundancy for).

It happens. We know it happens. It's pretty rare for people to have problems, but it's not unheard of either. ;)

SirMaster · Nov 26, 2014

What firmware version do you have on your RES2SV240?

I had some SATA signal issues with mine until I updated it to the latest v13 firmware. After that it has been working perfectly with my LSI HBA.

agreenfi · Dec 21, 2014

Figured I should give an update since everything has been working fine for a few weeks. I put in a new power supply and reconnected all the cables, and I haven't had any more problems with disks disappearing. Other thoughts:
- I did upgrade the firmware of my M1015 to v16 (it was nice of FreeNAS 9.3 to warn me about this, as I was using an older version)
- All problems with scrub not completing went away after deleting the iSCSI file extents. I think these files are more susceptible to corruption due to not using sync writes?
- I didn't end up deleting the pool. I deleted a corrupted file, then ran 'zpool clear' to reset the error counters. No new errors over past three weeks.
- Keep an eye on the disks selected for SMART testing. If a drive drops out or gets replaced, you will have to re-select it in the FreeNAS GUI.
- If I have problems in the future, I should take the pool offline immediately, then import it as read-only if necessary for diagnostics. Is this okay to do from the shell?
- It seems like FreeNAS stores system logs on the pool, and so if the pool becomes inaccessible you may not have access to needed diagnostic information. Am I understanding this correctly, and is there some other best-practice?
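For reference, the export and read-only import described in the list above could look roughly like this from a shell. This is only a sketch under assumptions: `tank` is a placeholder pool name, and `run`, `readonly_diagnostics`, and the `DRY_RUN` switch are made-up helpers for illustration. The FreeNAS GUI also tracks pool state, so this does not settle whether doing it from the shell is advisable on FreeNAS. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: take a suspect pool offline, then re-import it read-only for diagnostics.
# Assumptions: "tank" is a placeholder pool name; DRY_RUN=1 (the default) only
# prints the commands, and DRY_RUN=0 actually executes them.

POOL="${POOL:-tank}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Show the command, and execute it only when dry-run is switched off.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

readonly_diagnostics() {
    pool="$1"
    run zpool export "$pool"                  # detach the pool so nothing writes to it
    run zpool import -o readonly=on "$pool"   # bring it back without allowing writes
    run zpool status -v "$pool"               # device state plus any files with errors
}

readonly_diagnostics "$POOL"
```

Running it as-is prints the three commands for review; setting `DRY_RUN=0 POOL=yourpool` would execute them against a real pool.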
