What happens if a mirrored pair dies

wgriffa

Cadet
Joined
Aug 21, 2021
Messages
4
Hi all,

I am running TrueNAS. I have one pool made up of 4 mirrored pairs of drives:

2 - 10 TB
2 - 8 TB
2 - 4 TB
2 - 2 TB

One of the drives in my 2 TB pair went bad and TrueNAS removed it, so I was running in degraded mode while I waited for a drive deal.
Today I added the new drive, and now the good half of the 2 TB mirror is having errors. After 30 minutes the boot process is still trying to complete.
Right now my 2 TB mirror consists of one drive with lots of errors and a brand new drive.

On my last boot, the system told me my pool is offline.

I am ok with losing the data on the 2 TB mirrored pair. I am hoping there is a way to still access the data on the remaining 3 mirror pairs?

I am a little scared at the moment.

Thanks in advance :)
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
I hope you have backups of your data...

A pool is made up of one or more vdevs; if you lose a single vdev, you lose your pool. That hurts, I know!

In your case, your pool is made up of 4 vdevs, each a mirrored pair. You've lost the 2TB vdev, and so you've lost your pool.

Your only option at this point is to re-create the pool and restore from backup.
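
To make the "lose one vdev, lose the pool" rule concrete, here is a toy Python model. It is only a sketch of the rule as described above, not of how ZFS is implemented, and the names `MirrorVdev` and `pool_is_available` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class MirrorVdev:
    name: str
    healthy_disks: int  # how many member disks still work (hypothetical field)

    def is_available(self) -> bool:
        # A mirror keeps working as long as at least one member disk survives.
        return self.healthy_disks >= 1


def pool_is_available(vdevs: list[MirrorVdev]) -> bool:
    # Data is spread across all vdevs, so every single vdev must be available.
    return all(v.is_available() for v in vdevs)


pool = [
    MirrorVdev("10TB mirror", 2),
    MirrorVdev("8TB mirror", 2),
    MirrorVdev("4TB mirror", 2),
    MirrorVdev("2TB mirror", 1),  # degraded: only one disk left
]

degraded_ok = pool_is_available(pool)           # still running, no redundancy
pool[3].healthy_disks = 0                       # the last 2TB disk dies
after_second_failure = pool_is_available(pool)  # whole pool is gone

print(degraded_ok, after_second_failure)
```

The three healthy mirrors do not help here: one unavailable vdev is enough to take the whole pool down.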
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
This is the problem with 2-disk mirrored vDevs: you only have 1 disk's worth of redundancy.

While mirrored vDevs can be more flexible, allowing you to add 2 disks at a time to grow your pool, most people don't use 3-way mirrors for better redundancy.

When I was building my FreeNAS Mini, which has 4 x 3.5" disk slots, I more or less had to choose between two 2-disk mirrors or a 4-disk RAID-Z2. Because the RAID-Z2 would allow me to lose any 2 disks, I decided to go with the higher redundancy.


Lessons:
1. Always have backups (unless the NAS is the backup device).
2. Try to have a burned-in replacement disk available, even if it is larger (which works, and you can still swap it back out for a smaller one later).

All that said, newer OpenZFS allows importing damaged pools to allow attempted recovery of some data. Results are wildly unpredictable.
 

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401
I was waiting for a drive deal.

That was the problem... A degraded 2-disk mirror is held up by only a single drive. And when a vdev is held up by a single drive, the entire pool is held up by that single drive. Now that it failed, the entire pool failed.

When designing a system, it is important to understand its capacities and limitations. You underestimated the limitations of 2-drive mirrors and now face the consequences. Here, I do use 2-drive mirrors (and even a Raid-Z1, which is even worse) but with spares on site and in a setup with 3 copies on 3 different servers...
 

Electr0

Dabbler
Joined
Dec 18, 2020
Messages
47
@Arwen From what I understand, mirrored vdevs are generally recommended over RAID-Z2*, due to the drastically shorter time to resilver and the drastically lower load placed on the pool while doing so. The theory is that RAID-Z2 guarantees survival of any two disk failures, while a pool of mirrors only has an 85.7% chance of surviving a second disk failure (6 in 7, for an 8-disk pool of four mirrors like this one). On the other hand, resilvering a RAID-Z2 is usually measured in days, whereas resilvering a mirror can take just a few hours. The longer the resilver time, the more risk of another failure you are exposing your system to.

*Provided that you make use of hot spares.
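
For anyone wondering where the 85.7% comes from, here is a small Python sketch of the arithmetic. It assumes every remaining disk is equally likely to be the next one to fail and that the second failure hits before the first mirror has been resilvered; the function name is just made up for this example.

```python
from fractions import Fraction


def second_failure_survival(mirror_pairs: int) -> Fraction:
    """Chance that a second random disk failure does NOT kill a pool of 2-way mirrors."""
    if mirror_pairs < 1:
        raise ValueError("need at least one mirror pair")
    remaining = 2 * mirror_pairs - 1  # disks still running after the first failure
    fatal = 1                         # only the failed disk's partner is fatal
    return Fraction(remaining - fatal, remaining)


four_pairs = second_failure_survival(4)  # 8 disks, like the pool in this thread
two_pairs = second_failure_survival(2)   # 4 disks

print(f"{float(four_pairs):.1%}")  # 85.7%
print(f"{float(two_pairs):.1%}")   # 66.7%
```

So the figure depends on how many mirrors are in the pool: the fewer pairs you have, the more likely the second failure lands on the one disk that matters.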

But yeah, you have to do what makes sense for your own use case.

@wgriffa You may have a tiny sliver of luck as, since the 2TB drive has failed, most of your data will be on the other drives, so you might be able to import the damaged pool and repair it like @Arwen said, but I honestly have no idea. Check out this page on Repairing ZFS Storage Pool-Wide Damage, it may be of some help, or this ZFS Troubleshooting and Data Recovery guide from Illumos.
 

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401
Hey @Electr0,

mirrored vdevs are generally recommended over RAID-Z2

That recommendation is valid when all the conditions are in place. Being able to survive the loss of any 2 drives is greater protection than a 2-in-3 chance of surviving the loss of a second drive, and that is what mirrors give you with 4 drives. If you have more drives, RaidZ2 will also provide you with more usable space.
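
A rough way to see the usable-space point, sketched in Python. This assumes equal-size drives in a single vdev layout and ignores ZFS overhead such as padding and metadata, so treat the numbers as approximate; the function names are made up for illustration.

```python
def usable_disks_mirrors(n_disks: int) -> int:
    # 2-way mirrors: every pair stores one disk's worth of data.
    if n_disks < 2 or n_disks % 2:
        raise ValueError("2-way mirrors need an even number of disks")
    return n_disks // 2


def usable_disks_raidz2(n_disks: int) -> int:
    # RAID-Z2: two disks' worth of space goes to parity.
    if n_disks <= 2:
        raise ValueError("need more than two disks to leave room for data")
    return n_disks - 2


for n in (4, 6, 8):
    print(n, usable_disks_mirrors(n), usable_disks_raidz2(n))
```

With 4 drives both layouts give 2 drives' worth of space; at 6 drives it is 3 versus 4, and at 8 drives it is 4 versus 6.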

Can you fit enough drives? Then you are probably better off with mirrors.
Do you have a complete backup solution or only a partial one? With good and strong backups, the higher risk for mirrors is mitigated.
Do you need more performance for VMs or similar? Then mirrors are the way to go.
Do you need a scalable solution? Again, mirrors will be easier to add to or to auto-expand.
And more...

But if you cannot fit more than 4 drives, your backups are not updated at a rate that matches how fast your data changes, and performance is not a major factor for your setup, then RaidZ2's higher survivability may be a plus for you. With only 4 drives, scalability is about the same for the two.
 

Electr0

Dabbler
Joined
Dec 18, 2020
Messages
47
Can you fit enough drives? Then you are probably better off with mirrors.
Do you have a complete backup solution or only a partial one? With good and strong backups, the higher risk for mirrors is mitigated.
Do you need more performance for VMs or similar? Then mirrors are the way to go.
Do you need a scalable solution? Again, mirrors will be easier to add to or to auto-expand.
And more...

But if you cannot fit more than 4 drives, your backups are not updated at a rate that matches how fast your data changes, and performance is not a major factor for your setup, then RaidZ2's higher survivability may be a plus for you. With only 4 drives, scalability is about the same for the two.


But yeah, you have to do what makes sense for your own use case.


:wink:
 

wgriffa

Cadet
Joined
Aug 21, 2021
Messages
4
Hi, thank you all for your replies. It was a bit of user error. I pulled out the good half of the mirror and initially replaced it with a Seagate IronWolf drive that I later found was DOA. I put the drives back the way they were, and the pool was back in degraded mode. I did a full backup and then added another Seagate IronWolf drive, a working one this time. The system resilvered in about 4 1/2 hours. My pool no longer has errors. I applied the latest TrueNAS system update and all is well :) In hindsight, I could have done the backup after the resilver. Now I am thinking about switching to TrueNAS SCALE.
 
Joined
Oct 22, 2019
Messages
3,641
Today I added the new drive, and now the good half of the 2 TB mirror is having errors. After 30 minutes the boot process is still trying to complete.
Right now my 2 TB mirror consists of one drive with lots of errors and a brand new drive.
I was about to reply to your original post (until I read your update) to ask if you're sure you didn't mix up your drives. :wink:

I pulled out the good part of the mirror


---

All hope is not lost, even if things seem grim: user errors (or even software not presenting truthful information) are more common than we realize.

Look at this recent scare I had when trying to use Syncoid from a PC to my TrueNAS server:


Googling for similar errors shows forum posts telling the user that their pool is gone, that they've lost their data, and that they must restore from backups. Yet in my case, after a single scrub, my pool was back in order with no errors. The hardware is fine, SMART tests pass, and the "error" was actually a glitch rather than real data corruption.

It's like that joke about WebMD when you're trying to search the causes for recent symptoms, maybe an itch behind your ear or a sore elbow, and of course you're met with grim doomsday responses. :eek:
 