Redundancy mismatch after applying the latest update

Shigure

Dabbler
Joined
Sep 1, 2022
Messages
39
Applied the update yesterday, and two things popped up for one of my pools.

One is a failed SMART test at lifetime hour 11 (aborted due to a restart). From some searching, it seems it will eventually be flushed out by newer records as long as I keep using the drives for long enough; they were brand new when I put them into the box.
The other is the redundancy mismatch notice in the topology view. That pool has a 5-wide RAIDZ2 data vdev and a 3-wide mirror metadata vdev (plus 1 spare there just in case, since I have an identical drive lying around). The RAID types are different, but I think both vdevs can take two simultaneous drive failures without losing data, and the layout was approved here by others as well when I put everything together. (screenshots below)
[Screenshots: pool topology showing the Redundancy Mismatch notice]
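
Just to spell out the layout in plain zpool syntax (purely illustrative; the pool and disk names here are placeholders):

    # 5-wide RAIDZ2 data vdev + 3-way mirror special (metadata) vdev + 1 hot spare
    zpool create tank \
        raidz2 sda sdb sdc sdd sde \
        special mirror sdf sdg sdh \
        spare sdi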


Is there a way to clear those two... changes? They aren't really issues. If not, I think I can live with them anyway, as long as I know my pool is safe...
 

flammen

Dabbler
Joined
Oct 16, 2022
Messages
20
Hi, I got the same "Redundancy Mismatch" error with a similar pool configuration after the update to 22.12.1:
6 data drives in a RAIDZ2 + 3 metadata drives in a 3-way mirror
Since both vdevs (data and metadata) can tolerate 2 drive failures before data is lost, I would not expect such a message. If someone knows what it is about, I would love to hear it.
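
In case it helps whoever looks into it, the exact layout and per-vdev sizes can also be pulled from the shell like this (the pool name is just an example):

    # full topology, including the special (metadata) mirror
    zpool status -v tank
    # capacity and usage broken out per vdev
    zpool list -v tank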
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I think there's a bug in the webUI validation of VDEV redundancy across VDEV types. @Shigure or @flammen - are you able to use the "Report a Bug" function at the top of the forums to log this issue, and attach a debug from your systems?
 

Shigure

Dabbler
Joined
Sep 1, 2022
Messages
39
I think there's a bug in the webUI validation of VDEV redundancy across VDEV types. @Shigure or @flammen - are you able to use the "Report a Bug" function at the top of the forums to log this issue, and attach a debug from your systems?
Sure, besides the screenshots I have, do you need anything else like logs etc.?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Sure, besides the screenshots I have, do you need anything else like logs etc.?

A debug (System Settings, Advanced, "Save Debug" in the top-right corner) would be useful to attach as well. You can attach it privately to the ticket after the initial submission, so that it's only visible to iX employees.
 

Shigure

Dabbler
Joined
Sep 1, 2022
Messages
39

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112

Shigure

Dabbler
Joined
Sep 1, 2022
Messages
39
Much appreciated; thanks for bringing this to our attention.
NP and thank you guys for the great work as well.

BTW, any thoughts on the SMART test failure issue I briefly mentioned here? On the Discord server, it seems several users have the same problem of TrueNAS showing a failed historical SMART test for various reasons (usually aborted) after applying the update.
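
If anyone wants to compare from the shell, checking the self-test history and kicking off a fresh test looks something like this (replace sda with your own drive; the log only keeps a limited number of entries, so old results do roll off eventually):

    # show the SMART self-test history, including any aborted entries
    smartctl -l selftest /dev/sda
    # start a new extended (long) self-test; the result lands in the same log
    smartctl -t long /dev/sda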
 

devemia

Dabbler
Joined
Mar 5, 2023
Messages
20
Same here; my test config is a 4-drive RAIDZ1 with 2 metadata drives mirrored.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Joined
Oct 22, 2019
Messages
3,641
Unless there's a third metadata device in that vdev, that actually would be a mismatch - 2 disk failure tolerance in the data RAIDZ2, but only single disk failure in a mirror.
Does SCALE allow you to dismiss this warning? I would find it annoying, since if it was my system I would be aware that I cannot lose more than one device in my two-way mirror metadata vdev.

We're dealing with two different things here: the RAIDZ2 vdev is used for userdata stored on spinning HDDs, where the requirements for "available storage capacity" are different. That's not as true for a metadata vdev, where a two-way mirror will more than suffice for the total capacity needed.

Hence, RAIDZ2 for one thing, and mirror for the other. Two different requirements. (Yes, the entire pool's health cannot afford the loss of more than one device from the metadata vdev. But this would still be true if the storage (userdata) vdev was built with a RAIDZ1 or two-way mirror vdev. The risk is the same. Choosing RAIDZ2 for the storage may be for more than simply "two drive failure resiliency", since total usable capacity is also an important factor.)

Please tell me that this warning can be dismissed? :oops:
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
As HoneyBadger said, your layout is actually a mismatch, lol. Not sure if the actual problem got fixed in 22.12.2 though; I haven't had a chance to give it a try.
I'm afraid the PR didn't make the cutoff for .2 - there were some other related fixes that ended up in there.
Does SCALE allow you to dismiss this warning? I would find it annoying, since if it was my system I would be aware that I cannot lose more than one device in my two-way mirror metadata vdev.
No; it won't be raised as a failure at the dashboard level though. If one of the drives fails in your mirror, you will see that; it's an informational warning to let you know about the mismatch.
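
If you want to double-check what's actually being raised as an alert versus what's purely informational, the middleware client on SCALE should list the active alerts from the shell with something like:

    # list currently active alerts via the TrueNAS middleware client (jq is only for readability)
    midclt call alert.list | jq .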

(Yes, the entire pool's health cannot afford the loss of more than one device from the metadata vdev. But this would still be true if the storage (userdata) vdev was built with a RAIDZ1 or two-way mirror vdev. The risk is the same. Choosing RAIDZ2 for the storage may be for more than simply "two drive failure resiliency", since total usable capacity is also an important factor.)
The rhetorical question here is "if you're okay with only single-drive fault tolerance, why isn't your main data vdev a RAIDZ1?" It gives you even more usable capacity than a RAIDZ2 and can still survive a single-disk failure.
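
Just to put rough numbers on the capacity side of that tradeoff (the drive count and size here are hypothetical, and this ignores ZFS overhead and padding):

    # approximate usable capacity for six 10 TB drives
    echo "RAIDZ1: $(( (6 - 1) * 10 )) TB usable"   # ~50 TB
    echo "RAIDZ2: $(( (6 - 2) * 10 )) TB usable"   # ~40 TB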

I do get that SSDs tend to be longer-lived than HDDs, on account of "no moving parts" and other factors - but space efficiency and total capacity matters not if the pool status is UNAVAIL: insufficient replicas. I'm certainly not looking to shut down debate here though; there's always room for another PR and I can raise with engineering if we think there's cause to have a "Suppress this warning" checkbox. Can't make any promises of course.
 
Joined
Oct 22, 2019
Messages
3,641
it won't be raised as a failure at the dashboard level though.
I can raise with engineering if we think there's cause to have a "Suppress this warning" checkbox.

I don't use SCALE, but I'm glad this isn't a warning that's shoved in your face in the Dashboard or Notifications. However, it can still come off as annoying when you visit your pool's page. (If that's what I'm seeing in the screenshots.)



But there's another problem already evident with this current approach, something that my eyes, as an "end user", have caught and that might go unnoticed by an "engineer". :wink:

I'll demonstrate this issue in the GUI with a few simple questions:

Based on the screenshots provided for Tank's topology overview:

[Screenshots: Tank topology overview]


How large is the data vdev(s)? How wide? How many devices? Mirror? RAIDZ1? RAIDZ2? RAIDZ3?

How large is the cache vdev(s)? How wide? How many devices? Mirror? RAIDZ1? RAIDZ2? RAIDZ3?

How large is the metadata vdev(s)? How wide? How many devices? Mirror? RAIDZ1? RAIDZ2? RAIDZ3?

Can you answer the last question, the one about the metadata vdev?

You see the problem? This is not good UI design. The warning hides the other information. Why? You can have a warning icon (and a tooltip message) without hiding the relevant information.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
How large is the metadata vdev(s)? How wide? How many devices? Mirror? RAIDZ1? RAIDZ2? RAIDZ3?
Well, it'll never be a RAIDZ-anything, but your point is quite valid.

Got a Jira ticket I can lean on? :smile:
 