Redundancy mismatch after applying the latest update

Shigure

Dabbler
Joined
Sep 1, 2022
Messages
39
Applied the update yesterday, and two things popped up for one of my pools.

One is a failed SMART test at lifetime hour 11 (aborted due to a restart). From some searching, it seems it will eventually be flushed out by newer records as long as I keep using the drives for long enough; they were brand new when I put them into the box.
The other is the redundancy mismatch notice in the topology view. That pool has a 5-wide RAIDZ2 data vdev and a 3-wide mirror metadata vdev (plus 1 spare there just in case, since I have an identical drive lying around). The RAID types are different, but I think both vdevs can take two simultaneous drive failures without losing data, and the layout was approved here by others as well when I put everything together. (screenshots below)
[Screenshots: pool topology showing the Redundancy Mismatch notice]
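
Just to spell out the layout in plain zpool syntax (purely illustrative; the pool and disk names here are placeholders):

    # 5-wide RAIDZ2 data vdev + 3-way mirror special (metadata) vdev + 1 hot spare
    zpool create tank \
        raidz2 sda sdb sdc sdd sde \
        special mirror sdf sdg sdh \
        spare sdi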


Is there a way to clear those two... changes? They aren't really issues. If not, I think I can live with them anyway, as long as I know my pool is safe...
 

flammen

Dabbler
Joined
Oct 16, 2022
Messages
20
Hi, I got the same "Redundancy Mismatch" error with a similar pool configuration after the update to 22.12.1:
6 data drives in a RAIDZ2 + 3 metadata drives in a 3-way mirror
Since both vdevs (data and metadata) can tolerate 2 drive failures before data is lost, I would not expect such a message. If someone knows what it is about, I would love to hear it.
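
In case it helps whoever looks into it, the exact layout and per-vdev sizes can also be pulled from the shell like this (the pool name is just an example):

    # full topology, including the special (metadata) mirror
    zpool status -v tank
    # capacity and usage broken out per vdev
    zpool list -v tank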
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I think there's a bug in the webUI validation of VDEV redundancy across VDEV types. @Shigure or @flammen - are you able to use the "Report a Bug" function at the top of the forums to log this issue, and attach a debug from your systems?
 

Shigure

Dabbler
Joined
Sep 1, 2022
Messages
39
I think there's a bug in the webUI validation of VDEV redundancy across VDEV types. @Shigure or @flammen - are you able to use the "Report a Bug" function at the top of the forums to log this issue, and attach a debug from your systems?
Sure, besides the screenshots I have, do you need anything else like logs etc.?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Sure, besides the screenshots I have, do you need anything else like logs etc.?

A debug (System Settings, Advanced, "Save Debug" in the top-right corner) would be useful to attach as well. You can attach it privately to the ticket after the initial submission, so that it's only visible to iX employees.
 

Shigure

Dabbler
Joined
Sep 1, 2022
Messages
39

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112

Shigure

Dabbler
Joined
Sep 1, 2022
Messages
39
Much appreciated; thanks for bringing this to our attention.
NP and thank you guys for the great work as well.

BTW, any thoughts on the SMART test failure issue I briefly mentioned here? On the Discord server, it seems several users have the same problem of TrueNAS showing a failed historical SMART test for various reasons (usually aborted) after applying the update.
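
If anyone wants to compare from the shell, checking the self-test history and kicking off a fresh test looks something like this (replace sda with your own drive; the log only keeps a limited number of entries, so old results do roll off eventually):

    # show the SMART self-test history, including any aborted entries
    smartctl -l selftest /dev/sda
    # start a new extended (long) self-test; the result lands in the same log
    smartctl -t long /dev/sda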
 

devemia

Dabbler
Joined
Mar 5, 2023
Messages
20
Same here; my test config is a 4-drive RAIDZ1 with 2 metadata drives mirrored.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Joined
Oct 22, 2019
Messages
3,641
Unless there's a third metadata device in that vdev, that actually would be a mismatch - 2 disk failure tolerance in the data RAIDZ2, but only single disk failure in a mirror.
Does SCALE allow you to dismiss this warning? I would find it annoying, since if it was my system I would be aware that I cannot lose more than one device in my two-way mirror metadata vdev.

We're dealing with two different things here: the RAIDZ2 vdev is used for userdata stored on spinning HDDs, where the requirements for "available storage capacity" are different. That's not as true for a metadata vdev, where a two-way mirror will more than suffice for the total capacity needed.

Hence, RAIDZ2 for one thing, and mirror for the other. Two different requirements. (Yes, the entire pool's health cannot afford the loss of more than one device from the metadata vdev. But this would still be true if the storage (userdata) vdev was built with a RAIDZ1 or two-way mirror vdev. The risk is the same. Choosing RAIDZ2 for the storage may be for more than simply "two drive failure resiliency", since total usable capacity is also an important factor.)

Please tell me that this warning can be dismissed? :oops:
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
As HoneyBadger said, your layout is actually a mismatch, lol. Not sure if the actual problem got fixed in 22.12.2 though; I haven't had a chance to give it a try.
I'm afraid the PR didn't make the cutoff for .2 - there were some other related fixes that ended up in there.
Does SCALE allow you to dismiss this warning? I would find it annoying, since if it was my system I would be aware that I cannot lose more than one device in my two-way mirror metadata vdev.
No; it won't be raised as a failure at the dashboard level though. If one of the drives fails in your mirror, you will see that; it's an informational warning to let you know about the mismatch.
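
If you want to double-check what's actually being raised as an alert versus what's purely informational, the middleware client on SCALE should list the active alerts from the shell with something like:

    # list currently active alerts via the TrueNAS middleware client (jq is only for readability)
    midclt call alert.list | jq .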

(Yes, the entire pool's health cannot afford the loss of more than one device from the metadata vdev. But this would still be true if the storage (userdata) vdev was built with a RAIDZ1 or two-way mirror vdev. The risk is the same. Choosing RAIDZ2 for the storage may be for more than simply "two drive failure resiliency", since total usable capacity is also an important factor.)
The rhetorical question here is "if you're okay with only single-drive fault tolerance, why isn't your main data vdev a RAIDZ1?" It gives you even more usable capacity than a RAIDZ2 and can still survive a single-disk failure.
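
Just to put rough numbers on the capacity side of that tradeoff (the drive count and size here are hypothetical, and this ignores ZFS overhead and padding):

    # approximate usable capacity for six 10 TB drives
    echo "RAIDZ1: $(( (6 - 1) * 10 )) TB usable"   # ~50 TB
    echo "RAIDZ2: $(( (6 - 2) * 10 )) TB usable"   # ~40 TB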

I do get that SSDs tend to be longer-lived than HDDs, on account of "no moving parts" and other factors - but space efficiency and total capacity matters not if the pool status is UNAVAIL: insufficient replicas. I'm certainly not looking to shut down debate here though; there's always room for another PR and I can raise with engineering if we think there's cause to have a "Suppress this warning" checkbox. Can't make any promises of course.
 
Joined
Oct 22, 2019
Messages
3,641
it won't be raised as a failure at the dashboard level though.
I can raise with engineering if we think there's cause to have a "Suppress this warning" checkbox.

I don't use SCALE, but I'm glad this isn't a warning that's shoved in your face in the Dashboard or Notifications. However, it can still come off as annoying when you visit your pool's page. (If that's what I'm seeing in the screenshots.)



But there's another problem already evident with this current approach, something that my eyes, as an "end user", have caught and that might go unnoticed by an "engineer". :wink:

I'll demonstrate this issue in the GUI with a few simple questions:

Based on the screenshots provided for Tank's topology overview:

[Screenshots: Tank topology overview]


How large is the data vdev(s)? How wide? How many devices? Mirror? RAIDZ1? RAIDZ2? RAIDZ3?

How large is the cache vdev(s)? How wide? How many devices? Mirror? RAIDZ1? RAIDZ2? RAIDZ3?

How large is the metadata vdev(s)? How wide? How many devices? Mirror? RAIDZ1? RAIDZ2? RAIDZ3?

Can you answer the last question, the one about the metadata vdev?

You see the problem? This is not good UI design. The warning hides the other information. Why? You can have a warning icon (and a tooltip message) without hiding the relevant information.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
How large is the metadata vdev(s)? How wide? How many devices? Mirror? RAIDZ1? RAIDZ2? RAIDZ3?
Well, it'll never be a RAIDZ-anything, but your point is quite valid.

Got a Jira ticket I can lean on? :smile:
 