TrueNAS SCALE incorrectly reporting mixed-capacity VDEVs

Brandito

Explorer
Joined
May 6, 2023
Messages
72
After adding a fourth 6-disk RAIDZ2 vdev to my 3 existing identical 6-disk RAIDZ2 vdevs, TrueNAS is reporting that they're mixed capacity.

I'm on 23.10.0.1. I tried the simplest thing, which was to reboot, but no change.

All of the drives are 16 TB Seagate Exos X16s; however, one of the new drives is an X18 in the same 16 TB capacity.

Is this something to be worried about? Is it a known bug?

Update 11/22/23: This thread has moved beyond the original issue. I've experienced corruption on my zpool, specifically a single dataset that now refuses to mount when the pool is imported.

Nearly all hardware has been swapped (HBA, SFF cables, the server itself), and I've also tried FreeBSD.

I currently have additional drives coming that can hold all my data; I just need some way to access it.

If you have any insights into what I might try to get the data out of my pool, please let me know. I am open to suggestions.

Also, thank you so much to everyone who has helped me get to the point where the pool imports and who has helped me come up with a plan to save everything I can.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
After adding a fourth 6-disk RAIDZ2 vdev to my 3 existing identical 6-disk RAIDZ2 vdevs, TrueNAS is reporting that they're mixed capacity.

I'm on 23.10.0.1. I tried the simplest thing, which was to reboot, but no change.

All of the drives are 16 TB Seagate Exos X16s; however, one of the new drives is an X18 in the same 16 TB capacity.

Is this something to be worried about? Is it a known bug?
What is it reporting?

Is the x18 reporting the right capacity?
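
If it's easier than screenshots, the same per-vdev sizes can be pulled from a shell with something like this (the pool name "tank" is just an example):

Code:
# show pool, vdev, and member sizes as ZFS sees them
zpool list -v tank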
 

Brandito

Explorer
Joined
May 6, 2023
Messages
72
Here is what the WebUI and CLI report (see attached screenshots).

Edit: I figured out the problem below; block cloning was the "issue", and I found a workaround. Still not clear what the problem is with my OP.

Possibly related to this: I've noticed very little is being written to the new vdev. I've copied/added hundreds of GiB of data, and the new vdev raidz2-5 is stuck at 63 GB. I've watched this number go up and then back down repeatedly as new data is written to the pool.

I've also had TrueNAS completely reboot my system 3 times now while manually rebalancing the pool by copying data with a script posted here on the forums. This process worked flawlessly when adding the 2 prior vdevs to the same pool.
 

Attachments

  • vdev-snip1.png
  • vdev-snip2.png

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Here is what the WebUI and CLI report (see attached screenshots).

Edit: I figured out the problem below; block cloning was the "issue", and I found a workaround. Still not clear what the problem is with my OP.

Possibly related to this: I've noticed very little is being written to the new vdev. I've copied/added hundreds of GiB of data, and the new vdev raidz2-5 is stuck at 63 GB. I've watched this number go up and then back down repeatedly as new data is written to the pool.

I've also had TrueNAS completely reboot my system 3 times now while manually rebalancing the pool by copying data with a script posted here on the forums. This process worked flawlessly when adding the 2 prior vdevs to the same pool.
Can you provide a link to the specific script?

It would be strange if the old vdevs are growing in used space and the new vdev is not... can you document that?

The primary immediate benefit of more vdevs is more write bandwidth to the pool.... after it gets reasonably full, you should see more read bandwidth as well.
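
One way to document it is to watch per-vdev allocation and write activity from a shell, along these lines (the pool name "tank" is just an example):

Code:
# per-vdev capacity, allocation, and I/O, refreshing every 5 seconds
zpool iostat -v tank 5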
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I suspect it is the "ZFS In-Place Rebalancing" script that has been linked a few times here by other community members.

Because it creates new copies of files using the cp command server-side, anything that deduplicates those copies prevents it from working - and that includes the new block cloning in ZFS 2.2.
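
(If you want to confirm that block cloning is what's absorbing the copies, the pool keeps counters for it, and recent OpenZFS builds expose a module tunable to switch it off; the pool name "tank" and the presence of that tunable are assumptions on my part.)

Code:
# how much data on the pool is currently block-cloned
zpool get bcloneused,bclonesaved,bcloneratio tank

# if the running OpenZFS exposes this tunable, setting it to 0 disables block cloning until reboot
echo 0 > /sys/module/zfs/parameters/zfs_bclone_enabled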


However, this doesn't necessarily explain the "mismatched vdev size" alert that would show in the UI.

@Brandito can you connect to a shell (ideally over SSH, for ease of copying) and provide the output of lsblk -bd, which will show the size in bytes of your block devices? It's possible that Seagate changed the drive layout ever so slightly with the Exos X18 vs. the X16.
 

Brandito

Explorer
Joined
May 6, 2023
Messages
72
I suspect it is the "ZFS In-Place Rebalancing" script that has been linked a few times here by other community members.

Because it creates new copies of files using the cp command server-side, anything that deduplicates those copies prevents it from working - and that includes the new block cloning in ZFS 2.2.

However, this doesn't necessarily explain the "mismatched vdev size" alert that would show in the UI.

@Brandito can you connect to a shell (ideally over SSH, for ease of copying) and provide the output of lsblk -bd, which will show the size in bytes of your block devices? It's possible that Seagate changed the drive layout ever so slightly with the Exos X18 vs. the X16.
Code:
root@truenas[~]# lsblk -bd
NAME    MAJ:MIN RM           SIZE RO TYPE MOUNTPOINTS
sda       8:0    0 16000900661248  0 disk
sdb       8:16   0 16000900661248  0 disk
sdc       8:32   0 16000900661248  0 disk
sdd       8:48   0 16000900661248  0 disk
sde       8:64   0 16000900661248  0 disk
sdf       8:80   0 16000900661248  0 disk
sdg       8:96   0 16000900661248  0 disk
sdh       8:112  0 16000900661248  0 disk
sdi       8:128  0 16000900661248  0 disk
sdj       8:144  0 16000900661248  0 disk
sdk       8:160  0 16000900661248  0 disk
sdl       8:176  0 16000900661248  0 disk
sdm       8:192  0 16000900661248  0 disk
sdn       8:208  0 16000900661248  0 disk
sdo       8:224  0 16000900661248  0 disk
sdp       8:240  0 16000900661248  0 disk
sdq      65:0    0 16000900661248  0 disk
sdr      65:16   0 16000900661248  0 disk
sds      65:32   0 16000900661248  0 disk
sdt      65:48   0 16000900661248  0 disk
sdu      65:64   0 16000900661248  0 disk
sdv      65:80   0 16000900661248  0 disk
sdw      65:96   0 16000900661248  0 disk
sdx      65:112  0 16000900661248  0 disk
sdy      65:128  0 16000900661248  0 disk
sdz      65:144  0 16000900661248  0 disk
sdaa     65:160  0 16000900661248  0 disk
sdab     65:176  0   960197124096  0 disk
sdac     65:192  0   240057409536  0 disk
sdad     65:208  0   240057409536  0 disk
zd0     230:0    0    32212254720  0 disk
nvme1n1 259:0    0   118410444800  0 disk
nvme0n1 259:1    0   118410444800  0 disk


Here is the output you requested.

According to my spreadsheet, sds should be the x18 drive

And yes, that's the script I'm using; however, it's not related to my original issue as reported by TrueNAS in the WebUI like I originally thought it might be.

Here are a couple of other bits of info that may be relevant. The original 3 vdevs were added over a period of time prior to updating to 23.10, and this latest vdev was added under 23.10.0.1. I know the wizard is different, but I doubt the underlying process has changed since Bluefin?

I also upgraded the pool after switching to Cobia but before adding this vdev.
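
(For what it's worth, whether that pool upgrade actually enabled block cloning can be checked with something like this; the pool name "tank" is just an example.)

Code:
# reports "disabled", "enabled", or "active" for the block_cloning feature flag
zpool get feature@block_cloning tank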
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Did the partition scheme change over time during the development of SCALE? You might want to check if not only the raw disks but also the ZFS partitions are all the same size.
 

Brandito

Explorer
Joined
May 6, 2023
Messages
72
Did the partition scheme change over time during the development of SCALE? You might want to check if not only the raw disks but also the ZFS partitions are all the same size.
How do I check that? I certainly haven't changed anything; I just ran through the wizard to add vdevs each time it was needed.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
On my TN SCALE system they look like this:
Code:
root@truenas[~]#  partx -s /dev/nvme0n1
NR    START       END   SECTORS   SIZE NAME UUID
 1      128  33554432  33554305    16G      745d9710-ee1f-494c-8cc6-f47fea76dc75
 2 33554560 488397134 454842575 216.9G      df3ccca1-930a-4ce6-8186-8d2323c61c47
root@truenas[~]#  partx -s /dev/nvme1n1
NR    START       END   SECTORS   SIZE NAME UUID
 1      128  33554432  33554305    16G      8a8b47d7-2869-401d-94ff-db49493400a7
 2 33554560 488397134 454842575 216.9G      554002b8-4633-4b3f-84b9-db51404dcdb8
root@truenas[~]#  partx -s /dev/nvme2n1
NR    START       END   SECTORS   SIZE NAME UUID
 1      128  33554432  33554305    16G      e99c3b1c-d660-40cc-b119-4ed471c575db
 2 33554560 488397134 454842575 216.9G      fc3dfb84-33f7-4c98-920a-86fb9fe8f1e6


It's a bit cumbersome and they seem not to have any partition type that tells you which one is the ZFS partition. So you'll have to go by the size alone.
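
(As an aside, sgdisk can usually print the GPT type codes as well, which may make the swap and ZFS partitions easier to tell apart; the device path below is just an example.)

Code:
# print the partition table with type codes (8200 = Linux swap, BF01 is commonly used for ZFS)
sgdisk -p /dev/nvme0n1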

HTH,
Patrick
 

Brandito

Explorer
Joined
May 6, 2023
Messages
72
On my TN SCALE system they look like this:
Code:
root@truenas[~]#  partx -s /dev/nvme0n1
NR    START       END   SECTORS   SIZE NAME UUID
 1      128  33554432  33554305    16G      745d9710-ee1f-494c-8cc6-f47fea76dc75
 2 33554560 488397134 454842575 216.9G      df3ccca1-930a-4ce6-8186-8d2323c61c47
root@truenas[~]#  partx -s /dev/nvme1n1
NR    START       END   SECTORS   SIZE NAME UUID
 1      128  33554432  33554305    16G      8a8b47d7-2869-401d-94ff-db49493400a7
 2 33554560 488397134 454842575 216.9G      554002b8-4633-4b3f-84b9-db51404dcdb8
root@truenas[~]#  partx -s /dev/nvme2n1
NR    START       END   SECTORS   SIZE NAME UUID
 1      128  33554432  33554305    16G      e99c3b1c-d660-40cc-b119-4ed471c575db
 2 33554560 488397134 454842575 216.9G      fc3dfb84-33f7-4c98-920a-86fb9fe8f1e6


It's a bit cumbersome and they seem not to have any partition type that tells you which one is the ZFS partition. So you'll have to go by the size alone.

HTH,
Patrick
I think you've helped discover the issue. I ran partx on the first drive of each vdev and the results are below

Code:
root@truenas[~]# partx -s /dev/sda
NR   START         END     SECTORS  SIZE NAME UUID
 1     128     4194431     4194304    2G      a76f8acb-f891-11ed-a2f8-90e2baf17bf0
 2 4194432 31251759063 31247564632 14.6T      a7d78b0d-f891-11ed-a2f8-90e2baf17bf0
root@truenas[~]# partx -s /dev/sdj
NR   START         END     SECTORS  SIZE NAME UUID
 1     128     4194304     4194177    2G      b949da47-5141-486d-bad4-e84dd58aa671
 2 4194432 31251759070 31247564639 14.6T      74f3cc23-1b32-4faf-89cc-ba0cd72ba308
root@truenas[~]# partx -s /dev/sdf
NR   START         END     SECTORS  SIZE NAME UUID
 1     128     4194304     4194177    2G      5e3e71f2-9ee9-400c-aa8f-aa01fddc6a86
 2 4194432 31251759070 31247564639 14.6T      7e1fa408-7565-4913-b045-49447ef9253b
root@truenas[~]# partx -s /dev/sds
NR START         END     SECTORS  SIZE NAME UUID
 1  2048 31251757055 31251755008 14.6T      1a865d37-0e03-4dd8-a0f4-96f35e6fcfd3


The newest vdev is missing that 2 GB partition. I don't know what that 2 GB partition is for, but I assume it's important enough for TrueNAS to add it.

Is this fixable without rebuilding the pool from scratch? I don't see how I could have messed anything up running through the wizard.

Edit: a little research on my end determined this should be a swap partition. I believe at some point I moved it to an SSD pool, so my guess is TrueNAS didn't see a need to match the partitions on the current vdevs and left this partition off the new vdev. I guess this is my fault?

I've attempted to fix this by moving the swap partition back to the pool I just expanded, and the process finished, but the partition doesn't show up on the new vdev and the error remains.

Still wondering if there's a simple fix, whether this even matters other than the warning, and whether this would be considered a bug.
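
(For reference, a quick way to check whether the new disks actually picked up a swap partition; the device name below is just an example.)

Code:
# list active swap devices and the partition layout of one of the new disks
swapon --show
lsblk -o NAME,SIZE,TYPE,FSTYPE /dev/sds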
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Did you possibly set the swap size to 0 before adding the new vdev? That would explain it. I doubt you can add the swap partition now, because that would imply shrinking the ZFS partition. And shrinking vdevs is not possible, unfortunately.

(Screenshot: Bildschirmfoto 2023-11-11 um 19.00.17.png)

You can still try. There is a bit of slack in the required partition sizes. So first attempt (assuming all disks are healthy at the moment so you can offline one and still have sufficient redundancy):

- offline one of the disks without a swap partition
- create a partition table matching the layout of one of the other disks that has one - I don't know off the top of my head how you do that in Linux (a rough sgdisk sketch follows below)
- replace the offlined disk with the new smaller partition referenced by UUID

There is a chance ZFS might not care about the "missing" 2 G and just perform the operation and resilver.

- repeat for each disk.
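
A rough, untested sketch of one pass through those steps, using the sector numbers from the partx output above; the pool name, device names, and the new partition UUID are examples/placeholders, and ZFS may still refuse the replace if the new partition comes up too small:

Code:
# offline one member of the new vdev (use the name exactly as shown in zpool status; "tank" is an example)
zpool offline tank 1a865d37-0e03-4dd8-a0f4-96f35e6fcfd3

# wipe and recreate the GPT on that disk to match the older layout: 2 GiB swap + ZFS data
sgdisk --zap-all /dev/sds
sgdisk -n1:128:4194431 -t1:8200 /dev/sds    # partition 1: swap, same sectors as on sda
sgdisk -n2:4194432:0 -t2:BF01 /dev/sds      # partition 2: ZFS data, rest of the disk
partprobe /dev/sds

# replace the offlined member with the new, slightly smaller data partition and let it resilver
zpool replace tank 1a865d37-0e03-4dd8-a0f4-96f35e6fcfd3 /dev/disk/by-partuuid/<uuid-of-new-partition-2>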


Alternative option: do it the other way round and remove swap partitions from all other disks. This assumes you have some swap space configured somewhere in some other pool.


HTH,
Patrick
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Good catch @Brandito and @Patrick M. Hausen - if there was a manual swap size change or a reinstall to a large enough boot media to prompt for swap-on-boot-device during the installation, this would have put them offset by that ~2GB value.

@Brandito would you be willing to grab a debug file from your system, and then either you or I can submit a Jira ticket outlining how this happened? (If you'd rather I submit the ticket, just DM me the debug and I'll attach.)

Unfortunately I'm not confident that the steps outlined by @Patrick M. Hausen above will work, as ZFS will quite likely refuse the replace operation with a "device too small" error.
 

Brandito

Explorer
Joined
May 6, 2023
Messages
72
Good catch @Brandito and @Patrick M. Hausen - if there was a manual swap size change or a reinstall to a large enough boot media to prompt for swap-on-boot-device during the installation, this would have put them offset by that ~2GB value.

@Brandito would you be willing to grab a debug file from your system, and then either you or I can submit a Jira ticket outlining how this happened? (If you'd rather I submit the ticket, just DM me the debug and I'll attach.)

Unfortunately I'm not confident that the steps outlined by @Patrick M. Hausen above will work, as ZFS will quite likely refuse the replace operation with a "device too small" error.
Is this fixable and most importantly, does it require fixing?

There's a part of me bothered by the warning regardless

Would you link me to the ticket once submitted?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Is this fixable and most importantly, does it require fixing?

There's a part of me bothered by the warning regardless

Would you link me to the ticket once submitted?
Is it fixable - possibly only through the "remove swap from all other vdevs and expand them to match" method. RAIDZ2 means that a full vdev evacuation and removal is off the table unfortunately.

Does it require fixing - no, IMHO. The capacity mismatch is so minor (2GB out of 16TB is basically a rounding error) that it won't have any measurable impact on balancing/metaslab sizes.

Got your debug - I'll link the ticket here and @ you when it's in.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112

Brandito

Explorer
Joined
May 6, 2023
Messages
72
New wrinkle to this situation. I think switching the swap partition back to the affected pool was a bad idea. TrueNAS has become unbootable, boot-looping at the point where it's trying to load ix-zfs.service and ix-swap.service. I can boot into an older install, but it's Bluefin, and my pool has been upgraded to the latest ZFS, so it doesn't load.

Currently reinstalling TrueNAS, hoping to load a config file and be back up shortly.

Maybe I just got unlucky, but if this is due to swap only being on 3 of the 4 vdevs, it should maybe be something TrueNAS warns you about.
 

Brandito

Explorer
Joined
May 6, 2023
Messages
72
Actually, worse news still: I tried importing my pool and TrueNAS force-reboots the system. Have I lost my entire pool to this?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
I think switching the swap partition back to the affected pool was a bad idea.
What do you mean by switching the swap partition? TrueNAS will at boot dynamically create mirrors of sufficient redundancy of all swap partitions it finds. "Sufficient" meaning matching the redundancy of the pool storage vdevs. So e.g. for RAIDZ2 it will create three-way mirrors for swap.

There is no UI setting to assign swap to a certain pool. If you mean the system dataset - this has absolutely no connection to swap space whatsoever.

So your pool is most probably not lost. A clean reinstall should be able to import the pool easily. If you have a config backup from before the system crashed, all the better.
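
If the import keeps rebooting the box when done from the UI, it may be worth trying a read-only import from a shell first, just to get at the data (the pool name "tank" is an example; -f may be needed after a reinstall):

Code:
# read-only import, mounted under /mnt; -f forces it if the pool still looks "in use" after the crash
zpool import -f -o readonly=on -R /mnt tank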
 

Brandito

Explorer
Joined
May 6, 2023
Messages
72
Sorry, the option is in the same area where you set swap. I moved the system dataset, I should say.

Whether it's related or not, when trying to import the pool under a fresh install, the system reboots suddenly.

Zpool import shows no errors on the drives to import
 