SOLVED Scale - Expand to fill new larger disks errors and chasing my tail Error: Partition(s) 1 on /dev/sdb have been written, but we have been unable

ColeTrain · Jan 22, 2024

Scale. OS Version:TrueNAS-SCALE-23.10.1.1
So, I replaced all the disks in a RaidZ2 8 Disk pool. 3 TB to 10 TB.
Autoexpand is ON.
Nothing happens.
I press "expand" in the GUI. I get this error.

[EFAULT] Command partprobe /dev/sdb failed (code 1): Error: Partition(s) 1 on /dev/sdb have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.

I reboot. I get this error. (SDB has moved to SDA, by the way, when verifying serial number. Its the same disk in a different spot.)

And it looks like this here. (below)

And they were all fine before the button pressing of "expand" and then the reboot. I rebooted a couple times before this. It was the act of pressing "expand" and then reboot causes this. Not the act of rebooting.

I'm trying to keep this concise, but I have chased my tail on this a couple times. and I am back here again. The long version is, I did the above and then resilver the disk. It said to Force, and then that caused different issues. I replaced that disk with another, only to resilver it again, to get back here, to get in the loop again. I just wiped that Disk. Serial ends in 69Z as it said it had partitions. Now I am replacing that disk "142...835" with the disk again.

What I did not keep track of, is when I did this before, was it the same disk or a different disk? Unsure.
When this gets done. (It will be about 4 hours, not the 4 days it says now) Can anyone advise best steps? Command line or GUI is fine with me. But if you do command line, be specific. And no, I have not used command line here other than viewing status. (Just saying, I didn't mistype anything or mess it up that way.)

EDIT - My Solution in Last post. POST 17

danb35 · Jan 22, 2024

I think this is a bug in SCALE; I'm encountering a similar issue. Ticket is here:

Log in with Atlassian account

Log in to Jira, Confluence, and all other Atlassian Cloud products here. Not an Atlassian user? Sign up for free.

ixsystems.atlassian.net

ColeTrain · Jan 22, 2024

danb35 said:
I think this is a bug in SCALE; I'm encountering a similar issue. Ticket is here:

Log in with Atlassian account

Log in to Jira, Confluence, and all other Atlassian Cloud products here. Not an Atlassian user? Sign up for free.

ixsystems.atlassian.net

What do we do? Command line?

danb35 · Jan 22, 2024

ColeTrain said:
What do we do? Command line?

It's an option, but I'd be inclined to at least wait a few days and see what the devs say.

ColeTrain · Jan 22, 2024

For the sake of documentation, I ran into this issue in the loop and this was another problem on the last go-a-round.

My solve was resilver back to another disk and then wipe and resilver back to the new disk.
Separate issue perhaps

This isn't my exact error, but similar to the one I got.

Reddit - Dive into anything

www.reddit.com

chef:~ # zpool replace -f tank 12988155072034477206 /dev/disk/by-id/wwn-0x50014ee2b10a5952-part1
cannot replace 12988155072034477206 with /dev/disk/by-id/wwn-0x50014ee2b10a5952-part1: /dev/disk/by-id/wwn-0x50014ee2b10a5952-part1 is busy, or device removal is in progress

ColeTrain · Jan 22, 2024

root@TrueNAS73[/home/admin]# zpool online -e Super8 26840c01-03fe-41dd-9214-6afca165f2ba
root@TrueNAS73[/home/admin]# zpool online -e Super8 b9142d59-d5d4-48b7-9bb6-a66ed4f55f81
root@TrueNAS73[/home/admin]# zpool online -e Super8 60bfc898-a0ed-4b9d-9979-1802b5c91a11
root@TrueNAS73[/home/admin]# zpool online -e Super8 8aed19bf-a818-4c7d-ba26-7b624a2c185b
root@TrueNAS73[/home/admin]# zpool online -e Super8 99b7bf9a-9267-442b-9460-5ee89a4eed3f
root@TrueNAS73[/home/admin]# zpool online -e Super8 baf7abf1-1b69-4a2e-8ff3-bd6fd60efeb8
root@TrueNAS73[/home/admin]# zpool online -e Super8 8a77e35e-66a0-4b1a-b874-c49b5555b1c8
root@TrueNAS73[/home/admin]# zpool online -e Super8 7f8a443a-7f85-40c0-bddb-fba7e3147126
root@TrueNAS73[/home/admin]# zpool get autoexpand
NAME PROPERTY VALUE SOURCE
Super8 autoexpand on local
boot-pool autoexpand off default
root@TrueNAS73[/home/admin]# zpool autoexpand=off Super8
root@TrueNAS73[/home/admin]# zpool get autoexpand
NAME PROPERTY VALUE SOURCE
Super8 autoexpand off default
boot-pool autoexpand off default
root@TrueNAS73[/home/admin]# zpool autoexpand=on Super8

Tried the above, did not work

ColeTrain · Jan 22, 2024

Well, I am trying it again. I hit the "expand" button. It errored. I did not reboot as the error indicated. I noted the Drive "SDF" this time. Serial ending in 36Z. I offlined it. I wiped it. I added it back. Resilvering now.
then P6z.
its rotating through the drives, not the same one over and over. I plan to keep resilvering until I loop or see repetition.... or it expands

otherego31235 · Jan 23, 2024

I'm experiencing the same problem. I think I will try to solve by using zfs commands directly. My ssd pool in raidz1 was completely killed by expand button.
Fortunately I had important data cloned into another pool. Now I'm trying to expand main pool, because I've changed all HDD from 3 to 6 tb, but I'm encountering same error message

danb35 · Jan 24, 2024

My ticket's been marked as a duplicate of this one:

[NAS-126809] - iXsystems TrueNAS Jira

ixsystems.atlassian.net

danb35 · Jan 24, 2024

I decided to go ahead and do the disk replacement at the CLI. Here's my write-up of how I did it:

Manual disk replacement in TrueNAS SCALE

wiki.familybrown.org

ColeTrain · Jan 24, 2024

I am noting that when you hit the "expand button" the drive now looks like this in the GUI after you get the "Fail"

"offline"
"wipe"

sretalla · Jan 24, 2024

danb35 said:
I decided to go ahead and do the disk replacement at the CLI. Here's my write-up of how I did it:

Manual disk replacement in TrueNAS SCALE

wiki.familybrown.org

It's great that this is a thing as it seems there's a gap waiting for a bug fix and generally good information to know... but...

Shouldn't that be a resource here instead? (@danb35)

danb35 · Jan 24, 2024

sretalla said:
Shouldn't that be a resource here instead?

Maybe. iX has a bad habit of deleting information (see the on-again, off-again iX wiki), but the resources don't really seem like they're going anywhere. Once I'm confident this works without undesirable side-effects in the GUI, I'll see about putting it in a resource.

ColeTrain · Jan 24, 2024

otherego31235 said:
I'm experiencing the same problem. I think I will try to solve by using zfs commands directly. My ssd pool in raidz1 was completely killed by expand button.
Fortunately I had important data cloned into another pool. Now I'm trying to expand main pool, because I've changed all HDD from 3 to 6 tb, but I'm encountering same error message

well, let us know if you find out good things

ColeTrain · Jan 24, 2024

danb35 said:
I decided to go ahead and do the disk replacement at the CLI. Here's my write-up of how I did it:

Manual disk replacement in TrueNAS SCALE

wiki.familybrown.org

and @sretalla

Thank you both so much for your help!

@danb35 Reading through the guide, so you partition one disk. select the UUID from the 2nd bigger partition, and resilver onto that 2nd partition. Then you do the next disk. Right?
I was trying to understand this. wondering how the resilver doesn't undo your partitioning, and the answer is because you are using the UUID from the 2nd partition? And you don't lose all the date because you do it one at a time. I think that makes sense now, to me. Sorry/ thanks

I have hit the "expand" button and wiped / replaced / resilvered every disk as they errored. The last one is running now. If it doesn't work after this, it will be time for your guide.

danb35 · Jan 25, 2024

When you replace a disk through the TrueNAS UI, TrueNAS partitions the disk (which is where the problem is; it's partitioning the disk incorrectly*) and then does zpool replace to replace the old partition/drive with the new one. The only difference between doing it through the UI and following my instructions should be (emphasis here because I'm not yet 100% sure this is the case) that the new disk is partitioned correctly; the zpool replace/resilver doesn't affect the partitioning.

The other steps are just to keep things in line with the TrueNAS way of doing things. TrueNAS uses UUIDs to identify disks/partitions in a ZFS pool (for good reason--this way, if disks get moved around or otherwise come up with different names, ZFS doesn't get confused), so I do as well. TrueNAS by default puts a small swap partition at the beginning of the disk; might as well do that.

ColeTrain said:
And you don't lose all the date because you do it one at a time.

I'm actually resilvering all six disks at once--I have enough drive bays to have all the old disks, and all the new ones, online at the same time, so I didn't see any real reason to drag it out and do them one-by-one. ZFS handles this just fine.

I've also made a tweak to the instructions I linked above, reducing the number of steps involved in partitioning the new drive.

* This is how I knew I was running into this problem as well. Threads here had me aware of an issue, but once the drive replacement through the GUI started, I ran lsblk and could see that my replacement disk had only a 3.6 TiB (4 TB) data partition rather than the 16 TB it should have had. That's when I reported the bug.

ColeTrain · Jan 25, 2024

IT WORKED

My disks expanded. Here is what I did. I am not sure all the behind the scenes, I would have liked to know the problem I am resolving, and I don't, but here is what i did.

first of all, take a screenshot of Disks. Make sure you know the correlation of Drive letters (ie. sda) and Serial numbers.
Press the "expand" button the GUI. It will spin and think and then it errored. (First post) "Command partprobe /dev/sdb failed (code 1): Error: Partition(s) 1..." etc.
Notate that "sdb" letter. [note] If you go to "manage devices" that letter is gone {post 11}
Select the drive (that is errored, with a new funny number) and "offline" it
I then went to storage > disks and selected the disk by using the serial number and then selected "wipe" and I used "quick" Was this necessary? I am not sure. The reason I did it was post 5 On my first attempts, It grabbed the wrong partition and caused problems. I am not sure what that problem was, my assumption had to do with partitioning. But I didn't want to do that again, so i used "wipe" to prevent that fiasco.
I went back to Storage > Managed Devices and selected "replace"
For me, I waited about 4.5 hours
Repeat for all disks. I kept having hope about this process as the error in step 2 was always a new disk. I never rebooted as the error indicates and therefore my drive letters (sda etc. ) never changed. [And also the rebooting caused the fiasco in step post 5 .]
The irony of having to hit "expand" for every disk in the above occurs after the last resilver, I did not have to hit "expand" for it to work, it finally just did automatically. (Assuming autoexpand=on)

Is it the best way? I don't know.
Did I do extra steps? I don't know.
What is the problem I am solving? I don't know, but everything I saw supported @danb35 and the theory of partitions.
I don't know why this worked. But, my theory is to tell you to "offline" the disks before replacing. I read somewhere that it was good to let the disk being replaced help in the resilver process, which is what I did, but perhaps the old disk needs to be offlined before resilver. again, I don't know.

Starblazr · Jan 30, 2024

I had the same issue, but after the initial resilvering, all I did was:

Code:

parted /dev/sdX resizepart 1 100%

Then TrueNAS found the new size and expanded the pool.

jenesuispasbavard · Feb 1, 2024

Same issue, and following @danb35's writeup worked perfectly. Except I'm using three 4TB NVMe drives in raidz1, so there is no swap partition (actually none of my disks - even HDDs in a separate raidz2 pool - have a swap partition ¯\_(ツ)_/¯).

Kinda wild that such a (possibly) pool-breaking bug made it to production... Replace through the GUI is pretty much broken. But looks like it'll be fixed in 23.10.2 judging by the Jira ticket.

Thebokke · Feb 3, 2024

Just wanted to chime in that I'm having the exact same issues Scale 23.10.0.1. Do I follow @ColeTrain or @danb35 writeups? All original disks have been removed and replaced with larger drives, expand button has broken the pool on 2 occassions (6 disk Z2). Bit of a nightmare, looking for the simplest (and least risky method). Any advice appreciated.

Important Announcement for the TrueNAS Community.

SOLVED **Scale - Expand to fill new larger disks errors and chasing my tail** *Error: Partition(s) 1 on /dev/sdb have been written, but we have been unable*

Dabbler

Hall of Famer

Dabbler

Hall of Famer

Dabbler

Dabbler

Dabbler

Cadet

Hall of Famer

Hall of Famer

Dabbler

Powered by Neutrality

Hall of Famer

Dabbler

Dabbler

Hall of Famer

Dabbler

Cadet

Dabbler

Dabbler

Similar threads

SOLVED Scale - Expand to fill new larger disks errors and chasing my tail Error: Partition(s) 1 on /dev/sdb have been written, but we have been unable