SOLVED **Scale - Expand to fill new larger disks errors and chasing my tail** *Error: Partition(s) 1 on /dev/sdb have been written, but we have been unable*

ColeTrain

Dabbler
Joined
Jan 30, 2022
Messages
10
SCALE. OS version: TrueNAS-SCALE-23.10.1.1
So, I replaced all the disks in an 8-disk RAIDZ2 pool, going from 3 TB to 10 TB.
Autoexpand is ON.
Nothing happens.
I press "expand" in the GUI. I get this error:
[EFAULT] Command partprobe /dev/sdb failed (code 1): Error: Partition(s) 1 on /dev/sdb have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.
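(Before doing anything else, it can help to see whether the kernel and the on-disk partition table actually disagree, and to pin drive letters to serial numbers so you can track disks across reboots. A hedged diagnostic sketch, assuming the affected disk is /dev/sdb:)

Code:
# What the kernel currently believes about sdb's partitions
cat /proc/partitions | grep sdb

# What the partition table on disk actually says
parted /dev/sdb unit GB print

# Map drive letters to serial numbers
lsblk -o NAME,SERIAL,SIZE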


I reboot. I get this error. (sdb has moved to sda, by the way, verified by serial number. It's the same disk in a different slot.)
[screenshot of the error]



And it looks like this (below):


[screenshot]

They were all fine before I pressed "expand" and rebooted. I had rebooted a couple of times before this, so it's the act of pressing "expand" and then rebooting that causes this, not the reboot itself.


I'm trying to keep this concise, but I have chased my tail on this a couple of times and I am back here again. The long version: I did the above and then resilvered the disk. It said to force it, and that caused different issues. I replaced that disk with another, only to resilver it again, get back here, and land in the loop again. I just wiped that disk (serial ends in 69Z) since it said it had partitions. Now I am replacing that disk "142...835" with the same disk again.

[screenshot]


What I did not keep track of is whether, when I did this before, it was the same disk or a different disk. Unsure.
When this gets done (it will be about 4 hours, not the 4 days it says now), can anyone advise on the best next steps? Command line or GUI is fine with me, but if you suggest command line, be specific. And no, I have not used the command line here other than to view status. (Just saying I didn't mistype anything or mess things up that way.)



EDIT - My solution is in the last post, POST 17.
 
Last edited:

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
I think this is a bug in SCALE; I'm encountering a similar issue. Ticket is here:
 

ColeTrain

Dabbler
Joined
Jan 30, 2022
Messages
10
I think this is a bug in SCALE; I'm encountering a similar issue. Ticket is here:
What do we do? Command line?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

ColeTrain

Dabbler
Joined
Jan 30, 2022
Messages
10
For the sake of documentation: I ran into this issue in the loop, and this was another problem on the last go-around.

My fix was to resilver back to another disk, then wipe and resilver back to the new disk.
A separate issue, perhaps.

This isn't my exact error, but it's similar to the one I got.




Code:
chef:~ # zpool replace -f tank 12988155072034477206 /dev/disk/by-id/wwn-0x50014ee2b10a5952-part1
cannot replace 12988155072034477206 with /dev/disk/by-id/wwn-0x50014ee2b10a5952-part1: /dev/disk/by-id/wwn-0x50014ee2b10a5952-part1 is busy, or device removal is in progress
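(If the "is busy" error comes from stale ZFS labels on a previously used disk, clearing them before the replace may help. A hedged, destructive sketch; /dev/sdX is a placeholder for the disk you intend to reuse, and running this erases it:)

Code:
# Clear old filesystem/RAID signatures from the disk (destroys its contents)
wipefs -a /dev/sdX

# Clear stale ZFS labels from a former pool-member partition
zpool labelclear -f /dev/sdX1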
 

ColeTrain

Dabbler
Joined
Jan 30, 2022
Messages
10
Code:
root@TrueNAS73[/home/admin]# zpool online -e Super8 26840c01-03fe-41dd-9214-6afca165f2ba
root@TrueNAS73[/home/admin]# zpool online -e Super8 b9142d59-d5d4-48b7-9bb6-a66ed4f55f81
root@TrueNAS73[/home/admin]# zpool online -e Super8 60bfc898-a0ed-4b9d-9979-1802b5c91a11
root@TrueNAS73[/home/admin]# zpool online -e Super8 8aed19bf-a818-4c7d-ba26-7b624a2c185b
root@TrueNAS73[/home/admin]# zpool online -e Super8 99b7bf9a-9267-442b-9460-5ee89a4eed3f
root@TrueNAS73[/home/admin]# zpool online -e Super8 baf7abf1-1b69-4a2e-8ff3-bd6fd60efeb8
root@TrueNAS73[/home/admin]# zpool online -e Super8 8a77e35e-66a0-4b1a-b874-c49b5555b1c8
root@TrueNAS73[/home/admin]# zpool online -e Super8 7f8a443a-7f85-40c0-bddb-fba7e3147126
root@TrueNAS73[/home/admin]# zpool get autoexpand
NAME       PROPERTY    VALUE   SOURCE
Super8     autoexpand  on      local
boot-pool  autoexpand  off     default
root@TrueNAS73[/home/admin]# zpool set autoexpand=off Super8
root@TrueNAS73[/home/admin]# zpool get autoexpand
NAME       PROPERTY    VALUE   SOURCE
Super8     autoexpand  off     local
boot-pool  autoexpand  off     default
root@TrueNAS73[/home/admin]# zpool set autoexpand=on Super8

Tried the above; it did not work.
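(That outcome makes sense if the partitions themselves are the problem: zpool online -e can only grow a vdev into space the underlying data partition already has, so if the replacement bug left the partition at its old size, there is nothing for ZFS to expand into. A quick, hedged check, assuming one of the pool disks is /dev/sdb:)

Code:
# Sizes in bytes: the data partition should span nearly the whole 10 TB disk
lsblk -b -o NAME,SIZE,PARTUUID /dev/sdb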
 

ColeTrain

Dabbler
Joined
Jan 30, 2022
Messages
10
Well, I am trying it again. I hit the "expand" button. It errored. I did not reboot as the error indicated. I noted the drive "sdf" this time, serial ending in 36Z. I offlined it. I wiped it. I added it back. Resilvering now.
Then P6Z.
It's rotating through the drives, not the same one over and over. I plan to keep resilvering until I loop or see repetition... or it expands.
 
Last edited:
Joined
Jan 18, 2021
Messages
5
I'm experiencing the same problem. I think I will try to solve it by using ZFS commands directly. My SSD pool in RAIDZ1 was completely killed by the expand button.
Fortunately I had the important data cloned to another pool. Now I'm trying to expand the main pool, because I've changed all the HDDs from 3 to 6 TB, but I'm encountering the same error message.
 
Last edited:

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

ColeTrain

Dabbler
Joined
Jan 30, 2022
Messages
10
I am noting that when you hit the "expand" button, after you get the fail the drive now looks like this in the GUI:

"offline"
"wipe"



[screenshot]
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Shouldn't that be a resource here instead?
Maybe. iX has a bad habit of deleting information (see the on-again, off-again iX wiki), but the resources don't really seem like they're going anywhere. Once I'm confident this works without undesirable side-effects in the GUI, I'll see about putting it in a resource.
 

ColeTrain

Dabbler
Joined
Jan 30, 2022
Messages
10
I'm experiencing the same problem. I think I will try to solve it by using ZFS commands directly. My SSD pool in RAIDZ1 was completely killed by the expand button.
Fortunately I had the important data cloned to another pool. Now I'm trying to expand the main pool, because I've changed all the HDDs from 3 to 6 TB, but I'm encountering the same error message.
Well, let us know if you find anything good.
 

ColeTrain

Dabbler
Joined
Jan 30, 2022
Messages
10
I decided to go ahead and do the disk replacement at the CLI. Here's my write-up of how I did it:
@danb35 and @sretalla:

Thank you both so much for your help!

@danb35 Reading through the guide: so you partition one disk, select the UUID from the second, bigger partition, and resilver onto that second partition. Then you do the next disk. Right?
I was trying to understand this, wondering how the resilver doesn't undo your partitioning, and the answer is that you are using the UUID from the second partition? And you don't lose all the data because you do it one at a time. I think that makes sense to me now. Sorry/thanks.

I have hit the "expand" button and wiped/replaced/resilvered every disk as it errored. The last one is running now. If it doesn't work after this, it will be time for your guide.
 
Last edited:

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
When you replace a disk through the TrueNAS UI, TrueNAS partitions the disk (which is where the problem is; it's partitioning the disk incorrectly*) and then does zpool replace to replace the old partition/drive with the new one. The only difference between doing it through the UI and following my instructions should be (emphasis here because I'm not yet 100% sure this is the case) that the new disk is partitioned correctly; the zpool replace/resilver doesn't affect the partitioning.

The other steps are just to keep things in line with the TrueNAS way of doing things. TrueNAS uses UUIDs to identify disks/partitions in a ZFS pool (for good reason--this way, if disks get moved around or otherwise come up with different names, ZFS doesn't get confused), so I do as well. TrueNAS by default puts a small swap partition at the beginning of the disk; might as well do that.
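To make that concrete, here is a rough, hedged sketch of those steps at the CLI. This is not the exact linked guide; the 2 GiB swap size follows the TrueNAS convention mentioned above, and /dev/sdX, the pool name Super8, and the placeholder IDs are assumptions:

Code:
# 1. Partition the new disk the way TrueNAS would: small swap, then data
sgdisk -n1:0:+2G -t1:8200 /dev/sdX   # partition 1: 2 GiB swap
sgdisk -n2:0:0 -t2:BF01 /dev/sdX     # partition 2: rest of the disk, ZFS type
partprobe /dev/sdX                   # ask the kernel to re-read the table

# 2. Find the PARTUUID of the new data partition (partition 2)
lsblk -o NAME,SIZE,PARTUUID /dev/sdX

# 3. Replace the old member with the new partition, referenced by UUID
zpool replace Super8 <old-device-guid> /dev/disk/by-partuuid/<partuuid-of-part2>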
And you don't lose all the data because you do it one at a time.
I'm actually resilvering all six disks at once--I have enough drive bays to have all the old disks, and all the new ones, online at the same time, so I didn't see any real reason to drag it out and do them one-by-one. ZFS handles this just fine.

I've also made a tweak to the instructions I linked above, reducing the number of steps involved in partitioning the new drive.

* This is how I knew I was running into this problem as well. Threads here had me aware of an issue, but once the drive replacement through the GUI started, I ran lsblk and could see that my replacement disk had only a 3.6 TiB (4 TB) data partition rather than the 16 TB it should have had. That's when I reported the bug.
 
Last edited:

ColeTrain

Dabbler
Joined
Jan 30, 2022
Messages
10
IT WORKED

My disks expanded. Here is what I did. I am not sure of everything happening behind the scenes; I would have liked to know exactly what problem I was resolving, and I don't, but here is what I did.

  1. First of all, take a screenshot of Disks. Make sure you know the correlation between drive letters (i.e. sda) and serial numbers.
  2. Press the "expand" button in the GUI. It will spin and think, and then it errors (first post): "Command partprobe /dev/sdb failed (code 1): Error: Partition(s) 1..." etc.
  3. Note that "sdb" letter. [Note] If you go to "manage devices", that letter is gone {post 11}.
  4. Select the drive that errored (it now shows a new funny number) and "offline" it.
  5. I then went to Storage > Disks, selected the disk by its serial number, and selected "wipe", using "quick". Was this necessary? I am not sure. The reason I did it was post 5: on my first attempts it grabbed the wrong partition and caused problems. I am not sure exactly what that problem was; my assumption is that it had to do with partitioning. But I didn't want to go through that again, so I used "wipe" to prevent that fiasco.
  6. I went back to Storage > Manage Devices and selected "replace".
  7. For me, each resilver took about 4.5 hours.
  8. Repeat for all disks. I kept having hope in this process because the error in step 2 always named a new disk. I never rebooted as the error indicates, and therefore my drive letters (sda, etc.) never changed. [The rebooting is also what caused the fiasco in post 5.]
  9. The irony of hitting "expand" for every disk in the steps above: after the last resilver, I did not have to hit "expand" at all; it finally just expanded automatically. (Assuming autoexpand=on.)
Is it the best way? I don't know.
Did I do extra steps? I don't know.
What is the problem I am solving? I don't know, but everything I saw supported @danb35 and the partition theory.
I don't know why this worked. But my takeaway is to "offline" the disks before replacing. I read somewhere that it is good to let the disk being replaced help in the resilver process, which is what I did, but perhaps the old disk needs to be offlined before the resilver. Again, I don't know.
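For anyone following along at the command line, the loop above can be watched and verified with standard zpool commands. A hedged sketch, using the pool name Super8 from this thread:

Code:
# Watch resilver progress
zpool status -v Super8

# After the last resilver: the EXPANDSZ column shows space not yet grown into
zpool list -v Super8

# If autoexpand doesn't kick in, grow each member manually by its UUID
zpool online -e Super8 <partuuid>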
 
Last edited:

Starblazr

Cadet
Joined
Jan 30, 2024
Messages
1
I had the same issue, but after the initial resilvering, all I did was:
Code:
parted /dev/sdX resizepart 1 100%


Then TrueNAS found the new size and expanded the pool.
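If anyone tries this, note it assumes a layout where the ZFS data partition is partition 1 (no swap ahead of it); check with lsblk first. A hedged follow-up to make sure the kernel and the pool both pick up the new size:

Code:
partprobe /dev/sdX                  # re-read the resized partition table
zpool online -e <pool> <device-id>  # trigger expansion if autoexpand is off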
 
Joined
Jan 1, 2023
Messages
16
Same issue, and following @danb35's writeup worked perfectly. Except I'm using three 4 TB NVMe drives in RAIDZ1, so there is no swap partition (actually none of my disks, even the HDDs in a separate RAIDZ2 pool, have a swap partition ¯\_(ツ)_/¯).

Kinda wild that such a (possibly) pool-breaking bug made it to production... replace through the GUI is pretty much broken. But it looks like it'll be fixed in 23.10.2, judging by the Jira ticket.
 
Last edited:

Thebokke

Dabbler
Joined
Aug 3, 2018
Messages
10
Just wanted to chime in that I'm having the exact same issues on SCALE 23.10.0.1. Do I follow @ColeTrain's or @danb35's writeup? All original disks have been removed and replaced with larger drives; the expand button has broken the pool on 2 occasions (6-disk Z2). Bit of a nightmare; looking for the simplest (and least risky) method. Any advice appreciated.
 