Pool created via CLI, RMA disk replaced via GUI, extra 2G Linux swap created on the replacement drive, making its large ZFS partition smaller than the other pool members'

Sgt_Bizkit

Cadet
Joined
Apr 29, 2023
Messages
3
Hello All, (my first post)

Version:
TrueNAS-SCALE-22.02.4
Running within VMware with direct physical access to the disks.
(This is done so I can dual-boot directly into TrueNAS SCALE if I want, and it continues to function normally as a backup OS.)

I created a pool via the CLI (due to budget constraints) using:
3 x 18 TB disks
2 x fake 18 TB disks (offlined)
to create a degraded RAIDZ2.

I later added 2 more 18 TB disks via the CLI, replacing the offlined fake disks, and all was well.

5 months later, 1 disk encountered uncorrectable sectors, so I offlined it and used the GUI to replace the disk. (In hindsight, I should have used the CLI again.)

I didn't notice any immediate issue, and thought I'd check the drive layouts using "fdisk --list"; I saw that the replacement had a 2G Linux swap partition created, whereas the others did not.
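(As an aside, a quicker way to inspect a single drive, rather than paging through the full listing, is to point fdisk at just that device, or use lsblk's partition-type column; both assume the usual util-linux tools that ship with SCALE:)

Code:
# Inspect only the replacement disk
fdisk -l /dev/sdc

# Same information in a compact form
lsblk -o NAME,SIZE,PARTTYPENAME /dev/sdc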

It started the resilvering process without issue.

This means the large ZFS data partition on the replacement would be smaller than the other pool members' partitions, so I imagine the resilver will fail near 100% (currently at 81%, 6 hours to go).
However, I'm not 100% sure, as some people state the swap is created in case other disks vary slightly in size per manufacturer, and that it's advisable to keep it;
others state it's used for swapping out RAM, or for backward compatibility with CORE (lots of interpretations).

My question:
Should I disable swap using "midclt call system.advanced.update '{"swapondrive": 0}'", then offline the RMA drive, delete its partitions, and let TrueNAS resilver it again?
Or should I see whether the resilver takes with the 2G swap partition in place?

The RMA drive is /dev/sdc (Disk 2 in the Windows screenshot below).

Code:
root@truenas[~]# fdisk --list
Disk /dev/sdd: 16.37 TiB, 18000207937536 bytes, 35156656128 sectors
Disk model: VMware Virtual S
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: E903C3E2-0EC7-D140-BF00-207392D0F46B

Device            Start          End      Sectors  Size Type
/dev/sdd1          2048  35156637695  35156635648 16.4T Solaris /usr & Apple ZFS
/dev/sdd9   35156637696  35156654079        16384    8M Solaris reserved 1

Disk /dev/sdc: 16.37 TiB, 18000207937536 bytes, 35156656128 sectors
Disk model: VMware Virtual S
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: D6F9D8A4-5383-4C84-AA23-C2147ADE4674

Device            Start          End      Sectors  Size Type
/dev/sdc1           128      4194304      4194177    2G Linux swap
/dev/sdc2       4194432  35156656094  35152461663 16.4T Solaris /usr & Apple ZFS

Disk /dev/sdb: 16.37 TiB, 18000207937536 bytes, 35156656128 sectors
Disk model: VMware Virtual S
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: A87EAFD6-7424-A940-BB1A-8B1CC3EA0685

Device            Start          End      Sectors  Size Type
/dev/sdb1          2048  35156637695  35156635648 16.4T Solaris /usr & Apple ZFS
/dev/sdb9   35156637696  35156654079        16384    8M Solaris reserved 1

Disk /dev/sdf: 16.37 TiB, 18000207937536 bytes, 35156656128 sectors
Disk model: VMware Virtual S
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: EA6E172A-78E0-5B41-BB22-A1967F989676

Device            Start          End      Sectors  Size Type
/dev/sdf1          2048  35156637695  35156635648 16.4T Solaris /usr & Apple ZFS
/dev/sdf9   35156637696  35156654079        16384    8M Solaris reserved 1

Disk /dev/sda: 120 GiB, 128849018880 bytes, 251658240 sectors
Disk model: VMware Virtual S
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: B154E109-E16F-4562-AD84-875DDE6BA813

Device          Start        End    Sectors   Size Type
/dev/sda1        4096       6143       2048     1M BIOS boot
/dev/sda2        6144    1054719    1048576   512M EFI System
/dev/sda3    34609152  251658206  217049055 103.5G Solaris /usr & Apple ZFS
/dev/sda4     1054720   34609151   33554432    16G Linux swap

Partition table entries are not in disk order.

Disk /dev/sde: 16.37 TiB, 18000207937536 bytes, 35156656128 sectors
Disk model: VMware Virtual S
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: CE1CB627-ADA3-9744-BAB6-AD59F74C855D

Device            Start          End      Sectors  Size Type
/dev/sde1          2048  35156637695  35156635648 16.4T Solaris /usr & Apple ZFS
/dev/sde9   35156637696  35156654079        16384    8M Solaris reserved 1

Disk /dev/mapper/sda4: 16 GiB, 17179869184 bytes, 33554432 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
root@truenas[~]#
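Doing the arithmetic on the sector counts above, the replacement's ZFS partition (sdc2) does come out about 2 GiB smaller than the other members' (sdd1/sdb1/sdf1/sde1), which is roughly the size of the swap partition:

Code:
# 512-byte sectors: other members' ZFS partition minus the replacement's
echo $(( 35156635648 - 35152461663 ))    # 4173985 sectors
echo $(( 4173985 * 512 / 1024 / 1024 ))  # ~2038 MiB, i.e. about the 2G swap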

[Screenshot: Windows Disk Management view of the disks; the RMA drive is Disk 2]



Checking my old notes, I used these commands when creating the original pool:
Code:
truncate -s 18000207937536 /tmp/FD1.img
truncate -s 18000207937536 /tmp/FD2.img
zpool create StoragePool -o ashift=12 -f raidz2 /dev/sdd /dev/sdc /dev/sdb /tmp/FD1.img /tmp/FD2.img
zpool offline StoragePool /tmp/FD1.img
zpool offline StoragePool /tmp/FD2.img

root@truenas[~]# zpool status StoragePool
  pool: StoragePool
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
config:

        NAME              STATE     READ WRITE CKSUM
        StoragePool       DEGRADED     0     0     0
          raidz2-0        DEGRADED     0     0     0
            sdf           ONLINE       0     0     0
            sde           ONLINE       0     0     0
            sdd           ONLINE       0     0     0
            /tmp/FD1.img  OFFLINE      0     0     0
            /tmp/FD2.img  OFFLINE      0     0     0

zpool replace StoragePool -f /tmp/FD2.img /dev/sdb
zpool online StoragePool /dev/sdb
zpool replace StoragePool -f /tmp/FD1.img /dev/sdc
zpool online StoragePool /dev/sdc
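For what it's worth, if I end up redoing the replacement from the CLI, my understanding is that matching the other members' single-partition layout would look something like the sketch below (the sgdisk type code, sector numbers, and by-partuuid path are my assumptions based on the fdisk output above; double-check before running anything):

Code:
# Wipe the RMA drive's GPT and recreate a single ZFS partition matching the others
sgdisk --zap-all /dev/sdc
sgdisk -n1:2048:35156637695 -t1:BF01 /dev/sdc   # BF01 = Solaris /usr & Apple ZFS
partprobe /dev/sdc                              # re-read the partition table

# Then replace the old member (by GUID or label) with the new partition by partuuid
zpool replace StoragePool <old-guid> /dev/disk/by-partuuid/<new-partuuid>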


Thank you for taking the time to read & reply
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
This means the large ZFS data partition on the replacement would be smaller than the other pool members' partitions, so I imagine the resilver will fail near 100% (currently at 81%, 6 hours to go).
I doubt the former is the case, and I think it's highly unlikely the latter is the case--if the remaining partition were too small, the replace operation should have failed immediately.
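You can see the up-front size check for yourself with throwaway file vdevs (a purely illustrative test pool, nothing to do with your StoragePool):

Code:
# Build a toy mirror from sparse files, then try an undersized replacement
truncate -s 1G /tmp/a.img /tmp/b.img
truncate -s 512M /tmp/small.img
zpool create testpool mirror /tmp/a.img /tmp/b.img
zpool replace testpool /tmp/b.img /tmp/small.img
# expected: cannot replace /tmp/b.img with /tmp/small.img: device is too small
zpool destroy testpool && rm /tmp/a.img /tmp/b.img /tmp/small.img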
 

Sgt_Bizkit

Cadet
Joined
Apr 29, 2023
Messages
3
Hi danb35,

I waited for the resilver to complete, and it did complete.

However, it appears to be starting over again, and not showing exactly as before in the CLI (it doesn't show which disk it is resilvering, but I can see it's the RMA drive being written to again).

Any suggestions on how to proceed?

#########################################

Code:
root@truenas[~]# zpool status StoragePool
  pool: StoragePool
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Apr 29 04:02:20 2023
        64.7T scanned at 546M/s, 64.6T issued at 546M/s, 64.7T total
        12.9T resilvered, 99.98% done, 00:00:21 to go
config:


        NAME                                      STATE     READ WRITE CKSUM
        StoragePool                               ONLINE       0     0     0
          raidz2-0                                ONLINE       0     0     0
            3cfbdc2e-fd63-4c48-8ed7-29726666cbf3  ONLINE       0     0     0
            c02af95d-028a-4324-a95e-5d1a32c7259b  ONLINE       0     0     0  (resilvering)
            sdb                                   ONLINE       0     0     0
            b8bb1e1f-c388-8648-a7f0-6da6b08b169e  ONLINE       0     0     0
            352b02bc-488d-cc41-8cbf-32995ea61f72  ONLINE       0     0     0

errors: No known data errors


root@truenas[~]# zpool status StoragePool
  pool: StoragePool
 state: ONLINE
  scan: resilvered 12.9T in 1 days 10:36:25 with 0 errors on Sun Apr 30 14:38:45 2023
config:


        NAME                                      STATE     READ WRITE CKSUM
        StoragePool                               ONLINE       0     0     0
          raidz2-0                                ONLINE       0     0     0
            3cfbdc2e-fd63-4c48-8ed7-29726666cbf3  ONLINE       0     0     0
            c02af95d-028a-4324-a95e-5d1a32c7259b  ONLINE       0     0     0
            sdb                                   ONLINE       0     0     0
            b8bb1e1f-c388-8648-a7f0-6da6b08b169e  ONLINE       0     0     0
            352b02bc-488d-cc41-8cbf-32995ea61f72  ONLINE       0     0     0


errors: No known data errors


root@truenas[~]# zpool status StoragePool
  pool: StoragePool
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Apr 30 14:38:50 2023
        871G scanned at 12.8G/s, 641M issued at 9.42M/s, 64.7T total
        0B resilvered, 0.00% done, no estimated completion time
config:


        NAME                                      STATE     READ WRITE CKSUM
        StoragePool                               ONLINE       0     0     0
          raidz2-0                                ONLINE       0     0     0
            3cfbdc2e-fd63-4c48-8ed7-29726666cbf3  ONLINE       0     0     0
            c02af95d-028a-4324-a95e-5d1a32c7259b  ONLINE       0     0     0
            sdb                                   ONLINE       0     0     0
            b8bb1e1f-c388-8648-a7f0-6da6b08b169e  ONLINE       0     0     0
            352b02bc-488d-cc41-8cbf-32995ea61f72  ONLINE       0     0     0


errors: No known data errors



 

Sgt_Bizkit

Cadet
Joined
Apr 29, 2023
Messages
3
For those wondering about the outcome of my endeavour:

I used the command midclt call system.advanced.update '{"swapondrive": 0}' to change the swap size to 0 GB for newly added disk members.

I then shut down the VM, reintroduced the disk as "new" hardware (direct physical access), and used the "new" disk to replace the old reference via the GUI.

This time it didn't show the 2 GB swap at the start of the disk:
Code:
Disk /dev/sdc: 16.37 TiB, 18000207937536 bytes, 35156656128 sectors
Disk model: VMware Virtual S
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: CDE4CCD6-9595-4FD8-A80F-7DF4BD7894F7

Device      Start          End      Sectors  Size Type
/dev/sdc1      40  35156656094  35156656055 16.4T Solaris /usr & Apple ZFS
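A quick way to confirm nothing is actively swapping to the data disks is swapon (part of util-linux); only the boot disk's swap (/dev/mapper/sda4 in the earlier fdisk listing) should appear:

Code:
# List active swap devices
swapon --show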



It ran the resilver once; however, the GUI didn't show the percentage or the time-to-finish ETA correctly during this run.

After it completed, it ran the resilver again, and this time the GUI showed all the figures and the ETA correctly.

After being very patient, it completed successfully on the 2nd run and has not restarted the resilver.


Code:
root@truenas[~]# zpool status -xv
  pool: StoragePool
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue May  2 01:44:51 2023
        64.7T scanned at 547M/s, 64.6T issued at 547M/s, 64.7T total
        12.9T resilvered, 99.98% done, 00:00:26 to go
config:

        NAME                                      STATE     READ WRITE CKSUM
        StoragePool                               ONLINE       0     0     0
          raidz2-0                                ONLINE       0     0     0
            3cfbdc2e-fd63-4c48-8ed7-29726666cbf3  ONLINE       0     0     0
            85e97ab0-ceca-4c32-b1c3-ef36570c58b5  ONLINE       0     0     0  (resilvering)
            sdb                                   ONLINE       0     0     0
            b8bb1e1f-c388-8648-a7f0-6da6b08b169e  ONLINE       0     0     0
            352b02bc-488d-cc41-8cbf-32995ea61f72  ONLINE       0     0     0

root@truenas[~]# zpool status StoragePool
  pool: StoragePool
 state: ONLINE
  scan: resilvered 12.9T in 1 days 10:24:46 with 0 errors on Wed May  3 12:09:37 2023
config:

        NAME                                      STATE     READ WRITE CKSUM
        StoragePool                               ONLINE       0     0     0
          raidz2-0                                ONLINE       0     0     0
            3cfbdc2e-fd63-4c48-8ed7-29726666cbf3  ONLINE       0     0     0
            85e97ab0-ceca-4c32-b1c3-ef36570c58b5  ONLINE       0     0     0
            sdb                                   ONLINE       0     0     0
            b8bb1e1f-c388-8648-a7f0-6da6b08b169e  ONLINE       0     0     0
            352b02bc-488d-cc41-8cbf-32995ea61f72  ONLINE       0     0     0

errors: No known data errors
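As an aside for anyone else nursing a long resilver while the GUI lags: zpool status accepts an optional interval argument in OpenZFS 2.x, so progress can be polled from the shell instead:

Code:
# Reprint pool status every 60 seconds until Ctrl-C
zpool status StoragePool 60

# Much the same effect with watch
watch -n 60 zpool status StoragePool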




Closing remarks:
I recall the GUI having the same issues when using the 2 GB swap partition, and when it restarted the resilver again I saw the GUI update correctly,
so it may have completed successfully even with the swap partition on the second run (though that layout was not identical to the partitions of the drive it was replacing). Here is a thread of similar issues, where replacement drives not being partitioned identically caused problems: https://www.truenas.com/community/t...g-error-that-is-not-warned-over-anyway.93735/

Where I got the disable-swap command from:


I believe the default setting is: midclt call system.advanced.update '{"swapondrive": 2}'
(I read it in a comment somewhere; it results in a 2 GB swap partition on each new data disk.)
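To read the current value back before changing anything, the matching config call should work (piping through jq is my own habit and assumes jq is installed):

Code:
# Show the swap size (in GiB) carved out of each new data disk
midclt call system.advanced.config | jq .swapondrive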

3 full resilvers in total (2 with a borked GUI, with and without swap, and 1 with a correct GUI, after a full resilver with no swap).

Very stressful, but I think I am good.
 