SOLVED After upgrading from CORE to SCALE, one pool cannot be imported

airflow

Contributor
Joined
May 29, 2014
Messages
111
Hi! After using a TrueNAS Core setup in my network for many years, I decided to have a look at TrueNAS Scale.

The hardware I'm using is the following:

ASRock Rack E3C226D2I with Intel Xeon E3-1270 v3, 4C/8T, 3.50-3.90GHz, 16GB ECC RAM
pool DATA consisting of 6x "TOSHIBA HDWE140"
pool SSD consisting of 1x "WD_BLACK AN1500" (this is an NVMe drive which goes directly into the PCIe slot)

I tried both upgrading the existing CORE installation (TrueNAS-13.0-U6.1) and doing a fresh install to a new USB drive with TrueNAS-SCALE-23.10.2.

In both cases, the pool consisting of the spinning rust was imported successfully. The SSD pool, however, is not shown as importable.

However, the NVMe device itself is recognized and listed properly. Actually, it is offered for creating a new pool, but there is no option to import the existing pool there.

If I boot back into TrueNAS CORE, everything works fine and both pools can be used.

Any idea what to do here? One idea would be to zfs-copy the SSD pool's data to the DATA pool in CORE, re-create the pool in SCALE and then copy it back. But I want to be sure that I can boot back into TrueNAS CORE until I have completely migrated all my jails to some container equivalent on SCALE. That's why I would prefer to just import the existing pool in SCALE.
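
If it comes to that, my rough plan would be something like the following sketch (the snapshot and target dataset names are just placeholders I made up):
Code:
# take a recursive snapshot of the SSD pool
zfs snapshot -r SSD@migrate
# replicate everything into a temporary dataset on the DATA pool;
# -u keeps the received datasets unmounted so their mountpoints don't clash
zfs send -R SSD@migrate | zfs recv -u DATA/ssd-backup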

Thanks,
airflow
 
Joined
Oct 22, 2019
Messages
3,641
Does your SSD pool house your System Dataset by any chance?
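
You can check which pool currently houses it from the shell; I believe the middleware call below works the same on CORE and SCALE (the GUI shows it as well, under the System Dataset settings):
Code:
# print the current System Dataset configuration, including the pool it lives on
midclt call systemdataset.config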
 
Joined
Oct 22, 2019
Messages
3,641
Are there any remnant ".system" datasets/children that remain on the SSD pool?
Code:
zfs list -r -t filesystem -o name ssdpool | grep "\.system"
 
Joined
Oct 22, 2019
Messages
3,641
Yes. But before doing anything, make a temporary safety checkpoint of the pool, first:
Code:
zpool checkpoint ssdpool


Confirm that there is a "size" value, which means the checkpoint exists:
Code:
zpool get checkpoint


Then confirm that these datasets are not mounted. (The only mounted ".system"-associated datasets should be those of your main pool; assuming that you intentionally set the main pool to house your System Dataset.) I must stress: You do NOT want to destroy your actual System Dataset that is currently in use.
Code:
mount | grep "\.system"
You should only see references to your main pool (not the SSD pool's datasets/children).


Then do a dry-run recursive destruction to make sure it looks correct:
Code:
zfs destroy -n -vr ssdpool/.system


If it looks correct, go ahead and destroy them by removing the "-n" flag:
Code:
zfs destroy -vr ssdpool/.system



Now try "side-grading" / "fresh installing" / export-importing into SCALE.
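
And should anything go wrong at this point, the pool can be rolled back to the checkpoint. Note that a rewind discards everything written to the pool after the checkpoint was taken, so treat it as a last resort:
Code:
zpool export ssdpool
zpool import --rewind-to-checkpoint ssdpool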


If everything seems to be in working order, you can remove the checkpoint:
Code:
zpool checkpoint -d ssdpool



Confirm the checkpoint is gone. There should be no "size" value:
Code:
zpool get checkpoint
 
Last edited:
Joined
Oct 22, 2019
Messages
3,641
Actually, it is offered for creating a new pool, but there is no option to import the existing pool there.
This is disturbing...

I wonder if it's related to the SCALE bug regarding partitioning?
 

airflow

Contributor
Joined
May 29, 2014
Messages
111
Yes. But before doing anything, make a temporary safety checkpoint of the pool, first:
Code:
zpool checkpoint ssdpool

Thanks for your detailed instructions and help. I did all this in the last half hour.

Removal of the .system dataset worked without problems. The old CORE installation works fine just as before. For the SCALE installation, it doesn't change anything: the pool cannot be imported and is not listed as available. Should I file a bug?
 
Joined
Oct 22, 2019
Messages
3,641
While in SCALE, can you list the block devices and partitions?
Code:
lsblk


(Don't do anything "ZFS"-related, and don't try to import anything.)
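
If the NVMe shows up there without any partitions, it might also be worth printing its GPT with a read-only tool (sgdisk should be part of SCALE's base system, and the device name is my assumption based on your description):
Code:
# print the partition table as Linux sees it; this only reads, never writes
sgdisk -p /dev/nvme0n1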
 

airflow

Contributor
Joined
May 29, 2014
Messages
111
While in SCALE, can you list the block devices and partitions?
Code:
lsblk


(Don't do anything "ZFS"-related, and don't try to import anything.)

This gives:

Code:
root@truenas[~]# lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINTS
sda           8:0    0  3.6T  0 disk
├─sda1        8:1    0    2G  0 part
│ └─md126     9:126  0    2G  0 raid1
│   └─md126 253:0    0    2G  0 crypt [SWAP]
└─sda2        8:2    0  3.6T  0 part
sdb           8:16   0  3.6T  0 disk
├─sdb1        8:17   0    2G  0 part
│ └─md127     9:127  0    2G  0 raid1
│   └─md127 253:1    0    2G  0 crypt [SWAP]
└─sdb2        8:18   0  3.6T  0 part
sdc           8:32   0  3.6T  0 disk
├─sdc1        8:33   0    2G  0 part
│ └─md127     9:127  0    2G  0 raid1
│   └─md127 253:1    0    2G  0 crypt [SWAP]
└─sdc2        8:34   0  3.6T  0 part
sdd           8:48   0  3.6T  0 disk
├─sdd1        8:49   0    2G  0 part
│ └─md126     9:126  0    2G  0 raid1
│   └─md126 253:0    0    2G  0 crypt [SWAP]
└─sdd2        8:50   0  3.6T  0 part
sde           8:64   0  3.6T  0 disk
├─sde1        8:65   0    2G  0 part
│ └─md127     9:127  0    2G  0 raid1
│   └─md127 253:1    0    2G  0 crypt [SWAP]
└─sde2        8:66   0  3.6T  0 part
sdf           8:80   0  3.6T  0 disk
├─sdf1        8:81   0    2G  0 part
│ └─md126     9:126  0    2G  0 raid1
│   └─md126 253:0    0    2G  0 crypt [SWAP]
└─sdf2        8:82   0  3.6T  0 part
sdg           8:96   1 14.4G  0 disk
├─sdg1        8:97   1    1M  0 part
├─sdg2        8:98   1  512M  0 part
└─sdg3        8:99   1 13.9G  0 part
nvme0n1     259:0    0  1.8T  0 disk
 
Joined
Oct 22, 2019
Messages
3,641
Did the code get cropped?

It abruptly stops at the listing of your NVMe.
 
Joined
Oct 22, 2019
Messages
3,641
This appears to be a hardware issue then, and I would pay particular attention to the fact that this is an NVMe drive connected to a PCI-e slot. (Are you using an "adapter" to achieve this?)

This is out of my wheelhouse, and someone more versed in hardware quirks for SCALE might be able to help explain why SCALE does not have access to the underlying partitions of your NVMe when it's connected to a PCI-e slot.

EDIT: I'm assuming you mean "PCI-e slot" as in what most people consider it. I don't believe you mean the m.2 slot that is configured for PCI-e via its "keying slots".

@HoneyBadger

@morganL
 

airflow

Contributor
Joined
May 29, 2014
Messages
111
This appears to be a hardware issue then, and I would pay particular attention to the fact that this is an NVMe drive connected to a PCI-e slot. (Are you using an "adapter" to achieve this?)

This is out of my wheelhouse, and someone more versed in hardware quirks for SCALE might be able to help explain why SCALE does not have access to the underlying partitions of your NVMe when it's connected to a PCI-e slot.

EDIT: I'm assuming you mean "PCI-e slot" as in what most people consider it. I don't believe you mean the m.2 slot that is configured for PCI-e via its "keying slots".

@HoneyBadger

@morganL

Well thanks for trying!

Regarding the hardware: No, I don't use an adapter, but I think the product itself might internally use such an adapter. For me as the end user, the device is plugged directly into the PCI-e slot, in the sense most people would interpret it.

[Attachment: wd-black-an1500-nvme-ssd.png]
 
Joined
Oct 22, 2019
Messages
3,641
Oh, it's one of those.

I wonder if FreeBSD and Linux treat them differently, and hence, when you partition it under FreeBSD, the result is not properly reflected under Linux?
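
One read-only way to compare would be to look at the device from the CORE side as well. Assuming the NVMe shows up as nvd0 there (the usual name for the first NVMe namespace on FreeBSD), something like:
Code:
# show how GEOM interprets the drive's partition table
gpart show nvd0
# GEOM also logs GPT complaints at boot, which would show up here
dmesg | grep -i gpt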
 

airflow

Contributor
Joined
May 29, 2014
Messages
111
Oh, it's one of those.

I wonder if FreeBSD and Linux treat them differently, and hence, when you partition it under FreeBSD, the result is not properly reflected under Linux?

And I wonder what would happen if I added the device to a new pool while booted into TrueNAS SCALE. Perhaps it would use slightly different partitioning and would then work under both CORE and SCALE. In that case I would gladly do the copy-back-and-forth job. It's just quite time-consuming to try...
 
Joined
Oct 22, 2019
Messages
3,641
And I wonder what would happen if I added the device to a new pool while booted into TrueNAS SCALE.
You'll essentially destroy everything that already exists on that pool (which is currently accessible on Core).


So yes, you'd have to make a good backup/copy of everything (or the entire pool's dataset contents replicated to a temporary safe place), and then you can create a new pool under SCALE. Beware that once you commit to this, there's no turning back.


Perhaps it would use slightly different partitioning and would then work under both CORE and SCALE.
I'm not sure about this. Someone else who has more savvy with such hardware and the differences between Linux and FreeBSD might know.
 

airflow

Contributor
Joined
May 29, 2014
Messages
111
I can report that I was able to solve this problem with the help of iXsystems.

A hint about the problem could be found in the logs of the CORE system:

Code:
Mar 30 09:41:32 fractal nvd0: <WD_BLACK AN1500> NVMe namespace
Mar 30 09:41:32 fractal nvd0: 1907611MB (3906787328 512 byte sectors)
Mar 30 09:41:32 fractal GEOM: nvd0: the primary GPT table is corrupt or invalid.
Mar 30 09:41:32 fractal GEOM: nvd0: using the secondary instead -- recovery strongly advised.


So it seems there was something corrupt in the partition table of the drive, which FreeBSD could swallow but Linux couldn't.

The solution was to recover the partition table with the following command:
Code:
gpart recover nvd0


After this, the pool could be imported in SCALE as well.
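
For completeness, the result can be verified with read-only commands on both sides:
Code:
# on CORE: should now print the GPT without the "corrupt or invalid" warning
gpart show nvd0
# on SCALE: the pool's partition should now show up under the NVMe device
lsblk /dev/nvme0n1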
 