TrueNAS keeps importing ZFS pools by /dev/sd<x>, device name changes

LiX47

Dabbler
Joined
Nov 4, 2021
Messages
24
TrueNAS keeps importing some of my pool disks by device name, for example /dev/sda; however, this name occasionally changes between boots for any given disk, causing TrueNAS to lose track of the drive. I've attempted to fix the affected pools by re-importing them via /dev/disk/by-id, and this seems to work; however, upon rebooting, it's back to using /dev/sd<whatever>. This is especially problematic on my home pool, because lost disks appear to cause other services to break, and it's a pain to go through the whole re-importing routine for the home pool.
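For reference, the re-import routine I've been using is roughly this (pool name is just an example):
Code:
zpool export tank
zpool import -d /dev/disk/by-id tank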
 
Joined
Oct 22, 2019
Messages
3,641
The GUI might be showing it differently.

Can you confirm with:
zpool status -vv
 

LiX47

Dabbler
Joined
Nov 4, 2021
Messages
24
That was indeed already based on what the shell was showing, not the GUI.
[attached screenshot of zpool status output]
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
How was this pool created? You can do a replace operation - replacing a disk referred to via e.g. /dev/sdX with itself using the proper device path. Or in case of a mirror even simpler: do a detach followed by an attach.
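A rough sketch of both approaches (pool and device names below are placeholders; substitute your real pool name and the matching /dev/disk/by-id entries, and note that ZFS may ask for -f if the disk still carries an old label):
Code:
# replace a /dev/sdX member with itself, referenced by its stable ID
zpool replace tank sda /dev/disk/by-id/ata-DISKMODEL_SERIAL

# or, for a mirror: detach the member, then re-attach it by ID
# (sdb here stands for the member that stays in the mirror)
zpool detach tank sda
zpool attach tank sdb /dev/disk/by-id/ata-DISKMODEL_SERIAL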
 

LiX47

Dabbler
Joined
Nov 4, 2021
Messages
24
I've been doing the detach/attach cycle plenty; I keep attaching by ID, and then after a reboot, they're back to being like this.
 
Joined
Oct 22, 2019
Messages
3,641
I've been doing the detach/attach cycle plenty, I keep attaching by ID and then after a reboot, they're back to being like this
Like @Patrick M. Hausen asked, how/where was the pool created? In Core? In a non-TrueNAS Linux system?


I keep attaching by ID and then after a reboot, they're back to being like this
I wonder if a stale or corrupt zpool.cache file can cause this.


You could have it generate a new zpool.cache file by setting the cachefile property to "none" for all pools, and then setting it back to /data/zfs/zpool.cache for all pools.
Code:
zpool set cachefile=none pool1
zpool set cachefile=none pool2


Then do your detach-attach and/or export/re-import with the proper convention.

Followed by this, assuming the above works:
Code:
zpool set cachefile=/data/zfs/zpool.cache pool1
zpool set cachefile=/data/zfs/zpool.cache pool2


Disclaimer: I've tested out unsetting/resetting the cachefile property on my pools on my Core system, and it worked as expected. I'm not sure if there's any risk with SCALE, or if there is something under the hood that TrueNAS (SCALE or Core) does to forcefully "reuse" a cachefile upon import of any pool. (It does keep a "zpool.cache.saved" file in the same location.)
 

LiX47

Dabbler
Joined
Nov 4, 2021
Messages
24
This pool was definitely originally created on CORE years ago, before I migrated over to SCALE, but it's a ship of Theseus at this point, between swapping drives around, importing/exporting, etc. I can try mucking about with those suggestions regarding the cache file when I get a chance, though.
 
Joined
Oct 22, 2019
Messages
3,641
I can try mucking about with those suggestions regarding the cache file when I get a chance though
Before you unset/set anything, it's possible to peek inside the zpool.cache file.

Code:
strings /data/zfs/zpool.cache | grep gptid


(Thanks to this old post.)
 
Joined
Oct 22, 2019
Messages
3,641
That command produces no output for me :(
Interesting.......

What about the ".saved" file?
Code:
strings /data/zfs/zpool.cache.saved | grep gptid


It's looking like you need to unset the cachefile (for all pools, which will remove /data/zfs/zpool.cache) while the pools are imported with the correct device identifiers, and then reset the cachefile again.
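Put together for a single pool, that sequence would look something like this (pool name is an example, and the import path assumes the by-id convention is what you want):
Code:
zpool set cachefile=none home
zpool export home
zpool import -d /dev/disk/by-id home
zpool set cachefile=/data/zfs/zpool.cache home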


EDIT: Wait! I remembered you're on SCALE (Linux).

The wording might be different.
Try this:
Code:
strings /data/zfs/zpool.cache | grep /dev/


Or you can scroll through the file manually with arrow keys:
Code:
strings /data/zfs/zpool.cache | less

See if you find any references to the device names.

Press "q" to quit.
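If it helps to cross-reference which /dev/sdX name currently maps to which stable identifier, these should show the mapping on SCALE:
Code:
lsblk -o NAME,SIZE,SERIAL,PARTUUID
ls -l /dev/disk/by-id/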
 
Joined
Oct 22, 2019
Messages
3,641
Oh there we go, that makes a little more sense.
And to confirm, those bottom two entries are for a data pool, not your boot-pool, correct?

If that's the case, and you're feeling comfortable going forward, you can try the "reset cachefile method" explained above.

EDIT: To be honest though, I'm not sure if TrueNAS uses the ".saved" file as a contingency if it detects the absence of the "zpool.cache" file in the expected location.
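If you want to see what's currently sitting there before touching anything:
Code:
ls -l /data/zfs/zpool.cache*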
 

LiX47

Dabbler
Joined
Nov 4, 2021
Messages
24
That's my home pool (Docker images, non-root users, etc.), yes. I'll give that a test when I can and report back. Also, thanks for the help so far!
 
Joined
Oct 22, 2019
Messages
3,641
Roger that. Make sure you only do it for "pool1", "pool2", etc., not your boot-pool (since it doesn't use the cachefile).

In fact, "removing" the zpool.cache file might mean you can just export and re-import the troubled "home" pool using the GUI, assuming that TrueNAS SCALE will automatically import it via the partuuid identifiers.
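Once it's re-imported, you can double-check which device paths ZFS is actually using (substitute your pool name):
Code:
zpool status -P home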
 

LiX47

Dabbler
Joined
Nov 4, 2021
Messages
24
Didn't seem to work :/

Clearing the cachefile value, exporting (via the UI), and importing (via the command line) produced correct imports; however, the pool wasn't visible in the UI. Exporting the pool again (via the command line; the device strings are no longer in the cachefile, by the way) and then importing via the UI, and it's once again stuck on, in this case, sde/sdc. It seems TrueNAS SCALE is either caching this elsewhere somehow, or just really wants things imported this way; not sure why...
 
Joined
Oct 22, 2019
Messages
3,641
It seems TrueNAS SCALE is either caching this elsewhere somehow, or just really wants things imported this way; not sure why...
I know it keeps a copy named zpool.cache.saved
 
Joined
Oct 22, 2019
Messages
3,641
On Core, I removed it (well, moved it elsewhere) as a test. Nothing bad happened. Re-importing my pools (with the GUI) and then setting the cachefile property brought back the cache file like normal.
 

LiX47

Dabbler
Joined
Nov 4, 2021
Messages
24
OK, I think I got it...
I was struggling to get it to do what I wanted; no amount of fiddling with the files or the command line helped. But doing the detach/attach via the UI (Storage -> pool -> Manage Devices; it took insanely long to attach, by the way, and the UI calls it "extending" the vdev rather than attaching a device) seems to have done it... Once both drives are done and resilvered, I'll reboot again and check back.

Really annoying that the UI is so insistent on fighting the shell, though; there's no reason why I should "have" to do things one way vs. the other.
 
Joined
Oct 22, 2019
Messages
3,641
but doing the detach/attach via the UI (Storage -> pool -> Manage Devices; it took insanely long to attach, by the way, and the UI calls it "extending" the vdev rather than attaching a device)
No one should be expected to do that. This whole ordeal feels like a bug on SCALE's side.
 