SOLVED: How to get rid of a phantom zpool?


jbear

Dabbler
Joined
Jul 14, 2013
Messages
29
Hi,

I have a strange problem with one of our backup FreeNAS boxes. A couple of months ago we had an unrecoverable error in our zpool, so we had to build a new pool from scratch. The box has run fine since, but yesterday we had to reboot it (probably for the first time since the new pool was created).

What happens now is that FreeNAS tries to import the old (broken) pool. After booting, running zpool status gives me:

freenas# zpool status
  pool: store1
 state: UNAVAIL
status: One or more devices could not be opened. There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
  scan: none requested
config:

        NAME                      STATE     READ WRITE CKSUM
        store1                    UNAVAIL      0     0     0
          raidz1-0                UNAVAIL      0     0     0
            9465009962783256376   UNAVAIL      0     0     0  was /dev/mfid6
            3777795644875467493   UNAVAIL      0     0     0  was /dev/gptid/690fb07a-9610-11e2-bd8f-14dae9c4f110
            2091297130569300125   UNAVAIL      0     0     0  was /dev/gptid/6982955e-9610-11e2-bd8f-14dae9c4f110
            12910823325045097081  UNAVAIL      0     0     0  was /dev/gptid/69ecfb45-9610-11e2-bd8f-14dae9c4f110
            664568350415246497    UNAVAIL      0     0     0  was /dev/gptid/6a225b3c-9610-11e2-bd8f-14dae9c4f110
            10625246729079390574  UNAVAIL      0     0     0  was /dev/gptid/6a5cf01a-9610-11e2-bd8f-14dae9c4f110
            1657281061953720003   UNAVAIL      0     0     0  was /dev/gptid/6a98ee7b-9610-11e2-bd8f-14dae9c4f110
            70234070040747385     UNAVAIL      0     0     0  was /dev/gptid/3b599ab0-c45c-11e2-80e2-002590ae58de


So, when I export that broken pool and run a "zpool import" to see which pools are actually available, I get only the new (clean) pool:

freenas# zpool import
   pool: store1
     id: 8712011636306578773
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        store1                                        ONLINE
          raidz1-0                                    ONLINE
            gptid/9ae1827c-eeb5-11e2-b81c-002590ae58de  ONLINE
            gptid/ebd60f04-ed54-11e2-b81c-002590ae58de  ONLINE
            gptid/a87f5b8b-efa4-11e2-b81c-002590ae58de  ONLINE
            gptid/b75bee78-f04d-11e2-b81c-002590ae58de  ONLINE
            gptid/f5b82166-f430-11e2-b81c-002590ae58de  ONLINE
            gptid/4d4b22fb-e94c-11e2-b81c-002590ae58de  ONLINE
            gptid/dafa4ec0-ea51-11e2-b81c-002590ae58de  ONLINE
            gptid/c2c814ac-ead5-11e2-b81c-002590ae58de  ONLINE
That's the correct (and clean) pool. When I detach the broken pool and use the auto-import feature of the FreeNAS GUI, it imports the clean pool just fine and all datasets become available.

But when rebooting the machine I have the same problem all over again.

I already tried the following without success: I imported the new (clean) pool under a different name, ran a "zpool destroy" on the old "phantom" pool, and re-imported the clean pool under the correct name (store1). That seemed to work at first, but after the next reboot FreeNAS finds and imports the old "phantom" pool again instead of the correct one.
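From memory, the sequence was roughly this (just a sketch; the numeric ID is the clean pool's ID from the "zpool import" output above, and the temporary name is only an example):

zpool import 8712011636306578773 store1_tmp
zpool destroy store1
zpool export store1_tmp
zpool import store1_tmp store1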
Where does FreeNAS store its metadata about the old "phantom" pool, and how do I get rid of it? I assumed FreeNAS would use the pool ID for the import, but if that were the case it would find the correct pool. It somehow seems to get confused because the old and the new pool have the same name.

What can I do in this case? Any help is appreciated!
Regards,
Joern
 

Dusan

Guru
Joined
Jan 29, 2013
Messages
1,165
Could you please post the output of:
strings /data/zfs/zpool.cache | grep gptid
 

jbear

Dabbler
Joined
Jul 14, 2013
Messages
29
Unfortunately there is no "strings" command in my FreeNAS installation?!?


-bash: strings: command not found
 

Dusan

Guru
Joined
Jan 29, 2013
Messages
1,165
Hmm, it is present in 9.1.1. Anyway, I'm fairly sure that the "phantom" zpool is in zpool.cache. To be 100% sure, you can run cat /data/zfs/zpool.cache and verify that it contains the gptids of the old drives. However, it is a binary file, so without strings you will also see a lot of garbage :).
If you verify that the old pool is stuck in the cache, back up the file (just in case) and delete it. It should get recreated on the next reboot.
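Something along these lines (just a sketch; the .bak name is arbitrary):

cp /data/zfs/zpool.cache /data/zfs/zpool.cache.bak
rm /data/zfs/zpool.cache
reboot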
 

Dusan

Guru
Joined
Jan 29, 2013
Messages
1,165
I just realized that you can also run
zdb -C
to check the content of zpool.cache.
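For example, to see which device paths the cached config still references (sketch; the cached config normally lists each pool member under a "path" entry):

zdb -C | grep path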
 

jbear

Dabbler
Joined
Jul 14, 2013
Messages
29
I deleted zpool.cache and rebooted the machine. Afterwards FreeNAS detects the same old "phantom" pool and creates a new zpool.cache with the old disks in it (just checked with zdb -C).

So, deleting the cache did not do the trick, unfortunately. Any other ideas? Where else could the old pool metadata be stored?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
The pool is probably in the FreeNAS config file. Pretty sure the zpool.cache is never saved on shutdown.
 

Dusan

Guru
Joined
Jan 29, 2013
Messages
1,165
zpool.cache is stored in /data, a read/write filesystem that survives reboot (the config DB itself is stored in /data and survives reboots just fine)
 

Dusan

Guru
Joined
Jan 29, 2013
Messages
1,165
Did you also try to delete zpool.cache.saved?

Another possibility: each physical vdev contains four copies of the vdev label. The label describes the entire pool (including all other vdevs that make up the pool). The four labels should be identical (redundancy).
You can check the labels like this (for the first drive in your current pool):
zdb -l /dev/gptid/9ae1827c-eeb5-11e2-b81c-002590ae58de
Run it for every device/disk and check that all 4 labels are identical and describe the current pool.
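To save some typing, a loop along these lines should work (just a sketch; substitute the full list of your gptids) -- each label should report the pool name and pool_guid:

for d in /dev/gptid/9ae1827c-eeb5-11e2-b81c-002590ae58de \
         /dev/gptid/ebd60f04-ed54-11e2-b81c-002590ae58de; do
    echo "== $d"
    zdb -l "$d" | grep -E "name|pool_guid"
done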

Also, are those 8 drives in the current pool the only drives in the system? Is it possible some other drive is connected that contains an old vdev label?

Also please post output of:
sqlite3 /data/freenas-v1.db "select * from storage_volume"
But I don't think it will be in the DB, as another user reported that the GUI gets stuck in such a case. Also, the table does not contain the IDs of the disks the zpool consists of.
 

jbear

Dabbler
Joined
Jul 14, 2013
Messages
29
zpool.cache is stored in /data, a read/write filesystem that survives reboot (the config DB itself is stored in /data and survives reboots just fine)

For some reason the /data/zfs directory has been empty since I removed the zpool.cache file:

freenas# ls -la /data/zfs/
total 1
drwxr-xr-x 2 root wheel 512 Sep 2 20:17 .
drwxrwxr-x 7 root wheel 512 Sep 3 05:00 ..
Is that ok?
 

jbear

Dabbler
Joined
Jul 14, 2013
Messages
29
Did you also try to delete zpool.cache.saved?

Where would I find this file?

Another possibility: each physical vdev contains four copies of the vdev label. The label describes the entire pool (including all other vdevs that make up the pool). The four labels should be identical (redundancy).
You can check the labels like this (for the first drive in your current pool):
zdb -l /dev/gptid/9ae1827c-eeb5-11e2-b81c-002590ae58de
Run it for every device/disk and check that all 4 labels are identical and describe the current pool.

OK, I've done that now. I checked each of the 8 devices and they all had 4 labels which belonged to the correct (new/clean) pool. So that should be in order, right?

Also, are those 8 drives in the current pool the only drives in the system? Is it possible some other drive is connected that contains an old vdev label?

Well, there's an external USB drive currently connected, but this one has never been part of any ZFS pool; it's formatted with UFS. How can I check whether this drive contains any ZFS metadata?
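(I suppose I could point zdb -l at the raw USB device to check, something like the line below? The da0 device name is just a guess on my part.)

zdb -l /dev/da0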

Also please post output of:
sqlite3 /data/freenas-v1.db "select * from storage_volume"


freenas# sqlite3 /data/freenas-v1.db "select * from storage_volume"
8712011636306578773|store1||ZFS|0|1

That should be correct, too. 8712011636306578773 is the ID of the clean pool:

freenas# zpool import
   pool: store1
     id: 8712011636306578773
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        store1                                        ONLINE
          raidz1-0                                    ONLINE
            gptid/9ae1827c-eeb5-11e2-b81c-002590ae58de  ONLINE
            gptid/ebd60f04-ed54-11e2-b81c-002590ae58de  ONLINE
            gptid/a87f5b8b-efa4-11e2-b81c-002590ae58de  ONLINE
            gptid/b75bee78-f04d-11e2-b81c-002590ae58de  ONLINE
            gptid/f5b82166-f430-11e2-b81c-002590ae58de  ONLINE
            gptid/4d4b22fb-e94c-11e2-b81c-002590ae58de  ONLINE
            gptid/dafa4ec0-ea51-11e2-b81c-002590ae58de  ONLINE
            gptid/c2c814ac-ead5-11e2-b81c-002590ae58de  ONLINE

Any other ideas?
 

jbear

Dabbler
Joined
Jul 14, 2013
Messages
29
The pool is probably in the FreeNAS config file. Pretty sure the zpool.cache is never saved on shutdown.

Which config file would that be? And how can I fix that?

I detached the old "store1" via the FreeNAS GUI. I had to detach it before I could use the GUI's auto-import to actually import the new pool. But I did not check the box that says

"Mark the disks as new (destroy data):"

I was afraid this could destroy my current drives. Is it safe (and necessary) to check this box as well to get rid of the old pool?
 

Dusan

Guru
Joined
Jan 29, 2013
Messages
1,165
Which config file would that be? And how can I fix that?
It's freenas-v1.db. I was quite sure it wouldn't be there, but we already checked it with the sqlite3 query to cover all possibilities.

For some reason the /data/zfs directory has been empty since I removed the zpool.cache file:
This is interesting; when I remove the file on my testing VM it always gets recreated during reboot. However, you mentioned that zdb -C still shows the data. When I run zdb -C without a zpool.cache it fails with an error message ("cannot open ..."). This seems to indicate that your pool is using some other cache file. That could also be the reason why your /data/zfs/zpool.cache was not recreated -- your system is (for some strange reason) not using that one.

Try this:
ls -la /boot/zfs/zpool.cache
This should be a symlink to /data/zfs/zpool.cache
Result on my system:
lrwxr-xr-x 1 root wheel 21 Aug 27 10:50 /boot/zfs/zpool.cache@ -> /data/zfs/zpool.cache

Also check:
zpool get cachefile
This shows you which cache file each pool is using.
Result on my system:
NAME    PROPERTY   VALUE                  SOURCE
backup  cachefile  /data/zfs/zpool.cache  local
tank    cachefile  /data/zfs/zpool.cache  local

(I did a bit of searching and it appears there was a problem with the location of the cache file in early FreeNAS 8.0.x versions: http://forums.freenas.org/threads/8-0-1-beta-2-reset-cachefile-path-zdb-doesnt-work.300/)
 

jbear

Dabbler
Joined
Jul 14, 2013
Messages
29
Try this:
ls -la /boot/zfs/zpool.cache
This should be a symlink to /data/zfs/zpool.cache
Result on my system:
lrwxr-xr-x 1 root wheel 21 Aug 27 10:50 /boot/zfs/zpool.cache@ -> /data/zfs/zpool.cache

Bingo! That solved it. There was an old zpool.cache file (timestamp from June 3rd) in /boot/zfs. Renaming this file and setting the symlink to /data/zfs/zpool.cache fixed the problem. After the reboot the new pool came online automatically.
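Roughly what I did (sketch; the .old suffix is just the name I picked for the stale file):

mv /boot/zfs/zpool.cache /boot/zfs/zpool.cache.old
ln -s /data/zfs/zpool.cache /boot/zfs/zpool.cache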

Thank you so much!
 