Drives found but not pool?

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Hello folks,

In the course of a series of migrations and updates, my pool no longer seems to import.

The following happened unplanned across several months (hence the somewhat >WTF< ...)

Machine 1: X11SSL, ESXi 6.5, with a passed-through 9201-16i. It had been running for a few years. The last known FreeNAS version was 11.2-U7 (as far as I can remember, I did the ZFS upgrade a few patches ago). That config file was imported into machine 2 successfully.
Then I moved the drives over to Machine 2: a CSE-847 chassis, X8 system, with a 9211-8i (which also ran FreeNAS successfully for a while during 2017), running FreeNAS bare metal.
I did not manage to install 11.2-U7 (https://download.freenas.org/11.2/STABLE/U7/).
Instead I installed 11.3-U3.2 and imported my last working config from 11.2-U7.

The import seems successful in terms of general settings. However, the pool "wd60efrx" is listed as "unknown" in the UI under the Storage/Pools menu.
In Storage/Disks, all drives are shown as "unused".
I <might> have had a LOG device connected at some point in time; could its absence cause the pool to ...do whatever it is doing?

Code:
root@blackhole:~ # zpool list
NAME           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
freenas-boot  29.5G  1.01G  28.5G        -         -     0%     3%  1.00x  ONLINE  -
root@blackhole:~ # zfs list
NAME                                                            USED  AVAIL  REFER  MOUNTPOINT
freenas-boot                                                   1.01G  27.6G    23K  none
freenas-boot/.system                                           6.09M  27.6G    25K  legacy
freenas-boot/.system/configs-0dc2ca1e7fa9464d8c4d7c4fd81f6855    23K  27.6G    23K  legacy
freenas-boot/.system/configs-646f8dae97d646cc8946ddeb0ca79d97    23K  27.6G    23K  legacy
freenas-boot/.system/configs-918f8f8785054bb0919cdacb5658a136    23K  27.6G    23K  legacy
freenas-boot/.system/cores                                      450K  27.6G   450K  legacy
freenas-boot/.system/rrd-0dc2ca1e7fa9464d8c4d7c4fd81f6855      1.84M  27.6G  1.84M  legacy
freenas-boot/.system/rrd-646f8dae97d646cc8946ddeb0ca79d97      1.24M  27.6G  1.24M  legacy
freenas-boot/.system/rrd-918f8f8785054bb0919cdacb5658a136      1.73M  27.6G  1.73M  legacy
freenas-boot/.system/samba4                                      53K  27.6G    53K  legacy
freenas-boot/.system/syslog-0dc2ca1e7fa9464d8c4d7c4fd81f6855   80.5K  27.6G  80.5K  legacy
freenas-boot/.system/syslog-646f8dae97d646cc8946ddeb0ca79d97    390K  27.6G   390K  legacy
freenas-boot/.system/syslog-918f8f8785054bb0919cdacb5658a136    218K  27.6G   218K  legacy
freenas-boot/.system/webui                                       23K  27.6G    23K  legacy
freenas-boot/ROOT                                              1.00G  27.6G    23K  none
freenas-boot/ROOT/Initial-Install                                 1K  27.6G  1019M  legacy
freenas-boot/ROOT/default                                      1.00G  27.6G  1019M  legacy
root@blackhole:~ # zpool status wd60efrx
cannot open 'wd60efrx': no such pool


Cheers
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Can you run zdb -U /data/zfs/zpool.cache -eCC wd60efrx from an SSH session and post the results in CODE tags?

Code:
root@blackhole:~ # zdb -U /data/zfs/zpool.cache -eCC wd60efrx

Configuration for import:
        vdev_children: 2
        version: 5000
        pool_guid: 11000648889599319938
        name: 'wd60efrx'
        state: 0
        hostid: 337067445
        hostname: ''
        vdev_tree:
            type: 'root'
            id: 0
            guid: 11000648889599319938
            children[0]:
                type: 'raidz'
                id: 0
                guid: 10276980902104932441
                nparity: 2
                metaslab_array: 35
                metaslab_shift: 38
                ashift: 12
                asize: 41993159835648
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 17558972081548652608
                    whole_disk: 1
                    DTL: 228
                    create_txg: 4
                    path: '/dev/gptid/14ef1fa6-e0a4-11e5-b134-0cc47ab3208c'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 7222625354766742649
                    whole_disk: 1
                    DTL: 220
                    create_txg: 4
                    path: '/dev/gptid/15c495ba-e0a4-11e5-b134-0cc47ab3208c'
                children[2]:
                    type: 'disk'
                    id: 2
                    guid: 5000400009514829905
                    whole_disk: 1
                    DTL: 203
                    create_txg: 4
                    path: '/dev/gptid/16990bee-e0a4-11e5-b134-0cc47ab3208c'
                children[3]:
                    type: 'disk'
                    id: 3
                    guid: 294471879250878871
                    whole_disk: 1
                    DTL: 195
                    create_txg: 4
                    path: '/dev/gptid/1769399b-e0a4-11e5-b134-0cc47ab3208c'
                children[4]:
                    type: 'disk'
                    id: 4
                    guid: 6406197435911464822
                    whole_disk: 1
                    DTL: 193
                    create_txg: 4
                    path: '/dev/gptid/18479def-e0a4-11e5-b134-0cc47ab3208c'
                children[5]:
                    type: 'disk'
                    id: 5
                    guid: 17540946459795037337
                    whole_disk: 1
                    DTL: 192
                    create_txg: 4
                    path: '/dev/gptid/1911207e-e0a4-11e5-b134-0cc47ab3208c'
                children[6]:
                    type: 'disk'
                    id: 6
                    guid: 11030673002167230266
                    whole_disk: 1
                    DTL: 63
                    create_txg: 4
                    path: '/dev/gptid/36d83d93-b0a9-11e9-9f80-000c298edbf1'
            children[1]:
                type: 'missing'
                id: 1
                guid: 0

MOS Configuration:
        version: 5000
        name: 'wd60efrx'
        state: 0
        txg: 12631952
        pool_guid: 11000648889599319938
        hostid: 337067445
        hostname: ''
        com.delphix:has_per_vdev_zaps
        vdev_children: 2
        vdev_tree:
            type: 'root'
            id: 0
            guid: 11000648889599319938
            children[0]:
                type: 'raidz'
                id: 0
                guid: 10276980902104932441
                nparity: 2
                metaslab_array: 35
                metaslab_shift: 38
                ashift: 12
                asize: 41993159835648
                is_log: 0
                create_txg: 4
                com.delphix:vdev_zap_top: 129
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 17558972081548652608
                    path: '/dev/gptid/14ef1fa6-e0a4-11e5-b134-0cc47ab3208c'
                    whole_disk: 1
                    DTL: 228
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 140
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 7222625354766742649
                    path: '/dev/gptid/15c495ba-e0a4-11e5-b134-0cc47ab3208c'
                    whole_disk: 1
                    DTL: 220
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 141
                children[2]:
                    type: 'disk'
                    id: 2
                    guid: 5000400009514829905
                    path: '/dev/gptid/16990bee-e0a4-11e5-b134-0cc47ab3208c'
                    whole_disk: 1
                    DTL: 203
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 142
                children[3]:
                    type: 'disk'
                    id: 3
                    guid: 294471879250878871
                    path: '/dev/gptid/1769399b-e0a4-11e5-b134-0cc47ab3208c'
                    whole_disk: 1
                    DTL: 195
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 144
                children[4]:
                    type: 'disk'
                    id: 4
                    guid: 6406197435911464822
                    path: '/dev/gptid/18479def-e0a4-11e5-b134-0cc47ab3208c'
                    whole_disk: 1
                    DTL: 193
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 145
                children[5]:
                    type: 'disk'
                    id: 5
                    guid: 17540946459795037337
                    path: '/dev/gptid/1911207e-e0a4-11e5-b134-0cc47ab3208c'
                    whole_disk: 1
                    DTL: 192
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 146
                children[6]:
                    type: 'disk'
                    id: 6
                    guid: 11030673002167230266
                    path: '/dev/gptid/36d83d93-b0a9-11e9-9f80-000c298edbf1'
                    whole_disk: 1
                    DTL: 63
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 58
            children[1]:
                type: 'disk'
                id: 1
                guid: 517946242947266850
                path: '/dev/gptid/ae50a861-e463-11e8-8177-000c298edbf1'
                whole_disk: 1
                metaslab_array: 171
                metaslab_shift: 28
                ashift: 12
                asize: 40006320128
                is_log: 1
                DTL: 286
                create_txg: 11328713
                com.delphix:vdev_zap_leaf: 165
                com.delphix:vdev_zap_top: 166
        features_for_read:
            com.delphix:hole_birth
            com.delphix:embedded_data
space map refcount mismatch: expected 214 != actual 201
root@blackhole:~ #


The ae50a861-e463-11e8-8177-000c298edbf1 is the log device I mentioned.
I'm gonna attempt to locate the drive and install it. It might not be lost. *fingers crossed*
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
The ae50a861-e463-11e8-8177-000c298edbf1 is the log device I mentioned.
I'm gonna attempt to locate the drive and install it. It might not be lost. *fingers crossed*
If your log device is permanently lost (either "physically gone" or "present but reformatted/overwritten") then you can import the pool manually via zpool import -m poolname to drop the outstanding transactions. You might lose some data if you didn't have a clean export and shutdown.

Once that's done, you can detach the log device from the imported pool, export it, and import again through the GUI to ensure all the GUIDs are lined up and the FreeNAS cachefile agrees with the on-disk config.
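
Roughly, the sequence from the CLI would look something like this (just a sketch; substitute the log vdev's GUID from your own zdb/zpool output before removing anything):

Code:
zpool import -m wd60efrx                  # import while tolerating the missing log device
zpool status wd60efrx                     # confirm pool state and note the log vdev's GUID
zpool remove wd60efrx 517946242947266850  # a standalone log vdev is removed by GUID with 'zpool remove'
zpool export wd60efrx                     # export cleanly, then re-import through the GUI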

Sidebar:

asize: 40006320128

What drive is this, and what was the intended purpose of adding a log vdev here?
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
I found the LOG(!) and installed it.
The pool wd60efrx is still marked as "Unknown" under Storage/Pools in the GUI.
Here is the output again for the same command.

Code:
root@blackhole:~ # zdb -U /data/zfs/zpool.cache -eCC wd60efrx

Configuration for import:
        vdev_children: 2
        version: 5000
        pool_guid: 11000648889599319938
        name: 'wd60efrx'
        state: 0
        hostid: 337067445
        hostname: ''
        vdev_tree:
            type: 'root'
            id: 0
            guid: 11000648889599319938
            children[0]:
                type: 'raidz'
                id: 0
                guid: 10276980902104932441
                nparity: 2
                metaslab_array: 35
                metaslab_shift: 38
                ashift: 12
                asize: 41993159835648
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 17558972081548652608
                    whole_disk: 1
                    DTL: 228
                    create_txg: 4
                    path: '/dev/gptid/14ef1fa6-e0a4-11e5-b134-0cc47ab3208c'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 7222625354766742649
                    whole_disk: 1
                    DTL: 220
                    create_txg: 4
                    path: '/dev/gptid/15c495ba-e0a4-11e5-b134-0cc47ab3208c'
                children[2]:
                    type: 'disk'
                    id: 2
                    guid: 5000400009514829905
                    whole_disk: 1
                    DTL: 203
                    create_txg: 4
                    path: '/dev/gptid/16990bee-e0a4-11e5-b134-0cc47ab3208c'
                children[3]:
                    type: 'disk'
                    id: 3
                    guid: 294471879250878871
                    whole_disk: 1
                    DTL: 195
                    create_txg: 4
                    path: '/dev/gptid/1769399b-e0a4-11e5-b134-0cc47ab3208c'
                children[4]:
                    type: 'disk'
                    id: 4
                    guid: 6406197435911464822
                    whole_disk: 1
                    DTL: 193
                    create_txg: 4
                    path: '/dev/gptid/18479def-e0a4-11e5-b134-0cc47ab3208c'
                children[5]:
                    type: 'disk'
                    id: 5
                    guid: 17540946459795037337
                    whole_disk: 1
                    DTL: 192
                    create_txg: 4
                    path: '/dev/gptid/1911207e-e0a4-11e5-b134-0cc47ab3208c'
                children[6]:
                    type: 'disk'
                    id: 6
                    guid: 11030673002167230266
                    whole_disk: 1
                    DTL: 63
                    create_txg: 4
                    path: '/dev/gptid/36d83d93-b0a9-11e9-9f80-000c298edbf1'
            children[1]:
                type: 'missing'
                id: 1
                guid: 0

MOS Configuration:
        version: 5000
        name: 'wd60efrx'
        state: 0
        txg: 12631952
        pool_guid: 11000648889599319938
        hostid: 337067445
        hostname: ''
        com.delphix:has_per_vdev_zaps
        vdev_children: 2
        vdev_tree:
            type: 'root'
            id: 0
            guid: 11000648889599319938
            children[0]:
                type: 'raidz'
                id: 0
                guid: 10276980902104932441
                nparity: 2
                metaslab_array: 35
                metaslab_shift: 38
                ashift: 12
                asize: 41993159835648
                is_log: 0
                create_txg: 4
                com.delphix:vdev_zap_top: 129
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 17558972081548652608
                    path: '/dev/gptid/14ef1fa6-e0a4-11e5-b134-0cc47ab3208c'
                    whole_disk: 1
                    DTL: 228
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 140
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 7222625354766742649
                    path: '/dev/gptid/15c495ba-e0a4-11e5-b134-0cc47ab3208c'
                    whole_disk: 1
                    DTL: 220
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 141
                children[2]:
                    type: 'disk'
                    id: 2
                    guid: 5000400009514829905
                    path: '/dev/gptid/16990bee-e0a4-11e5-b134-0cc47ab3208c'
                    whole_disk: 1
                    DTL: 203
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 142
                children[3]:
                    type: 'disk'
                    id: 3
                    guid: 294471879250878871
                    path: '/dev/gptid/1769399b-e0a4-11e5-b134-0cc47ab3208c'
                    whole_disk: 1
                    DTL: 195
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 144
                children[4]:
                    type: 'disk'
                    id: 4
                    guid: 6406197435911464822
                    path: '/dev/gptid/18479def-e0a4-11e5-b134-0cc47ab3208c'
                    whole_disk: 1
                    DTL: 193
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 145
                children[5]:
                    type: 'disk'
                    id: 5
                    guid: 17540946459795037337
                    path: '/dev/gptid/1911207e-e0a4-11e5-b134-0cc47ab3208c'
                    whole_disk: 1
                    DTL: 192
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 146
                children[6]:
                    type: 'disk'
                    id: 6
                    guid: 11030673002167230266
                    path: '/dev/gptid/36d83d93-b0a9-11e9-9f80-000c298edbf1'
                    whole_disk: 1
                    DTL: 63
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 58
            children[1]:
                type: 'disk'
                id: 1
                guid: 517946242947266850
                path: '/dev/gptid/ae50a861-e463-11e8-8177-000c298edbf1'
                whole_disk: 1
                metaslab_array: 171
                metaslab_shift: 28
                ashift: 12
                asize: 40006320128
                is_log: 1
                DTL: 286
                create_txg: 11328713
                com.delphix:vdev_zap_leaf: 165
                com.delphix:vdev_zap_top: 166
        features_for_read:
            com.delphix:hole_birth
            com.delphix:embedded_data
space map refcount mismatch: expected 214 != actual 201



Sidebar:
What drive is this, and what was the intended purpose of adding a log vdev here?

The drive is a Samsung SM863 (it has End-to-End Protection and Power-Loss Protection).
I added it as a LOG during a time when I hosted VMs on the spinning-rust pool.
The reason it looks awkwardly small is that I shrank the drive to a less overkill size, hoping to extend its lifespan (IIRC Stux posted a guide a few years ago that I followed).
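
(I don't recall the exact commands from that guide, but the general idea on FreeBSD is a host-protected-area resize along these lines; ada1 and the sector count are placeholders rather than what I actually used:)

Code:
diskinfo -v ada1                      # note the drive's current sector count first
camcontrol hpa ada1 -s 78125000 -y    # cap the visible capacity at roughly 40 GB (placeholder sector count)
# check 'man camcontrol' for the persistence flag; a power cycle may be needed before the new size sticks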

Is the best move to do
Code:
zpool import -m wd60efrx
then scrub if successful and call the situation 'as good as it is gonna get'?
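
(And assuming the import goes through, I figure the scrub part is simply:)

Code:
zpool scrub wd60efrx     # kick off the scrub
zpool status wd60efrx    # check progress and any errors found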

Cheers
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Adding information:

Code:
root@blackhole:~ # zpool import
   pool: wd60efrx
     id: 11000648889599319938
  state: UNAVAIL
 status: The pool was last accessed by another system.
 action: The pool cannot be imported due to damaged devices or data.
    see: http://illumos.org/msg/ZFS-8000-EY
 config:

        wd60efrx                                        UNAVAIL  missing device
          raidz2-0                                      ONLINE
            gptid/14ef1fa6-e0a4-11e5-b134-0cc47ab3208c  ONLINE
            gptid/15c495ba-e0a4-11e5-b134-0cc47ab3208c  ONLINE
            gptid/16990bee-e0a4-11e5-b134-0cc47ab3208c  ONLINE
            gptid/1769399b-e0a4-11e5-b134-0cc47ab3208c  ONLINE
            gptid/18479def-e0a4-11e5-b134-0cc47ab3208c  ONLINE
            gptid/1911207e-e0a4-11e5-b134-0cc47ab3208c  ONLINE
            gptid/36d83d93-b0a9-11e9-9f80-000c298edbf1  ONLINE
        logs
          517946242947266850                            UNAVAIL  cannot open
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Huge thanks to @HoneyBadger
I attempted to import the pool using -m.
That did not work; I was encouraged to add -f.

I prevailed and successfully imported the pool using -mf.
Then I proceeded to remove the SLOG without problem.
Then I exported the pool via the GUI, followed by importing it again via the GUI.
Currently running a scrub. The last one was almost exactly 6 months ago, i.e., the last time the drives were operational before machine #1 was repurposed. I am definitely going to remember to export my pool the next time around....

Cheers!

Hopefully this turns out okay too:
Code:
  scan: scrub in progress since Mon Jun 22 21:54:56 2020
        8.93T scanned at 834M/s, 146G issued at 144M/s, 29.7T total
        0 repaired, 0.48% done, 2 days 11:46:10 to go
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Unfortunately the story has not ended yet.
Machine #2 (the X8 system) has had loads of issues since its first night running.

Shortly after midnight the machine seemed to go into "sleep mode" for approx 8 hrs; when I checked the screen, it was in the middle of a reboot.
A while of observing and tinkering screamed RAM issues.

I've spent the last 7 hrs trying to create a bootable USB stick that boots in legacy BIOS mode on the old machine.
It ...should not be this difficult.

On the stick, I've set the boot flag and formatted it to FAT32.
Tried the regular
Code:
dd if=path of=path
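
(To be precise, what I ran was along these lines; the image name and /dev/da1 are just examples, not the actual paths:)

Code:
dd if=memtest.img of=/dev/da1 bs=1M    # write the raw image to the whole stick, not to a partition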

I tried using UNetbootin.
I've tried numerous variants of Memtest86+ (which allegedly works on non-UEFI boards): https://www.memtest.org/
I've tried setting up FreeDOS.

I simply cannot find an ISO file that can be dd'd straight to a USB drive and become bootable.
It should not be this difficult.
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Did you try the BIOS-bootable V4 image? That one should just be able to be DD'd straight to it.

https://www.memtest86.com/downloads/memtest86-4.3.7-usb.tar.gz
Thanks a bunch!
Among the approx. 30 other ISOs and image files I found in my downloads folder, there was one with the same name but a different file size. Anyhow, yours worked perfectly. The machine even booted directly into Memtest!

My own solution, prior to finding the V4 image, was to use the Ubuntu Server ISO. It comes with memtest, it boots, and it works. The drawback is that during each boot sequence (probably 4-5 mins?) the window of opportunity to enter memtest flickers by in less than 2 seconds.

I clearly have a memory issue going on. Sometimes it hangs, giving me no indication where the problem is. Other times, during the reboot phase, an error message (which looks to be sent by the IPMI card) indicates that a specific RAM slot is responsible for the problem.
I've started reducing the RAM to find <ANY> working configuration. There are 12 DIMMs that used to work. I found another 6 in a box.

I'm very grateful I now have a memtest ISO that does not require more attention between each testing round than "power on"!!
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Other times, during the reboot phase, an error message (which looks to be sent by the IPMI card) indicates that a specific RAM slot is responsible for the problem.
Fire up an OS that has ipmitool support (Ubuntu should do) and get to a prompt where you can run ipmitool sel list; that should hopefully let you pick out the exact slot that's raising alerts. Target that one (swap the DIMM to the adjacent slot) and reboot, then see if the fault follows the DIMM (replace it) or the slot (it's the board, so either replace it or skip that slot if possible; at least X8s are fairly cheap if you decide to replace the board).
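
Something along these lines should do it (assumes the in-band IPMI interface works on that board; otherwise point ipmitool at the BMC's IP with -H/-U/-P instead):

Code:
sudo apt install ipmitool
sudo modprobe ipmi_devintf ipmi_si               # load the in-band IPMI drivers if needed
sudo ipmitool sel list                           # dump the System Event Log
sudo ipmitool sel list | grep -i -E 'mem|dimm'   # narrow it down to memory-related events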

Hopefully it's an X8 board that takes RDIMMs, because those are still nice and cheap.
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
@HoneyBadger
So here's the follow-up. As it turns out, troubleshooting RAM is not very thrilling in general, and in particular when dealing with 18 DIMMs. Even more so when the system quickly becomes less stable, to the point where it cannot even boot FreeNAS or complete an Ubuntu install.
I started shifting memory configurations around and observing the differences, partly in hopes that this was a quick fix - just reseating a DIMM or two and it'd be good again, like before. Hopes and dreams carried me away at this point... It turns out the more I poked at the system, the weirder and less stable it behaved, as if the problematic part of the memory got <enticed> to act up more often.

Here is what the symptoms boiled down to:
If the X RAM slots were populated, the machine would not register that amount of memory; i.e., I could put in RAM that simply never showed up.
If the Z RAM slots were populated, the machine would crash a while into the memory test.
If the Y RAM slots were populated, the machine would crash immediately upon entering the memory test.

Imagine the amount of rearranging, reseating and reconfiguring of RAM that went into figuring out a) the symptoms, and b) which RAM stick OR slot caused which problem......... I guarantee the number of reboots this machine has seen over the past few days exceeds the total since its deployment in some datacenter in 2011, by at least a factor of 10. I wish that were a bold estimate. It is likely the opposite.

Now I've found which slot causes Y, and which other slot caused X and Z.
With no sticks in those slots, the machine has been doing memory test loops for 24 hrs without errors. That's a step in the right direction.

So it is rather discomforting to put a server with 2 broken motherboard RAM slots into 'service' as it is.... Luckily the demands on this server are exceptionally low; it basically acts as "glacier"-type storage.
The next plan is to install Ubuntu Server and run some rudimentary general stress tests on the system, to see if it ....misbehaves under general workloads. If it doesn't, then I'd reckon it is good enough for FreeNAS again :)
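
(I'm thinking something along the lines of stress-ng for that; package name and flags as I remember them on Ubuntu, so treat this as a rough sketch:)

Code:
sudo apt install stress-ng
# hammer memory with 4 workers over ~80% of RAM, with verification, for 4 hours
stress-ng --vm 4 --vm-bytes 80% --vm-method all --verify --timeout 4h --metrics-brief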


Fingers crossed...
 