ZFS pool migration issue

parski
Cadet · Joined: Jun 24, 2019 · Messages: 2
I used to have four drives in a USB DAS configured as a JBOD enclosure. It was managed as a ZFS pool in FreeNAS and served me well until I felt I was approaching what my storage-paranoid gut considered "full". My USB DAS didn't have the capacity to hold more drives, so I had to invest in new hardware.

No sooner said than done, I got hold of a NetApp DS4243, a popular DAS with room for 24 drives. I started it up and played around with it for some time to get used to the machine. Once I felt like I knew what I was doing, I installed new drives matching the capacity of what I had in the USB DAS and started a migration. I had never migrated a ZFS pool before, so I decided to proceed with caution: I created a snapshot of the pool I wanted to migrate and, using zfs send and zfs receive, migrated the whole thing to the new drives in the new DAS. Cool! It took about a week, but I wanted to make sure I had a "backup" before physically moving the drives from the USB DAS to the DS4243. Paranoid.
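
For reference, the migration was the usual recursive snapshot-and-send routine, roughly like this ("newpool" is just my shorthand for the pool on the new drives):

# zfs snapshot -r pool@migrate
# zfs send -R pool@migrate | zfs receive -F newpool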

With the "backup" verified I felt I had the courage to move the drives so I did. It worked great. I just swapped in my old drives and was up and running with my files just like that. I did a scrub on the pool just to make sure and had no errors. I watched a movie I had stored in the pool and it worked flawlessly. The drives even ran cooler than they used to.

This was yesterday. Today I had an electrician over to look at some exposed, life-threatening wiring that I won't touch, since I have no idea what I'm doing when it comes to 230V household electricity. I shut down my server because we had to cut the power while he dealt with the hazard.

The electrician did his thing and the power was switched back on. I started up the server and was met with:

# zpool import
   pool: pool
     id: 5659758306567918509
  state: UNAVAIL
 status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
         devices and try again.
    see: http://illumos.org/msg/ZFS-8000-3C
 config:

        pool                                            UNAVAIL  insufficient replicas
          raidz1-0                                      ONLINE
            gptid/45bce56c-95e1-11e9-a660-0cc47ae6b3da  ONLINE
            gptid/4a84722b-95e1-11e9-a660-0cc47ae6b3da  ONLINE
            gptid/4f87712b-95e1-11e9-a660-0cc47ae6b3da  ONLINE
            gptid/53c936ca-95e1-11e9-a660-0cc47ae6b3da  ONLINE
          raidz1-1                                      UNAVAIL  insufficient replicas
            18040520093922349829                        UNAVAIL  cannot open
            9910213577815475357                         UNAVAIL  cannot open
            39194004711191694                           UNAVAIL  cannot open
            6951099159920172535                         UNAVAIL  cannot open


Now I think this is a multipath issue. Listing the multipaths gives me:

# gmultipath list
Geom name: disk4
Type: AUTOMATIC
Mode: Active/Passive
UUID: 49802238-8eb9-11e9-8568-0cc47ae6b3da
State: OPTIMAL
Providers:
1. Name: multipath/disk4
   Mediasize: 8001563221504 (7.3T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   State: OPTIMAL
Consumers:
1. Name: da15
   Mediasize: 8001563222016 (7.3T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1
   State: ACTIVE
2. Name: da7
   Mediasize: 8001563222016 (7.3T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1
   State: PASSIVE

Geom name: disk3
Type: AUTOMATIC
Mode: Active/Passive
UUID: 497193c9-8eb9-11e9-8568-0cc47ae6b3da
State: OPTIMAL
Providers:
1. Name: multipath/disk3
   Mediasize: 8001563221504 (7.3T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   State: OPTIMAL
Consumers:
1. Name: da14
   Mediasize: 8001563222016 (7.3T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1
   State: ACTIVE
2. Name: da6
   Mediasize: 8001563222016 (7.3T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1
   State: PASSIVE

Geom name: disk2
Type: AUTOMATIC
Mode: Active/Passive
UUID: 496285c5-8eb9-11e9-8568-0cc47ae6b3da
State: OPTIMAL
Providers:
1. Name: multipath/disk2
   Mediasize: 8001563221504 (7.3T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   State: OPTIMAL
Consumers:
1. Name: da13
   Mediasize: 8001563222016 (7.3T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1
   State: ACTIVE
2. Name: da5
   Mediasize: 8001563222016 (7.3T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1
   State: PASSIVE

Geom name: disk1
Type: AUTOMATIC
Mode: Active/Passive
UUID: aab643d6-8eb3-11e9-9d87-0cc47ae6b3da
State: OPTIMAL
Providers:
1. Name: multipath/disk1
   Mediasize: 8001563221504 (7.3T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   State: OPTIMAL
Consumers:
1. Name: da12
   Mediasize: 8001563222016 (7.3T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1
   State: ACTIVE
2. Name: da4
   Mediasize: 8001563222016 (7.3T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1
   State: PASSIVE

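I guess one thing to check is where the gptid labels actually live right now, i.e. whether they sit on the raw da devices or on the multipath geoms:

# glabel status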

However, I'm terrified of destroying the wrong multipath, and I don't really know what I'm doing right now. Replacing a drive using the `zpool replace` command only seems to work when the pool is online, so I could use a hand here.

# zpool replace pool 18040520093922349829 /dev/da12
cannot open 'pool': no such pool
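
I've also been wondering whether pointing the import at the multipath device nodes would get me anywhere. If I'm reading zpool(8) right, -d makes zpool import search a given directory for devices, so something like:

# zpool import -d /dev/multipath
# zpool import -d /dev/multipath pool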

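And if it really is the multipaths that have to go, gmultipath(8) seems to offer two levels of teardown, and picking the right one is exactly what I'm scared of getting wrong (disk4 just as an example):

# gmultipath stop disk4        # stops the geom but leaves the on-disk metadata alone
# gmultipath destroy disk4     # stops the geom and erases its on-disk metadata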

I did the camcontrol thing, and it looks like I have twice the number of devices I was expecting.

# camcontrol devlist
<ST8000VN 0022-2EL112 SM 4321> at scbus2 target 0 lun 0 (pass0,da0)
<ST8000VN 0022-2EL112 SM 4321> at scbus2 target 1 lun 0 (pass1,da1)
<ST8000VN 0022-2EL112 SM 4321> at scbus2 target 2 lun 0 (pass2,da2)
<ST8000VN 0022-2EL112 SM 4321> at scbus2 target 3 lun 0 (pass3,da3)
<WDC WD80EZAZ-11TDBSM 4321> at scbus2 target 4 lun 0 (pass4,da4)
<WDC WD80EZAZ-11TDBSM 4321> at scbus2 target 5 lun 0 (pass5,da5)
<WDC WD80EZAZ-11TDBSM 4321> at scbus2 target 6 lun 0 (pass6,da6)
<WDC WD80EZAZ-11TDBSM 4321> at scbus2 target 7 lun 0 (pass7,da7)
<NETAPP DS424IOM3 0212> at scbus2 target 8 lun 0 (pass8,ses0)
<ST8000VN 0022-2EL112 SM 4321> at scbus2 target 9 lun 0 (pass9,da8)
<ST8000VN 0022-2EL112 SM 4321> at scbus2 target 10 lun 0 (pass10,da9)
<ST8000VN 0022-2EL112 SM 4321> at scbus2 target 11 lun 0 (pass11,da10)
<ST8000VN 0022-2EL112 SM 4321> at scbus2 target 12 lun 0 (pass12,da11)
<WDC WD80EZAZ-11TDBSM 4321> at scbus2 target 13 lun 0 (pass13,da12)
<WDC WD80EZAZ-11TDBSM 4321> at scbus2 target 14 lun 0 (pass14,da13)
<WDC WD80EZAZ-11TDBSM 4321> at scbus2 target 15 lun 0 (pass15,da14)
<WDC WD80EZAZ-11TDBSM 4321> at scbus2 target 16 lun 0 (pass16,da15)
<NETAPP DS424IOM3 0212> at scbus2 target 17 lun 0 (pass17,ses1)
<MX MXUB3SESU-32G 1.00> at scbus20 target 0 lun 0 (pass18,da16)
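
The shelf has two IOM modules, so I'm assuming each drive simply shows up once per path, and that, for example, da4 and da12 (disk1's passive and active consumers above) are the same physical disk. Comparing serial numbers should confirm that:

# smartctl -i /dev/da4 | grep -i serial
# smartctl -i /dev/da12 | grep -i serial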


I've tried reading data from da0 through da15, and they all play ball, with output along these lines:

# smartctl -i /dev/da15
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor: WDC
Product: WD80EZAZ-11TDBSM
Revision: 4321
User Capacity: 8,001,563,222,016 bytes [8.00 TB]
Logical block size: 512 bytes
Physical block size: 4096 bytes
Rotation Rate: 5400 rpm
Form Factor: 3.5 inches
Logical Unit id: 0x500605ba004e2004
Serial number: 7HK8D28N
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Mon Jun 24 17:36:08 2019 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled


As well as:

# dd if=/dev/da0 of=/dev/null bs=1024k count=1
1+0 records in
1+0 records out
1048576 bytes transferred in 0.007682 secs (136489987 bytes/sec)


Swapping the drives to different slots in the DS4243 makes no difference.

I'm stuck. Please help.