Glenn Buckholz
Recently I had two degraded drives on different arrays. I ordered the replacements. I'm going to list the steps I took for completeness.
First off I'm running TrueNAS-12.0-U6.
The first array is simple: five 4 TB drives in raidz2 with cache and log SSDs. One of the 4 TB data drives went bad. I offlined the failing drive, pulled it, and replaced it. After the resilver it was fine. The second array was more complicated: four mirrored vdevs of various sizes strung together. It was basically a bunch of old drives I didn't want to go to waste. It looked like this:
Code:
MAGVM01                ONLINE
  mirror-0             ONLINE
    da18p2.eli         ONLINE
    da18p2.eli         ONLINE
  mirror-1             ONLINE
    da1p2.eli          ONLINE
    da0p2.eli          ONLINE
  mirror-2             DEGRADED
    spare-0            DEGRADED
      da13p2.eli       DEGRADED
      da21p2.eli       ONLINE
    da12p2.eli         ONLINE
  mirror-3             ONLINE
    da2p2.eli          ONLINE
    da3p2.eli          ONLINE
The setup in mirror-2 is a little strange. I wanted all mirrored vdevs, but I didn't have a matching drive, so I strung two smaller drives together to make spare-0. I figured spare-0 doesn't matter much because it is only one side of a mirror. I offlined da13 and da21; they are a RAID 1 vdev making up the other half of a mirror. Even if I completely messed up spare-0, the other part of the mirror would be fine. I replaced da13, then I needed to reboot because it is encrypted. You can't online encrypted drives. When TrueNAS came back online it had renamed several of the drives. The array was now reading like this from the zpool command:
Code:
MAGVM01                UNAVAIL   insufficient replicas
  mirror-0             ONLINE
    da18p2.eli         ONLINE
    da18p2.eli         ONLINE
  mirror-1             ONLINE
    da1p2.eli          ONLINE
    da0p2.eli          ONLINE
  mirror-2             UNAVAIL   insufficient replicas
    spare-0            OFFLINE   all children offline
      da13p2.eli       OFFLINE
      da21p2.eli       OFFLINE
    da12p2.eli         UNAVAIL   cannot open
  mirror-3             ONLINE
    da2p2.eli          ONLINE
    da3p2.eli          ONLINE
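To make the difference between the two listings easier to see, here is a minimal Python sketch that lines them up by position and prints every member whose state changed. The helper names (`parse_status`, `changed`) and the set of recognised states are my own assumptions for illustration, not anything TrueNAS provides, and matching by position only works because both listings show the same layout.

```python
# Sketch: compare two device listings from the zpool command and report
# which members changed state. The listings are copied from this post;
# the parsing rules below are assumptions, not a TrueNAS or ZFS API.

# States recognised by this sketch (assumed list, extend as needed).
STATES = {"ONLINE", "DEGRADED", "OFFLINE", "UNAVAIL", "FAULTED", "REMOVED"}

BEFORE = """\
MAGVM01 ONLINE
  mirror-0 ONLINE
    da18p2.eli ONLINE
    da18p2.eli ONLINE
  mirror-1 ONLINE
    da1p2.eli ONLINE
    da0p2.eli ONLINE
  mirror-2 DEGRADED
    spare-0 DEGRADED
      da13p2.eli DEGRADED
      da21p2.eli ONLINE
    da12p2.eli ONLINE
  mirror-3 ONLINE
    da2p2.eli ONLINE
    da3p2.eli ONLINE
"""

AFTER = """\
MAGVM01 UNAVAIL insufficient replicas
  mirror-0 ONLINE
    da18p2.eli ONLINE
    da18p2.eli ONLINE
  mirror-1 ONLINE
    da1p2.eli ONLINE
    da0p2.eli ONLINE
  mirror-2 UNAVAIL insufficient replicas
    spare-0 OFFLINE all children offline
      da13p2.eli OFFLINE
      da21p2.eli OFFLINE
    da12p2.eli UNAVAIL cannot open
  mirror-3 ONLINE
    da2p2.eli ONLINE
    da3p2.eli ONLINE
"""


def parse_status(text):
    """Return a list of (name, state) pairs in listing order."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # A member line is "<name> <STATE> [free-form note ...]".
        if len(fields) >= 2 and fields[1] in STATES:
            entries.append((fields[0], fields[1]))
    return entries


def changed(before, after):
    """Pair entries by position and keep those whose state differs."""
    pairs = zip(parse_status(before), parse_status(after))
    return [(name, old, new) for (name, old), (_, new) in pairs if old != new]


if __name__ == "__main__":
    for name, old, new in changed(BEFORE, AFTER):
        print(f"{name}: {old} -> {new}")
```

For the two listings in this post it flags the pool itself, mirror-2, spare-0, and the three disks under mirror-2; everything in mirror-0, mirror-1 and mirror-3 is unchanged.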
Through poking around I know that da13 is now da12. The drive names were reordered somehow. This caused TrueNAS to think the entire mirror was borked, which took the entire array offline. Once I saw this, I shut down and re-inserted the old degraded drive in the hope that things would just go back to the way they were. This did not work. TrueNAS is keeping the new naming order. I know for a fact that the original da12 is still there, but I don't know what its new name is. So now that the pool is in the UNAVAIL state, is there any way to fix this? The original da12 is there somewhere; is there a way to find it and convince TrueNAS to use it in the correct spot? It seems to me that this should be possible. I believe I have all the critical data backed up, but I'd rather recover the array than rebuild it. So the questions I have are as follows:
How do I query a drive to see which pool it thinks it belongs to?
How do I convince an UNAVAIL zpool to change one of its disk members once I find it?
Any help is appreciated.