Device / Partition reference in Pool status

Thibaut

Dabbler
Joined
Jun 21, 2014
Messages
33
Hello,

I recently noticed that, after replacing two disks in a pool, the newly inserted devices / partitions are not referenced in the same way as the pool's other members.

The pool is composed of a stripe of four RAIDZ1 vdevs. Most members are referenced by their gptid, or I should say "used to be referenced", because after replacing two HDDs I noticed that the newly inserted drives are now referenced differently. Here is the output of the zpool status -v command:

Code:
# zpool status -v Pool-1
pool: Pool-1
state: ONLINE
scan: scrub repaired 0 in 0 days 13:51:01 with 0 errors on Sun Dec 13 13:51:07 2020
config:                                                                          
                                                                                 
        NAME                                            STATE     READ WRITE CKSUM
        Pool-1                                          ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/229a228f-9aa6-11e5-bb5e-002590e7df5a  ONLINE       0     0     0
            gptid/a85cae2c-6585-11e5-bbff-002590e7df5a  ONLINE       0     0     0
            gptid/0a7abf4f-581b-11e6-a62d-002590e7df5a  ONLINE       0     0     0
            gptid/764bb79a-3478-11e6-b459-002590e7df5a  ONLINE       0     0     0
            gptid/6361e5f9-ae21-11e4-9240-002590e7df5a  ONLINE       0     0     0
          raidz1-1                                      ONLINE       0     0     0
            gptid/e163561d-82a4-11e7-b04c-002590e7df5a  ONLINE       0     0     0
            gptid/aec6fa23-e9cf-11e8-8a9f-002590e7df5a  ONLINE       0     0     0
            da16                                        ONLINE       0     0     0
            gptid/98c748c3-6146-11e6-a8e5-002590e7df5a  ONLINE       0     0     0
            gptid/aa2cf2e7-cb6e-11e9-aa2d-002590c9a764  ONLINE       0     0     0
          raidz1-2                                      ONLINE       0     0     0
            da0p2                                       ONLINE       0     0     0
            gptid/6c0e1ea3-db79-11e6-a83f-002590e7df5a  ONLINE       0     0     0
            gptid/4b34719b-b135-11e9-aa2d-002590c9a764  ONLINE       0     0     0
            gptid/81313c49-3a50-11e8-9182-002590e7df5a  ONLINE       0     0     0
            gptid/b28a0cba-0d21-11e5-907b-002590e7df5a  ONLINE       0     0     0
          raidz1-3                                      ONLINE       0     0     0
            gptid/dbb2a4a8-8cff-11e7-9f74-002590e7df5a  ONLINE       0     0     0
            gptid/a57c71e5-5307-11e5-bf52-002590e7df5a  ONLINE       0     0     0
            gptid/c9d378ba-927f-11e7-a3c5-002590e7df5a  ONLINE       0     0     0
            gptid/5e9ced60-4ef5-11e7-a16c-002590e7df5a  ONLINE       0     0     0
            gptid/43ac215e-4c9b-11e8-a9f9-002590e7df5a  ONLINE       0     0     0
                                                                                 
errors: No known data errors


raidz1-1 has one member referenced as da16, and raidz1-2 has one referenced as da0p2.
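
As a side note, the mapping between a gptid label and its device node can be checked with glabel status (the grep pattern and output below are illustrative, using the rawuuid of one of the healthy members):
Code:
# glabel status | grep 'aa2cf2e7'
gptid/aa2cf2e7-cb6e-11e9-aa2d-002590c9a764     N/A  da1p2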

When SMART alerts indicate a drive will soon fail (xx offline uncorrectable sectors), I always use the FreeNAS interface to replace the failing hard disk: Storage > Pools > Status, then the Offline menu entry next to the targeted drive. Then I physically replace the disk and use the Replace option to select the newly inserted device.
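
For context, the SMART attribute I watch can be read with smartctl; the device name and raw value below are illustrative:
Code:
# smartctl -A /dev/da16 | grep -i offline_uncorrectable
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       24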

Up until now it always automatically partitioned the new device and referenced the second partition's gptid as the RAIDZ1 member.
But now I realize that the last two hard disks I replaced were not referenced in the same way, which opens the risk of mixing things up if the drives were ever moved to different bays...

Could anyone help me understand why those last two replacements didn't go like the previous ones?
Also, what would be the best way to re-reference the two existing devices by their gptid instead of their current names, ideally without having to resilver the whole pool in the process?

Any help would be appreciated, thank you.
 

Thibaut

Dabbler
Joined
Jun 21, 2014
Messages
33
It happened again when I had to replace another failed hard drive:

Code:
raidz1-2                                      ONLINE       0     0     0
  da0p2                                       ONLINE       0     0     0
  da17p2                                      ONLINE       0     0     0
  gptid/4b34719b-b135-11e9-aa2d-002590c9a764  ONLINE       0     0     0
  gptid/81313c49-3a50-11e8-9182-002590e7df5a  ONLINE       0     0     0
  gptid/b28a0cba-0d21-11e5-907b-002590e7df5a  ONLINE       0     0     0

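Grepping the full status output is a quick way to list every member referenced by a plain device node rather than a gptid (output illustrative):
Code:
# zpool status Pool-1 | grep -E '^[[:space:]]+da[0-9]+'
            da16                                        ONLINE       0     0     0
            da0p2                                       ONLINE       0     0     0
            da17p2                                      ONLINE       0     0     0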

Now I have da17p2 referenced where I'd like to have gptid/56a8f9a1-7911-11eb-9b98-002590c9a764.
The partition UUID can be found as the partition's rawuuid:
Code:
# gpart list | grep -A 15 'da17p2'
2. Name: da17p2
   Mediasize: 9998683779072 (9.1T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1
   efimedia: HD(2,GPT,56a8f9a1-7911-11eb-9b98-002590c9a764,0x400080,0x48bffff58)
   rawuuid: 56a8f9a1-7911-11eb-9b98-002590c9a764
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 9998683779072
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 19532873687
   start: 4194432

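A shorter way to pull just that value, assuming gpart's usual output layout, would be:
Code:
# gpart list da17 | grep -A 7 'Name: da17p2' | awk '/rawuuid/ {print $2}'
56a8f9a1-7911-11eb-9b98-002590c9a764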

Now, the weird thing I can observe is that for all partitions referenced by gptid, a node matching their rawuuid exists under /dev/gptid, but the newly created partition on the replacement disk isn't present there. For example:
Code:
# gpart list | grep -A 15 'da1p2'
2. Name: da1p2
...
...
   rawuuid: aa2cf2e7-cb6e-11e9-aa2d-002590c9a764

# ls -la /dev/gptid | grep 'aa2cf2e7'
crw-r-----   1 root  operator  0x1e7 Jan 27 03:31 aa2cf2e7-cb6e-11e9-aa2d-002590c9a764

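To check this systematically, one could loop over every rawuuid and test for a matching node; a small sh sketch:
Code:
# list any partition whose rawuuid has no node under /dev/gptid
# note: in-use swap partitions may legitimately show up here too
for u in $(gpart list | awk '/rawuuid/ {print $2}'); do
    [ -e "/dev/gptid/$u" ] || echo "missing: $u"
done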

After the physical replacement of the failed hard drive, the logical disk replacement in the pool was triggered from FreeNAS's web UI, via FreeNAS > Storage > Pools > Pool_name > Status, then /dev/gptid/old_partition_uuid (REMOVED) > Replace.
This triggered the automatic formatting of the disk with two partitions (the usual 2 GiB swap partition plus the freebsd-zfs data partition), as can be confirmed by:
Code:
# gpart list | grep -A 2 'da17'
Geom name: da17
modified: false
state: OK
--
1. Name: da17p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
--
2. Name: da17p2
   Mediasize: 9998683779072 (9.1T)
   Sectorsize: 512
--
1. Name: da17
   Mediasize: 10000831348736 (9.1T)
   Sectorsize: 512


But the UUIDs of those new partitions are NOT present under /dev/gptid:
Code:
# gpart list | grep -A 15 'da17p2'
2. Name: da17p2
   ...
   ...
   rawuuid: 56a8f9a1-7911-11eb-9b98-002590c9a764
# ls -la /dev/gptid | grep '56a8f9a1'
(no result)

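If I understand FreeBSD's GEOM labeling correctly, the /dev/gptid alias of a partition is withered (hidden) while that partition is held open directly through its device node, which is exactly what ZFS is doing with da17p2 here. If that is the cause, the node should reappear while the partition is released, e.g. offlined; an untested sketch, where the middle line shows the output one would expect if the hypothesis holds:
Code:
# zpool offline Pool-1 da17p2
# ls /dev/gptid | grep '56a8f9a1'
56a8f9a1-7911-11eb-9b98-002590c9a764
# zpool online Pool-1 da17p2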

To get all devices referenced by UUID again, my plan was to export the pool and then reimport it with zpool import -d /dev/gptid Pool_name, but since the new partition's UUID is not present under /dev/gptid, this will obviously not work as intended.
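
For reference, the intended sequence would have been:
Code:
# zpool export Pool-1
# zpool import -d /dev/gptid Pool-1
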
Note: trying to export the main pool also fails, presumably because the FreeNAS system dataset currently lives on it (it would have to be moved to another pool via System > System Dataset first):
Code:
# zpool export Pool_name
cannot unmount '/var/db/system/syslog-b9cd4bfb7f07412da087441df250c27c': Device busy


Once again, any help, hint or indication on what might be going on here would be welcome.

Thank you very much.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,702
This can happen when you perform operations on the pool with the CLI.

If you do the replacements in the GUI, it should remain consistent with the others.

You can replace those drives with themselves via the GUI if you want to see it matching up again.
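
For reference, my understanding is that the GUI's replace flow boils down to roughly the following CLI steps (a sketch only; device names are illustrative, <new-rawuuid> is a placeholder, and the middleware additionally mirrors the swap partition):
Code:
# zpool offline Pool-1 da17p2
# gpart destroy -F da17
# gpart create -s gpt da17
# gpart add -t freebsd-swap -s 2G -a 4k da17
# gpart add -t freebsd-zfs -a 4k da17
# zpool replace Pool-1 da17p2 gptid/<new-rawuuid>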
 

Thibaut

Dabbler
Joined
Jun 21, 2014
Messages
33
Thanks @sretalla for taking the time to review my problem!

sretalla said: "This can happen when you perform operations on the pool with the CLI."
I don't think I ever touched the pool from the CLI, but maybe...

sretalla said: "If you do the replacements in the GUI, it should remain consistent with the others."
I always performed the disk replacements from the GUI, mainly because, although I'm reasonably acquainted with the ZFS commands, I'm always a bit nervous when it comes to replacing disks. Commands like zpool offline <poolname> <device>, zpool replace <poolname> <old_device> <new_device> and such are still intimidating to me :rolleyes:
I don't even know whether I should partition the disk first (the sketch in the previous post suggests so)?
Anyway, I always preferred to trust the FreeNAS/TrueNAS GUI to do all the heavy lifting for me... until it started doing things differently. The first "glitch" came after replacing the da16 drive, where the system apparently didn't even format the drive before attaching it to the pool, resulting in a drive without any partition:
Code:
# gpart list | grep -A 15 'da16'
(no result)

# gpart list | grep -A 15 'da17'
Geom name: da17
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 19532873687
first: 40
entries: 128
scheme: GPT
Providers:
1. Name: da17p1
   Mediasize: 2147483648 (2.0G)
   ...
   rawuuid: 565e3741-7911-11eb-9b98-002590c9a764
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   ...
2. Name: da17p2
   Mediasize: 9998683779072 (9.1T)
   ...
   efimedia: HD(2,GPT,56a8f9a1-7911-11eb-9b98-002590c9a764,0x400080,0x48bffff58)
   rawuuid: 56a8f9a1-7911-11eb-9b98-002590c9a764
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   ...
--
1. Name: da17
   Mediasize: 10000831348736 (9.1T)
   ...

Since then, da0, da1, da2, da17, da23 and da24 were replaced. Of those, only da0(p2) and da17(p2) are referenced by device name, while all the others are correctly referenced by gptid/<uuid>.

sretalla said: "You can replace those drives with themselves via the GUI if you want to see it matching up again."
OK, I'll certainly try that during the next maintenance window (on a weekend) and report back on whether it worked!
My only concern is that, since the partition's UUID isn't present under /dev/gptid, I wonder how ZFS could reference it either?

Thanks again for answering me.
 

Thibaut

Dabbler
Joined
Jun 21, 2014
Messages
33
The proposed solution of replacing the hard disks with themselves via the GUI didn't work.

However, I found this other thread, which describes the exact same problem I'm facing. @Patrick M. Hausen details various options that can be followed to fix it; I'll soon examine them in more detail and will try to see whether they can fix my issue.
 