Is it possible to configure a Spare against more than one pool?

HoneyBadger

Hello @vgusev2007

A spare device can only be attached to a single pool - this is a TrueNAS design decision (rather than an OpenZFS limitation) at present, so it will apply to both TrueNAS CORE and SCALE.

As shown below, it's possible to do this in OpenZFS, and through the CLI in TrueNAS, but there's a non-zero risk of data loss. I'd still recommend against it.
 
Last edited:

sretalla

A spare device can only be attached to a single pool - this is an OpenZFS limitation at present, so it will apply to both TrueNAS CORE and SCALE.
I don't think that's right.

Code:
root@TrueNAS13:~ # truncate -s 1g /root/sparse1
root@TrueNAS13:~ # truncate -s 1g /root/sparse2
root@TrueNAS13:~ # truncate -s 1g /root/sparse3
root@TrueNAS13:~ # truncate -s 1g /root/sparse4
root@TrueNAS13:~ # zpool create test1 /root/sparse1
root@TrueNAS13:~ # zpool create test2 /root/sparse2
root@TrueNAS13:~ # zpool create test3 /root/sparse3
root@TrueNAS13:~ # zpool add test1 spare /root/sparse4
root@TrueNAS13:~ # zpool add test2 spare /root/sparse4
root@TrueNAS13:~ # zpool add test3 spare /root/sparse4
root@TrueNAS13:~ # zpool status test1 test2 test3
  pool: test1
 state: ONLINE
config:

    NAME             STATE     READ WRITE CKSUM
    test1            ONLINE       0     0     0
      /root/sparse1  ONLINE       0     0     0
    spares
      /root/sparse4  AVAIL  

errors: No known data errors

  pool: test2
 state: ONLINE
config:

    NAME             STATE     READ WRITE CKSUM
    test2            ONLINE       0     0     0
      /root/sparse2  ONLINE       0     0     0
    spares
      /root/sparse4  AVAIL  

errors: No known data errors

  pool: test3
 state: ONLINE
config:

    NAME             STATE     READ WRITE CKSUM
    test3            ONLINE       0     0     0
      /root/sparse3  ONLINE       0     0     0
    spares
      /root/sparse4  AVAIL   
 

sretalla

And because it's easy to do, the same for SCALE:

Code:
root@scale2[~]# truncate -s 1g /root/sparse1
root@scale2[~]# truncate -s 1g /root/sparse2
root@scale2[~]# truncate -s 1g /root/sparse3
root@scale2[~]# truncate -s 1g /root/sparse4
root@scale2[~]# zpool create test1 /root/sparse1
root@scale2[~]# zpool create test2 /root/sparse2
root@scale2[~]# zpool create test3 /root/sparse3
root@scale2[~]# zpool add test1 spare /root/sparse4
root@scale2[~]# zpool add test2 spare /root/sparse4
root@scale2[~]# zpool add test3 spare /root/sparse4
root@scale2[~]# zpool status test1 test2 test3     
  pool: test1
 state: ONLINE
config:

    NAME             STATE     READ WRITE CKSUM
    test1            ONLINE       0     0     0
      /root/sparse1  ONLINE       0     0     0
    spares
      /root/sparse4  AVAIL   

errors: No known data errors

  pool: test2
 state: ONLINE
config:

    NAME             STATE     READ WRITE CKSUM
    test2            ONLINE       0     0     0
      /root/sparse2  ONLINE       0     0     0
    spares
      /root/sparse4  AVAIL   

errors: No known data errors

  pool: test3
 state: ONLINE
config:

    NAME             STATE     READ WRITE CKSUM
    test3            ONLINE       0     0     0
      /root/sparse3  ONLINE       0     0     0
    spares
      /root/sparse4  AVAIL 
 

HoneyBadger

I don't think that's right.

Since I've got my foot in my mouth, might as well chew on my toes for a while. Have you any Grey Poupon?

An active spare cannot be shared across pools, and if a shared spare is part of a pool that becomes exported/UNAVAIL, there's a risk of another pool attempting to mount that shared spare and overwrite the data.
 

sretalla

An active spare cannot be shared across pools, and if a shared spare is part of a pool that becomes exported/UNAVAIL, there's a risk of another pool attempting to mount that shared spare and overwrite the data.
As I understand it, the active spare is understood by ZFS across all pools and won't be available/invoked if active elsewhere...

I'll try to put together a quick test for that...
 

HoneyBadger

As I understand it, the active spare is understood by ZFS across all pools and won't be available/invoked if active elsewhere...

I'll try to put together a quick test for that...

Already beat you to it.

Code:
  pool: pool1
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 624K in 00:00:00 with 0 errors on Fri Jun 30 06:29:28 2023
config:

        NAME                                              STATE     READ WRITE CKSUM
        pool1                                             DEGRADED     0     0     0
          mirror-0                                        DEGRADED     0     0     0
            gptid/8630c624-1749-11ee-a5c4-000c292ff71e    ONLINE       0     0     0
            spare-1                                       DEGRADED     0     0     0
              gptid/8632a13d-1749-11ee-a5c4-000c292ff71e  REMOVED      0     0     0
              gptid/96ae9260-1749-11ee-a5c4-000c292ff71e  ONLINE       0     0     0
        spares
          gptid/96ae9260-1749-11ee-a5c4-000c292ff71e      INUSE     currently in use

errors: No known data errors

  pool: pool2
 state: ONLINE
config:

        NAME                                            STATE     READ WRITE CKSUM
        pool2                                           ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/8d8fef93-1749-11ee-a5c4-000c292ff71e  ONLINE       0     0     0
            gptid/8d8bd519-1749-11ee-a5c4-000c292ff71e  ONLINE       0     0     0
        spares
          gptid/96ae9260-1749-11ee-a5c4-000c292ff71e    INUSE     in use by pool 'pool1'

errors: No known data errors


It does indeed understand this but only when both pools are online.

Export pool1 and the spare is considered AVAIL to pool2:

Code:
  pool: pool2
 state: ONLINE
config:

        NAME                                            STATE     READ WRITE CKSUM
        pool2                                           ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/8d8fef93-1749-11ee-a5c4-000c292ff71e  ONLINE       0     0     0
            gptid/8d8bd519-1749-11ee-a5c4-000c292ff71e  ONLINE       0     0     0
        spares
          gptid/96ae9260-1749-11ee-a5c4-000c292ff71e    AVAIL


Which poses a problem if pool2 then invokes the spare:

Code:
  pool: pool2
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 624K in 00:00:00 with 0 errors on Fri Jun 30 06:45:45 2023
config:

        NAME                                              STATE     READ WRITE CKSUM
        pool2                                             DEGRADED     0     0     0
          mirror-0                                        DEGRADED     0     0     0
            spare-0                                       DEGRADED     0     0     0
              gptid/8d8fef93-1749-11ee-a5c4-000c292ff71e  REMOVED      0     0     0
              gptid/96ae9260-1749-11ee-a5c4-000c292ff71e  ONLINE       0     0     0
            gptid/8d8bd519-1749-11ee-a5c4-000c292ff71e    ONLINE       0     0     0
        spares
          gptid/96ae9260-1749-11ee-a5c4-000c292ff71e      INUSE     currently in use


Let's see what happens if we import pool1 again.



Ruh roh.

Code:
root@badger[~]# zpool import pool1
cannot import 'pool1': pool already exists
 

sretalla

I'll spare you the details of how to fake a failed disk and cause the spare to kick in, but here's the result:

Code:
root@scale1[~]# zpool status -v test1 test2     
  pool: test1
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: resilvered 10.3M in 00:00:00 with 0 errors on Fri Jun 30 15:43:57 2023
config:

    NAME                 STATE     READ WRITE CKSUM
    test1                DEGRADED     0     0     0
      mirror-0           DEGRADED     0     0     0
        spare-0          DEGRADED     0     0     0
          /root/sparse1  UNAVAIL      0     0     0  invalid label
          /root/sparse5  ONLINE       0     0     0
        /root/sparse2    ONLINE       0     0     0
    spares
      /root/sparse5      INUSE     currently in use

errors: No known data errors

  pool: test2
 state: ONLINE
config:

    NAME               STATE     READ WRITE CKSUM
    test2              ONLINE       0     0     0
      mirror-0         ONLINE       0     0     0
        /root/sparse3  ONLINE       0     0     0
        /root/sparse4  ONLINE       0     0     0
    spares
      /root/sparse5    INUSE     in use by pool 'test1'
 

sretalla

Export pool1 and the spare is considered AVAIL to pool2:
OK, this would appear to carry some risk, but it should generally be handled through procedure (don't export an unhealthy pool before resolving that situation).
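For anyone following along, here's a rough sketch of what that procedure can look like at the CLI, using the pool and gptid names from the example above. It's only an illustration (and assumes the failed disk in pool1 has already been dealt with): zpool detach cancels an in-progress spare replacement and returns the hot spare to AVAIL, while zpool remove drops a hot spare from a pool entirely.

Code:
# once pool1 no longer needs the spare, cancel the spare-in and return it to AVAIL
zpool detach pool1 gptid/96ae9260-1749-11ee-a5c4-000c292ff71e

# before exporting a pool that shares the spare, drop the spare from it
# so the exported pool doesn't keep a claim on the device
zpool remove pool1 gptid/96ae9260-1749-11ee-a5c4-000c292ff71e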
 

HoneyBadger

OK, this would appear to carry some risk, but it should generally be handled through procedure (don't export an unhealthy pool before resolving that situation).

If a feature request is put in, I can raise it with the Engineering team, but right now a shared spare opens up a non-trivial risk of data loss.

Code:
root@badger[~]# zpool import
   pool: pool2
     id: 2748236079509059999
  state: DEGRADED
status: One or more devices are missing from the system.
 action: The pool can be imported despite missing or damaged devices.  The
        fault tolerance of the pool may be compromised if imported.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-2Q
 config:

        pool2                                             DEGRADED
          mirror-0                                        DEGRADED
            spare-0                                       DEGRADED
              gptid/8d8fef93-1749-11ee-a5c4-000c292ff71e  UNAVAIL  cannot open
              gptid/96ae9260-1749-11ee-a5c4-000c292ff71e  ONLINE
            gptid/8d8bd519-1749-11ee-a5c4-000c292ff71e    ONLINE
        spares
          gptid/96ae9260-1749-11ee-a5c4-000c292ff71e

   pool: pool1
     id: 2748236079509059999
  state: DEGRADED
status: One or more devices are missing from the system.
 action: The pool can be imported despite missing or damaged devices.  The
        fault tolerance of the pool may be compromised if imported.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-2Q
 config:

        pool1                                             DEGRADED
          mirror-0                                        DEGRADED
            spare-0                                       DEGRADED
              gptid/8d8fef93-1749-11ee-a5c4-000c292ff71e  UNAVAIL  cannot open
              gptid/96ae9260-1749-11ee-a5c4-000c292ff71e  ONLINE
            gptid/8d8bd519-1749-11ee-a5c4-000c292ff71e    ONLINE
        spares
          gptid/96ae9260-1749-11ee-a5c4-000c292ff71e




Behold, Schrödinger's Spare - it's both a member of pool1 and pool2 simultaneously until it's observed (imported).
 

sretalla

right now a shared spare opens up a non-trivial risk of data loss.
Agreed, it should only be used by folks who know what they are doing and how to handle it (via the CLI) when things go off the rails.

It would indeed leave the GUI in a state of confusion if not handled better there, so maybe somebody who cares enough about it can log that feature request, and maybe with that some guardrails could be included for GUI users of the feature (a warning with override possible... that a multi-pool spare being invoked prevents export of any of those pools).
 

sretalla

maybe somebody who cares enough about it can log that feature request and maybe with that, some guardrails could be included for GUI users of the feature (a warning with override possible... that a multi-pool spare being invoked prevents export of any of those pools).
Alternatively, the option could be offered to remove that spare from the remaining pools when exporting the pool with it active. Anyway, it's all speculation until enough momentum arrives to get the development underway.

I'll spare you the details of how to fake a failed disk and cause the spare to kick in
Interestingly, a pool made of sparse files is happy to lose one of those files and continue writing with no notes in zpool status (still showing ONLINE even with the file gone).

Offlining the sparse file in question, creating a new one, and then putting that back online creates the "failed" status shown in the post above (in case anybody cares).
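Spelled out as a quick sketch against the test1 mirror from the post above (same sparse-file names; treat it as illustrative only - the final zpool replace step is only needed if zed/zfsd doesn't pull the spare in by itself):

Code:
# knock out one leg of the mirror so ZFS sees an invalid label on it
zpool offline test1 /root/sparse1
rm /root/sparse1
truncate -s 1g /root/sparse1
zpool online test1 /root/sparse1

# attach the hot spare manually if it wasn't engaged automatically
zpool replace test1 /root/sparse1 /root/sparse5
zpool status -v test1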
 

Patrick M. Hausen

Interestingly, a pool made of sparse files is happy to lose one of those files and continue writing with no notes in zpool status (still showing ONLINE even with the file gone).
Hint: unlink() before close() :wink:
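A quick shell illustration of that hint (nothing ZFS-specific, just POSIX unlink semantics; the /proc path assumes a Linux/SCALE shell):

Code:
exec 3> /tmp/demo            # open (and create) a file, keep fd 3 pointing at it
rm /tmp/demo                 # unlink the name; the inode survives while fd 3 is open
echo "still writable" >&3    # writes through the open descriptor keep working
ls -l /proc/$$/fd/3          # on Linux this shows: ... -> '/tmp/demo (deleted)'
exec 3>&-                    # close the descriptor; only now is the space released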
 