Remove busted metadata SSD

itractus

Dabbler
Joined
Mar 1, 2021
Messages
39
Code:
root@srv-nas[/home/admin]# zpool status
  pool: Fat-man
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: resilvered 3.04M in 00:00:02 with 0 errors on Mon Nov  6 09:07:48 2023
config:

        NAME                                      STATE     READ WRITE CKSUM
        Fat-man                                   DEGRADED     0     0     0
          raidz1-0                                DEGRADED     1     0     0
            b34e4f1f-49cb-4a1f-aaab-1169d37f5512  ONLINE       0     0     0
            a404d89a-b7fc-449f-bc4b-0c725739f26a  ONLINE       0     0     0
            97f49a5f-33dc-46ff-b742-9ffe4fab7321  FAULTED      0     0     0  too many errors
            9ef0ddcc-71fe-4a22-b4ac-c9f83476731b  ONLINE       1     0     0
          raidz1-1                                ONLINE       0     0     0
            a75f3faa-362e-4205-ada6-22a3c77e4696  ONLINE       0     0     0
            7f8804fc-524e-4173-9510-6bbcca7b8334  ONLINE       0     0     0
            288fb572-ba8c-471b-acf1-ecd0ef870494  ONLINE       0     0     0
            69472a99-e95a-476f-91ae-52c7f4829abf  ONLINE       0     0     0
            2cb413d7-34a0-4f62-9403-59b5e7ed04c7  ONLINE       0     0     0
          raidz1-2                                ONLINE       0     0     0
            939e316f-b8ed-44b6-8c67-f33cfc4feb20  ONLINE       0     0     0
            926c12de-9bd2-4677-b9f4-4d76604140ed  ONLINE       0     0     0
            be4dda05-fa0b-4646-ac95-cf2ae2de029f  ONLINE       0     0     0
            1dc5f091-f2b6-4860-990b-d4231f537f63  ONLINE       0     0     0
            4b0ff571-1554-4aa2-bc81-c30ce1b6ee2c  ONLINE       0     0     0
          raidz1-3                                DEGRADED     0     0     0
            353e679c-4b0e-4ef2-9af3-c10d8d40a2da  ONLINE       0     0     0
            3519f115-723d-4eb5-9708-4d0796fd20e6  ONLINE       0     0     0
            5226930f-c9be-47d7-84aa-f1d1d12df1f3  ONLINE       0     0     0
            0032bcc5-306a-43a7-9bfe-ae80d5abd777  ONLINE       0     0     0
            606928f2-b587-49b9-abb6-90a84f256f88  DEGRADED     0     0     0  too many errors
          raidz1-6                                ONLINE       0     0     0
            f037b274-44a9-4a6d-a1c9-79df4c1cc0b9  ONLINE       0     0     0
            f00554b5-e758-4221-9e16-c52956fa5242  ONLINE       0     0     0
            aab5c8f2-da17-4116-9498-6473ca4346ca  ONLINE       0     0     0
        special
          e4e86085-34cd-4815-9a7d-7d226d257b14    DEGRADED     1    31     0  too many errors
          42a7a435-c13f-470b-bdcd-b6f34da63515    ONLINE       0     0     0
        logs
          2dac6e59-3d1c-4609-bdf9-8b1111c3abf0    ONLINE       0     0     0


My metadata SSD is having some issues, as you can see. I've been unable to remove it. I have already added a new one to replace the old one.

How can I remove the SSD from the pool?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
If you have replaced the old one with zpool replace, it should be gone. You do have two drives for that special vdev, don't you? If you don't, you are living quite dangerously: complete failure of that single drive and all your data is lost.
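For reference, a replacement on the command line looks roughly like this - the GUID is the degraded special drive from your status output, and /dev/sdX is only a placeholder for whatever your new SSD actually is:
Code:
# replace the degraded special drive with the new SSD (placeholder name)
zpool replace Fat-man e4e86085-34cd-4815-9a7d-7d226d257b14 /dev/sdX

# watch the resilver finish before touching anything else
zpool status -v Fat-man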
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
zpool detach Fat-man e4e86085-34cd-4815-9a7d-7d226d257b14

But - repeating myself - you run the risk of complete data loss should that single metadata drive ever fail. The special vdev is not a cache. It stores all the metadata, i.e. internal ZFS data structures (block pointers etc.), directories, and so on. If you lose that, your data (the file contents) is still there, somewhere, on the storage drives, but you will not be able to find it.
 

itractus

Dabbler
Joined
Mar 1, 2021
Messages
39
I did try that before, but TrueNAS doesn't seem to like it...
Code:
root@srv-nas[/home/admin]# zpool detach Fat-man e4e86085-34cd-4815-9a7d-7d226d257b14
cannot detach e4e86085-34cd-4815-9a7d-7d226d257b14: only applicable to mirror and replacing vdevs
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Offline it first?
 

itractus

Dabbler
Joined
Mar 1, 2021
Messages
39
Code:
root@srv-nas[/home/admin]# zpool offline Fat-man e4e86085-34cd-4815-9a7d-7d226d257b14
cannot offline e4e86085-34cd-4815-9a7d-7d226d257b14: no valid replicas
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
I got it - that metadata vdev is not a mirror but a stripe of two drives. You need to replace the failing drive or you will lose all your data once it fails completely. Then I would think about a strategy to recreate the pool ...

I overlooked that your config is:
Code:
special
  e4e86085-34cd-4815-9a7d-7d226d257b14    DEGRADED     1    31     0  too many errors
  42a7a435-c13f-470b-bdcd-b6f34da63515    ONLINE       0     0     0


While a better configuration would be:
Code:
special
  mirror-0
    e4e86085-34cd-4815-9a7d-7d226d257b14    DEGRADED     1    31     0  too many errors
    42a7a435-c13f-470b-bdcd-b6f34da63515    ONLINE       0     0     0
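For comparison, a mirrored special vdev is what you get when it is added as a mirror in the first place - roughly like this, with sdX/sdY as placeholders for the two SSDs:
Code:
# adding a special vdev as a mirror of two SSDs (placeholder device names)
zpool add Fat-man special mirror /dev/sdX /dev/sdY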
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Replace the defective one with a new one. That's all you can do to my knowledge.
 

LarsR

Guru
Joined
Oct 23, 2020
Messages
719
Doesn't a metadata vdev require the same fault tolerance as the other vdevs, e.g. a mirror for raidz1 or a 3-way mirror for raidz2?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
It does not require it. The UI tries to enforce some safety policies, but on the command line you can create whatever pool topology you feel like. Same for data vdevs. Add a single-disk vdev striped to an existing RAIDZ3 one? Of course. Unix does not prevent you from shooting yourself in the foot.
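For illustration only (tank and sdX are placeholders): zpool normally refuses to add a vdev whose redundancy does not match the rest of the pool, but -f overrides that check:
Code:
# refused by default because of the redundancy mismatch
zpool add tank /dev/sdX
# forced: a single-disk stripe sitting next to the RAIDZ vdevs
zpool add -f tank /dev/sdX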
 

itractus

Dabbler
Joined
Mar 1, 2021
Messages
39
Well, I'd love to heal my shot foot in that case... I'll start by replacing the faulty drive. After I have done that, can I kick the two devices out of the pool and put just one in there?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
As far as I know you would need to add a mirror component to each one of them to get to any sane configuration. A single drive is still ridiculously dangerous.
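In practice that is one zpool attach per special drive, once the defective one has been replaced. A rough sketch - /dev/sdX and /dev/sdY are placeholders for the new mirror partners, the first GUID is your healthy special drive, and the second leg uses whatever GUID the replacement SSD ends up with:
Code:
# attach a mirror partner to the healthy special drive
zpool attach Fat-man 42a7a435-c13f-470b-bdcd-b6f34da63515 /dev/sdX
# likewise for the replacement of the faulted one (its GUID will differ)
zpool attach Fat-man <guid-of-replacement-ssd> /dev/sdY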
 

itractus

Dabbler
Joined
Mar 1, 2021
Messages
39
Hmm, good tip. I'll order some drives to replace this mess of a config, in that case.
 

itractus

Dabbler
Joined
Mar 1, 2021
Messages
39
It seems I have some spares. If I replace the broken metadata device, how can I remove it altogether from the pool, so I have no more metadata devices and start from a clean 'metadata' slate?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
By recreating your pool from scratch.
 

LarsR

Guru
Joined
Oct 23, 2020
Messages
719
Unlike a SLOG or L2ARC, the special metadata vdev is pool-critical and can't be removed. As Patrick mentioned, you'd need to recreate your pool and restore your data from backup.
 

itractus

Dabbler
Joined
Mar 1, 2021
Messages
39
Sadly I don't have the storage to back up 39 TB worth of data.
Is there a way I can gradually move to a new pool without losing data? I'm thinking of moving one raidz1 vdev at a time, or something along those lines...
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
Because there are raidz1 vdevs, you cannot remove any vdev from the pool.
Your only way out to sanity and safety is to make a whole new pool and replicate everything to it in one go. Preferably, the new pool should be based on raidz2 (for safety) or be all mirrors (the flexible option); if it has a special vdev, that should be a 2-way or, much preferably, a 3-way mirror.

Short of that, as a temporary stopgap, you should immediately attach new drives to each of the two special drives so that your metadata sits on a stripe of mirrors rather than a stripe of single drives. Use HDDs if you have to! If either of these special drives fails, you lose the whole 39 TB of Fat-man.
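A rough sketch of what "replicate everything in one go" can look like from the shell, assuming the new pool is called Fat-man2 (a placeholder name); TrueNAS replication tasks do the same thing through the UI:
Code:
# recursive snapshot of the whole pool
zfs snapshot -r Fat-man@migrate

# send the complete dataset tree, preserving properties, into the new pool
zfs send -R Fat-man@migrate | zfs recv -F Fat-man2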
 