Weird pool topology after power failure.

plague_doctor · Feb 21, 2022

Hi guys. This is my first post to this forum, so please be gentle with me.
I use TrueNAS Core 12.0-U8 in my home lab and have never had a single issue with my NAS. However, we had a couple of blackouts recently and I wasn't at home to react properly when the UPS finally gave up.

Long story short - one of my disks did not come up properly and TrueNAS used the spare. I received an alert that my pool state is DEGRADED and:

Code:

The following devices are not healthy:
Disk ATA ST5000LM000-2AN1 WCJ3EPBQ is UNAVAIL
Disk ATA ST5000LM000-2AN1 WCJ3RL7F is FAULTED

After a few days of resilvering my pool came up as healthy. However, the topology looks strange and I am not sure how to make sense of it...

I've got 4 disks in the pool:

da0p2
da1p2
da2p2
da3p2

All happy and error free. All online.

But my pool status looks somewhat weird...

Can you guys advise on why there are two SPARE disks inside RAIDZ1? Why does it say da0p2 is both ONLINE and UNAVAILABLE?
Is this something I should be worried about?
What can/should I do to bring it back to my regular topology (3 disks in RAIDZ1 and one spare)?

sretalla · Feb 21, 2022

plague_doctor said:
Can you guys advise on why there are two SPARE disks inside RAIDZ1? Why does it say da0p2 is both ONLINE and UNAVAILABLE?
Is this something I should be worried about?
What can/should I do to bring it back to my regular topology (3 disks in RAIDZ1 and one spare)?

Looks like da0 went offline at some point and the spare kicked in to take over... but later, da0 came back and is now (although maybe only temporarily) fine.

You could either detach da0, or detach the spare to return it to being a spare. (I'm not sure you can do that in the GUI, so it would be `zpool detach cryptohell /dev/da0p2`, or `da3p2` in place of `da0p2` if you want to send the spare back.)

plague_doctor · Feb 21, 2022

So running `zpool detach cryptohell /dev/da0p2` should give me back `da0` as a spare?

sretalla · Feb 21, 2022

Worst case it will be out of the pool and you can add it back as a spare again, but as I understand it, it should just go back to being a spare directly with that command.

See topic 4.4.6 here: https://illumos.org/books/zfs-admin/gavwn.html

Seems to confirm what I'm saying toward the end of that section.

plague_doctor · Feb 21, 2022

Hmmm... a little confusion:

Code:

% sudo zpool detach cryptohell /dev/da0p2

cannot detach /dev/da0p2: no such device in pool

Code:

zpool status cryptohell

  pool: cryptohell
 state: ONLINE
  scan: scrub in progress since Tue Feb 22 07:56:43 2022
    4.08T scanned at 449M/s, 2.35T issued at 259M/s, 10.4T total
    0B repaired, 22.65% done, 09:01:44 to go
config:

    NAME                                                  STATE     READ WRITE CKSUM
    cryptohell                                            ONLINE       0     0     0
      raidz1-0                                            ONLINE       0     0     0
        gptid/6b4914f3-e1b3-11ea-9b92-002590c43598.eli    ONLINE       0     0     0
        spare-1                                           ONLINE       0     0     1
          gptid/c57f6251-ef2c-11ea-88f7-002590c43598.eli  ONLINE       0     0     0
          gptid/b5da369b-7737-11ec-b453-002590c43598.eli  ONLINE       0     0     0
        gptid/750c48ba-e1b3-11ea-9b92-002590c43598.eli    ONLINE       0     0     0
    logs    
      gptid/c0717c3a-4ef0-11ec-8314-002590c43598.eli      ONLINE       0     0     0
    spares
      gptid/b5da369b-7737-11ec-b453-002590c43598.eli      INUSE     currently in use

errors: No known data errors

Is there a way to map GPTIDs to short dev names?

elvisimprsntr · Feb 21, 2022

@plague_doctor

For the future, make sure you enable UPS monitoring so your NAS can shut down gracefully.

plague_doctor · Feb 21, 2022

@elvisimprsntr To be honest, it is configured and (to my surprise) it worked every single time before. The last blackout was different somehow... They switched the electricity on and off a couple of times, and I think this did something weird to the NUT logic...

plague_doctor · Feb 21, 2022

@sretalla I found it! Yes, when I detached the disk it went straight back to being a spare, and the topology looks like it did before.
Thanks a lot!
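
For anyone landing here later: the thread does not show the exact command that worked. Below is a minimal sketch, assuming the fix was to pass the device name exactly as `zpool status` prints it (the gptid label, including `.eli`) rather than `/dev/da0p2`, and that the in-use spare was the device detached. `inuse_spares` is a made-up helper name, not a ZFS tool; it only reads `zpool status` output, and the detach line is left commented out.

```shell
#!/bin/sh
# Hypothetical helper: list hot spares that are currently in use.
# Reads `zpool status <pool>` output on stdin and prints the first column
# of every line whose second column is INUSE (entries in the "spares" section).
inuse_spares() {
    awk '$2 == "INUSE" { print $1 }'
}

# Usage on the NAS itself (commented out because detaching changes the pool):
# spare=$(zpool status cryptohell | inuse_spares)
# zpool detach cryptohell "$spare"   # sends the in-use spare back to the spares list
```

Detaching the in-use spare cancels the replacement and returns it to the spares list. Detaching the original disk instead makes the spare a permanent member of the vdev.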

sretalla · Feb 21, 2022

glabel status
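
If you want to script the lookup, here is a minimal sketch. `gptid_to_dev` is a made-up helper name, not part of FreeBSD or TrueNAS. It assumes the usual three-column `glabel status` layout (Name, Status, Components) and strips the `.eli` suffix that `zpool status` shows for GELI-encrypted members.

```shell
#!/bin/sh
# Hypothetical helper: print the partition behind a gptid label.
# Reads `glabel status` output on stdin; $1 is the vdev name as printed by
# `zpool status`. A trailing ".eli" (GELI) suffix is stripped before matching.
gptid_to_dev() {
    want="${1%.eli}"
    awk -v want="$want" '$1 == want { print $3 }'
}

# Usage on the NAS itself (commented out so the sketch has no side effects):
# glabel status | gptid_to_dev gptid/c57f6251-ef2c-11ea-88f7-002590c43598.eli
```

If the label is not found, the function prints nothing, so check for empty output before using the result.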

HoneyBadger · Feb 22, 2022

plague_doctor said:
ST5000LM000

Just a note, you're using SMR (shingled) drives, which are known to have varying degrees of bad behaviour under ZFS.

List of known SMR drives

Hard drives that write data in overlapping, "shingled" tracks, have greater areal density than ones that do not. For cost and capacity reasons, manufacturers are increasingly moving to SMR, Shingled Magnetic Recording. SMR is a form of PMR...

www.truenas.com
