Pool Degraded - One or more devices could not be opened.

Astraea

Dabbler
Joined
Sep 7, 2019
Messages
28
I have not yet finished setting up my email server or notifications in TrueNAS, so I only saw this error because I happened to log into my new TrueNAS build:

Pool DataPool01 state is DEGRADED: One or more devices could not be opened. Sufficient replicas exist for the pool to continue functioning in a degraded state. The following devices are not healthy Disk 5543375440076347745 is UNAVAIL.

The system is an older Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 16GB of RAM (the maximum that board allows) and 24 new 4TB Seagate IronWolf NAS drives in a Norco 4224 case. A new EVGA SuperNOVA 850-watt power supply was installed for the system, and it boots from mirrored boot drives.

I have the pool set up as 4 VDEVs of 6 disks each, with RAID-Z2 for each VDEV. The Norco backplanes are connected via SAS cables to a SAS expander (an HP 468405-002), which is then connected via dual SAS cables to an HBA card (an IBM M1015).

When I first built the machine and was making sure everything was working and compatible, the original SAS cables I bought did not work and some drives did not show up, but those cables have been replaced. Since then, only one drive (with a different serial number) failed to show up, once, and a reboot corrected that. There were no further errors until today.

I used my drive layout chart and the information under the Pool tile on the main TrueNAS screen to identify the drive's serial number and location. With the system off, I removed the drive from its bay and reinserted it in case it was not seated properly, but that made no change. Then I turned the system off again and swapped 2 drives in the same VDEV to see if it could be an issue with the backplane (though other drives on that backplane have no issues), and that did not make a difference either.

I also ran a SMART test on the disk using my desktop and an HDD dock, and it reported no errors. In TrueNAS, I cannot see the drive at all, though I see its power and activity lights when the system boots and then just a power light after that. I then put everything back in its original position.

I am not sure what my next steps are and how to go about resolving this issue. I don't have a spare drive or I would have tried offlining this drive and replacing it. The drives are less than a month old and were bought locally through a reputable reseller all at the same time.

Hopefully, the TrueNAS community can help guide me to the correct way to resolve this issue. If I have left out any details, let me know and I will post a reply with the missing information.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I am not sure what my next steps are and how to go about resolving this issue. I don't have a spare drive or I would have tried offlining this drive and replacing it.
Let's start by identifying the disk...

zpool status -v

glabel status

dmesg | grep Serial (you don't need to share that with us, but it should give you the link from disk identifier to Serial Number)
 

Astraea

Dabbler
Joined
Sep 7, 2019
Messages
28
First Command Shows:
  pool: DataPool01
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-2Q
  scan: resilvered 6.33M in 00:00:01 with 0 errors on Wed Nov 24 11:26:03 2021
config:

        NAME  STATE  READ  WRITE  CKSUM
        DataPool01  DEGRADED  0  0  0
          raidz2-0  DEGRADED  0  0  0
            gptid/4bcc42f9-4570-11ec-877f-6805ca7ae82b  ONLINE  0  0  0
            da18p2  ONLINE  0  0  0
            5543375440076347745  UNAVAIL  0  0  0  was /dev/gptid/530329e0-4570-11ec-877f-6805ca7ae82b
            gptid/5be70f97-4570-11ec-877f-6805ca7ae82b  ONLINE  0  0  0
            gptid/5e010807-4570-11ec-877f-6805ca7ae82b  ONLINE  0  0  0
            gptid/69009ea1-4570-11ec-877f-6805ca7ae82b  ONLINE  0  0  0
          raidz2-1  ONLINE  0  0  0
            gptid/4a89d3d9-4570-11ec-877f-6805ca7ae82b  ONLINE  0  0  0
            gptid/4fc675c3-4570-11ec-877f-6805ca7ae82b  ONLINE  0  0  0
            gptid/5ae7404d-4570-11ec-877f-6805ca7ae82b  ONLINE  0  0  0
            gptid/64102e02-4570-11ec-877f-6805ca7ae82b  ONLINE  0  0  0
            gptid/684653dc-4570-11ec-877f-6805ca7ae82b  ONLINE  0  0  0
            gptid/6cb21684-4570-11ec-877f-6805ca7ae82b  ONLINE  0  0  0
          raidz2-2  ONLINE  0  0  0
            gptid/55c06e85-4570-11ec-877f-6805ca7ae82b  ONLINE  0  0  0
            gptid/5ad4656b-4570-11ec-877f-6805ca7ae82b  ONLINE  0  0  0
            gptid/60a45b7e-4570-11ec-877f-6805ca7ae82b  ONLINE  0  0  0
            gptid/642659b8-4570-11ec-877f-6805ca7ae82b  ONLINE  0  0  0
            gptid/6c9ec920-4570-11ec-877f-6805ca7ae82b  ONLINE  0  0  0
            gptid/70743816-4570-11ec-877f-6805ca7ae82b  ONLINE  0  0  0
          raidz2-3  ONLINE  0  0  0
            da22p2  ONLINE  0  0  0
            gptid/765c6088-4570-11ec-877f-6805ca7ae82b  ONLINE  0  0  0
            gptid/76de5ca6-4570-11ec-877f-6805ca7ae82b  ONLINE  0  0  0
            gptid/799fc6a3-4570-11ec-877f-6805ca7ae82b  ONLINE  0  0  0
            gptid/7aade4c0-4570-11ec-877f-6805ca7ae82b  ONLINE  0  0  0
            gptid/7b64f138-4570-11ec-877f-6805ca7ae82b  ONLINE  0  0  0

errors: No known data errors

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:01:05 with 0 errors on Sun Nov 21 03:46:05 2021
config:

        NAME  STATE  READ  WRITE  CKSUM
        boot-pool  ONLINE  0  0  0
          mirror-0  ONLINE  0  0  0
            da23p2  ONLINE  0  0  0
            da24p2  ONLINE  0  0  0

errors: No known data errors

The Second Command Shows:
Name Status Components
gptid/684653dc-4570-11ec-877f-6805ca7ae82b N/A da0p2
gptid/60a45b7e-4570-11ec-877f-6805ca7ae82b N/A da1p2
gptid/5be70f97-4570-11ec-877f-6805ca7ae82b N/A da2p2
gptid/6cb21684-4570-11ec-877f-6805ca7ae82b N/A da3p2
gptid/765c6088-4570-11ec-877f-6805ca7ae82b N/A da4p2
gptid/642659b8-4570-11ec-877f-6805ca7ae82b N/A da5p2
gptid/5e010807-4570-11ec-877f-6805ca7ae82b N/A da6p2
gptid/4a89d3d9-4570-11ec-877f-6805ca7ae82b N/A da7p2
gptid/5ad4656b-4570-11ec-877f-6805ca7ae82b N/A da8p2
gptid/76de5ca6-4570-11ec-877f-6805ca7ae82b N/A da9p2
gptid/69009ea1-4570-11ec-877f-6805ca7ae82b N/A da10p2
gptid/64102e02-4570-11ec-877f-6805ca7ae82b N/A da11p2
gptid/55c06e85-4570-11ec-877f-6805ca7ae82b N/A da12p2
gptid/799fc6a3-4570-11ec-877f-6805ca7ae82b N/A da13p2
gptid/4bcc42f9-4570-11ec-877f-6805ca7ae82b N/A da14p2
gptid/5ae7404d-4570-11ec-877f-6805ca7ae82b N/A da15p2
gptid/70743816-4570-11ec-877f-6805ca7ae82b N/A da16p2
gptid/7aade4c0-4570-11ec-877f-6805ca7ae82b N/A da17p2
gptid/4fc675c3-4570-11ec-877f-6805ca7ae82b N/A da19p2
gptid/6c9ec920-4570-11ec-877f-6805ca7ae82b N/A da20p2
gptid/7b64f138-4570-11ec-877f-6805ca7ae82b N/A da21p2
gptid/25f3b1e7-454e-11ec-9ed2-6805ca7ae82b N/A da23p1
gptid/26b530a9-454e-11ec-9ed2-6805ca7ae82b N/A da24p1
gptid/4882d759-4570-11ec-877f-6805ca7ae82b N/A da7p1
gptid/5bd3af8b-4570-11ec-877f-6805ca7ae82b N/A da6p1
gptid/624408be-4570-11ec-877f-6805ca7ae82b N/A da5p1
gptid/74a9941c-4570-11ec-877f-6805ca7ae82b N/A da4p1
gptid/6afc32b7-4570-11ec-877f-6805ca7ae82b N/A da3p1
gptid/59318019-4570-11ec-877f-6805ca7ae82b N/A da2p1
gptid/5e169925-4570-11ec-877f-6805ca7ae82b N/A da1p1
gptid/653c8f74-4570-11ec-877f-6805ca7ae82b N/A da0p1

I ran the third command and did not see the link to the serial number for the missing disk. According to my chart of which disk is in which bay, and based on the one serial number that does not show up in the disks list, it is the disk ending in 8VB in the top-right bay.
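The same elimination can be done mechanically by cross-referencing the two outputs above. This is a minimal sketch, not an official TrueNAS procedure: it assumes the `zpool status -v` and `glabel status` output has been saved to two text files (the file paths and the function name `missing_gptids` are made up for illustration) and that UNAVAIL rows carry the `was /dev/gptid/...` suffix seen in this thread.

```shell
#!/bin/sh
# missing_gptids ZPOOL_STATUS_FILE GLABEL_STATUS_FILE
# Prints every gptid label that ZFS remembers for an UNAVAIL member
# but that glabel no longer reports, i.e. the disk is not visible to the OS.
# (Function name and file arguments are hypothetical, for illustration only.)
missing_gptids() {
    # Every gptid/... label that glabel currently lists (first column).
    present=$(awk '$1 ~ /^gptid\// { print $1 }' "$2")

    # On UNAVAIL rows, zpool prints "was /dev/gptid/<uuid>"; take the path after "was".
    awk '$2 == "UNAVAIL" { for (i = 3; i <= NF; i++) if ($i == "was") print $(i + 1) }' "$1" |
    sed 's|^/dev/||' |
    while read -r label; do
        # Report the label only if glabel does not list it exactly.
        printf '%s\n' "$present" | grep -qxF "$label" || printf '%s\n' "$label"
    done
}

# Possible usage on the NAS itself (paths are an assumption):
#   zpool status -v > /tmp/zpool.txt
#   glabel status   > /tmp/glabel.txt
#   missing_gptids /tmp/zpool.txt /tmp/glabel.txt
```

Run against the output posted above, this should print `gptid/530329e0-4570-11ec-877f-6805ca7ae82b`, the one label that appears in the pool config but not in the glabel list. It only confirms that the partition label is absent; it does not say whether the disk, the bay, or the cabling is at fault.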
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
OK, process of elimination is acceptable.

It seems you have done a bit (or even a lot) of manual CLI work on that pool, so you are probably able to do what you need now.
 

Astraea

Dabbler
Joined
Sep 7, 2019
Messages
28
So the disk that is showing as UNAVAIL is physically in the system but is not showing up at all. I have tried removing it and reinserting it again to see if it will show under the Storage -> Disks menu, but it does not. Should I take the disk out of the system, wipe it completely, and then install it as if it were a new disk? When I try to replace the UNAVAIL disk, I am not given any disks to choose from; my guess is that this is because the disk is not being detected by the system. Is there another way to see if the disk is being detected by the system and just needs to be wiped and installed clean?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
So the disk that is showing as UNAVAIL is physically in the system but is not showing up at all. I have tried removing it and reinserting it again to see if it will show under the Storage -> Disks menu, but it does not. Should I take the disk out of the system, wipe it completely, and then install it as if it were a new disk?
Either a problem with that port on the backplane or the disk itself... I don't know what you would expect wiping it to do, but you can certainly try that.

When I try to replace the UNAVAIL disk, I am not given any disks to choose from; my guess is that this is because the disk is not being detected by the system. Is there another way to see if the disk is being detected by the system and just needs to be wiped and installed clean?
You may need to do a refresh of the GUI to have it go out and search again... I noticed that a few times in the past.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Is there another way to see if the disk is being detected by the system and just needs to be wiped and installed clean?
camcontrol devlist
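With 26 disks expected here (24 pool disks plus 2 boot disks), it can be easier to count the `camcontrol devlist` output than to read it by eye. A minimal sketch, assuming the usual FreeBSD line format in which each disk ends with a peripheral list such as `(pass0,da0)`; the function name `count_da_disks` and the saved-file path are hypothetical.

```shell
#!/bin/sh
# count_da_disks DEVLIST_FILE
# Counts the lines of saved `camcontrol devlist` output that expose a daN
# peripheral, e.g. "... (pass0,da0)" or "... (da0,pass0)".
# Enclosure/expander entries (ses0 and similar) are not counted.
# (Function name and file argument are hypothetical, for illustration only.)
count_da_disks() {
    awk '/[(,]da[0-9]+[,)]/ { n++ } END { print n + 0 }' "$1"
}

# Possible usage on the NAS itself (path is an assumption):
#   camcontrol devlist > /tmp/devlist.txt
#   count_da_disks /tmp/devlist.txt
```

A result lower than the number of installed disks means at least one disk is not being presented to the operating system at all, which points at the bay, backplane, cabling, or the drive's interface rather than at ZFS.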
 

Astraea

Dabbler
Joined
Sep 7, 2019
Messages
28
So I ran that command and it only shows 23 storage disks and the 2 boot disks. I have a sneaking suspicion that the backplane might be at fault for that particular bay, which is the top-right slot in the group of 24 bays. I know that Norco is gone and parts are next to impossible to find or super expensive. Even if it is the backplane, I am still coming out ahead on this purchase: for just over $400 CDN I got the chassis, the HBA, the SAS expander, and a motherboard (a Supermicro X8DT3 server board with dual Xeon X5670 processors, Dynatron G555 coolers, and 96GB of RAM) that I am using in another server build. Is there anything else I should try to confirm my theory? The only thing I can think of is to connect the drive directly to the motherboard with a SATA cable and see if it is detected and able to function normally that way.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Is there anything else I should try to confirm my theory? The only thing I can think of is to connect the drive to the motherboard directly with a SATA cable and see if it detects it and is able to function normally that way.
That's a valid test.
 