Request for drive replacement guidance

TNightster

Cadet
Joined
Jan 8, 2022
Messages
9
System: Supermicro X9DRH-7TF / SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]

Version: TrueNAS-12.0-U2.1

RaidZ2 array - 8 disks (6TB)

One drive is showing UNAVAIL

root@freenas:~ # zpool import -m
   pool: zPool1
     id: 5856686450448771866
  state: DEGRADED
 status: One or more devices are missing from the system.
 action: The pool can be imported despite missing or damaged devices. The
         fault tolerance of the pool may be compromised if imported.
    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-2Q
 config:

        zPool1                                          DEGRADED
          raidz2-0                                      DEGRADED
            gptid/5c2a9901-b9b7-11eb-bbb1-00259091fd8c  ONLINE
            gptid/ec64938b-d907-11ea-8a13-00259091fd8c  ONLINE
            gptid/ed9597ff-d907-11ea-8a13-00259091fd8c  ONLINE
            gptid/eee0eb7b-d907-11ea-8a13-00259091fd8c  ONLINE
            gptid/f020af04-d907-11ea-8a13-00259091fd8c  ONLINE
            gptid/f16d67d6-d907-11ea-8a13-00259091fd8c  UNAVAIL  cannot open
            gptid/f2af2c38-d907-11ea-8a13-00259091fd8c  ONLINE
            gptid/f3eb76ed-d907-11ea-8a13-00259091fd8c  ONLINE
root@freenas:~ #

The pool shows as OFFLINE in the GUI, so I have no way to manage it from there.

I don't have a hot swap drive currently. I have installed a 14TB blank drive and it mounted normally/successfully and does display in the Disks list. Not worried about any lost drive space at this time.

-------------

I've read several posts which offer resolution options, but can someone educate me on the best practice since I can't manage the pool in the GUI?

What should my process be?

1 - Attempt an import to force the pool to mount via Shell? If successful, then remove/replace the drive in the GUI?

2 - Assign the new 14TB drive as a hot spare via Shell? Will TrueNAS then automatically replace/resilver the UNAVAIL drive?

3 - Replace the UNAVAIL drive with the new 14TB drive using "zpool replace" via Shell?
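
For reference, a minimal sketch of what option 3 could look like from the Shell, assuming the pool can be imported first. The pool name and the gptid of the UNAVAIL member are taken from the listing above. The device name da7 for the new 14TB disk is hypothetical and must be checked against the Disks list, and the /mnt alternate root is an assumption about how this system lays out its pools. The script only prints the commands, so nothing changes until they are reviewed and run by hand.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them.
POOL="zPool1"                                      # from the zpool import listing
OLD="gptid/f16d67d6-d907-11ea-8a13-00259091fd8c"   # the UNAVAIL member
NEW="da7"                                          # hypothetical name of the new 14TB disk

replace_plan() {
    # Import the pool under /mnt (assumed layout); a RAIDZ2 vdev that is
    # missing one member would normally come up DEGRADED.
    echo "zpool import -R /mnt $POOL"
    # Swap the missing member for the new disk; this starts a resilver.
    echo "zpool replace $POOL $OLD $NEW"
    # Show resilver progress and per-device error counters.
    echo "zpool status -v $POOL"
}

replace_plan
```

If the import succeeds, doing the replacement from the GUI is probably the safer route, since as far as I know the GUI partitions the new disk itself and tracks it by gptid, while the raw-device form above does not.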

Thanks in advance for any guidance.
 

Attachments

  • zPool1 - OFFLINE.PNG

flashdrive

Patron
Joined
Apr 2, 2021
Messages
264
Hello @TNightster

I once had 2 out of 6 drives go missing from my pool.

I powered down the host, reseated the power connectors on the SATA drives (a full power cycle), restarted the host, and checked in the BIOS/EFI that all 6 drives were detected.

When I then checked the pool status in the TrueNAS CORE GUI, the pool had come back up.

How about giving the missing original drive a second try instead of the 14TB one?

What is the backup situation for your pool's data?

Before tinkering with the TrueNAS system settings, I like to save a TrueNAS config file first.


see also


 

thomas-hn

Explorer
Joined
Aug 2, 2020
Messages
82
I understand why the RAIDZ2 pool is DEGRADED because of a failed drive, but why is the pool OFFLINE? Shouldn't a degraded pool simply stay ONLINE (as long as enough redundancy is available)?
 

flashdrive

Patron
Joined
Apr 2, 2021
Messages
264
"I understand why the RAIDZ2 pool is DEGRADED because of a failed drive, but why is the pool OFFLINE? Shouldn't a degraded pool not simply stay ONLINE (as long as enough redundancy is available)?"

Yes, that also matches my experience with TrueNAS CORE 12.0-U7.

Are you indeed running U2.1?
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
Unless there's another failure, it should be online. I'm wondering if this ties in with the 12.0-U7 disk availability defect. There is a defect in U7 that allows the GUI to present a pool member drive as available for use in another pool. You might want to pull in one of the iXsystems people here and see if they need a defect opened. @Kris Moore maybe?
 

TNightster

Cadet
Joined
Jan 8, 2022
Messages
9
Thanks for the response flashdrive.

The console was showing this error (CCB request completed with an error):

20220108_195524.jpg


That led me to others' posts about bad/loose cables, so I checked those this weekend. I also physically swapped the drive into at least 3-4 other drive bays.

The drive behaves the same in every test. It powers on and I get an activity light on the drive tray. I can hear it "working", which sounds like it's intensely reading/writing. After ~30 seconds, it makes a loud "chirp" noise and goes dark with these errors. So the conclusion I came to was that the drive has either failed or is past a threshold of operational stability.

On the flip side, I docked the 14TB drive in and within 30 seconds, it posted/mounted/reported into TN just like you'd expect.

If you have any other suggestions for troubleshooting the 6TB drive, I'd happily give them a try. But it seems non-functional at present.

Thanks again!
 

TNightster

Cadet
Joined
Jan 8, 2022
Messages
9
"I understand why the RAIDZ2 pool is DEGRADED because of a failed drive, but why is the pool OFFLINE? Shouldn't a degraded pool simply stay ONLINE (as long as enough redundancy is available)?"

That's what I thought would happen, and I haven't been able to determine why it is "OFFLINE" with only that one drive missing.

Having said that, I have log messages about another drive which has 5 read errors now. But it's still mounted so I don't understand why the entire pool is unavailable.

I have a brand new TN system running, and I've moved my other pools over already. I'd just like to replicate this pool over if possible as well.

Thanks for the input!
 

TNightster

Cadet
Joined
Jan 8, 2022
Messages
9
"Are you indeed running U2.1?"

Here is my current version:


TrueNAS version-011022.PNG


I had wondered if this could be a bug/defect, but it's been running fine for weeks and I haven't upgraded anything to maintain stability.

I also wondered whether applying an upgrade could be more destructive than beneficial. So I came to ask you guys first! lol

Thanks again!
 

TNightster

Cadet
Joined
Jan 8, 2022
Messages
9
"Unless there's another failure, it should be online. I'm wondering if this ties in with the 12.0-U7 disk availability defect. There is a defect in U7 that allows the GUI to present a pool member drive as available for use in another pool. You might want to pull in one of the iXsystems people here and see if they need a defect open. @Kris Moore maybe?"

Thanks for the response rvassar. Posted the screenshot above, but it reports to be:

Version:
TrueNAS-12.0-U2.1

I'd be happy to apply an update/upgrade if that could resolve a drive mounting issue.

Thanks!
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
"Posted the screenshot above, but it reports to be: Version: TrueNAS-12.0-U2.1. I'd be happy to apply an update/upgrade if that could resolve a drive mounting issue."

Too early. That version is not affected. It looks like you have a bad drive, but RAIDz2 should come up degraded, not offline. What happens if you unplug the bad drive entirely, and bring it up with it missing?
 

TNightster

Cadet
Joined
Jan 8, 2022
Messages
9
"Too early. That version is not affected. It looks like you have a bad drive, but RAIDz2 should come up degraded, not offline. What happens if you unplug the bad drive entirely, and bring it up with it missing?"

That's basically the state I'm in now. I have the drive out of the chassis. It didn't matter whether I moved it to a different slot or restarted with it in the old slot. It tries to spin up and mount, but then fails some type of "CCB" test or process.

So right now I have it just sitting on the bench.

The TN interface is the same either way: the drive reports UNAVAIL and the pool status is OFFLINE.
 

TNightster

Cadet
Joined
Jan 8, 2022
Messages
9
Can I "force" the zpool to mount using zpool import -F? Even if the fault tolerance is at risk, this would be the last time this data is accessed, during replication.
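
For reference, a minimal sketch of the import variants in question, assuming the pool name zPool1 from the listing earlier in the thread. Note that the two flags differ: lowercase -f forces an import when the pool looks like it may be in use by another system, while uppercase -F is a recovery mode that tries to make a non-importable pool importable by discarding the last few transactions. The mode names below are made up for illustration, and the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: prints candidate import commands; nothing is executed.
POOL="zPool1"   # pool name from the zpool import listing earlier in the thread

import_cmd() {
    case "$1" in
        # Normal import, no forcing.
        plain)    echo "zpool import $POOL" ;;
        # Import read-only so nothing is written to the pool.
        readonly) echo "zpool import -o readonly=on $POOL" ;;
        # Lowercase -f: import even if the pool appears potentially active.
        force)    echo "zpool import -f $POOL" ;;
        # Uppercase -F is recovery mode; with -n it only reports whether
        # recovery would work and does not perform it.
        rewind)   echo "zpool import -F -n $POOL" ;;
        *)        echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

for mode in plain readonly force rewind; do
    import_cmd "$mode"
done
```

Trying the plain import first and noting its exact error message seems the least risky order, since the forced and recovery variants are only worth considering once that message is known.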
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
Does zpool import fail without the force flag? What error message gets generated?

If it does, I'd probably kick off a SMART test against each drive. You may have a double failure.
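
A minimal sketch of kicking off a SMART test against each drive from the Shell, assuming smartctl is available (it ships with TrueNAS CORE). The device names da0, da1 and da2 are hypothetical, and the helper only prints the commands so they can be reviewed before being run.

```shell
#!/bin/sh
# Sketch only: prints one smartctl command per disk name given as an argument.
smart_test_cmds() {
    kind="$1"; shift          # test type: short or long
    for disk in "$@"; do
        echo "smartctl -t $kind /dev/$disk"
    done
}

# Hypothetical device names; on FreeBSD-based TrueNAS CORE the real list
# can be read with: sysctl -n kern.disks
smart_test_cmds long da0 da1 da2
```

A short test only takes a few minutes per drive, so it may be worth running that first and saving the long test for drives that pass it.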
 

TNightster

Cadet
Joined
Jan 8, 2022
Messages
9
"Does zpool import fail without the force flag? What error message gets generated? If it does, I'd probably kick off a SMART test against each drive. You may have a double failure."

This is the response when I tried the import:

1641925467619.png


Running SMART tests now and will report results.

Thanks!
 

TNightster

Cadet
Joined
Jan 8, 2022
Messages
9
The SMART tests have now been running for 24+ hours. I'm wondering if I should restart the system and retry them:
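
Before restarting, the progress of a running self-test can be checked per drive. This is a minimal sketch, assuming SATA disks and the wording that smartctl -c uses for ATA devices ("NN% of test remaining"). The helper name and the device name da0 are hypothetical.

```shell
#!/bin/sh
# Sketch only: pulls the "NN% of test remaining" figure out of smartctl -c
# output read on standard input. Prints nothing when no such line is present.
selftest_remaining() {
    awk '/of test remaining/ { print $1 }'
}

# Intended use (da0 is a hypothetical device name):
#   smartctl -c /dev/da0 | selftest_remaining
# Finished tests and their results are listed by:
#   smartctl -l selftest /dev/da0
```

If a drive shows no test in progress and nothing new in its self-test log, the test most likely never started or was interrupted, and starting it again is reasonable.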

1642022336734.png
 