Need help fixing my Array

n00by0815

Cadet
Joined
Feb 15, 2024
Messages
5
Hey everyone,

I am pretty new to TrueNAS and got to take over an existing system.

Today a drive was failing, at least according to my email notifications, but didn't show any signs of failure in the Web Interface.
So I ended up being stupid and just replaced the drive. I went there, took out the drive, put in a new one, wanted to replace the failed drive, and ended up in my current situation.

I had a spare drive in the system (sdc) and replaced drive sdn.

1708036336025.png


Now after resilvering, both drives show up in the corresponding Zpool under spare, but online.

How do I get sdc back to Spare and Available and sdn properly back into my RAID-Z2 array? All arrays consist of 8 drives; this one now consists of 7 + 2 spares.

Thanks in advance!
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Welcome to the forums!

First off, this is a normal display of ZFS sparing out a drive. You do not have 2 spares. It is simply ZFS saying "sdc" is sparing out "sdn". In essence, a temporary 2 way mirror, (with 1 disk considered failing), to restore redundancy to the data.
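To make the layout concrete, here is a minimal Python sketch of the pool as a tree. It is only a model for illustration, not something read from ZFS: the names "raidz2-0", "spare-0" and "disk1" to "disk7" are assumptions, and only "sdc" and "sdn" come from this thread.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Vdev:
    """A node in a simplified pool layout: a disk or a group of disks."""
    name: str
    kind: str  # "disk", "raidz2" or "spare" (labels chosen for this sketch)
    children: List["Vdev"] = field(default_factory=list)


def member_slots(raidz: Vdev) -> int:
    """Count the member slots of a RAID-Z vdev; a sparing pair fills one slot."""
    return len(raidz.children)


def disks_in_use(vdev: Vdev) -> int:
    """Count the physical disks underneath a vdev."""
    if vdev.kind == "disk":
        return 1
    return sum(disks_in_use(child) for child in vdev.children)


# Seven ordinary members plus one slot where "sdc" is sparing out "sdn".
healthy = [Vdev(f"disk{i}", "disk") for i in range(1, 8)]
sparing = Vdev("spare-0", "spare", [Vdev("sdn", "disk"), Vdev("sdc", "disk")])
raidz2 = Vdev("raidz2-0", "raidz2", healthy + [sparing])

print(member_slots(raidz2))  # 8 slots, so it is still an 8-wide RAID-Z2
print(disks_in_use(raidz2))  # 9 physical disks while the spare is active
```

In this model the array has not shrunk to 7 + 2 spares. It still has 8 member slots, and one of them is temporarily backed by two disks.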

The normal practice at this point would be to replace "sdn", both physically and in ZFS. After the new "sdn" is re-silvered, (aka re-synced to the data), the spare "sdc" would be changed from INUSE to AVAIL.

The documentation, (see "Documentation" link at the top of any forum page), will have details on the proper method to replace a disk. If you find something in the docs that could be improved or is outright wrong today, (software does change...), there should be a "Feedback" button vertically on the right side.


Now it is not clear if "sdn" has been physically replaced. Or if you ran the software side of the process.
  • If you physically replaced "sdn" and NOT the software side, then just do the software side.

ZFS also supports a situation where you have a spare, which is sparing out a failed disk, but you want the spare disk to stay part of the data pool and no longer be used as a spare. I think that means you can remove "sdn", and "sdc" will stop being a spare. This is useful if you don't have a replacement disk and don't expect to.
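As a rough sketch of that option: on the command line, promoting the spare is done by detaching the failed disk with "zpool detach". The Python helper below only builds the command and does not run it. The function name is hypothetical, "tank" is a placeholder pool name, and on TrueNAS the disks may be listed under a different identifier than "sdn", so treat this as an illustration and prefer the web interface.

```python
from typing import List


def promote_spare_command(pool: str, failed_disk: str) -> List[str]:
    """Build the argv that detaches the failed disk so the active spare stays in the vdev."""
    if not pool or not failed_disk:
        raise ValueError("pool and failed_disk must be non-empty")
    return ["zpool", "detach", pool, failed_disk]


# "tank" is a placeholder pool name; "sdn" is the failing disk from this thread.
print(" ".join(promote_spare_command("tank", "sdn")))
```

After such a detach the former spare is a regular member, so the pool has no hot spare left until a new one is added.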


One thing about TrueNAS & ZFS is that the software was written for Enterprise Data Centers and computer professionals. So many times there is a learning curve, in the sense that TrueNAS support is not as simple as consumer NASes. It is not clear if your "take over an existing system" was for home or office use. But, in either case, it is suggested that you read up on ZFS and TrueNAS, both in the Documentation I referenced earlier and in the various "Resources" available via the link at the top of any forum page. And I don't mean read it all, nor read a doc, (or Resource), completely through.
 

n00by0815

Cadet
Joined
Feb 15, 2024
Messages
5
Hello Arwen, thank you very much. I took over the system from our system admin and didn't have a lot of time to prepare for this situation.

So like I said in my post, but as I also noticed, not well phrased:
1. got email alert about failing drive
2. went to the box and physically replaced the drive (which was stupid in hindsight)
3. came back to a resilvering event in TrueNAS and thought it would use the replaced drive for the pool
4. After resilvering found the above screenshot

So it seems I can't just replace sdn from the GUI. The only options I have for sdc and sdn are Extend, Detach or Offline and I am not really sure how to proceed.

My assumption is that I somehow need to remove sdc from that pool, so it goes back to being a spare and put sdn back in the pool properly.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Please read the docs on disk replacement, I don't have the steps memorized. But I did find the section in the manual within 1 minute;

In the meantime, your NAS is fine. RAID-Z2 means it can lose 2 disks without data loss. And with the hot spare in place, you have full, (2 disks worth), redundancy.
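The arithmetic behind that statement can be sketched in a few lines of Python. This is a simplified model, not a ZFS calculation: the function name and the idea of counting "slots without a good disk" are assumptions made for illustration.

```python
def failures_tolerated(parity: int, slots_without_good_disk: int) -> int:
    """Return how many more member slots a RAID-Z vdev can lose without data loss."""
    if parity < 0 or slots_without_good_disk < 0:
        raise ValueError("arguments must not be negative")
    return max(parity - slots_without_good_disk, 0)


# RAID-Z2 has a parity level of 2.
print(failures_tolerated(2, 0))  # spare has resilvered into the failing slot
print(failures_tolerated(2, 1))  # one slot has failed and no spare covers it
```

With the spare resilvered, every slot has a good disk again, so the full two-disk tolerance is available.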
 

n00by0815

Cadet
Joined
Feb 15, 2024
Messages
5
Sorry if I seem ignorant, I just don't wanna make things worse, so I am just trying to double check here.

My situation just doesn't seem to be reflected anywhere, so let me ask it this way:

Do I have to:

1. Take sdc which used to be the spare offline, so it becomes a hot spare again?
2. Does that somehow automatically make sdn a proper part of the Z2 Array again? Or do I need to scrub my drives, so this is reflected properly?
or
3. Do I take sdn offline, then EXTEND my Z2 Array to it, then after resilvering take sdc offline, so it becomes the hot spare again?
 

PhilD13

Patron
Joined
Sep 18, 2020
Messages
203
It is reflected in the doc link @Arwen posted. Your hot spare drive is sdc. It automatically replaced the failed drive sdn by creating a mirror of the two drives sdc and sdn, so any data destined for sdn now gets copied to sdc. That is what it is supposed to do. This now means the hot spare sdc is unavailable as a hot spare, as it is currently in use.

You will be working with the drive that failed and its replacement, which as I understand it is sdn. You need to follow the "Taking a Disk Offline" section in the link @Arwen posted. Click the Offline button and then the Confirm Offline button in the ZFS info for the sdn disk that you replaced. Once the offline process is complete, the Offline button will become an Online button. You can then physically remove the disk from the system when the disk status is Offline. If the replacement disk is not already physically installed in the system, install it now.
Once the disk is offlined and replaced, click Replace in the Disk Info section for the disk you just offlined. Then select the new drive from the Member Disk dropdown list in the Replace Disk dialog. Click Replace Disk to bring it online.

Note: If, when you take sdn Offline and reselect it in the Replace dialog, a popup appears that says "Disk is not clean, partitions were found.", you can cancel the operation, go back, and click Online in the ZFS info part; it will online the drive and likely solve the issue. If not, then under Replace Disk check the Force checkbox in the Replace dialog popup, and it will format the drive, allowing it to be used.

To restore the hot spare to waiting status after replacing the failed drive, follow the Restoring Hot Spare section: remove the hot spare sdc from the pool, then re-add it to the pool as a new hot spare, and it will be ready to be used as a hot spare again for the pool.
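For readers who think in commands, the GUI steps above correspond roughly to a few "zpool" subcommands. The Python sketch below only builds those commands and does not run anything. The function names are hypothetical, "tank" and "sdx" are placeholder names, and the mapping to the TrueNAS buttons is an assumption, so use the web interface on a real system.

```python
from typing import List


def replace_disk_commands(pool: str, failed: str, new: str) -> List[List[str]]:
    """Build the commands to offline the failed disk and resilver onto the new one."""
    return [
        ["zpool", "offline", pool, failed],
        ["zpool", "replace", pool, failed, new],
    ]


def release_spare_command(pool: str, spare: str) -> List[str]:
    """Build the command that hands an in-use hot spare back to the spare list."""
    return ["zpool", "detach", pool, spare]


# Placeholder names: pool "tank", failed disk "sdn", new disk "sdx", spare "sdc".
for argv in replace_disk_commands("tank", "sdn", "sdx"):
    print(" ".join(argv))

# Only relevant if the spare is still in use after the resilver has finished.
print(" ".join(release_spare_command("tank", "sdc")))
```

The spare release is kept as a separate function because ZFS normally returns the spare to the available list on its own once the replacement has resilvered.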

Since you already did the physical swap, @Arwen is saying you just need to follow through on the software part to properly tell the software the disk is offline and
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
NO, do NOT EXTEND your array.

And @PhilD13 covered the rest.
 

n00by0815

Cadet
Joined
Feb 15, 2024
Messages
5
Hi, and a huge thanks to Phil for the more detailed description!

The steps were actually a bit less complicated. Since I had already physically exchanged the drive, I only had to take it offline, then as you described click the REPLACE button under disk info, which honestly I seem to have missed, because it's not right next to OFFLINE, etc.

Then the array resilvered and sdc automagically became a hot spare again. So it looks like the step of bringing sdc back to hot spare status wasn't even necessary.

1708270954955.png
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I hear you need your post edited, but you need to be more specific about the edit you need.
 

PhilD13

Patron
Joined
Sep 18, 2020
Messages
203
@n00by0815 Thanks, and glad you were able to get the system fixed!
 