Pool degraded, drive removed

3dddman

Cadet
Joined
Sep 23, 2021
Messages
5
Let me give you a little back story. I just started with a company and I inherited the TrueNAS. It is running on a Dell R720xd with 12 Dell 2TB 7.2K 6G SAS 3.5" drives. We have been experiencing issues in our VMware ESXi environment. I didn't receive any warnings, but I thought I would log into the TrueNAS and check it out, and sure enough there was a message saying the MainPool was degraded and a disk was removed. I'm picking up a replacement drive today and heading down to my datacenter to replace the drive.

First, as I'm not that familiar with that server, I'm assuming that it should be easy to identify which bay the bad drive is in by the bay lights assuming there are bay lights. If not, is there a way to identify which bay the bad drive is in? Second, are the drives hot swappable? Finally, after I replace the drive, what are the steps to get it back in the array? Does it get automatically added back into the RAID or do I need to do something to add it back in? Thanks
 

Attachments

  • pool degraded.JPG

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I'm assuming that it should be easy to identify which bay the bad drive is in by the bay lights assuming there are bay lights
Usually not...

is there a way to identify which bay the bad drive is in?
If somebody who set the server up has done their job right, you'll have a list of drive serial numbers and identifiers somewhere either electronically or physically on the server... I bet you don't.

To find the identifier, you should start by looking at the disks view. This might help you to see which identifier is "missing" from the pool. (I can already bet it's /dev/da3)
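The same comparison can be sketched from a shell instead of the disks view. The helper below is only an illustration: `missing_disks` is a made-up name, not a TrueNAS tool. It assumes a FreeBSD-based TrueNAS, where `sysctl -n kern.disks` prints the disks the kernel currently sees, and it assumes the disks were numbered contiguously as da0, da1, and so on.

```shell
# Hypothetical helper: read device names on stdin (space or newline separated)
# and print every daN between FIRST and LAST that is not in the list.
missing_disks() {
    first=$1
    last=$2
    present=$(cat)
    i=$first
    while [ "$i" -le "$last" ]; do
        # -x matches the whole line, so da1 is not confused with da12.
        if ! printf '%s\n' "$present" | tr ' ' '\n' | grep -qx "da$i"; then
            echo "da$i"
        fi
        i=$((i + 1))
    done
}

# Assumed usage on the TrueNAS box itself (range taken from this thread):
#   sysctl -n kern.disks | missing_disks 0 12
```

A missing number only says which identifier is gone, not which bay it sat in, so it still has to be matched to a serial number or an LED.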

If you have activity lights on the disks, you could (once you find the disk identifier) run the following, replacing daX with the right identifier, and watch the activity light on that disk light up:
dd if=/dev/daX of=/dev/null bs=1024k count=1000

Second, are the drives hot swappable?
It's a chassis that should have that feature, but it's on you to check that.

Finally, after I replace the drive, what are the steps to get it back in the array?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Can't give you Dell-specific advice, but generally Dell makes niceish server gear and there should be an activity LED and maybe a failure LED, but the failure LED is unlikely to be lit, as that's a software thing.

I would expect Dell server bays to fully support hotswap.

You should be able to identify the drive by the lack of activity on the LED. Some of us go for the extra paranoia and make sure we label our drive bays with the serial numbers of the drives within.
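For anyone building such a serial-number list, here is a minimal sketch. `serial_of` is a hypothetical helper that pulls the serial out of `smartctl -i` output. It assumes smartctl can talk to the disks directly, which may not hold behind a RAID controller, and that the output contains a "Serial number:" line (SAS) or a "Serial Number:" line (SATA).

```shell
# Hypothetical helper: extract the serial number from `smartctl -i` output
# supplied on stdin. The match on the label is case-insensitive.
serial_of() {
    awk -F: 'tolower($1) ~ /^serial number$/ {
        sub(/^[ \t]+/, "", $2)   # trim leading whitespace
        sub(/[ \t]+$/, "", $2)   # trim trailing whitespace
        print $2
    }'
}

# Assumed usage on a FreeBSD-based TrueNAS, one line per disk:
#   for d in $(sysctl -n kern.disks); do
#       printf '%s: %s\n' "$d" "$(smartctl -i "/dev/$d" | serial_of)"
#   done
```

The failed disk will not answer, so its serial is the one on the drive labels that does not appear in the output.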

If you are using this for VMware ESXi storage, you should be aware that RAIDZ3 is very bad at that, and that you may need to make further remediations if you are experiencing performance issues. See the following article.

https://www.truenas.com/community/threads/the-path-to-success-for-block-storage.81165/

Since the RAIDZ3 has the whiff of incompetence, you might also wish to review how your drives are attached to the system. Dell typically pushes their RAID controllers, and you need to be using an HBA like an H200, H310 crossflashed to IT mode firmware, HBA330, etc.

https://www.truenas.com/community/r...bas-and-why-cant-i-use-a-raid-controller.139/

There's guidance about replacing drives in a pool in the manual. I think it is unlikely to "automatically" do anything, but I don't really know since most of my arrays have a hot spare configured, and the specifics change now and then.
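Whatever the manual says about the replacement itself, the pool state can be checked read-only from a shell before and after. A small sketch, assuming the usual `zpool status` layout with NAME and STATE columns (`pool_problems` is a hypothetical name, and MainPool is the pool name from the first post):

```shell
# Hypothetical helper: read `zpool status` output on stdin and print the
# name and state of every pool, vdev, or disk that is not healthy.
pool_problems() {
    awk '$1 !~ /:$/ && $2 ~ /^(DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ {
        print $1, $2
    }'
}

# Assumed usage:
#   zpool status MainPool | pool_problems
```

This only reports state; it does not offline or replace anything.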
 

3dddman

Cadet
Joined
Sep 23, 2021
Messages
5
Thanks everyone. I already verified the drives are hot swappable in that server. I will use the information provided to identify the bad drive and do a replacement.
 

3dddman

Cadet
Joined
Sep 23, 2021
Messages
5
I was able to easily identify the failed disk on the server. It showed an amber light. I changed the 'removed disk' to offline, then physically replaced the old drive with the new drive and instantly had a green light. The problem I am running into with the 'replacement' instructions is that they say to select a member disk, but when I click the drop-down arrow, nothing shows, just a - . Am I missing something? Thanks
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
The failed disk light was a clue, but the automatic replacement is even more strange.

I suspect some hardware RAID involvement.

Based on what you showed, I don't see how it can be, but I don't understand what is going on.

You can try wiping the new drive from the disk view (and refreshing the browser) to see if the replace option then appears.
 

3dddman

Cadet
Joined
Sep 23, 2021
Messages
5
Unfortunately, when I go to disk view the new disk doesn't show. da0 through da12 all show up except da3, which is where the new drive should be. Is there some place to scan for new drives?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
It would be so very helpful if you would describe the hardware involved. Because Dell has a bunch of options on how to attach disks to a server, saying:

Dell R720xd with 12 Dell 2TB 7.2K 6G SAS

isn't particularly helpful or descriptive because it omits the truly useful bits.

Most problematic is that I suspect your system has a PERC H710 Mini in it, which will mean you're probably connected using the... guessing, MFI, driver. This has been known to be problematic in the past.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=183618

etc. If so, the firmware on the card probably needs to be told to export the drive as a new JBOD disk, and at that point, you'll probably be happy, but there is a warning here that this has the strong possibility to go south on you at some point in the future, possibly resulting in data loss. You really need to replace the RAID controller with a true HBA.
 

3dddman

Cadet
Joined
Sep 23, 2021
Messages
5
Sorry, this server was set up months ago, before I was with the company. All I have is the description on the invoice from when the server was purchased. It lists the following: DELL EMULEX LPE12002 8GB and Dell R720xd 12-Bay LFF Dual E5-2660 8C 2.2GHz 64GB (8x8GB)PC3-14900R 1Rx4 PERC H710p 1GB QP 5720 RPS 1100W NTCWP
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Yep, so you have a PERC H710 in there. I'll leave it to Dell experts to give more specific advice here; there are weird interactions with the internal RAID card PCI slot and other cards, but there's going to be some forward path available to swap in an HBA. Also, apparently some people have had success at crossflashing IT firmware onto the H710.

https://www.truenas.com/community/threads/howto-h710-crossflashing-to-it-mode-guide-online.82725/

I haven't done this but it is a plausible option.
 