ESXi: Replacing Bad SSD attached to LSI 9207

TranceKat

Dabbler
Joined
Feb 17, 2020
Messages
21
Hi all. Question on replacing a bad SSD in a RAID 1 config. I have an LSI 9207-4i4e with 2 SSDs attached. One went bad, so I ordered another of the same SSD. I plugged in the new one but can't figure out how to add it to the existing array using the on-card RAID utility. Any insights, please?
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
Do NOT use any kind of RAID utility:
Manage vdevs from the TrueNAS GUI.
 

TranceKat

Dabbler
Joined
Feb 17, 2020
Messages
21
Of course, you're absolutely correct. This RAID array is for the hypervisor and VM images, one of which is TrueNAS. I have a separate controller for the eight 8TB drives that I pass through to TrueNAS, so I'm following best practices.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
So, uh, what?

Please provide at least some sort of description, even a vague one, because there's literally no direction here. You've... what? You have a RAID card on a hypervisor and you don't know how to do a drive replacement? What hypervisor? IR firmware? Vendor firmware? Do you have storcli installed? Etc. The workflow for "how to replace an LSI RAID member drive on MFI/MRSAS under ESXi with storcli" is very different from "how to replace an LSI RAID member drive with IR under Proxmox with Megaraid Storage Manager".

Also, if that's the issue, while we can help, I'll be moving the thread over to off-topic because it has nothing to do with FreeNAS/TrueNAS.
 

TranceKat

Dabbler
Joined
Feb 17, 2020
Messages
21
So sorry for not giving you all the details.

I have 2 RAID controllers in my server. One is used for my 8x 8TB disks and is not relevant to this issue.

The other is used to mirror my OS (VMware ESXi) in RAID 1.

That RAID 1 is created and managed by the 9207 card.

One of the disks in that RAID 1 went bad.

I start up my server and, when the controller has initialized and prompts me, hit Ctrl-C to enter the Avago SAS2308 configuration utility (this card is in IR mode).

In this utility, I see the RAID volume I created with the original 2 SSDs. Viewing that RAID 1, I see only the good disk (because I removed the failed disk). If I look at the list of drives connected to the controller, I see the new disk (to replace the old bad one), but I do not see how to add it to the existing RAID 1.

I'll include pictures as soon as I get back to my server.

Does that help at all? Apologies again for not providing enough information.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
So, do you have storcli installed under ESXi? The BIOS config utilities vary widely across the various LSI/Avago/OEM cards, but in general I doubt that there's a reliably correct way to make the change you're seeking through them, so I'd be hesitant to try.

I suggest installing storcli under ESXi, which will give you /opt/lsi/storcli/storcli which you can use to chat with the controller and cause it to do stuff.

For example:

Code:
[root@somewhere:~] /opt/lsi/storcli/storcli  /c0 show
Generating detailed summary of the adapter, it may take a while to complete.

CLI Version = 007.0409.0000.0000 Nov 06, 2017
Operating system = VMkernel6.7.0
Controller = 0
Status = Success
Description = None

Product Name = AVAGO 3108 MegaRAID
Serial Number = FW-AEP7RGMAARBWA
SAS Address =  5003048019d81d01
PCI Address = 00:05:00:00
System Time = 11/04/2021 13:37:15
Mfg. Date = 00/00/00
Controller Time = 11/04/2021 13:36:16
FW Package Build = 24.21.0-0028
BIOS Version = 6.36.00.2_4.19.08.00_0x06180202
FW Version = 4.680.00-8290
Driver Name = lsi-mr3
Driver Version = 7.708.07.00
Current Personality = RAID-Mode
Vendor Id = 0x1000
Device Id = 0x5D
SubVendor Id = 0x15D9
SubDevice Id = 0x809
Host Interface = PCI-E
Device Interface = SAS-12G
Bus Number = 5
Device Number = 0
Function Number = 0
Drive Groups = 2

TOPOLOGY :
========

---------------------------------------------------------------------------
DG Arr Row EID:Slot DID Type  State BT     Size PDC  PI SED DS3  FSpace TR
---------------------------------------------------------------------------
 0 -   -   -        -   RAID1 Optl  N  931.0 GB dflt N  N   dflt N      N
 0 0   -   -        -   RAID1 Optl  N  931.0 GB dflt N  N   dflt N      N
 0 0   0   252:0    1   DRIVE Onln  N  931.0 GB dflt N  N   dflt -      N
 0 0   1   252:1    0   DRIVE Onln  N  931.0 GB dflt N  N   dflt -      N
 1 -   -   -        -   RAID1 Optl  N  931.0 GB dflt N  N   dflt N      N
 1 0   -   -        -   RAID1 Optl  N  931.0 GB dflt N  N   dflt N      N
 1 0   0   252:4    4   DRIVE Onln  N  931.0 GB dflt N  N   dflt -      N
 1 0   1   252:5    5   DRIVE Onln  N  931.0 GB dflt N  N   dflt -      N
---------------------------------------------------------------------------

DG=Disk Group Index|Arr=Array Index|Row=Row Index|EID=Enclosure Device ID
DID=Device ID|Type=Drive Type|Onln=Online|Rbld=Rebuild|Dgrd=Degraded
Pdgd=Partially degraded|Offln=Offline|BT=Background Task Active
PDC=PD Cache|PI=Protection Info|SED=Self Encrypting Drive|Frgn=Foreign
DS3=Dimmer Switch 3|dflt=Default|Msng=Missing|FSpace=Free Space Present
TR=Transport Ready

Virtual Drives = 2

VD LIST :
=======

---------------------------------------------------------------------
DG/VD TYPE  State Access Consist Cache Cac sCC     Size Name
---------------------------------------------------------------------
0/0   RAID1 Optl  RW     Yes     NRWBD -   ON  931.0 GB somewhere-s0r
1/1   RAID1 Optl  RW     Yes     RWBD  -   ON  931.0 GB somewhere-s1r
---------------------------------------------------------------------

Cac=CacheCade|Rec=Recovery|OfLn=OffLine|Pdgd=Partially Degraded|Dgrd=Degraded
Optl=Optimal|RO=Read Only|RW=Read Write|HD=Hidden|TRANS=TransportReady|B=Blocked|
Consist=Consistent|R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack|
AWB=Always WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled
Check Consistency

Physical Drives = 5

PD LIST :
=======

-----------------------------------------------------------------------------------
EID:Slt DID State DG     Size Intf Med SED PI SeSz Model                   Sp Type
-----------------------------------------------------------------------------------
252:0     1 Onln   0 931.0 GB SATA SSD N   N  512B Samsung SSD 860 EVO 1TB U  -
252:1     0 Onln   0 931.0 GB SATA SSD N   N  512B Samsung SSD 860 EVO 1TB U  -
252:2     2 UGood  - 931.0 GB SATA SSD N   N  512B Samsung SSD 860 EVO 1TB U  -
252:4     4 Onln   1 931.0 GB SATA SSD N   N  512B Samsung SSD 860 EVO 1TB U  -
252:5     5 Onln   1 931.0 GB SATA SSD N   N  512B Samsung SSD 860 EVO 1TB U  -
-----------------------------------------------------------------------------------

EID-Enclosure Device ID|Slt-Slot No.|DID-Device ID|DG-DriveGroup
DHS-Dedicated Hot Spare|UGood-Unconfigured Good|GHS-Global Hotspare
UBad-Unconfigured Bad|Onln-Online|Offln-Offline|Intf-Interface
Med-Media Type|SED-Self Encryptive Drive|PI-Protection Info
SeSz-Sector Size|Sp-Spun|U-Up|D-Down/PowerSave|T-Transition|F-Foreign
UGUnsp-Unsupported|UGShld-UnConfigured shielded|HSPShld-Hotspare shielded
CFShld-Configured shielded|Cpybck-CopyBack|CBShld-Copyback Shielded


Cachevault_Info :
===============

------------------------------------
Model  State   Temp Mode MfgDate
------------------------------------
CVPM02 Optimal 22C  -    2015/05/07
------------------------------------



You will see that this hypervisor has five 1TB SSDs: two RAID1 sets plus a spare. This generally means I don't have to worry when an SSD fails, but we can work through the proper commands to cause one to be replaced. You need to install storcli and replicate the above so we can see what's what.
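For a MegaRAID-family controller like the 3108 above, the PD LIST and TOPOLOGY tables are what you would read to plan a replacement. Below is a minimal sketch, not a tested procedure: a POSIX shell helper that picks the UGood (Unconfigured Good) drives out of saved `storcli /c0 show` output, plus a helper that only prints the `insert` and `show rebuild` commands for a missing RAID1 member. The function names, the `STORCLI` variable, controller `/c0` and the dg/array/row values in the usage line are illustrative assumptions; read the real values off your own TOPOLOGY table and check the command syntax against the StorCLI reference for your version. As noted further down the thread, storcli may not talk to an IR-mode card such as the 9207 at all.

```shell
#!/bin/sh
# Sketch for MegaRAID controllers managed by storcli (not verified on IR-mode cards).
# Default path is the ESXi install location mentioned above; override via STORCLI.
STORCLI=${STORCLI:-/opt/lsi/storcli/storcli}

# ugood_drives: read `storcli /c0 show` text on stdin and print the EID:Slot
# of every PD LIST row whose State column is UGood (Unconfigured Good).
# TOPOLOGY and VD LIST rows are skipped because their first column is not EID:Slot.
ugood_drives() {
    awk '$1 ~ /^[0-9]+:[0-9]+$/ && $3 == "UGood" { print $1 }'
}

# replacement_cmds EID:SLOT DG ARRAY ROW: print (do not run) the commands that
# would place the drive into the missing row of a degraded drive group on
# controller 0 and then show rebuild progress. DG/ARRAY/ROW come from TOPOLOGY.
replacement_cmds() {
    drive=$1
    dg=$2
    array=$3
    row=$4
    eid=${drive%%:*}   # "252:2" -> "252"
    slot=${drive#*:}   # "252:2" -> "2"
    echo "$STORCLI /c0/e$eid/s$slot insert dg=$dg array=$array row=$row"
    echo "$STORCLI /c0/e$eid/s$slot show rebuild"
}
```

Usage on the ESXi host might look like: save the output with `/opt/lsi/storcli/storcli /c0 show > /tmp/c0.txt`, run `ugood_drives < /tmp/c0.txt`, then `replacement_cmds 252:2 0 0 1` to print the commands for slot 252:2 going into DG 0, array 0, row 1. Printing rather than executing keeps a human in the loop before the controller is told to touch anything.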
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I should also note that I'm unclear on whether storcli will work with an IR-mode card.
 

c77dk

Patron
Joined
Nov 27, 2019
Messages
468
Just tried on one of my *yuk* 2019 boxes and it was a no-go with latest storcli64
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
My notes seem to suggest that megacli is supposed to work. I'm a bit fuzzy-in-the-brain as to how we used to do this.
 

TranceKat

Dabbler
Joined
Feb 17, 2020
Messages
21
Thank you all. I'll try these suggestions tonight. I really appreciate your kindness in helping me here.
 

TranceKat

Dabbler
Joined
Feb 17, 2020
Messages
21
OK, so I seem to have found the issue. I purchased the same brand/model of drive as a replacement, but when I compare its capacity with that of the original disks, the new disk has 1 MB more than the others. ADATA SU800 Rev 8B is the new version, and I had something like 7NR originally. Nothing indicated that the disks were different: same make and model. I just used the "Order Again" option on Amazon.

My plan now is to buy a second disk of the new revision, create a new RAID array with just the 2 new disks, boot the system from a boot disk (Alpine), and do a dd if=/dev/sd_old_disk of=/dev/sd_newraid.

Anything I may be missing?

Thank you in advance.
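The dd step in the plan above could be sketched as below, run from the live Linux boot. It is a hedged sketch, not a recipe: `clone_disk`, `fits` and the `DO_IT` switch are hypothetical names, it assumes `blockdev --getsize64` (util-linux or BusyBox) is available, and the device paths you pass in are placeholders for whatever the old disk and the new RAID volume show up as. It refuses to copy onto a smaller target, which is the case that actually matters (a target slightly larger than the source is fine), and by default only prints the dd command it would run.

```shell
#!/bin/sh
# Hypothetical helpers for the "dd the old disk onto the new RAID volume" plan.
# Dry run unless DO_IT=1 is set in the environment.

# fits SRC_BYTES DST_BYTES: succeed only if the destination is at least
# as large as the source (a slightly larger destination is fine).
fits() {
    [ "$2" -ge "$1" ]
}

# clone_disk SRC_DEV DST_DEV: compare sizes, then print or run dd.
clone_disk() {
    src=$1
    dst=$2
    src_bytes=$(blockdev --getsize64 "$src") || return 1
    dst_bytes=$(blockdev --getsize64 "$dst") || return 1
    if ! fits "$src_bytes" "$dst_bytes"; then
        echo "refusing: $dst ($dst_bytes bytes) is smaller than $src ($src_bytes bytes)" >&2
        return 1
    fi
    if [ "${DO_IT:-0}" = 1 ]; then
        # Raw block copy; conv=fsync flushes the target before dd exits.
        dd if="$src" of="$dst" bs=1M conv=fsync
    else
        echo "would run: dd if=$src of=$dst bs=1M conv=fsync"
    fi
}
```

For example, `clone_disk /dev/sd_old_disk /dev/sd_newraid` (with the placeholders replaced by real device nodes) prints the command, and `DO_IT=1 clone_disk ...` runs it once the printed command and device names have been double-checked.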
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Usually, having more space on the replacement is fine. It's not having enough that's usually an issue.
 

TranceKat

Dabbler
Joined
Feb 17, 2020
Messages
21
Agreed. This just isn't making sense to me.

This is the original RAID 1 with the remaining good disk:
[Screenshot: original RAID 1 volume]


This is the list of disks attached to the controller:
[Screenshot: disks attached to the controller]


The aforementioned size disparity, which shows up when adding a disk to an array (I tried this with 2 new rev 8B disks):
[Screenshot: Create New Volume screen showing the size disparity]


I guess my only other option is to use 2 new disks and then hope that dd will do the trick.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I don't have the keystrokes for you, but pay attention to the key hints at the bottom of the screen; that's where the clue is.

You do NOT want to be in the "Create New Volume" screen (last screenshot).

The first screen, where you have "Manage Volume", looks right-ish. Either clicking on Manage Volume or perhaps cursoring down to the missing drive line should get you some options, I think. However, it might also be that you need to go into the SAS Topology menu and highlight the "RAID1 Volume" to get the correct options. I am fairly certain that one of these three things is the thing you need to "sit on" to get the keystrokes to get you to a resolution.

I'm sorry but I just don't memorize this stuff...

The LSI BIOS stuff is not horribly user-friendly but it makes a lot of sense if you look at it the right way. Once you "get it", it's sufficiently intuitive that you just look for how to get the GUI to give you the option you need.
 