HELP! Not sure how to proceed. Scrub shows degraded and faulted drive(s)

Demonlinx

Explorer
Joined
Apr 11, 2022
Messages
53
The old chassis is OK, but always use new drives. You can buy 2-3 such sets for the price of one new server, and it's more reliable than one new server even with the best server care pack.
This is one such DL180 G6, very much like yours (C5 H2 in signature)
This is the second time I've replaced the drives within this same hardware. The behavior of the crash was also the same with multiple drives failing all at the same time. I initially replaced the SFF cables between the HBA and the backplane thinking that was causing the issue(s). Would you still believe that I could use this machine?
 

indivision

Guru
Joined
Jan 4, 2013
Messages
806
Are there any 12i card variants? I know that when I was looking last there was nothing like that.. Is it only recommended to form pools with drives that are on the HBA?

I'm not completely sure. I think there may be a bandwidth limitation for the slot which is why they are 8i.

I think it would probably be ok to have one pool on two HBA of the same model, etc. But, it is recommended to not mix SAS and SATA connections in same pool.

If possible, I would use the additional slots outside of HBA for smaller SSD pool(s). You can use that as your boot device or put applications on them to speed things up. Then just use the HBA pool for pure storage.

Already have many spare drives available.

Haha. Well. Don't count the pile of old drives as valid spares. You should really get spares that are identical in age and model to the main pool drives. You don't want it to be a temporary solution but a replacement that will live with the pool indefinitely.
 

indivision

Guru
Joined
Jan 4, 2013
Messages
806
This is the second time I've replaced the drives within this same hardware. The behavior of the crash was also the same with multiple drives failing all at the same time. I initially replaced the SFF cables between the HBA and the backplane thinking that was causing the issue(s). Would you still believe that I could use this machine?

That is super odd. But, then again, it sounds like you've been trying to recycle from a pile of old/bad drives. But, just like people, there isn't a way to make them younger! :D
 

Demonlinx

Explorer
Joined
Apr 11, 2022
Messages
53
I'm not completely sure. I think there may be a bandwidth limitation for the slot which is why they are 8i.

I think it would probably be ok to have one pool on two HBA of the same model, etc. But, it is recommended to not mix SAS and SATA connections in same pool.
Excuse my inexperience, but am I currently mixing SAS and SATA connections?
If possible, I would use the additional slots outside of HBA for smaller SSD pool(s). You can use that as your boot device or put applications on them to speed things up. Then just use the HBA pool for pure storage.
I'll look into this. That sounds like a good plan.
Haha. Well. Don't count the pile of old drives as valid spares. You should really get spares that are identical in age and model to the main pool drives. You don't want it to be a temporary solution but a replacement that will live with the pool indefinitely.
We do have drives that are new and are from the same purchase order as the WD RED drives that we purchased in the past.
 

Demonlinx

Explorer
Joined
Apr 11, 2022
Messages
53
That is super odd. But, then again, it sounds like you've been trying to recycle from a pile of old/bad drives. But, just like people, there isn't a way to make them younger! :D
Since I believe that I've got all of the data off of the drives, would you recommend that I remove the faulted drive and attempt to repair the pool?
 

Alex_K

Explorer
Joined
Sep 4, 2016
Messages
64
HUS724020ALA640
This is the Ultrastar 7K4000 series, from back in 2012.
Still, maybe it's too much of a coincidence if it's a different bunch of drives failing a 2nd time.

Only one way to be absolutely sure - check parts separately.

It's still possible for the fault to be in the backplane or controller, or the PCIe riser or PCIe slot, though all of those fail quite rarely.

Easiest: move the controller to a different slot.

Test the drives in a different machine. Start with the faulted one. If you're not sure how to identify it, double-check:
1. smartctl -i /dev/sdi
should show Z1P1QEHC as serial number. If that's so,
2. sas2ircu 0 locate 2:8 ON
should light blue led on that drive cage
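
The serial check in step 1 can be repeated across every disk to find which device node holds a given serial. This is only a minimal sketch: it assumes the disks show up as /dev/sd? and that smartctl prints a "Serial Number:" line (SAS drives print "Serial number:", so the match ignores case). The function names are made up for illustration.

```shell
# Extract the serial from `smartctl -i` output supplied on stdin.
# Matches "Serial Number:" and "Serial number:" alike.
serial_from_smartctl() {
  awk -F': *' 'tolower($1) == "serial number" { print $2; exit }'
}

# Print the device node whose serial matches $1 (hypothetical helper).
find_disk_by_serial() {
  want="$1"
  for dev in /dev/sd?; do
    [ -b "$dev" ] || continue          # skip if the glob matched nothing
    serial="$(smartctl -i "$dev" 2>/dev/null | serial_from_smartctl)"
    [ "$serial" = "$want" ] && echo "$dev"
  done
  return 0
}

# Usage (as root): find_disk_by_serial Z1P1QEHC
```

If that prints /dev/sdi, the serial and device node agree, and the locate command in step 2 can be used to find the physical bay.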


And you still haven't told us your server configuration. Yep, that's CPU/RAM, TrueNAS version/build, and HP BIOS version.
Did you check server health and event log for errors in iLO2?


About 12i controllers: there are none. However, plenty of 16i cards are sold.
But where would you put the extra drives?
In this chassis, it's possible to attach a rear drive cage if you could move the controller onto the other side (closer to the CPUs), but it's complicated.

A 12x3.5" chassis is a perfect fit for 2x RAID-Z2 pools.
For a boot drive, you could use PCIe M.2 SATA adapters, the ones that require a SATA cable to be attached to the motherboard, like the Maiwo KT015. These are guaranteed to be bootable with SATA M.2 disks.
If it's NVMe, an old server may not boot from it.
 

indivision

Guru
Joined
Jan 4, 2013
Messages
806
Excuse my inexperience, but am I currently mixing SAS and SATA connections?

Yes. From the report that you posted the other day it shows some drives on "protocol" "SAS" and others on "SATA".

The SAS connections would come out of the HBA most likely. The SATA would most likely come directly out of the motherboard or maybe yours has a SATA controller as well (ie. a separate card with a bunch of SATA connectors on it)?

Since I believe that I've got all of the data off of the drives, would you recommend that I remove the faulted drive and attempt to repair the pool?

If you are planning on re-doing the pool with new drives I'm not sure what the point would be of this?

You could instead just destroy the pool and retire/return the bad drives. Re-build from scratch when you get the new drives.

You might go through the process just as a matter of practice to see what it's like. But, I think the moral of the story is that those drives are almost certainly toast and should not be used to store anything of any importance. You could maybe use the ones not showing degraded in a smaller pool for things that you don't mind losing at any given moment...?
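
To see at a glance which drives are not healthy before deciding what to retire, the status table can be filtered. This is a rough sketch, assuming the usual `zpool status` layout where the second column of each row is the STATE; the function name is just for illustration.

```shell
# Read `zpool status` output on stdin and print "name state" for every
# pool, vdev, or disk row whose STATE column is not ONLINE.
unhealthy_devices() {
  awk 'NF >= 5 && $2 ~ /^(DEGRADED|FAULTED|UNAVAIL|OFFLINE|REMOVED)$/ { print $1, $2 }'
}

# Usage: zpool status | unhealthy_devices
```

Anything it lists should be treated as suspect. A drive that still shows ONLINE is not proven good either, only not yet flagged.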
 

Alex_K

Explorer
Joined
Sep 4, 2016
Messages
64
Yes. From the report that you posted the other day it shows some drives on "protocol" "SAS" and others on "SATA".

The SAS connections would come out of the HBA most likely. The SATA would most likely come directly out of the motherboard or maybe yours has a SATA controller as well (ie. a separate card with a bunch of SATA connectors on it)?



If you are planning on re-doing the pool with new drives I'm not sure what the point would be of this?

You could instead just destroy the pool and retire/return the bad drives. Re-build from scratch when you get the new drives.

You might go through the process just as a matter of practice to see what it's like. But, I think the moral of the story is that those drives are almost certainly toast and should not be used to store anything of any importance. You could maybe use the ones not showing degraded in a smaller pool for things that you don't mind losing at any given moment...?

All his drives are connected to the backplane, and backplane is connected to 9211-8i
That backplane does work with a mix of SAS and SATA, but due to protocol differences it is suboptimal and somewhat risky because of the different voltages: SAS requiring higher voltage _I think_ would make all the disks operate at higher voltage. Please, someone correct me if I'm wrong.
 

Demonlinx

Explorer
Joined
Apr 11, 2022
Messages
53
HUS724020ALA640
This is the Ultrastar 7K4000 series, from back in 2012.
Still, maybe it's too much of a coincidence if it's a different bunch of drives failing a 2nd time.

Only one way to be absolutely sure - check parts separately.

It's still possible for the fault to be in the backplane or controller, or the PCIe riser or PCIe slot, though all of those fail quite rarely.

Easiest - move controller to different slot
I'll do this when I start testing drives and see how it does when I rebuild the pool the next time with all fresh drives.
Test drives in different machine. Start with faulted one. If not sure how to identify, doublecheck
1. smartctl -i /dev/sdi
should show Z1P1QEHC as serial number. If that's so,
2. sas2ircu 0 locate 2:8 ON
should light blue led on that drive cage


And you still haven't told us your server configuration. Yep, that's CPU/RAM, TrueNAS version/build, and HP BIOS version.
Did you check server health and event log for errors in iLO2?
CPU is Intel Xeon E5620
RAM 23.5 GiB ECC RAM
TrueNAS version: TrueNAS-SCALE-22.02.3
Not sure of HP BIOS Version, I've also still not been able to get into the iLO2 interface yet.
 

indivision

Guru
Joined
Jan 4, 2013
Messages
806
All his drives are connected to the backplane, and backplane is connected to 9211-8i
That backplane does work with a mix of SAS and SATA, but due to protocol differences it is suboptimal and somewhat risky because of the different voltages: SAS requiring higher voltage _I think_ would make all the disks operate at higher voltage. Please, someone correct me if I'm wrong.

I see. I haven't used a system with a backplane. So new info for me.

RAM 23.5 GiB ECC RAM

I don't think this is the cause of your issues. But, I think you should get more RAM for that size of pool.
 

Demonlinx

Explorer
Joined
Apr 11, 2022
Messages
53
All his drives are connected to the backplane, and backplane is connected to 9211-8i
That backplane does work with a mix of SAS and SATA, but due to protocol differences it is suboptimal and somewhat risky because of the different voltages: SAS requiring higher voltage _I think_ would make all the disks operate at higher voltage. Please, someone correct me if I'm wrong.
Assuming I get all 12 drives SATA and then connect them all to the backplane and then the backplane into the 9211-8i will this work? Or am I still missing something by using the 8i with a 12 drive backplane?
 

Alex_K

Explorer
Joined
Sep 4, 2016
Messages
64
I'll do this when I start testing drives and see how it does when I rebuild the pool the next time with all fresh drives.

CPU is Intel Xeon E5620
RAM 23.5 GiB ECC RAM
TrueNAS version: TrueNAS-SCALE-22.02.3
Not sure of HP BIOS Version, I've also still not been able to get into the iLO2 interface yet.
Why not make sure you have all the updates?
Here is the link: https://support.hpe.com/hpesc/public/home/driverHome?pmasr=0&sp4ts.oid=3884342
Most of it does not require registration or a valid warranty.

Assuming I get all 12 drives SATA and then connect them all to the backplane and then the backplane into the 9211-8i will this work? Or am I still missing something by using the 8i with a 12 drive backplane?

That's exactly how I did it with FreeNAS. Or all 12 SAS.
If it was 12 SSD you'd be limiting their speed.
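
A rough back-of-the-envelope for why SSDs would be held back. It assumes each SAS2 lane on the 9211-8i carries about 600 MB/s (6 Gb/s with 8b/10b encoding) and that all drives are busy at once. How many lanes actually reach the backplane depends on the cabling, so both 8 and 4 are shown. The function name is illustrative only.

```shell
# Even share of HBA bandwidth per drive, in MB/s (integer division).
# $1 = SAS lanes in use, $2 = MB/s per lane, $3 = number of drives
per_drive_mbs() {
  echo $(( $1 * $2 / $3 ))
}

per_drive_mbs 8 600 12   # all 8 lanes reach the backplane: 400
per_drive_mbs 4 600 12   # only one 4-lane SFF cable connected: 200
```

Spinning disks typically top out around 200 MB/s sequential, so either figure is roughly enough for them, while SATA SSDs that can exceed 500 MB/s would be limited.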
 

indivision

Guru
Joined
Jan 4, 2013
Messages
806
Assuming I get all 12 drives SATA and then connect them all to the backplane and then the backplane into the 9211-8i will this work? Or am I still missing something by using the 8i with a 12 drive backplane?

It's the 8i that makes the connections SAS. They still connect to SATA drives. But, with a different protocol.
 

Demonlinx

Explorer
Joined
Apr 11, 2022
Messages
53
It's the 8i that makes the connections SAS. They still connect to SATA drives. But, with a different protocol.
But it's only the HP drives that get set to the SAS protocol right? Aren't all the other drives connecting using SATA?
 

Demonlinx

Explorer
Joined
Apr 11, 2022
Messages
53
Why not make sure you have all the updates
Here is the link https://support.hpe.com/hpesc/public/home/driverHome?pmasr=0&sp4ts.oid=3884342
Most of it does not require registration or a valid warranty.
There's a ton to sift through here. Can you tell me where I might start? How do I go about updating this machine?
That's exactly how I did it with FreeNAS. Or all 12 SAS.
Ok, that sounds good. I think we're just going to kick out the SAS drives and go with all SATA drives. I think we're just going to replace all 12 drives with known good drives. Also, note to go through the recommended burn-in from the Hardware guide.
 

Alex_K

Explorer
Joined
Sep 4, 2016
Messages
64
There's a ton to sift through here. Can you tell me where I might start? How do I go about updating this machine?
...
Ideally you want the latest BIOS,
but that one requires a valid warranty or equivalent. If you confirm you don't have the latest, you may ask the seller. Depending on your current BIOS version, that might be critical.
Other things are free.
Latest iLO:
iLO Config utility:
In Firmware-Network, find the latest FW for your network card, unless you plan to put in a separate network adapter, something Intel or Chelsio. The onboard Broadcom NICs in these basic models suck.
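
To pick the right firmware package you first need to know which NIC is installed. A small sketch for the TrueNAS SCALE shell, assuming `lspci` is available there (SCALE is Debian-based, so it normally is); the filter function name is made up.

```shell
# Keep only network adapters from `lspci` output supplied on stdin.
filter_nics() {
  grep -Ei 'ethernet controller|network controller' || true
}

# Run only when lspci is present on this system.
if command -v lspci >/dev/null 2>&1; then
  lspci -nn | filter_nics
fi
```

Each line names the vendor and model of one network port. If ethtool is installed, `ethtool -i <interface>` should then report the driver and the firmware version currently on that port.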
 

Demonlinx

Explorer
Joined
Apr 11, 2022
Messages
53
Ideally you want the latest BIOS, but that one requires a valid warranty or equivalent. If you confirm you don't have the latest, you may ask the seller.
Other things are free.
Latest iLO:
iLO Config utility:
How do I go about applying these updates? Do I just put them on USB and then find it somewhere within the bios to update/upgrade firmware?
in Firmware-Network find latest FW for your network card, unless you plan to put there separate network adapter, something Intel or Chelsio - onboard Broadcom NICs in these basic models suck.
I'll do this as well. Is there a way from the TrueNAS CLI to tell what NIC is currently installed?
 

indivision

Guru
Joined
Jan 4, 2013
Messages
806
But it's only the HP drives that get set to the SAS protocol right? Aren't all the other drives connecting using SATA?

I was mistaken about that. The drives do have to be SAS to use that protocol (they don't necessarily need to be for you). I just used breakout cables from my HBA, so I'm a bit lost on how it all goes together with your backplane setup.
 
Joined
Dec 21, 2015
Messages
6
I had a similar problem, resolved by connecting HBA to a different port on my expander.

Supermicro H12SSL-CT (LSI3008) motherboard with BPN-SAS2-836EL1 expander.
Similar read and write errors, often working for a few hours and then bursts of errors.
Tried a different HBA, cables, drives, HBA and expander firmware, and fans on the HBA.

My expander has 3 x ports PRI_J1, PRI_J2, PRI_J3. The manual instructs to use PRI_J1 for HBA, PRI_J2 for cascade and no mention of what to do with PRI_J3. As instructed I was only using PRI_J1, had no cascade and I was getting errors. I now have the HBA connected to PRI_J2 and PRI_J3 and all working fine with no errors. I have no idea if all those SAS lanes are active but it's been working fine like that for a few weeks now.
 