All 10 drives reporting read fails.

Bryon Brinkmann

Explorer
Joined
Oct 7, 2016
Messages
50
I've been getting an odd error. TrueNAS SCALE on a Dell R630, with the PERC in non-RAID mode, is reporting the following. It seems a little suspect that all 10 drives have read errors. Any thoughts?

CRITICAL

Pool vol1 state is DEGRADED: One or more devices are faulted in response to persistent errors. Sufficient replicas exist for the pool to continue functioning in a degraded state.
The following devices are not healthy:
  • Disk ST2000NX0263 S460A4MQ0000K6252QE7 is FAULTED
  • Disk ST2000NX0263 S460G82P0000K63305Y7 is DEGRADED
  • Disk ST2000NX0263 S460A45Y0000K62520FQ is DEGRADED
  • Disk ST2000NX0263 S460A6YJ0000K6138WYJ is DEGRADED
  • Disk ST2000NX0263 S460A4DB0000K6223URR is DEGRADED
  • Disk ST2000NX0263 S460AM340000K613A46F is DEGRADED
  • Disk ST2000NX0263 S460C1VZ0000K6234RJ6 is DEGRADED
  • Disk ST2000NX0263 S460A4F40000K6232KKB is DEGRADED
  • Disk ST2000NX0263 S4609SZG0000K6137N1T is DEGRADED
  • Disk ST2000NX0263 S460J38Z0000K6331UZX is DEGRADED
    Screen Shot 2022-06-19 at 10.49.39 AM.png

 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Please follow the Forum Rules and post a full description of the hardware and the software version.
We need enough info to know whether all the drives on a SAS controller are degraded.

What type of drives are they, and how long have they been in service?
 

Bryon Brinkmann

Explorer
Joined
Oct 7, 2016
Messages
50
From the attached picture above: YES, all the drives are DEGRADED and one is FAULTED. I can clear the pool with the zpool clear command and it's fat, dumb, and happy again. But then another drive will randomly fault. NOT THE SAME DRIVE. These drives have been in service for maybe 3 to 4 years with a very low workload.

TrueNAS-SCALE-22.02.1

Dell R630 (SolidFire)
System Model Storage System
BIOS Version 2.13.0
Firmware Version 2.82.82.82

Controller:
PERC H730 Mini (Embedded)
Firmware Version 25.5.9.0001

Drives
SEAGATE ST2000NX0263 12 Gb/s SAS
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
My guess is something to do with the SAS controller.... otherwise it would be one specific drive misbehaving.

Not sure if there are any hardware diagnostics available?
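
One rough way to test the "shared component versus single drive" reasoning is to look at how the error counters are spread across pool members. The following is a minimal sketch, assuming the text of `zpool status` has already been captured into a string. The sample text, the device names (`sda`, `sdb`, `sdc`), and the helper names are made up for illustration and are not taken from this system. It also assumes the usual NAME / STATE / READ / WRITE / CKSUM column layout.

```python
import re

# Rows of the "config:" table look like: NAME STATE READ WRITE CKSUM [note]
ROW = re.compile(
    r"^\s+(?P<name>\S+)\s+"
    r"(?P<state>ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)\s+"
    r"(?P<read>\S+)\s+(?P<write>\S+)\s+(?P<cksum>\S+)"
)

# Assumption: rows whose name starts with one of these are vdevs, not disks.
VDEV_PREFIXES = ("mirror", "raidz", "draid")


def leaf_devices(status_text, pool):
    """Return one dict per disk row, skipping the pool row and vdev rows."""
    rows = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        name = m["name"]
        if name == pool or name.startswith(VDEV_PREFIXES):
            continue
        rows.append(m.groupdict())
    return rows


def devices_with_errors(rows):
    """Names of disks with any non-zero READ, WRITE, or CKSUM counter."""
    return [
        r["name"]
        for r in rows
        if any(r[k] != "0" for k in ("read", "write", "cksum"))
    ]


# Hypothetical sample, NOT output from the system in this thread.
SAMPLE = """\
  pool: vol1
 state: DEGRADED
config:

    NAME        STATE     READ WRITE CKSUM
    vol1        DEGRADED     0     0     0
      raidz2-0  DEGRADED     0     0     0
        sda     FAULTED     12     0     0  too many errors
        sdb     DEGRADED     3     0     0  too many errors
        sdc     ONLINE       0     0     0
"""

rows = leaf_devices(SAMPLE, "vol1")
bad = devices_with_errors(rows)
print(f"{len(bad)} of {len(rows)} disks show errors: {bad}")
```

If most or all disks show non-zero counters, a shared component (controller, backplane, cabling, power) becomes the more plausible suspect. If only one disk does, that drive is the first thing to check.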
 

mervincm

Contributor
Joined
Mar 21, 2014
Messages
157
Perhaps your backplane? I used to use a 5-slot (in 3 5 1/4" bays) backplane, and it gave me errors on drives that worked perfectly fine when connected directly to my SAS controller.

Maybe Dell has backplane firmware updates available?

Perhaps a TrueNAS veteran could comment on whether seeing DEGRADED next to the other pool members is expected when one is faulted, or whether a problem has actually been detected with each individual disk.
 

Bryon Brinkmann

Explorer
Joined
Oct 7, 2016
Messages
50
I checked for firmware updates but no luck. I'm transferring all the data off the pool and going to do some housekeeping (remove the dust, reseat the drives, and rebuild the pool). I did have a couple of power drops, and I think that could have caused some issues, but I'm just grasping at straws. As you can see from the new pic, a totally different drive is now labeled as faulted, some are online, and some are degraded.
 

Attachment: Screen Shot 2022-06-21 at 10.55.28 AM.png

Bryon Brinkmann

Explorer
Joined
Oct 7, 2016
Messages
50
I'm going to pull it all apart and check all the connections. Harmonic vibrations can be a pain in the a$$ and could've caused an issue.
 

mervincm

Contributor
Joined
Mar 21, 2014
Messages
157
A replacement SAS controller would be an easy and inexpensive fix. If your clean and reseat doesn't work, hopefully all you need is a $50 LSI replacement controller.
 

Bryon Brinkmann

Explorer
Joined
Oct 7, 2016
Messages
50
Can you recommend a 12 Gb/s LSI controller that will connect to a Dell R630 backplane and support 10 drives?
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
Any LSI SAS HBA… It's just a matter of having the right cable with the right connectors on both ends.
Flash IT firmware rather than IR firmware.
 

mervincm

Contributor
Joined
Mar 21, 2014
Messages
157
I can't, but things that come to mind are:
12 Gb/s SAS
IT-mode LSI firmware available
Do you need full height or half height for the slot you plan to use (after checking that it's the right PCIe version and lane count)?
Does it support the cables you have, or can the cables be replaced?
 