Entire 40T pool offline and won't come back

exodus454

Dabbler
Joined
Nov 24, 2019
Messages
14
Sorry, my phone keeps hitting post while I'm trying to type!

Anyway, I'm currently on OS Version: TrueNAS-SCALE-22.02.4
  • Motherboard make and model : Biostar 990+
  • CPU make and model : AMD FX-6300
  • RAM quantity : 16GB
  • Hard drives, quantity, model numbers, and RAID configuration, including boot drives
    • 3 - WD Velociraptor 250G 2.5" SATA
    • 4 - HGST HUST726060ALA6040 6 TB SATA
    • 3 - Hitachi 0F18335 6TB SATA
    • 2 - Toshiba MG04ACA600E 6TB SATA
    • 1 - HGST HUH728080AL4200 8TB SAS
  • Hard disk controllers :
    • 4 onboard SATA ports
    • LSI 9240-8i (previous) - IT Mode
    • Adaptec ASR-71605-16i - HBA mode
  • Network cards
    • Onboard Gigabit
  • Power supplies
    • Rosewill 300w (original)
    • Zalman Gigamax 500w (new)
    • ALSO ON UPS
  • Storage layout
    • RAIDZ2 - 9 disks / 6 TB each
      • Replaced old disk with
      • 1 - HGST HUH728080AL4200 8TB SAS
    • Apps pool
      • 2 - mirror WD Velociraptor 250G 2.5" SATA
    • Boot pool
      • 1 - WD Velociraptor 250G 2.5" SATA

I'm also getting to be a seasoned ZFS user. I've been using some form of TrueNAS since FreeNAS 7, so I definitely know what to expect, and this was unexpected.

So everything had been completely fine with my setup until my power supply died about two weeks ago; a cap exploded from age. I got a new PS, installed it, and everything was gravy. All my disks came back online. When the PS died the pool was in the middle of replacing a disk, so I just roughed the new PS in and let it get back to work.

Everything was back online for 4 days or so on the new PS before I shut it down just to finish routing the wires and stuff. Since then I haven't been able to get ANY of the 6 TB drives back on. They aren't responding to power. Sometimes it sounds like they might be trying to do something, but it's hard to hear them over the other drives. It's almost like they're in standby.

And yes, I did plug it in.

Normally the pool spanned 8 HBA ports and 1 motherboard port.

The 3 disks from the app/boot-pool from the MB work just fine. I tried moving them around to different ports on the motherboard, they all show up like you'd expect.

I tried moving the same disks over all the HBA ports to make sure the problem wasn't on the HBA card. The web console/syslog shows them being initialized.

I tried swapping the old LSI card back in, no change - no "media-pool" drives show up, but testing the ports with other drives, they all show up like you'd expect.

I pulled out all the extra drives I have kicking around to check with as well: a few SSDs, a WD 1TB drive, and a Seagate 500G. The extra drives show up everywhere you'd expect.

I tried changing power cables, even connecting the drives directly, no difference.

I shut down the computer and pulled the BIOS battery for an hour, no change.

I reinstalled the Adaptec card, updated it, tried switching the BIOS to RAID mode and back to HBA, no change. Tried "rescan discs" in arcconf, no changes.

I tried an external USB on each drive (not expecting it to work), no change

It's strange to me how they stopped working: only the ones that were actively a part of that pool dropped out? If it was some electrical fault that actually made them fail, I would personally expect it to be more of a total failure of ALL the drives, or more sporadic. Even the one connected to the motherboard port is affected, and it's not like the port died.

The resilver completed quickly (around 30 hours) at ~220 MB/s, so the disks were healthy.


Anyone have any ideas? I just don't wanna lose all my disks
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
The problem is obvious. The disks aren't connected to anything.

Hint: Please read the forum rules about posting your hardware.
 

AlexGG

Contributor
Joined
Dec 13, 2018
Messages
171
You should definitely establish if the drives spin up when power is applied.

Were all the failed drives on the same power rail and all the surviving drives on another rail, by any chance?

Based on the description, I expect something is blown on all the drives.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Adaptec ASR-71605-16i - HBA mode

This is not acceptable for use. Please see


and note that this absolutely does apply to your ASR-71605.
 

exodus454

Dabbler
Joined
Nov 24, 2019
Messages
14
You should definitely establish if the drives spin up when power is applied.

Were all the failed drives on the same power rail and all the surviving drives on another rail, by any chance?

Based on the description, I expect something is blown on all the drives.

No activity, all dead. I took them all out and tested them on the bench. As far as I know they were all on the same rail. I don't see any burnt traces on the exposed side of any of the drives either. No broken pins, bent connectors, anything really.

If the disks are connected to the power from the power supplies' SATA power connectors, it could be the 3.3v problem:
Wikipedia - SATA - Power Connections

Thanks, I'm aware of this. I only had to deal with it on the one SAS drive; the rest of them are still legacy.
This is not acceptable for use. Please see


and note that this absolutely does apply to your ASR-71605.

Should I not be using it? In Linux, anyway, it appears to be passing them through correctly. I have smartctl available for all 3 disks I have attached, and they're showing up as /dev/sda, /dev/sdb, and /dev/sdc. The day I switched cards all my disks came online automatically too; I didn't have to change anything.
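
For reference, a minimal sketch of checking which disks the kernel actually enumerates, instead of eyeballing /dev. It assumes Python 3 and util-linux's lsblk with JSON output (-J). The helper names parse_lsblk, list_disks, and missing_serials are made up for illustration, and any serial numbers passed in are placeholders. It only shows that a disk enumerates; it says nothing about how the controller handles it.

```python
import json
import subprocess


def parse_lsblk(json_text):
    """Return (name, model, serial) for every whole disk in lsblk JSON output."""
    devices = json.loads(json_text).get("blockdevices", [])
    return [
        # lsblk reports missing fields as null, so fall back to "".
        (d["name"], (d.get("model") or "").strip(), (d.get("serial") or "").strip())
        for d in devices
        if d.get("type") == "disk"
    ]


def missing_serials(expected, detected):
    """Return the expected serial numbers that no detected disk reports."""
    found = {serial for _, _, serial in detected}
    return sorted(set(expected) - found)


def list_disks():
    """Ask lsblk for whole devices only (-d) and parse the JSON it prints."""
    result = subprocess.run(
        ["lsblk", "-J", "-d", "-o", "NAME,MODEL,SERIAL,TYPE"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_lsblk(result.stdout)
```

Calling missing_serials() with the serial numbers of the pool members and the output of list_disks() would give the list of drives that never showed up on any port.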

I highly doubt the Adaptec card is why my drives are seemingly dead.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Should I not be using it?

That's correct.

In Linux, anyway, it appears to be passing them through correctly. I have smartctl available for all 3 disks I have attached, and they're showing up as /dev/sda, /dev/sdb, and /dev/sdc. The day I switched cards all my disks came online automatically too; I didn't have to change anything.

As noted in the linked document, this isn't really the measure of correctness.

I highly doubt the Adaptec card is why my drives are seemingly dead

I agree, but it seems like a good time to address the issue anyways.
 

AlexGG

Contributor
Joined
Dec 13, 2018
Messages
171
No activity, all dead. I took them all out and tested them on the bench. As far as I know they were all on the same rail. I don't see any burnt traces on the exposed side of any of the drives either. No broken pins, bent connectors, anything really.

Something like blown TVS diodes may not be obviously visible. That's lab data recovery territory from now on.
 

exodus454

Dabbler
Joined
Nov 24, 2019
Messages
14
That's correct.

As noted in the linked document, this isn't really the measure of correctness.

I agree, but it seems like a good time to address the issue anyways.
10-4, I'll move back to LSI. It really hasn't let me down at all.
Something like blown TVS diodes may not be obviously visible. That's lab data recovery territory from now on.
That's what I was thinking. I wonder how super-SMT they are; maybe I'll pop a board off and take a look. Obviously paying a lab to fix my entire pool of drives is definitely not in the cards haha. I'm completely capable of replacing simple components.

Thanks!
 