FreeNAS crash after power outage, lost vol1 and can't restore

jwtruett

Dabbler
Joined
Apr 2, 2014
Messages
20
If you have a similar system, transplant the drives along with the working controller. If the drives all appear at boot, then it's very likely your original backplane is also toast.
Unfortunately replacing the motherboard does not make a difference
 
Joined
Jun 15, 2022
Messages
674
After a power outage I lost my vol1, which is six WD Red 2TB drives on an IBM M1015 controller. The system boots and recognizes the SSD on the motherboard SATA controller. Everything else seems to function correctly.

I've replaced:
power supply
SATA cables
IBM M1015 and reflashed

I can't say for certain, but the hard drives don't seem to power up, and I cannot tell whether the drive controller is functioning correctly even though the flash procedure went through correctly.

Any help would be appreciated as I've exhausted all my ideas.
Wait a minute. When you say "power outage," what do you mean? A power outage is when the electrical power goes out unexpectedly. That would not necessitate hardware replacement.

Why did you replace the power supply, SATA cables, and IBM ServeRAID M1015 SAS/SATA Controller? (What symptoms did you observe, and what line of reasoning led you to take that course of action?)

Was the ServeRAID flashed to IT mode originally? My experience with HP RAID (not IBM) was that an unexpected power loss can cause them to lose configuration and sometimes not re-import the array. This is a stretch, but you said you replaced the card and still have issues so it's prudent to ask.
 

jwtruett

Dabbler
Joined
Apr 2, 2014
Messages
20
Wait a minute. When you say "power outage," what do you mean? A power outage is when the electrical power goes out unexpectedly. That would not necessitate hardware replacement.

Why did you replace the power supply, SATA cables, and IBM ServeRAID M1015 SAS/SATA Controller? (What symptoms did you observe, and what line of reasoning led you to take that course of action?)

Was the ServeRAID flashed to IT mode originally? My experience with HP RAID (not IBM) was that an unexpected power loss can cause them to lose configuration and sometimes not re-import the array. This is a stretch, but you said you replaced the card and still have issues so it's prudent to ask.
The power went out unexpectedly. When the system was powered back on, everything came back except VOL1.

By process of elimination, I have tried replacing the items that could have failed because of the sudden loss of power. This system was on a UPS and unfortunately the UPS no longer functions after the outage.

The card was flashed originally to IT mode.

The M1015 does not seem to be found by the system, the drivers do not seem to load, the HBAs are not found and thus VOL1 does not mount.
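One way to separate "the card is not seen at all" from "the card is seen but no driver attached" is to look at the PCI device list. The following is a minimal Python sketch, not a definitive tool. It assumes the FreeBSD-based FreeNAS shell, where `pciconf -l` prints one line per PCI device and a device without a driver is named `noneN`. The function names are made up for illustration. It relies on the M1015 being an LSI SAS2008-based card, so it should carry LSI's PCI vendor ID `0x1000`, and it accepts both the older `chip=0x........` field and the newer `vendor=0x....` field.

```python
import re
import subprocess

LSI_VENDOR_ID = 0x1000  # PCI vendor ID of LSI Logic / Symbios Logic


def vendor_id(line):
    """Extract the PCI vendor ID from one `pciconf -l` line, or None."""
    # Newer FreeBSD releases print vendor=0x1000 device=0x0072 ...
    m = re.search(r"\bvendor=0x([0-9a-fA-F]{4})\b", line)
    if m:
        return int(m.group(1), 16)
    # Older releases print chip=0x00721000 (device ID high, vendor ID low).
    m = re.search(r"\bchip=0x([0-9a-fA-F]{8})\b", line)
    if m:
        return int(m.group(1), 16) & 0xFFFF
    return None


def find_lsi_devices(pciconf_output):
    """Return a list of (device_name, driver_attached) for LSI devices."""
    found = []
    for line in pciconf_output.splitlines():
        m = re.match(r"^([^@\s]+)@pci", line)
        if not m or vendor_id(line) != LSI_VENDOR_ID:
            continue
        name = m.group(1)
        # "none0", "none1", ... means no driver claimed the device.
        found.append((name, not name.startswith("none")))
    return found


def lsi_devices_on_this_host():
    """Run `pciconf -l` locally (FreeBSD only) and report LSI devices."""
    out = subprocess.run(["pciconf", "-l"], capture_output=True,
                         text=True, check=True).stdout
    return find_lsi_devices(out)
```

Calling `lsi_devices_on_this_host()` from a Python prompt on the NAS gives three readings. An empty list suggests the card is not enumerating on the PCIe bus at all. A `noneN` entry suggests the card is present but no driver attached. An entry such as `mps0` suggests the IT-mode driver did attach, and the problem is further downstream.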

I built this system many years ago following the instructions on this site. It has functioned flawlessly during that time and I guess I've grown lazy in monitoring it and just expect it to keep working, which unfortunately is not the case.

If I built a new system and attached the M1015 and HBAs to it, would the new system recognize the drives and mount VOL1?
 
Joined
Jun 15, 2022
Messages
674
Was that perhaps a power surge that destroyed other electronics? Or a brown-out that dead UPS batteries could not take over from and the circuitry collapsed?

UPS power inverter boards take a lot of stress and can fail suddenly. Small UPS batteries aren't usually maintained, swell, rupture, all sorts of bad things. I understand this, my system is a "salvage project" including the UPS. The goal is to understand what happened to your system and not only get it running, but keep it running.

The first place to start with diagnosing a problem is to figure out what happened, what broke, and why. Sometimes there is a triggering event that sets multiple unrelated situations into play, so that's often the best starting point as it helps the investigator understand what they should be looking for and what set of potential outcomes are probable.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
All quality PSUs have good overvoltage protection, what were you using previously?
 

jwtruett

Dabbler
Joined
Apr 2, 2014
Messages
20
All quality PSUs have good overvoltage protection, what were you using previously?
I'm unsure why the brand of power supply would make a difference in trying to figure out why the drives will not mount, but the old PS was an Antec 850W. The replacement is an EVGA 1300W.
 

jwtruett

Dabbler
Joined
Apr 2, 2014
Messages
20
Was that perhaps a power surge that destroyed other electronics? Or a brown-out that dead UPS batteries could not take over from and the circuitry collapsed?

UPS power inverter boards take a lot of stress and can fail suddenly. Small UPS batteries aren't usually maintained, swell, rupture, all sorts of bad things. I understand this, my system is a "salvage project" including the UPS. The goal is to understand what happened to your system and not only get it running, but keep it running.

The first place to start with diagnosing a problem is to figure out what happened, what broke, and why. Sometimes there is a triggering event that sets multiple unrelated situations into play, so that's often the best starting point as it helps the investigator understand what they should be looking for and what set of potential outcomes are probable.
The APC UPS is dead. I have no idea what caused it to fail; I only know that I will not use another APC UPS.

The system went down due to no power. When I powered the system back up, all looked well. Upon closer inspection I noticed VOL1 did not mount. After checking all connections I powered the system on with the same result: VOL1 would not mount. I then took the following steps to try to figure out the reason for the failure, starting from the fact that a catastrophic power failure had caused some type of damage to a NAS system that has been running for 5+ years:
Replace power supply
Replace power cables
Reflash LSI 9240-8i
Flash/Replace LSI 9240-8i
Replace Super Micro X9SCM-F motherboard

Unfortunately none of the above steps corrected the problem. At this point I make several assumptions:
A software error has occurred causing the driver for LSI controller not to load
All 6 of the WD Red 2TB drives are dead

I am far from a software person, and tracking down why a driver fails to load is beyond me. Assuming all 6 drives are dead seems implausible to me.
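The two assumptions above can be told apart by checking which layer stops responding. The following is a rough Python sketch under stated assumptions, not a verified diagnostic. It assumes `camcontrol devlist` output from the FreeNAS shell, where each line ends with a parenthesized peripheral list such as `(pass0,da0)`. It also assumes disks behind the LSI controller show up as `daN`, while the SSD on the motherboard SATA ports shows up as `adaN`. USB sticks also appear as `daN`, so the count can be off on systems that boot from USB. The function names and the wording of the verdicts are invented for illustration.

```python
import re


def count_da_disks(devlist_output):
    """Count unique daN peripherals in `camcontrol devlist` output."""
    disks = set()
    for line in devlist_output.splitlines():
        # The peripheral names sit in the trailing "(pass0,da0)" group.
        m = re.search(r"\(([^)]*)\)\s*$", line)
        if not m:
            continue
        for dev in m.group(1).split(","):
            dev = dev.strip()
            if re.fullmatch(r"da\d+", dev):
                disks.add(dev)
    return len(disks)


def suspect_layer(controller_on_bus, driver_attached, disks_seen,
                  disks_expected=6):
    """Map three observations to the layer most worth checking next."""
    if not controller_on_bus:
        return "controller not enumerated: card, PCIe slot, or seating"
    if not driver_attached:
        return "driver not attached: firmware mode or driver"
    if disks_seen == 0:
        return ("nothing behind the controller: "
                "drive power, cables, backplane, or drives")
    if disks_seen < disks_expected:
        return "some disks missing: individual drives, cables, or ports"
    return "all disks visible: look at pool import next"
```

For example, `suspect_layer(True, True, 0)` points at drive power, cabling, or the drives themselves rather than at software. A result that stops at the second check would support the driver theory instead.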

My next step is to build a new (from old tested parts) NAS, use the new LSI controller and attach the drives in question.
 
Joined
Jun 15, 2022
Messages
674
When I had a server fire I:
  1. shut everything down,
  2. found the problem
  3. and then the cause of the problem,
  4. ordered a replacement mainboard and wiring harnesses,
  5. tore the whole thing down to individual parts,
  6. cleaned everything,
  7. and added and tested one-part-at-a-time.
Each part was suspect until proven good, and that's how I now have a working server again. I'd suggest doing the same, burning in every part and making sure it's good; you might save a bunch of time & effort.

I run an APC UPS, in my opinion they're high-quality compared to the average consumer-grade UPS. Eaton would be next up on my list but that's beyond my budget.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
I'm unsure why the brand of power supply would make a difference in trying to figure out why the drives will not mount, but the old PS was an Antec 850W. The replacement is an EVGA 1300W.
Because you said they don't spin and aren't being shown, meaning they could be fried. A poorly built UPS might have known issues with its OVP.
 

jwtruett

Dabbler
Joined
Apr 2, 2014
Messages
20
When I had a server fire I:
  1. shut everything down,
  2. found the problem
  3. and then the cause of the problem,
  4. ordered a replacement mainboard and wiring harnesses,
  5. tore the whole thing down to individual parts,
  6. cleaned everything,
  7. and added and tested one-part-at-a-time.
Each part was suspect until proven good, and that's how I now have a working server again. I'd suggest doing the same, burning in every part and making sure it's good; you might save a bunch of time & effort.

I run an APC UPS, in my opinion they're high-quality compared to the average consumer-grade UPS. Eaton would be next up on my list but that's beyond my budget.
That's what I am trying to do.

The APC unit was supposedly top of the line (ordered incorrectly and non-returnable, so I got a deal). I would have been fine with a destroyed UPS if it hadn't apparently passed the surge through to my server. I have had several other APC UPSs go bad (not anywhere near the 2200's price point), so I'm not keen on using any more APC products.

The only other possible issue is a corrupt LSI driver, but unfortunately I am incapable of trying to repair that.
 
Joined
Jun 15, 2022
Messages
674
Something is definitely out of bounds; a power outage shouldn't have caused all this destruction. It's unfortunate you're having to go through this.
 