RAID Battery has failed and cannot support data retention. Please replace the battery

fivade

Cadet
Joined
Mar 9, 2021
Messages
7
I just set up a TrueNAS system on a Dell R410 with 4x4TB drives in RAID. This is hardware RAID, set up through the Dell RAID configuration system.

When I boot my TrueNAS system I get an email notification with the email body containing:
Battery has failed and cannot support data retention. Please replace the battery

The email is definitely coming through TrueNAS, as I have checked the logs in our SMTP system and it is authenticating using the TrueNAS SMTP user/pwd that I set up.

However, there is no notification within the TrueNAS web portal to indicate a problem, and I can't see anywhere in TrueNAS that this message is being generated.
Can someone shed some light on this?

I am presuming it is telling me the RAID cache battery has failed and that I need to replace it, which I will look into - but I am confused as to how it was generated by TrueNAS, and why I don't see any hint of this within the portal.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703

fivade

Cadet
Joined
Mar 9, 2021
Messages
7
OK, thanks for that. I'm fine with using the H700 RAID controller card. I have another identical Dell R410 with the PERC H700 and TrueNAS with no issues, and the system is used only as backup storage.
My real question is: is there any configuration for these alerts within TrueNAS? The alert emails are a bit vague and do not even state which device the email was generated from.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
There is not, because RAID controllers are not a supported configuration.
 

fivade

Cadet
Joined
Mar 9, 2021
Messages
7
In that case, TrueNAS probably shouldn't be alerting me to RAID issues when it is not controlling the RAID and, as you say, doesn't even support RAID controllers.
Perhaps in a future release they will improve this alerting, or at least allow it to be disabled.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
TrueNAS has software such as ipmitool and similar plugins that will let it search the hardware system event logs, and will forward anything that they mark with a "critical" or "warning" level. If you want the alert to be disabled, the best option is to change it at the source.

With that said, the correct option here is "do not use the H700 or any other RAID controller" - at least yours is being polite enough to inform you of its failure and hopefully has switched to write-through caching.
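
For anyone wanting to see which system event log entries could trigger such an email, here is a minimal Python sketch that reads `ipmitool sel elist` and picks out entries that look alert-worthy. It assumes the usual pipe-separated output (id | date | time | sensor | event | direction). The keyword list and the function names are hypothetical and are not the rule TrueNAS itself applies.

```python
import subprocess

# Keywords treated as alert-worthy. This list is an assumption for
# illustration only, not what the TrueNAS middleware uses.
ALERT_KEYWORDS = ("fail", "critical", "non-recoverable")


def parse_sel(text):
    """Split pipe-separated `ipmitool sel elist` lines into dicts."""
    events = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 5:
            continue  # skip blank lines or lines in an unexpected layout
        events.append({
            "id": fields[0],
            "date": fields[1],
            "time": fields[2],
            "sensor": fields[3],
            "event": fields[4],
        })
    return events


def alert_worthy(events):
    """Keep events whose description contains one of the keywords."""
    return [
        e for e in events
        if any(k in e["event"].lower() for k in ALERT_KEYWORDS)
    ]


def read_sel():
    """Run ipmitool (needs root and a working BMC) and parse its output."""
    out = subprocess.run(
        ["ipmitool", "sel", "elist"],
        capture_output=True, text=True, check=True,
    )
    return parse_sel(out.stdout)
```

Calling `alert_worthy(read_sel())` on the affected box should list the battery entry if it was logged in the SEL. Entries without a normal timestamp may not fit the assumed column layout.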
 

fivade

Cadet
Joined
Mar 9, 2021
Messages
7
Thanks for the advice, it is appreciated. However, I'm not looking to debate which hardware we should be using. We are repurposing older hardware (that we would otherwise be scrapping) into backup storage using TrueNAS, so the hardware is what it is, and it has been working fine in our situation.

Regarding the plugins, I've had a scroll through the community plugins listed in the TrueNAS admin portal, and I do not see ipmitool, and can't see it listed on the website either -> https://www.truenas.com/plugins/ - or am I missing another plugin repository?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I still suggest you replace the H700, as a proper Dell HBA will likely be around the USD $20 mark.

ipmitool is part of the TrueNAS base image and is usable from the command-line; the TrueNAS middleware likely uses it as well to poll for events and information. Your RAID card likely can be managed via mfiutil from the CLI as well, but I don't know if there's any functionality to configure or mute the alert there. A failed BBU on a RAID card isn't something that's meant to be ignored.
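
As a rough illustration of querying the controller from the CLI, the Python sketch below runs `mfiutil show battery` and turns its "Key: value" lines into a dictionary. The exact field names printed by mfiutil vary by controller and firmware, so the keys shown in the usage note are assumptions, and the helper names are hypothetical.

```python
import subprocess


def parse_key_values(text):
    """Turn 'Key: value' lines into a dict, ignoring lines with no value."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


def read_battery():
    """Run `mfiutil show battery` (FreeBSD-based TrueNAS, as root)."""
    out = subprocess.run(
        ["mfiutil", "show", "battery"],
        capture_output=True, text=True, check=True,
    )
    return parse_key_values(out.stdout)
```

On a real system, something like `read_battery().get("Status")` or `.get("Full Charge Capacity")` would then return the raw strings, provided the controller reports fields under those names.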
 

fivade

Cadet
Joined
Mar 9, 2021
Messages
7
I'll have a look at what I can get from those CLI tools.
I agree the warning should not be ignored, but if I have multiple TrueNAS servers and I get an email simply telling me that a battery needs replacing, without identifying which device, then it's a little difficult to action.
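
This does not change the built-in TrueNAS alert emails, but for custom notifications one way to make the source obvious is to put the hostname in the subject and body. A minimal Python sketch follows. The function name, subject wording, and addresses are all hypothetical.

```python
import socket
from email.message import EmailMessage


def build_alert(body, sender, recipient, hostname=None):
    """Build an alert email that names the machine it came from."""
    # Fall back to the local hostname when none is passed in.
    host = hostname or socket.gethostname()
    msg = EmailMessage()
    msg["Subject"] = f"[{host}] hardware alert"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"Host: {host}\n\n{body}")
    return msg
```

The returned message can be handed to `smtplib.SMTP(...).send_message(msg)` using whatever SMTP relay and credentials are already in place.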
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
My guess is that it's coming from here:
System | Alert Settings...
[Attached screenshot: 1615302274451.png]
 

fivade

Cadet
Joined
Mar 9, 2021
Messages
7
I didn't realise it would be under IPMI system event when I first looked at that page myself. Is there a way to configure the email template used by those alerts, perhaps to include the device name?

mfiutil does seem to work, but interestingly it shows the battery status as OK.
I can also see the status of the individual drives in the RAID array with smartctl.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
What do you get from ipmitool sensor?
 

fivade

Cadet
Joined
Mar 9, 2021
Messages
7
I was misreading some of the info from the above commands. The battery is definitely faulty: I can see that the full charge capacity is only 89mAh, when it should be close to 1700mAh.
Anyway, I'm going to replace the battery.
I can see the controller has disabled the BBU and changed the mode to write-through for now.
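
To put the numbers above in perspective, here is a small Python sketch that compares full charge capacity against design capacity. The 50% threshold is an arbitrary assumption for illustration, not a Dell or LSI specification, and the function names are hypothetical.

```python
def capacity_ratio(full_charge_mah, design_mah):
    """Fraction of the design capacity the battery can still hold."""
    if design_mah <= 0:
        raise ValueError("design capacity must be positive")
    return full_charge_mah / design_mah


def battery_is_degraded(full_charge_mah, design_mah, threshold=0.5):
    """True when the battery holds less than `threshold` of its design capacity."""
    return capacity_ratio(full_charge_mah, design_mah) < threshold
```

With the figures from this thread, `capacity_ratio(89, 1700)` comes out at roughly 5% of design capacity, so the battery is well past any reasonable threshold.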

I've also written a script and set up a cron job to email me every morning with disk and controller health status, which should help me identify issues going forward.
Thanks for everyone's help.
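
The script itself was not posted, so the following is only a sketch of what such a daily health report could look like in Python. The choice of commands, the function names, and the layout of the report are all assumptions.

```python
import subprocess

# Example commands only. Which tools and subcommands are useful depends on
# the controller, so treat this list as an assumption.
COMMANDS = [
    ["mfiutil", "show", "battery"],
    ["mfiutil", "show", "volumes"],
    ["mfiutil", "show", "drives"],
]


def run(cmd):
    """Run one command and return its combined output, or an error note."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"could not run: {exc}"
    return result.stdout + result.stderr


def build_report(commands, runner=run):
    """Join the output of each command into one plain-text report."""
    sections = []
    for cmd in commands:
        header = "$ " + " ".join(cmd)
        sections.append(header + "\n" + runner(cmd).rstrip())
    return "\n\n".join(sections)
```

The text from `build_report(COMMANDS)` can then be emailed with smtplib, and the script scheduled with a crontab entry along the lines of `0 7 * * * /usr/local/bin/python3 /root/health_report.py` (the path and time are hypothetical).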
 