wokka
Dabbler
- Joined
- Aug 13, 2013
- Messages
- 16
I had a FreeNAS 9.3 system running with 8 drives for a couple of years with no problems. A few months ago I needed more space, and the Supermicro chassis I had was maxed out at 8 drives, so I bought a Supermicro 24-slot setup. I backed up my FreeNAS install, installed a fresh 11.1 on the new system, restored the config, and moved the drives over. Everything came up nicely. I added two more drives to the array and all was good: a total of 10 drives, 4 TB each, running RAID 10. I'm booting from a USB thumb drive.
A day or two later, I started seeing these errors and drive failures, especially under heavy load (copying lots of files). My first instinct was heat issues in the new chassis, or hardware issues (new controller, new backplane, new mobo). Heat was the easiest to test, so I pointed a high-velocity fan at the drives, but that hasn't changed anything. The failed drive is always a different one, and taking it offline and back online (or rebooting) resolves it. This seems very similar to https://forums.freenas.org/index.php?posts/476695/
Here is an excerpt of the dmesg: https://gist.github.com/wokka1/2a49d7093613115e8cd79d535de71ba7
I came here to ask for help on troubleshooting the hardware, but after seeing the same problem from someone else, could it be something else?
I've had drives fail, we all have, but this isn't symptomatic of a drive failure. I'd expect it to always be the same drive failing, not 5 or 6 out of the 10 over the course of a week.
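A rough way to check that pattern against the dmesg output is to count errors per disk and per controller: errors concentrated on one disk suggest that disk, while errors spread over many disks behind one controller suggest something shared (controller, backplane, cabling). The sketch below assumes the errors carry the usual FreeBSD CAM prefix such as `(da3:mps0:0:3:0):`. The sample lines, the `mps0` controller name, and the `tally_errors` helper are hypothetical illustrations, not taken from the linked gist.

```python
import re
from collections import Counter

# Matches the "(da3:mps0:0:3:0):" style prefix that FreeBSD's CAM layer
# puts on disk error lines. Group 1 is the disk, group 2 the controller.
CAM_PREFIX = re.compile(r"\((a?da\d+):([a-z]+\d+):\d+:\d+:\d+\)")


def tally_errors(dmesg_text):
    """Count 'CAM status' error lines per disk and per controller."""
    per_disk = Counter()
    per_controller = Counter()
    for line in dmesg_text.splitlines():
        match = CAM_PREFIX.search(line)
        # Only count the status line so each failed command is counted once.
        if match and "CAM status" in line:
            per_disk[match.group(1)] += 1
            per_controller[match.group(2)] += 1
    return per_disk, per_controller


# Hypothetical sample lines for illustration only (assumed format).
SAMPLE = """\
(da3:mps0:0:3:0): READ(10). CDB: 28 00 0a 1b 2c 3d 00 00 08 00
(da3:mps0:0:3:0): CAM status: SCSI Status Error
(da7:mps0:0:7:0): WRITE(10). CDB: 2a 00 0a 1b 2c 3d 00 00 08 00
(da7:mps0:0:7:0): CAM status: CCB request completed with an error
(da1:mps0:0:1:0): CAM status: Command timeout
"""

if __name__ == "__main__":
    disks, controllers = tally_errors(SAMPLE)
    print("per disk:", disks.most_common())
    print("per controller:", controllers.most_common())
```

To try it on a real log, the output of `dmesg` could be saved to a file and its contents passed to `tally_errors` in place of `SAMPLE`.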
Also, I was gone for 10 days on vacation and had no drives fail, so they are fine when idle; it only seems to happen under higher load.
The server has dual 1100 W PSUs, and I only have 11 drives in it (the 11th is for a Time Machine setup). Nothing in the errors points to a PSU, and the IPMI isn't reporting any power problems. I could understand a single PSU failure, but not two at the same time.
Thanks for your help.
EDIT
TL;DR:
A bad controller was causing the issues. I replaced it and there are no more errors.