Harddisks keep failing in my virtualized setup

mhm (Cadet; joined Jan 13, 2019; 2 messages)

Hello,

My setup is the following:
  • Gen8 MicroServer with ESXi booting from USB
  • Internal SSD for the ESXi datastore
  • LSI SAS9220-8i HBA in IT mode, passed through to the FreeNAS VM
  • 3x Seagate IronWolf drives connected to the SAS HBA, plus one SSD for cache
In the past two months all three Seagate drives failed and I replaced them (thank god Amazon's refund policy didn't complain).
Today another HDD failed (at least, that is what FreeNAS reported). I took it out and tested it with Seagate's tools, and they do not report it as defective.
In any case, I replaced it with another brand-new Seagate I had on hand, and it failed again about 20 minutes after resilvering finished.

Is something wrong with my setup? Are Seagate drives not compatible with FreeNAS?
I attached a screenshot of the VM console; it is flooded with this message.
Screenshot 2019-01-13 at 23.21.49.png

Any recommendations on what I should do?

Thanks!

PS: The previous time a drive failed, I also replaced the drive backplane with one from another Gen8, thinking the cables might be the issue, but no luck.
 

Attachments

  • Screenshot 2019-01-13 at 23.30.32.png

Spearfoot ("He of the long foot"; Moderator; joined May 13, 2015; 2,478 messages)

Virtualizing FreeNAS is ticklish, but it can certainly be done! I've got three All-In-One systems, all running FreeNAS 11.1 on ESXi v6.0.

What firmware version did you flash your LSI card with? Make sure it matches what FreeNAS expects. I'm pretty sure that would be the same version, P20.00.07.00, that we all know and love so well.
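
A minimal Python sketch of that check, assuming you have already read the flashed version string off the card (for example from the output of LSI's `sas2flash -listall`). It treats P20.00.07.00 as the expected version, as suggested above, and simply compares the dotted numbers; the function names are made up for illustration.

```python
# Hypothetical helper: compare the flashed HBA firmware version with the
# version FreeNAS is expected to match (P20.00.07.00, per the post above).

EXPECTED_FIRMWARE = "P20.00.07.00"


def parse_version(version: str) -> tuple[int, ...]:
    """Turn 'P20.00.07.00' or '20.00.07.00' into (20, 0, 7, 0)."""
    # Drop an optional leading 'P' (phase) prefix, then split on dots.
    cleaned = version.strip().lstrip("Pp")
    return tuple(int(part) for part in cleaned.split("."))


def firmware_matches(flashed: str, expected: str = EXPECTED_FIRMWARE) -> bool:
    """Return True when the flashed firmware equals the expected version."""
    return parse_version(flashed) == parse_version(expected)


if __name__ == "__main__":
    for flashed in ("15.00.00.00", "20.00.07.00"):
        status = "OK" if firmware_matches(flashed) else "MISMATCH"
        print(f"{flashed}: {status}")
```

Comparing parsed integer tuples rather than raw strings means "P20.00.07.00" and "20.00.07.00" count as the same version.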

One thing I suggest is reserving CPU and memory resources for the FreeNAS VM. For CPU, I set 'Shares' to 'High' and 'Reservation' to 500 MHz. For memory, I reserve ALL of the VM's memory so that it's locked (i.e., ESXi won't swap memory out from underneath FreeNAS).

Other than that... you might try the usual stuff: re-seat the HBA card, double-check the cables and cable connections, etc. And make sure you run regular SMART tests on your disks (I run daily short tests and weekly long tests).

Good luck!
 

sretalla ("Powered by Neutrality"; Moderator; joined Jan 1, 2016; 9,703 messages)

Those errors you reported don't mean much on their own. You need to look at the SMART data for the drives to understand whether the problem is the cabling, power, controller, or the drives themselves (with so many failed drives, it seems likely to be one of the other options rather than the drives).

It may be that the "failed" drive still reports all OK on SMART tests (which the drive runs internally by itself, without involving the other components), which would point the troubleshooting elsewhere.
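
As a rough illustration of reading the SMART data this way, the Python sketch below sorts a few standard ATA attributes into "probably the drive" and "probably the path to the drive". It assumes the raw values have already been collected (e.g. from `smartctl -A`) into a dict keyed by attribute ID; the grouping is a common rule of thumb, not a definitive diagnosis, and the sample readings are hypothetical.

```python
# Rough SMART triage: which non-zero attributes point at the disk itself,
# and which point at the link to it (cable, backplane, HBA).

# Media problems on the disk itself.
DRIVE_ATTRIBUTES = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}
# CRC errors on the link between the disk and the controller.
PATH_ATTRIBUTES = {
    199: "UDMA_CRC_Error_Count",
}


def triage(raw_values: dict[int, int]) -> list[str]:
    """Return human-readable hints for every non-zero suspect attribute."""
    hints = []
    for attr_id, name in DRIVE_ATTRIBUTES.items():
        if raw_values.get(attr_id, 0) > 0:
            hints.append(f"{name}={raw_values[attr_id]}: suspect the drive")
    for attr_id, name in PATH_ATTRIBUTES.items():
        if raw_values.get(attr_id, 0) > 0:
            hints.append(f"{name}={raw_values[attr_id]}: suspect cabling/backplane/HBA")
    return hints


if __name__ == "__main__":
    # Hypothetical readings for one "failed" disk.
    sample = {5: 0, 197: 0, 198: 0, 199: 42}
    for hint in triage(sample) or ["no suspect attributes are non-zero"]:
        print(hint)
```

A non-zero UDMA_CRC_Error_Count alongside clean reallocated and pending-sector counts would fit the case where the drive passes its own tests but something between it and the controller is at fault.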

There are already some resources and loads of discussion threads on how to get SMART reports, so I won't duplicate them here.
 

mhm (Cadet; joined Jan 13, 2019; 2 messages)

SMART passes on both short and long tests.
What I noticed is that, as @Spearfoot said, my firmware version might be the cause. Upon checking, I see that I flashed it with version 15.00.00.00.

I'm rushing to work now, but tonight I'll flash P20.00.07.00 and hopefully that will fix the issue.

Wish me luck!
 