FreeNAS lockups - SMART failures

Status
Not open for further replies.

Shankage

Explorer
Joined
Jun 21, 2017
Messages
79
Hi all

I’ll give you a quick rundown of my setup first:
  • Virtualised FreeNAS instance
  • 8 SAS drives used for the pool
  • 32 GB memory
  • SATA DOM for the FreeNAS OS
  • Pool set up as mirrors (RAID 10 equivalent)
  • NFS presented to ESXi hosts
About once a week one particular drive gets SMART errors warning of impending failure and the FreeNAS instance locks up; VMs and data are inaccessible, and we only regain access once FreeNAS and the other VMs are rebooted.

The question is: why is a single drive failure causing this? The drives are SAS drives, too.

Any tips would be appreciated!

Thanks
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
For one, if you have been getting consistent errors on ONE drive, why have you not replaced the drive? For two, does the drive support TLER?
 

Shankage

Explorer
Joined
Jun 21, 2017
Messages
79
For one, if you have been getting consistent errors on ONE drive, why have you not replaced the drive? For two, does the drive support TLER?

For one, it is at a remote location and I am waiting for a tech to get out there; for two, I can’t see that it does anywhere... the drive is an ST4000NM0025.
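For what it’s worth, smartctl can report a drive’s error-recovery (TLER/ERC) timeout directly. Note that SCT ERC is an ATA feature, so this check applies to SATA members; on a SAS drive like the ST4000NM0025 the equivalent timeouts live in the read-write error recovery mode page instead. A minimal sketch, assuming the disk shows up as /dev/da0 inside FreeNAS (a hypothetical device name); the sample text below mimics the format smartctl prints so the parsing can be shown:

```shell
# On a live system you would run:
#   smartctl -l scterc /dev/da0
# Sample output for a drive with ERC enabled; a drive without support
# prints a "not supported" message instead of the Read/Write timeouts.
sample='SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)'

# Extract the read timeout (reported in deciseconds).
read_erc=$(printf '%s\n' "$sample" | awk '/Read:/ {print $2}')
echo "Read ERC timeout: ${read_erc} deciseconds"
```

A timeout of 70 (7 seconds) is typical for enterprise drives; a desktop drive with ERC disabled can retry internally for minutes, which is exactly the hang scenario being discussed.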
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457

Not a lot of people will support virtualized FreeNAS serving its own host as a datastore. It’s understandable for a home lab, but it should NEVER be used anywhere the data is important. There are too many things that can go wrong, causing crashes at the least.

If you have a drive taking minutes to retry reads or writes, and if ZFS for some reason is not taking the drive offline, this could cause your NFS share to hang long enough to kill all of your VMs. That’s likely what is happening. You should still have IPMI/IP-KVM/iLO/iDRAC access to assess the situation from the DCUI and check for hung NFS sessions.
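As a sketch of that check from the ESXi side: `esxcli storage nfs list` shows each NFS datastore’s mount and accessibility state. The datastore name, IP, and export path below are made-up examples; the parsing just illustrates what a hung mount looks like in the output:

```shell
# On the ESXi host (SSH or DCUI shell) you would run:
#   esxcli storage nfs list
# Sample output in the format ESXi prints (all values hypothetical):
sample='Volume Name  Host          Share      Accessible  Mounted  Read-Only
-----------  ------------  ---------  ----------  -------  ---------
freenas-ds   192.168.1.10  /mnt/tank  false       true     false'

# A datastore that is Mounted but not Accessible is the classic
# hung-NFS signature when the backing storage has stalled.
state=$(printf '%s\n' "$sample" | awk 'NR==3 {print $4}')
echo "Accessible: ${state}"
```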

If this is for a client, please just use DAS and RAID. This is one of the only times I would suggest doing so and only if the client is unwilling to get a separate server for the SAN.
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
On a side note, you should just offline the bad disk until you get it replaced. It’s not doing any good and is likely tanking performance.
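A minimal sketch of that offline step; the pool name `tank` and the gptid below are hypothetical placeholders, so take the real names from your own `zpool status` output before running anything:

```shell
# Take the failing disk offline so ZFS stops waiting on its retries.
# Pool and device names here are made-up examples.
pool="tank"
disk="gptid/3f7cbc44-0000-0000-0000-000000000000"

# On a live FreeNAS shell you would run the command echoed below;
# once the replacement arrives, follow with: zpool replace <pool> <old> <new>
cmd="zpool offline ${pool} ${disk}"
echo "$cmd"
```

With one side of a mirror offlined the vdev runs degraded but no longer blocks on the sick drive’s multi-minute retries.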
 

Shankage

Explorer
Joined
Jun 21, 2017
Messages
79
Thanks for the response. However, having done this a couple of times and tested it first, this is the first time I’m actually having an issue with a virtualised FreeNAS instance; it’s been great otherwise!

It’s not the entire host locking up, so I can still get access via the host or vCenter.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
If you have access to vCenter then you should have access to the console, true? You can then check your drives from the command line using smartctl, and check how the pool is doing via zpool status.
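As a sketch of that console check: `zpool status` reports the overall pool state and flags any faulted device. The pool name and state in the sample text below are illustrative, not output from the poster’s system:

```shell
# From the FreeNAS console (reached via vCenter) you would run:
#   zpool status -x
# Sample of the output format when a disk is failing (names hypothetical):
sample='  pool: tank
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.'

# Pull out the state field; ONLINE is healthy, DEGRADED/FAULTED is not.
health=$(printf '%s\n' "$sample" | awk '/state:/ {print $2}')
echo "Pool state: ${health}"
```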

Here is the part which bothers me:
one particular drive gets SMART errors warning of impending failure and the FreeNAS instance locks up; VMs and data are inaccessible, and we only regain access once FreeNAS and the other VMs are rebooted.
If this is a data drive then I don't understand how it could stop FreeNAS and your other VMs from running. You need to be very specific with the data you provide if you are indeed asking for help. If you cannot provide the information, then it's very possible we cannot provide you good help.

Things you should post here are:
1) Is the memory for your FreeNAS VM locked (reserved)? You said 32GB above.
2) Which drive is giving you the failure (please don't say it's the SATA DOM, that would be too obvious).
3) If you are running ESXi and FreeNAS as a VM, why use a SATA DOM? Why not create a 10GB vmdk drive and put FreeNAS on that? That is what I do.
4) Post the SMART error message.
5) Post the complete SMART data for the suspect drive (smartctl /dev/adx -x) for example.
6) Post the output of "zpool status".
7) Post what version of FreeNAS you are using.
8) Post your hardware configuration (the actual hardware being used).
9) What version of ESXi are you using, and what virtual machine hardware version?
10) Examine the log files for both ESXi and FreeNAS VM to see if there are any error messages of interest.
11) Run MemTest86+ on your system and a CPU test, maybe your system is not stable.
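Items 4 through 7 above can be gathered in one pass from the FreeNAS shell. A rough sketch; the da0 through da7 device names are an assumption (list yours with `sysctl kern.disks`), and each step falls back to a note if the command is unavailable:

```shell
# Collect the SMART and pool data requested above into one report file.
out="/tmp/freenas-diag.txt"
: > "$out"

echo "== FreeNAS version ==" >> "$out"
cat /etc/version >> "$out" 2>&1 || echo "(not a FreeNAS host)" >> "$out"

echo "== zpool status ==" >> "$out"
zpool status >> "$out" 2>&1 || echo "(zpool not available)" >> "$out"

# Device names are an assumption; adjust to match your system.
for d in /dev/da[0-7]; do
    echo "== smartctl -x $d ==" >> "$out"
    smartctl -x "$d" >> "$out" 2>&1 || echo "(no SMART data for $d)" >> "$out"
done

echo "Report written to $out"
```

Posting that single file covers the version, pool, and per-drive SMART details in one go.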

All of this data will help provide you the best possible help.

The question is: why is a single drive failure causing this? The drives are SAS drives, too.
If the failing drive is the boot DOM, then it could be the issue. If you configured ESXi incorrectly, that could be the issue. The suspect hard drive could be shorting out the power (pulling too much current), causing the power supply's output voltages to drop and crashing the system. It's a guessing game at this point.
 