11.2U1 not responding after a week of use, repeatedly

wlentz102

Cadet
Joined
Jun 9, 2017
Messages
3
Appliance:
Supermicro 36-drive chassis
1 x Xeon(R) CPU E5-2609 v2
64GB RAM
12 x 6TB drives in a combined RAIDZ2 (2 RAIDZ2 stripes in a single pool; unused now, but planning to reuse; was original storage pool)
10 x 6TB drives in a RAIDZ2 (iSCSI zvol target lives here; recent addition)

Primary use: Veeam backup repository

This appliance has been running FreeNAS since the 9.x days. We've upgraded it periodically without much trouble. I recently went through a project to add space, upgrade to 11.2U1, and switch Veeam from using a Samba share on this appliance to using a 10Gb iSCSI target. Performance is much better. I have iSCSI set up with a zvol extent, multipathed via two separate 10Gb connections on separate subnets to a Windows Server 2016 server running Veeam. The multipathing is working great - I see traffic going out both connections. I also thought I lost the boot volume (USB flash drive) after the upgrade, so I reinstalled FreeNAS on a new USB flash drive, restored the config, and then set up a boot mirror with a second flash drive.

Since doing the upgrades, the appliance has consistently frozen over the weekends. Symptoms: WebUI inaccessible, but launches the gray page (once). LAN/MGMT NICs completely missing (don't show up on console) - at this point, SSH/WebUI are completely inaccessible (several times). Drive via iSCSI is inaccessible (every time). Today (Tuesday), it froze again after the regular freeze yesterday. In all cases, I can access the console via IPMI, and it's responsive.

I'm assuming the freezing has something to do with the iSCSI configuration/use, but not sure. I'd be happy to try to get logs, but not sure which ones to grab.

I also thought I could try upgrading to 11.2U2, but 11.2 isn't an available train in the update options.

Oddly enough, I did try to move my System dataset from the first RAIDZ2 pool to the second, but it didn't seem to work, even though the GUI said it was configured.
 

Meyers

Patron
Joined
Nov 16, 2016
Messages
211
I have a fairly low utilization FreeNAS mini that started freezing in a similar way. For me it was the pool itself freezing. A couple times I could get in via ssh but any attempt to access the pool froze indefinitely. Absolutely no logs anywhere to be found. Only a reboot would fix it until it would happen again a few days later. Very frustrating.

This mini serves out my home Samba and Apple shares. It also seeds several open source torrents (which is the bulk of the IO).

Someone mentioned heat issues, which I've never had before with this unit, but to rule that out I set the system fan to full on and stress-tested the CPUs for several hours. Temps never got anywhere near concerning (it's also 70°F / 21°C in my apartment and these are in an even colder area, so I highly doubt this issue was heat related, but who knows). I haven't had a problem since setting the fan to full on, though. I'm not saying this is related to your issue at all, just noting my observations.

If you get flak about using USB drives for boot, just know that I have used mirrored USB boot drives in several production systems for years without issue. I do have one production system that's completely unusable on 11.2 (I have a ticket open). The other two I have in production (high 24x7 load) are working just fine on 11.2-U1, though.

Hopefully we can both get to the bottom of this because this is really concerning to me.
 

Meyers

Patron
Joined
Nov 16, 2016
Messages
211
Spoke too soon. Started backing up a system over SMB and had it freeze again. I could ping the system. No errors over the console. Just total pool freeze. I'm going to check if there are any tickets open and open one if not.
 

wlentz102

Cadet
Joined
Jun 9, 2017
Messages
3
This happened again after a week of use. I was seeing the iSCSI errors again before I rebooted. I was able to finally update to 11.2 U2.1 via the legacy web interface after the outage. I cleaned up some legacy items from before this project as well.

Also of note, when this happens I usually have to hard-reset the box through IPMI. When it comes up, it is usually missing most of its NIC configuration (and who knows what else). Rebooting a second time safely fixes the issue.
 

Meyers

Patron
Joined
Nov 16, 2016
Messages
211
Same exact thing happens to me. Have to reboot twice.
 
Joined
Feb 26, 2018
Messages
3
Hmm, this sounds a lot like the problem I have been having since upgrading to 11.2-x. I'm also using it as a Veeam repo, but via SMB, not iSCSI. The system is pingable, but IOs start failing and the web UI gets stuck. The console will let me enter 9 for shell, but I never get a prompt. I also have to reset via IPMI, and I also have to do the second reboot for the network to function. My hardware is quite different: an old Sun Fire X4170 server with 2 J4400 chassis and 36 1TB drives. It's ancient, but it was pretty reliable on 11.1.
 

wlentz102

Cadet
Joined
Jun 9, 2017
Messages
3
Update: I updated to 11.2 Update 3, and haven't had any issues since (it's been a week). I did finish getting rid of the iSCSI configuration after the update, but I don't know that that is what fixed the issue. I'm cautiously optimistic. :)
 

Meyers

Patron
Joined
Nov 16, 2016
Messages
211
Awesome. I just upgraded my problem system to 11.2 U3 so we'll see. I know one other person who upgraded to U3 and hasn't had a problem since.
 