FreeNAS 11.3 no longer booting when non-OS storage attached

SexyJeep

Cadet
Joined
Oct 2, 2020
Messages
3
My Hardware Setup:
- I am using a licensed ESXi v7 host with a single VM on it.
- The single VM has FreeNAS v11.3 on it and nothing more.
- The FreeNAS VM is set up with 2 cores, 8GB memory, a single 50GB (thick provisioned) disk for the FreeNAS OS, and four additional, equally sized 1.8TB disks to be used in a raidz1.

My FreeNAS Setup:
I was able to successfully set this up over a week ago and have been using the system to store my backups ever since. I have one share set up that uses all of the space of the four 1.8TB disks from my raidz1 (so roughly 5.2TB of total storage on my share).

The Issue:
Yesterday, I got an error that my backups weren't working. When I dug into the error it was saying that the backup location was not accessible.

- I first attempted to browse to the shared path via File Explorer (within Windows 10) but it errored out because the location was also not accessible.
- I then tried to ping the IP of my FreeNAS VM and it replied with a 2ms response (so things looked fine from this angle).
- I then opened my web browser and tried to access the FreeNAS web interface by the IP (as I have been doing for the past week to manage it), but instead of taking me to the Username and Password fields to log in, it just showed a generic message: "Connecting to FreeNAS ... Make sure the FreeNAS system is powered on and connected to the network"
- Next, I opened the console to the FreeNAS VM in ESXi and reviewed the latest logging on the screen but did not see anything relevant. I pressed the Enter key and the main menu/options appeared, so I decided to just reboot it (option #10). As it went through the reboot process, it hung on multiple services trying to shut down. After 15 minutes it had still not progressed, so I hard powered off the VM and then turned it back on. As soon as it was powered on there was just a blank/black screen with a flashing cursor in the top right corner - and that's now all it does. It doesn't matter how many times you reboot it or how long you wait (I waited 4 hours); it goes straight to this flashing cursor.
- Basically, at this point it does not even attempt to boot. If you leave the console open to the VM and power it on, as soon as you click "power on" within 1/2 second you are at the blank/black screen with the flashing cursor. It does not attempt to do anything.
- After spending 6 more hours trying anything and everything I could think of (way too much to list here), I decided to power off this VM and just create a brand new one and re-install FreeNAS 11.3 from scratch. Now I have a brand new VM, a new disk (a single 50GB for the OS), and FreeNAS 11.3 installed and configured with the settings I want (just no additional attached storage yet), and everything is working fine. I can reboot it as well and everything works as expected. So next, I powered this new VM off and attached the original four 1.8TB storage disks from the original VM to this new VM. As soon as I power the new VM back on, BAM! - I am immediately right back at the blank/black screen with the flashing cursor.

So the issue is this: for whatever reason, whenever the additional storage I want to use is attached to the FreeNAS VM - instant blank/black screen when powering on the VM. As soon as you remove all the additional attached storage (leaving only the 50GB disk with FreeNAS on it attached to the VM), it boots into FreeNAS just fine and everything works - well, except for the fact that my storage still isn't there, which is the whole purpose of even having FreeNAS, ugh... I've spent hours more trying to find any article online that could possibly help me with this, but I've found nothing. I am hoping someone here might have an idea of something I can try.

**Update**
I did have an idea as I was writing this: remove the additional storage and then power on the VM. Once it got to the FreeNAS boot menu, I pressed the spacebar to pause the boot. While it was paused, I went back and added the four additional storage disks I had been using in the raidz1, then pressed Enter to let FreeNAS continue booting. Nearly an hour later, FreeNAS finally booted, but not to the main menu like it normally does (where you have options 1-11); it just prompted me to log in with a username. Odd. I then tried the web browser and it came up as it should, and I was able to get logged in. All my original settings are there from when I first set it up a week ago, but the storage pool is gone. It just shows the pool name I had set up with "?UNKNOWN" next to it. Ugh.

So even now it's still not booting properly. Even after trying to trick it by adding the storage after the fact, it took an hour to boot and the storage pool is not there, so that was a no-go.

Any additional help/advice is greatly appreciated. Thanks in advance!

Jeep
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
The FreeNAS VM is set up with 2 cores, 8GB memory, a single 50GB (thick provisioned) disk for the FreeNAS OS, and four additional, equally sized 1.8TB disks to be used in a raidz1.
How are the 1.8T disks being presented to the FreeNAS VM?

Hint: if the answer is anything other than "PCIe passthrough of an HBA" you've likely put yourself into a bad spot.
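One rough way to check from inside the guest (the model strings below are just examples, not your actual drives): disks handed over as vmdks or RDMs typically identify themselves as "VMware Virtual disk", while disks behind a PCIe-passed-through HBA show their real model names.

Code:
# From the FreeNAS shell, list what the guest actually sees
camcontrol devlist
# <VMware Virtual disk 1.0>    at scbus2 target 0 lun 0 (pass0,da0)   <- vmdk/RDM
# <ATA EXAMPLE-1.8T SN02>      at scbus3 target 0 lun 0 (pass1,da1)   <- real disk behind a passed-through HBA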
 

SexyJeep

Cadet
Joined
Oct 2, 2020
Messages
3
How are the 1.8T disks being presented to the FreeNAS VM?

Hint: if the answer is anything other than "PCIe passthrough of an HBA" you've likely put yourself into a bad spot.
The disks are indeed being presented via passthrough mode from an Intel RAID card. As the FreeNAS documentation says, if you're not using passthrough then it's either not going to work or you're likely to have problems. The only reason I have a RAID card in this at all is that the motherboard doesn't have enough ports for all the drives, so the RAID card provides the additional ports needed.

You bring up a fair point/concern, but it shouldn't be a variable in this configuration. Where I work, we have been using this same Intel RAID card, version of FreeNAS, and host hardware for our clients' backup share locations and have not seen this issue until this week. We have been doing this since FreeNAS v11.2.

I appreciate your feedback though! If you can think of anything else let me know.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The disks are indeed being presented via passthrough mode from an Intel RAID card. As the FreeNAS documentation says, if you're not using passthrough then it's either not going to work or you're likely to have problems. The only reason I have a RAID card in this at all is that the motherboard doesn't have enough ports for all the drives, so the RAID card provides the additional ports needed.

The following was written for people in your situation:

https://www.ixsystems.com/community...bas-and-why-cant-i-use-a-raid-controller.139/

In short: your RAID card situation is not expected to work well, if it works at all. There are some edge cases here. Some Intel SCU style controllers (which do have RAID capabilities) are known to work fine, as long as all the RAID stuff is bludgeoned into a disabled state. Whether or not they will work for PCI passthru is a different issue; I've seen that go both ways. But in general, any RAID card you've added in is probably not particularly compatible. The only add-on cards you should use with FreeNAS are generic AHCI controllers or LSI HBAs that have been crossflashed. This isn't a debate topic, it's just the way it is.
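One quick way to see what you actually have, once the card is visible to FreeNAS (or on a bare-metal box) - just a sketch, adjust for your hardware:

Code:
# From the FreeNAS shell: see which driver claimed the storage controller
pciconf -lv
# The prefix on the controller's entry tells you the personality:
#   mps0@... / mpr0@...    plain LSI HBA driver (what you want for ZFS)
#   mfi0@... / mrsas0@...  MegaRAID firmware/driver (RAID mode - not what you want)
#   ahci0@...              ordinary AHCI SATA (also fine)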
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
The disks are indeed being presented via passthrough mode from an Intel RAID card. As the FreeNAS documentation says, if you're not using passthrough then it's either not going to work or you're likely to have problems. The only reason I have a RAID card in this at all is that the motherboard doesn't have enough ports for all the drives, so the RAID card provides the additional ports needed.

You bring up a fair point/concern, but it shouldn't be a variable in this configuration. Where I work, we have been using this same Intel RAID card, version of FreeNAS, and host hardware for our clients' backup share locations and have not seen this issue until this week. We have been doing this since FreeNAS v11.2.

I appreciate your feedback though! If you can think of anything else let me know.

If the "Intel RAID card" is one of the SCU SoCs as @jgreco mentions, and it has had its RAID functionality truly disabled (pure AHCI mode) then it may be able to function as expected. Based on what you're describing though it seems as though there may be some manner of BIOS/UEFI hook still remaining that is interrupting the boot process.

Also, please note there is a difference between PCIe passthrough (where the entire PCIe device is passed to the VM) and local RDM/disk passthrough; I suspect you're doing the latter, since you've mentioned hot-adding the devices by editing the VM settings. Let me know if I'm off-base here, but that's not supported by default in vSphere, even in the latest 7.0U1 release.
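If you want to double-check which one you have, a rough way (the datastore path and VM name below are placeholders - adjust for your environment) is to look at the VM's .vmx on the ESXi host:

Code:
# On the ESXi host - substitute your own datastore path and VM name
grep -E 'pciPassthru|scsi0:' /vmfs/volumes/datastore1/freenas/freenas.vmx
# pciPassthru0.present = "TRUE"     <- an entire PCIe device handed to the guest
# scsi0:1.fileName = "disk1.vmdk"   <- a vmdk (or RDM pointer file): not PCIe passthrough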

If you've used true AHCI on your add-in card and not individual RAID0s, you could likely swap the "RAID" card for a proper HBA (although vSphere will now complain about the el-cheapo LSI SAS2008 being unsupported) and do PCIe passthrough of the HBA to your VM.

Edit: If you're using the RS3WC080, that's actually a rebranded LSI SAS3008. Check to ensure that it's using IT mode firmware and not IR.
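A rough way to check, assuming the card ends up visible to the FreeNAS guest and the Avago/LSI flash utility is available (otherwise use the card's EFI/DOS tool):

Code:
# List controllers and their firmware; the Firmware Product ID should report (IT), not (IR)
sas3flash -listall
# Failing that, the boot messages at least confirm which driver and firmware attached
dmesg | grep -i mpr0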

Next steps:

I would try letting the system fully boot without any additional storage devices attached; then, once you've gotten to the FreeNAS UI (and a command prompt), hook up the storage and monitor the dmesg output. Once you've validated all four are present, try to import the volume again.
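Roughly, from the shell - the pool name below is just a placeholder:

Code:
# Watch kernel messages while you hot-add the disks (or just run dmesg afterwards)
tail -f /var/log/messages

# Confirm all four disks actually show up
camcontrol devlist

# See whether ZFS can find the pool and what state its member disks are in
zpool import            # lists importable pools without actually importing
zpool import yourpool   # or do it from the GUI: Storage > Pools > Add > Import an existing pool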
 

SexyJeep

Cadet
Joined
Oct 2, 2020
Messages
3
Thanks for the additional thoughts/replies to you both. I tried a few more things this afternoon that yielded some interesting results.

- First, I removed the four non-OS disks and attached a new 10GB disk to the FreeNAS VM. The 50GB OS disk for FreeNAS is on an SSD that is plugged directly into the motherboard, so the additional 10GB disk I added is also from that same SSD. What I wanted to know was whether FreeNAS would boot with another attached disk that wasn't one of the four coming through the RAID controller. The result was that FreeNAS booted without an issue, and I could have made a 10GB storage pool from the new disk. This further proved to me that the installation of the VM was 100% legit, and that the issue (based on the feedback in this post) is more likely the RAID card. However, everything worked fine for over a week before this, and I can't let that go. Storage technology is something I've dealt with professionally for the last 6 years, and although I am no expert in FreeNAS, I also know that there is very little variance in logical processing - it doesn't succeed or fail based on feelings.

- Considering all the above, I then decided to vet the health of the four drives. Although the health in VMware shows nothing wrong with them, I wanted to see what the RAID BIOS/console had to say about things. Luckily I have an RMM on this host, so I rebooted it and got into the BIOS. The mass storage controller is set up for AHCI as expected, but what was interesting was that rather than seeing four 1.8TB drives, the BIOS only showed three 1.8TB drives. On this particular board, I found that the only way to access the RAID is through Intel's RAID Web Console 2 or RWC3 (there was no RAID BIOS, to my surprise). I exited the BIOS and let the host boot. Once booted, I reviewed the .vmdk's for the FreeNAS VM, but it showed only three 1.8TB drives and a fourth drive of only 8GB! There was also an error log attached to the fourth drive. Now we are getting somewhere! I then tried to log in with the Intel RAID Web Console 2 and RWC3 but couldn't get them to work (no surprise to me, though). I could eventually get them to work, but not without going through a gauntlet of megaraid-sas and LSI provider drivers to find a compatible match. All I can gather from this is that FreeNAS does not want to boot because this fourth drive is jacked up, and FreeNAS must have known about it despite everything looking normal to me prior to the host being rebooted (it knew about what I couldn't see yet).

- Next, after several more host reboots, the fourth drive began showing up in the BIOS again. So at this point it's still hard to tell if the drive is failing, or if the RAID card is having an issue interfacing with the motherboard that causes that drive to lose connectivity. I then let the host fully boot, knowing that all four drives were now being seen in the BIOS. Once it booted, I opened a console to the FreeNAS VM and it too was fully booted. I then logged into the web interface for FreeNAS and could see my original storage pool. All my data was there and all other settings were just as they had been. I was able to browse the share, kick off a backup, and everything worked flawlessly again.

- Moving forward, I am going to leave things as they are and see if it breaks again. I am pretty sure it will, and if it does then I'll likely be ordering an HBA, moving the drives to that, and seeing how it does - as suggested. One thing I wondered but never got the chance to try was what would have happened if I had removed that fourth drive (the one having the issue) from the FreeNAS VM altogether. Technically, with a raidz1, FreeNAS should have been able to lose that drive and still function normally. When/if this issue happens again I will try that and see if FreeNAS will boot or if the missing drive will still cause it not to boot. The thought here is that maybe it wouldn't boot because the drive was only partially there, so had I removed it completely then maybe it would have booted. I bet it would have booted (I hope it would) - see the sketch after this list for one way I could test that without pulling the drive.

- The one thing that is disappointing to learn is that if FreeNAS sees an issue with your attached storage, rather than still booting and simply not showing your storage or reporting a lost/damaged storage pool, it just won't boot at all. This seems very bizarre to me. In my situation, with the OS disk for FreeNAS being on a 250GB SSD plugged straight into the motherboard and set to boot first, why wouldn't FreeNAS boot regardless of whatever storage is present? I expect FreeNAS to still boot in that situation. I've never heard of an OS not booting due to secondary storage that's on a completely different controller - that makes no sense to me. I would love to understand the logic it's using to create that outcome.
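For reference, a minimal sketch of the "lose one drive" test without physically pulling anything - the pool and device names here are made up, and I haven't actually run this yet:

Code:
# With the pool imported, see which device is the suspect drive
zpool status backups

# Take that one member offline - a raidz1 should keep running, just DEGRADED
zpool offline backups da3
zpool status backups     # expect state: DEGRADED, with data still accessible

# Bring it back (or resilver a replacement) when done testing
zpool online backups da3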

In closing, thanks again to you both for replying to this thread and trying to help me through this. The guidance on HBAs is the big takeaway here, and even if my issue ends up being due to a bad drive and nothing more, it's worth trying to replace the RAID cards with HBAs to avoid future headaches. You both make a great point here. At the same time, I also know that the phrase "not supported" is often used as an escape in our industry. Given the right people, team, and time, you can almost always find a way to succeed, and that's what's so great about forums like these!
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
At the same time, I also know that the phrase "not supported" is often used as an escape in our industry.

Back in the good old days, when UNIX ran on real computers rather than glorified PCs, it was not uncommon for FEs (field engineers) to come out and rectify problems with a faulty controller design, often through replacement.

This no longer happens. Companies like Realtek and Adaptec mostly target the Windows market, and don't share the documentation for their non-robust hardware designs, making it impossible to support them well. And a NAS, well, it lives or dies by its controllers functioning 100% correctly 100% of the time; otherwise you get all sorts of problems.

The options to "always find a way to succeed" are pretty limited, and generally we redirect people into using the hardware that is known to work swimmingly well, because that is always a way to succeed. The rest of it, for however much it does or doesn't work well, is therefore "not supported," because if your Realtek ethernet controller is limiting you to 700Mbits/sec and locking up every other Thursday, no one here can really do much about it.

You can of course choose to view that as an escape or an excuse, but anyone who wishes to do so is welcome to hang around and do a better job of support. ;-) It's just the sucky reality: lots of PC hardware sucks.
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
...snip...
Once booted, I reviewed the .vmdk's for the FreeNAS VM, but it showed only three 1.8TB drives and a fourth drive of only 8GB! There was also an error log attached for the fourth drive.
...snip...
I believe this is the gist of your problem: if your disks are presented to the FreeNAS virtual machine as .vmdks, then your RAID controller is not passed through correctly per the best practices for virtualizing FreeNAS.

But you're on the right track, if I understand correctly that you plan to ditch the RAID controller and replace it with an HBA.

Good luck!
 