Can the OS boot on a RAID controller?

phox

Cadet
Joined
Nov 19, 2021
Messages
1
I understand all the RAID controller vs. HBA passthrough posts and all that, but what I am asking is: can, or should, the TrueNAS host OS live on a mirrored volume from a hardware RAID controller that you set up after the BIOS POST?

For example:

2 x 512GB SSDs in RAID 1 for the OS, using a PCIe RAID controller
4 x 4TB drives plugged into the motherboard SATA ports, with TrueNAS handling the software RAID (ZFS)

Obviously protecting data is key in a NAS, but how do you protect the OS from drive failure?

Or do you just boot TrueNAS from a single SSD, set up your pools, back up your config, and then, if that SSD ever dies, install a fresh instance of TrueNAS and restore your config file?
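To be concrete about the backup part, something like this is what I have in mind (I'm assuming the CORE-style config database at /data/freenas-v1.db, and "truenas.local" is just a placeholder hostname; the GUI's System -> General -> Save Config button does the same job):

    # Pull the config DB off the box on a schedule; the path is the
    # FreeNAS/TrueNAS CORE default, adjust for your install:
    scp root@truenas.local:/data/freenas-v1.db ./truenas-config-backup.db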

Thanks
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
It is acceptable, and I do in fact run a bunch of bare-metal FreeNAS hosts with Dell PERC H310's in IR mode.

You will notice a bunch of places where I correct people who are insistent about IT mode; an IR-mode HBA with the proper LSI IR firmware (20.00.07.00) is equivalent to IT mode, except somewhat slower, and it adds RAID1 capability.
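If you want to sanity-check what a card is actually running, sas2flash (the LSI/Broadcom flash utility for SAS2 parts) will tell you; rough sketch below, and the exact output wording varies by tool version:

    sas2flash -list
    # "Firmware Version" should read 20.00.07.00 on a SAS2 card, and
    # "Firmware Product ID" tells you whether it's the IR or IT image.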

When FreeNAS moved to ZFS for boot, it adopted the FreeBSD zfsroot and boot sector machinery, which is intended to let you run a ZFS mirror of the boot device. Unfortunately, many/most systems will not boot from the "secondary" device if the primary device appears to be showing signs of life. This is especially true of mainboard SATA ports, where the absence of a device previously configured in the BIOS can sometimes stop the boot entirely.
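For reference, this is roughly what the stock mirrored boot pool looks like ("freenas-boot" is the FreeNAS default pool name; your device names will differ):

    zpool status freenas-boot
    #   pool: freenas-boot
    #     freenas-boot  ONLINE
    #       mirror-0    ONLINE
    #         ada0p2    ONLINE
    #         ada1p2    ONLINE

ZFS keeps both halves current; the problem is that the BIOS still has to pick a single device to load the boot blocks from, which is exactly where the fallback falls over.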

Prior to ZFS boot, some of us had already been using IR mode with a pair of SSD's for boot, and this basically works swimmingly well. However, it is worth noting that it does not protect against undetected data corruption. If the LSI RAID code cannot detect that a block is "bad" (i.e. the SSD returns garbage but doesn't report the block as bad), the garbage is passed up to the OS, and because ZFS has no way to access the redundancy behind the RAID controller, it can detect the damage via checksum but cannot repair it.
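You can watch this class of failure from the ZFS side; sketch below, assuming the default boot pool name. The CKSUM column counts what ZFS detects, but with only one logical device visible to it, there is nothing to heal from:

    zpool scrub freenas-boot
    zpool status -v freenas-boot
    # Nonzero CKSUM means ZFS caught corruption; without ZFS-level
    # redundancy the damaged files end up under "Permanent errors"
    # instead of being repaired.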

Therefore, there is no "perfect" option available. You either get a redundant boot that MIGHT crap out if the primary drive is flaky but has full ZFS protection, or a redundant boot that's highly resistant to flaky drives but doesn't give ZFS the same ability to protect the data.

I suppose you could probably do some sort of hack --

Three identical SSD's on an IR controller:

Make two of them an IR RAID1 and configure to boot from that, appearing as da0
Add one as a plain non-RAID disk, appearing as da1
Use ZFS to mirror da0 and da1

I think you actually get the best of both worlds that way. Hm.
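Roughly, assuming the IR volume and the bare disk really do land on da0/da1 and the default "freenas-boot" pool name, the ZFS half of that hack would look like:

    gpart backup da0 | gpart restore -F da1    # clone da0's partition table onto da1
    gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da1    # make da1 bootable too
    zpool attach freenas-boot da0p2 da1p2      # mirror the boot pool across both
    zpool status freenas-boot                  # let the resilver finish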
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Earlier this year I tried to boot a Cisco x86/x64 server off either the 25th or the 26th drive. That server has 24 x 2.5" drives in the front and 2 x 2.5" drives in the back, and we used the 2 drives in the back for the OS. For some stupid reason, the LSI controller "forgot" which were the 2 OS drives, so it would not boot. Cisco and my Tier 3 support were of no help.

In this case, it would not have mattered if the 2 drives were IR mirrored or not. They were too high up in the numbering for the BIOS to see them.

Eventually I found a somewhat hidden menu in the LSI BIOS firmware that allowed me to specify the 25th disk as the boot device and the 26th as the alternate. It worked great, and naturally I documented it to prevent a repeat of the wasted hours.

Other than that, I really liked those Cisco servers.


Well, those Cisco servers threw me for a loop when I first started supporting them. They had 4 physical cables: 2 x power & 2 x 10 Gigabit Ethernet. So I thought I needed a crash cart for console & IPMI access. Later I learned that they ran UCS through both 10 Gigabit Ethernet ports. MUCH better: remote console and redundant network access to the IPMI & console.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Yeah, I hear your pain. The boot configuration on all of the LSI products, both the IR/IT and the MPR/MRSAS stuff, tends to be ... obtuse. It's easy to forget to go in and explicitly set these things up, because you've worked hard just to get to that "it works" point, and then, well...

That's almost as exciting as the fun when you add some HDD's to a virtualized FreeNAS on a hypervisor, reboot it months later, and the BIOS decides that some of the added HDD's belong in the boot candidate list --- ahead of the RAID controller you had told it to use.

So you seem generally level-headed and knowledgeable. Got any opinion as to how good the Cisco stuff is, compared to Dell/Supermicro/HP?

There's a big difference between "we saw it come through the shop a few times" and "daily experience with the annoying bits". I know some people like the Cisco stuff quite a bit, but, I mean, I've seen their OTHER stuff. I have a Cisco 4700M as the stand for my office trashcan, and while I like the 7960 and 7971 IP phones, they are a design trainwreck.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
@jgreco - The particular model of Cisco x86/x64 server we are buying is working out fine. Some of the older ones are starting to lose disks, but considering they are 4 or more years old and do data warehousing, that's understandable.

Cisco has, sort of, 2 different server product lines: the blade & chassis design, and the standalone servers. Mine are the standalone 2U servers, I think model UCS C240 (Intel CPUs). However, both can use special Cisco UCS switches for both switching & console access. We have such a setup, so I log into the virtual IP of the UCS manager, not really caring which side, A or B, is primary.

UCS has additional features my company's implementation does not use. For example, in the event of a system board hardware failure, UCS can re-provision the replacement with the same Ethernet MAC addresses (and, I would guess, any Fibre Channel WWNs), so the swap is supposed to be painless. As I said, we don't use that aspect (though other Cisco servers that my team does not manage may use something like it).

I did have a network port report a failure (the network was redundant, so no outage). I had to get the network team to work that issue, but it was easy to see in the UCS manager GUI.

Compared to HP ProLiant standalone servers, I'd give a slight preference to this particular Cisco server, if the UCS manager is used. If not, then probably the HP ProLiant series, assuming iLO is fully working. I do like the HP c7000 blade & chassis setup, but I can't compare it to Cisco's blade & chassis offering as I have no experience with that.

I have not worked with Dell or Supermicro at home or at work, so I can't say.
 