FreeNAS High-Availability (HA)

Fred974

Contributor
Joined
Jul 2, 2016
Messages
190
Hi all,

I am aware that High-Availability (HA) is only available in TrueNAS but I was wondering:
1: Can TrueNAS be bought/downloaded without buying the hardware?
2: Using FreeNAS, is it possible/realistic to create a FreeNAS HA setup using CARP (roughly along the lines of the sketch further down)?
How would you sync the data between the 2 FreeNAS servers?

I would like to set up a cloud hosting experiment with XCP-ng, using FreeNAS as VM storage via iSCSI.
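To give an idea of what I mean by point 2, here is a very rough, non-authoritative sketch of the kind of thing I have in mind. The interface, pool and host names are just placeholders, and I don't know whether FreeNAS exposes any of this in its GUI:

  # CARP virtual IP shared by the two boxes (FreeBSD-style ifconfig; em0 and the VIP are examples)
  ifconfig em0 vhid 1 advskew 0 pass s3cret alias 192.168.1.50/32     # primary
  ifconfig em0 vhid 1 advskew 100 pass s3cret alias 192.168.1.50/32   # standby, higher advskew

  # periodic one-way data sync from primary to standby using ZFS snapshots + send/receive
  zfs snapshot -r tank@repl1
  zfs send -R tank@repl1 | ssh standby zfs receive -Fdu tank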
 

ianmbetts

Cadet
Joined
Sep 10, 2022
Messages
3
I can see this is an old thread, but I thought I might explain how I solved this problem.

I am running FreeNAS as a VM on a Proxmox HA cluster. All the FreeNAS VM disks are virtual and backed by Ceph storage.
All the complexity of HA is handled by Proxmox. I have been running this for about 5 years now in an industrial installation where we have had a couple of power outages every year, and so far it has been totally bulletproof.
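In case it helps, this is roughly the shape of it from the Proxmox side. The VM ID, storage name and disk sizes below are just examples, not the exact commands I ran:

  # FreeNAS VM whose virtual disks live on the Ceph (RBD) storage, here called "ceph-rbd"
  qm create 100 --name freenas --memory 16384 --cores 4 \
      --net0 virtio,bridge=vmbr0 \
      --scsihw virtio-scsi-pci --scsi0 ceph-rbd:32 --scsi1 ceph-rbd:2000

  # hand the VM over to the Proxmox HA manager so it gets restarted on a surviving node
  ha-manager add vm:100 --state started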
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,906
So this is basically an active-passive cluster with Proxmox acting as the cluster manager (what on bare metal would be handled by IBM HACMP/HP MC ServiceGuard/Veritas Cluster in the past)?

As to Ceph I have only read a few things about it, but seem to remember that it is 1) far from trivial to set up and operate, and 2) has a relatively high latency. Is that understanding correct?

Architecturally, and this is not primarily about ZFS, I think it is debatable whether an entire OS and file system should be run on top of another distributed file system, purely to provide the means for different ways to access it via network (SMB, NFS, iSCSI). But this may be a wrong impression due to my lack of knowledge about Ceph.

In addition there is the question, similar to the one with RAID controllers, of how much risk the multiple levels of abstraction between ZFS and the HDDs pose.

I freely admit that I have no practical experience with such a setup. But it comes with a number of questions, and I would caution against adopting it without a lot of knowledge. Plus there is certainly the use case to consider as well.

Nevertheless, thanks @ianmbetts for bringing this to the discussion!
 

ianmbetts

Cadet
Joined
Sep 10, 2022
Messages
3
The Proxmox setup is a so-called hyperconverged configuration with three nodes. The nodes provide both compute (VMs) and storage capacity. Ceph provides redundant distributed block devices. All nodes are active, but there is sufficient spare capacity that, should any node fail, the VMs running on the failed node will simply restart on one of the surviving two nodes.
I understand your doubts about latency, and I do not have any benchmark data to share.
What I should explain is that there is significant parallelization in the backing storage: each node contributes 10 spinning disks, and there is a separate 10G network dedicated to storage interconnecting the three nodes. This parallelization hides some of the overhead, and VMs enjoy quite respectable storage throughput (~100 MB/s).
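If it helps to picture the storage layer, this is roughly what setting it up looks like per node. The device names, subnet and pool name are just examples, and the exact pveceph spelling differs a little between Proxmox releases:

  # write the dedicated 10G storage subnet into the cluster-wide ceph.conf
  pveceph init --network 10.10.10.0/24
  # turn each of the node's 10 spinning disks into a Ceph OSD (older releases: pveceph createosd)
  pveceph osd create /dev/sdb        # ...repeat for the remaining disks
  # one replicated RBD pool backs all the VM virtual disks, including the FreeNAS ones
  pveceph pool create vm-storage
  # quick sanity check that all OSDs are in and data is spread across the three nodes
  ceph -s && ceph osd tree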
This would all be overkill simply to build an HA FreeNAS, but the cluster was already being used to run several other business applications as VMs. Originally we were using an ownCloud VM for storage, but we moved to a Windows share on FreeNAS and decided to build it as a VM on the cluster to benefit from the HA.
Given the redundant distributed block devices (Ceph RBD) behind the virtual disks offered to the FreeNAS VM, I did ponder creating a ZFS pool with no native redundancy, i.e. RAID 0, but in the end I bottled out, because it is highly frowned upon and because, whilst I have full confidence in RBD, I don't know enough about ZFS or FreeNAS to be confident that I was not missing some nuance.
I should also add that the VM disk images are backed up nightly, so it is simple to recover even in a disaster scenario, for example if RBD failed.
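For clarity, the choice I am describing is between these two kinds of pool layout inside the FreeNAS VM (device names are placeholders for the virtual disks the VM sees):

  # what I decided against: no ZFS-level redundancy, a pure stripe across the virtual disks
  zpool create tank da1 da2 da3 da4
  # the conservative alternative: keep ZFS redundancy on top of RBD, e.g. mirrored pairs
  zpool create tank mirror da1 da2 mirror da3 da4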
 