Thoughts and Questions on GlusterFS

jasonsansone
Explorer | Joined: Jul 18, 2019 | Messages: 79
I see that Gluster is now listed as a service in the nightly builds of TrueNAS SCALE. I have enabled glusterd as a service, and Gluster appears fully configurable from the CLI. Is there anything stopping me from playing with it other than the lack of GUI integration?
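For anyone curious, this is roughly what I have been doing from the shell to stand up a test volume; the hostnames and brick paths below are just placeholders for my lab setup, not anything SCALE-specific:

    # peer the other nodes (hypothetical hostnames)
    gluster peer probe scale-node2
    gluster peer probe scale-node3

    # create and start a 3-way replicated volume with bricks on ZFS datasets
    gluster volume create testvol replica 3 \
        scale-node1:/mnt/tank/gluster/brick1 \
        scale-node2:/mnt/tank/gluster/brick1 \
        scale-node3:/mnt/tank/gluster/brick1
    gluster volume start testvol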

Also, how will healing be handled with Gluster? Are zvols treated as a single "file", or will only the underlying changes be copied during a heal? My understanding is that Gluster runs a complete rsync of affected files any time a brick is out of sync (usually from downtime). That is fine for plain files on ZFS: only some of the files within a dataset may have changed, so only those files need to be copied to the node that fell out of sync with the others. I am not overly concerned about Gluster for a typical HA file store providing NFS or SMB services. However, my experience with Gluster makes me question its viability for clustered storage in an HCI appliance, which is what I understand SCALE is aimed to be.
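(For what it's worth, the behaviour I'm describing is what the heal status shows after a node returns; the volume name below is just an example.)

    # list the entries each brick still needs to heal after a node comes back
    gluster volume heal testvol info

    # per-brick count of entries pending heal
    gluster volume heal testvol statistics heal-count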

I tested GlusterFS as a file store for Proxmox in a lab environment. It was a COMPLETE disaster. The problem is that Gluster treats a raw disk or qcow2 image as a single object. If a running VM changes even a single block of a 100GB qcow2 virtual disk, the entire 100GB qcow2 has to be copied to the out-of-sync brick. Now scale that up to ten VMs, because I did: I ran a three-node replicated cluster on Proxmox hosting ten virtual machines, each with a 100GB qcow2 virtual disk. I intentionally took one node out of service "for maintenance" and then brought it back online. Gluster wanted to copy 10 x 100GB "files" in order to heal, which crippled IOPS and froze up the VM backend.

Contrast that with Ceph. Ceph, being block-based storage, heals only the actual underlying changes for each VM when OSDs go out of sync. I ran the exact same test described above using Ceph, and the heal was only a few GB, which replicated over 10GbE in about a minute. Ceph didn't need to copy each VM's entire virtual disk, only the actual changes.

Is there a plan, or has there been any testing yet, regarding how Gluster will handle healing for Linux containers or virtual machine disks? My experience tells me it will attempt to copy the entire raw disk or zvol, which is extremely inefficient.
 

sretalla
Powered by Neutrality | Moderator | Joined: Jan 1, 2016 | Messages: 9,703
The implementation of Gluster isn't intended for replicating virtual hard disk files, but rather for the configuration files and databases of Docker applications and other data stored directly under ZFS.

You need to structure your Gluster volumes to avoid ZVOLs and RAW disks.

At least that's how I was understanding the intention.
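For example, something like this, with the dataset names purely illustrative and assuming bricks live on plain ZFS datasets:

    # brick on an ordinary ZFS dataset: Gluster sees individual files it can heal one by one
    zfs create tank/gluster/appdata-brick

    # not this: a zvol is one big block device, which Gluster would treat as a single object
    # zfs create -V 100G tank/vm-disk0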
 

jasonsansone
Explorer | Joined: Jul 18, 2019 | Messages: 79
That all makes sense, but I don't understand how SCALE can provide HA for containers and VMs if those underlying file stores aren't clustered, i.e. are a single point of failure. Or is the answer simply, "it won't"?
 

sretalla
Powered by Neutrality | Moderator | Joined: Jan 1, 2016 | Messages: 9,703
Regarding "HA to containers": the containers run directly on the host, so there is no need for a RAW disk or ZVOL, just a passthrough of the path to the Docker or LXC process.
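As a rough sketch (image name and paths are placeholders only): the container stays stateless, and its persistent data lives on a Gluster-backed path that is simply bind-mounted in.

    # the app's config lives on the gluster-backed path; no virtual disk involved
    docker run -d --name myapp \
        -v /mnt/glustervol/appdata/myapp:/config \
        some-app:latest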

HA for VMs will be a different challenge and indeed I don't think that challenge has been overcome yet.
 

morganL
Captain Morgan | Administrator | Moderator | iXsystems | Joined: Mar 10, 2018 | Messages: 2,694
Gluster on ZFS is used to scale out and provide redundancy across nodes. The characteristics of ZFS are different from LVM. For the iSCSI backing store, we will be using files/datasets rather than ZVOLs. There will be more performance data once we have everything working as expected.
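As a very rough illustration of the file-backed approach (dataset name and size are placeholders, not the actual SCALE implementation):

    # a sparse file on a ZFS dataset serves as the iSCSI extent instead of a zvol
    zfs create tank/iscsi
    truncate -s 100G /mnt/tank/iscsi/extent0.img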
 

morganL
Captain Morgan | Administrator | Moderator | iXsystems | Joined: Mar 10, 2018 | Messages: 2,694
cluster.data-self-heal-algorithm: specifies the type of data self-heal. If the option is set to "full", the entire file is copied from the source to the destinations. If the option is set to "diff", only the file blocks that are not in sync are copied to the destinations.
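To check or change it on a volume (the volume name is just an example):

    # inspect the current self-heal algorithm
    gluster volume get testvol cluster.data-self-heal-algorithm

    # switch to block-level (diff) healing
    gluster volume set testvol cluster.data-self-heal-algorithm diff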
 