TrueNAS High-Availability (HA) Explained

April 28, 2015

TrueNAS

I am often asked if the two storage controllers in a TrueNAS high availability configuration are active/active or active/passive. They’re neither. They’re “active/standby.” Allow me to explain the difference.

In both active/passive and active/standby disk arrays, a newly created LUN is presented to a host server through a primary storage controller, and all data I/O for that LUN is transmitted and received via that controller. The secondary storage controller waits for the authority to take ownership of the LUN if a controller or storage network path failover is invoked. Once failover is invoked, the secondary storage controller assumes control of the LUN and services all reads and writes until control is passed back to the original storage controller.
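
To make the ownership hand-off concrete, here is a minimal Python sketch of the behavior described above. The `Controller` and `Lun` classes and their method names are hypothetical and exist only for illustration; they are not part of TrueNAS or any real array's software.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical storage controller; only tracks its name and health."""
    name: str
    healthy: bool = True


@dataclass
class Lun:
    """Hypothetical LUN that is owned by exactly one controller at a time."""
    lun_id: int
    primary: Controller
    secondary: Controller
    owner: Controller = field(init=False)

    def __post_init__(self) -> None:
        # A new LUN is presented through its primary controller.
        self.owner = self.primary

    def route_io(self) -> str:
        """Return the name of the controller that services I/O for this LUN."""
        if not self.owner.healthy:
            self.failover()
        return self.owner.name

    def failover(self) -> None:
        """Hand ownership to the other controller, if it is healthy."""
        target = self.secondary if self.owner is self.primary else self.primary
        if not target.healthy:
            raise RuntimeError("no healthy controller available")
        self.owner = target

    def failback(self) -> None:
        """Pass control back to the original (primary) controller."""
        if self.primary.healthy:
            self.owner = self.primary


a, b = Controller("A"), Controller("B")
lun = Lun(0, primary=a, secondary=b)
print(lun.route_io())   # I/O goes through the primary
a.healthy = False
print(lun.route_io())   # failover: the secondary now owns the LUN
a.healthy = True
lun.failback()
print(lun.route_io())   # control passed back to the original controller
```

Note that in this sketch the secondary keeps ownership after the primary recovers until `failback()` is called, mirroring the statement that the secondary services I/O "until control is passed back."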

Active/Passive
In an active/passive array, the 2nd storage controller may not be connected directly to the disk drives. Instead, it is connected to an intermediate layer, sometimes called a storage matrix, that links each component to every other component it needs to talk to. Each storage controller is connected to the matrix, and the matrix is connected to every disk.

[Figure: simplified picture of an active/passive array, with each storage controller connected to the storage matrix and the matrix connected to every disk.]

Should the 1st storage controller fail, the 2nd storage controller has to register with the storage matrix before it can perform any I/O. Additionally, the 2nd storage controller might not be powered on and waiting, so it may have to boot from a cold state. Finally, cache in the 1st controller can’t be mirrored to a cold 2nd controller. As a result, writes sitting in the 1st controller’s cache could be lost, and the 2nd controller’s cache starts out unpopulated and has to be rebuilt from future I/Os. The end result is that failover to the 2nd controller can take multiple minutes, which degrades performance and could cause some clients to time out.

Active/Standby
In an active/standby array, every disk drive is dual-ported, allowing the 2nd storage controller to be connected directly to each disk at all times. The 2nd controller waits for the authority to handle I/O operations. Finally, any cache in the 1st storage controller can be synchronized to the 2nd controller, ensuring it does not have to be re-populated after a failover event. The end result is that a failover operation can happen in seconds rather than minutes, significantly reducing the chance of a client timeout.
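
The difference a synchronized cache makes at failover can be sketched in a few lines of Python. This is an illustrative model only: the `CacheController` class, the mirror-on-every-write policy, and the block numbers are assumptions made for the example, not a description of how TrueNAS synchronizes its cache internally.

```python
class CacheController:
    """Hypothetical controller with an in-memory write cache."""

    def __init__(self, name, peer=None):
        self.name = name
        self.peer = peer          # standby controller to mirror to, if any
        self.write_cache = {}     # block number -> data not yet on disk

    def write(self, block, data):
        """Cache a write and, if a peer is attached, mirror it as well."""
        self.write_cache[block] = data
        if self.peer is not None:
            # Assumed policy: copy every cached write to the standby.
            self.peer.write_cache[block] = data
        return "ack"


def surviving_cache_after_failover(mirrored):
    """Return what the 2nd controller holds when the 1st one fails."""
    second = CacheController("second")
    first = CacheController("first", peer=second if mirrored else None)
    first.write(10, b"alpha")
    first.write(11, b"beta")
    # The 1st controller fails here; only the 2nd controller's cache survives.
    return dict(second.write_cache)


print(surviving_cache_after_failover(mirrored=True))   # both pending writes survive
print(surviving_cache_after_failover(mirrored=False))  # empty: pending writes are gone
```

With mirroring, the surviving controller already holds the pending writes and a warm cache; without it, the survivor starts empty, which is the active/passive situation described earlier.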

Active/Active
Active/active arrays service I/O rather differently. They use two or more storage controllers to service read/write requests to the same LUN. The use of multiple active controllers gives a number of benefits, the primary being the ability to load balance I/O operations with host-based software.
A failure of a storage controller in an active/active array requires the remaining storage controller to handle all the I/O. This will reduce the available bandwidth of the storage array, reduce throughput, and increase latency. In a worst-case scenario a total outage may occur, since the remaining controller may not be able to handle all the I/O, causing some traffic to be permanently delayed or lost. Additionally, services such as compression, deduplication, and replication may be delayed or disabled. This outage risk is typically mitigated by balancing the load across both controllers while keeping the combined load below what a single controller can handle. So people who think they’re getting the performance of two controllers typically aren’t.
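
The headroom argument is simple arithmetic, sketched below in Python. The function names and the 100,000 IOPS per-controller figure are hypothetical values chosen for illustration; the only assumption carried over from the text is that the array must survive the loss of one controller without exceeding what the remaining controllers can handle.

```python
def safe_load_per_controller(controllers, capacity_iops):
    """Highest load each controller can carry while still surviving one failure.

    If one of N controllers fails, the remaining N - 1 must absorb the whole
    workload, so the total load must stay at or below (N - 1) * capacity.
    Spread evenly over N controllers, that is capacity * (N - 1) / N each.
    """
    if controllers < 2:
        raise ValueError("need at least two controllers for failover")
    return capacity_iops * (controllers - 1) / controllers


def usable_total(controllers, capacity_iops):
    """Total load the array can accept without risking an outage on failover."""
    return safe_load_per_controller(controllers, capacity_iops) * controllers


# Hypothetical figure: each controller can service 100,000 IOPS.
per_controller = safe_load_per_controller(2, 100_000)
total = usable_total(2, 100_000)
print(per_controller)  # each controller may only run at half its capacity
print(total)           # the pair delivers the safe throughput of one controller
```

For a two-controller array, each controller can safely run at only half its capacity, so the usable total equals the capacity of a single controller.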

Conclusion
We looked at active/active, active/passive, and active/standby options when we developed TrueNAS and concluded that an active/standby controller design would do the best job of safeguarding access to critical data. It provides the best balance of operational simplicity, performance, and failover times to help avoid the loss of revenue that an outage can cause. Our design ensures that your TrueNAS appliances will work exactly the way you need them to. To learn more about TrueNAS, call 1-855-GREP-4IX or visit www.iXsystems.com/TrueNAS.

Gary Archer
Director of Product Marketing
