Understanding LOG and cases of possible data loss (without a mirrored LOG).

norbs

Explorer
Joined
Mar 26, 2013
Messages
91
First off, I use TrueNAS to present an iSCSI datastore to ESXi.
Because of this, I was told it's best to set my pool to sync=always. Obviously this can cause some performance issues.


So in the past I've used a LOG device (1x Intel 3700? SSD) and a UPS; the drive was deliberately underprovisioned to show up as only 10 GB of its 100 GB capacity.

This seemed to work fine, but I never really had a disaster event to know whether I'd actually lose data.


Recently I migrated all my data to a larger pool, and now I want to add the LOG device to that pool. I was greeted with a warning that I was only adding one device and that it could result in data loss.


So my question here is... what would even need to happen for me to lose data in this config? Would I need to lose my LOG device and have the power cut out on the TrueNAS box at the same time? Or would just having my LOG device die cause data loss on its own?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Typically, that error message only appears if you added an ordinary vdev to your pool, instead of a log vdev, as this creates a stripe.
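For reference, a minimal sketch of the CLI equivalent (pool and device names here are hypothetical). The "log" keyword is the whole difference between attaching a SLOG and striping in a new data vdev:

```python
# Sketch of the zpool CLI calls behind the GUI, driven from Python.
# POOL and DEVICE are placeholders; substitute your own.
import subprocess

POOL = "tank"    # hypothetical pool name
DEVICE = "nvd0"  # hypothetical device node for the SSD

# Correct: the "log" keyword attaches the device as a SLOG vdev.
subprocess.run(["zpool", "add", POOL, "log", DEVICE], check=True)

# Dangerous: without "log", the device is striped in as a data vdev,
# and losing it later would take the whole pool with it.
# subprocess.run(["zpool", "add", POOL, DEVICE], check=True)
```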

 

norbs

Explorer
Joined
Mar 26, 2013
Messages
91
I'm adding it under Log, and it still requires the force option.

[Screenshot: logadd.png, adding the device under Log]
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Would I need to lose my LOG device and have the power cut out on the TrueNAS box at the same time?
Power loss, kernel panic, HBA suddenly dying, failure of an external SAS cable to a JBOD shelf; basically anything that causes your pool to suddenly become unavailable. This has to happen in concert with the failed LOG vdev.

Of course, a failed LOG vdev alone will also have the same effect as enabling sync=always without one: a significant loss in performance for sync write workloads. For ESXi hosts hitting it over iSCSI, this could result in VMs becoming extremely slow and/or the OS and applications running on them being perceived as "not responding." Whether that matters to you is up to your workload.
 

norbs

Explorer
Joined
Mar 26, 2013
Messages
91
Power loss, kernel panic, HBA suddenly dying, failure of an external SAS cable to a JBOD shelf; basically anything that causes your pool to suddenly become unavailable. This has to happen in concert with the failed LOG vdev.

Of course, a failed LOG vdev alone will also have the same effect as enabling sync=always without one: a significant loss in performance for sync write workloads. For ESXi hosts hitting it over iSCSI, this could result in VMs becoming extremely slow and/or the OS and applications running on them being perceived as "not responding." Whether that matters to you is up to your workload.

Thanks that clears up a lot.

So I'm assuming that TrueNAS holds all writes that haven't yet been written to the "Data VDevs" in both RAM and the LOG at the same time, until they get flushed to the "Data VDevs"? I was mainly worried that the data would live only in the LOG and not in RAM once it's written to the LOG.


Yeah, the performance aspect wouldn't matter in my use case, but I appreciate the heads-up.


Any recommendations on a more modern LOG device? I do have room for an NVMe device...
 

norbs

Explorer
Joined
Mar 26, 2013
Messages
91
Power loss, kernel panic, HBA suddenly dying, failure of an external SAS cable to a JBOD shelf; basically anything that causes your pool to suddenly become unavailable. This has to happen in concert with the failed LOG vdev.

Of course, a failed LOG vdev alone will also have the same effect as enabling sync=always without one: a significant loss in performance for sync write workloads. For ESXi hosts hitting it over iSCSI, this could result in VMs becoming extremely slow and/or the OS and applications running on them being perceived as "not responding." Whether that matters to you is up to your workload.
And I just realized, you helped me get my LOG device set up about 5 years ago as well.
Thanks for all the help, and nice to see you're still around. :smile:
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
Any recommendations on a more modern LOG device? I do have room for an NVMe device...
An Optane drive, if you can find one for a decent price. Optane DC M.2 drives are 22110, so one may not fit if you only have a 2280 slot. But for a home/lab NAS, a consumer M10 Optane (preferably 64 GB for throughput and endurance) would be fine.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
So I'm assuming that TrueNAS holds all writes that haven't yet been written to the "Data VDevs" in both RAM and the LOG at the same time, until they get flushed to the "Data VDevs"? I was mainly worried that the data would live only in the LOG and not in RAM once it's written to the LOG.
Correct. "Pending writes" that are queued up in a transaction group exist both in RAM and in the ZIL (on SLOG or in-pool) and the "flush" to data vdevs comes from RAM. The ZIL/SLOG is never read from except after a crash.

Any recommendations on a more modern LOG device? I do have room for an NVMe device...
SATA devices are generally too slow; enterprise SAS or NVMe are the de facto standard now. NVMe can't hotswap properly (yet), but for most people that isn't a concern. Optane devices are excellent but pricey, as are the older DC P3700s. Consumer M10 ones are viable but have lower endurance than the enterprise parts. The Radian RMS-200 (battery-backed RAM) used to be a good choice but has doubled in price recently due to increased interest.

It might be worth checking the total amount of data written to your current SLOG device to get an idea of how active your pool is and how much SLOG you actually need.
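Something like the sketch below can pull that number from SMART data. The device node is a placeholder, and attribute 241 (Total_LBAs_Written) plus its units are vendor-specific, so treat this as a starting point and check your drive's actual smartctl output:

```python
# Rough sketch: read total LBAs written from SMART attribute 241.
# Assumes an ATA device that reports Total_LBAs_Written (many Intel
# DC-series SSDs do) and 512-byte LBAs; both vary by vendor.
import subprocess

DEVICE = "/dev/ada1"  # placeholder: your SLOG's device node

out = subprocess.run(
    ["smartctl", "-A", DEVICE], capture_output=True, text=True, check=True
).stdout

for line in out.splitlines():
    fields = line.split()
    if fields and fields[0] == "241":  # Total_LBAs_Written
        total_bytes = int(fields[-1]) * 512
        print(f"~{total_bytes / 1e12:.2f} TB written to {DEVICE}")
```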

And I just realized, you helped me get my LOG device set up about 5 years ago as well.
Thanks for all the help, and nice to see you're still around. :smile:
Glad to still be helpful after all this time!
 

norbs

Explorer
Joined
Mar 26, 2013
Messages
91
An Optane drive, if you can find one for a decent price. Optane DC M.2 drives are 22110, so one may not fit if you only have a 2280 slot. But for a home/lab NAS, a consumer M10 Optane (preferably 64 GB for throughput and endurance) would be fine.


Correct. "Pending writes" that are queued up in a transaction group exist both in RAM and in the ZIL (on SLOG or in-pool) and the "flush" to data vdevs comes from RAM. The ZIL/SLOG is never read from except after a crash.


SATA devices are generally too slow; enterprise SAS or NVMe are the de facto standard now. NVMe can't hotswap properly (yet), but for most people that isn't a concern. Optane devices are excellent but pricey, as are the older DC P3700s. Consumer M10 ones are viable but have lower endurance than the enterprise parts. The Radian RMS-200 (battery-backed RAM) used to be a good choice but has doubled in price recently due to increased interest.

It might be worth checking the total amount of data written to your current SLOG device to get an idea of how active your pool is and how much SLOG you actually need.


Glad to still be helpful after all this time!

Thinking about pulling the trigger on an Optane DC P4801X.

Keeping in mind that this is for a home system and my main goal is to not lose data (but obviously performance matters).

I have 5x 16 TB SATA drives in RAIDZ1 and 32 GB of ECC RAM.


Would this be super overkill? Just want to make sure this isn't completely insane...
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
First, with 16 TB drives I would strongly recommend against RAIDZ1. If you need to rebuild after a drive failure, the chances of another failure during the resilver are quite high with drives that large. Although I cannot provide proper figures, of course, this is more than just a gut feeling. The "death of RAID5" has been well known and documented for quite some time with current hard disk sizes.
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
RAIDZ2 for data safety indeed. But performance with iSCSI would demand mirrors rather than any form of RAIDZ.

As for figures, the math is simple. For n bits to resilver, the probability of success is p = exp(-n*u), where u is the URE rate per bit; conversely, the probability of hitting an Unrecoverable Read Error (and failing, if there is no more redundancy in the pool) is q = 1 - exp(-n*u).
Let's take u = 1E-15 ("less than 1 in 1E15 bits" is the usual spec sheet figure). With 50% full drives, resilvering the 5-wide RAIDZ1 above after a drive failure means reading 8 TB from each of the 4 surviving drives, i.e. 4*8*8E12 bits: p = 77%, q = 23%. Ahem… In Russian roulette, the player survives with probability 5 in 6; here it is already down to almost 3 in 4.
"The death of RAIDZ" basically means that, with multi-terabyte arrays and the current URE rates, we lose one level of redundancy to read errors: RAIDZ2 can safely survive the loss of one drive; RAIDZ1 cannot safely lose any drive.
But this applies to mirrors too, only in a less severe form. A 2-way mirror of 16 TB drives, also 50% full and rated for 1 URE in 1E15 bits, has a 6.2% probability of failing to resilver after losing a drive. A decade after the "death of RAIDZ", we are approaching the "death of two-way mirrors". :eek:
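For anyone who wants to check the arithmetic, the figures above fall straight out of the formula:

```python
# q = 1 - exp(-n*u): probability of at least one URE in n bits read.
from math import exp

u = 1e-15            # URE rate: one error per 1e15 bits read
bits_per_tb = 8e12   # 1 TB = 1e12 bytes = 8e12 bits

# 5-wide RAIDZ1 of 16 TB drives, 50% full: read 8 TB from each of
# the 4 surviving drives during the resilver.
n = 4 * 8 * bits_per_tb
print(f"RAIDZ1: p = {exp(-n * u):.0%}, q = {1 - exp(-n * u):.0%}")  # 77% / 23%

# 2-way mirror of 16 TB drives, 50% full: read 8 TB from the one
# surviving drive.
n = 8 * bits_per_tb
print(f"mirror: q = {1 - exp(-n * u):.1%}")  # 6.2%
```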

At this point, the choice between poor iSCSI performance on secure(*) RAIDZ2, good performance on not-so-secure 2-way mirrors, and good performance on secure but expensive 3-way mirrors becomes difficult. But the risk of losing data to a failing non-redundant SLOG, or to a URE in the non-redundant SLOG while recovering from an unexpected shutdown, is negligible compared to the risk of losing a data vdev.
(*) "secure" against a single drive loss
 