Understanding LOG and cases of possible data loss (without a mirrored LOG).

norbs

Explorer
Joined
Mar 26, 2013
Messages
91
First off, I use TrueNAS to present an iSCSI datastore to ESXi.
Because of this, I was told it's best to set my pool to sync=always. Obviously this can cause some performance issues.


So in the past I've used a LOG device (1x Intel 3700? SSD) and a UPS; the drive was deliberately underprovisioned to show up as only 10 GB of its 100 GB capacity.

This seemed to work fine, but I never really had a disaster event to know whether I'd actually lose data.


Recently I migrated all my data to a larger pool, and now I want to add the LOG device to that pool. I was greeted with a warning that I was only adding one device and that it could result in data loss.


So my question here is... what would even need to happen for me to lose data in this config? Would I need to lose my LOG device and have the power cut out on the TrueNAS box at the same time? Or would just having my LOG device die cause data loss on its own?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Typically, that error message only appears if you added an ordinary vdev to your pool, instead of a log vdev, as this creates a stripe.
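For reference, a minimal sketch of the CLI equivalent (pool and device names here are hypothetical). The "log" keyword is the whole difference between attaching a SLOG and striping in a new data vdev:

```python
# Sketch of the zpool CLI calls behind the GUI, driven from Python.
# POOL and DEVICE are placeholders; substitute your own.
import subprocess

POOL = "tank"    # hypothetical pool name
DEVICE = "nvd0"  # hypothetical device node for the SSD

# Correct: the "log" keyword attaches the device as a SLOG vdev.
subprocess.run(["zpool", "add", POOL, "log", DEVICE], check=True)

# Dangerous: without "log", the device is striped in as a data vdev,
# and losing it later would take the whole pool with it.
# subprocess.run(["zpool", "add", POOL, DEVICE], check=True)
```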

 

norbs

Explorer
Joined
Mar 26, 2013
Messages
91
I'm adding it under Log, and it still requires the force option.

[Screenshot: logadd.png, adding the device under Log]
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Would I need to lose my LOG device and have the power cut out on the TrueNAS box at the same time?
Power loss, kernel panic, HBA suddenly dying, failure of an external SAS cable to a JBOD shelf; basically anything that causes your pool to suddenly become unavailable. This has to happen in concert with the failed LOG vdev.

Of course, a failed LOG vdev alone will also have the same effect as enabling sync=always without one: a significant loss in performance for sync write workloads. For ESXi hosts hitting it over iSCSI, this could result in VMs becoming extremely slow and/or the OS and applications running on them being perceived as "not responding." Whether that matters to you is up to your workload.
 

norbs

Explorer
Joined
Mar 26, 2013
Messages
91
Power loss, kernel panic, HBA suddenly dying, failure of an external SAS cable to a JBOD shelf; basically anything that causes your pool to suddenly become unavailable. This has to happen in concert with the failed LOG vdev.

Of course, a failed LOG vdev alone will also have the same effect as enabling sync=always without one: a significant loss in performance for sync write workloads. For ESXi hosts hitting it over iSCSI, this could result in VMs becoming extremely slow and/or the OS and applications running on them being perceived as "not responding." Whether that matters to you is up to your workload.

Thanks that clears up a lot.

So I'm assuming that TrueNAS holds all writes that haven't yet been written to the "Data VDevs" in both RAM and the LOG at the same time, until they get flushed to the "Data VDevs"? I was mainly worried that the data would live only in the LOG and not in RAM once it's written to the LOG.


Yeah, the performance aspect wouldn't matter in my use case, but I appreciate the heads-up.


Any recommendations on a more modern LOG device? I do have room for an NVMe device...
 

norbs

Explorer
Joined
Mar 26, 2013
Messages
91
Power loss, kernel panic, HBA suddenly dying, failure of an external SAS cable to a JBOD shelf; basically anything that causes your pool to suddenly become unavailable. This has to happen in concert with the failed LOG vdev.

Of course, a failed LOG vdev alone will also have the same effect as enabling sync=always without one: a significant loss in performance for sync write workloads. For ESXi hosts hitting it over iSCSI, this could result in VMs becoming extremely slow and/or the OS and applications running on them being perceived as "not responding." Whether that matters to you is up to your workload.
And I just realized, you helped me get my LOG device set up about 5 years ago as well.
Thanks for all the help, and nice to see you're still around. :smile:
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
Any recommendations on a more modern LOG device? I do have room for an NVMe device...
An Optane drive, if you can find one for a decent price. Optane DC M.2 drives are 22110, so one may not fit if you only have a 2280 slot. But for a home/lab NAS, a consumer M10 Optane (preferably 64 GB for throughput and endurance) would be fine.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
So I'm assuming that TrueNAS holds all writes that haven't yet been written to the "Data VDevs" in both RAM and the LOG at the same time, until they get flushed to the "Data VDevs"? I was mainly worried that the data would live only in the LOG and not in RAM once it's written to the LOG.
Correct. "Pending writes" that are queued up in a transaction group exist both in RAM and in the ZIL (on SLOG or in-pool) and the "flush" to data vdevs comes from RAM. The ZIL/SLOG is never read from except after a crash.

Any recommendations on a more modern LOG device? I do have room for an NVMe device...
SATA devices are generally too slow; enterprise SAS or NVMe are the de facto standard now. NVMe can't hotswap properly (yet), but for most people that isn't a concern. Optane devices are excellent but pricey, as are the older DC P3700s. Consumer M10 ones are viable but have lower endurance than the enterprise parts. The Radian RMS-200 (battery-backed RAM) used to be a good choice but has doubled in price recently due to increased interest.

It might be worth checking the total amount of data written to your current SLOG device to get an idea of how active your pool is and how much SLOG you actually need.
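Something like the sketch below can pull that number from SMART data. The device node is a placeholder, and attribute 241 (Total_LBAs_Written) plus its units are vendor-specific, so treat this as a starting point and check your drive's actual smartctl output:

```python
# Rough sketch: read total LBAs written from SMART attribute 241.
# Assumes an ATA device that reports Total_LBAs_Written (many Intel
# DC-series SSDs do) and 512-byte LBAs; both vary by vendor.
import subprocess

DEVICE = "/dev/ada1"  # placeholder: your SLOG's device node

out = subprocess.run(
    ["smartctl", "-A", DEVICE], capture_output=True, text=True, check=True
).stdout

for line in out.splitlines():
    fields = line.split()
    if fields and fields[0] == "241":  # Total_LBAs_Written
        total_bytes = int(fields[-1]) * 512
        print(f"~{total_bytes / 1e12:.2f} TB written to {DEVICE}")
```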

And I just realized, you helped me get my LOG device set up about 5 years ago as well.
Thanks for all the help, and nice to see you're still around. :smile:
Glad to still be helpful after all this time!
 

norbs

Explorer
Joined
Mar 26, 2013
Messages
91
An Optane drive, if you can find one for a decent price. Optane DC M.2 drives are 22110, so one may not fit if you only have a 2280 slot. But for a home/lab NAS, a consumer M10 Optane (preferably 64 GB for throughput and endurance) would be fine.


Correct. "Pending writes" that are queued up in a transaction group exist both in RAM and in the ZIL (on SLOG or in-pool) and the "flush" to data vdevs comes from RAM. The ZIL/SLOG is never read from except after a crash.


SATA devices are generally too slow; enterprise SAS or NVMe are the de facto standard now. NVMe can't hotswap properly (yet), but for most people that isn't a concern. Optane devices are excellent but pricey, as are the older DC P3700s. Consumer M10 ones are viable but have lower endurance than the enterprise parts. The Radian RMS-200 (battery-backed RAM) used to be a good choice but has doubled in price recently due to increased interest.

It might be worth checking the total amount of data written to your current SLOG device to get an idea of how active your pool is and how much SLOG you actually need.


Glad to still be helpful after all this time!

Thinking about pulling the trigger on an Optane DC P4801X.

Keeping in mind that this is for a home system and my main goal is to not lose data (but obviously performance matters).

I have 5x 16 TB SATA drives in RAIDZ1 and 32 GB of ECC RAM.


Would this be super overkill? Just want to make sure this isn't completely insane...
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
First, with 16 TB drives I would strongly recommend against RAIDZ1. If you need to rebuild after a drive failure, the chances of another failure during the resilver are quite high with drives that large. Although I cannot provide proper figures, of course, this is more than just a gut feeling. The "death of RAID5" has been well known and documented for quite some time with current hard disk sizes.
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
RAIDZ2 for data safety indeed. But performance with iSCSI would demand mirrors rather than any form of RAIDZ.

As for figures, the math is simple. For n bits to resilver, the probability of success is p = exp(-n*u), where u is the URE rate per bit; conversely, the probability of hitting an Unrecoverable Read Error (and failing, if there is no more redundancy in the pool) is q = 1 - exp(-n*u).
Let's take u = 1E-15 ("less than 1 in 1E15 bits" is the usual spec sheet figure). With 50% full drives, resilvering the 5-wide RAIDZ1 above after a drive failure means reading 8 TB from each of the 4 surviving drives, i.e. 4*8*8E12 bits: p = 77%, q = 23%. Ahem… In Russian roulette, the player survives with probability 5 in 6; here it is already down to almost 3 in 4.
"The death of RAIDZ" basically means that, with multi-terabyte arrays and the current URE rates, we lose one level of redundancy to read errors: RAIDZ2 can safely survive the loss of one drive; RAIDZ1 cannot safely lose any drive.
But this applies to mirrors too, only in a less severe form. A 2-way mirror of 16 TB drives, also 50% full and rated for 1 URE in 1E15 bits, has a 6.2% probability of failing to resilver after losing a drive. A decade after the "death of RAIDZ", we are approaching the "death of two-way mirrors". :eek:
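For anyone who wants to check the arithmetic, the figures above fall straight out of the formula:

```python
# q = 1 - exp(-n*u): probability of at least one URE in n bits read.
from math import exp

u = 1e-15            # URE rate: one error per 1e15 bits read
bits_per_tb = 8e12   # 1 TB = 1e12 bytes = 8e12 bits

# 5-wide RAIDZ1 of 16 TB drives, 50% full: read 8 TB from each of
# the 4 surviving drives during the resilver.
n = 4 * 8 * bits_per_tb
print(f"RAIDZ1: p = {exp(-n * u):.0%}, q = {1 - exp(-n * u):.0%}")  # 77% / 23%

# 2-way mirror of 16 TB drives, 50% full: read 8 TB from the one
# surviving drive.
n = 8 * bits_per_tb
print(f"mirror: q = {1 - exp(-n * u):.1%}")  # 6.2%
```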

At this point, the choice between poor iSCSI performance on secure(*) RAIDZ2, good performance on not-so-secure 2-way mirrors, and good performance on secure but expensive 3-way mirrors becomes difficult. But the risk of losing data to a failing non-redundant SLOG, or to a URE in the non-redundant SLOG while recovering from an unexpected shutdown, is negligible compared to the risk of losing a data vdev.
(*) "secure" against a single drive loss
 