SLOG recommendation check

pbrunnen

Dabbler
Joined
May 18, 2018
Messages
19
Hello all,
I'm on my first experience with TrueNAS and I'm setting up an iSCSI host to run a handful of VMs from. The system is an Inventec dual Xeon E5-2630 @ 2.30GHz with 32GB RAM. I have two small SSDs in mirror providing the boot pool and an LSI SAS2308 with a 12-drive backplane holding 8x Enterprise SATA 3TB 7200rpm drives for the primary storage. The NICs are two I350 1GbE ports for management/shares and a Mellanox CX312A two-port 10GbE for the iSCSI traffic (failover paths set up with two separate IP ranges).

I've not purchased any SLOG drives yet, but from reading the forums, it makes sense to have one for iSCSI setups... Based on what I've read, I calculate 1/8th of my 32GB would be 4GB, and the SLOG should hold at least two transaction groups' worth... So I figure 8GB, with only 1/4 of the drive provisioned for better life. I'm looking at 32GB SSDs so that I can also increase RAM to 64GB and still be good... The system doesn't have any available PCIe slots, so NVMe is out. I'm considering a pair of TDK SDE1B032GTKDWBA0ESA0 in mirror, which are 32GB SLC SATA SSDs (325MB/s sustained write) and aren't too expensive. They seem to be some hybrid SLC (pSLC? I'd never heard of it before), so they're only rated for 20k cycles, but it does indicate "Power Fail Data Safety" and a "Power Back-up Circuit".
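
Just to show my working (the 1/8-of-RAM and two-transaction-group figures are only rules of thumb I picked up from the forums, so treat the numbers as rough):

32GB RAM / 8        = 4GB   of dirty data per transaction group (rule of thumb)
4GB x 2 txgs        = 8GB   minimum SLOG capacity I'm aiming for
8GB of a 32GB SSD   = 1/4   of the drive provisioned, leaving the rest for wear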

Thoughts from the experts?
 

CLEsportsfan

Dabbler
Joined
Nov 23, 2020
Messages
20
Take a look at this resource for good SLOG drives: https://www.truenas.com/community/threads/slog-benchmarking-and-finding-the-best-slog.63521/ It's probably the best resource for drive performance on this forum.

In general, look for a "write intensive" or "10DWPD" model SAS SSD to use. Ones that were recommended to me were the Hitachi HUSMH/HUSMM series (Ultrastar SSD1600) and the Toshiba PX04SM/PX05SM. I ended up going with a Toshiba PX04SV because I couldn't find a PX04SM on eBay. It's a "mixed-use" labeled drive, but I overprovisioned it to hopefully get endurance closer to the write-intensive ones. Speed-wise, it topped out at 534MB/s and was a little less expensive than your TDKs. I'm pretty sure SAS drives by design have more bandwidth available to them than SATA, so get a SAS SSD for your SLOG.
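
For what it's worth, I won't claim this is the only way to overprovision, but on TrueNAS CORE / FreeBSD one simple approach is to partition just a slice of the drive from the shell and leave the rest unallocated, roughly like this (da6 and the 16G size are made-up examples):

gpart create -s gpt da6                     # put a fresh GPT on the blank SSD
gpart add -t freebsd-zfs -a 1m -s 16G da6   # use only 16G; the rest stays unallocated spare area

The unallocated space only really helps the controller if those cells were never written (or the drive was secure-erased first), so do it on a clean drive.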

Also, if you're able to add more RAM to your system, do that. It's an inexpensive upgrade and will help performance greatly, since TrueNAS will use available RAM to cache the most frequently used data. When I upgraded my server, it was maybe $18 per GB for RAM.

Also, I'm no expert at all, but I got VERY good advice from experts on this forum and am just relaying that info to you.
 

pbrunnen

Dabbler
Joined
May 18, 2018
Messages
19
Hi @CLEsportsfan,
Thanks for the recommendation... I had not come across that long thread, so thank you for the link. Per your suggestion, I found three of the smaller Ultrastar SSD1600 series drives with SED support relatively cheaply on eBay, so I grabbed them. Two for the SLOG and one to keep on hand as a backup in case of failure.

I was originally going to just go up to 64GB, but watching my RAM utilization, except when I'm copying over data there is a large chunk that remains free (anywhere from 1GB to 4GB), so I'm not sure how much benefit it would be. My ARC cache hit rates are pretty high, so I'm probably not working it very hard.

Thanks again!
 

CLEsportsfan

Dabbler
Joined
Nov 23, 2020
Messages
20
No problem!

I believe your ARC cache will grow the more files are accessed. With lower RAM, it'll just cache less and require that data to be pulled from disk, which is obviously much slower. The bare minimum recommended RAM is 64GB when using block storage. Until your VMs are up and running for a few days, you might not have a good indication of your true ARC cache hit rate. I have 192GB in mine hosting storage for a dozen VMs, and my hit rate is about 90%, sometimes up to 95%.

Since you'll be using your TrueNAS to host VMs, are you setting up a bunch of small mirrored vdevs for your storage to improve IO? This is a fantastic resource on how to set up your TrueNAS server for performance in block storage: https://www.truenas.com/community/threads/the-path-to-success-for-block-storage.81165/
 

pbrunnen

Dabbler
Joined
May 18, 2018
Messages
19
Until your VMs are up and running for a few days, you might not have a good indication of your true ARC cache hit rate.
So I've been running eight days now... My ARC hit ratio is (min/mean/max) 84, 99, 100 and right now I show 2GB free still. It seems to be behaving well. The size seems stable per the reporting at 20GB.

Since you'll be using your TrueNAS to host VMs, are you setting up a bunch of small mirrored vdevs for your storage to improve IO?
Yes... I did read that thread during my planning stages, so my pool is two 3TB mirrors. I'm actually going to add a third drive to each mirror for safety next. Performance has been so much better than the QNap it replaces, I'm almost wondering if the external SLOG is necessary...

Cheers!
 

CLEsportsfan

Dabbler
Joined
Nov 23, 2020
Messages
20
So I've been running eight days now... My ARC hit ratio is (min/mean/max) 84, 99, 100 and right now I show 2GB free still. It seems to be behaving well. The size seems stable per the reporting at 20GB.
That's good. My graph shows 100% for me most hours, but backups take a hit, so the average shown when I run arc_summary via SSH is lower.
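
If you want to pull the same numbers from the shell, something along these lines works for me (the exact wording of the arc_summary output changes a bit between releases):

arc_summary | grep -i "hit ratio"   # overall cache hit ratios since boot
arc_summary | grep -i "arc size"    # current/target ARC size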
Yes... I did read that thread during my planning stages, so my pool is two 3TB mirrors. I'm actually going to add a third drive to each mirror for safety next.
I've never tried a 3 drive mirror vdev. I assume it's supported? If you expand the pool, all the vdev sizes have to match. So, it might be more useful to add one drive as a global spare for the whole zpool. The hot spare could be used by any of the vdevs.

Performance has been so much better than the QNap it replaces, I'm almost wondering if the external SLOG is necessary...
That's great! Since you're using iSCSI, one thing you'll want to check is whether you have sync=always set on your zpool. If not, there's a potential for data loss. Things will slow down immensely without a good SLOG. Run zfs get sync to check your sync setting. If it's set to sync=standard, it's probably not using your SLOG at all.
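
A quick sketch of what I mean, with "tank" standing in for your pool name:

zfs get sync tank          # show the current sync setting (zvols inherit it unless overridden)
zfs set sync=always tank   # force every write through the ZIL, i.e. onto your SLOG once it's in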
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I've never tried a 3 drive mirror vdev. I assume it's supported?
It is.
If you expand the pool, all the vdev sizes have to match.
No, they don't, but all drives in a VDEV will take on the capacity of the smallest member. If you want balanced IO across all VDEVs in a pool, you will need to clear the pool and repopulate it from backup after adding a VDEV (in that case you may want to keep sizes the same).

So, it might be more useful to add one drive as a global spare for the whole zpool.
Hot spares can be useful in a small set of circumstances, usually where the machine isn't easily accessible within a relatively short time period.
You should usually be getting warnings about failing drives in enough time to prepare a replacement if you can reach the machine easily.
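
For reference, both options are a single command from the shell (the GUI can do the same thing; pool and device names below are only examples):

zpool attach tank gptid/EXISTING-MIRROR-MEMBER gptid/NEW-DISK   # grows a 2-way mirror into a 3-way
zpool add tank spare gptid/NEW-DISK                             # or adds a global hot spare instead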
 

pbrunnen

Dabbler
Joined
May 18, 2018
Messages
19
If you expand the pool, all the vdev sizes have to match. So, it might be more useful to add one drive as a global spare for the whole zpool. The hot spare could be used by any of the vdevs.
No, they don't, but all drives in a VDEV will take on the capacity of the smallest member. If you want balanced IO across all VDEVs in a pool, you will need to clear the pool and repopulate it from backup after adding a VDEV (in that case you may want to keep sizes the same).
Yea, I read about having three, and while I do lose the space, I'd rather have the peace of mind that should something happen to one drive in the mirror, I'm not relying on everything running in ideal conditions during the 'resilver' (or is it resliver...) process.

That's great! Since you're using iSCSI, one thing you'll want to check is whether you have sync=always set on your zpool. If not, there's a potential for data loss. Things will slow down immensely without a good SLOG. Run zfs get sync to check your sync setting. If it's set to sync=standard, it's probably not using your SLOG at all.
Good point. I thought I had set this to sync=always, but then again, I'd been messing with the setup several times before the last iteration, and it seems it is set to sync=standard at the moment... So I'll change that immediately, and we'll see how bad the performance is until the SLOG drives come in Monday. In the meantime, it is already on a UPS with auto-shutdown enabled, but that still wasn't a good state, so thanks for the reminder.
 

pbrunnen

Dabbler
Joined
May 18, 2018
Messages
19
Hi all,
My Hitachi SSDs came in today and I'm just trying to add them to the pool hosting my iSCSI targets, but the UI is reporting something confusing to me. I chose "Log VDev" from the "Add VDev" menu and put the drives into the "Log VDev" section. At first it complained about striping and showed the error "A stripe log vdev is highly discouraged and will result in data loss if it fails"... Now, I plan to add these two drives in a mirror configuration, but I'm concerned about that message. From what I read here, I was under the impression that loss of the Log VDev would not cause data loss to the pool. Or is the message just misleading?

Thanks!

P.S. Should I also enable TRIM support on the Pool or is that only necessary if the data VDevs are SSDs?
 
Last edited:

Sirius

Dabbler
Joined
Mar 1, 2018
Messages
41
Losing the log vdev can cause data loss which is why it's recommended to add it in a mirrored configuration.

I've seen that message before, but it went away when I picked a mirrored option. I think you could even do something silly like RAID-Z SLOGs.
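
If you want to sanity-check what the UI is about to do, the shell equivalent of adding a mirrored log vdev is a one-liner (pool and disk names are placeholders):

zpool add tank log mirror da8 da9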
 

pbrunnen

Dabbler
Joined
May 18, 2018
Messages
19
Losing the log vdev can cause data loss which is why it's recommended to add it in a mirrored configuration.

Hi Sirius,
Just for clarity, the only way you would lose data is if there were transactions on the SLOG that were not committed to the pool data VDevs, correct? I do plan to do a mirror, but I just want to make sure I'm not surprised by a way that the pool could go sideways unexpectedly.
 

Sirius

Dabbler
Joined
Mar 1, 2018
Messages
41
Yeah, that's my understanding. I think you'd have to be super unlucky to have 2 SLOGs die without something also killing the disks (eg. lightning strike). There may be a way to force recover the pool if both SLOGs somehow die, but I'm not sure - someone else more experienced would have to answer that.

The main reason you use 2 is just in case one dies, plus by going with enterprise-rated drives you also reduce the risk further. Something like a Samsung 970 Pro NVMe is a great SSD, for example, but it has nothing on the endurance of even an old Intel S3700 SATA SSD.

The only "consumer" drives worth considering are the Optane 900p or 905p.
 

pbrunnen

Dabbler
Joined
May 18, 2018
Messages
19
Hi @Sirius
Thanks for the confirmation... Agreed that I doubt this will be a real failure scenario without a catastrophic event like you mentioned. Yea, I chose two 200GB Hitachi HUSMM112 drives, which are enterprise SAS with a decent 480.7 MB/s average random-write throughput.

Did you happen to know about the Trim parameter?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Yeah, that's my understanding. I think you'd have to be super unlucky to have 2 SLOGs die without something also killing the disks (eg. lightning strike). There may be a way to force recover the pool if both SLOGs somehow die, but I'm not sure - someone else more experienced would have to answer that.

If both SLOGs fail, you can discard the data on the missing log vdev and forcibly mount the pool (zpool import -m), but this of course means guaranteed data loss and in cases with other filesystems on top (eg: VMFS) often means metadata is lost at that level, and a restore from backups may be necessary.
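
Roughly, the recovery path looks like this (pool and vdev names are illustrative; zpool status shows the real ones):

zpool import -m tank                # import anyway, accepting that the missing log's contents are gone
zpool remove tank mirror-1          # drop the dead log vdev, using the name zpool status reports
zpool add tank log mirror da8 da9   # add replacement SLOG devices once you have them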

Did you happen to know about the Trim parameter?

TRIM is dependent on your HBA passing it through; the SAS2308 is much better at this than the SAS2008. Check by running sysctl -a | grep zio_trim - you have two boot SSDs, so that might skew the numbers a bit, but the majority of TRIMs should be hitting your SLOG devices.

I know there was some discussion in early TrueNAS 12 releases about pool autoTRIM being disabled - I don't know if this still impacts log vdevs, though.
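
Concretely, the checks I mean (pool name is just an example, and autotrim is an OpenZFS 2.0 / TrueNAS 12+ property):

sysctl -a | grep zio_trim    # TRIM counters on CORE/FreeBSD (exact sysctl names vary by release)
zpool get autotrim tank      # shows whether pool-wide automatic TRIM is on
zpool set autotrim=on tank   # enables it if you decide you want it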
 

pbrunnen

Dabbler
Joined
May 18, 2018
Messages
19
Hello @HoneyBadger and thank you for the reply...

If both SLOGs fail, you can discard the data on the missing log vdev and forcibly mount the pool (zpool import -m), but this of course means guaranteed data loss and in cases with other filesystems on top (eg: VMFS) often means metadata is lost at that level, and a restore from backups may be necessary.
Ok, I'm confused by this now... The only things on the log vdev should be the pool's uncommitted transactions, which from my reading should only be up to 8GB worth (1/8th of my 32GB RAM times two transactions' worth). So the last two transactions at most should be lost, but based on when it fails, why is this a guaranteed data loss? I can appreciate inconsistency in the stored blocks, which may roll back in the filesystem, and I can see that failing during heavy operation versus off-hours when there's hardly any activity would make a difference in how much is lost. Or am I just misunderstanding how this works?


TRIM is dependent on your HBA passing it through; the SAS2308 is much better at this than the SAS2008. Check by running sysctl -a | grep zio_trim - you have two boot SSDs, so that might skew the numbers a bit, but the majority of TRIMs should be hitting your SLOG devices.

I know there was some discussion in early TrueNAS 12 releases about pool autoTRIM being disabled - I don't know if this still impacts log vdevs, though.
I see an option to tick when I go to "Pool Options" from the pool's gear icon. It is unchecked by default...
And I guess you bring up a good point about checking whether I can enable TRIM for the system boot SSDs... Those are on the native SATA interface, not my LSI, but I will have to check exactly which LSI SAS controller is onboard.
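
For my own reference, a couple of quick ways to identify the controller from the shell (just standard FreeBSD tools, nothing TrueNAS-specific):

pciconf -lv | grep -B3 -i sas   # the device= line in the PCI listing names the chip
dmesg | grep -i mps             # mps driver attach lines (LSI SAS2xxx HBAs)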

Thanks!
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
So the last two transactions at most should be lost, but based on when it fails, why is this a guaranteed data loss?
Because you're "guaranteed" to lose those one/two transaction groups worth of data if you're forced to import with a missing log vdev.

The problem is that your client OS (vSphere/Hyper-V) thinks that data is safely written (it was, to the SLOG device) and it will be expecting it there. If that data was part of a VMDK/VHD, then the impact might be limited to that virtual machine seeing in-guest corruption. If the missing data was a metadata update to the filesystem or other crucial data, the impact might be "Error: Cannot mount VMFS datastore" and all VMs are impacted by being unavailable.

Note that in order to see this kind of failure, you have to lose your entire LOG vdev at the same time that you experience a system hang or power loss. If you have a power loss and lose one SLOG you're still safe, and you'll replay the logs on next boot (and swap out the dead SLOG) - lose both SLOGs but not power and you're still safe; but your performance just got kicked in the teeth, and you're likely to experience hung/unresponsive guest applications or VMs if your pool isn't capable of handling the I/O demands, so swap out those dead SLOGs.
 

pbrunnen

Dabbler
Joined
May 18, 2018
Messages
19
Hi @HoneyBadger
Ah, ok... Thank you for the clarification. That makes a lot more sense to me and is actually better than expected... So as long as I don't lose both SLOG drives and abruptly lose power at the same time, we'll be fine. I can live with those odds... :cool:
 