Hardware RAID controller as a SLOG device?


jgreco

Resident Grinch
If you try this, please report back. I'd guess performance improvements with a spinning-disk SLOG would be marginal if not actually negative.

This isn't true in practice, though I understand your thinking.

What actually happens is that the write cache on the RAID controller soaks up the writes at very high speed, at least until the cache fills. The POSIX sync write contract is that the write must be committed to some form of stable storage before it is acknowledged, and for the purposes of a RAID controller, battery-backed write cache counts as stable storage.

So if you have a 2GB cache of which 1GB is BBWC, you can soak up 1GB of small sync writes pretty much as fast as you can chuck write calls at the kernel (and for ZFS, without even crossing the user/kernel boundary). These are then streamed out to the backing SATA or SAS device.
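If you want to see that behavior for yourself, a crude little sync-write micro-benchmark along these lines will show it. This is just a sketch: the test file path, block size, and write count are placeholders to adjust for your own pool.

```python
#!/usr/bin/env python3
# Rough sync-write micro-benchmark sketch. Point TEST_FILE at a dataset on the
# pool whose SLOG you want to exercise; the path below is only a placeholder.
import os
import time

TEST_FILE = "/mnt/tank/synctest/testfile"  # placeholder path, adjust for your pool
BLOCK = b"\0" * 4096                       # 4 KiB "small" sync writes
COUNT = 2048                               # 8 MiB total, well under any BBWC size

fd = os.open(TEST_FILE, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o600)
start = time.time()
for _ in range(COUNT):
    os.write(fd, BLOCK)                    # O_SYNC: each write must reach stable storage
elapsed = time.time() - start
os.close(fd)

print("%d sync writes in %.2fs -> %.0f IOPS, %.1f MB/s"
      % (COUNT, elapsed, COUNT / elapsed, COUNT * len(BLOCK) / elapsed / 1e6))
```

Run it once with the BBWC-backed SLOG attached and once without; the difference in IOPS is the cache doing its job.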

Now, where it gets interesting is that flash wears out. So you can do all sorts of tricks to optimize for SSD, such as keeping the SLOG partition size just large enough to soak up a few transaction groups, but sooner or later you're likely to burn through your flash's endurance. The upside here is that you might be able to do this at ~500MBytes/sec until the thing dies. As Eric noted, though, you might not be able to accomplish this feat with consumer SSD's.
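To put a rough number on "sooner or later", here's a back-of-the-envelope sketch; the TBW rating and duty cycle are assumed example figures, not from any particular drive's datasheet.

```python
# Back-of-the-envelope flash endurance estimate. TBW_RATING_TB and DUTY_CYCLE
# are assumed example values; substitute the figures for your own SSD and workload.
TBW_RATING_TB = 70        # assumed rated terabytes-written for a consumer SSD
WRITE_RATE_MB_S = 500     # the ~500MBytes/sec sync write rate from above
DUTY_CYCLE = 0.10         # assume the SLOG is actually busy 10% of the time

bytes_per_day = WRITE_RATE_MB_S * 1e6 * 86400 * DUTY_CYCLE
days = TBW_RATING_TB * 1e12 / bytes_per_day
print("Rated endurance consumed in about %.0f days" % days)  # ~16 days with these numbers
```

Even at a 10% duty cycle, a consumer-class drive's rating evaporates in a matter of weeks at those write rates, which is the whole point about endurance.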

Hard disk, on the other hand, has nearly infinite endurance (relative to flash). However, the fastest hard drives top out around ~200MBytes/sec. Because SLOG writes are generally sequential, in practice you can build a BBWC-RAID-plus-HDD based SLOG device that is capable of a 1GByte burst write followed by sustained 200MBytes/sec of sync write activity. Virtually forever.

It is probably more practical and affordable in this era to get yourself a nice Intel PCIe SSD like the 750 (lowish endurance) or better yet the P3700, both of which sport power loss protection, high speed PCIe, and the total solution cost may be lower than buying a BBWC RAID plus HDD's plus the annoyance of a bunch of figure-it-out-yourself. But obviously the OP has the hardware already. Doesn't hurt to see how well it works.
 

Ericloewe

Server Wrangler
Moderator
@jgreco When did you get back?! Welcome!
I'm betting our Resident Grinch had a script checking the Members area of the forum every day, checking if he was still number 2. When it finally raised the alarm (I like to imagine it being connected to a real siren with a red strobe), he decided to stop stealing toys and drop by to share his wisdom with those who don't pay him instead of those who pay him...
 

jgreco

Resident Grinch
I've actually got a script that's going to go through and delete the posts of all uppity posters with five digits worth of posts. Unless they pay me.

Ever the Practical Grinch,

-J
 

Vidis

Dabbler
It is probably more practical and affordable in this era to get yourself a nice Intel PCIe SSD like the 750 (lowish endurance) or better yet the P3700, both of which sport power loss protection, high speed PCIe, and the total solution cost may be lower than buying a BBWC RAID plus HDD's plus the annoyance of a bunch of figure-it-out-yourself. But obviously the OP has the hardware already. Doesn't hurt to see how well it works.

Hmmm, I had basically given up on this idea, but as you say, since I already have the RAID controller and the 15k SAS drives in place and have no other use for them, it might be better to use them as a SLOG than not use them at all.
Would you have tried this configuration, or would you simply skip the SLOG device and set "sync=disabled" for the environment that I'm building?
If you would go for the RAID & SAS disk config as the SLOG device, how big a partition would you create on the disk? The same size as the cache, or use the full disk? The cache size of the H710 is only 512 MB, so the H710 card might be too crappy to use for a config like this.
 

jgreco

Resident Grinch
For VM storage in a lab environment? Heck, yeah. If you don't mind throwing in some time and effort that might not pay off, there's also the upside of gaining some insight into why this kind of unconventional solution can work. The traditional ZFS wisdom says "don't do it," but that's largely informed by people with bad experiences running pool HDD's behind RAID controllers (and it is 100% correct wisdom for that use case).

With the advent of great PCIe SSD's that are a good choice for SLOG, the actual practical deployment of this kind of solution in production is ... not likely. Yet there is still a lot of value in understanding why unconventional solutions can work in certain cases. If, for example, you had a server storing backups (a mostly-write-intensive application) and you wanted SLOG, the massive endurance afforded by RAID+HDD could be a win. Ditto, logging operations, or database operations.

I will note that I had some issues with the performance of the MegaRAID mfi driver when I experimented with this on bare metal; I believe those were resolved, but it has been a while.
 

Vidis

Dabbler
Then I think I will go ahead and at least give it a go, and do some performance testing with and without the RAID controller config.
The whole point of a lab environment is to learn things, right? :rolleyes:

In my setup I also have 194 GB of RAM. This will yield a significant ARC, so hopefully a lot of the reads will be really quick.
I also have a Cisco 3750-X here at home that I plan to connect everything through. The switch only has 1 Gbit ports.
I have more than enough Intel 1 Gbit dual-port NICs lying around that I can add to the servers.
How much throughput do you think I will be able to squeeze out of my NAS?
Would you use NFS or iSCSI to connect from the ESXi host to the NAS?
How many NICs would you aggregate (LAG for NFS, MPIO for iSCSI) to get proper speed out of the NAS and the network?
 

jgreco

Resident Grinch
I think you'll have to discover the answers to most of those questions yourself, because so much of it is hardware and configuration dependent. I'm mostly just trying to set the record straight on the RAID+HDD issue as I've done it. I don't think I'd go out of my way to do it in this modern era, but I did do it and it did work.
 

Vidis

Dabbler
Thanks for all the help with this. I will do a proper analysis as soon as all my hardware arrives here.
 

jgreco

Resident Grinch
Please do feel free to follow up. It isn't that I'm not interested, it's that I'm too busy doing Real Work(tm) to do this sort of playing around myself, at least this month. Timewise, everything is feast or famine around here... I've been running 100% busy the last few months. But if you get stuck, speak up and I'll see if I can help.
 

Vidis

Dabbler
Thanks. I will do that.

After some more reading I have decided to go for iSCSI, since there is support for VAAI and I do want to use that feature.
I will start out by using either 4 or 6 of the 1 Gbit NICs for the iSCSI MPIO connectivity.

The challenge now is how to size the SLOG device. Since I only have 512 MB of cache on the RAID controller, it will get saturated pretty fast, and I really don't want much more data than that to be written to the SLOG device. But if I have 6 x 1 Gbit of connectivity, that works out to 6 x 0.125 GB/s = 0.75 GB/s, times 5 seconds (one transaction group) = 3.75 GB that I need to be able to cache on the SLOG device.
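Spelled out as a quick sanity check (the 5-second transaction group interval is an assumption on my part):

```python
# SLOG sizing estimate: worst case is the fastest rate the network can push
# sync writes, sustained for the life of a transaction group.
GBIT_LINKS = 6        # number of aggregated 1 Gbit iSCSI links
LINK_GBPS = 1.0       # per-link speed in Gbit/s
TXG_SECONDS = 5       # assumed transaction group interval

ingest_gb_s = GBIT_LINKS * LINK_GBPS / 8   # Gbit/s -> GB/s
slog_gb = ingest_gb_s * TXG_SECONDS
print("Worst-case ingest %.2f GB/s -> minimum SLOG size %.2f GB"
      % (ingest_gb_s, slog_gb))            # 0.75 GB/s and 3.75 GB here
```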

I'm guessing that I will experience some REALLY crappy performance after the initial 512 MB. But then again, the RAID controller will immediately start dumping the cached data to the backend SAS drives, so it might take a bit longer before I notice the performance drop. I guess I will find that out when I test drive this :)
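For what it's worth, the fill-versus-drain behavior can be modeled very roughly like this; the 200 MB/s drain rate is just an assumed figure for sequential writes to the 15k SAS drives, not a measured one.

```python
# Rough model of how long the BBWC lasts under sustained sync writes: once the
# ingest rate exceeds what the backing HDDs can absorb, the cache fills at
# (ingest - drain).
CACHE_MB = 512          # H710 write cache
INGEST_MB_S = 750       # 6 x 1 Gbit of sync writes, ~0.75 GB/s
DRAIN_MB_S = 200        # assumed sequential write rate of the 15k SAS backend

if INGEST_MB_S <= DRAIN_MB_S:
    print("The cache never fills; the HDDs keep up with the ingest rate.")
else:
    seconds = CACHE_MB / (INGEST_MB_S - DRAIN_MB_S)
    print("Cache fills after %.1f s of sustained sync writes at %d MB/s"
          % (seconds, INGEST_MB_S))        # about 0.9 s with these numbers
```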
 

mav@

iXsystems
The SLOG device size should be sufficient to cover the write speed for the duration of several transaction groups, so your math is right for the minimal size, though I would give it a few times more to be safe. A SLOG overflow forces a synchronous flush of all data to the main pool, which can be very slow. Limiting the SLOG size is beneficial for SSDs because it helps the wear-leveling algorithm, but it is almost irrelevant for HDDs.

As for the cache size, it would of course be good to have the cache cover the whole SLOG, in which case you would not need the HDD at all, but that may not be critical. 512 MB of cache should be more than enough to convert the synchronous writes issued by ZFS (which are very bad for HDD performance) into asynchronous ones with a deep request queue (which HDDs should handle better). Unfortunately, SLOG writes may not be truly sequential if there are multiple active datasets/ZVOLs at the same time, but hopefully the big RAID cache can mitigate that.
 