SLOG SSD died

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Thank you. There are a couple of you folks I need to follow around on the boards. We are not sure we are provisioning the datastores correctly; some do this, some do that.

After having read the link you posted, we found that we do have a couple of IBM 12G SAS SSDs here that are listed as heavy-write drives.

Thanks again.

Happy to help if you want to spin up a new thread summarizing the current configuration, so that it doesn't get lost in the mix here. I do see a couple of concern points in your description (e.g., a RAIDZ2 pool used for block storage) - check out the resource called "The path to success with block storage" for some basic information about why mirrors are significantly better than RAIDZ for this.


Let me take some time to look over your system config and see if I can propose a different/improved mix. You do have six PCIe x8 slots and onboard SAS2008 which gives us a good amount of expansion/upgrade potential. Feel free to @ tag me into your new thread when/if you create it.
 

Herr_Merlin

Patron
Joined
Oct 25, 2019
Messages
200
Thank you.

We are picking up a pair of ZeusRAM devices and dumping the consumer NVMe drives. Thank you @Herr_Merlin
Now we're wondering if we should push the consumer NVMe drives out and use something like the ZeusIOPS 800s (or smaller) for L2ARC cache? Thoughts?

First off, do you need L2ARC at all?
Check your system and the ARC and L2ARC hit ratios (a quick way to do that is sketched below).
Second, if L2ARC is in use, how much of it is actually consumed?
If it's only something like 32, 64, or 128 GB across all pools, go for RAM instead. Add RAM, since RAM is ARC - "L1" ARC, if you like - and is always faster than L2ARC.
If it's more than that, go for SSDs. Since L2ARC is a read cache, it is read more than it is written, so write endurance is not as important. There is a small "but", though: the cheapest consumer NVMe drives with only a few hundred TBW of endurance may be too little. A "good" consumer drive with long-sustained fast reads and writes can be a good choice. Really check the endurance under sustained reading and writing: many SSDs are limited by their DRAM/SLC cache, firmware, and temperature, and thus only look good for the first few benchmark runs. Another factor to keep in mind: when you want to read data back from the cache, latency matters, which ultimately leads back to RAM, followed by Intel Optane. Intel Optane can be a quite expensive route, but anything is better than reading from the disks when RAM is not enough.
I personally would go for high-endurance TLC SSDs with really high read speed as well as short access times. There are seldom benchmarks on access times, so you may have to spend some hours googling.
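If you want a quick way to check those hit ratios, on FreeNAS/TrueNAS CORE it usually looks something like the following (a sketch only; on older releases the summary script may be named arc_summary.py, and the arcstat columns vary a bit by version):

# Overall ARC and L2ARC statistics, including hit ratios
arc_summary

# Live ARC hit/miss counters, sampled every 5 seconds
arcstat 5

# Raw L2ARC kernel counters, if the helper scripts are not available
sysctl kstat.zfs.misc.arcstats | grep l2_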

The pool is RAIDZ2 with 8 x 4.3TB HGST drives: 3 iSCSI-mounted VMware datastores, 2 iSCSI-mounted Windows drives, and what appears to be an SMB share that my team uses as an ISO dumping ground. 34 Win 10 Horizon clients and 8 Win 2016 servers. The datastores are zvols and the SMB share is a dataset.
I think, as @HoneyBadger already mentioned, disk pool layout and usage would be another topic, and it goes hand in hand with possible overall system improvements.
RAIDZ2 is usually very, very slow compared to mirrors for block storage. There are ways to soften that if you run 12.0 - special metadata vdevs plus a huge ARC as well as L2ARC - but it still won't be as fast as mirrors.
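Just to make the mirror suggestion concrete, an eight-disk pool laid out as striped mirrors would be created roughly like this (pool and disk names are placeholders, not taken from your system):

# Four 2-way mirror vdevs instead of one 8-wide RAIDZ2 vdev:
# roughly four times the random IOPS, at the cost of usable capacity
zpool create tank \
  mirror da0 da1 \
  mirror da2 da3 \
  mirror da4 da5 \
  mirror da6 da7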

We are completely open to suggestions. We still feel that we are in over our heads here after this thread.
Appreciate you folks.

You may want to check this out to get a deeper understanding of how SSDs work. It will hopefully hint at why so many consumer SSDs are not an option for enterprise storage like FreeNAS/TrueNAS.
You can compare most SSDs to a drag race car: a short burst of speed, but it won't be able to sustain a 24-hour race, or a race with a ton of stops and starts, and so it ends up slower. Enterprise storage needs performance all the time, not start-stop with overhauls in between. So pick the 24-hour race car even if it seems slower over the quarter mile; it will last for the time you need and thus be faster and finish first.
 

mrstevemosher

Dabbler
Joined
Dec 21, 2020
Messages
49
Thank you folks.

I'll start another thread. I still have to read 1 more sticky post again about block storage and 'using lots of vdevs'.
You folks gave us a lot of information, and we probably need to go back over this thread again and again as well.
I really need to get finance on board with these Optane concepts. Nickel-and-diming while kicking the can down the road doesn't make much sense to me.

Thanks for the replies.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I really need to get finance on board with these Optane concepts. Nickel-and-diming while kicking the can down the road doesn't make much sense to me.

The question to ask from a finance/business perspective is "what does downtime or degraded service cost our business?"

I'd wager the cost of being down for even an hour, let alone a day, vastly outstrips the cost of a pair of Optane or RMS devices for use as SLOG.
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
... or what the alternative is to hosting VMs on a server. It’s always good to put a couple of options in front of folks to show you’ve done the homework before suggesting a cost-effective approach.

Let’s face it: for businesses, these expenses are typically pretty minor. For us doing this stuff at home, it’s a bit harder to justify, especially if he/she/it who must be obeyed has insight into the associated expenses.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
The L2ARC drive is not as important - the data in it is a redundant copy of what is inside your pool, so if you lose it, the server can recreate it. As of TrueNAS 12, you can make this data persistent via a flag in the tunables (otherwise the cache has to warm up again after every reboot, since reboots will normally clear it).
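If I'm reading the OpenZFS 2.0 parameter list right, the tunable being referred to would be something along these lines (the sysctl name is my assumption, mapped from the upstream l2arc_rebuild_enabled parameter, so please correct me if that's off):

# Sysctl-type tunable to let L2ARC contents survive a reboot (persistent L2ARC)
sysctl vfs.zfs.l2arc.rebuild_enabled=1
# Confirm the current value
sysctl vfs.zfs.l2arc.rebuild_enabled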
Would you point me to more info on this? I may need to set that tunable for some of my systems at work.
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
The question to ask from a finance/business perspective is "what does downtime or degraded service cost our business?"

I'd wager the cost of being down for even an hour, let alone a day, vastly outstrips the cost of a pair of Optane or RMS devices for use as SLOG.
That downtime question was how I justified the procurement of an entire redundant server. If the primary is down, for any reason, we could have five or six people who get paid $50 (or more) per hour and who can't work without that data - not to mention the operational impact if their work can't be done. If it was just five people at $50 an hour for a single day, that is $2,000... A few years back, we had a server down for over three weeks while a warranty replacement disk controller was shipped in from Canada. The 45drives company is run by nice folks, but they can't make shipping go faster when it must go through customs. Since I was able to show around $30,000 lost (just in wages) while we waited for a part, management figured a second server was a small price to pay to make sure that doesn't happen again.

I know it is a little off topic, but I thought it might be interesting.
 