SLOG SSD died

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Thank you. There are a couple of you folks I need to follow around on the boards. We are not sure we are provisioning the datastores correctly; some do this, some do that.

After having read the link you posted, we found that we do have a couple of IBM 12G SAS SSDs here that are listed as heavy-write drives.

Thanks again.

Happy to help if you want to spin up a new thread summarizing the current configuration, so that it doesn't get lost in the mix here. I do see a couple of concern points in your description (e.g., a RAIDZ2 pool used for block storage) - check out the resource called "The path to success with block storage" for some basic information about why mirrors are significantly better than RAIDZ for this.


Let me take some time to look over your system config and see if I can propose a different/improved mix. You do have six PCIe x8 slots and onboard SAS2008 which gives us a good amount of expansion/upgrade potential. Feel free to @ tag me into your new thread when/if you create it.
 

Herr_Merlin

Patron
Joined
Oct 25, 2019
Messages
200
Thank you.

We are picking up a pair of ZeusRAM devices and dumping the consumer NVMe drives. Thank you @Herr_Merlin
Now we're wondering if we should push the consumer NVMe drives out and use something like the ZeusIOPS 800s (or smaller) for L2ARC cache? Thoughts?

First off, do you need L2ARC at all?
Check your system and the ARC and L2ARC hit ratios (a quick way to do that is sketched below).
Second, if L2ARC is in use, how much of it is actually consumed?
If it's only something like 32, 64, or 128 GB across all pools, go for RAM instead. Add RAM, since RAM is ARC - "L1" ARC, if you like - and is always faster than L2ARC.
If it's more than that, go for SSDs. Since L2ARC is a read cache, it is read more than it is written, so write endurance is not as important. There is a small "but", though: the cheapest consumer NVMe drives with only a few hundred TBW of endurance may be too little. A "good" consumer drive with long-sustained fast reads and writes can be a good choice. Really check the endurance under sustained reading and writing: many SSDs are limited by their DRAM/SLC cache, firmware, and temperature, and thus only look good for the first few benchmark runs. Another factor to keep in mind: when you want to read data back from the cache, latency matters, which ultimately leads back to RAM, followed by Intel Optane. Intel Optane can be a quite expensive route, but anything is better than reading from the disks when RAM is not enough.
I personally would go for high-endurance TLC SSDs with really high read speed as well as short access times. There are seldom benchmarks on access times, so you may have to spend some hours googling.
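If you want a quick way to check those hit ratios, on FreeNAS/TrueNAS CORE it usually looks something like the following (a sketch only; on older releases the summary script may be named arc_summary.py, and the arcstat columns vary a bit by version):

# Overall ARC and L2ARC statistics, including hit ratios
arc_summary

# Live ARC hit/miss counters, sampled every 5 seconds
arcstat 5

# Raw L2ARC kernel counters, if the helper scripts are not available
sysctl kstat.zfs.misc.arcstats | grep l2_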

The pool is RAIDZ2 with 8 x 4.3TB HGST drives: 3 iSCSI-mounted VMware datastores, 2 iSCSI-mounted Windows drives, and what appears to be an SMB share that my team uses as an ISO dumping ground. 34 Win 10 Horizon clients and 8 Win 2016 servers. The datastores are zvols and the SMB share is a dataset.
I think, as @HoneyBadger already mentioned, disk pool layout and usage would be another topic, and it goes hand in hand with possible overall system improvements.
RAIDZ2 is usually very, very slow compared to mirrors for block storage. There are ways to soften that if you run 12.0 - special metadata vdevs plus a huge ARC as well as L2ARC - but it still won't be as fast as mirrors.
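Just to make the mirror suggestion concrete, an eight-disk pool laid out as striped mirrors would be created roughly like this (pool and disk names are placeholders, not taken from your system):

# Four 2-way mirror vdevs instead of one 8-wide RAIDZ2 vdev:
# roughly four times the random IOPS, at the cost of usable capacity
zpool create tank \
  mirror da0 da1 \
  mirror da2 da3 \
  mirror da4 da5 \
  mirror da6 da7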

We are completely open to suggestions. We still feel that we are in over our heads here after this thread.
Appreciate you folks.

You may want to check this out to get a deeper understanding of how SSDs work. It will hopefully hint at why so many consumer SSDs are not an option for enterprise storage like FreeNAS/TrueNAS.
You can compare most SSDs to a drag race car: a short burst of speed, but it won't be able to sustain a 24-hour race, or a race with a ton of stops and starts, and so it ends up slower. Enterprise storage needs performance all the time, not start-stop with overhauls in between. So pick the 24-hour race car even if it seems slower over the quarter mile; it will last for the time you need and thus be faster and finish first.
 

mrstevemosher

Dabbler
Joined
Dec 21, 2020
Messages
49
Thank you folks.

I'll start another thread. I still have to read 1 more sticky post again about block storage and 'using lots of vdevs'.
You folks gave us a lot of information, and we probably need to go back over this thread again and again as well.
I really need to get finance on board with these Optane concepts. Nickel-and-diming while kicking the can down the road doesn't make much sense to me.

Thanks for the replies.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I really need to get finance on board with these Optane concepts. Nickel-and-diming while kicking the can down the road doesn't make much sense to me.

The question to ask from a finance/business perspective is "what does downtime or degraded service cost our business?"

I'd wager the cost of being down for even an hour, let alone a day, vastly outstrips the cost of a pair of Optane or RMS devices for use as SLOG.
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
... or what the alternative is to hosting VMs on a server. It’s always good to put a couple of options in front of folks to show you’ve done the homework before suggesting a cost-effective approach.

Let’s face it: for businesses, these expenses are typically pretty minor. For us doing this stuff at home, it’s a bit harder to justify, especially if he/she/it who must be obeyed has insight into the associated expenses.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
The L2ARC drive is not as important - the data in it is a redundant copy of what is inside your pool, so if you lose it, the server can recreate it. As of TrueNAS 12, you can make this data persistent via a flag in the tunables (otherwise the cache has to warm up again after every reboot, since reboots will normally clear it).
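If I'm reading the OpenZFS 2.0 parameter list right, the tunable being referred to would be something along these lines (the sysctl name is my assumption, mapped from the upstream l2arc_rebuild_enabled parameter, so please correct me if that's off):

# Sysctl-type tunable to let L2ARC contents survive a reboot (persistent L2ARC)
sysctl vfs.zfs.l2arc.rebuild_enabled=1
# Confirm the current value
sysctl vfs.zfs.l2arc.rebuild_enabled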
Would you point me to more info on this? I may need to set that tunable for some of my systems at work.
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
The question to ask from a finance/business perspective is "what does downtime or degraded service cost our business?"

I'd wager the cost of being down for even an hour, let alone a day, vastly outstrips the cost of a pair of Optane or RMS devices for use as SLOG.
That downtime question was how I justified the procurement of an entire redundant server. If the primary is down, for any reason, we could have five or six people who get paid $50 (or more) per hour and who can't work without that data - not to mention the operational impact if their work can't be done. If it was just five people at $50 an hour for a single day, that is $2,000... A few years back, we had a server down for over three weeks while a warranty replacement disk controller was shipped in from Canada. The 45drives company is run by nice folks, but they can't make shipping go faster when it must go through customs. Since I was able to show around $30,000 lost (just in wages) while we waited for a part, management figured a second server was a small price to pay to make sure that doesn't happen again.

I know it is a little off topic, but I thought it might be interesting.
 