Fun with TrueNAS

Joined Aug 24, 2021 · Messages: 3
Hello All,

I have been a FreeNAS, now TrueNAS, user for years, running on older hardware as it cycles down the food chain of life. This year's "Core Systems" funds went to storage, so I decided to build a larger 50-disk system of 10K SAS drives and mirror them. The goal was decent storage with strong read/write performance and low latency. So off to the web I went, got 34 more used 1.2TB drives, and placed them into my Dell 720xd and MD1220 setup. I tossed in 128GB of RAM and picked up two NVMe cards, each in its own PCIe slot, figuring I could play with SLOG or L2ARC stuff with them.

My biggest VM guest abuser of the drives is my core streaming server. It sees a solid 400-450Mbps of HLS chunks written to it 24/7/365, and it streams out 500Mbps-2Gbps to end users all day long as well. With this latest build, SolarWinds (pulling stats via vCenter) shows the SAN at about 15-20ms of latency. I did find that with HT turned off the latency was more stable no matter the load; with HT on it would see a larger spread, say 15-25ms, depending on the tasks it had to handle. So HT stays off, at least for now. (CPUs are a pair of E5-2667s.)
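For scale, here is a quick back-of-the-envelope sketch of that write load spread across the planned layout (my own numbers, assuming 25 two-way mirrors and an even spread of writes; ZFS overhead, recordsize, and transaction-group batching are ignored):

```python
# Back-of-the-envelope write load for the streaming guest, assuming
# 25 two-way mirror vdevs and an even spread of writes.

WRITE_MBPS = 450          # sustained HLS write rate, megabits/s
VDEVS = 25                # 50 disks as 25 two-way mirrors

write_mb_per_s = WRITE_MBPS / 8        # ~56 MB/s hitting the pool
per_vdev = write_mb_per_s / VDEVS      # each mirror pair's share
per_disk = per_vdev                    # mirrors write both sides in full

print(f"pool write load : {write_mb_per_s:.0f} MB/s")
print(f"per mirror vdev : {per_vdev:.1f} MB/s")
print(f"per disk        : {per_disk:.1f} MB/s (each side of the mirror)")
```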

My ESXi hosts connect to each SAN over dual 10GbE with round-robin multipathing, jumbo frames set to 9000, and the round-robin IOPS limit set to 1.

Over the years I have tried many things: RAIDZ2 in a HUGE vdev, mirrors, NFS, iSCSI.

I am not in need of PBs of storage; it's fast, reliable storage that can at times handle hard hits as I migrate full VMs from one storage system to another for updates or to deal with issues.

My questions are:

1. Is there a point, say 20-30 or more disks, where it's better to create smaller pools vs. one HUGE one?
a. For example, would two groups of 24 mirrored disks be better than one group of 50?
2. My ARC sees 90-100% hits, with misses showing up every so often. Would taking 128GB up to 256GB be a good investment?
a. I have another 64GB on my desk; the thought was to toss it in and see how it responds. (Possibly today's test.)
b. I read the rule of thumb is about 1GB of RAM per TB of storage. (I have 54.5TB of raw storage, so 1GB per TB would be about 55GB; the ZFS cache currently sits at 111.5GB. See the sketch after this list.)
3. iSCSI vs. NFS for ESXi hosts?
a. Is anyone using one over the other outside of a home setup, say small business/enterprise? (I have not played with NFS for years; it's been iSCSI all the way.)
4. Caching disks? SLOG/L2ARC, etc.
a. I know iSCSI doesn't use the SLOG by default, but you can force sync writes inside TrueNAS at the pool. I didn't find it to help; it actually hurt a bit, and my latency came up 10-15ms at the VM level as reported via SolarWinds. But it was a fun test. Next is to try an L2ARC and see what happens with it.
5. Monitoring! Who is using what to monitor latency / VM disks, etc.?
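For reference, a quick sketch of the capacity and rule-of-thumb math behind 2b (my own back-of-the-envelope numbers, assuming 50 x 1.2TB drives laid out as 25 two-way mirrors; real usable space will be lower after partitioning and pool overhead):

```python
# Rough capacity and "1 GB RAM per TB" math for this build, assuming
# 50 x 1.2 TB drives as 25 two-way mirrors.

DISKS = 50
DISK_TB = 1.2                      # marketing TB (10^12 bytes)

raw_tb = DISKS * DISK_TB           # 60 TB raw (decimal)
raw_tib = raw_tb * 1e12 / 2**40    # ~54.6 TiB, matching the ~54.5TB shown
usable_tib = raw_tib / 2           # two-way mirrors give ~50% usable

rule_of_thumb_ram_gb = raw_tib * 1 # 1 GB of RAM per TB of raw storage

print(f"raw             : {raw_tb:.1f} TB (~{raw_tib:.1f} TiB)")
print(f"usable (mirrors): ~{usable_tib:.1f} TiB")
print(f"rule of thumb   : ~{rule_of_thumb_ram_gb:.0f} GB RAM minimum")
```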

I am still learning as I go. Just looking to see whether I am in the ballpark of what this thing can do, or if there is still room to get a bit more out of it.

Thanks,
Mike
 

jgreco · Resident Grinch · Joined May 29, 2011 · Messages: 18,680
There's probably a point at which it makes sense to make multiple pools, but this has more to do with your tolerance for losing a pool if you lose multiple disks. If you have a thousand disks but only two-way mirroring, what are the odds that two disks in the same vdev could crap out at the same time?

256GB would be a good investment if you could show that your pool would gain significant performance improvements. If you are already reading mostly from ARC, the room for improvement is marginal, and depending on workload, may be effectively almost nothing.

The rules of thumb are really only to be able to point people at SOMETHING that convinces them trying to run a 100TB pool on an 8GB APU box is not going to work. Once you get out past 64GB or so, you really need to be able to read the stats and figure out where your system is on a spectrum of possibilities.
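As a rough example of what "reading the stats" can look like, a minimal sketch, assuming a TrueNAS CORE (FreeBSD) box where the usual OpenZFS arcstats counters are exposed via sysctl; adjust the kstat names if your platform exposes them elsewhere:

```python
# Pull the ARC hit/miss counters from sysctl and see how much headroom
# a bigger ARC could even buy.

import subprocess

def kstat(name: str) -> int:
    out = subprocess.check_output(
        ["sysctl", "-n", f"kstat.zfs.misc.arcstats.{name}"], text=True
    )
    return int(out.strip())

hits, misses = kstat("hits"), kstat("misses")
total = hits + misses
hit_ratio = hits / total

print(f"ARC hit ratio : {hit_ratio:.2%}")
print(f"misses        : {misses} of {total} lookups")

# Even a perfect doubling of ARC can only convert some fraction of the
# remaining misses into hits -- at a 98% hit ratio, you are arguing over
# the last 2% of lookups.
print(f"best case left to win: {1 - hit_ratio:.2%} of all lookups")
```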

Lots of people use both iSCSI and NFS in commercial environments. iXsystems literally wrote the product for and sells the product to these types of users. Curious why you would think otherwise.

Your point 4/4a seems to think that SLOG is some sort of cache. It isn't.

https://www.truenas.com/community/threads/some-insights-into-slog-zil-with-zfs-on-freenas.13633/

Sync writes with SLOG *always* slow a pool down compared to a pool that isn't doing sync writes, i.e. forcing sync will always "actually hurt it", and often by more than "a bit".
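To put toy numbers on that, a sketch where the latencies are illustrative assumptions, not measurements:

```python
# Every sync write has to wait for the log device before it can be
# acknowledged; an async write is acknowledged once it is in RAM and
# queued for the next transaction group. Numbers below are guesses.

ASYNC_ACK_US = 50          # async ack (sync=standard, iSCSI default), assumed
SLOG_NVME_US = 30          # decent NVMe SLOG write latency, assumed
IN_POOL_ZIL_US = 5000      # in-pool ZIL on 10K SAS with no SLOG, assumed

def sync_ack_us(log_latency_us: float) -> float:
    # Acknowledgement now also waits on the stable log write.
    return ASYNC_ACK_US + log_latency_us

print(f"async (sync=standard)              : ~{ASYNC_ACK_US} us")
print(f"sync=always with NVMe SLOG         : ~{sync_ack_us(SLOG_NVME_US):.0f} us")
print(f"sync=always, no SLOG (in-pool ZIL) : ~{sync_ack_us(IN_POOL_ZIL_US):.0f} us")
# A SLOG only narrows the gap back toward async; it never beats async.
```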
 
Joined Aug 24, 2021 · Messages: 3
As for NFS vs. iSCSI, the question was directed more at an A vs. B comparison for performance. From my reading over the years, iSCSI seems to be the go-to method for connecting TrueNAS to ESXi hosts, and I have stayed with iSCSI based on that.
 

sretalla · Powered by Neutrality · Moderator · Joined Jan 1, 2016 · Messages: 9,700
what are the odds that two disks in the same vdev could crap out at the same time?
Actually, the more disks you add, the better the odds get...

With 6 disks, you get
6/30 = 1/5 (a 20% chance that the second failed disk is in the same mirror)

and with 20 disks...
20/380 = 1/19 (a little more than 5% chance of the second disk being in the same mirror)


In the attached charts, yellow = pool degraded and red = pool loss.

You can use the basic visual principle that with every vdev you add, much more yellow gets added than red.
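The same odds in a few lines of code, a minimal sketch assuming exactly two failures and an equal chance of any remaining disk failing next; it reproduces the 6-disk and 20-disk numbers above:

```python
# After the first disk in a pool of two-way mirrors dies, only its
# partner (1 of the n-1 remaining disks) takes the pool out.

def second_failure_kills_pool(n_disks: int) -> float:
    return 1 / (n_disks - 1)

for n in (6, 20, 50):
    p = second_failure_kills_pool(n)
    print(f"{n:2d} disks ({n // 2} mirror vdevs): {p:.1%} chance the "
          f"second failure lands on the partner")
```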


Please gamble responsibly.
 
Joined Aug 24, 2021 · Messages: 3
That was one of the ideas behind having 25 mirror vdevs in the pool. Also, mirrors are the best for performance from what I have tested and read about.
 

jgreco · Resident Grinch · Joined May 29, 2011 · Messages: 18,680
Actually, the more disks you add, the better the odds get...

Your underlying assumption is that the number of disks that break in short order is always two.

In practice, you do not get such a guarantee of exclusivity.

The odds of two disks in a vdev crapping out at the same time should actually be a constant, like, as in, "99.5% of the time, a vdev will survive five years." But when you introduce a second vdev, the reliability is reduced somewhat, because 99.5% of the time the first vdev will survive, and 99.5% of the time the second vdev will survive, but the pool survival is dependent upon BOTH vdevs surviving, so the probability of pool survival is lower than 99.5%.
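A quick sketch of that multiplication, using the illustrative 99.5% per-vdev figure from above applied to this build's 25 mirror vdevs:

```python
# The pool only survives if *every* vdev survives, so per-vdev
# reliability multiplies. 0.995 is the illustrative five-year figure.

P_VDEV_SURVIVES = 0.995

for vdevs in (1, 2, 25):
    p_pool = P_VDEV_SURVIVES ** vdevs
    print(f"{vdevs:2d} vdev(s): pool survival {p_pool:.1%}")
# 25 vdevs at 99.5% each works out to roughly an 88% pool survival rate.
```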
 

sretalla · Powered by Neutrality · Moderator · Joined Jan 1, 2016 · Messages: 9,700
Your underlying assumption is that the number of disks that break in short order is always two.
Indeed my table talks about the "first two drives to fail"... assuming no others fail until you can deal with the first (and/or the second) failure.

I put those tables together to help people understand the difference in risk of pool loss between RAIDZ1, RAIDZ2 and Mirrors (shared in other posts about 6 drive setups).

I agree that RAIDZ vs. mirrors isn't a fair direct comparison, since the resilvering process for mirrors is much less demanding on the remaining good disks (so perhaps failure #2 is somewhat less likely... not sure if you're saying the opposite of that when you mention the failure constant). I'm just putting the comparison out there to help people get their heads around what happens in the different pool layouts as drives fail in sequence, and thought it might be useful in the context of this thread.

No problem if people think otherwise.

Although I would point out that the chances of hitting the "right disk" with the next failure are always lower (i.e. better/safer) with a higher number of vdevs.
 

jgreco · Resident Grinch · Joined May 29, 2011 · Messages: 18,680
Indeed my table talks about the "first two drives to fail"... assuming no others fail until you can deal with the first (and/or the second) failure.

But that's generally a fallacy. For example, if you have a thousand disks, and you've had a handful of years of uptime, drives start dropping off potentially in the several-per-day range, and there is a nonzero MTTR. Your graphics sorta assume that spares are available and MTTR is zero.
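A rough sketch of that exposure window; the annual failure rate and repair time here are assumptions for illustration, and correlated failures from a bad drive batch or shared age make it optimistic:

```python
# While a failed mirror disk is being replaced and resilvered, its
# partner has some chance of dying too. Numbers below are assumed.

AFR = 0.05                 # assumed 5% annual failure rate per (used) disk
REPAIR_HOURS = 24          # time to swap the disk and resilver the mirror
VDEVS = 25                 # two-way mirrors in this build

HOURS_PER_YEAR = 365 * 24
p_partner_dies = AFR * (REPAIR_HOURS / HOURS_PER_YEAR)

# Expected failures per year across all 50 disks, each one opening a
# repair window where that vdev has no redundancy left.
failures_per_year = 2 * VDEVS * AFR

print(f"partner failure during one repair window : {p_partner_dies:.3%}")
print(f"expected repair windows per year         : {failures_per_year:.1f}")
print(f"rough yearly chance of losing a vdev     : "
      f"{1 - (1 - p_partner_dies) ** failures_per_year:.2%}")
```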

Survival of the pool requires that *all* member vdevs survive.

This is actually worse than it sounds, because with simple two-way mirrors, you also lose redundancy when a disk fails, so any further URE can actually cause substantial damage to the pool as well.

This is a variation on the "why we don't recommend RAIDZ1" issue, dealing with the loss-of-redundancy aspect.
 

sretalla · Powered by Neutrality · Moderator · Joined Jan 1, 2016 · Messages: 9,700
Your graphics sorta assume that spares are available and MTTR is zero.
As stated, I assume that MTTR is short enough to get ahead of the third failure.

If someone is sweating their entire pool of disks past the MTBF, then they already signed up for carnage at the moment of first failure, so that's what may arrive. I am primarily addressing failures within MTBF.

Again, just submitting information for consideration in the context already raised, not trying to claim it's as simple as the chart might imply if you ignore real world factors.
 

NugentS · MVP · Joined Apr 16, 2020 · Messages: 2,947
Backups
 

jgreco · Resident Grinch · Joined May 29, 2011 · Messages: 18,680
Again, just submitting information for consideration in the context already raised,

The context already raised was 34 used disks purchased, so I kind of took

disks past the MTBF

to be a distinct possibility.

I guess it actually has been nearly ten years since the Seagate 1.5TB/3TB drive debacle, so I'm not sure your distinction is meaningful, but I did actually have a client at the time whose staff was fighting a losing battle to keep ahead of rebuilds on RAIDZ1 vdevs with something like a 7-vdev 48-drive pool, where they were literally having additional pool drives fail during the resilver of the previous disks, and there was a nontrivial chance of catastrophic failure. Wasn't my doing or my responsibility, it was a ZFS build by some IT guys, but still scary as hell. This went on over a period of many months, until Nexenta sold them their own solution with non-Seagate drives. I believe they ended up replacing nearly half the drives by the time all was said and done. As a result, I am particularly sensitive to the issue of multiple disk failures and am reluctant to just wave it off as a nonconcern with such a cavalier attitude.
 