[Newbie] Truenas Scale Advice / Vdev/Pool Optimal Configuration?

killmas

Cadet
Joined
May 26, 2023
Messages
5
Full disclosure: I have no idea what I am doing (this is my first foray into building a long-term archival NAS), and am hoping someone might be able to provide some guidance / advice / tell me I am planning on doing something super dumb before I actually implement this configuration. Also, as a subnote, some of the hardware may seem a bit overkill, but my goal was to never have to expand anything beyond the storage pool in the future (other than maybe throwing in a higher-speed NIC).

Hardware Configuration:
Chassis / MB: 24x 2.5" bay SuperMicro chassis, X9DRI-LN4F+ motherboard, 2x PSUs
CPU: 2x E5-2670 V2 2.50GHz 10C
Ram: 16x 32GB DDR3 RDIMM (512 GB total)
NIC: 1x Intel X540-BT1 10Gb RJ45, 4x 1Gb Onboard (including 1 IPMI)
HBA: LSI 9305-24i in IT mode SAS HBA
Initial Storage:
1. QNAP QM-4P-384 4x M.2 NVMe PCIe 3.0 expansion card w/ 4x 2TB Samsung 980 Pro NVMe SSDs installed
2. 16x Samsung 870 Evo SATA SSDs in 16 of the 24 2.5" bays

Anticipated Software / VDEV Configuration
TrueNAS SCALE (latest version) installed on 2x 2TB 980 Pros
Remaining 2x 980 Pros to be used as Cache
Pool:
1. 8x 870 4TB Evos in VDEV Raidz
2. 8x 870 4TB Evos in VDEV Raidz
3. (Future) 8x 4TB 870 Evos in VDEV Raidz
4. (Future) 8x TBD TB HDD in VDEV Raidz in External JBOD via SAS HBA*
5. (Future) 8x TBD TB HDD in VDEV Raidz in External JBOD via SAS HBA*
6. (Future) 8x TBD TB HDD in VDEV Raidz in External JBOD via SAS HBA*
*Note: I am not sold that #4,5,6 need to be part of the original #1,2,3 pool, as I'd be fine with a "Fast" SMB and a "Slow" SMB set of shares, but my goal is to just have 1 SMB share.
**Second Note: I am not sold on Raidz vs Raidz2, still figuring out which I will end up with.

Use Case:
Primarily, I will be using this NAS as a mass-storage SMB share. I don't intend to be writing to it 24x7; the intent is more of a write-once-in-a-while, mostly-read operation. When I do a write, it will be ~700 GB at a time, with a few weeks / months between write operations. I will be integrating TrueNAS with a separate Windows Server 2012 R2 host (for Active Directory sync). The majority of its operation will be reads (watching videos, listening to music, etc.).

Maybe in the future I'd think about running Plex or something on here, but I am very doubtful of that. My intent is to reallocate more than the default 50%-of-RAM limit to ARC via the workaround that has been mentioned for SCALE.

Thoughts?
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
1. 8x 870 4TB Evos in VDEV Raidz
2. 8x 870 4TB Evos in VDEV Raidz
3. (Future) 8x 4TB 870 Evos in VDEV Raidz
4. (Future) 8x TBD TB HDD in VDEV Raidz in External JBOD via SAS HBA*
5. (Future) 8x TBD TB HDD in VDEV Raidz in External JBOD via SAS HBA*
6. (Future) 8x TBD TB HDD in VDEV Raidz in External JBOD via SAS HBA*
*Note: I am not sold that #4,5,6 need to be part of the original #1,2,3 pool, as I'd be fine with a "Fast" SMB and a "Slow" SMB set of shares, but my goal is to just have 1 SMB share.
**Second Note: I am not sold on Raidz vs Raidz2, still figuring out which I will end up with.
How many 4TB SSDs do you have on hand now? 16? How much storage do you anticipate needing?

Primarily, I will be using this NAS as a mass-storage SMB share. I don't intend to be writing to it 24x7; the intent is more of a write-once-in-a-while, mostly-read operation. When I do a write, it will be ~700 GB at a time, with a few weeks / months between write operations. I will be integrating TrueNAS with a separate Windows Server 2012 R2 host (for Active Directory sync). The majority of its operation will be reads (watching videos, listening to music, etc.).
Why the choice of SSDs for your workload? Seems like you could have easily gotten away with HDDs if I am understanding the use case correctly. Is it just because you have them on hand?

How many concurrent clients do you anticipate? When you are doing a write operation, are you just "ingesting" more Movies, TV Shows and Music?
Are those writes coming from a single host or many?

Maybe in the future I'd think about running Plex or something on here, but I am very doubtful of that. My intent is to reallocate more than the default 50%-of-RAM limit to ARC via the workaround that has been mentioned for SCALE.
With 512GB of RAM you should have PLENTY of ARC even if you dedicate more than 50% of your RAM via sysctl.
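For reference, that workaround usually takes the form of a post-init command that writes the ZFS module parameter directly. A minimal sketch, assuming you do it from the shell (the 400 GiB figure is just an example value, not a recommendation):

Code:
# Example only: cap ARC at 400 GiB instead of the default 50% of RAM
echo $((400 * 1024 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_arc_max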

My system has 256GB of RAM and a probably not-too-dissimilar workload, and it's already overkill. Notice how low my L2ARC hits are.

[Screenshot: ARC/L2ARC hit-rate statistics]
 

killmas

Cadet
Joined
May 26, 2023
Messages
5
How many 4TB SSDs do you have on hand now? 16? How much storage do you anticipate needing?
I have 16 on hand right now (I was going to start with 8 but then decided to indulge myself and got 8 more). The problem I am trying to mitigate is YEARS of neglect (solely my fault) of non-backed-up storage, run directly off my AD server using USB hard disks (~142 TB of usable space across the USB drives). At the end of the day, I would like to get up to the ~512 TB realm of raw space, so somewhere in the realm of ~372 TB usable. Depending on the level of redundancy that would decrease somewhat, but that's the "goal".

Why the choice of SSDs for your workload? Seems like you could have easily gotten away with HDDs if I am understanding the use case correctly. Is it just because you have them on hand?
Honestly, probably my own gumption. I have (for the most part) migrated most of my storage (with the exception of the USB drives) to SSDs. If I had the money, I'd be attempting to make the entire solution SSDs, but since that ain't happening, I'll eventually use this mix of SSDs/HDDs. I have had really good luck with SSDs in general (except for one 1st-gen Corsair SSD). I want "networking" to be my performance bottleneck (I don't see myself pressing into 100Gb or greater any time soon; that would get me in even bigger trouble with my SO than this current project already has).

So the first part of this project is to get some of my "fast" storage in place, then start working on the "slow" storage side of things via external HBAs/JBOD(s). That's why I was curious whether mixing fast and slow VDEVs in the same pool is a good idea, or whether that is dumb and I should just have separate pools/shares with different performance for each.

My only other "constraint" besides money is power, so by using SSDs I was attempting to bring the total power draw of the system down somewhat.

How many concurrent clients do you anticipate? When you are doing a write operation, are you just "ingesting" more Movies, TV Shows and Music?
Are those writes coming from a single host or many?
Probably at most 5-10 clients during a write operation, with the majority of the time being just 1 client. Correct on the just ingesting; typically that ingestion will come from just 1 client.

With 512GB of RAM you should have PLENTY of ARC even if you dedicate more than 50% of your RAM via sysctl.

My system has 256GB of RAM and a probably not-too-dissimilar workload, and it's already overkill. Notice how low my L2ARC hits are.

[Screenshot: ARC/L2ARC hit-rate statistics]
Awesome, that does look promising then. Are those spikes just at the end of long write sessions, where you've run out of ARC and let L2ARC tide over the data transition? Or are those spikes during a "flush" of data as you're transferring to LTS?
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
I have 16 on hand right now (I was going to start with 8 but then decided to indulge myself and got 8 more). The problem I am trying to mitigate is YEARS of neglect (solely my fault) of non-backed-up storage, run directly off my AD server using USB hard disks (~142 TB of usable space across the USB drives). At the end of the day, I would like to get up to the ~512 TB realm of raw space, so somewhere in the realm of ~372 TB usable. Depending on the level of redundancy that would decrease somewhat, but that's the "goal".
Wow you are quite the data hoarder :P

How much of your 142TB is used currently?

So the first part of this project is to get some of my "fast" storage in place, then start working on the "slow" storage side of things via external HBAs/JBOD(s). That's why I was curious whether mixing fast and slow VDEVs in the same pool is a good idea, or whether that is dumb and I should just have separate pools/shares with different performance for each.
You should not put SSD and HDD VDEVs in the same pool. ZFS doesn't have any logic to account for this. In newer versions of ZFS, a write will land in whatever VDEV is fastest (older versions used least-full). So basically, whatever data you write first will likely mostly end up on your SSDs. I don't think that's your intention.
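If you ever want to see where writes actually landed, you can check per-VDEV allocation from the shell (the pool name here is just a placeholder):

Code:
# ALLOC/FREE/CAP are reported per VDEV, so an imbalance between
# SSD and HDD VDEVs in a mixed pool shows up immediately
zpool list -v tank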
What is your reason for putting data in either the fast or the slow storage pool? What are you using to determine the right place to put X data in Y pool?

Awesome, that does look promising then. Are those spikes just at the end of long write sessions, where you've run out of ARC and let L2ARC tide over the data transition? Or are those spikes during a "flush" of data as you're transferring to LTS?
Those large spikes are generally associated with "seeding" data or re-encoding large video files.
 

killmas

Cadet
Joined
May 26, 2023
Messages
5
Wow you are quite the data hoarder :P

How much of your 142TB is used currently?

I am at ~123 TB used out of the 142 TB available

You should not put SSD and HDD VDEVs in the same pool. ZFS doesn't have any logic to account for this. In newer versions of ZFS, a write will land in whatever VDEV is fastest (older versions used least-full). So basically, whatever data you write first will likely mostly end up on your SSDs. I don't think that's your intention.
What is your reason for putting data in either the fast or the slow storage pool? What are you using to determine the right place to put X data in Y pool?
My intent (based on my lack of understanding of how VDEVs / pools / TrueNAS work, and my own general stupidity) was to have a cascade effect of data storage occur during activity. Something like:
ARC (RAM) -> L2ARC (NVMe SSDs) -> Fast Storage (SATA SSDs) -> Slow Storage (HDDs)

What I didn't know was whether, in a mixed pool of fast/slow storage, there was some feature where data that is used more frequently is stored in the fastest part of the pool. My thought was that over time, data that is read more frequently would get cached onto the fast storage instead of ending up on the slow portion of the pool, and the pool would be smart enough to rearrange data. Again, no idea what I am doing here, so these thoughts aren't based on reality, just my own speculation.

This is also why I was thinking I would end up with possibly two pools (one for fast and one for slow storage), with me manually deciding, based on the data, where it should end up. The "goal" is to keep the data-rate bottleneck on the networking side of the house (as that can be upgraded even more slowly over time, and is easier to sneak by my SO in regards to purchases :wink:).

Those large spikes are generally associated with "seeding" data or re-encoding large video files.

Gotcha. Now that I am looking at your signature, your configuration from a HW perspective is very similar to what I was looking at doing (I have been watching those EMC 15-bay shelves on eBay, thinking I might go down that path for the HDDs).
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
This is also why I was thinking I would end up with possibly two pools (one for fast and one for slow storage), with me manually deciding, based on the data, where it should end up. The "goal" is to keep the data-rate bottleneck on the networking side of the house (as that can be upgraded even more slowly over time, and is easier to sneak by my SO in regards to purchases :wink:).
This is probably for the best. You may want to consider a ZFS SPECIAL Metadata VDEV for your "slow" pool though, so you can kind of do what you wanted.

Couple that with an L2ARC on the slow pool and setting:
Code:
l2arc_exclude_special


You will have metadata and small files stored on SSDs, and L2ARC will then cache only regular data, not metadata. So effectively, you will be most of the way there with what you had intended. You can use 2 or 4 of the SSDs you already have for the SPECIAL VDEV, and for the L2ARC you can grab a couple of Optane M.2s for pretty darn cheap.
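For reference, the rough shape of that setup from the shell looks like this. The pool name, device names, and 64K cutoff are placeholders (and in TrueNAS you would normally add the VDEVs through the UI), so treat it as a sketch of the idea rather than copy/paste commands:

Code:
# Mirrored SPECIAL VDEV to hold metadata (and optionally small blocks)
zpool add slowpool special mirror /dev/sdx /dev/sdy

# Optionally store blocks of 64K or smaller on the SPECIAL VDEV, per dataset
zfs set special_small_blocks=64K slowpool/media

# Add the Optane devices as L2ARC
zpool add slowpool cache /dev/nvme2n1 /dev/nvme3n1

# Keep anything already living on the SPECIAL VDEV out of L2ARC
echo 1 > /sys/module/zfs/parameters/l2arc_exclude_special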


Gotcha. Now that I am looking at your signature, your configuration from a HW perspective is very similar to what I was looking at doing (I have been watching those EMC 15-bay shelves on eBay, thinking I might go down that path for the HDDs).
A lot of people talk about NetApp SAS shelves, but I like the EMC ones better.

I don't like the NetApp shelves because they use a non-standard QSFP cable.


The EMC is going to be more power-friendly and has standard SFF-8088 ports. I have two of them, and I have my mirrors set up like this:
[Screenshot: pool layout with each mirror split across the two shelves]



So I can lose an entire shelf and my pool won't go offline. You also get better performance because you are taking 4 SAS lanes and driving 15 drives, rather than 4 SAS lanes driving 24.
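The layout is conceptually just alternating mirror members between the two shelves; as a sketch with made-up disk aliases (extend the pattern across all 15 bays):

Code:
# Each mirror pairs one disk from shelf 1 with one from shelf 2,
# so an entire shelf can drop without taking the pool offline
zpool create slowpool \
  mirror shelf1-bay0 shelf2-bay0 \
  mirror shelf1-bay1 shelf2-bay1 \
  mirror shelf1-bay2 shelf2-bay2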

You can buy newer/quieter/more power-efficient PSUs from EMC's newer SAS3 shelves, and they are compatible with these older SAS2 ones:
P/N: 071-000-553 - 3rd Generation VE

They come with SAS interposers and work with both SAS and SATA drives. Just make sure the interposers you have are SATA-compatible. Mine are:
P/N: 303-115-003D

EMC KTN-STL3 15 bay chassis

 

killmas

Cadet
Joined
May 26, 2023
Messages
5
Update on my end. Once I saw your setup, you inspired me to pull the trigger... and straight-up copy you. I ended up getting 4x of those EMC shelves in the exact same mirror config (i.e. 2x2). Due to an oversight on my part (2U is hard to fit even 90-degree cables into), I also picked up an Adaptec AEC-82885T expander card to work with my LSI 9305-24i, and threw in a Mellanox ConnectX-4 (ended up ditching the 10G Intel NIC because... why not) to get 25G working (I already had the networking available to do so).

Now I can just slowly expand the server over time with additional HDDs in mirrors, like you mentioned.

Appreciate all the help (and inspiration) NickF!
 