Noob looking for some direction with design choices

Albotross

Cadet
Joined
Jun 15, 2022
Messages
4
I am new to TrueNAS and the forums, and I'm planning a build later in the year. I'm starting to plan now so I can budget, and I'm looking for some direction. I have spent hours looking for answers to some of these questions that make sense to me, but so much of the information I read seems to be dated, in some cases by years, so I expect that many of those answers are no longer accurate given all the recent updates to TrueNAS. Any/all responses are greatly appreciated. So here goes:

My plan:
I'm looking for an expandable system that will serve 3 primary functions:
1) Media/File Server (just a share drive as my primary media consumption will be at home on a wired network, so no Plex server or internet streaming necessary)
2) Backup target
3) iSCSI target for 2-3 (desktop OS) VMs (the VMs will be hosted on another computer via a 10Gb network)
4) REDUNDANCY & DATA SAFETY are my primary concern, followed closely by expandability. Cost is at the bottom of my priorities, BUT I am cost conscious. Starting with a larger number of 1-2TB drives and upgrading them later is good: the system complexity for future expansion is designed in from the beginning, but initial costs are lower due to the smaller HDD size. A smaller number of larger HDDs is not as good: as capacity expands, each increment may be significantly more expensive due to the need to purchase larger quantities of larger HDDs at once, and the design may limit what each expansion looks like. So cost is a consideration, but I'll spend more now if it means spending less later (even if the long-run TCO is probably higher) OR it buys an easier upgrade path later.

My first questions are about VDEV/Pool layout

1) How are hot spares allocated across the VDEVs and the pool? Specifically, if I have (for example) a 16-bay chassis (not including boot) and I'm considering 2x 8-drive groups, I'd have 2x 7-drive Z3 VDEVs and 2 hot spares. Will that be 1 hot spare per VDEV (each VDEV has access to only 1 hot spare), or 2 hot spares available to the pool (so either VDEV has access to both hot spares as needed)?

2) What are the current recommendations about the number of drives per VDEV? Does this change if I'm using 2TB drives vs 4TB drives vs 10TB drives, etc.? I know more drives per VDEV will be more cost efficient per TB of usable space, but I'm looking for stability/reliability first, then cost, in a 70 (stability) / 30 (cost) balance:

A) 7 drives per VDEV + a hot spare (so 8 drives per group) (Chassis with multiples of 8 drives)
B) 11 drives per VDEV + a hot spare (so 12 drives per group) (chassis with multiples of 12 drives)
C) 13-14 drives per VDEV + 1-2 hot spares (so 15 drives per group) (chassis with multiples of 15 drives)

3) To increase capacity (assuming available HBA ports and empty drive slots exist), which is better for capacity expansion? Again, cost is a lower priority, but I don't want to waste money for no reason:

A) Add a VDEV first (cheaper option)
B) Swap out HDDs to a higher capacity first (seems to be more expensive sooner)

C) If swapping HDDs to a higher capacity (assume 2x Z3 VDEVs striped), is the upgrade 1 drive at a time (for the pool), or 1 drive per VDEV at a time (so 2 drives could be swapped out at once as long as it's 1 drive per VDEV)?

4) Any thoughts (besides hardware failure rate and cost) as to the pros/cons of 1 larger chassis (e.g. Supermicro 24/36/45/60/90 bays) vs several 12-bay (or similar) JBODs? I'm considering either:

A) Larger Supermicro chassis (45-90 bays - multiple 15-drive groups) - PROS: higher physical density, fewer control boards to fail. CONS: limited to 6Gbps backplanes, because I don't intend to spend $10k on a new(er) one
B) Multiple Supermicro CSE-801L (12-drive groups - 1 per chassis), Chenbro RM25324 (multiple 12-drive groups) or AIC RSC-2MS (multiple 8-drive groups) with the necessary Supermicro CSE-PTJBOD-CB (control) & Intel RES3TV (expander) cards. PROS: lower startup cost (I can add as I need), potential for 12Gbps for less cost. CONS: higher cost to expand because I have to purchase the chassis/control/expander as well as the drives, and more devices = higher potential failure rate

5) If building my own JBOD(s), is there a recommendation between the version 1 and version 3 Supermicro JBOD control boards, CSE-PTJBOD-CB1 vs CSE-PTJBOD-CB3? As best I can tell, the only real advantage of the v3 board is fan speed control via IPMI (which IS significant). Are there other advantages/disadvantages to either board (other than cost and availability)?

Last question (for the moment):

6) Does anyone have any experience with something along the lines of the following hardware for a pool? I know HighPoint HBAs have previously been considered less reliable (drivers/support, etc.), but I haven't found much regarding the newer NVMe cards in TrueNAS. SO, the following SEEMS like a spectacular idea, but can someone speak to it being a good idea or a bad idea and why? If it's a bad idea, is there a better option for this level of performance:

HighPoint SSD7580 (A or B - is one better than the other, aside from hot swap, which I don't need?)
IcyDock MB873MP-B
8x 1TB NVMe drives (I only need 2-3TB usable, but I could always upgrade to 2TB sticks if I needed more)
Should this be 4x 2-drive (mirrored) then striped VDEVs (i.e. RAID 10) for the pool, or 1x 8-drive Z3 for a single-VDEV pool?
Consideration should be for redundancy. RAID 10 (for the sake of conversation) means that up to 4 drives could fail, BUT only 1 per VDEV, vs any 3 drives in the Z3 option (Z3 seems like the better choice for redundancy, but remember, I'm looking for recommendations). A rough sketch of both layouts is below.
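For clarity, the two layouts I'm weighing would look roughly like this from the command line (device names are placeholders, and I'd actually build the pool through the GUI; this is just to show the shape of each option):

Code:
# Option 1: striped mirrors ("RAID 10") - four 2-way mirror VDEVs, ~4TB usable with 1TB sticks
zpool create flash mirror nvd0 nvd1 mirror nvd2 nvd3 mirror nvd4 nvd5 mirror nvd6 nvd7

# Option 2: one 8-wide RAIDZ3 VDEV, ~5TB usable, any 3 drives can fail
zpool create flash raidz3 nvd0 nvd1 nvd2 nvd3 nvd4 nvd5 nvd6 nvd7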

Thanks in advance
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
1) How are hot spares allocated across the VDEVs and the pool? Specifically, if I have (for example) a 16-bay chassis (not including boot) and I'm considering 2x 8-drive groups, I'd have 2x 7-drive Z3 VDEVs and 2 hot spares. Will that be 1 hot spare per VDEV (each VDEV has access to only 1 hot spare), or 2 hot spares available to the pool (so either VDEV has access to both hot spares as needed)?
Generally speaking, for RAIDZ2 or RAIDZ3 I wouldn't recommend hot spares at all unless your case involves difficult physical access due to distance (where it may take you some time to get around to replacing a failed drive manually).

Hot spares are allocated to a pool and can act in any of the VDEVs of that pool. To me they make sense when using mirrored VDEVs (since you're without redundancy the moment any drive fails).
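To illustrate the pool-wide behaviour (hypothetical device names, and on TrueNAS you'd normally do this from the GUI rather than the shell): spares are attached to the pool and sit in their own section of zpool status, available to whichever VDEV loses a drive first.

Code:
# add two hot spares to the pool; either one can step in for a failure in any VDEV
zpool add tank spare da14 da15
zpool status tank    # the spares appear under a separate "spares" heading, not under a VDEV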

2) What are the current recommendations about the number of drives per VDEV? Does this change if I'm using 2TB drives vs 4TB drives vs 10TB drives, etc.? I know more drives per VDEV will be more cost efficient per TB of usable space, but I'm looking for stability/reliability first, then cost, in a 70 (stability) / 30 (cost) balance
Once you're in the TB range, it doesn't change a lot... drive count in the VDEV (width) is important to keep to a reasonable number, and there are some "sweet spots" in terms of aligning with block allocation, but you shouldn't really worry about that as long as you're in the range of 4-12 wide.
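So with your 16 bays and no hot spares, a perfectly reasonable shape would be two 8-wide RAIDZ3 VDEVs in one pool. Purely as a sketch (made-up device names; the GUI builds the same thing):

Code:
zpool create tank \
    raidz3 da0 da1 da2 da3 da4 da5 da6 da7 \
    raidz3 da8 da9 da10 da11 da12 da13 da14 da15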

3) To increase capacity (assuming available HBA ports and empty drive slots exist), which is better for capacity expansion? Again, cost is a lower priority, but I don't want to waste money for no reason:

A) Add a VDEV first (cheaper option)
B) Swap out HDDs to a higher capacity first (seems to be more expensive sooner)
Adding a VDEV costs you the redundancy overhead (3 disks for Z3) each time.

Increasing disk size costs you nothing in additional redundancy and potentially gives you back all the old disks for use elsewhere.
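For reference, path A is a single operation that stripes another VDEV into the existing pool, something like this (illustrative device names; again, the GUI does the same thing under the hood):

Code:
# grow the pool by adding another RAIDZ3 VDEV - 3 of the new disks go to parity
zpool add tank raidz3 da16 da17 da18 da19 da20 da21 da22 da23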

C) If swapping HDDs to a higher capacity (assume 2x Z3 VDEVs striped), is the upgrade 1 drive at a time (for the pool), or 1 drive per VDEV at a time (so 2 drives could be swapped out at once as long as it's 1 drive per VDEV)?
I've recently seen a thread showing some strange things going on when replacing 2 drives in the same pool in different VDEVs, seeming to show that maybe only one was actually resilvering at a time.

Theory says it can do them in parallel.

It will take longer the more full your pool already is when you start. And the next round will take longer still with larger drives (and presumably more data).
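Path B is the replace-and-resilver cycle; the usual approach is one new drive per VDEV at a time, waiting for each resilver to finish, and once every member of a VDEV is the larger size the extra space shows up. A rough sketch (made-up device names):

Code:
zpool set autoexpand=on tank    # let the pool grow once a whole VDEV has been upgraded
zpool replace tank da0 da16     # larger drive takes over da0's slot and resilvers
zpool replace tank da8 da17     # one drive in the other VDEV - in theory these resilver in parallel
zpool status tank               # watch progress; wait for completion before the next pair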

6) Does anyone have any experience with something along the lines of the following hardware for a pool? I know HighPoint HBAs have previously been considered less reliable (drivers/support, etc.)
Stick to LSI-chipped HBAs that are flashed to IT mode. The millions of production hours on those drivers can't be equaled, and you're asking for trouble with anything else. It's something you'll need to accept if you want to go with FreeBSD and ZFS (and don't want to lose your data).
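If you want to sanity-check a card you already have (or buy second hand), the LSI flashing utility will report what's on it; roughly like this, where sas3flash covers the 9300 class and sas2flash the older 9211 class (exact output wording varies by version):

Code:
# list detected LSI controllers; you want the IT (initiator-target) firmware, not IR/MegaRAID
sas3flash -listall
sas3flash -list -c 0    # firmware/BIOS details for controller 0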
 

Albotross

Cadet
Joined
Jun 15, 2022
Messages
4
6) Does anyone have any experience with something along the lines of the following hardware for a pool? I know HighPoint HBAs have previously been considered less reliable (drivers/support, etc.)
Stick to LSI-chipped HBAs that are flashed to IT mode. The millions of production hours on those drivers can't be equaled, and you're asking for trouble with anything else. It's something you'll need to accept if you want to go with FreeBSD and ZFS (and don't want to lose your data).


So do you have a recommendation for a redundant NVMe pool? Based on your response, I would think the only real option is a Broadcom Tri-Mode card with U.2 or U.3 drives, since I'm not aware of a good way to connect multiple M.2 cards that would use an LSI-based controller. Is that correct, or is there a better option?
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
...

So do you have a recommendation for a redundant NVMe pool? Based on your response, I would think the only real option is a Broadcom Tri-Mode card with U.2 or U.3 drives, since I'm not aware of a good way to connect multiple M.2 cards that would use an LSI-based controller. Is that correct, or is there a better option?
NVMe drives are different. As long as you have free NVMe or PCIe slots, you are good. The reliability issue is with add-on SATA or SAS cards. Also, any built-in Intel SATA port is generally okay.

However, there are some caveats with NVMe drives on PCIe carrier boards. Some of the carrier boards (the cheap ones) require bifurcation of the PCIe slot when using more than one NVMe drive on the board. Some boards support 4x NVMe in a x16 PCIe slot, BUT require the CPU & BIOS to allow dividing that specific x16 slot into 4 groups of x4 lanes. Then the quad-NVMe x16 PCIe card works fine.

Other (more expensive) NVMe carrier boards have a PCIe switch. Then it does not matter what size the PCIe slot is (except for performance reasons), or how many NVMe drives your carrier board has: 1, 2, 3 or 4.

In the case of the Broadcom Tri-Mode cards, they were mostly designed to support enterprise servers or disk arrays. Thus, a single disk slot could support either SATA, SAS or NVMe drives (depending on the slot configuration).
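One practical upside of skipping the HBA entirely: once the slot (bifurcated or switched) presents the drives, the OS talks NVMe natively and ZFS just sees ordinary disks. On the FreeBSD based TrueNAS Core, for example (device names will vary; this is just roughly what it looks like):

Code:
nvmecontrol devlist           # each drive shows up as nvme0, nvme1, ... with a matching nvd disk
nvmecontrol identify nvme0    # model, firmware and namespace details for the first drive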
 

Albotross

Cadet
Joined
Jun 15, 2022
Messages
4
3) To increase capacity (assuming available HBA ports and empty drive slots exist), which is better for capacity expansion? Again, cost is a lower priority, but I don't want to waste money for no reason:

A) Add a VDEV first (cheaper option)
B) Swap out HDDs to a higher capacity first (seems to be more expensive sooner)
Adding a VDEV costs you the redundancy overhead (3 disks for Z3) each time.

Increasing disk size costs you nothing in additional redundancy and potentially gives you back all the old disks for use elsewhere.

So, if I'm interpreting what you said here correctly:
1) Adding a VDEV = Faster incorporation into the pool and normal capacity considerations given the parity drives, vs
2) Increasing drive size = 1 drive at a time upgrade, so it could take a while

But otherwise, either works with no real preference as to which might be better (other than my need to get it done faster or not and/or immediate cost concern at the time of starting the upgrade)

It almost sounds like copying the data off, destroying the pool, swapping all the HDDs, rebuilding the pool, then copying everything back makes more sense (assuming you have the capacity off-server to do this)... Or better yet, 2 mirrored servers, so you can take 1 offline to perform the upgrade of all drives, then swap and do the same to the 2nd server.
NVMe drives are different. As long as you have free NVMe or PCIe slots, you are good. The reliability issue is with add-on SATA or SAS cards. Also, any built-in Intel SATA port is generally okay.

However, there are some caveats with NVMe drives on PCIe carrier boards. Some of the carrier boards (the cheap ones) require bifurcation of the PCIe slot when using more than one NVMe drive on the board. Some boards support 4x NVMe in a x16 PCIe slot, BUT require the CPU & BIOS to allow dividing that specific x16 slot into 4 groups of x4 lanes. Then the quad-NVMe x16 PCIe card works fine.

Other (more expensive) NVMe carrier boards have a PCIe switch. Then it does not matter what size the PCIe slot is (except for performance reasons), or how many NVMe drives your carrier board has: 1, 2, 3 or 4.

In the case of the Broadcom Tri-Mode cards, they were mostly designed to support enterprise servers or disk arrays. Thus, a single disk slot could support either SATA, SAS or NVMe drives (depending on the slot configuration).
1st off, thank you for answering.

As for the response, I understand the connection methods/limits. I was hoping to use a drive bay like IcyDock's MB873MP-B unit, but their documentation states that it is incompatible with the Tri-Mode cards due to a pinout difference on the OCuLink connector, and says the compatible PCIe controller should be HighPoint's SSD7580. Of course, when I asked for comments about that card, the response was "...LSI-chipped cards only is your best bet...". So, that being the case, I am looking for a recommendation for the cheapest way to add 8 NVMe drives to a single PCIe 4.0 x16 slot. Using M.2 sticks in the above-mentioned drive caddy is certainly much cheaper and takes up less space in the server chassis (1x 5.25" drive bay) than using U.2 drives. It also works around the limit of only 1 available PCIe slot (so multiple 4x M.2 carrier boards seem to be the wrong direction).

Here is the overall goal:

Motherboard: Supermicro X11 or X12 SDV-4C (limited to 1x x16 slot and 1x x8 slot)
Pool & HBA 1: Broadcom/LSI 9300/9400/9500 series card (PCIe x8) to 1 or more JBODs for my bulk data storage
NVMe pool (proposed): HighPoint SSD7580 connected to an IcyDock MB873MP-B with 8x 1TB M.2 sticks in 4x mirrored pairs

The pros for this setup are: expandable HDD storage in pool 1, plus 4TB of NVMe (or 5TB via Z3, but that seems like a step back in performance) that can grow to 8TB (by replacing the drives with 2TB units).
The cons are: limited to 8TB of NVMe without getting stupidly creative & expensive (PCIe expansion backplanes, or 8 or 16TB enterprise SSDs, etc.), and the "...don't use the HighPoint HBA in TrueNAS..." problem.

2 other options for consideration are
1) I could step up to a more conventional Xeon motherboard that offers more PCIe slots, BUT unless I get a massively expensive processor, the extra PCIe slots aren't going to provide the additional PCIe lanes necessary to expand. So the 1-slot option - if I can overcome connecting multiple NVMe drives to it - seems the cheapest option; and
2) I could consider a more readily available NVMe-only chassis (so I'm utilizing the built-in NVMe backplane(s) rather than trying to build my own) as my head unit (e.g. Supermicro SSG-110P-NTR10), but this significantly raises my costs and limits my expandability options.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
So, if I'm interpreting what you said here correctly:
1) Adding a VDEV = Faster incorporation into the pool and normal capacity considerations given the parity drives, vs
2) Increasing drive size = 1 drive at a time upgrade, so it could take a while

But otherwise, either works with no real preference as to which might be better (other than my need to get it done faster or not and/or immediate cost concern at the time of starting the upgrade)

It almost sounds like copying the data off, destroying the pool, swapping all the HDDs, rebuilding the pool, then copying everything back makes more sense (assuming you have the capacity off-server to do this)... Or better yet, 2 mirrored servers, so you can take 1 offline to perform the upgrade of all drives, then swap and do the same to the 2nd server.
Yes, you seem to understand it all pretty well... destroy and rebuild may be faster if you have the hardware for it.
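If you go that route, the copy off and back is just ZFS replication; a rough outline (pool/dataset names are placeholders, and the replication tasks in the TrueNAS GUI wrap the same mechanism):

Code:
zfs snapshot -r tank@migrate                          # point-in-time copy of the whole pool
zfs send -R tank@migrate | zfs recv -F backup/tank    # or pipe through ssh to the second server
zpool destroy tank                                    # then swap drives, recreate the pool, and send it back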

Regarding the NVMe/HighPoint/architecture thing...

When you're in this kind of territory, you need to consider not only how much NVMe can be squeezed into a single chassis, but also what you actually need out of the system.

If you put 8 NVMe drives good for 8GB/s each in a system and then can't get more than 12GB/s out of it, you're going to be a little upset.

There are many factors which will limit you once you're in that territory, and PCIe lanes, CPU, RAM and network are the biggest ones.

CPU for SMB can be quite problematic when dealing with massive throughput.

Not that I in any way endorse what they do, but LTT had their hands on some very nice kit recently (don't have the link handy, sorry) which ended up having to split the NVMe drives out to smaller servers with 400Gbit of network between them (and a custom filesystem) in order to allocate sufficient RAM and CPU to each bundle of NVMe drives to make it work.

I suspect your suggestion to go with more "off the rack" hardware would likely produce better results... maybe even save yourself the headaches and just buy a certified system straight from iX and let performance issues be their problem. They probably aren't as crazy expensive as you think, and compared to what's on your shopping list, maybe even on par.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
HighPoint SSD7580 (A or B - is one better than the other, aside from hot swap, which I don't need?)
Also, I guess you saw that you would need 2 of those for each 8 NVMe drives... they only have 4 ports each and the IcyDock has 8. ... EDIT: maybe they are offering a cable that splits each port out to 2... not sure if that really works for full speed on each of the 8 NVMe drives though... still only 16 PCIe lanes on the card.

Setting aside that there's probably no driver for it anyway.
 
Last edited:

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
@Albotross - You seem to have a handle on your NVMe drives. We get a mixed bag here in the TrueNAS forums, so I did not know.

As for 8 U.2 slots (with M.2 NVMe drive carriers), that is a bit beyond me. I looked at it for my last TrueNAS build and came to the conclusion that it was not needed.

Without doing the research, I can't help you on your NVMe controller and drive cage selection. Though I can make one comment: LSI Tri-Mode controllers with NVMe kinda expect a SAS expander that supports NVMe / U.2 slots. Otherwise, you would be limited to 1 or 2 U.2 / M.2 drives per controller (4-lane or 8-lane on the controller).
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
A few comments, based on 'reading between the lines' of your questions.

a) RAIDZ2 vdev sizing:
The general trade-offs to keep in mind, some of which you've identified, are cost vs reliability vs speed.
The logic is basically: wider vdevs tend to be slower [in themselves, but particularly compared to the same number of drives in 2 vdevs] but more space efficient. They also run a higher risk during resilvering.
The 'rule of thumb' for width (number of drives in each vdev) tends to be between 6 and 10.
I'd pick a nice multiple that works with your chassis solution, preferably with a drive slot or two left empty - ready for spares (so you can resilver with the old drive still in).

b) 2nd NVMe pool.

I feel, as others have also pointed out, that you're in territory the forum rarely sees user reports from.
To ensure the smoothest experience, avoid adding complexity that is 'cutting edge' in terms of driver support. I'd avoid that IcyDock brick, and avoid the newer LSIs for NVMe if possible. Simple adapters are fine (requiring bifurcation is fine).

For the NVMe pool, I see you've left the idea of RAIDZ and turned to mirrors - excellent.
Rather than using many smaller drives - as pointed out, performance will most likely bottleneck elsewhere - I suggest maybe just running 2 or 2x2 larger units.
Maybe just a single mirror, with perhaps a slower hot spare, like an SSD. (I've run something similar, but with old laptop rotating rust as the hot spare on multiple mirrors.)

c) Motherboard considerations.
I believe you should prioritize PCIe lanes, and for that matter slots with sufficient bifurcation support.
Basically, I'd look for a dual-CPU E5 board, with fewer cores and a higher clock frequency where possible.
The dual sockets grant some additional lanes to work with, plus support for the significant amount of RAM you could make use of with that many drives.
 
Last edited:

Albotross

Cadet
Joined
Jun 15, 2022
Messages
4
Thank you for the reply.

As to your comment about being disappointed with performance: I say Nay Nay (to quote the late great John Pinette)... you have to understand my motivations (see below).

The purpose is NOT to extract greater than 12GB/s out of the system (to use your example), but rather to:

1) create the same level of redundancy (and/or redundancy options) with an NVMe pool that a more normal spinning-HDD pool has
2) have the same level of expandability
3) while I may not get more than the 12GB/s, make sure that I'm guaranteeing that 12GB/s at all times
4) be able to say I did it
5) have the opportunity to purchase (mostly) completely unnecessary but still very cool toys
6) be the envy of my 2 dogs (at least in my head) and any other computer geeks, oh I mean professionals, that are willing to listen to my pompous gloating

So, given those VERY important reasons stated above, I have to say I'm really disappointed that "... maybe even save yourself the headaches and just buy a certified system straight from iX and let performance issues be their problem...." is a likely worthwhile suggestion... :(
Also, I guess you saw that you would need 2 of those for each 8 NVMe drives... they only have 4 ports each and the IcyDock has 8. ... EDIT: maybe they are offering a cable that splits each port out to 2... not sure if that really works for full speed on each of the 8 NVMe drives though... still only 16 PCIe lanes on the card.

Setting aside that there's probably no driver for it anyway.

The cables they sell do split out to 2 each, so this is the recommended card for the IcyDock drive bay. No, it won't be full speed to all 8 drives; I don't think there is any way to do that, but again... that's not really the point. The point is: can 8 NVMe drives be used in a VDEV, and can it be done for way less than the $7-8k starting point of an NVMe-only chassis like Supermicro's SSG-110P-NTR10?

There are other ways to accomplish what I want, but mostly they involve U.2/U.3 drives, which take up way more space (conventional drive bay plus drive caddy) than the IcyDock 1x 5.25" to 8x M.2 unit I was hoping to use. I'll keep looking and dreaming. Of course, when it comes time to actually spend the money (hopefully a little later this year), I'll probably end up with 4x 2TB in a striped/mirrored configuration on a single PCIe 4.0 card, because that will be way cheaper and way less aggravating than what I had hoped for. While I'm less concerned about the cost, there comes a point where I could be ridiculous, and I won't end up there when push comes to shove.

Thanks again for taking the time to look and see if there is a practical way to indulge my fantasy.
 