Design considerations for 25GbE saturation - Veeam - inherited nightmare

DemohFoxfire

Dabbler
Joined
May 2, 2023
Messages
11
I'm 4 months into inheriting a small datacenter (~4 racks) running mostly HP G6 / G7 24-bay servers over 10Gb iSCSI, with each FreeNAS/TrueNAS server running all of its drives in a single RAIDZ2. These are iSCSI targets for ESXi, and the drives are a 50/50 spread of 2.5" SAS disks and consumer-level 2.5" SSDs. I'm not planning on touching that right now. The 5 or so Veeam servers are all Windows servers with individual single disks (not even JBOD) as backup repositories. I'm writing the entire setup off as a total loss and going to rebuild from scratch over the next 18 months.

I've been around FreeNAS/TrueNAS: I've serviced, repaired, and replaced a few dozen of these 24-bay units, but I've always been handed the hardware ("this is what you get"), so I was never on the spec or tuning side of things. I'd call myself a bit inexperienced there.

The Goal:
A 50TB+ server that can get close to or exceed 25GbE, expandable to 100+TB.
The ability to receive multiple backup jobs at once. The current company method is backup chains 20+ long across multiple servers *crying here*.
(Future) a 2nd server as a backup copy destination for redundancy / different retention policies.
A modular design - I'll likely need to deploy many more than just 2, but this setup would have the largest single storage capacity.

I've been reading a lot about optimizing for 10GbE and faster speeds, optimizing ZFS, the network, etc., but I think I need to take a step back and get some fundamentals down. Instead of emulating the current design with RAIDZ2, maybe I should just be using mirrors?

I was thinking of using an HPE Apollo 4200 LFF chassis as a base, with boot and any other TrueNAS-specific drives populated in the rear bays as mirrored sets. My end goal is to fully populate the unit with 24x 3.5" SAS drives of whatever size, but since I don't need 100+ TB right now I'm hoping to start small, maybe with 8 drives. I'm not married to the Apollo; I have plenty of rack space, I just would like to have everything consolidated (more drives in fewer servers) rather than being limited by how many drives we can jam into a single chassis. The alternative is using DL380s exclusively with 2U expansion shelves hung off additional controllers via external SAS cables.

Is RAIDZ2 fine?
Should I end up with multiple pools / vdevs, i.e. a single 24-drive pool vs. 3x 8-drive pools?
What does capacity expansion look like? I.e., if I start with an 8-drive pool and add another 8 drives, is it best to add them to the existing pool or create a 2nd pool? (Rough capacity math is sketched below.)
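
For context, here's the back-of-the-envelope math I'm working from — purely a sketch, with a placeholder 12 TB drive size and no accounting for ZFS overhead or free-space headroom, just to show the capacity-vs-vdev-count trade-off:

```python
# Back-of-the-envelope only: placeholder 12 TB drives, raw data-disk
# capacity, no ZFS metadata/padding/slop or free-space headroom.
DRIVE_TB = 12

layouts = {
    "1 x 24-wide RAIDZ2": {"vdevs": 1,  "width": 24, "redundant": 2},
    "3 x 8-wide RAIDZ2":  {"vdevs": 3,  "width": 8,  "redundant": 2},
    "4 x 6-wide RAIDZ2":  {"vdevs": 4,  "width": 6,  "redundant": 2},
    "12 x 2-way mirrors": {"vdevs": 12, "width": 2,  "redundant": 1},
}

for name, lay in layouts.items():
    usable_tb = lay["vdevs"] * (lay["width"] - lay["redundant"]) * DRIVE_TB
    # More vdevs = more parallel I/O; fewer data disks = less usable space.
    print(f"{name:20s} ~{usable_tb:3d} TB usable across {lay['vdevs']:2d} vdev(s)")
```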

Any direction would be appreciated so I can potentially cut down weeks of reading / research between all the other fires I am putting out. I figure I will play around with the backup infrastructure before attempting a 24x NVMe build for the actual production data, so my mistakes aren't as expensive.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680


 

DemohFoxfire

Dabbler
Joined
May 2, 2023
Messages
11
Thanks, that definitely corrects some of my initial misconceptions, and I might need to re-read it a few more times for it to fully sink in; a good start, and it gave me a good laugh at my own expense. I also read through over a dozen threads by a few users running similar setups, and it solidified how my predecessor truly had no idea what he was doing when he created these TrueNAS servers.

The iSCSI storage servers that host the VMFS volumes probably fall into the "not good at all" category with 24-wide RAIDZ2... I'll tackle those after backups. No wonder performance was so abysmal...

I found @blanchet 's hardened repository writeup, which is definitely going to be implemented in some capacity. Oddly, I cannot find it in the handbook (the link in the forum post is broken), and I have a lot of handbook reading to do now that I've (re)discovered it.

My second-round thinking is now leaning more towards ditching RAIDZ2 vdevs, or at least limiting their width and adding space via additional vdevs as needed, plus a much harder look at Supermicro offerings.

To confirm that I understood correctly: if I started out with 8 drives as 4 mirrors in the pool, performance would be linked to available free space, and to achieve higher speeds I would add drives 2 at a time as mirrored sets to the pool. Any performance gain wouldn't be immediate; it would increase over time as vdev usage levels out. Is this performance related to the increase in free space, or to spreading the writes out over the additional vdevs? I would simply be adding vdevs as needed for both performance and capacity, and could theoretically end up with 1 pool of 48 drives configured as 24 mirrored sets.

Would this hold true with RAIDZ2 vdevs, assuming 6 drives per vdev, where I increase the number of vdevs as storage / performance needs dictate? Theoretically ending up with 1 pool of 48 drives configured as 8x 6-drive RAIDZ2 vdevs.

Since these are backups, and it was the norm to "lose" an entire node before I took over, I am not too concerned with losing the pool in a catastrophic failure. This will be 1 of 3 onsite copies of the data, and I would probably have a slower FreeNAS box built from repurposed hardware as the final copy in the event of a pool loss.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
"not good at all" category with 24-wide RaidZ2....

Ow.

I found @blanchet 's hardened repository writeup, which is definitely going to be implemented in some capacity. Oddly, I cannot find it in the handbook (the link in the forum post is broken), and I have a lot of handbook reading to do now that I've (re)discovered it.

Feel free to use the "Report" button under a post with important broken links. Unfortunately there's a tendency amongst the younger generation of webdevs not to care about things such as maintaining links over the long term. This means that we here in the forums have to make up for that by manually repairing these as they are found. You are probably looking for


performance would be linked to available free space, and to achieve higher speeds

Write performance is linked to fragmentation which is in turn related to available free space. Keep lots of free space on your pool for fast write speeds. If the data you write is fragmented (because it's a copy-on-write filesystem), read speeds may be slower due to seeks, so lots of free space at the time of writing also translates to better read speeds when retrieving. But "lots" of free space really means "use less than half your space" if you really want it to perform well.
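
Put as bare arithmetic against the capacity targets earlier in the thread (nothing ZFS-specific here, just the "use less than half your space" rule applied to 50 and 100 TB of backups):

```python
# "Keep lots of free space" as arithmetic: usable capacity to provision
# so the backup data stays at or under a target fill level.
def usable_needed(data_tb: float, max_fill: float = 0.5) -> float:
    return data_tb / max_fill

for target_tb in (50, 100):
    print(f"{target_tb:3d} TB of backups -> provision ~{usable_needed(target_tb):.0f} TB "
          f"usable to stay under 50% full")
```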

add drives 2 at a time as mirrored sets to the pool, and any performance gain wouldn't be immediate; it would increase over time as vdev usage levels out.

Actually, what will happen is that ZFS will suddenly start heavily preferring the new vdev, which has lots of free space. Writes will be pretty fast, but they will be highly focused on that new vdev. For something like a backup repo, this will slowly level out over time as old backups are cleared out.
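
A toy illustration of that bias (this is not the real ZFS metaslab allocator, just free-space-proportional weighting with made-up pool numbers, to show why a freshly added vdev soaks up most new writes at first):

```python
# Toy model only -- ZFS allocates per-metaslab with its own biasing,
# but weighting by free space shows the general shape of the behaviour.
free_tb = {
    "mirror-0": 3.0,          # three existing mirrors, mostly full
    "mirror-1": 3.0,
    "mirror-2": 3.0,
    "mirror-3 (new)": 12.0,   # freshly added, empty vdev
}
total_free = sum(free_tb.values())

for vdev, free in free_tb.items():
    print(f"{vdev:15s} {free:5.1f} TB free -> ~{free / total_free:.0%} of new writes")
```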

Is this performance related to the increase in free space, or to spreading the writes out over the additional vdevs?

"Yes". Writing to sequentially available free space reduces seeks, so increasing free space by adding an empty vdev gives you the opportunity to do lots of contiguous allocations. That will be fast. However, spreading writes out over more vdevs also allows you to write to a bunch of disks in parallel. Think of it this way: if you issue a seek followed by a 4KB write to a HDD, you've written 4KB in a few milliseconds. However, if you issue two dozen seeks to two dozen different drives and then do two dozen writes, you've written 96KB (two dozen times as much) in that same few milliseconds.

I would simply be adding vdevs as needed for both performance and capacity, and could theoretically end up with 1 pool of 48 drives configured as 24 mirrored sets.

That's a fine way to go. You don't even need them to be matched, though that's preferred. The one thing you probably want to do moving forward: if you are buying new drives, plan for excess capacity. Need 20TB? Try to arrange for 35-50TB. Buy large drives as much as possible. Even if they end up getting hit more heavily with I/O, the extra free space will have a major impact on performance.
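
The seek arithmetic a couple of paragraphs up, written out with ballpark numbers (the seek time and transfer rate below are assumptions for a 7.2K SAS drive, not measurements):

```python
# Rough illustration of why more vdevs help scattered small writes:
# each 4 KB write costs roughly one seek, so throughput scales with
# the number of drives you can keep busy in parallel.
SEEK_MS = 8.0                          # assumed avg seek + rotational latency
XFER_KB_PER_MS = 200 * 1024 / 1000     # assumed ~200 MB/s sequential rate
WRITE_KB = 4

def scattered_write_kb_per_s(drives: int) -> float:
    per_op_ms = SEEK_MS + WRITE_KB / XFER_KB_PER_MS
    return drives * WRITE_KB / per_op_ms * 1000

for drives in (1, 12, 24):
    print(f"{drives:2d} drive(s): ~{scattered_write_kb_per_s(drives):7.0f} KB/s "
          f"for random 4 KB writes")
```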
 

DemohFoxfire

Dabbler
Joined
May 2, 2023
Messages
11
I am almost done decommissioning one of those "25-wide RAIDZ2" pools; it was actually a failed mainboard and USB boot device, so I just ran the SAS cables to one of my potential new servers. Import the pool, add iSCSI, connect ESXi and force ESXi to mount it (long story), and confirm the data is intact. Pulling a Veeam backup of all VMs puts me back in the ATA133 days: 5.5TB @ 40MB/s. Luckily this is just historical data, but my coworker compares us to the Brotherhood of Steel in our effort to preserve all data. 16 years and we haven't lost client data; not about to start now.

Anyway, this brings me to the hardware. What I've found in the catacombs is 2 unused Supermicro servers that I would like to reuse for backups and interim storage for migration tasks as I jockey data off the remaining 8 or so HP 25-bay iSCSI SANs. There's an SC826 12x LFF chassis with an X9DRI-LN4F+ and 2x E5-2600 v2 series CPUs, and an SC846 24x LFF with some Asus server board with dual Opteron 6000 series CPUs. I'd want to throw the Opteron board out due to age, even though it functions fine, as I don't think it has any business in a semi-production environment.

If I load the Supermicro board up with DDR3 RDIMMs totaling 256GB (it doesn't support memory sparing, boo...), would this board (running 2x E5-2690 v2) be sufficient, or would it leave much to be desired? I'd like to use the hardware on hand, so I am taking a serious look at using these Supermicro units. I'd like to run either 5x 6-wide RAIDZ2 or 4x 8-wide RAIDZ2, and potentially add a PCIe P4600 for SLOG. Boot will be some enterprise SSD with PLP; I have a small hoard of HPE M.2 drives I use when deploying VM hosts, but I would need to get an adapter.

The 846 has the TQ backplane, which is SAS version agnostic, but I have to either use 3x 9211-8i cards (I have 5 of these on hand) and move over or supply a mainboard for this chassis, run a SAS expander, or replace the backplane. The 826 has the SAS2 backplane, so that's easy. Either way I would need to turn one of the chassis into a JBOD, and I will likely do it via one of the CSE-PTJBOD-CB3 units.

My questions (I'll list the current idea at the end):
Any concerns running a JBOD where I would be forced to have the pool spread out? I think the consensus is not to split vdevs across enclosures, but that won't stop some catastrophic damage if the JBOD goes offline. Or does it? How are write operations stopped / pool integrity maintained in the event of an enclosure failure?

Does it matter which enclosure I use as the JBOD? The 12-bay is ready to go, as it doesn't need a backplane or SAS expander, but the 24-bay would need the mainboard moved over.

Should I run 3-4 9211-8i cards, or grab something with more ports altogether and PCIe 3.0, or something SAS3008-based with an expander? My current mix of cards is about 50/50 LSI/IBM, but all flashed identically. I was looking at the HPE 876907-001.

I was going to run an XXV710 for networking, but all of the VM hosts run FlexibleLOM adapters which are all ConnectX-4 based. I figure this doesn't matter, but I intend my future storage servers to be HPE, so I can see that staying Mellanox across the board might be beneficial. Any thoughts on this?

With that RAIDZ config (12TB SAS3 4K drives) I know I probably can't saturate 25Gb, but this storage isn't meant to be for datastores. It's meant to be a Veeam repository. I MIGHT use it for file serving, but I doubt it. I will use it for temporary storage when I am doing other operations elsewhere in the datacenter, like an interim datastore while replacing another one of those hacked-together 25-bay monstrosities I inherited. I was going to go with the defaults, but it sounds like I should increase the record size a bit since I think there will be a lot of sequential writes.
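
To sanity-check that, here's a best-case estimate of what the spindles could feed the wire on big sequential writes. The per-drive throughput is an assumed ballpark, and this ignores fragmentation, small-block I/O, and controller/expander limits, so real numbers will land lower. (A larger recordsize such as 1M is often suggested for datasets full of large backup files, but I'd treat that as something to benchmark rather than a given.)

```python
# Best-case sequential estimate vs. 25GbE line rate. DRIVE_MB_S is an
# assumed ballpark for 12 TB 7.2K SAS drives, not a measurement.
DRIVE_MB_S = 180
LINE_25GBE_MB_S = 25_000 / 8          # ~3125 MB/s, ignoring protocol overhead

layouts = {
    "5 x 6-wide RAIDZ2 (20 data disks)": 5 * (6 - 2),
    "4 x 8-wide RAIDZ2 (24 data disks)": 4 * (8 - 2),
}

for name, data_disks in layouts.items():
    est_mb_s = data_disks * DRIVE_MB_S
    print(f"{name}: ~{est_mb_s} MB/s vs ~{LINE_25GBE_MB_S:.0f} MB/s line rate "
          f"({est_mb_s / LINE_25GBE_MB_S:.1f}x, best case)")
```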

How does TrueNAS handle memory sparing when it comes to Supermicro? I'm used to HPE and ESXi and like how that works (but I haven't had any failures yet, so it's untested), so I want to know if I should swap out the mainboard for something newer that supports sparing. What about memory size? If I swap, I would try to go DDR4. And finally, is there enough processing power for parity / resilvering on the existing board running E5-2690 v2s? Is there benefit / motivation to pursue this level of data integrity, or should I just run with what I have? My choices with the current board are 256GB @ 1600MHz or 384GB @ 1066MHz, no sparing.

What I am thinking of doing is running the existing board in the 4U, using a 9311-8i to the HP 36-port SAS3 expander, leaving the backplanes in both chassis, using 6x breakout cables in the 4U chassis (with sideband), and running the 7th over SAS2 to the 2U-turned-JBOD for the remaining 12 drives. I can't decide between 5x 6-wide RAIDZ2, with 2 of the vdevs residing on the JBOD and the hot spares in the main chassis, or 4x 8-wide with the 4 hot spares in the JBOD.
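
Rough usable-space math on those two options across the 36 bays, again assuming 12 TB drives and counting data disks only (before ZFS overhead and the free-space headroom discussed above):

```python
# Usable capacity of the two candidate layouts across 36 bays
# (24-bay chassis + 12-bay JBOD). 12 TB drives assumed; data-disk
# capacity only, before ZFS overhead and free-space headroom.
DRIVE_TB = 12
TOTAL_BAYS = 24 + 12

options = {
    "5 x 6-wide RAIDZ2": {"vdevs": 5, "width": 6},
    "4 x 8-wide RAIDZ2": {"vdevs": 4, "width": 8},
}

for name, opt in options.items():
    drives = opt["vdevs"] * opt["width"]
    spares = TOTAL_BAYS - drives
    usable_tb = opt["vdevs"] * (opt["width"] - 2) * DRIVE_TB   # RAIDZ2: 2 parity per vdev
    print(f"{name}: {drives} pool drives, {spares} bays left for hot spares, "
          f"~{usable_tb} TB usable")
```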

I'd love to see thoughts from anybody who uses FreeNAS for Veeam, and what their setups and experiences are.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
(it doesn't support memory sparing, boo...)

I thought the X9DR boards did support this. I don't have one handy to check though. Check the BIOS, the option is probably a bit buried.

846 has the TQ backplane, which is SAS version agnostic

Not SAS version agnostic. Known to do 3Gbps and 6Gbps SAS fine. Does not support 12Gbps or faster.

Any concerns running a JBOD where I would be forced to have the pool spread out? I think the consensus is not to split vdevs across enclosures, but that won't stop some catastrophic damage if the JBOD goes offline. Or does it? How are write operations stopped / pool integrity maintained in the event of an enclosure failure?

You really do want an entire vdev to go offline all at once, at least if the goal is to suspend pool operations. A vanished vdev can be brought back online later (export/import pool) and because no new txg's were written, it is expected to be recoverable.

I was going to run an XXV710 for networking, but all of the VM hosts run FlexibleLOM adapters which are all ConnectX-4 based. I figure this doesn't matter, but I intend my future storage servers to be HPE, so I can see that staying Mellanox across the board might be beneficial. Any thoughts on this?

Mellanox cards are not recommended, but if you already have them, you can certainly try them out. Worst that could happen would be they don't work and you replace them with an XXV710.

I think the remainder of your questions resolve from this and your reaction to this.
 

blanchet

Guru
Joined
Apr 17, 2018
Messages
516
For a Veeam hardened repository, I am now using a bhyve virtual machine running Debian Linux on TrueNAS Core (with only a serial console) to benefit from XFS fast cloning, and I follow the Veeam user guide.

This setup is even better than the one in the guide I wrote a year ago.
 

DemohFoxfire

Dabbler
Joined
May 2, 2023
Messages
11
For a Veeam hardened repository, I am now using a bhyve virtual machine running Debian Linux on TrueNAS Core (with only a serial console) to benefit from XFS fast cloning, and I follow the Veeam user guide.

This setup is even better than the one in the guide I wrote a year ago.
I'll look into that. As for Veeam and my immediate project: what are the specs / vdev config, how many concurrent jobs are you running, and with what kind of throughput? That way I can prep myself before committing to a lot of the hardware.

I'm going to build a testbed with what hardware I do have once a few other critical pieces come in. I won't have the main storage drives yet, but I could probably get a good idea with a fleet of matching older SAS2 drives, which lets me play around with vdev stripe width, record size, failure simulations, etc. Or at least that's my intention. Skipping some iterations by seeing what works for others would be helpful so I can make sure I am headed in the right direction.

@jgreco , yep, memory sparing works on this X9; it just doesn't say so in the manual. Other X9 boards have it in the manual, but not this one. As for the TQ backplane, I won't be heartbroken if the drives don't link at SAS3; I was just going off other user reports I've read where they had drives linking at SAS3.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I won't be heartbroken if the drives don't link at SAS3; I was just going off other user reports I've read where they had drives linking at SAS3.

Negotiating at SAS3 is not the issue; signal integrity is. I'm guessing they're happy to link at SAS3. But what I'd worry about is whether the data is accurately transferred. Supermicro switched to using blue connectors to indicate SAS3 capability:

[image: Supermicro backplane with blue SAS3 connectors]


So if your backplane lacks blue connectors or "SAS3" in the part number, it is not expected to cleanly support 12Gbps. If the older ones did, Supermicro wouldn't have bothered with the new models.
 

blanchet

Guru
Joined
Apr 17, 2018
Messages
516
I have just purchased a TrueNAS X20-HA with a Bronze contract, and it is really more pleasant to use than a do-it-yourself server.
When you manage a datacenter, you already have enough issues to solve every day without also spending your time building your own storage solution.

A TrueNAS R20 is also a good option for Veeam, but having HA is really worth it.
 

DemohFoxfire

Dabbler
Joined
May 2, 2023
Messages
11
... When you manage a datacenter, you already have enough issues to solve every day without also spending your time building your own storage solution. ...
I can see myself saying this in a few years. Except the amount of dopamine on tap from unraveling all of my predecessor's setup makes it hard to let it all go to waste. I'm currently burning in a VM node running 6248Rs to replace a Dell R910 and a few other R?10 series servers. One of these days, once I am "out of the woods" and everything is back to running smoothly.

For the non-TrueNAS stuff we run traditional HPE and Dell arrays. Or rather, we never deploy TrueNAS at a client site, only fully supported solutions.

I can't wait to tackle all of the routing & switching. The current setup is a quarter rack of FortiGate 60Cs and 60Ds as IPsec endpoints. At least those aren't being complained about yet, unlike the VM processing and storage performance. *cringe* ..... But at client sites it's all 60E or 60F with contracts.... What do they say: the mechanic drives a beater, the shoemaker's soles are worn through? The concept holds true for IT.

Do you end up preferring Veeam over iSCSI, SMB, or NFS? All of our Veeam, internally and client-side, is Windows-based, as we just don't have the labor base to support more than a handful of *nix servers. Prior to me taking over, 100% of the Veeam servers used local storage, and for the DC I am working on I am not sure if we will consolidate from half a dozen Veeam servers down to 1, but it is headed in that direction. Sure, that's a whole other post for probably a different forum, but hey, more information is better.
 