Newbie Build

FlyingHacker

Dabbler
Joined
Jun 27, 2022
Messages
39
Should I prefer the Mellanox ConnectX-3 Pro 10GbE SFP+ Adapter or the Intel(R) Gigabit 4P X520/I350 rNDC for this 730xd? The seller swapped in the Intel card rather than the Mellanox, but I can go back to the Mellanox with SFP+ if I prefer. I would kind of rather have that and connect via fiber rather than RJ-45 copper.

Thanks.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
The X520/I350 has two SFP+ ports and two 1GBase-T ports, doesn't it? I'm partial to that one, since you get the 1G ports and Intel is less finicky than Mellanox.
 

FlyingHacker

Dabbler
Joined
Jun 27, 2022
Messages
39
The X520/I350 has two SFP+ ports and two 1GBase-T ports, doesn't it? I'm partial to that one, since you get the 1G ports and Intel is less finicky than Mellanox.
The one I have has two RJ45 10Gig ports and two RJ-45 1Gig ports.

Listed as (two each)
Intel(R) Ethernet 10G 4P X540/I350 rNDC
Intel(R) Gigabit 4P X540/I350 rNDC

I was thinking the fiber runs cooler at the switch than the RJ-45 adapters, and the transceiver for fiber is cheaper. But if the Mellanox will have problems with TrueNAS, I can certainly live with the Intel.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
But if the Mellanox will have problems with TrueNAS, I can certainly live with the Intel.
I didn't quite say that; it was a general comment. I'll let someone with Mellanox experience on TrueNAS chime in.

The one I have has two RJ45 10Gig ports and two RJ-45 1Gig ports.
I'd forgotten there was one of those, too. I have one of the SFP+ units in production at work. Is getting one of those an option? There's also an X710+I350 rNDC which has the newer feature set. There's even a 25GbE ConnectX-4 rNDC for SFP28.
 

FlyingHacker

Dabbler
Joined
Jun 27, 2022
Messages
39
I didn't quite say that; it was a general comment. I'll let someone with Mellanox experience on TrueNAS chime in.


I'd forgotten there was one of those, too. I have one of the SFP+ units in production at work. Is getting one of those an option? There's also an X710+I350 rNDC which has the newer feature set. There's even a 25GbE ConnectX-4 rNDC for SFP28.

It would appear the seller has an Intel c63dv they can offer. Is that the model you were referring to? What type of transceivers do you recommend? I was thinking fiber to the MikroTik switch, but I do have a couple of generic TwinAx cables that I suppose are worth testing. The cable run would be less than 2m.

Thanks for all the insight from everyone.

 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
It would appear the seller has an Intel c63dv they can offer. Is that the model you were referring to?
I meant Dell Part Number 06VDPG.
What type of transceivers do you recommend? I was thinking fiber to the MikroTik switch, but I do have a couple of generic TwinAx cables that I suppose are worth testing. The cable run would be less than 2m.
If you have the cables, no harm in trying them out. Generally speaking, fiber is more compatible across vendors and DACs are sometimes less-than-compatible.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Ah, the X520 with SFP+ ports, yeah, it should work well. The X710 has a more advanced feature set, but that's rarely a big deal.
 

FlyingHacker

Dabbler
Joined
Jun 27, 2022
Messages
39
So I am getting close to done with the four-pattern badblocks write test on all 12 drives. I am now trying to figure out the best RAID layout. Like everyone, I need speed AND reliability ;-)
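For reference, what I'm running on each disk is just the standard destructive badblocks pass, roughly like this (the device name and block size are just what I happened to use, adjust to taste):

# destructive write/verify of the four test patterns (-w), with progress (-s)
# -b 4096 keeps badblocks happy on large drives
badblocks -b 4096 -ws /dev/da0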

For those not wanting to read the entire thread:
I have 12 x 10TB 12Gbps SAS drives on the IT-mode Dell HBA330. The backplane has a built-in expander, if that matters (i.e., 8 SAS lanes expanded to the 12 drives).

10Gbps networking on the server and client machines.

Thoughts on an ideal layout? We deal with both smaller image sequence files (~1-100MB) and large QuickTime files (up to about 10GB).

The NAS will be used by at most 3 concurrent users, more often 1 or 2.

12-drive RAIDZ3
2 x 6-drive RAIDZ2, striped together
2 x 6-drive RAIDZ2, mirrored together
5 x 2-drive mirrors, striped together, with 2 hot spares

Anything else?

We will have all data backed up, but of course bringing a backup live is never a fast process (though I'm seriously looking at a second server that could simply be switched over if the need arose).

Thanks in advance.
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
12-drive RAIDZ3
a bit too wide, minimal performance (1 vdev)
too high a price for the capacity of 9 drives?
2 x 6-drive RAIDZ2, striped together
should be a fair balance of performance and capacity (8 drives); needs cold spares at hand
2 x 6-drive RAIDZ2, mirrored together
not possible with ZFS
5 x 2-drive mirrors, striped together, with 2 hot spares
maximal performance, minimal capacity (5 drives), fastest resilver

With only 1-3 clients, I'd guess that 2*(6-wide Z2) is adequate.
Do you have the opportunity to make tests with realistic workloads before deploying in production?
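For reference, the layouts above are just different vdev groupings in a single pool. The GUI builds this for you, but at the command line it would look roughly like this (pool and disk names are placeholders):

# 2 x 6-wide RAIDZ2, striped
zpool create tank \
  raidz2 da0 da1 da2 da3 da4 da5 \
  raidz2 da6 da7 da8 da9 da10 da11

# 5 x 2-way mirrors plus 2 hot spares
zpool create tank \
  mirror da0 da1 mirror da2 da3 mirror da4 da5 \
  mirror da6 da7 mirror da8 da9 \
  spare da10 da11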
 

FlyingHacker

Dabbler
Joined
Jun 27, 2022
Messages
39
a bit too wide, minimal performance (1 vdev)
too high a price for the capacity of 9 drives?

(Re: 12-drive RAIDZ3) Fair enough. Kind of my thought.

should be a fair balance of performance and capacity (8 drives); needs cold spares at hand
(Re: 2 x 6-drive RAIDZ2) I am leaning towards this. I do have two spare drives beyond the 12 hot drives, and I could grab a couple more if I need to. Since I will likely use the same drives in a backup server, I will probably end up having 3-5 spares.

maximal performance, minimal capacity (5 drives), fastest resilver

(Re: 5 x 2-drive mirrors, striped together) I like the performance, but I'm somewhat concerned about losing the wrong two drives at the same time (or during resilvering). Though I do plan on having a pretty hot backup, so maybe it's not that big of a deal. Testing would be ideal.

With only 1-3 clients, I'd guess that 2*(6-wide Z2) is adequate.
Do you have the opportunity to make tests with realistic workloads before deploying in production?
Yes, I think this is going to be the best thing to do, assuming I have time. Usually something comes up at the last minute and we just have to grab whatever is good enough and go, and then I hate it for years ;-)

I guess I'll try the 2x6 RAIDZ2 and see if I can come anywhere near saturating the network link. At least for reads, I have 256GB of RAM, and the actively used data should usually fit in there.

Thanks again.
 

FlyingHacker

Dabbler
Joined
Jun 27, 2022
Messages
39
Any suggestions for benchmarking? In raw dd tests of a 32GB file (/dev/zero to a file and the file to /dev/null) from the TrueNAS server to its own drives (in the shell), I get:

5 striped mirrored pairs: W: 2820 MB/s, R: 6180 MB/s
(using only 10 drives, with the other two to be hot spares)

2x6 RAIDZ2: W: 3299 MB/s, R: 6449 MB/s
(using all 12 drives)

That seems like the reads are using the ARC, but most of our projects will fit in the ARC anyway. Oh, weird: the dashboard shows the cache only has 0.3GB in it, so that may not be from the cache. Perhaps dd bypasses it.

I plan on testing with our actual workload, but I am not yet set up for 10Gig Ethernet on this machine. With 1Gig it of course saturates the link both ways, no problem.
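For reference, the dd invocations were along these lines (the dataset path and block size are just what I used, shown for illustration):

# write test: 32 GB of zeroes into the pool
dd if=/dev/zero of=/mnt/tank/testfile bs=1M count=32768
# read test: the same file back to /dev/null
dd if=/mnt/tank/testfile of=/dev/null bs=1M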
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
Zeroes compress quite well… If you have to use dd, make your file out of /dev/random.
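Something like this, for example (path and size are placeholders):

dd if=/dev/random of=/mnt/tank/testfile bs=1M count=32768    # incompressible data, so LZ4 can't cheat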
 

FlyingHacker

Dabbler
Joined
Jun 27, 2022
Messages
39
Zeroes compress quite well… If you have to use dd, make your file out of /dev/random.
I knew those numbers were absurd! But man, that would have been great! I'm testing it both ways with a 32GB file of random data. And this IS showing up as filling the cache now!
 

FlyingHacker

Dabbler
Joined
Jun 27, 2022
Messages
39
OK. Here we go. I made a 32GB file from /dev/random.

Encryption was ON at the default for both pools.

I measured dd'ing the file to /dev/null before it was cached, and after it was cached:

Uncached Read:
5 striped mirrored pairs: R: 780MB/s

2x6 RAIDZ2: R: 775MB/s

I'd say they're within the margin of error of each other, as I only ran it a few times.


Cached Read:
Either pool type: R: ~2900MB/s - presumably from RAM, after reading the file a few times to be sure it was cached, though even the second read was about as fast as the third or fourth.


Write (from cached 32GB file):
5 striped mirrored pairs: W: 1190MB/s - should roughly max out a 10Gig network. Also, strange that write is faster than read, no?

2x6 RAIDZ2: W: 330MB/s - Is RAIDZ2 really this slow? Note the first ~5GB is much faster (HD caches maybe?). It starts at about 900MB/s and slowly drops throughout the 32GB file, with the big drop around the 5GB mark. For instance, a 4GB file did about 930MB/s, and an 8GB file did about 550MB/s.



So from these tests it looks like I should probably go with the 5 striped mirrored pairs, since the read speed for our workload is going to be almost all from RAM anyway (we re-read the same files over and over and over). But the write speed on the 2 x 6-drive RAIDZ2 is just terrible considering all the drives.

Am I missing something (other than an actual test of the true workload)? I suppose most of our files don't exceed 5-10GB, so perhaps the z2 is not so bad. Ugh! I hate commitment! ;-)

Any info/suggestions welcomed and appreciated. Thanks.
 

FlyingHacker

Dabbler
Joined
Jun 27, 2022
Messages
39
So I tried yet another layout: 3 x 4-drive RAIDZ2.

This might be the best balance. (If this is a bad idea for some reason please tell me!)

I get about the same on the reads...

But on the write of the 32GB file it gets about 600MB/s; an 8GB file does 779MB/s, and a 4GB file does 1051MB/s.

Is there some reason NOT to use this setup of 3 vdevs of 4 drives each in RAIDZ2? It has about the same usable capacity as the 5 vdevs of 2 mirrored drives each. It can handle at least any two drives failing, unlike the mirrors, where if the wrong two drives fail I am toast.

Thoughts?

Thanks.
 

FlyingHacker

Dabbler
Joined
Jun 27, 2022
Messages
39
Well, I decided to go with the mirrors. Did I make a bad call by deciding to use 6 2-drive mirrors? This leaves no hot spare, but I have cold spares, and every user of the system is capable of swapping drives and receives an email notification of failures.

Any comments? Bad call?
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
2x6 RAIDZ2: W: 330MB/s - Is RAIDZ2 really this slow? Note the first ~5GB is much faster (HD caches maybe?). It starts at about 900MB/s and slowly drops throughout the 32GB file, with the big drop around the 5GB mark. For instance, a 4GB file did about 930MB/s, and an 8GB file did about 550MB/s.
ZFS caches writes in RAM up to a "transaction group" (typically 5 s, which would be about 5 GB if fed from a saturated 10 GbE link), then commits the data to disk and starts again. If the disks cannot keep up, ZFS throttles the incoming data, which is what you're seeing.
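If you want to look at those knobs on a Core (FreeBSD-based) system, they are exposed as sysctls; this is just to inspect them, not a suggestion to tune:

sysctl vfs.zfs.txg.timeout       # seconds before a transaction group is forced out (default 5)
sysctl vfs.zfs.dirty_data_max    # cap on dirty write data buffered in RAM before throttling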

Am I missing something (other than an actual test of the true workload)?
Concurrent workloads, maybe? Though with only 3 clients, raidz2 may still do it, and that's where mirrors shine most.

I suppose most of our files don't exceed 5-10GB, so perhaps the z2 is not so bad. Ugh! I hate commitment! ;-)
Unfortunately, only you can make the call on your workload.
From post #35, the workload is mostly reads and most writes are "not too large", so the throttle on >5 GB files seemed like a small price to pay for 60% more capacity than mirrors.
If resiliency is paramount, 2*6z2 is better than mirrors, and 3*4z2 is best.
If performance is paramount, mirrors rule. If it's your call, it's the right call.
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Hello and welcome to the journey :)
We will have all data backed up,
Great - this means you can easily climb back out of a few pitfalls you're already deep into...
2x6 RAIDZ2: W: 330MB/s - Is RAIDZ2 really this slow?
The correct way to put it is: RAIDZ2 isn't really that slow, but your drives are.
The number is along the lines of what to expect.
Note the first ~5GB is much faster (HD caches maybe?). It starts at about 900MB/s and slowly drops throughout the 32GB file, with the big drop around the 5GB mark. For instance, a 4GB file did about 930MB/s, and an 8GB file did about 550MB/s.
The first 5GB is faster because ZFS 'collects' the data into a big enough transaction group in RAM before it is flushed out to disk. By default this happens at whatever comes first: 5 seconds of data, or X amount of GB (I don't recall the exact size of the default setting off the top of my head). This is also why it is difficult to judge the speed by staring at a transfer rate, even from the POV of ZFS.

Welcome to your pitfall of measurements. First, you're looking at data from a 32GB file, which gives some sort of average of both speed and time. As the speed is heavily accelerated in the beginning, due to the data being soaked into transaction groups prior to being flushed to disk, both the 4GB and 8GB measurements will be skewed: the smaller the file, the more of the "cache soak" it gets relative to its size, thus the bigger difference in numbers.

This is also why it does not make sense for you to do any benchmarks other than ones that PRECISELY mimic your actual workload.
Since you've already stated you have the data backed up, the best thing is to start testing with actual data, on actual drives.
Release yourself from this need to benchmark fictions ;)

Also, strange that write is faster than read, no?
No, due to the reasons above.
Am I missing something (other than an actual test of the true workload)?
Good on you! This is exactly what you are missing.

So I tried yet another layout: 3 x 4-drive RAIDZ2.

This might be the best balance. (If this is a bad idea for some reason please tell me!)
While a Z2 solution might be fine and dandy at 20% pool utilization, it might be horrendously slow for the users after being in use for a bunch of months, seeing a lot of data come and go, and running near the 85% mark.

I think this solution makes the least amount of sense. It is sort of like taking a sports car and putting bigger tires on it, then claiming it will do well off-road too.
I'm for a polarizing setup. Any time performance is of interest, and the "cost of hardware lost to redundancy" is acceptable - run with mirrors.
The only reason for using Z2 in this setup is that you really need the space and cannot afford mirrors.
If Z2 gives adequate performance for your needs - happy times - profit from the better value of your hardware.

Rather than half-assing two things, I'm typically an advocate of a dual-pool setup whenever there is a requirement for both storage and speed.
Usually the "working set" can fit on top of a few SSDs/NVMe drives.
Then the storage pool can comfortably be set up with space maximization as the priority rather than performance.
For example, a 10-wide Z2.

Any info/suggestions welcomed and appreciated. Thanks.
Keep in mind that pool performance degrades with fill rate/fragmentation. Nothing will ever be as fast as an empty pool.
Performance will be vastly different at a 30% fill rate compared to an 84% fill rate.
There are a few thresholds; generally, somewhere around 40-50% is where you have maximum performance.
Beyond 80% the decline is really steep. ZFS basically shifts its logic to maximize space instead of performance.
IIRC there are thresholds around 85% and 90% too, but this is territory where you don't want to go. Both due to performance, but also because if you accidentally over-fill your pool, you cannot delete files. ZFS is copy-on-write, which means it actually needs free space - to delete things.
There are some stories on the forums where people basically lost a fully healthy pool due to it being overfilled - and it could not be saved.
Set quotas.
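Something as simple as a quota on the pool's top-level dataset will do it; the name and number here are only an example, pick a value that keeps the pool comfortably below ~80%:

zfs set quota=55T tank                           # hard ceiling on how much the dataset tree can consume
zfs create -o refreservation=1T tank/reserved    # or: keep a slice permanently reserved as an escape hatch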


I suppose most of our files don't exceed 5-10GB
Then I suggest you don't even bother doing experiments with files larger than that, to not skew your expectations about what the system can do.
The reality is that both the 'transaction group' buffering in RAM for writes and the ARC read cache (the more RAM the merrier!) will HEAVILY skew what your system "appears" capable of - which is the holy beauty of ZFS. It does more magic during a "normal workload" than benchmarks really capture. I.e., a system that benches better than what you've seen so far does not by any stretch mean it will be a better user experience than ZFS!
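If you want to sanity-check what the ARC is actually holding while you test, the raw counters are there (sysctl names as on Core/FreeBSD; SCALE exposes the same numbers in /proc/spl/kstat/zfs/arcstats):

sysctl kstat.zfs.misc.arcstats.size                                  # current ARC size in bytes
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses   # hit/miss counters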

Ugh! I hate commitment! ;-)
Treat the first pool layout or two as mere one-night stands and you'll be golden.
Forget expectations about future relationships - just pure fun for the moment ;)
 

FlyingHacker

Dabbler
Joined
Jun 27, 2022
Messages
39
Thank you all so much for the info. I figured the initial write speed was due to some RAM caching on the part of ZFS, because it was just too fast otherwise! But the awesome thing is that it will cache most of the files we will use this way. The ARC cache is especially big (with 256GB of RAM), and definitely makes the reads good.

So I've been testing with my actual workload (at 10Gig) and it is blazingly fast. I am almost never seeing anything go slower than the link speed.

I am going with the 6x2 mirrors with cold spares. I will also have all the data backed up onto a backup server that will likely be an 8-wide RAIDZ2.

Will update as I have more info, but as of right now I am a very happy camper.
 