What is needed to achieve 10Gbps read with RAIDZ-2 (raid6)

svtkobra7

Patron
Joined
Jan 12, 2017
Messages
202
This is intriguing given what your test demonstrated. I actually didn't understand what you were trying to show, despite knowing exactly what each of the commands you typed means, until I read the following on https://blog.programster.org/zfs-record-size.
  • In the interest of providing sound advice (not that I was attempting to provide any), I wouldn't run off and hammer out "zfs set recordsize=1M" for every dataset in your pool ...
    • Rather first, I'd make sure I understand what this does prior to changing,
    • and second, if you make the change, do so via GUI instead of CLI.
    • [I don't want my post to be interpreted such that everyone should change the default recordsize from 128K to 1M (default = suitable for most) ... not speaking at you (sounds like have some understanding here)]
  • Separately, I think you get why I chose to post raidZ 3x4x10.0 TB output, but you were originally questioning 12 disk raidz2 speeds ...
    • I believe @SweetAndLow did a nice job of illustrating his point,
    • but if you would like to see other sequential read/write "benchmarks" w/ or w/o Optane (if you care), for the same disks, I'm happy to share. Just let me know.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Do you bother to burn in Enterprise drives?
Some people might do burn-in on these types of drives, but our operations branch wanted to get the servers in use because we had data arriving and needed a place to put it. So, in both situations, I did not do a full burn-in. On the Seagate drives, I did an initial short and long SMART test, followed by a full zero fill of the drives and then a second short and long SMART test. No problems. Then we started pushing data into the server. It has about 330TB of data in it now.
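Roughly, the per-drive sequence described above looks something like this (da0 is just an example device, and the zero fill is destructive):
Code:
smartctl -t short /dev/da0         # initial short self-test
smartctl -t long /dev/da0          # initial extended self-test (let it finish)
dd if=/dev/zero of=/dev/da0 bs=1m  # full zero fill - wipes the drive
smartctl -t short /dev/da0         # repeat the short self-test
smartctl -t long /dev/da0          # repeat the extended self-test
smartctl -a /dev/da0               # review results and error counters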
Ouch regarding the 3 drives - I suppose that speaks to infant mortality.
The WD Red Pro drives were disappointing. I have some WD Gold drives in another system and they have been a bit less disappointing but I still had one of those drives throw 8 bad sectors inside the first month.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
  • As a tangent and since you presented the opportunity to ask (and unrelated to cited failure rate) => Do you bother to burn in Enterprise drives?
  • My IT experience = limited to being a hobbyist, and I would guess no, as (1) for a 10TB HDD, the 4 badblocks patterns book-ended by SMART extended tests = the better part of a week = probably introduces a deployment "lag" where cost > benefit, + (2) I assume part of what you pay for with an enterprise HDD at a higher price is a lower risk of a HDD that won't pass burn-in (Exos non-recoverable errors per bits read = 1 per 10^15 & MTBF @ 2.5M hours)
You still want to burn in enterprise drives. One of the things working in favor of cheap consumer drives is that if one fails, you can go down to Best Buy and get a replacement. With enterprise drives, best case scenario is typically no faster than next day, which is usually costly.

Reliability isn't the thing you're paying for with enterprise drives. It's typically speed, and these drives may be 7200RPM or faster, meaning that they're hotter and therefore somewhat more problematic.

In the end, there's probably more of an effect based on drive model. Some models just suck.

https://www.computerworld.com/artic...-be-more-reliable-than-enterprise-drives.html
 

svtkobra7

Patron
Joined
Jan 12, 2017
Messages
202
You still want to burn in enterprise drives. One of the things working in favor of cheap consumer drives is that if one fails, you can go down to Best Buy and get a replacement. With enterprise drives, best case scenario is typically no faster than next day, which is usually costly.
  • Appreciate the insight ... "iocage console timemachine" isn't working as intended (so I could get a CS degree instead). ok bad joke, point holds
Reliability isn't the thing you're paying for with enterprise drives. It's typically speed, and these drives may be 7200RPM or faster, meaning that they're hotter and therefore somewhat more problematic.
  • hallelujah - Enterprise can have their speed and heat! (fully understand your point o/c)
  • The second batch of WDC easystore HDDs (5400 rpm, He) replaced HGST Deskstar NAS HDDs (7200 RPM, non-He). [1] I'd love a nice waterfall chart bridging average temps of the two with the delta due to less friction and the delta due to RPM. Kidding, but relatively curious.
  • Have a look if you like graphs (not sharing anything you don't know o/c) ... apples-to-apples environment (ambient temp, chassis, fans, etc.), and '01 => '02 is replicating 50+TB (WDC => WDC) ATM.
    • WDC easystore Tavg (last hour) = 33°C | HGST Tmin (3 months) = ~33°C (removing some outliers)
    • Without pulling the logs and doing the math, it looks like the WDC easystore HDDs are 4°C cooler on average than the HGST Deskstar NAS HDDs.
  • Temps are so much lower that I had to adjust the Y-axis on the RRDTOOL graphs. ;)
[1] Not that I ignored the Hardware Guide - they were purchased well before I had ever heard of FreeNAS.

FreeNAS-01 - Current HDD Config - WDC easystores
Code:
Mar 15 19 04:33:00 | Tmin,avg,max = 32,33,34°C | FAN MODE: [02]OPTIMAL NOT CHANGED

1.png


FreeNAS-02 - Current HDD Config - WDC easystores
Code:
Mar 15 19 04:33:00 | Tmin,avg,max = 32,33,34°C | FAN MODE: [02]OPTIMAL NOT CHANGED

2.png


FreeNAS-02 - Prior HDD Config - HGST Deskstar NAS HDDs
temps-15min-drives.png


[horrible representation I know - different scales on both axes etc]
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
You still want to burn in enterprise drives. One of the things working in favor of cheap consumer drives is that if one fails, you can go down to Best Buy and get a replacement. With enterprise drives, best case scenario is typically no faster than next day, which is usually costly.
We put in the contract that the vendor provide five extra drives for cold spares, which makes life easier for me. I have cold spares for all the sizes we still have in use. I hope to get the 2TB ones out of the system this year.
 

titusc

Dabbler
Joined
Feb 20, 2014
Messages
37
  • In the interest of providing sound advice (not that I was attempting to provide any), I wouldn't run off and hammer out "zfs set recordsize=1M" for every dataset in your pool ...
    • Rather first, I'd make sure I understand what this does prior to changing,
    • and second, if you make the change, do so via GUI instead of CLI.
    • [I don't want my post to be interpreted such that everyone should change the default recordsize from 128K to 1M (default = suitable for most) ... not speaking at you (sounds like have some understanding here)]
  • Separately, I think you get why I chose to post raidZ 3x4x10.0 TB output, but you were originally questioning 12 disk raidz2 speeds ...
    • I believe @SweetAndLow did a nice job of illustrating his point,
    • but if you would like to see other sequential read/write "benchmarks" w/ or w/o Optane (if you care), for the same disks, I'm happy to share. Just let me know.
Yes, I got that the bit about the record size was only meant to prove a point. Like you, I'm so tired by the time I get here on the forum that it usually takes me multiple reads before I get what is being said. I just passed out and woke up again! I think you were trying to show using as many vdevs as possible while still allowing 2 drives to fail yet remaining operational in any vdev.

In my head I have been thinking about the following variations of what I can fit in a 12 bay system.
I actually wouldn't mind seeing the benchmarks for the ones highlighted.
  • 1 vdev with 4 drives in RAIDZ/2.
    50% efficiency and 2 x speed and 1x IO.
  • 2 vdevs each with 4 drives in RAIDZ/2.
    50% efficiency and 4 x speed and 2 x IO.
  • 1 vdev with 6 drives in RAIDZ/2.
    66.7% efficiency and 4 x speed and 1 x IO.
  • 2 vdevs each with 6 drives in RAIDZ/2.
    66.7% efficiency and 8 x speed and 2 x IO.
  • 1 vdev with 8 drives in RAIDZ/2.
    75% efficiency and 6 x speed and 1 x IO.
  • 1 vdev with 12 drives in RAIDZ/2.
    83.3% efficiency and 10 x speed and 1 x IO (see the quick check of the efficiency arithmetic below).
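The efficiency column is just (N - 2) / N per RAIDZ2 vdev; a quick check in plain sh, assuming bc is available:
Code:
# usable-space fraction of an N-wide RAIDZ2 vdev = (N - 2) / N
for n in 4 6 8 12; do
  echo "$n-wide RAIDZ2: $(echo "scale=1; 100 * ($n - 2) / $n" | bc)% usable"
done
# -> 50.0%, 66.6%, 75.0%, 83.3%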
Given what I have learned over the last 2 days, it is certain I need to use either [2 vdevs each with 6 drives in RAIDZ/2] or [1 vdev with 12 drives in RAIDZ/2]. I'm using a 4-bay RAID6 setup at the moment, which is why I'm curious how big an improvement we get with the 12-disk setup with either 1 vdev or 2 vdevs.

I'm looking at selecting between the Lenovo SR550 or the HPE ProLiant DL380 G10. I might in fact get an older server as I'm not after compute and the older servers are significantly cheaper.
 

titusc

Dabbler
Joined
Feb 20, 2014
Messages
37
Speaking of drives, I just noticed something interesting, ranked in order of descending price below. Why buy the ST1000NM0008 when you can get an SSHD for a lower price? I recall Google did an experiment and the conclusion was that it really doesn't matter which drives you choose, as most of them either die early or die late. So one might as well get as many of the cheaper Barracuda drives as cold spares as possible rather than buying the more expensive IronWolf or FireCuda ones. In fact, isn't a surveillance drive supposed to be rated for 24/7 also? It's dirt cheap.
  • Seagate ST1000NM0008 1TB Enterprise Capacity 3.5 HDD Exos 7E2 SATA3 /128MB Cache (7x24) $638
  • Seagate FireCuda 1TB ST1000DX002 Gaming SSHD (Solid State Hybrid Drives) 7200rpm, 64MB Cache HDD $559
  • Seagate IronWolf NAS 2TB ST2000VN004 SATA3 6Gb/s /64MB Cache HDD $505
  • Seagate SkyHawk Surveillance 1TB ST1000VX005 SATA3 6Gb/s /64MB Cache HDD $318
  • Seagate BarraCuda 1TB ST1000DM010 SATA3 6Gb/s /64MB Cache HDD $304
 

svtkobra7

Patron
Joined
Jan 12, 2017
Messages
202
I think you were trying to show using as many vdevs as possible while still allowing 2 drives to fail yet remaining operational in any vdev.
  • Actually raidZ 3x4 or 4 x 3-wide Z1 = Tolerates only 1 drive failure per vdev. 2 drive failures on a single vdev = toasted pool.
  • 4 x 3-wide Z1 is a more meaningful description. The GUI notes it as RaidZ 3x4x[Size in TB] and I've gotten in the habit of using that nomenclature.
In my head I have been thinking about the following variations of what I can fit in a 12 bay system.
I actually wouldn't mind seeing the benchmarks for the ones highlighted.

Will revert back tonight with the following (I started replicating so don't have an empty pool any longer, but definitely have everything archived):
  • RaidZ2 6x2
  • RaidZ2 12x1
No mirrored pairs? I would think a 2x2 mirror would be preferred compared to 4 drives in RaidZ2, for which there is no use case (that I can think of) with 4 drives total and 2 going to parity. I could advocate further, but ...

Given what I have learned over the last 2 days it is certain I need to use either [2 vdevs each with 6 drives in RAIDZ/2] or [1 vdev with 12 drives in RAIDZ/2].
  • RaidZ2 6x2 over RaidZ2 12x1 any day of the week and twice on Sunday.
I'm using a 4 bay RAID6 setup at the moment which is why I'm curious how big of an improvement we get with the 12 disks setup with either 1 vdev or 2 vdevs.
  • apples to oranges, but I think others have called that out.
 

svtkobra7

Patron
Joined
Jan 12, 2017
Messages
202
@titusc: Here is some material to get you started ...

RaidZ2 6x2 v. RaidZ2 12x1 @ 128k recordsize
  • General comments: Subject pool architectures compared for sync=disabled, =standard, =always @ 128k recordsize. No SLOG - we will come back to that.
  • General Summary: RaidZ2 6x2 is clearly more performant for writes. About even for reads.
RaidZ2 - 128k recordsize.jpg


RaidZ2 6x2 v. RaidZ2 12x1 @ 1M recordsize
  • General comments: Subject pool architectures compared for sync=disabled, =standard, =always @ 1M recordsize. No SLOG - we will come back to that.
  • General Summary: RaidZ2 12x1 is clearly more performant for writes. About even for reads.
RaidZ2 - 1M recordsize.jpg


Sync Write Performance Comparison for RaidZ2 6x2 v. RaidZ2 12x1 @ 128k & 1M recordsize with no SLOG, 1 SLOG, and 2 SLOGs (striped and mirrored, separately).
  • General Comments: While I presented sync=disabled, =standard, =always previously, this graph only shows synchronous writes (sync=always).
  • Not sure how much you have dived into this, but adding a SLOG is a great way to boost sync writes from double digits to high triple digits.
  • I have several Intel Optane 900p 280GB drives and I've never been able to pass them through to FreeNAS correctly (even using the known workaround), so those SLOGs are 20GB virtual disks presented to FreeNAS from ESXi. Passthrough or bare metal would be much preferred, but similar results are achieved even with vDisks.
  • NB: I'm not sure that dd is the best way to benchmark the impact here, as it certainly doesn't represent a real-world usage scenario and may be a flawed way to present this generally (the general shape of the dd runs is sketched below).
RaidZ2 - SLOG Summary.jpg

  • The tangible impact of adding a SLOG is best presented in a different manner, IMO. When I have time to find my previously saved benchmarks, I'll reply back and show you what performance in a VM looks like when stored on a VM datastore presented from FreeNAS.
  • This may be of no use / interest for your use case and if so, just let me know. The benefit is limited to synchronous writes and you are never going to actually make your pool faster. Top speed is achieved via sync=disabled (asynchronous writes), but using the RaidZ2 6x2 pool as an example, we are able to achieve 82% of that for synchronous writes.
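As referenced above, these were raw-pool dd runs; the general shape of each pass was something like the following (pool/dataset/device names are illustrative, and note that /dev/zero data will flatter the numbers if compression is enabled):
Code:
# toggle the tunables being compared between runs (names illustrative)
zfs set recordsize=128K tank/bench       # or recordsize=1M
zfs set sync=always tank/bench           # or =standard / =disabled
# sequential write, then sequential read of the same file
dd if=/dev/zero of=/mnt/tank/bench/ddfile bs=1m count=102400
dd if=/mnt/tank/bench/ddfile of=/dev/null bs=1m
# SLOG variants were added/removed between runs, e.g.
#   zpool add tank log da20                (single SLOG)
#   zpool add tank log mirror da20 da21    (mirrored pair)
#   zpool remove tank da20                 (take it back out)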

@ FreeNAS community: While raw pool speeds are easily measured with dd, I have yet to master a more comprehensive benchmark suite that provides the broader scope I'm after. I spent a bit of time learning Iometer, which does have its place, but wish I'd invested in fio instead. Should anyone come across this request and be able to provide a few examples, I'd be very grateful. I feel like learning a new benchmark where one doesn't have prior exposure is quite a time-consuming endeavor and unfortunately overly prone to confirmation bias. Example: as you first get up and running, you are looking to produce results you deem "reasonable", and only then do you focus on accuracy, which is either learned from additional experience or from peer result exchanges using the same parameters. Anyway, a kick in the right direction would be great!
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
The problem with ZFS is that most benchmarks turn out to be bullchips, meaningless in (usually) several different directions simultaneously.

Most people do useless things like trying to benchmark ZFS performance on a new pool with artificial benchmark tests.

You really need to test ZFS in an environment that simulates what your production workload is going to resemble, and make sure that you've aged the pool so that things like fragmentation reach steady state.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
@ FreeNAS community: While raw pool speeds are easily measured with dd, I have yet to master a more comprehensive benchmark suite that provides the broader scope I'm after. I spent a bit of time learning Iometer, which does have its place, but wish I'd invested in fio instead. Should anyone come across this request and be able to provide a few examples, I'd be very grateful. I feel like learning a new benchmark where one doesn't have prior exposure is quite a time-consuming endeavor and unfortunately overly prone to confirmation bias. Example: as you first get up and running, you are looking to produce results you deem "reasonable", and only then do you focus on accuracy, which is either learned from additional experience or from peer result exchanges using the same parameters. Anyway, a kick in the right direction would be great!
Iozone is the testing tool you will want to get familiar with. Here is a simple example. The flags are easy and you can manipulate block size and affect sync write performance.

iozone -i 0 -i 1 -i 2 -s 150g -t 1
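For anyone new to iozone, roughly what those flags select (per the iozone man page):
Code:
iozone -i 0 -i 1 -i 2 -s 150g -t 1
# -i 0     write / re-write test
# -i 1     read / re-read test
# -i 2     random read / random write test
# -s 150g  per-process file size (pick something well beyond RAM/ARC)
# -t 1     throughput mode with 1 thread/process
# add -r <size> to vary the record (block) size mentioned above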
 
Joined
Dec 29, 2014
Messages
1,135
Iozone is the testing tool you will want to get familiar with.
Do you have a good primer for the options of iozone and how to interpret the output? I must confess to feeling a bit lost when trying to interpret the results of iozone.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Do you have a good primer for the options of iozone and how to interpret the output? I must confess to feeling a bit lost when trying to interpret the results of iozone.
Check out the man page; it has great examples. The output should be easy to interpret when compared to other runs. Try doing it on a dataset with sync enabled and on one with sync disabled. You should see a big difference.
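A minimal way to run that comparison, assuming a scratch dataset named tank/bench (names illustrative):
Code:
zfs create tank/bench
zfs set sync=always tank/bench       # force synchronous writes
cd /mnt/tank/bench
iozone -i 0 -i 1 -i 2 -s 150g -t 1
zfs set sync=disabled tank/bench     # asynchronous writes only
iozone -i 0 -i 1 -i 2 -s 150g -t 1
cd / && zfs destroy tank/bench       # clean up afterwards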
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I would think a 2x2 Mirror would be preferred compared to 4 drives in RaidZ2 which there is no use case (that I can think of) with 4 drives total, 2 to parity. I could advocate further, but ...
The advantage of RAIDz2 with 4 drives over mirrors is that you can survive the loss of two drives in a single vdev. You don't get as many vdevs, but you have greater resistance to disk failure. As for the OP, I think his use case needs IOPS more, so 3-way mirrors would be better: they would give the dual-failure resiliency and four vdevs instead of just three. Larger drives also offer greater performance than smaller drives when speaking of spinning disks. This test you have done with 10TB drives shows more performance than would be possible with 1TB drives (as an example) because the smaller drives are not capable, mechanically speaking, of the same data transfer rate as the 10TB drives.
Here is the output of zpool status for a pool in my system that I configured as an example.
Code:
  pool: Test
 state: ONLINE
  scan: scrub repaired 0 in 0 days 03:31:21 with 0 errors on Sat Mar 16 23:47:00 2019
config:

        NAME                                            STATE     READ WRITE CKSUM
        Test                                            ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/2e919d3d-2c1a-11e9-af8b-00074306773b  ONLINE       0     0     0
            gptid/2f292da6-2c1a-11e9-af8b-00074306773b  ONLINE       0     0     0
            gptid/2fb95d07-2c1a-11e9-af8b-00074306773b  ONLINE       0     0     0
            gptid/30514e6b-2c1a-11e9-af8b-00074306773b  ONLINE       0     0     0
          raidz2-1                                      ONLINE       0     0     0
            gptid/41d3312f-2c1a-11e9-af8b-00074306773b  ONLINE       0     0     0
            gptid/426b7b47-2c1a-11e9-af8b-00074306773b  ONLINE       0     0     0
            gptid/43029d18-2c1a-11e9-af8b-00074306773b  ONLINE       0     0     0
            gptid/af54c9c6-4277-11e9-af8b-00074306773b  ONLINE       0     0     0
          raidz2-2                                      ONLINE       0     0     0
            gptid/8e2b6d1f-becf-11e8-b1c8-0cc47a9cd5a4  ONLINE       0     0     0
            gptid/8efea929-becf-11e8-b1c8-0cc47a9cd5a4  ONLINE       0     0     0
            gptid/8fd4d25c-becf-11e8-b1c8-0cc47a9cd5a4  ONLINE       0     0     0
            gptid/90c2759a-becf-11e8-b1c8-0cc47a9cd5a4  ONLINE       0     0     0

errors: No known data errors

There are more factors involved in all of this than what you have discussed.
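For reference, the raw commands behind the two layouts being weighed here would look roughly like this (FreeNAS normally builds these through the GUI with gptids; da0-da11 are placeholder device names):
Code:
# three 4-wide RAIDZ2 vdevs (the pool shown above)
zpool create Test raidz2 da0 da1 da2 da3 raidz2 da4 da5 da6 da7 raidz2 da8 da9 da10 da11
# versus four 3-way mirror vdevs: more IOPS, still survives 2 failures per vdev, but only 33% usable space
zpool create Test mirror da0 da1 da2 mirror da3 da4 da5 mirror da6 da7 da8 mirror da9 da10 da11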
 

svtkobra7

Patron
Joined
Jan 12, 2017
Messages
202
The advantage of RAIDz2 with 4 drives over mirrors is that you can survive the loss of two drives in a single vdev.
  • Understood of course. Calculating the risk of data loss over x years + System MTTDL does favor RaidZ2 4 x 1, but due to the number of variables involved there are other concerns I might consider, such as longer resilver = increased risk exposure and the probability of a read error during resilver being higher. Relatively immaterial when viewed in light of the calculated difference in fault tolerance favoring RaidZ2 4 x 1, yet still worthy of calling out in my opinion, if only as an educational point and not a determining factor in pool architecture selection.
You don't get as many vdevs, but you have greater resistance to disk failure.
  • Probability of "disk failure" = constant.
  • Probability of "pool failure" = significantly lower.
As for the OP, I think his usecase needs IOPS more, so that 3-way mirrors would be better as it would give the dual failure resiliency and four vdevs instead of just three.
  • Certainly an option, but you just reduced S/E to 33%.
  • Advantages present themselves in other areas, at the expense of S/E, but for me personally, the next thought I would have is what is my "cost per usable GB"?
  • And from that perspective, I'd suggest you have just strengthened the case for using SSDs instead of HDDs, which adds even more options to consider (and I'm not suggesting more need be proffered). :)
Larger drives also offer greater performance than smaller drives when speaking of spinning disks. This test you have done with 10TB drives shows more performance than would be possible with 1TB drives (as an example) because the smaller drives are not capable, mechanically speaking, of the same data transfer rate of the 10TB drives.
  • Agree, but the value, if any, in those benchmarks was simply to create a relative comparison between pool types.
  • Similar performance characteristics should hold when comparing pool type A to pool type B, using 1 TB disks or 10 TB disks.
There are more factors involved in all of this than what you have discussed.
  • I second that - maybe just a few more ;)
  • Based upon review of prior messages, I figured the saturation point may have been hit, figured what I had on hand may be of some use, and it was intended to be additive to the ongoing discussion, but not comprehensive.
  • Frankly, I doubt I have the knowledge to call out every possible variable relevant in this amazingly complex pool architecture selection algorithm that has been constructed. [No snarkiness intended, and definitely not at you, but on Sunday at 5 AM, my sense of humor is even worse than usual.]
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
Certainly an option, but you just reduced S/E to 33%.

As an aside, and point of comparison: That's exactly what Cisco's HCI does. 2 copies of the data so you can lose two full systems - with all disks in them - and still be up and running. Albeit with a huge pucker factor to get those back up and running and synchronized before something else fails.

Cost of the disks is but one cost factor. What about cost of underperforming apps? Often that’s in the “we don’t even want to contemplate that, jobs and customers will be lost” realm. At that point, throwing more spindles at the IOPS problem is entirely reasonable.
 

titusc

Dabbler
Joined
Feb 20, 2014
Messages
37
@titusc: Here is some material to get you started ...
Thanks very much for the data. I really appreciate it, even though some have suggested it may not reflect an actual real-world scenario. It does give a good idea of where things are.

In fact it appears that the slowest read of 2275MB/s, for RAIDZ/2 with 2 vdevs of 6 disks each at 128k recordsize, is already 18.2Gbps, which is almost double my original requirement. This is surprising because I was thinking that with 200MB/s per drive, each 6-disk RAIDZ/2 vdev should provide 800MB/s, and with 2 vdevs this doubles to 1600MB/s, which is 12.8Gbps. So your test proves that with the slowest combination for 12 disks it is still 42% faster than what I wanted.
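For my own sanity, the unit conversion behind those numbers (decimal units):
Code:
echo "scale=1; 2275 * 8 / 1000" | bc         # 2275 MB/s measured -> 18.2 Gbps
echo "scale=1; 2 * 4 * 200 * 8 / 1000" | bc  # 2 vdevs x 4 data disks x 200 MB/s -> 12.8 Gbps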

I wonder how an 8-disk array + 2 disks for SLOG would fare. The reason I'm saying this is that I can easily find an older 1U server that can take 10 x SFF drives. I can fit a 2U chassis into my closet for 12 x LFF, but that'd mean I need to do a bit of re-work to move things around, whereas a 1U is just going to be a drop-in. Speaking of which, any comments about using SFF drives?

I have been researching a lot over the last 2 days on which server to go with, and I'm narrowing it down to the following. I think the 2U Supermicros I see with the Broadcom 3008 should all work with ZFS out of the box, but I'm not too sure about the HP H240 or the Lenovo SR250.

2 U height servers
ProLiant DL180 Gen9 (around $1700) E5-2603 v4 + H240
SuperStorage 5029P-E1CTR12L $1,996 Xeon Scalable + Broadcom 3008 (requires SES3 not sure what this is)
SuperStorage 5028R-E1CR12L $1,926 E5-2600 v4 + Broadcom 3008 IR mode

1U height servers
SuperStorage 5018D4-AR12L (this supports 12 LFF disks but at 32" deep it is too long to fit in my closet)
THINKSYSTEM SR250 $1,200 E-2100 (haven't found out what disk controller it comes with or can use)
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
there are other concerns I might consider, such as longer resilver = increased risk exposure and the probability of a read error during resilver being higher.
That is a theoretical concern I have seen expressed many times, but the duration of a resilver is not inherently longer in a RAIDzX vdev vs a mirror, because the duration is controlled by the amount of data that needs to be written to the new disk, unless you have a system that is significantly under-powered for the task (not enough CPU and RAM) or one that is busy with other work like serving clients. There are factors that can influence the resilver, but the biggest one is how much data needs to be placed on the new drive, and that depends more on the amount of data in the pool and the number of other vdevs in the pool than on whether that vdev is a mirror or not.
  • Certainly an option, but you just reduced S/E to 33%.
  • Advantages present themselves in other areas, at the expense of S/E, but for me personally, the next thought I would have is what is my "cost per usable GB"?
  • And from that perspective, I'd suggest you have just strengthened the case for using SSDs instead of HDDs, which adds even more options to consider (and I'm not suggesting more need be proffered).
There is a definite trade being made to get IOPS and still have room for two drives to fail without data loss, but that is one of the things the OP was asking for. From the beginning, the OP wanted to have room for any two drives to fail without losing data, and it is possible, but there is a cost, especially if you also want IOPS, which is what his ask calls for even if the OP didn't originally recognize that. It is the reason this comment was made on the first page of the thread:
If you're limited to 6 disks and require line-rate 10Gbe reads then go SSD and be done with it. You'll need several more spindles to achieve the same results otherwise.
And that is exactly why many enterprise systems have been constructed (for years now) using SSD instead of spinning disks. The cost of SSD is dropping to the point that it is even something home users are building into their arrays. Spinning disks are still the "cheap" answer for bulk storage, but when you have an application that needs speed, and you also need resistance to double disk failure, you begin to need a fairly large number of spinning disks to get the speed. If you need speed and bulk storage, a massive number of spinning disks is nice. I have a system at work with 124 disks in it, and part of the reason it is configured that way is for the speed, but it also houses over 330 TB of data with room for more.
  • Agree, but the value, if any, in those benchmarks was simply to create a relative comparison between pool types.
  • Similar performance characteristics should hold when comparing pool type A to pool type B, using 1 TB disks or 10 TB disks.
True and the effort is appreciated. You said
While raw pool speeds are easily measured with dd, I have yet to master a more comprehensive benchmark suite that provides the broader scope I'm after.
I was wondering if you have looked at this utility?

solnet-array-test (for drive / array speed) non destructive test
https://forums.freenas.org/index.php?resources/solnet-array-test.1/

It is something that one of the other moderators put together and it might be helpful for your testing.
In fact it appears that the slowest read of 2275MB/s RAIDZ/2 with 2 vdevs of 6 disks each with 128k recordsize is already 18.2Gbps which is almost double my original requirement.
I am telling you with absolute certainty that you will not get that level of performance. Not from two vdevs of six drives each. I have done the real world testing, not using synthetic benchmarking tools. I guess that nobody bothered to look at my home NAS build, but that is exactly the pool layout that I have, using 4TB drives. I did some real world file transfer tests back when I upgraded to a 10Gb switch and posted some graphics showing the results, probably around a year ago, but nobody cared to listen to someone that has done it. Back then I also set up a pool of 16 drives in mirror vdevs to test that configuration over the 10Gb link. I spent a couple weekends testing different configurations between my server and my workstation over my shiny, new to me, 10Gb switch. It isn't like this is the first time the question has been asked. You folks have spent days discussing this, but the answer was already known.
So your test proves that with the slowest combination for 12 disks it is still 42% faster than what I wanted.
No, synthetic testing doesn't indicate real world file transfer performance. I have never seen an instance where the two were even very close. For double sure it doesn't prove anything about your potential build unless you are using the same drives with the same performance characteristics that the tester was using. Build it and find out.
Speaking about this any comments about using SFF drives?
I commented about SFF drives days ago, but you must have ignored it. I won't repeat myself.
 


titusc

Dabbler
Joined
Feb 20, 2014
Messages
37
I commented about SFF drives days ago, but you must have ignored it. I won't repeat myself.
It's not that people are ignoring you, but when you're getting about 4 - 6 hrs of sleep every day, this gets forgotten by the 2nd or 3rd page.

I have been considering the use of SSD actually because of the following.
  • Lower heat per disk.
  • Fewer disks required to provide similar performance, which means even less heat.
  • IOPS is limited when using RAIDZ (read & write both = 1 * IOPS / drive per vdev), versus a mirrored pool (read = N * IOPS / drive & write = 1 * IOPS / drive) or a striped pool (read & write both = N * IOPS / drive).
  • Higher transfer rate, roughly 2.5x that of spinning disks.
  • Possibility to use 1U chassis.

So here I have the following comparison.

ProLiant DL20 Gen10
- 1U high / 15" deep
- 6 x SFF
- Xeon E-2176M 45W TDP 6 cores at 2.7GHz

Samsung 860 Evo 2TB
- $330 on B&H
- 550MB/s and 520MB/s for read / write sequential per manufacturer
- 98k / 90k for read / write IOPS per manufacturer
- Average 3.0 W Maximum 4.0 W per manufacturer

So with 6 of these I get the following.
- (6 - 2) * [550MB/s read, 520MB/s write] = 2.2GB/s read and 2.08GB/s write
- 1 x [98k read, 90k write] = 98k read and 90k write
- 6 * 4W = 24W max
- 6 * $330 = $1980
- (6 - 2) * 2TB = 8TB


SuperMicro SuperChassis 826BE1C-R920LPB
SuperMicro X10DRH-CLN4
- Xeon E5-2630L v3 55W TDP 8 cores at 1.8GHz
- 2U high / 25.5" deep
- Broadcom 3008 with JBOD IT mode
- 12 x LFF

Seagate BarraCuda 1TB ST1000DM010
- $50 on Amazon
- 166MB/s and 142MB/s for read / write per https://hdd.userbenchmark.com/Seagate-Barracuda-1TB-2016/Rating/3896 as of the submitted results by time of this post
- Unknown IOPS
- Average 4.6W Typical 5.3W per manufacturer

So with 12 of these set up as 2 * 6-wide RAIDZ/2 (i.e. 2 vdevs of 6 disks each) I get the following (totals re-checked in the sketch below).
- 2 * (6 - 2) * [166MB/s read, 142MB/s write] = 1.328GB/s read and 1.136GB/s write
- 2 * unknown IOPS / drive
- 12 * 5.3W = 63.6W max
- 12 * $50 = $600
- 2 * (6 - 2) * 1TB = 8TB
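The totals for both options, re-checked in one place (vendor numbers as quoted above, decimal units):
Code:
# 6 x Samsung 860 Evo 2TB in a single 6-wide RAIDZ2 (4 data disks)
echo "4 * 550" | bc       # 2200 MB/s read
echo "4 * 520" | bc       # 2080 MB/s write
echo "6 * 4" | bc         # 24 W max
echo "6 * 330" | bc       # $1980
echo "4 * 2" | bc         # 8 TB usable
# 12 x BarraCuda 1TB in 2 x 6-wide RAIDZ/2 (2 x 4 data disks)
echo "2 * 4 * 166" | bc   # 1328 MB/s read
echo "2 * 4 * 142" | bc   # 1136 MB/s write
echo "12 * 5.3" | bc      # 63.6 W max
echo "12 * 50" | bc       # $600
echo "2 * 4 * 1" | bc     # 8 TB usable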
 