What is needed to achieve 10Gbps read with RAIDZ-2 (raid6)

titusc

Dabbler
Joined
Feb 20, 2014
Messages
37
Background
I am trying to host all my Lightroom catalogue and RAW photos on a NAS so I can share them between computers. The problem is that Lightroom is designed to run with the files on local disk; the pictures are quite large at around 25GB per photo, so you can imagine how long it would take to load an album full of photos over the network. People have done it by hosting everything on a NAS, as shown on fstopper.

Requirement
I want to upgrade my NAS to one that can provide a sustained read of 10Gbps with RAID6 or RAIDZ-2.

Questions
  1. Must I use a 12 disk setup in order to achieve 10Gbps? Per the benchmark available on calomel.org I can see the following.
    4 disks: r=183MB/s
    6 disks: r=488MB/s
    12 disks: r=1065MB/s
  2. Would I not be better off running RAID1 which provides improved read performance and dual disk failure redundancy? For example, the same link above shows 3 disks in RAID1 providing r=589MB/s, which already exceeds the 6-disk RAIDZ-2 setup. So perhaps I can achieve the same performance with only 6 disks? Buying 6 disks is cheaper than buying 12, and I can fit 6 disks on a 1U server whereas 12 disks require a 2U server.
  3. I know calomel.org says I need to use 12 disks to achieve 10Gbps with RAIDZ-2 but this requires a 2U server to host all these disks. I'd much rather get away with something like a ThinkSystem SR250 (4 x LFF or 10 x SFF) or a ProLiant DL20 G10 (2 x LFF or 6 x SFF). Is it possible to achieve 10Gbps with RAIDZ-2 on these smaller 1U servers, which have a limit on the number of disks?

Thanks.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,702
Are you serving the same small pool of small or large files over and over again or are the files changing all the time?

You may be able to get away with 4 or 6 L2ARC disks (fast SSDs) and a boat-load of RAM (128GB +) to get you to the result with a smaller number of spinning disks behind it in RAIDZ2.
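From the shell it would look roughly like the sketch below (pool and device names are just placeholders; on FreeNAS you would normally do this through the GUI):
Code:
# Add a pair of fast SSDs as L2ARC (cache) devices to an existing pool.
# "tank", da8 and da9 are placeholders for your pool and SSD device names.
zpool add tank cache da8 da9

# Watch the cache devices fill up and start serving reads.
zpool iostat -v tank 5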

We will need to understand the requirement better to make the right recommendation.

RAIDZ2 on its own doesn't work exactly like RAID6, so read performance isn't necessarily any faster (and may even be slower) as you add disks, at least when you don't have sufficient ARC and L2ARC.

Do you care about how long it takes to write or is read the only important factor?
 

garm

Wizard
Joined
Aug 19, 2017
Messages
1,555
In Lightroom you don't need access to the RAW files directly; you can use smart previews to build a representative library and then just download the actual files on export. I'm not sure how well that plays with files that big, but it's worth looking at.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
What is needed to achieve 10Gbps read with RAIDZ-2 (raid6)
More vdevs. Each vdev provides roughly the performance of a single disk. A single disk (depending on the disk) can provide between 100 and 250 MB/s of transfer. If your disks are slow (100 MB/s) you will need ten vdevs, but if they are really fast (250 MB/s) you might be able to get by with four vdevs.
In any case, the answer is more disks, because you don't have enough to saturate a 10Gb network link.
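A quick back-of-the-envelope sketch, assuming a 10Gb link is good for roughly 1,000 MB/s of payload after overhead and that each vdev streams at about the speed of one member disk:
Code:
# vdevs needed ~= usable target (MB/s) / per-vdev throughput (MB/s)
echo $((1000 / 100))   # slow disks, ~100 MB/s each -> 10 vdevs
echo $((1000 / 250))   # fast disks, ~250 MB/s each -> 4 vdevs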

Per the benchmark available on calomel.org I can see the following.
I don't know that I trust those results. I have looked at the site before and I think they are laboring under a misconception, and their testing is skewed by a lack of disks to work with. All their testing uses only 24 drives. If you only have 24 drives, the best performance you can get is 12 mirror vdevs, because that gives you the largest number of vdevs, NOT because those vdevs are mirror vdevs.
I have a server at work with 124 drives and I can see how IO scales with vdevs in ZFS. It isn't the number of disks, it is the number of vdevs that matters.
4 disks: r=183MB/s
6 disks: r=488MB/s
12 disks: r=1065MB/s
This would be completely dependent on the performance of the individual disks involved. No, the number of disks in a single vdev doesn't matter; it is the number of vdevs. So the only way those numbers work is if each disk is its own vdev, which would mean running with no redundancy.
Would I not be better off running RAID1 which provides improved read performance and dual disk failure redundancy?
We don't have any "RAID1" in ZFS. The thing we have that provides "dual disk failure redundancy" is RAIDz2.
For example, the same link above shows 3 disks in RAID1 providing r=589MB/s, which already exceeds the 6-disk RAIDZ-2 setup.
One of the reasons I think their testing is poor is the inconsistent use of terminology. There is no RAID1 in ZFS. You can have a pool of mirror vdevs, which is more like RAID10, to use the non-ZFS words (as bad as they taste), but words matter, because they change what it is you are saying.
I can fit 6 disks on a 1U server
Why would you want a 1U server? How do you think you are getting 6 disks in a 1U server? How much storage capacity do you need?
I know calomel.org says I need to use 12 disks to achieve 10Gbps with RAIDZ-2 but this requires a 2U server to host all these disks.
I wish that site would melt away, because the information they give out is useless. I have done many of the same tests and used many more disks to do them. Their synthetic numbers just don't match the real-world performance you will actually see with files.
SFF (Small Form Factor) disks are slower than LFF (Large Form Factor) disks, and each one has a lower maximum storage capacity. These details matter, and 12 disks is not many disks at all when we are talking about 10Gb speed, so you need to get past that. You probably need 24 LFF disks in a pool of 12 mirror vdevs to see the kind of performance you want.
Is it possible to achieve 10Gbps with RAIDZ-2 on these smaller 1U servers, which have a limit on the number of disks?
NO. Not unless you go solid state, you know, SSD.
 
Joined
Dec 29, 2014
Messages
1,135
NO. Not unless you go solid state, you know, SSD.
I can get 9.1G reading off spinning rust in a pool with dual Z2 vdevs. Writes are much less (~4G), but I have a pretty good CPU and a lot of memory (256G).
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I can get 9.1G reading off spinning rust
That is good. Many models of disk read faster than they write. I have some that read around 140MB/s but write around 60MB/s.
What kind of disks are they?
 
Joined
Dec 29, 2014
Messages
1,135
That is good. Many models of disk read faster than they write. I have some that read around 140MB/s but write around 60MB/s.
What kind of disks are they?
Seagate ST91000640NS - 1TB SATA 7200rpm
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Is that for random or sequential read?
 
Joined
Dec 29, 2014
Messages
1,135
Is that for random or sequential read?
Likely sequential. I get the number from looking at the network reporting graph when doing a VMware storage VMotion off the FreeNAS.
 

titusc

Dabbler
Joined
Feb 20, 2014
Messages
37
Are you serving the same small pool of small or large files over and over again or are the files changing all the time?
It is possible, for example when I am browsing an album and scrolling back and forth through the photos in it. On the other hand, it would be more accurate to say no, because unless I am searching for a photo and have no idea where it is, I usually go through the photos one at a time, and once I am done I will not look at them again for some time.

You may be able to get away with 4 or 6 L2ARC disks (fast SSDs) and a boat-load of RAM (128GB +) to get you to the result with a smaller number of spinning disks behind it in RAIDZ2.
This got me thinking a bit, but if I'm scrolling through photos that are 25GB each, I need a 500GB SSD just to scroll up and down through 20 photos. And if I don't need to scroll back and forth, I guess this won't help.

We will need to understand the requirement better to make the right recommendation.
Do you care about how long it takes to write or is read the only important factor?
Writing is important, but typically it happens only once and I can live with 5Gbps. What matters more is read, because it is needed when viewing an album and also when creating thumbnails of all the photos I have at the moment.


In Lightroom you don't need access to the RAW files directly; you can use smart previews to build a representative library and then just download the actual files on export. I'm not sure how well that plays with files that big, but it's worth looking at.
Yes, I can use smart previews, and since I also have a lot to build it is something I'd definitely use, but I don't want to rely on it.


More vdevs. Each vdev provides roughly the performance of a single disk. A single disk (depending on the disk) can provide between 100 and 250 MB/s of transfer. If your disks are slow (100 MB/s) you will need ten vdevs, but if they are really fast (250 MB/s) you might be able to get by with four vdevs.
Okay, this is interesting. I'm somewhat rusty with ZFS but I brushed up on the terminology. So the max number of vdevs I can have is the number of disks, but a vdev usually consists of a number of drives in RAID1, RAID5 or RAID6 mode. Does this mean if I have a 6 disk setup the max number of vdevs I can have is 6, and each vdev only has one disk, so when any one disk fails I lose the whole vdev because there is no redundancy? My understanding is that the latest benchmarks show regular hard disks at 20 - 200MB/s. So if I can only fit in 4 disks, that means I can only create 1 vdev consisting of those 4 disks in RAIDZ2, if I wish to have RAIDZ2? Does this mean that if the max speed of the disks is 200MB/s then that's all I get from the NAS?

Why would you want a 1U server? How do you think you are getting 6 disks in a 1U server? How much storage capacity do you need?
Because this needs to fit in a closet. I have space for 1U and likely 2U but I'd prefer not to use up the space if possible. I have a 2TB RAID6 NAS at the moment and wish to get to 5TB next. Not enough to justify a lot of disks.

Let's say I get the ThinkSystem SR550, which can fit 12 x LFF disks, and to start I buy only 6 x 4TB LFF and set up one vdev in RAIDZ2, giving me 16TB of capacity. Can I simply add another 6 x 4TB LFF vdev (again RAIDZ2) later and then combine the two vdevs?
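From what I have read, extending the pool later would be something roughly like this (pool and device names are made up by me), but please correct me if I have that wrong:
Code:
# Existing pool: one 6-disk RAIDZ2 vdev (pool/device names are placeholders)
zpool status tank

# Later: add a second 6-disk RAIDZ2 vdev; ZFS stripes new writes across both vdevs
zpool add tank raidz2 da6 da7 da8 da9 da10 da11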

Per RAID calculator I see it says 4 x read speed for a 6 disks RAID6 array. Does this mean if I'm using 200MB/s disks I get 1000MB/s (10Gbps minus overhead) for the first vdev?

Any recommendations on what is the most economical 2U rack server to go by?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Per RAID calculator I see it says 4 x read speed for a 6 disks RAID6 array. Does this mean if I'm using 200MB/s disks I get 1000MB/s (10Gbps minus overhead) for the first vdev?
Quit looking at things that are not talking about ZFS because you are just wasting your time.
Regular RAID (hardware RAID) does not work the same as ZFS.
 

titusc

Dabbler
Joined
Feb 20, 2014
Messages
37
I can get 9.1G reading off spinning rust in a pool with dual Z2 vdevs. Writes are much less (~4G), but I have a pretty good CPU and a lot of memory (256G).
Sorry, are you saying you are using only 2 vdevs, and each vdev is a RAIDZn? How many disks are there per vdev, and which RAIDZ level are you using for each of the vdevs?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Does this mean if I have a 6 disk setup the max number of vdevs I can have is 6, and each vdev only has one disk, so when any one disk fails I lose the whole vdev because there is no redundancy?
If you have each disk set up as a separate vdev in a storage pool, the information written to the pool is striped across all vdevs, so if you lose a disk (vdev), you lose the whole pool. That is why the minimum redundancy would be mirror vdevs, so six disks would get you 3 vdevs.
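For illustration, a six-disk pool of mirror vdevs is created along these lines (pool and device names are placeholders):
Code:
# Three 2-way mirror vdevs; data is striped across the three mirrors
zpool create tank mirror da0 da1 mirror da2 da3 mirror da4 da5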
 
Joined
Dec 29, 2014
Messages
1,135
Sorry, are you saying you are using only 2 vdevs, and each vdev is a RAIDZn? How many disks are there per vdev, and which RAIDZ level are you using for each of the vdevs?
Here is what the pool in question looks like.
Code:
  pool: RAIDZ2-I
 state: ONLINE
  scan: scrub repaired 0 in 0 days 01:34:28 with 0 errors on Sun Feb 24 01:34:30 2019
config:

        NAME                                            STATE     READ WRITE CKSUM
        RAIDZ2-I                                        ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/67a9a148-de13-11e8-adca-e4c722848f30  ONLINE       0     0     0
            gptid/68893123-de13-11e8-adca-e4c722848f30  ONLINE       0     0     0
            gptid/696903c2-de13-11e8-adca-e4c722848f30  ONLINE       0     0     0
            gptid/6a501044-de13-11e8-adca-e4c722848f30  ONLINE       0     0     0
            gptid/6b4526cb-de13-11e8-adca-e4c722848f30  ONLINE       0     0     0
            gptid/6c34b281-de13-11e8-adca-e4c722848f30  ONLINE       0     0     0
            gptid/6d271bd9-de13-11e8-adca-e4c722848f30  ONLINE       0     0     0
            gptid/6e33d52c-de13-11e8-adca-e4c722848f30  ONLINE       0     0     0
          raidz2-1                                      ONLINE       0     0     0
            gptid/a1436a28-de13-11e8-adca-e4c722848f30  ONLINE       0     0     0
            gptid/a24a517e-de13-11e8-adca-e4c722848f30  ONLINE       0     0     0
            gptid/a3404858-de13-11e8-adca-e4c722848f30  ONLINE       0     0     0
            gptid/a43c8614-de13-11e8-adca-e4c722848f30  ONLINE       0     0     0
            gptid/a53a0b93-de13-11e8-adca-e4c722848f30  ONLINE       0     0     0
            gptid/a657fa7a-de13-11e8-adca-e4c722848f30  ONLINE       0     0     0
            gptid/a761f10f-de13-11e8-adca-e4c722848f30  ONLINE       0     0     0
            gptid/a8b3b2da-de13-11e8-adca-e4c722848f30  ONLINE       0     0     0
        logs
          nvd0p1                                        ONLINE       0     0     0
        spares
          gptid/c01c4d23-de13-11e8-adca-e4c722848f30    AVAIL

errors: No known data errors

It is 2 vdevs of 8 disks each in RAIDZ2, with a spare and an Intel Optane 900P as an SLOG to make NFS sync writes faster.
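The spare and the SLOG are just extra devices added on top of the pool; from the command line that is roughly the following (the spare's device name below is a placeholder, and the FreeNAS GUI does the same thing):
Code:
# Add a hot spare and an NVMe log (SLOG) device to an existing pool
zpool add RAIDZ2-I spare da16
zpool add RAIDZ2-I log nvd0p1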
 

titusc

Dabbler
Joined
Feb 20, 2014
Messages
37
Quit looking at things that are not talking about ZFS because you are just wasting your time.
Regular RAID (hardware RAID) does not work the same as ZFS.
Okay, sorry, I thought it was close enough. I just searched around online and found a few ZFS calculators, but the ones I found only tell me the usable size, not the speed increase. Any suggestions?

If you have each disk set up as a separate vdev in a storage pool, the information written to the pool is striped across all vdevs, so if you lose a disk (vdev), you lose the whole pool. That is why the minimum redundancy would be mirror vdevs, so six disks would get you 3 vdevs.
One of the advantages I like with RAID6 is that any two of the disks in the array can fail. Even if I have a 6 disk setup with 3 vdevs, each consisting of a mirrored pair of drives, I lose the whole pool if the 2 disks that happen to fail are in the same vdev. This is more like RAID10, where if both disks in the same mirrored pair fail I lose the whole array. So I don't think I can accept this potential risk. So it looks like there are 2 options:

Option 1
- 6 disks in total.
- 2 vdevs, each consisting of 3 mirrored drives.
- Capacity is 2 drives out of the 6 disks aka 33%.
- Read speed is 2 x speed of each drive.

Option 2
- 6 disks in total also.
- 1 vdev consisting of all 6 disks in RAIDZ2.
- Capacity is 4 drives out of the 6 disks aka 66%.
- Not sure what the read speed is here... (I've sketched below what I think each option looks like as a pool layout.)
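Just to make sure I understand the layouts, I think the two options would be created roughly like this (pool and device names made up):
Code:
# Option 1: two vdevs, each a 3-way mirror
zpool create tank mirror da0 da1 da2 mirror da3 da4 da5

# Option 2: one 6-disk RAIDZ2 vdev
zpool create tank raidz2 da0 da1 da2 da3 da4 da5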


It is 2 vdevs of 8 disks each in RAIDZ2, with a spare and an Intel Optane 900P as an SLOG to make NFS sync writes faster.
Wow, this is insane. That's 16 disks minimum, but are we saying this is what is required to saturate a 10Gbps link for reads using the ST91000640NS, and that any fewer disks will not be able to saturate the 10Gbps link? What server is this on?
 
Joined
Dec 29, 2014
Messages
1,135
Wow, this is insane. That's 16 disks minimum, but are we saying this is what is required to saturate a 10Gbps link for reads using the ST91000640NS, and that any fewer disks will not be able to saturate the 10Gbps link? What server is this on?
Click the 'Primary FreeNAS' link in my signature to see a full description. I don't know about how many drives are required as all the different pieces affect the speed. This happens to be what I have put together based on prior experience and lots of reading/suggestions from the forum. My understanding is 6-8 drives is the optimal number for a RAIDZ2 vdev. I picked 8 on this server because it has 24 drive bays, and two of them are in a mirror for the system dataset. On my secondary system where I have an external case with 25 bays, I have 4 vdevs of 6 drives each and a spare. Truthfully I don't really push my stuff all that hard. IT is a hobby/passion as well as a job. My wife is ok with it because it keeps me off the streets and out of bars during the day. It also has to stay within the confines of my office, or the approval wanes quickly. :smile:
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
Option 1
- 6 disks in total.
- 2 vdevs, each consisting of 3 mirrored drives.
- Capacity is 2 drives out of the 6 disks aka 33%.
- Read speed is 2 x speed of each drive.

Option 2
- 6 disks in total also.
- 1 vdev consisting of all 6 disks in RAIDZ2.
- Capacity is 4 drives out of the 6 disks aka 66%.
- Not sure what the read speed is here......
If you're limited to 6 disks and require line-rate 10GbE reads, then go SSD and be done with it. You'll need several more spindles to achieve the same results otherwise.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Wow, this is insane. That's 16 disks minimum, but are we saying this is what is required to saturate a 10Gbps link for reads using the ST91000640NS, and that any fewer disks will not be able to saturate the 10Gbps link? What server is this on?
You just don't get it, do you? The number of disks doesn't really matter that much. The number of vdevs is what is important, and if the data comes out of memory then the disks don't matter at all.

Here is an example read from my pool. The first read is from disk and the second read is from memory (ARC). As you can see, I have more vdevs and more disks than the previous post and get slower disk read speeds. But if you read out of memory, speeds are as fast as memory can move.
Code:
root@tubby:/mnt/tubby # dd if="10gig.zip" of=/dev/null bs=1M
9898+1 records in
9898+1 records out
10379532936 bytes transferred in 49.319507 secs (210454919 bytes/sec)
root@tubby:/mnt/tubby # dd if="10gig.zip" of=/dev/null bs=1M
9898+1 records in
9898+1 records out
10379532936 bytes transferred in 2.369600 secs (4380288256 bytes/sec)

Code:
config:

        NAME                                            STATE     READ WRITE CKSUM
        tubby                                           ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/fe7bc8a4-bf37-11e5-8237-0cc47a696106  ONLINE       0     0     0
            gptid/080402eb-fbf1-11e7-9400-0cc47a696106  ONLINE       0     0     0
            gptid/00f797cc-bf38-11e5-8237-0cc47a696106  ONLINE       0     0     0
            gptid/0239d831-bf38-11e5-8237-0cc47a696106  ONLINE       0     0     0
            gptid/edb57a76-8907-11e8-8f76-0cc47a696106  ONLINE       0     0     0
            gptid/04b6902c-bf38-11e5-8237-0cc47a696106  ONLINE       0     0     0
            gptid/05ef0d89-bf38-11e5-8237-0cc47a696106  ONLINE       0     0     0
            gptid/0722ea4a-bf38-11e5-8237-0cc47a696106  ONLINE       0     0     0
          raidz2-1                                      ONLINE       0     0     0
            gptid/76bb0e2c-c3e1-11e5-bdb5-0cc47a696106  ONLINE       0     0     0
            gptid/77db5018-c3e1-11e5-bdb5-0cc47a696106  ONLINE       0     0     0
            gptid/792589b9-c3e1-11e5-bdb5-0cc47a696106  ONLINE       0     0     0
            gptid/7a737c62-c3e1-11e5-bdb5-0cc47a696106  ONLINE       0     0     0
            gptid/7bbf5e31-c3e1-11e5-bdb5-0cc47a696106  ONLINE       0     0     0
            gptid/7d13ca62-c3e1-11e5-bdb5-0cc47a696106  ONLINE       0     0     0
            gptid/7e5e8f8f-c3e1-11e5-bdb5-0cc47a696106  ONLINE       0     0     0
            gptid/7fa7703f-c3e1-11e5-bdb5-0cc47a696106  ONLINE       0     0     0
          raidz2-2                                      ONLINE       0     0     0
            gptid/1a6e49d4-f496-11e7-9683-0cc47a696106  ONLINE       0     0     0
            gptid/1afe4bed-f496-11e7-9683-0cc47a696106  ONLINE       0     0     0
            gptid/1ba8675e-f496-11e7-9683-0cc47a696106  ONLINE       0     0     0
            gptid/1c3f4f37-f496-11e7-9683-0cc47a696106  ONLINE       0     0     0
            gptid/1cd9beea-f496-11e7-9683-0cc47a696106  ONLINE       0     0     0
            gptid/1d7c5f6e-f496-11e7-9683-0cc47a696106  ONLINE       0     0     0
            gptid/1eb20e6b-f496-11e7-9683-0cc47a696106  ONLINE       0     0     0
            gptid/1fe2372f-f496-11e7-9683-0cc47a696106  ONLINE       0     0     0
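If you want to see where the reads are actually being served from while a test like that runs, you can watch per-vdev bandwidth (tubby is just my pool name):
Code:
# Per-vdev read/write bandwidth, refreshed every second, while the dd runs
zpool iostat -v tubby 1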
 

titusc

Dabbler
Joined
Feb 20, 2014
Messages
37
Click the 'Primary FreeNAS' link in my signature to see a full description. I don't know about how many drives are required as all the different pieces affect the speed. This happens to be what I have put together based on prior experience and lots of reading/suggestions from the forum. My understanding is 6-8 drives is the optimal number for a RAIDZ2 vdev. I picked 8 on this server because it has 24 drive bays, and two of them are in a mirror for the system dataset. On my secondary system where I have an external case with 25 bays, I have 4 vdevs of 6 drives each and a spare. Truthfully I don't really push my stuff all that hard. IT is a hobby/passion as well as a job. My wife is ok with it because it keeps me off the streets and out of bars during the day. It also has to stay within the confines of my office, or the approval wanes quickly. :)
Interesting. Why Cisco over say Dell / HP / Lenovo for a server or storage pod?

If you're limited to 6 disks and require line-rate 10GbE reads, then go SSD and be done with it. You'll need several more spindles to achieve the same results otherwise.
Is there really no way to achieve 10Gbps read speed using spinning disks if I want to go with RAIDZ2 or RAID6 and am limited to 12 disk bays? One thing I'm still not certain about (I know you and @SweetAndLow are saying the number of vdevs is what matters) is whether there is really no read-speed benefit from more disks within a vdev, or whether you are saying the increase just doesn't scale as well as with a non-ZFS RAID setup. If there is really no benefit to having more disks in a vdev and I want 2-disk failure redundancy, then I can go with vdevs of 4 disks each in RAIDZ2. With 12 disk bays I can at least squeeze out 3 vdevs. But given @ElliotDierksen has indicated that even with a 2-vdev setup he is able to get 10Gbps, I guess I can achieve that with 2 vdevs of 4 disks each (8 disks total). Yes?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Per RAID calculator I see it says 4 x read speed for a 6 disks RAID6 array. Does this mean if I'm using 200MB/s disks I get 1000MB/s (10Gbps minus overhead) for the first vdev?

No way in hell. Well, maybe if there's like GOBS of free space, like 80%++ of your space is unused. Over time, fragmentation is going to mean that you do not have the long sequential runs of blocks necessary to be able to consider drives to be running at "200MB/s". They will run SUBSTANTIALLY slower.
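You can see at a glance how full and how fragmented the free space in a pool is (FRAG is free-space fragmentation, not file fragmentation):
Code:
zpool list
# or pick the relevant columns explicitly:
zpool list -o name,size,allocated,free,fragmentation,capacity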
 