SOLVED Sanity check of performance, RaidZ2 server 50% of RaidZ1 server

Kuro Houou

Contributor
Joined
Jun 17, 2014
Messages
193
So I have two servers (full specs in my signature). They are both SuperMicro 2U servers with 10GbE: one main, one backup...

My main server has 6x14TB in RaidZ2
My backup server has 7x8TB in RaidZ1

I am getting wildly different speeds for Read/Write for sustained large single files

On my main server at best I can get about ~375MB/s Read and ~220MB/s Write (I noticed it caches some at the start, will hit like 700MB/s for a few seconds)
On my backup server I get about ~575MB/s Read and ~575MB/s Write all consistent across the entire file transfer.

I know the main server being RaidZ2 would have some performance loss.. but this just seems like a lot to me. I wanted a sanity check here to see if this is normal. I am considering blowing away my main server pool and either trying just RaidZ1 or adding another drive just like my backup.. I want the speeds my backup server has, essentially :)

Also I tried playing around with these flags in my SMB config on the main server,

aio write size = 0
aio read size = 0

They didn't seem to change the average speed much.. maybe slightly faster using them. Definitely more consistent speed with them on. With them off, reads were all over the place, going from 200MB/s to 600MB/s every few seconds.
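For reference, here's roughly where those two lines would sit if added as auxiliary parameters on the share (just a sketch; the share name and path below are made up):

Code:
[tank]
    # hypothetical share name and path - adjust to your setup
    path = /mnt/tank/share
    # 0 disables async I/O, so reads/writes are serviced synchronously
    aio read size = 0
    aio write size = 0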
 

Kuro Houou

Contributor
Joined
Jun 17, 2014
Messages
193
Just a couple of extra data points from running these commands on both servers,

Read Test
fio --name=seqread --rw=read --direct=0 --iodepth=32 --bs=128k --numjobs=1 --size=128G --group_reporting

As for the read speeds, here is the output,

Main Server:
READ: bw=215MiB/s (225MB/s), 215MiB/s-215MiB/s (225MB/s-225MB/s), io=128GiB (137GB), run=611040-611040msec

Backup Server:
READ: bw=734MiB/s (770MB/s), 734MiB/s-734MiB/s (770MB/s-770MB/s), io=128GiB (137GB), run=178524-178524msec

That's almost 3x faster!! I can't imagine RaidZ2 adding that much overhead...

Write Test
fio --name=seqread --rw=write --direct=0 --iodepth=32 --bs=128k --numjobs=1 --size=16G --group_reporting

Main Server:
WRITE: bw=268MiB/s (281MB/s), 268MiB/s-268MiB/s (281MB/s-281MB/s), io=16.0GiB (17.2GB), run=61241-61241msec

Backup Server:
WRITE: bw=553MiB/s (580MB/s), 553MiB/s-553MiB/s (580MB/s-580MB/s), io=16.0GiB (17.2GB), run=29624-29624msec

Essentially 2x faster.
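For anyone wanting to repeat these runs, the same two tests can be expressed as one fio job file so both servers see an identical workload (just a sketch; the directory path is an assumption - point it at a dataset on the pool under test):

Code:
; save as seqtest.fio and run with:  fio seqtest.fio
; 'directory' below is a made-up path - change it to a dataset on the pool being tested
[global]
directory=/mnt/tank/benchmark
bs=128k
iodepth=32
direct=0
numjobs=1
group_reporting

[seqread]
rw=read
size=128G

[seqwrite]
; stonewall makes the write job wait for the read job to finish
stonewall
rw=write
size=16G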


The last thing I will add is something I noticed.

When looking at the processor utilization while doing the reads, my main server seems to spike one thread at 100% a lot, essentially the whole time during the test. Whereas on my backup server, one thread will maybe hit 70% for a second, but usually they are all below 15%.
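(One way to watch that per-thread behaviour on TrueNAS CORE/FreeBSD, if anyone wants to compare, is top's per-CPU thread view - just an example command, not necessarily what was used here:)

Code:
# -P = per-CPU stats, -S = include kernel/system threads, -H = show individual threads
top -PSH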
 
Joined
Jan 4, 2014
Messages
1,644
It seems the Ultrastar DC is an SMR drive. I wonder if this could be part of the problem?
 

Kuro Houou

Contributor
Joined
Jun 17, 2014
Messages
193
It seems the Ultrastar DC is an SMR drive. I wonder if this could be part of the problem?

I think the newer ones are SMR, but according to this, mine (530 series) is CMR. "the Ultrastar DC HC530 is based on conventional magnetic recording (CMR) technology "

 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I have yet to see reports of a disk 8TB or larger that's SMR in any case... Even the WD RED 8TB was never SMR when the 2, 4 and 6 were.
 
Kuro Houou

Contributor
Joined
Jun 17, 2014
Messages
193
Archive disks generally are. Pre-SMR awareness era, shucked archive drives might have been used on FreeNAS. Here are the latest incarnations... https://www.westerndigital.com/comp...-first-20tb-smr-and-18tb-cmr-hard-disk-drives

I'm confused, are you saying they might be SMR even when they say it's CMR???

I think I am going to do one more test today and pull the drives out of my backup server and main server, put the main ones in the backup server, and just verify there is nothing going on with the server itself... I doubt this will do anything, but just to rule out that it could be anything else other than the disks.
 

Kuro Houou

Contributor
Joined
Jun 17, 2014
Messages
193
Also, one other question about RaidZ2: when reading from the pool, will it only ever read from 4 disks at a time, since 2 are parity? I noticed when running zpool iostat that it only reads data from 4 disks at a time, as in it will pull ~100MB/s from 4 disks while the other two report something like 30K... essentially 0. It shifts around too, so it's not just the first 4 all the time, it's essentially random, but only ever 4 reading at full speed. Is this normal behavior for a RaidZ2 config?

Lastly, I put 4x 4TB drives into my main server in a stripe configuration and was able to get a consistent >550MB/s read and >500MB/s write across large files... This makes me think either RaidZ2 is not a good fit for my 6 disks and it's really hurting performance, or the disks themselves are not that good. I am leaning towards RaidZ2 not being a good fit for these 6 disks, as I can clearly see single-disk read speeds from zpool iostat hitting at least 150MB/s. I might just have to switch to RaidZ1 for these disks, as the RaidZ2 performance hit is just insane.. at least 2x slower than I should be getting.
 

Kuro Houou

Contributor
Joined
Jun 17, 2014
Messages
193
I've been doing some tests all day on just 3x4TB and 4x4TB disks, just to see if the servers have the same results. So far all tests are matching on both servers; I'll post more data soon... but I have been comparing it to the data I found here: https://calomel.org/zfs_raid_speed_capacity.html. I'll just say, my data doesn't match theirs at all! Not sure why they would be different, but our 3-disk and 4-disk tests are not showing the same results.
 
Joined
Jan 4, 2014
Messages
1,644
I'm confused, are you saying they might be SMR even when they say it's CMR???
No, not at all. It depends on the model. I saw 14TB Ultrastar DC in your signature block and incorrectly assumed you were using the HC620, which is SMR.
 
Joined
Jan 4, 2014
Messages
1,644
My main server has 6x14TB in RaidZ2
My backup server has 7x8TB in RaidZ1

I am getting wildly different speeds for Read/Write for sustained large single files

On my main server at best I can get about ~375MB/s Read and ~220MB/s Write (I noticed it caches some at the start, will hit like 700MB/s for a few seconds)
On my backup server I get about ~575MB/s Read and ~575MB/s Write all consistent across the entire file transfer.

I know the main server being RaidZ2 would have some performance loss.. but this just seems like a lot to me.
What you're seeing is pretty much spot on. The performance of your RaidZ2 is 4 x streaming read/write speed of a single disk (375/4=93.75), while the performance of your RAIDZ1 is 6 x streaming read/write speed of a single disk (575/6=95.83).

Extract from the article Six Metrics for Measuring ZFS Pool Performance Part 2...
Here’s a summary:
N-wide RAIDZ, parity level p:

  • Read IOPS: Read IOPS of single drive
  • Write IOPS: Write IOPS of single drive
  • Streaming read speed: (N – p) * Streaming read speed of single drive
  • Streaming write speed: (N – p) * Streaming write speed of single drive
  • Storage space efficiency: (N – p)/N
  • Fault tolerance: 1 disk per vdev for Z1, 2 for Z2, 3 for Z3 [p]
 

Kuro Houou

Contributor
Joined
Jun 17, 2014
Messages
193
Here are my results from testing Single/Stripe/RaidZ1/RaidZ2 across 3 and 4 disks configs on my two servers...

TL;DR:
1. Both servers appear to operate within margin of error, so I do not believe either of the two servers is the issue.
2. The odd thing I noticed was with the RaidZ1 read speeds: with both 3 and 4 disks, the reads were split in half. With 3 disks, 2 of the disks ran at half the speed of the remaining one, and with 4 disks, 2 disks ran at half the speed of the other 2. With RaidZ2, all disks were read from at the same speed.

This is very strange, as in my 6-disk RaidZ2 pool on my main server I'll see 4 disks at 100% and 2 disks at 0MB/s, with only 4 disks essentially sending data at any one point in time. And on my 7-disk RaidZ1 pool, all disks are read from for maximum speed, >600MB/s.

Also, my 6-disk RaidZ2 pool seems to have the same write speed limit that I am seeing across the board on all these tests, ~200MB/s, which is odd, as once I got to a 7-disk RaidZ1 pool my write speeds doubled.


[Attached image: table of read/write results for the Single/Stripe/RaidZ1/RaidZ2 tests on both servers]
 

Kuro Houou

Contributor
Joined
Jun 17, 2014
Messages
193
What you're seeing is pretty much spot on. The performance of your RaidZ2 is 4 x streaming read/write speed of a single disk (375/4=93.75), while the performance of your RAIDZ1 is 6 x streaming performance of a single disk (575/6=95.83).

Extract from the article Six Metrics for Measuring ZFS Pool Performance Part 2...

I don't think those equations work, especially for writes. (That assumes 125MB/s R/W speed, which is actually very conservative; I think these disks can hit 150MB/s.) That said, I guess I don't know all the overhead involved in ZFS for actual single-disk read/write speeds. Maybe if a disk can do 125-150MB/s, after processing that's probably more like 100-115MB/s, especially for RaidZ2 vs RaidZ1; I assume Z2 has a higher overhead... That would explain the write speeds a bit better, potentially.

Main Server
  • Streaming read speed: (N – p) * Streaming read speed of single drive = (6 – 2) * 125MB/s = 500MB/s
  • Actual: ~250-300MB/s
  • Streaming write speed: (N – p) * Streaming write speed of single drive = (6 – 2) * 125MB/s = 500MB/s
  • Actual: ~210MB/s
Backup Server
  • Streaming read speed: (N – p) * Streaming read speed of single drive = (7 – 1) * 125MB/s = 750MB/s
  • Actual: ~750MB/s, pretty much spot on...
  • Streaming write speed: (N – p) * Streaming write speed of single drive = (7 – 1) * 125MB/s = 750MB/s
  • Actual: ~550MB/s, off by some, but I think this equation for write speeds is off...
 
Joined
Jan 4, 2014
Messages
1,644
I don't think those equations work, especially for writes.
The theory is sound. In practice, there will be other factors that impact the overall performance. It's a bit like a Gigabit NIC. In practice, you never actually achieve the full 1 Gb/s speed. Real-life figures approach the theoretical maximums, but rarely match them. What I've demonstrated is that the effective read speed of a single drive on your setup is around 95 MB/s. Your empirical study is also showing that the effective write speed is around the same on the RAIDZ1 pool, but less on the RAIDZ2 pool. To understand why, you'll need to head down the rabbit hole and investigate further.
 

Kuro Houou

Contributor
Joined
Jun 17, 2014
Messages
193
The theory is sound. In practice, there will be other factors that impact the overall performance. It's a bit like a Gigabit NIC. In practice, you never actually achieve the full 1 Gb/s speed. Real-life figures approach the theoretical maximums, but rarely match them. What I've demonstrated is that the effective read speed of a single drive on your setup is around 95 MB/s.

Fair enough :). I think the only true test will be to blow away either my main or backup pool and test single-disk, RaidZ1 and RaidZ2 configs… I did order a new 14TB drive that I may be throwing in my main server and converting it to a 7-disk RaidZ1 config… I would prefer RaidZ2, but I really want >500MB/s R/W speeds, and since I have a main and a backup NAS I think RaidZ1 will be OK… I may also try blowing away the backup pool first and just making a 6-disk pool to see if it behaves like my main pool… got some thinking to do.
 

Kuro Houou

Contributor
Joined
Jun 17, 2014
Messages
193
Just out of curiosity, anyone with a 6-disk RaidZ2 pool: can you confirm whether, when you copy a large file from that pool to your local disk, you see it read from all 6 drives at the same speed using this command,

cmdwatch -n 1 zpool iostat -vy 1 1

You should see under bandwidth that it is pulling from all drives at the same speed, probably around 80MB/s is my guess.
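(If cmdwatch isn't available, e.g. on Linux/TrueNAS SCALE, zpool iostat can produce the same rolling per-disk view by itself - same idea, slightly different invocation:)

Code:
# -v = per-disk breakdown, -y = skip the since-boot summary, 1 = refresh every second
zpool iostat -vy 1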

So far I have tested 3, 4 and 5 disks in RaidZ2 and it reads from all disks at the same rate all the time... however, in my 6 disk pool, it only reads from 4 disks at full speed, it randomly swaps between 4 disks while 2 go to 0MB/s.

Here is what a point-in-time output looks like; you can see only 4 disks have any operations going on at one moment in time.

Code:
v01                                             46.8T  29.4T  2.53K      0   200M      0
  raidz2                                        46.8T  29.4T  2.53K      0   200M      0
    gptid/e77367e4-2da8-11eb-bca9-002590827b48      -      -    733      0  49.9M      0
    gptid/e7f9d07d-2da8-11eb-bca9-002590827b48      -      -    622      0  50.3M      0
    gptid/e818448e-2da8-11eb-bca9-002590827b48      -      -      0      0  30.8K      0
    gptid/e832ba7b-2da8-11eb-bca9-002590827b48      -      -      0      0  30.8K      0
    gptid/e87cc732-2da8-11eb-bca9-002590827b48      -      -    625      0  50.0M      0
    gptid/e89aa8ff-2da8-11eb-bca9-002590827b48      -      -    614      0  50.2M      0
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
in my 6 disk pool, it only reads from 4 disks at full speed, it randomly swaps between 4 disks while 2 go to 0MB/s.
If ZFS is deciding that reading the parity isn't helpful to accelerate the reads, it won't do it. RAIDZ2 doesn't store all parity on the same drives for all blocks, hence the changing of the 2 "idle" drives as reading continues.

In narrower VDEVs (3, 4 & 5), ZFS may be calculating some of the data from parity rather than only from direct reading, and with fewer spindles/heads to work with, that may deliver a benefit. Or, in the cases where the numbers don't work out that way, there may be no blocks that aren't stored across all drives, so there are no parity reads, just no cases where one or more disks is idle to get to the data.
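Purely as an illustration (not the exact on-disk layout), the effect is something like this for a 6-wide RAIDZ2, where P/Q mark the two parity portions of a block:

Code:
block 1:   D   D   D   D   P   Q
block 2:   P   Q   D   D   D   D
block 3:   D   D   P   Q   D   D
# the pair of disks holding parity differs from block to block, so the two
# "idle" disks move around as successive blocks are read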
 

Kuro Houou

Contributor
Joined
Jun 17, 2014
Messages
193
If ZFS is deciding that reading the parity isn't helpful to accelerate the reads, it won't do it. RAIDZ2 doesn't store all parity on the same drives for all blocks, hence the changing of the 2 "idle" drives as reading continues.

In narrower VDEVs (3, 4 & 5), ZFS may be calculating some of the data from parity rather than only from direct reading, and with fewer spindles/heads to work with, that may deliver a benefit. Or, in the cases where the numbers don't work out that way, there may be no blocks that aren't stored across all drives, so there are no parity reads, just no cases where one or more disks is idle to get to the data.

Thanks. I think at the end of the day it confirms what I am thinking: 6 disks with RaidZ2 really isn't going to give the performance I am looking for; it's actually really poor... It seems 6 disks with RaidZ2 is about as performant as 4 disks with RaidZ2, at just over 200MB/s R/W speed. Oddly, 5 disks with RaidZ2 is better than 6 disks with RaidZ2, at just over 400MB/s read, as it seems to read from all disks at once in that scenario. It's very interesting to see how differently it writes the parity across different numbers of disks, which in turn affects the read speeds. The only thing I haven't tested is striping two 3-disk RaidZ1 vdevs. I will have to wait on that test, as I don't have enough empty drive slots in either server to do it today; I'll have to delete an existing main or backup pool to test that.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Oddly, 5 disks with RaidZ2 is better than 6 disks with RaidZ2, at just over 400MB/s read, as it seems to read from all disks at once in that scenario.
Just remember that you're testing with a "single client", so real-world performance where there may be more users/clients of your pool could produce better results with 6 than 5.
 