Estimating Resilvering Speeds


GrumpyBear

Contributor
Joined
Jan 28, 2015
Messages
141
I'm reasonably confident of my hardware now and have some live data on it. Now I'm starting to investigate typical (atypical?) tasks, one of which is dealing with disk failures. While I *think* I understand the process (detach/wipe/replace), I want to try it out now rather than figure it out with a degraded array in production.

I have an 8 × 3TB disk RAIDZ3 configuration and want to simulate a disk failure. While practicing the steps for replacing a disk in a degraded array, I'm interested in how long the resilvering process takes in my specific scenario. The catch is that while I only have 2.75TB currently, that number will go up over time, so I am interested in testing the resilver speed with more content on the array and would welcome suggestions on how to simulate this.
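For reference, my mental model of what happens under the hood is roughly the following (the FreeNAS GUI's volume status / replace workflow is the supported way to drive this, and "tank" plus the gptid labels are just placeholders for my pool and disks), so please correct me if this is off:

  # simulate the failure: take one member offline, pool drops to DEGRADED
  zpool offline tank gptid/OLD-DISK-LABEL
  zpool status tank
  # after wiping the disk (or swapping in a spare), kick off the resilver
  zpool replace tank gptid/OLD-DISK-LABEL gptid/NEW-DISK-LABEL
  zpool status tank    # "scan: resilver in progress" reports rate and time to go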

Googling found these results, which some members have pointed out offer only an idea of the best-case performance for the specific build.

What I propose to do is use dd with if=/dev/urandom rather than /dev/zero to create additional files, loading my array up to about 50% utilization (6TB), and then note how long the resilvering takes while replacing a single disk.
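For a single file that would look something like this (the path and size are just examples, sized for one of the "BluRay" class files; on FreeBSD dd, bs=1m is a 1MiB block, so count=17920 gives roughly 17.5GB):

  dd if=/dev/urandom of=/mnt/tank/test/video_0001.bin bs=1m count=17920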

The majority of files in my usage case would be 15-25MB RAW image files, 1.5-2GB H.264 video files (DVD backups with 5.1 pass-through), and 15-20GB video files (H.264/H.265 BluRay backups). Note that the RAW image files are internally compressed and the H.264 files are already processed, so both would look fairly random from a compression standpoint.

The majority of the files would be RAW image files (not compressible) with XML side-car files (extremely compressible) and finished client JPEGs (HQ, so very compressible); these, I suspect, would create a random I/O mix. The video files would create a sequential I/O mix: a much smaller number of files, but at significantly larger sizes.

Assuming 10,000 additional RAW image files (and the same number of sidecar and JPEG files):

10,000 × (25MB + 3MB + 2KB) ≈ 273.5GB

So we need roughly 3TB of video. I've been transitioning from DVD to BluRay, so I would estimate the future mix to be about 10:1 in favor of BluRay, with no anticipated need for 4K (my eyes aren't that discerning, nor is my display either capable of it or big enough).

60 × 1.75GB video files = 105GB
170 × 17.5GB video files = 2975GB
Total: 3080GB

As dd is single-threaded, and /dev/urandom is also single-threaded and additionally CPU-bound, I propose to use 6 threads to generate the simulated video files and a single thread for the image files, keeping one thread free for the system.
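Roughly what I have in mind for the video files, as a sketch (paths and per-worker counts are just my working numbers from above; 17920 × 1MiB blocks gives the ~17.5GB file size):

  #!/bin/sh
  # 6 workers, each writing ~29 of the ~17.5GB "BluRay" files (~174 total)
  for t in 1 2 3 4 5 6; do
    (
      i=1
      while [ $i -le 29 ]; do
        dd if=/dev/urandom of=/mnt/tank/test/br_${t}_${i}.bin bs=1m count=17920
        i=$((i + 1))
      done
    ) &
  done
  wait    # don't return until all 6 workers are done

The image files would just be one more worker of the same shape with count=25.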

Suggestions? Improvements?
 
Last edited by a moderator:

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I agree that the results are not surprising given the known ideal performance of RAIDZ3/Z2/Z1/mirror layouts, and that they do indicate a best-case result for specific hardware. But do they not provide an idea of what the best-case performance (downhill with a tailwind and 64k UDP packets, as we say in networking) with that hardware might be?

That seems dissonant.

We care about the resilver time because we're paranoid about the loss of redundancy levels.

The best resilver time always involves shutting down client/user access to the array ("the workload") and letting ZFS do its thing.

The people who can afford to do that usually don't care as much about the length of time it takes.

The people who can't afford to do that worry about the amount of time it'll take while also sustaining a workload.

Therefore testing the "best case" on a pool without a workload seems mostly to be a pointless exercise. The people who can quiesce an array probably don't care too much. The people who can't will be testing under load.

I've seen ZFS pools sufficiently busy that they basically cannot successfully resilver a disk (or, more hilariously, that cannot resilver a disk before another fails).

As we are getting way off-topic here I will post a separate thread.

And I'll move my reply if I find it.
 

GrumpyBear

Contributor
Joined
Jan 28, 2015
Messages
141
... I'll move my reply if I find it.
Created a post under "Performance" here. Feel free to move it to another forum if more apropos.

I do agree with what you are saying. In my case I would take the system offline to resilver, since we are talking about consumer (prosumer) grade disks, but that is just me; it's not like at work, where the line-ups start at the door the minute a large-scale production system goes offline.

I'm still curious though :D and would value input into making a possibly meaningless test, umm, less meaningless?

--- merged / jg ---

Awesome random blogs. He's trying to test the difference between striped mirrors and raidZx by writing to it using /dev/random as a source? That's hilarious.

Also, for sequential writes (which dd is), as long as the CPU is fast enough, a 4-disk Z2 should be equivalent to a 4-disk striped mirror. Random I/O is where the striped mirror would excel, of course.

Not sure why people can't simply disable compression and use /dev/zero. Again, a purely sequential test, but better than testing your CPU with /dev/random.
I believe there may be some confusion. What the author stated he was attempting to ascertain was the resilvering performance of various pooling strategies, not the write performance.

I agree that the results are not surprising given the known ideal performance of RAIDZ3/Z2/Z1/mirror layouts, and that they do indicate a best-case result for specific hardware. But do they not provide an idea of what the best-case performance (downhill with a tailwind and 64k UDP packets, as we say in networking) with that hardware might be?

As we are getting way off-topic here I will post a separate thread.
 
Last edited by a moderator:

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Apologies to the audience: I tried to do some creative thread moderation, but XenForo did something unexpected and I ended up editing some user posts to regain readability of the general discussion.
 

GrumpyBear

Contributor
Joined
Jan 28, 2015
Messages
141
Creating the 30,000 image files only took 15 to 20 minutes.

The "BluRay" video files (15GB) are underway. I started 7 parallel threads and am partway through the 6th iteration out of 23 total. The CPU is at 87.5% utilization and the aggregate speed reported by dd is 70.5MB/s so it should take about 9.75hrs to complete.

The disks are running at 28-32C with the case fans at minimum speed (700rpm); the system temperatures are:
  • CPU Temp 60C
  • System Temp 44C
  • Peripheral Temp 40C
  • PCH Temp 48C
  • VRM Temp 47C
  • DIMMA1 Temp 35C
  • DIMMB1 Temp 32C
And the system is consuming 108-122W.
 
Last edited:

GrumpyBear

Contributor
Joined
Jan 28, 2015
Messages
141
Arrgh,

About 70% of the way through the resilver, some idiot (me) accidentally started sending large files to the NAS, not having noticed that the output from the rips was going to a network drive rather than a local one.

All in all, the resilvering took about 4.5hrs for 7TB, whereas the scrub I did before disconnecting and wiping one of the disks took about 9 hours.

The interesting thing was that I expected the resilvering to be more block-level and to stream better, but the disk throughputs seemed more in line with the scrub, showing lots of ups and downs for some segments (lots of small files?) and pretty good sustained throughput for other periods (great big chunks of large files?). I guess this makes sense, as only the data gets resilvered, not the whole volume.

I think I'll just give up at this point and move on to some final items. I was happy to see that I got emails about the volume being degraded. I did not get an email when it was back ("Optimal"?), though, which might be nice for remote admin (or one could get off one's butt, console in, and run "zpool status").
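For the remote-admin itch, a crude stopgap I may try is a cron job along these lines (just a sketch: it assumes the box can send mail, that the pool is named tank, and it would keep nagging once healthy unless a state file is added):

  #!/bin/sh
  # mail me when the pool reports healthy again after a resilver
  if zpool status -x tank | grep -q "is healthy"; then
      echo "tank is healthy again" | mail -s "tank healthy" admin@example.com
  fi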

I'll post the exact times tonight along with some observations about using the posted speeds for estimating completion time.

Though the usefulness of this is in question, I believe I now have an estimate for the least amount of time a resilvering is likely to take in my specific setup (faster than I'd thought :) ).
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Yes, since resilvering or scrubbing is a traversal of the data on the pool (technically starting at the root bp and then working through the metadata), the amount of time it takes can vary greatly depending on many factors. A pool that's full but has only been written once (think archival) is likely to scrub massively faster than a highly fragmented pool that's been handling database or block protocols for a year and is 50% full. The fact that ZFS scrub/resilver only touches live data is not always the win it is often presented as being.
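If you want a quick read on how beat-up a pool is before guessing at scrub/resilver times, the fragmentation property is a reasonable hint on pools with the spacemap_histogram feature enabled (the pool name here is just an example):

  zpool list -o name,size,allocated,fragmentation,capacity,health tank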

some idiot (me)

We've all been there. ;-)
 