Why is resilvering a simple mirror unexpectedly slow?

Status
Not open for further replies.

Stilez

Guru
Joined
Apr 8, 2016
Messages
529
I've used zpool attach to add a mirror to a single drive vdev, as part of setting up my new server (I had to do it that way as I'm migrating both disks and data; there's a backup of the data on the old server).

What surprises me is that after attaching, its resilvering speed was consistently about 39-50MB/s, according to zpool status.

Why is it going at 39-50MB/s for what should in theory be a straightforward sequential data copy between 2 disks (whatever's stored on them can be mirrored sequentially), even though the disks are capable of about 3 times that speed and there's no other load or demand on them?

Update - after an hour the figure shown by zpool status suddenly shot up. But the change raises more questions than it answers, and I'm not even sure I'm looking at the correct figure - "iostat -x" shows just 20MB/s (!) I/O on individual drives:
Code:
device     r/s    w/s     kr/s     kw/s  qlen  svc_t  %b
ada0     182.9    0.9  20641.9     19.1     2    3.0  24
ada1       0.0  177.2      0.5  19080.4     1    1.0  17
ada5     170.2    1.0  19030.1     19.1     2    3.6  27
ada6       0.0  191.1      0.5  20691.8     1    0.6  12

even though zpool status shows a much higher figure:
Code:
zpool status | egrep "to go|done"
1.06T scanned out of 6.68T at 193M/s, 8h29m to go
761G resilvered, 15.89% done
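(As a sanity check on that line: the ETA zpool status prints is consistent with simple arithmetic on the scanned/total/rate figures. A sketch, using the numbers from the output above and assuming the T and M suffixes mean TiB and MiB:)
Code:
```shell
# Recompute the ETA from the scanned/total/rate figures shown above.
# Assumes zpool's T and M suffixes are powers-of-two (TiB, MiB).
awk 'BEGIN {
    total = 6.68; scanned = 1.06; rate = 193          # TiB, TiB, MiB/s
    secs = (total - scanned) * 1024 * 1024 / rate     # remaining MiB / rate
    printf "%dh%02dm to go\n", secs / 3600, (secs % 3600) / 60
}'
# prints "8h28m to go" - agreeing with zpool status's 8h29m to within rounding
```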

In the meantime systat gives a different figure for these disks as well:
Code:
   /0%  /10  /20  /30  /40  /50  /60  /70  /80  /90  /100
ada0  MB/s XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 115.66
  tps| XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 1020.80
ada1  MB/s XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 105.90
  tps| XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 935.04
ada5  MB/s XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 106.23
  tps| XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 932.46
ada6  MB/s XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 115.63
  tps| XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 1020.20

What's gone on behind the scenes, what do the numbers mean (and why do they differ so much), and why is it apparently resilvering so much slower than expected?
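For what it's worth, the per-disk figures in the iostat -x output can be reconciled by summing kr/s and kw/s, which FreeBSD's iostat(8) reports in kilobytes per second. A sketch, run on two of the sample lines from the output above (field positions assume that column layout):
Code:
```shell
# Sum read+write throughput per disk, in MB/s, from an iostat -x snapshot.
# Sample lines copied from above; fields: device r/s w/s kr/s kw/s qlen svc_t %b
cat <<'EOF' | awk '{ printf "%s: %.1f MB/s\n", $1, ($4 + $5) / 1024 }'
ada0 182.9 0.9 20641.9 19.1 2 3.0 24
ada1 0.0 177.2 0.5 19080.4 1 1.0 17
EOF
# prints roughly 20.2 MB/s for ada0 and 18.6 MB/s for ada1
```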
 

Artion

Patron
Joined
Feb 12, 2016
Messages
331
Hi, the first output is not in KBps but kw/s (kilo-writes per second). Try iostat -d instead.
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,976
Update - after an hour the figure shown by zpool status suddenly shot up
That's normal behavior. You'll see the same thing when performing a scrub.
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
what should in theory be a straightforward sequential data copy between 2 disks
I may be wrong here, but I don't think that's how ZFS scrubs or resilvers. I know there's a project to change the algorithm so that a first pass collects the list of blocks that need to be checked and then reorders it to be more sequential, but that work hasn't been merged yet. The effort is supposed to lead to significant increases in resilver speed. I think the current order is by block creation time.

Again, I may be wrong here.
 

Stilez

Guru
Joined
Apr 8, 2016
Messages
529
@fracai - I found the projects you mentioned. They were dated 2016; I wonder if/when they'll be available in FreeBSD?

Until then, I suppose the replacement question is - given the various stats available from the system, which have people found to give the most reliable figures for scrub/resilver completion time and actual data rates?
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,600
@fracai - I found the projects you mentioned. They were dated 2016; I wonder if/when they'll be available in FreeBSD?
Almost certainly that feature will make it to FreeBSD, and then to FreeNAS. It's one of the main reasons
OpenZFS was created. Before OpenZFS, there were several unrelated projects to bring ZFS to FreeBSD,
Linux, macOS and the open-source version of Solaris. It's just a matter of time before the new features
arrive. Don't be in a rush; let others test them out first. :)
Until then, I suppose the replacement question is - given the various stats available from the system, which have people found to give the most reliable figures for scrub/resilver completion time and actual data rates?
Most of us simply wait a bit for zpool status to get reasonable numbers, then use those. It's
not perfect, but I'd rather have the scrub or resilver be perfect than the ETA or statistics.
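If you do settle on zpool status as the source of truth, the rate and ETA are easy to pull out of its scan line for periodic logging. A sketch, fed the sample line from earlier in the thread (in practice you'd pipe zpool status <pool> into it instead):
Code:
```shell
# Extract the scan rate and ETA from a zpool status scan line.
# The echoed sample line stands in for real `zpool status` output.
echo "1.06T scanned out of 6.68T at 193M/s, 8h29m to go" |
  awk '{ sub(/,$/, "", $7); print "rate=" $7 " eta=" $8 }'
# prints "rate=193M/s eta=8h29m"
```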
 