Totally unbalanced reads on fully-burdened two disk mirror

Status
Not open for further replies.

scurrier

Patron
Joined
Jan 2, 2014
Messages
297
I have two pools. Pool "backupone" is a 2-disk mirror of Seagate ST4000VN000 4TB drives. Pool "firstvol" is striped mirrors built from 6 of the same disks, 2 disks per vdev.

When transferring many large files (>5 GB) from backupone to firstvol, I am noticing via zpool iostat that the reads on backupone can be extremely uneven.

Here's an example; note the difference in bandwidth and ops between the two disks in backupone.

Code:
                                                  capacity     operations    bandwidth
pool                                            alloc   free   read  write   read  write
----------------------------------------------  -----  -----  -----  -----  -----  -----
backupone                                       2.31T  1.31T    110      0   110M      0
  mirror                                        2.31T  1.31T    110      0   110M      0
    gptid/cb74ec4e-42a9-11e5-82d0-002590f06808      -      -    110      0   110M      0
    gptid/c64e0866-c650-11e3-a8b9-002590f06808      -      -      0      0      0      0
----------------------------------------------  -----  -----  -----  -----  -----  -----
firstvol                                        7.29T  3.59T     42    689   315K   124M
  mirror                                        2.39T  1.23T     16    248  91.6K  47.5M
    gptid/5fa53587-4121-11e5-82d0-002590f06808      -      -      9    116  66.1K  47.6M
    gptid/5a40f730-c136-11e3-b86a-002590f06808      -      -      6    118  25.5K  48.6M
  mirror                                        2.52T  1.10T     11    201  71.7K  26.2M
    gptid/f800d612-421f-11e5-82d0-002590f06808      -      -      6     85  31.9K  26.2M
    gptid/95397879-c136-11e3-b86a-002590f06808      -      -      4     88  39.8K  26.2M
  mirror                                        2.37T  1.25T     14    239   151K  50.2M
    gptid/2d62c253-42df-11e5-82d0-002590f06808      -      -      6    125  82.8K  50.3M
    gptid/2def5aed-42df-11e5-82d0-002590f06808      -      -      7    124  68.5K  50.3M
----------------------------------------------  -----  -----  -----  -----  -----  -----
freenas-boot                                    9.69G  5.06G      0      0      0      0
  da7p2                                         9.69G  5.06G      0      0      0      0
----------------------------------------------  -----  -----  -----  -----  -----  -----


One disk is doing nothing and the other disk is doing all the reads, for a period of 5 seconds or more. It is not consistent which disk is doing all the work. It does this very frequently but not all the time. Sometimes both disks are reading.
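For reference, the per-disk view above comes from something along these lines (a 5-second averaging interval; the -v flag gives the per-vdev and per-disk breakdown):

Code:
# Watch per-vdev and per-disk activity, averaged over 5-second intervals.
zpool iostat -v 5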

Here's a look at the fragmentation and other stats.
Code:
NAME         SIZE  ALLOC   FREE  EXPANDSZ  FRAG   CAP  DEDUP  HEALTH  ALTROOT
backupone   3.62T  2.14T  1.48T         -   10%   59%  1.00x  ONLINE  /mnt
firstvol    10.9T  7.46T  3.41T         -   30%   68%  1.00x  ONLINE  /mnt


When I add a properly executed dd command to fully load the reads on backupone, this uneven behavior keeps occurring. So it's not that the destination pool is too slow to receive the data and ZFS decides to loaf around with the reads on the mirror.
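By "properly executed" I mean something along these lines, just sequentially reading a big file back off backupone and throwing the data away (the path here is only a placeholder):

Code:
# Read a large file from backupone sequentially and discard the data,
# to keep steady read pressure on the mirror.
dd if=/mnt/backupone/somebigfile of=/dev/null bs=1m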

Can anyone explain this unbalanced behavior? I'm disappointed that transfers do not seem to be maxing out the throughput of the disks in the mirror.

Full disclosure: in another scenario, I am getting what I consider to be poor performance from firstvol during reads of large, "sequential" files, approximately 160 MB/s in a properly executed dd test of a file with only one segment. I will address that in another thread eventually, but I mention it here for completeness.

Hardware in my signature.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
That seems odd, but I can't say that it's wrong. Could you try another test using multiple files to transfer, and the command zpool iostat 5 to average the results out over a 5-second interval, and see what happens? Make sure you create some test files to transfer. I guess it's possible that if all the data were read into RAM from one drive and cached for the copy/move operation, then the system would not need to look at the other drive; you have 32GB of RAM and only a small 5GB file, if you think about it that way. And have you tried searching the internet for this phenomenon? Use something like "bsd iostat read mirror", for example. I don't know if that would give you the results you're after, but it's a starting point.
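Something along these lines would do it for the test files (the path and sizes are just placeholders; /dev/random on FreeBSD doesn't block, so the data is incompressible):

Code:
# Create four ~5GB incompressible test files to copy between the pools.
for i in 1 2 3 4; do
    dd if=/dev/random of=/mnt/backupone/testfile$i bs=1m count=5120
done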
 

scurrier

Patron
Joined
Jan 2, 2014
Messages
297
Thanks for replying. That is actually the exact test that I was doing, with the 5 seconds to average the activity. And I was transferring multiple files (large uncompressible files).

I searched with a variety of terms, but to no avail.
 

toadman

Guru
Joined
Jun 4, 2013
Messages
619
One disk is doing nothing and the other disk is doing all the reads, for a period of 5 seconds or more. It is not consistent which disk is doing all the work. It does this very frequently but not all the time. Sometimes both disks are reading.

Can anyone explain this unbalanced behavior? I'm disappointed that transfers do not seem to be maxing out the throughput of the disks in the mirror.

I cannot explain the behavior with certainty, as I don't have enough knowledge. But I did find the following FreeBSD commit that might help explain what the algorithm is trying to do: https://svnweb.freebsd.org/base?view=revision&revision=256956

"The existing algorithm selects a preferred leaf vdev based on offset of the zio request modulo the number of members in the mirror. It assumes the devices are of equal performance and that spreading the requests randomly over both drives will be sufficient to saturate them. In practice this results in the leaf vdevs being under utilized.

The new algorithm takes into the following additional factors:
* Load of the vdevs (number outstanding I/O requests)
* The locality of last queued I/O vs the new I/O request."

So the selection should be "random," but it looks like it's not. I wonder how much weight locality carries here? Meaning that if you are moving large files that are sequential on disk, it may be more efficient to keep reading from the drive that has already done its seek and just keep reading. Especially if prefetch is ON.
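If prefetch is part of it, that's at least easy to check from the shell; assuming the stock FreeBSD sysctl names, something like:

Code:
# 0 means file-level prefetch is enabled, 1 means it has been disabled.
sysctl vfs.zfs.prefetch_disable
# Prefetch hit/miss counters, if the kstat is present on this build.
sysctl kstat.zfs.misc.zfetchstats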

Full disclosure: in another scenario, I am getting what I consider to be poor performance from firstvol during reads of large, "sequential" files, approximately 160 MB/s in a properly executed dd test of a file with only one segment. I will address that in another thread eventually, but I mention it here for completeness.

The file may not be sequential on disk. The fragmentation is 30%, so maybe the read is hopping around to non-sequential blocks at this point? (Though with a TB+ available, I would think a reasonably sized file could be laid down sequentially.)
 

scurrier

Patron
Joined
Jan 2, 2014
Messages
297
The mirror was being read from, not written to, and it is the low-fragmentation pool: 10%, with lots of free space. And I know that the files being read were written as sequentially as possible, because I had just copied them there and I was the only one writing to the disk.
 

toadman

Guru
Joined
Jun 4, 2013
Messages
619
Yes, reads. Which is why I linked a page talking about N-way mirror read performance on BSD. Several other online sources suggest that a 2-way mirror gives about 1.5x the read performance of a single disk. So I don't think "maxing out the throughput of the disks in the mirror" is going to happen in the real world, from what I've read. With SSDs, yes, folks are seeing almost 2x read performance in a 2-way mirror.

On the second part (the quote above), you mentioned reads of large sequential files from firstvol in another scenario. That's why I referenced the 30% fragmentation for that volume.
 

scurrier

Patron
Joined
Jan 2, 2014
Messages
297
Sorry, I was replying with a kid in one arm and didn't read the article. I've read it now and understand that your quote is describing two different algorithms, one an improvement on the other. The current algorithm uses a modulo that should result in pseudo-random loading of the drives. The improved algorithm makes up for imperfections in that pseudo-random loading, but for matched drives we'd expect basically even loading either way, and we'd never expect to see a drive sitting at zero. So we still don't know why this extreme uneven loading is happening.
 

toadman

Guru
Joined
Jun 4, 2013
Messages
619
Very strange indeed. I hope someone more knowledgeable can help out on this one.
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Fragmentation in ZFS is freespace fragmentation, not file fragmentation.
 

scurrier

Patron
Joined
Jan 2, 2014
Messages
297
Fragmentation in ZFS is freespace fragmentation, not file fragmentation.

Yes, but I just wrote the files while the freespace was not fragmented, so the files should similarly not be very fragmented.
 

scurrier

Patron
Joined
Jan 2, 2014
Messages
297
Besides, they are mirrors, so even if the files were fragmented, each record must be written to both disks. So we'd still expect to see fairly even IO across the two disks during reads.
 