Lost a lot of performance when I added multiple disks


kikotte

Explorer
Joined
Oct 1, 2017
Messages
75
You don't. You use the pool normally and ZFS will optimize things to maximize performance.

So you think it's normal that it uses 3 disks rather than 6 disks?

Of course, all 6 disks will be used if nothing is wrong.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
So you think it's normal that it uses 3 disks rather than 6 disks?
Writes are balanced to maximize performance. Over time, a steady state will be reached - in the case of identical vdevs, all vdevs end up about as full as the others. It's far too early to say that something is wrong, since the new vdevs are basically empty.
 

garm

Wizard
Joined
Aug 19, 2017
Messages
1,556
Create a new dataset without compression.
Run dd again and post the output of zpool iostat -v while dd is running (in code tags) so we can see the extent of the issue.
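
For reference, a minimal sketch of those two steps (assuming the pool is named Store2, as in the output below, and using a hypothetical dataset name "perftest"; size the test file well beyond RAM):

Code:
# Create an uncompressed dataset so dd's zeroes actually hit the disks
zfs create -o compression=off Store2/perftest
# Write a large test file (~20 GiB here)
dd if=/dev/zero of=/mnt/Store2/perftest/testfile bs=1M count=20480
# In a second shell while dd runs, show per-vdev activity once per second
zpool iostat -v Store2 1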
 

kikotte

Explorer
Joined
Oct 1, 2017
Messages
75
Create a new dataset without compression.
Run dd again and post the output of zpool iostat -v while dd is running (in code tags) so we can see the extent of the issue.

Code:
[root@freenas ~]# zpool iostat -v																								   
										  capacity	 operations	bandwidth													 
pool									alloc   free   read  write   read  write													
--------------------------------------  -----  -----  -----  -----  -----  -----													
Store2								  4.24T  28.4T	 16	150  2.07M  9.50M													
  mirror								1.31T  4.13T	  5	 23   688K  1.43M													
   gptid/9d17bcbc-066b-11e8-b8a8-0cc47a5808e8.eli	  -	  -	  1	  6   343K  1.44M										
   gptid/9df33389-066b-11e8-b8a8-0cc47a5808e8.eli	  -	  -	  1	  6   345K  1.44M										
  mirror								1.34T  4.10T	  5	 22   708K  1.42M													
   gptid/a03134f9-066b-11e8-b8a8-0cc47a5808e8.eli	  -	  -	  1	  6   354K  1.42M										
   gptid/a0f17bac-066b-11e8-b8a8-0cc47a5808e8.eli	  -	  -	  1	  6   355K  1.42M										
  mirror								1.34T  4.09T	  5	 23   714K  1.44M													
   gptid/a3334340-066b-11e8-b8a8-0cc47a5808e8.eli	  -	  -	  1	  6   357K  1.45M										
   gptid/a3f64588-066b-11e8-b8a8-0cc47a5808e8.eli	  -	  -	  1	  6   357K  1.45M										
  mirror								81.3G  5.36T	  0	 15  10.1K   741K													
   gptid/5d4a0fd3-1312-11e8-b729-0cc47a5808e8.eli	  -	  -	  0	  4  5.00K   743K										
   gptid/5e12941f-1312-11e8-b729-0cc47a5808e8.eli	  -	  -	  0	  4  5.07K   743K										
  mirror								82.8G  5.36T	  0	 19  10.8K   845K													
   gptid/612876f0-1312-11e8-b729-0cc47a5808e8.eli	  -	  -	  0	  5  5.45K   847K										
   gptid/61eacf52-1312-11e8-b729-0cc47a5808e8.eli	  -	  -	  0	  5  5.39K   847K										
  mirror								82.9G  5.36T	  0	 17  10.8K   819K													
   gptid/65096079-1312-11e8-b729-0cc47a5808e8.eli	  -	  -	  0	  4  5.40K   821K										
   gptid/65d19969-1312-11e8-b729-0cc47a5808e8.eli	  -	  -	  0	  4  5.40K   821K										
logs										-	  -	  -	  -	  -	  -													
  mirror								 135M  1.09T	  0	 60	  0  4.26M													
   gptid/a4d713ca-066b-11e8-b8a8-0cc47a5808e8.eli	  -	  -	  0	 57	  0  4.26M										
   gptid/a549c4e3-066b-11e8-b8a8-0cc47a5808e8.eli	  -	  -	  0	 57	  0  4.26M										
--------------------------------------  -----  -----  -----  -----  -----  -----													
freenas-boot							2.08G   109G	  0	  0  4.72K  1.80K													
  mirror								2.08G   109G	  0	  0  4.72K  1.80K													
   da12p2								  -	  -	  0	  0  4.69K  1.81K													
   da13p2								  -	  -	  0	  0  4.70K  1.81K													
--------------------------------------  -----  -----  -----  -----  -----  -----


Code:
[root@freenas ~]# zpool iostat -v																								   
										  capacity	 operations	bandwidth													 
pool									alloc   free   read  write   read  write													
--------------------------------------  -----  -----  -----  -----  -----  -----													
Store2								  4.28T  28.3T	 16	151  2.07M  9.56M													
  mirror								1.32T  4.12T	  5	 23   688K  1.44M													
   gptid/9d17bcbc-066b-11e8-b8a8-0cc47a5808e8.eli	  -	  -	  1	  6   343K  1.44M										
   gptid/9df33389-066b-11e8-b8a8-0cc47a5808e8.eli	  -	  -	  1	  6   345K  1.44M										
  mirror								1.35T  4.09T	  5	 22   708K  1.42M													
   gptid/a03134f9-066b-11e8-b8a8-0cc47a5808e8.eli	  -	  -	  1	  6   354K  1.42M										
   gptid/a0f17bac-066b-11e8-b8a8-0cc47a5808e8.eli	  -	  -	  1	  6   355K  1.42M										
  mirror								1.35T  4.09T	  5	 23   714K  1.45M													
   gptid/a3334340-066b-11e8-b8a8-0cc47a5808e8.eli	  -	  -	  1	  6   357K  1.45M										
   gptid/a3f64588-066b-11e8-b8a8-0cc47a5808e8.eli	  -	  -	  1	  6   357K  1.45M										
  mirror								88.6G  5.35T	  0	 15  10.1K   754K													
   gptid/5d4a0fd3-1312-11e8-b729-0cc47a5808e8.eli	  -	  -	  0	  4  5.00K   756K										
   gptid/5e12941f-1312-11e8-b729-0cc47a5808e8.eli	  -	  -	  0	  4  5.07K   756K										
  mirror								90.1G  5.35T	  0	 19  10.8K   858K													
   gptid/612876f0-1312-11e8-b729-0cc47a5808e8.eli	  -	  -	  0	  5  5.45K   860K										
   gptid/61eacf52-1312-11e8-b729-0cc47a5808e8.eli	  -	  -	  0	  5  5.38K   860K										
  mirror								90.3G  5.35T	  0	 17  10.8K   832K													
   gptid/65096079-1312-11e8-b729-0cc47a5808e8.eli	  -	  -	  0	  4  5.40K   834K										
   gptid/65d19969-1312-11e8-b729-0cc47a5808e8.eli	  -	  -	  0	  4  5.40K   834K										
logs										-	  -	  -	  -	  -	  -													
  mirror								 147M  1.09T	  0	 60	  0  4.29M													
   gptid/a4d713ca-066b-11e8-b8a8-0cc47a5808e8.eli	  -	  -	  0	 57	  0  4.29M										
   gptid/a549c4e3-066b-11e8-b8a8-0cc47a5808e8.eli	  -	  -	  0	 57	  0  4.29M										
--------------------------------------  -----  -----  -----  -----  -----  -----													
freenas-boot							2.08G   109G	  0	  0  4.72K  1.80K													
  mirror								2.08G   109G	  0	  0  4.72K  1.80K													
   da12p2								  -	  -	  0	  0  4.69K  1.81K													
   da13p2								  -	  -	  0	  0  4.70K  1.81K													
--------------------------------------  -----  -----  -----  -----  -----  -----


Code:
[root@freenas ~]# zpool iostat -v																								   
										  capacity	 operations	bandwidth													 
pool									alloc   free   read  write   read  write													
--------------------------------------  -----  -----  -----  -----  -----  -----													
Store2								  4.52T  28.1T	 16	154  2.07M  9.91M													
  mirror								1.36T  4.08T	  5	 24   687K  1.47M													
   gptid/9d17bcbc-066b-11e8-b8a8-0cc47a5808e8.eli	  -	  -	  1	  6   342K  1.47M										
   gptid/9df33389-066b-11e8-b8a8-0cc47a5808e8.eli	  -	  -	  1	  6   345K  1.47M										
  mirror								1.38T  4.05T	  5	 22   708K  1.45M													
   gptid/a03134f9-066b-11e8-b8a8-0cc47a5808e8.eli	  -	  -	  1	  6   353K  1.45M										
   gptid/a0f17bac-066b-11e8-b8a8-0cc47a5808e8.eli	  -	  -	  1	  6   355K  1.45M										
  mirror								1.39T  4.05T	  5	 23   714K  1.48M													
   gptid/a3334340-066b-11e8-b8a8-0cc47a5808e8.eli	  -	  -	  1	  6   357K  1.48M										
   gptid/a3f64588-066b-11e8-b8a8-0cc47a5808e8.eli	  -	  -	  1	  6   357K  1.48M										
  mirror								 133G  5.31T	  0	 16  10.1K   831K													
   gptid/5d4a0fd3-1312-11e8-b729-0cc47a5808e8.eli	  -	  -	  0	  4  5.02K   834K										
   gptid/5e12941f-1312-11e8-b729-0cc47a5808e8.eli	  -	  -	  0	  4  5.08K   834K										
  mirror								 134G  5.31T	  0	 20  10.9K   937K													
   gptid/612876f0-1312-11e8-b729-0cc47a5808e8.eli	  -	  -	  0	  5  5.46K   939K										
   gptid/61eacf52-1312-11e8-b729-0cc47a5808e8.eli	  -	  -	  0	  5  5.40K   939K										
  mirror								 134G  5.31T	  0	 18  10.8K   909K													
   gptid/65096079-1312-11e8-b729-0cc47a5808e8.eli	  -	  -	  0	  4  5.41K   911K										
   gptid/65d19969-1312-11e8-b729-0cc47a5808e8.eli	  -	  -	  0	  4  5.41K   911K										
logs										-	  -	  -	  -	  -	  -													
  mirror								9.53M  1.09T	  0	 61	  0  4.46M													
   gptid/a4d713ca-066b-11e8-b8a8-0cc47a5808e8.eli	  -	  -	  0	 58	  0  4.46M										
   gptid/a549c4e3-066b-11e8-b8a8-0cc47a5808e8.eli	  -	  -	  0	 58	  0  4.46M										
--------------------------------------  -----  -----  -----  -----  -----  -----													
freenas-boot							2.08G   109G	  0	  0  4.72K  1.80K													
  mirror								2.08G   109G	  0	  0  4.72K  1.80K													
   da12p2								  -	  -	  0	  0  4.69K  1.81K													
   da13p2								  -	  -	  0	  0  4.70K  1.81K													
--------------------------------------  -----  -----  -----  -----  -----  -----
 

garm

Wizard
Joined
Aug 19, 2017
Messages
1,556
Okay, I'm in no way an expert on disk I/O, but to me this looks like what I said originally: you are maxing out your SLOG. ZFS then distributes that bandwidth over the vdevs, lowering the bandwidth of the individual vdevs. Basically, your pool is as fast as your SLOG.
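
One rough way to test that theory for a sync-heavy workload (ESXi over NFS issues sync writes) is to compare throughput with sync temporarily disabled on a scratch dataset. This is only a diagnostic sketch, reusing the hypothetical Store2/perftest dataset from the sketch above, and sync=disabled should never be left on, since it risks data loss on power failure:

Code:
# Diagnostic only: bypass the ZIL/SLOG on the scratch dataset
zfs set sync=disabled Store2/perftest
dd if=/dev/zero of=/mnt/Store2/perftest/nosync.bin bs=1M count=20480
# Restore the default behaviour afterwards
zfs set sync=standard Store2/perftest

If throughput jumps with sync disabled, the sync-write path (ZIL/SLOG) is the limit; if it does not, the bottleneck is elsewhere.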
 

kikotte

Explorer
Joined
Oct 1, 2017
Messages
75
Okay, I'm in no way an expert on disk I/O, but to me this looks like what I said originally: you are maxing out your SLOG. ZFS then distributes that bandwidth over the vdevs, lowering the bandwidth of the individual vdevs. Basically, your pool is as fast as your SLOG.

When I tested the SLOG/ZIL device before putting it in the pool, it read and wrote at between 1000 and 1200 MB/s.
 

bigphil

Patron
Joined
Jan 30, 2014
Messages
486
Okay, I'm in no way an expert on disk I/O, but to me this looks like what I said originally: you are maxing out your SLOG. ZFS then distributes that bandwidth over the vdevs, lowering the bandwidth of the individual vdevs. Basically, your pool is as fast as your SLOG.
I'm curious as to how you came to that conclusion from the data that's been posted. OP stated his SLOG is an Intel P3520 NVMe disk. That should not be the bottleneck.
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
It's not a matter of bandwidth... SLOG is primarily a matter of IOPS.
 

kikotte

Explorer
Joined
Oct 1, 2017
Messages
75
It's not a matter of bandwidth... SLOG is primarily a matter of IOPS.

Intel P3520
  • Random Read (100% span): 320,000 IOPS
  • Random Write (100% span): 26,000 IOPS
  • Latency, read: 20 µs
  • Latency, write: 20 µs
How many IOPS is it using right now?
 

bigphil

Patron
Joined
Jan 30, 2014
Messages
486
Run your tests again and check gstat -I 1 while they run. I wonder if one of the new disks has an issue that's slowing it down; check %b (busy) and the other stats.
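
A sketch of that (gstat's -I flag sets the refresh interval and -p limits output to physical disks; the %busy column is the one to watch):

Code:
# Refresh once per second, physical providers only (da0..daN, nvd0, ...)
gstat -p -I 1s

A single disk pinned near 100% busy while its mirror partner idles would point at a slow or failing member.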
 

kikotte

Explorer
Joined
Oct 1, 2017
Messages
75
Run your tests again and check "gstat -I 1" while they run. I wonder if one of the new disks has an issue that's slowing it down; check %b (busy) and the other stats.

I don't know the best way to capture a dump of it.

But various disks spike to between 90 and 2000%.

Otherwise they stay between 0 and 50%.
 

kikotte

Explorer
Joined
Oct 1, 2017
Messages
75
Looks like my SLOG/ZIL has an IOPS problem; I may have to replace it with something better.

What do you think about this?

Intel Optane DC P4800X 375GB

https://www.intel.com/content/www/u...e-dc-p4800x-series/p4800x-375gb-aic-20nm.html

  • Sequential Read (up to): 2,400 MB/s
  • Sequential Write (up to): 2,000 MB/s
  • Random Read (100% span): 550,000 IOPS
  • Random Write (100% span): 500,000 IOPS
  • Latency, read: 10 µs
  • Latency, write: 10 µs
The one I have now.

Intel SSD DC P3520 1.2TB

https://ark.intel.com/products/88722/Intel-SSD-DC-P3520-Series-1_2TB-12-Height-PCIe-3_0-x4-3D1-MLC

  • Sequential Read (up to): 1,700 MB/s
  • Sequential Write (up to): 1,300 MB/s
  • Random Read (100% span): 320,000 IOPS
  • Random Write (100% span): 26,000 IOPS
  • Latency, read: 20 µs
  • Latency, write: 20 µs

Edit: Will I still have enough space if I switch to a 375GB drive?
 

bigphil

Patron
Joined
Jan 30, 2014
Messages
486
Looks like my SLOG/ZIL has an IOPS problem; I may have to replace it with something better.
No need. Low latency is what matters most for a SLOG device, and your current P3520 is fine at 20 microseconds. Your current SLOG device is plenty fast and capable.
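
If you want to exercise the SLOG deliberately with a local test, one sketch (again using the hypothetical Store2/perftest dataset from earlier; sync=always forces every write through the ZIL, and hence the SLOG device):

Code:
zfs set sync=always Store2/perftest
# A smallish block size better approximates NFS/ESXi sync writes
dd if=/dev/zero of=/mnt/Store2/perftest/sync.bin bs=128k count=40000
# Restore the default when done
zfs set sync=standard Store2/perftest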
 

kikotte

Explorer
Joined
Oct 1, 2017
Messages
75
No need. Low latency is what matters most for a SLOG device, and your current P3520 is fine at 20 microseconds. Your current SLOG device is plenty fast and capable.

My goal is to get up to around 800 MB/s. What do I need to do to get there?
 

garm

Wizard
Joined
Aug 19, 2017
Messages
1,556
I'm curious as to how you came to that conclusion from the data that's been posted. OP stated his SLOG is an Intel P3520 NVMe disk. That should not be the bottleneck.

I have no idea what causes the slow performance.
I've never played with NVMe drives, but wouldn't the SLOG fill up if the issue were one of the underlying drives? The OP has 1.2 TB of it and is using only a few MB.
 

bigphil

Patron
Joined
Jan 30, 2014
Messages
486
The SLOG wouldn't ever fill up. It should realistically only hold a couple of transaction groups' worth of data at a time.
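
For a rough sense of scale (a back-of-the-envelope estimate, not a measurement): with a 10GbE link saturated at about 1.25 GB/s and the default transaction group interval of roughly 5 seconds, two transaction groups come to about 2 × 5 s × 1.25 GB/s ≈ 12.5 GB, so even a very busy system only ever keeps a few gigabytes of in-flight data on the SLOG.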
 

kikotte

Explorer
Joined
Oct 1, 2017
Messages
75
I found that one of my VM test servers had given up; it was the one I had put on the new disks, but it seems like there is a problem there.

Before that, it only used 3 disks, but when I created a new dataset and moved the data over, it started using all 6 disks; that is when the problem appeared.

ESXi dump.log (NFS to FreeNAS):
Code:
   0	  0	  0	  0	0.0	  0	  0	0.0	0.0  gptid/a3334340-066b-11e8-b8a8-0cc47a5808e8.eli					   
   0	  0	  0	  0	0.0	  0	  0	0.0	0.0  gptid/a3f64588-066b-11e8-b8a8-0cc47a5808e8.eli					   
   0	  0	  0	  0	0.0	  0	  0	0.0	0.0  gptid/a4d713ca-066b-11e8-b8a8-0cc47a5808e8.eli					   
   0	  0	  0	  0	0.0	  0	  0	0.0	0.0  gptid/a549c4e3-066b-11e8-b8a8-0cc47a5808e8.eli					   
   0	  0	  0	  0	0.0	  0	  0	0.0	0.0  da7p1																
   0	  0	  0	  0	0.0	  0	  0	0.0	0.0  gptid/65becc6a-1312-11e8-b729-0cc47a5808e8						   
   0	  0	  0	  0	0.0	  0	  0	0.0	0.0  da7p2																
   0	  0	  0	  0	0.0	  0	  0	0.0	0.0  gptid/65d19969-1312-11e8-b729-0cc47a5808e8						   
   0	  0	  0	  0	0.0	  0	  0	0.0	0.0  gptid/65096079-1312-11e8-b729-0cc47a5808e8.eli					   
   0	  0	  0	  0	0.0	  0	  0	0.0	0.0  gptid/65d19969-1312-11e8-b729-0cc47a5808e8.eli
 

c32767a

Patron
Joined
Dec 13, 2012
Messages
371
I found that one of my VM test servers had given up; it was the one I had put on the new disks, but it seems like there is a problem there.

Before that, it only used 3 disks, but when I created a new dataset and moved the data over, it started using all 6 disks; that is when the problem appeared.


Hi,

One thing that @Arwen mentioned early on in this thread still seems relevant.

Can you tell us exactly how this volume was built? In your original post, you said you had the volume with data already on it, and then you added additional mirrors.

Arwen is correct: ZFS attempts to keep all vdevs "level", so if 2 are empty and 2 are half full, it will fill the 2 empty ones until they are all "level", and then it will fill all the vdevs. So any data that was written before the addition of the new vdevs will continue to live only on the vdevs that were present when it was written (and performance will be limited to what those vdevs can do).

But it's actually worse than that. Since ZFS is a copy-on-write filesystem, if you change a pre-expansion file, it will free space on only a subset of the vdevs, which means that at some point in the future, when ZFS decides to re-use those free blocks, you will see lower performance because ZFS will be writing to a subset of the full pool.
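
One quick way to see how unevenly the vdevs are filled right now (a sketch, using the pool name from the iostat output above; the per-vdev ALLOC and FREE columns show the imbalance):

Code:
zpool list -v Store2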

At $dayjob, we store a lot of media files. I realize you've been given a lot of advice here, and I don't dispute any of it, but if I were building what it appears you are trying to build, I would evacuate all my data, destroy the pool, and recreate it as RAIDZ2. I would add additional disks to get the capacity I needed. I would then use the NVMe SSDs you have for a SLOG and cache; since you have 2 SSDs, there is no harm in mirroring the SLOG. I would simply stripe the cache, though.



Edit:

Actually, your hardware list says you have 12 WD Gold drives. My recommendation would be to just pull your data off and build a 12-disk RAIDZ2. In terms of raw disk performance, you should easily be able to reach your 800MB/s number.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Arwen is correct: ZFS attempts to keep all vdevs "level" [...]

Actually, your hardware list says you have 12 WD Gold drives. My recommendation would be to just pull your data off and build a 12-disk RAIDZ2. In terms of raw disk performance, you should easily be able to reach your 800MB/s number.

Well, I wouldn't recommend a 12-wide RAIDZ2, but 2x6 would be good.
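
For illustration only, a 2x6 RAIDZ2 layout would look roughly like this at the command line (hypothetical pool and device names; on FreeNAS the pool would normally be built through the GUI, which also handles the GELI encryption and GPT labels seen in the iostat output):

Code:
zpool create tank \
  raidz2 da0 da1 da2 da3 da4 da5 \
  raidz2 da6 da7 da8 da9 da10 da11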

Also, the vdev leveling thing, as with most ZFS stuff, is trickier than that. The vdev which returns first gets the next transaction written. This tends to be the emptier vdev, but not always.
 

c32767a

Patron
Joined
Dec 13, 2012
Messages
371
Well, I wouldn't recommend a 12-wide RAIDZ2, but 2x6 would be good.

We have 30 or 40 in production.
Different strokes for different folks. As always, there are tradeoffs. Our workloads are similar to the OP's, which is why I suggested it.
 