Improve 2 striped x 6-drive vdevs in RaidZ

Status
Not open for further replies.

MtK

Patron
Joined
Jun 22, 2013
Messages
471
Hey,
at the moment I have 12 x 300 GB SAS drives set up in 2 striped vdevs, each in a RaidZ configuration.
This pool serves web sites, meaning it's full of lots of small files (PHP, HTML, CSS, JS, images, etc).

At the moment I'm using less than 50% of the storage, so I don't really have to rush into buying new disks, but I would like to improve performance.

So one easy option (assuming budget allows) is to buy an extra set of 6 identical drives and stripe them in as another vdev, so eventually I would have 3 x 6-drive RaidZ vdevs.
Would this make any improvement?

Another option would be to get the same 6 new drives and create a better pool configuration, probably something with mirrors. Assuming the current usage fits in the new space, I'd move everything to the new pool and then add the remaining 12 drives in a similar setup.

any suggestions?

MtK
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
I'd go to 32GB of RAM first. RAM = performance with regards to ZFS.

Mirrors is another good idea, but I'd do more RAM first.
 

MtK

Patron
Joined
Jun 22, 2013
Messages
471
I'd go to 32GB of RAM first. RAM = performance with regards to ZFS.

Mirrors is another good idea, but I'd do more RAM first.

This is a different system (not the one in my sig.), which has 3 x 8 GB - not a lot, but more :)
 

eraser

Contributor
Joined
Jan 4, 2013
Messages
147
A very similar question came across the FreeBSD-FS mailing list this week. This guy wanted help with adding a large L2ARC in his system in order to speed up his web application: http://lists.freebsd.org/pipermail/freebsd-fs/2014-March/019150.html . Not sure if it really applies to you but I like the coincidence.

You say you want to increase performance. Have you already determined that your performance bottleneck is actually your FreeNAS server... and that the bottleneck is at the ZFS Pool level and not the network level? Would hate to have you spend a lot of time redoing your ZFS layout and have the real problem end up being a poorly written web application.

Instead of trying to speed up your backend storage (ZFS), you could also consider setting up a Reverse Proxy to cache things at the web application level. Something like Varnish - https://www.varnish-cache.org/ may do the trick for you.

==========

Ok, let's assume that you do need to increase the performance of your ZFS Pool....

You currently have 3 TB of pool storage that is 50% full, so 1.5 TB of data. How much of that data is regularly/repeatedly accessed each day? If it is a smallish portion of data, then adding more RAM or a L2ARC may help as either will allow you to cache more data.

Otherwise you'll want to improve the raw IO performance of the underlying ZFS pool. This article may help you decide how to lay out your vdevs. Short answer is that mirrored vdevs give significantly better random read IOPS performance than RAIDZ vdevs at the cost of storage space - http://constantin.glez.de/blog/2010/06/closer-look-zfs-vdevs-and-performance

Since you have lots of small files I would bet that you get a lot of random read requests.

Check my math... based upon the blog post I linked to I calculate your current pool performance as follows:

A 15K RPM SAS Enterprise drive should offer around ~175 IOPS [1]. You have 12 of these disks.

=====
A RAIDZ array of 6 disks will offer the following:
read: 175 IOPS​
write: 175 IOPS​

Stripe your 2 RAIDZ vdevs together and you get the following aggregate* performance for your current setup:
read: 350 IOPS​
write: 350 IOPS​
total storage: 3 TB​

=====

A 2 disk Mirror vdev offers:
Read: 350 IOPS​
Write: 175 IOPS​

Stripe 6 Mirror vdevs, aggregate* performance:
Read: 2100 IOPS​
Write: 1050 IOPS​
total storage: 1.8 TB​

A big difference in aggregate* read IOPS.

At first glance it seems that you could easily fit your existing 1.5 TB of data into the 1.8 TB pool, but that would make your pool 83% full. Some people recommend that you never exceed 80% pool utilization [2], so you would need to add another mirrored vdev to give you some more space.
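For illustration only (a rough sketch; "newpool" and the da0-da11 device names are placeholders rather than your actual devices, and in FreeNAS you would normally build this through the GUI rather than the shell), a stripe of six 2-disk mirrors looks like this:
Code:
# six 2-way mirrors striped together -- ~1.8 TB usable from 12 x 300 GB drives
zpool create newpool mirror da0 da1 mirror da2 da3 mirror da4 da5 \
    mirror da6 da7 mirror da8 da9 mirror da10 da11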
=====

* See "Putting Vdevs Together" section of blog I linked to above to read about Aggregate vs single thread performance. Your web server may be running multiple threads, but if you are connecting to your FreeNAS storage over NFS then you may be limited to a single NFS thread (not sure?? maybe someone else can chime in).

Not sure if I really answered your question, but it was fun to do the math...

[1] http://en.wikipedia.org/wiki/IOPS

[2] After a ZFS pool hits a magic limit of utilization it switches to a different write allocation strategy that is much slower. Older ZFS code did this at 80% utilization. I wasn't able to determine exactly what current ZFS code does.
 

MtK

Patron
Joined
Jun 22, 2013
Messages
471
A very similar question came across the FreeBSD-FS mailing list this week. This guy wanted help with adding a large L2ARC in his system in order to speed up his web application: http://lists.freebsd.org/pipermail/freebsd-fs/2014-March/019150.html . Not sure if it really applies to you but I like the coincidence.
I'm guessing every storage owner asks this question... right?
The good thing, in my case, is that I bought a 24-bay case knowing I have only 12 disks, so I left myself the option to play with the setup before I buy new disks and seal my fate.

You say you want to increase performance. Have you already determined that your performance bottleneck is actually your FreeNAS server... and that the bottleneck is at the ZFS Pool level and not the network level? Would hate to have you spend a lot of time redoing your ZFS layout and have the real problem end up being a poorly written web application.
No, ZFS is not the only suspect, but since all the servers using (accessing) it get high IOWait at the same time... it is a prime suspect ;-)

Instead of trying to speed up your backend storage (ZFS), you could also consider setting up a Reverse Proxy to cache things at the web application level. Something like Varnish - https://www.varnish-cache.org/ may do the trick for you.
this is already done on the web server side.

==========
Ok, let's assume that you do need to increase the performance of your ZFS Pool....
Just to shed some more light on the setup: we are talking about a XenServer machine (well, actually more than one) that uses this ZFS box as shared storage via NFS. The XenServers have a few VMs with several VDIs on the shared storage (no local storage on those XS machines).

You currently have 3 TB of pool storage that is 50% full, so 1.5 TB of data.
to be exact:
Code:
# zfs list
NAME                  USED  AVAIL  REFER  MOUNTPOINT
{pool}                1.39T  1.26T  61.4K  /{pool}

doesn't really matter actually... :)
How much of that data is regularly/repeatedly accessed each day? If it is a smallish portion of data, then adding more RAM or a L2ARC may help as either will allow you to cache more data.
Good question! How do I measure that?
Otherwise you'll want to improve the raw IO performance of the underlying ZFS pool. This article may help you decide how to lay out your vdevs. Short answer is that mirrored vdevs give significantly better random read IOPS performance than RAIDZ vdevs at the cost of storage space - http://constantin.glez.de/blog/2010/06/closer-look-zfs-vdevs-and-performance
yeah, read it long ago, read it again now... thanks!
Since you have lots of small files I would bet that you get a lot of random read requests.
Yes. Just a reminder: files are being written to the VDIs on the XenServers, and not directly to ZFS (if that matters).
Check my math...
No need to; let's assume (for the rest of the examples) that your math is right...
based upon the blog post I linked to I calculate your current pool performance as follows:

A 15K RPM SAS Enterprise drive should offer around ~175 IOPS [1]. You have 12 of these disks.

=====
A RAIDZ array of 6 disks will offer the following:
read: 175 IOPS​
write: 175 IOPS​

Stripe your 2 RAIDZ vdevs together and you get the following aggregate* performance for your current setup:
read: 350 IOPS​
write: 350 IOPS​
total storage: 3 TB​
Would it be right to assume that 3 RAIDZ vdevs striped together (I can stripe in another 6 disks) would give us:
read: 525 IOPS
write: 525 IOPS
total storage: 4.5 TB
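(For reference, and purely hypothetically assuming the 6 new disks showed up as da12-da17, extending the existing pool would be a single command; in FreeNAS the Volume Manager's "extend volume" does the equivalent:)
Code:
# add a third 6-disk RAIDZ vdev to the existing pool
zpool add {pool} raidz da12 da13 da14 da15 da16 da17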
=====
A 2 disk Mirror vdev offers:
Read: 350 IOPS​
Write: 175 IOPS​

Stripe 6 Mirror vdevs, aggregate* performance:
Read: 2100 IOPS​
Write: 1050 IOPS​
total storage: 1.8 TB​
A big difference in aggregate* read IOPS.

At first glance it seems that you could easily fit your existing 1.5 TB of data into the 1.8 TB pool, but that would make your pool 83% full. Some people recommend that you never exceed 80% pool utilization [2], so you would need to add another mirrored vdev to give you some more space.
That's why I pointed out the (more) exact numbers, because it would fit in this setup... :)

But, to do that, I would have to either:
  1. have some spare (big enough) disks to move the data onto while I rebuild, which would take a lot of time to transfer back and forth, meaning a very long downtime for the websites/services (see the migration sketch below), or
  2. buy 12 new disks to be set up as a new pool.
    Advantage: the data is transferred only once to the new pool, plus I will have 12 spare disks to add to the pool (mirror or stripe), for a total of 24.
    Disadvantage: expensive!!!
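(By "migration sketch" I mean, very roughly, ZFS replication; "oldpool"/"newpool" are hypothetical names, and the final incremental pass is the only part that needs downtime:)
Code:
# snapshot everything recursively and copy it to the new pool while services keep running
zfs snapshot -r oldpool@migrate
zfs send -R oldpool@migrate | zfs receive -F newpool
# then stop the services, take a final snapshot and send only the delta
zfs snapshot -r oldpool@final
zfs send -R -i oldpool@migrate oldpool@final | zfs receive -F newpool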
* See "Putting Vdevs Together" section of blog I linked to above to read about Aggregate vs single thread performance. Your web server may be running multiple threads, but if you are connecting to your FreeNAS storage over NFS then you may be limited to a single NFS thread (not sure?? maybe someone else can chime in).

Not sure if I really answered your question, but it was fun to do the math...

[1] http://en.wikipedia.org/wiki/IOPS

[2] After a ZFS pool hits a magic limit of utilization it switches to a different write allocation strategy that is much slower. Older ZFS code did this at 80% utilization. I wasn't able to determine exactly what current ZFS code does.
I'm glad you enjoyed it... ;)


Something else just came to mind.

1.5 TB is not really a lot of data and SSD pricing is pretty good. You could spend $2,000 to purchase 4 x 1 TB SSD Drives, configure them in a RAIDZ configuration, and use them instead of your 12 SAS drives.




Here is the SSD drive I was using for pricing info: http://www.newegg.com/Product/Product.aspx?Item=9SIA29P1EC5324
Yeah, you are right, but this is also expensive, and I'd "waste" the 12 disks that are in use at the moment... :/
 

eraser

Contributor
Joined
Jan 4, 2013
Messages
147
Just to shed some more light on the setup: we are talking about a XenServer machine (well, actually more than one) that uses this ZFS box as shared storage via NFS. The XenServers have a few VMs with several VDIs on the shared storage (no local storage on those XS machines).

I don't have experience with XenServer, but is it similar to VMware ESXi where all NFS writes are forced to be synchronous? If so that's most likely your problem.

For more details see: http://forums.freenas.org/index.php...xi-nfs-so-slow-and-why-is-iscsi-faster.12506/
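One quick way to check whether sync writes are the bottleneck (a test only - not something to leave enabled) is to temporarily relax the sync policy on the pool or dataset and compare:
Code:
zfs get sync {pool}               # "standard" is the default
zfs set sync=disabled {pool}      # TEST ONLY: recent writes can be lost on power failure
# ...re-run the workload and compare, then revert:
zfs set sync=standard {pool}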
 

MtK

Patron
Joined
Jun 22, 2013
Messages
471
thanks.
can we continue as if it was a direct NFS mount to ZFS.. ok?
cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
And the difference between direct NFS mount to ZFS and whatever you consider to be "indirect" is? Not seeing how "direct" is a keyword that actually means anything at all.

And FYI, Xenserver doesn't do sync writes like ESXi does.
 

MtK

Patron
Joined
Jun 22, 2013
Messages
471
No difference.
eraser pointed me to an NFS-vs-iSCSI discussion, and I'm not going to switch anyway, so I suggested we continue the discussion as if XS/NFS is not the issue here, so we can focus on the vdev setup...
 

aufalien

Patron
Joined
Jul 25, 2013
Messages
374
Kinda late to this game, but curious about something, MtK: did you say you have RaidZ vdevs with 6 drives each? That isn't a proper ZFS setup in terms of drives per vdev and RAID level.

For a RaidZ, you'd have a number of drives like 3/5/7/9, etc...
For a RaidZ2, 4/6/8/10, etc...

In your case, since you really do have 12 drives, I'd do 4 vdevs of 3 disks each at RaidZ and see how it goes. Or you could do 3 vdevs of 4 disks each at RaidZ2.

I may have misread your config, but a RaidZ setup using 6 disks is a problem.

Keep in mind that it's the number of vdevs, not the number of drives per se, that speeds up ZFS. Apologies if you already know and covered this.
 

eraser

Contributor
Joined
Jan 4, 2013
Messages
147
thanks.
can we continue as if it was a direct NFS mount to ZFS.. ok?

Sorry that I wasn't clear. I did not mean to suggest that you switch from NFS to iSCSI. I was suggesting that if your performance issue is due to NFS sync writes then following the advice in the article about adding a SLOG would be the fix instead of adding more vdevs. However if sync writes are not the problem then adding a SLOG will be a waste of money and may make things worse.
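(If sync writes do turn out to be the issue, the SLOG itself is just a log vdev added to the existing pool; sketch only, the "ada6"/"ada7" SSD device names are hypothetical:)
Code:
# single SLOG device; a mirrored pair is safer
zpool add {pool} log ada6
# or: zpool add {pool} log mirror ada6 ada7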

Sure we can start over:
  • First is to test disk performance from your FreeNAS server directly and verify that it performs as expected. iozone is a good benchmarking tool to use here.
  • Second is to verify network performance between FreeNAS and your NFS Client (XenServer). iperf is the tool to use here.
  • Third is to test disk performance from the NFS client and verify that it performs as expected. (iozone again).
  • Fourth is to test disk performance from inside the Guest VM itself.
As an example (but not a recommendation), here is one iozone command that I sometimes like to run. Increase the -s parameter to be larger than your FreeNAS server's installed RAM if you want to rule out caching from your results. (Make sure to change to a directory in your ZFS dataset first so the temporary files are created in the correct place):

Code:
iozone -+w 0 -+y 0 -+C 0 -+n -e -c -t 1 -r 512K -s 500M -i 0 -i 2
Other tools that may help you troubleshoot include (rough invocations are sketched after this list):
  • zilstat
  • arc_summary.py
  • gstat
  • zpool iostat
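Roughly how I'd invoke those from the FreeNAS shell (exact options may vary by version; {pool} is your pool name):
Code:
zpool iostat -v {pool} 5     # per-vdev read/write ops and bandwidth, every 5 seconds
gstat -p                     # live per-disk busy % and latency (physical disks only)
arc_summary.py               # ARC size and hit/miss ratios (is your working set cached?)
zilstat 5                    # ZIL activity, i.e. how much sync write traffic you have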
 

MtK

Patron
Joined
Jun 22, 2013
Messages
471
Let's assume for a second that we have the right configuration for a single RaidZ vdev.
Would it be right to assume that a stripe of 3 vdevs (each with RaidZ) will perform better than 2 vdevs (each with RaidZ)?
 

eraser

Contributor
Joined
Jan 4, 2013
Messages
147
Let's assume for a second that we have the right configuration for a single RaidZ vdev.
Would it be right to assume that a stripe of 3 vdevs (each with RaidZ) will perform better than 2 vdevs (each with RaidZ)?


In general, yes.
 

MtK

Patron
Joined
Jun 22, 2013
Messages
471
good, then, in general, why would I choose a single 6-RaidZ2 vdev over a 2x3-RaidZ vdev?
eraser

Contributor
Joined
Jan 4, 2013
Messages
147
good, then, in general, why would I choose a single 6-RaidZ2 vdev over a 2x3-RaidZ vdev?

In both cases you have 6 drives and are protected against two disk failures, but the disk failure cases are different.
  • In your single 6 disk RaidZ2 vdev, any two drives can fail at once and your ZFS pool stays up. Better protection at a cost of reduced performance.
  • In your 2 x 3 disk RaidZ vdev configuration, if two drives fail at once *that are part of the same vdev* then you lose your ZFS pool. However you can have one drive fail from each vdev and still stay up. This configuration should have better performance.
Your ultimate selection will depend on how long you think it will take for you to notice a failed disk and replace it, and also how likely you think it is to have multiple disks fail at the same time (or have an additional disk fail while a vdev is busy resilvering itself after the original failed disk is replaced).
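To make the two layouts concrete (sketch only; "tank" and the device names are hypothetical):
Code:
# Option A: one 6-disk RaidZ2 vdev -- any two disks can fail
zpool create tank raidz2 da0 da1 da2 da3 da4 da5
# Option B: two 3-disk RaidZ vdevs striped -- one failure per vdev, better IOPS
zpool create tank raidz da0 da1 da2 raidz da3 da4 da5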
 

MtK

Patron
Joined
Jun 22, 2013
Messages
471
Yeah, thanks. Forgot to mention that the question was performance-wise :smile:


So taking this a little bit further, but back a few posts: a stripe of 3 x RaidZ is better than a stripe of 2 x RaidZ, right?
 

aufalien

Patron
Joined
Jul 25, 2013
Messages
374
I think you are missing something fundamental to ZFS, which is: the more vdevs, the higher the performance. I'd rather you get the concepts than yes-or-no answers.

To quote myself;

"Keep in mind that its about number of vdevs and not number of drives per se, that speeds up ZFS."

Based on this, you should be able to draw a conclusion.

So let me ask you, for highest performance w/o regard for fault tolerance, how would you configure your vdevs?
 

MtK

Patron
Joined
Jun 22, 2013
Messages
471
24 vdevs?
probably mirrored :smile:
 

aufalien

Patron
Joined
Jul 25, 2013
Messages
374
Well, you're on the right track. With the 24 disks you are proposing, you'd have 12 mirrors, striped amongst one another.

So you basically got it.

eraser's figures are a very good reference.
 