How to detect IOPS bottlenecks?

Status
Not open for further replies.

viniciusferrao

Contributor
Joined
Mar 30, 2013
Messages
192
Hello guys,

I'm not satisfied with my storage performance, and I would like to do some debugging to find where the bottleneck is. The question is: how do I do this?

Here is my setup:
Supermicro X9SCM with Xeon E3-1240V2
32GB DDR3 1600MHz with ECC
Intel RAID Controller flashed to LSI in IT Mode
24 x 3TB Seagate SATA Disks
2x Kingston V300 128GB SSDs for SLOG

We have three ZFS pools: one with 4 disks in RAID-Z1 with async writes forced, used only for throwaway data, and two independent pools in RAID-Z2 with 10 disks each. Each SSD acts as a SLOG device for one of the 10-disk pools. These pools have sync=always enabled.

Here is the pool layout:
Code:
storage# zpool list
NAME           SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
intpool0      10.9T  3.74T  7.14T    34%  1.00x  ONLINE  /mnt
storagepool0  27.2T  2.95T  24.3T    10%  1.00x  ONLINE  /mnt
storagepool1  27.2T  11.5T  15.7T    42%  1.00x  ONLINE  /mnt
storage# zpool status
  pool: intpool0
 state: ONLINE
  scan: scrub repaired 0 in 2h13m with 0 errors on Sun Oct 20 03:13:21 2013
config:
 
NAME                                            STATE     READ WRITE CKSUM
intpool0                                        ONLINE       0     0     0
 raidz1-0                                      ONLINE       0     0     0
   gptid/345d97a2-f960-11e2-8474-001018427ad4  ONLINE       0     0     0
   gptid/34e503e5-f960-11e2-8474-001018427ad4  ONLINE       0     0     0
   gptid/3569dbe2-f960-11e2-8474-001018427ad4  ONLINE       0     0     0
   gptid/35f7b991-f960-11e2-8474-001018427ad4  ONLINE       0     0     0
 
errors: No known data errors
 
  pool: storagepool0
 state: ONLINE
  scan: scrub repaired 0 in 7h45m with 0 errors on Sun Oct 20 08:45:59 2013
config:
 
NAME                                            STATE     READ WRITE CKSUM
storagepool0                                    ONLINE       0     0     0
 raidz2-0                                      ONLINE       0     0     0
   gptid/5b0acab8-f95f-11e2-8474-001018427ad4  ONLINE       0     0     0
   gptid/5b8bf65a-f95f-11e2-8474-001018427ad4  ONLINE       0     0     0
   gptid/5c0bcbf2-f95f-11e2-8474-001018427ad4  ONLINE       0     0     0
   gptid/5c8ef3b1-f95f-11e2-8474-001018427ad4  ONLINE       0     0     0
   gptid/5d14fa55-f95f-11e2-8474-001018427ad4  ONLINE       0     0     0
   gptid/5d971bea-f95f-11e2-8474-001018427ad4  ONLINE       0     0     0
   gptid/5e1f2120-f95f-11e2-8474-001018427ad4  ONLINE       0     0     0
   gptid/5ea100f0-f95f-11e2-8474-001018427ad4  ONLINE       0     0     0
   gptid/5f252675-f95f-11e2-8474-001018427ad4  ONLINE       0     0     0
   gptid/5fac6691-f95f-11e2-8474-001018427ad4  ONLINE       0     0     0
logs
 gptid/601457a3-f95f-11e2-8474-001018427ad4    ONLINE       0     0     0
 
errors: No known data errors
 
  pool: storagepool1
 state: ONLINE
  scan: scrub repaired 0 in 3h35m with 0 errors on Sun Oct 20 04:35:36 2013
config:
 
NAME                                            STATE     READ WRITE CKSUM
storagepool1                                    ONLINE       0     0     0
 raidz2-0                                      ONLINE       0     0     0
   gptid/8e7889e1-f95f-11e2-8474-001018427ad4  ONLINE       0     0     0
   gptid/8efb2430-f95f-11e2-8474-001018427ad4  ONLINE       0     0     0
   gptid/8f7c685f-f95f-11e2-8474-001018427ad4  ONLINE       0     0     0
   gptid/9001a059-f95f-11e2-8474-001018427ad4  ONLINE       0     0     0
   gptid/907eefbe-f95f-11e2-8474-001018427ad4  ONLINE       0     0     0
   gptid/90fbcbf8-f95f-11e2-8474-001018427ad4  ONLINE       0     0     0
   gptid/9184701e-f95f-11e2-8474-001018427ad4  ONLINE       0     0     0
   gptid/9207f18b-f95f-11e2-8474-001018427ad4  ONLINE       0     0     0
   gptid/928bcc13-f95f-11e2-8474-001018427ad4  ONLINE       0     0     0
   gptid/93106513-f95f-11e2-8474-001018427ad4  ONLINE       0     0     0
logs
 gptid/937a3a6a-f95f-11e2-8474-001018427ad4    ONLINE       0     0     0
 
errors: No known data errors


Code:
storage# zfs get sync
NAME                              PROPERTY  VALUE     SOURCE
intpool0                          sync      standard  local
storagepool0                      sync      always    local
storagepool0/lvm0                 sync      always    inherited from storagepool0
storagepool1                      sync      always    local
storagepool1/lvm1                 sync      always    inherited from storagepool1


And finally, here are the performance graphs. The network is two Intel NICs in LAGG mode:

[Attached graphs: generate-3.png, generate-2.png, generate.png]


As you can see, the maximum throughput is something like 200 Mbit/s, which is very slow.
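If it helps, this is the kind of thing I can run on the box while the VMs are busy, to watch the pool and the network side by side (just a sketch using the stock FreeBSD tools and the pool names above; I haven't captured the output yet):

Code:
storage# zpool iostat -v storagepool0 1   # per-vdev and per-log-device ops/s and bandwidth, every second
storage# systat -ifstat 1                 # live throughput per interface, to compare against the lagg graphs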

Any information is welcome.

Thanks in advance,
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
For starters, the actual workload on the pool is important. Using it with iSCSI is different from CIFS sharing, which is different again from ESXi with NFS sharing. So you'll need to provide more info on what you are actually using FreeNAS for.

If you want to go with sync=always you really need to look at pools that have multiple vdevs so you can multiply the I/Os. I'm not sure how much space you need, but 5 vdevs of mirrored disks would be a good place to start.
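For illustration only, a pool laid out that way would look roughly like this (placeholder da device names, not your gptids, and you'd normally build it through the GUI):

Code:
# 5 mirrored vdevs from 10 disks: roughly 5x the random I/O of a single vdev
zpool create tank mirror da0 da1 mirror da2 da3 mirror da4 da5 \
                  mirror da6 da7 mirror da8 da9
# the SLOG SSD attaches the same way as before
zpool add tank log da10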

Other than that, I really can't provide any more help until you let us know what you are trying to use FreeNAS for. sync=always is a pretty demanding request. I'm not sure what your exact need is for sync=always, but I'd like to know why you think it is so important in your situation. Just by setting that you are pretty much forcing yourself to buy MUCH more powerful hardware to maintain a given performance level. After all, you can't expect to get increased reliability for free or it would be the default setting.
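If you want to put a rough number on what sync=always is costing you, one quick sketch is to write the same file to a scratch dataset with sync forced on and then off, and compare the rates dd reports. Do this on a throwaway dataset, never on the live VM zvols:

Code:
zfs create storagepool0/benchtest
zfs set sync=always storagepool0/benchtest
dd if=/dev/zero of=/mnt/storagepool0/benchtest/sync.bin bs=128k count=20000    # every write goes through the SLOG
zfs set sync=disabled storagepool0/benchtest
dd if=/dev/zero of=/mnt/storagepool0/benchtest/async.bin bs=128k count=20000   # same write, async
zfs destroy -r storagepool0/benchtest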
 

viniciusferrao

Contributor
Joined
Mar 30, 2013
Messages
192
Hello cyberjock, thanks for the reply.

I'm using sync=always because this iSCSI share serves as a Storage Repository for a Citrix XenServer. Inside those iSCSI ZVOLs, XenServer uses LVM, and each LVM partition holds one virtual machine.

It's something like VMFS from ESXi, but using LVM. This is why I'm using sync=always: there are virtual machine disks in these zpools.

Is this sufficient info?

PS: A screenshot of one "bottlenecked" server, taken just now:
[Attached screenshot: Screenshot 2013-11-28 01.31.14.png]
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
No, I was looking for the info on Citrix.

I'm not sure how many VMs you have, but if you have more than 2 or 3 you may be forced to go with 64GB of RAM or more. Naturally this is not something you want to hear, since your board is limited to 32GB. You definitely have an uphill battle by choosing to use FreeNAS to host VMs and then limiting yourself to 32GB of RAM.

Two recommendations for you:

1. Try adding an L2ARC (SSD); a sketch of the command is below this list. Don't go bigger than 80GB or so, though, as you have only 32GB of RAM: you should never exceed 5x the ARC size for the L2ARC. This is where being limited to 32GB of RAM cripples your ability to use a big L2ARC to help absorb some of the I/O.
2. Destroy your pool(s) and go from a single vdev to multiple mirrored vdevs in the pool.
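A sketch of what recommendation 1 looks like from the command line; da25 is a placeholder for whichever SSD you end up adding, and the FreeNAS GUI does the same thing:

Code:
# attach an SSD of ~80GB or less as L2ARC (cache) on the busy pool
zpool add storagepool0 cache da25
# current ARC size, to sanity-check the 5x rule above
sysctl kstat.zfs.misc.arcstats.size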

If neither of these is an option, I don't know what to tell you. Your system is limited in its ability to provide good sustained performance because of its maximum system RAM.
 

viniciusferrao

Contributor
Joined
Mar 30, 2013
Messages
192
Hello Cyberjock!

The situation is extreme :)
You're missing a zero on the 2 or 3 VMs range; it's something like 20-30 VMs :)

We're willing to change the motherboard of the FreeNAS box, and the option is clear to me: a Xeon E5 with lots of RAM. There are three things that I would like to know:

1. Is there a way to really detect where the bottleneck is?
2. Are you sure about the RAM? I can ask for the money for a new motherboard kit, CPUs and memory. Our resources exist but they are somewhat restricted; we are a university in Brazil...
3. Can I implement a temporary solution? I was thinking of partitioning the two SSDs to share L2ARC and SLOG on the same devices. Is it possible to have a RAIDed L2ARC?

Many thanks in advance,
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Using the same device for the L2ARC and SLOG doesn't work. In fact, it makes both of them kind of suck performance-wise.

If you have that many VMs running, 64GB of RAM isn't going to be enough. You'll probably be talking 128GB of RAM, and perhaps more. There's no rule for how high you'll need to go, since there are a lot of factors that affect your performance. If you want validation that this works, there are plenty of other people who have had the same issue you are experiencing. Feel free to read up on what they did.

As for the bottleneck, that's easy. See the high latency you have in your VMs? That's because your pool has more I/O than it can service. All of the recommendations I gave are meant to offload the I/O to other devices or increase your total I/O capacity to keep pool latency low.
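If you want to watch that latency directly on the FreeNAS box, gstat is the usual tool. When the disks sit near 100% busy with long ms/w times while the network is mostly idle, the pool is the bottleneck:

Code:
# per-disk ops/s, queue depth and ms per read/write, refreshed every second
gstat -p -I 1s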

There is no "temporary solution". You just need to right size your server for your needs.

Keep in mind that you may be forced to get rid of the RAIDZ2 in favor of mirrors to get performance up to a good level. There's no cheat sheet for what you'll need to do, since your FreeNAS server's capabilities as well as the actual I/O needs of your VMs both play a big part in the final performance.
 

viniciusferrao

Contributor
Joined
Mar 30, 2013
Messages
192
Thanks once again, cyberjock. I asked about partitioning the SLOG and L2ARC onto the same device because we use two 128GB SSDs, and that's a lot of space for a SLOG alone. But OK.

About the hardware: we can buy this motherboard, for example, http://www.supermicro.com/products/motherboard/xeon/c600/x9dr7-ln4f.cfm

It's LGA2011 with 16 memory slots and can go up to 128GB with unbuffered ECC, which is nice. We already have this memory and can rearrange it, so it's cheap to buy just the motherboard and two Xeon E5s. The motherboard here costs us R$2,900, which is something like US$1,200.

We don't have the CPU prices at this moment, but it should be something like US$1,000 each.

At this moment we will use only 128GB, because that's what we have. The next upgrade will be with registered ECC memory; I hope we can get something like 512GB (overkill?), but we need 32GB memory sticks first.

I hope this solves our problem :)

About the mirrored disks, do you really think this helps? My only fear is having two disk failures in the same vdev, which would render the zpool unusable. RAID-Z2 gives more reliability, no? Am I wrong? With a lot of RAM and SLOG disks, is RAID-Z2 really a performance problem?

Thanks in advance,
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Mirrors really increase your I/O because ZFS distributes I/O on a per-vdev basis. In your case, if you had to write just 512 bytes you are technically going to write to 3 disks. You won't be able to do any other writes or reads from that vdev while the write is in progress. So you can easily see that multiple vdevs really add up, and fast. Right now you are limited to the I/O of a single vdev (which in turn is basically a single disk's worth). That's not much to go on as soon as you start throwing multiple systems (VMs) on it.
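Rough ballpark numbers to make that concrete (typical 7200 RPM SATA figures, not measurements from your system):

Code:
# one RAID-Z2 vdev  ~ random I/O of one disk    ~ 100-150 IOPS per pool (your current layout)
# 5 x 2-way mirrors ~ 5 vdevs x 100-150 IOPS    ~ 500-750 random IOPS from the same 10 disks
# spread 20-30 VMs over ~100 IOPS and per-VM latency climbs fast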

You are right and wrong about RAIDZ2 giving more reliability. With RAIDZ2 you'd need to lose 3 disks out of the vdev, but with mirrors you'd need to lose 2 disks that both happen to be in the same mirror pair. There's nothing stopping you from doing 3-disk mirrors either (except the cost). Losing the pool is always a possibility; that's why backups are VERY important and should never be neglected!

I wouldn't try to do more than 3 or 4 machines without going to a mirrored setup.

Remember, there are no hard and fast rules for how to get good performance in a given situation. There are figures that can give you rough ranges where you can expect decent performance, and there are many degrees of "decent performance". So just like everything in life, your mileage may vary.
 