ZFS Performance Adventure or How to spend a lot of time reading the forums

Status
Not open for further replies.

fuzzbawl

Cadet
Joined
Apr 5, 2014
Messages
5
My adventure started when I needed to build a storage array for work. We needed at least 3TB of storage for our current needs plus some room to expand; it needed to be reasonably fast (we're using ESXi) and reliable (ZFS scrubs were very attractive for that, along with RAID-Z2/Z3). So, using my best information-gathering tools (my eyeballs), I set out to do what any good server administrator does when deploying something they've never used before: read the forums.

After several days of poring over forum posts, guides, PDF files and who knows what else, I decided on a design:
  • Supermicro X9DRD-7LN4F-JBOD (Has an LSI SAS2308 in IT mode) motherboard
  • Supermicro CSE-846BE16-R1K28B chassis
  • 64GB of ECC DDR3 RAM
  • 2x Kingston KC300 60GB SSD
  • 2x Intel X520-DA2 Dual port 10-Gigabit SFP+ Ethernet NICs
  • 8x Seagate Constellation 64MB cache SAS2 7200rpm drives (to start with)
The reason I chose this hardware: the LSI SAS2308 is a SAS2 card in IT mode, so FreeNAS gets direct access to the raw disks. No weird "pass-through" or JBOD mode on a card not designed for that behavior, just plain raw access to the drives. The chassis gives me 24 bays, the 64GB of RAM sits at the low end of what the forums seem to recommend before adding a SLOG, and the 60GB SSDs would be used as a mirrored SLOG. The dual-port Intel 10GbE NICs connect directly to each of the ESXi hosts (and later to a 10GbE switch when we need to expand), and the Seagate drives were a happy medium between crappy-slow and too-expensive-for-my-budget.

My intent was to use NFS instead of iSCSI. There are a multitude of reasons I want to avoid iSCSI, mainly the potential for data loss if something completely out of the ordinary were to happen in the data center and cause an improper shutdown, plus a bunch of other reasons I won't list right now. I fired up FreeNAS and configured the array as RAID-Z2 (4 drives per vdev, 2 vdevs), brought up ESXi, connected my hosts and attached the shiny new NFS export. I ran some initial tests and checked "zilstat" to make sure I actually needed a SLOG. The screen was full of numbers (no zeros), so as I understand it, yes, I needed a SLOG. I added the SSDs to the chassis and configured them as a mirrored SLOG in FreeNAS.

Everything was going fine, but things seemed a little slow. I did a bunch more reading, got concerned that jgreco or cyberjock would somehow jump through the screen and slap me for considering sync=disabled as a production solution, and decided to do more reading instead. After a couple of hours my head was swimming with information. I tried a few other things, but nothing changed my results.
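For reference, the layout described above boils down to something like the following from the command line (I actually built it through the GUI; the pool name "tank" and the device names are just placeholders):

  # RAID-Z2 pool: 2 vdevs of 4 drives each
  zpool create tank raidz2 da0 da1 da2 da3 raidz2 da4 da5 da6 da7

  # the two Kingston 60GB SSDs as a mirrored SLOG
  zpool add tank log mirror da8 da9

  # watch ZIL activity while the ESXi hosts write over NFS;
  # non-zero columns mean sync writes are hitting the ZIL
  zilstat 5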

My bonnie++ tests were coming back at roughly 22 MB/s block writes, 25 MB/s rewrites, 394 MB/s reads and 2440 random seeks. I decided to try something: I removed the SLOG and ran the test again. That dropped my writes to the 5-7 MB/s everyone else sees without a SLOG, so I figured something else was going on. I tested my Ethernet cards with iperf and got 9.89 Gbit/s on both short 10-second tests and full 1-hour tests, so the cards are fine. Then it had to be a problem with ZFS, I thought. I took another day and read more material, poring over post after post until the middle of the night. Then I woke up and spent yet a third day going over documentation and forums.
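In case anyone wants to reproduce the numbers, the tests were roughly these invocations (the mount point, IP placeholder and sizes are approximate, not exact copies of what I typed):

  # network check between the test VM and the FreeNAS box (iperf2 syntax)
  iperf -s                          # on FreeNAS
  iperf -c <freenas-ip> -t 3600     # from the test VM; -t 10 for the short runs

  # disk benchmark from the test VM against the NFS datastore
  bonnie++ -d /mnt/nfs-test -s 32g -u root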

What I learned is that I have much more to learn about ZFS. But I found a few utilities and decided to run them to see what was going on. I fired up my ESXi VM and ran bonnie++ again, this time while watching "zpool iostat -v 5" and "gstat". I noticed that the SSDs were, on average, 80% busy at less than 1000 ops/s. If I understand everything I've read correctly, that would indicate my choice of SSD was a poor one. I thought the Kingston KC300 would be a decent choice; however, it appears the Intel DC S3700 or the Samsung 840 Pro would have been better.
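The monitoring was just the two stock tools, run in separate SSH sessions while bonnie++ was going (again, "tank" stands in for the real pool name):

  zpool iostat -v tank 5    # per-vdev and per-device ops and bandwidth, every 5 seconds
  gstat -I 1s               # per-disk %busy and ops/s, refreshed every second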

So, my question is this: Am I on the right track here? I definitely don't want to set sync=disabled but I'm pretty confident I just need a faster SSD. I would opt for one of the Fusion-io (or similar) cards but it's just not in the budget this time around. Once we get everyone addicted to FreeNAS then perhaps we can upgrade :)

I've attached my "zpool iostat -v" output.
 

Attachments

  • zpool_iostat.txt
    1.6 KB · Views: 326

zambanini

Patron
Joined
Sep 11, 2013
Messages
479
Why do you use a Z2 instead of mirrored vdevs? This kills speed... you lose massive IOPS... not a good choice when you use vSphere.
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
If the pool is primarily used as a datastore for VMware, and especially because you're running 10gig, I'd run mirrored vdevs. With 8 drives you can do 4 sets of mirrors and approximately double your IOPS.

I would definitely TEST things with sync=disabled to see how much the ZIL/SLOG is affecting performance. But if you're seeing 80% busy, I would agree the SSDs seem to be a limiting factor.
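Roughly something like this, just to illustrate (pool and device names are placeholders):

  # 8 drives as 4 mirrored vdevs instead of 2x RAID-Z2
  zpool create tank mirror da0 da1 mirror da2 da3 mirror da4 da5 mirror da6 da7

  # toggle sync for TESTING only, then put it back
  zfs set sync=disabled tank
  zfs set sync=standard tank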
 

fuzzbawl

Cadet
Joined
Apr 5, 2014
Messages
5
Earlier today I deleted the pool and created a new one as all mirrors. I was also able to acquire two additional drives, so now I have 5 mirror vdevs in the pool. That doubled the speeds I was getting; a huge thanks for that suggestion, zambanini and titan_rw! I set sync=disabled, and together with the new pool I saw a huge improvement. A friend also let me borrow a spare Intel DC S3700 400GB SSD to run some tests with. With sync=disabled, bonnie++ does around 470 MB/s writes, 206 MB/s rewrites and 400 MB/s reads. With sync=standard I get 102 MB/s writes, 76 MB/s rewrites and 400 MB/s reads. My read speeds are great, but the sync writes are still a bit slow. I ran gstat during all the tests and saw the Intel SSD doing about 3200 ops/s at 50% busy. Any other suggestions, or am I going to have to upgrade to a PCIe RAM card for the SLOG?
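For anyone following along, swapping the SLOG was just this (the log vdev name comes from zpool status; pool and device names are placeholders):

  zpool remove tank mirror-5    # drop the old mirrored Kingston log vdev
  zpool add tank log da10       # add the borrowed S3700 as the new log device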
 

fuzzbawl

Cadet
Joined
Apr 5, 2014
Messages
5
After some further reading (I feel like I'm spending more hours of the day reading than actually working on servers), I am going to try throwing more RAM into the FreeNAS server (maybe up to 128GB), then review arcstat and probably install an L2ARC. With 128GB of RAM the max ARC size would be about 112GB, so the L2ARC shouldn't be any larger than about 560GB. Would it be safe to say a 256GB SSD would suffice? I've tried to find data that would indicate whether an L2ARC is necessary and how large to make it. I would rather avoid a 400GB SSD since they are crazy expensive.
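The sizing math I'm working from, using the rough 5:1 rule of thumb from the forums, plus how I plan to check and add it (pool/device names are placeholders, and I'm assuming arcstat.py is available on this build):

  ARC max   ~ 112 GB                 (with 128 GB of RAM)
  L2ARC max ~ 5 x 112 GB ~ 560 GB    (common ~5:1 L2ARC-to-ARC guideline)

  arcstat.py 5                # watch ARC hit rates first, to see if an L2ARC is even needed
  zpool add tank cache da11   # add the SSD as L2ARC if the hit rate says so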
 

zambanini

Patron
Joined
Sep 11, 2013
Messages
479
Throughput is not everything. vSphere will send you a lot of random read requests, and that will hurt you. This is also the reason cyberjock will bark that benchmarks are not a real-world test; here I disagree, we should start a benchmark thread. You will have to evaluate your workload: test everything with different recordsizes and so on. For example, the disk latency inside your VMware guests may be more important to you. Much text, only to tell you that maximum sequential write speed is not that important.
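A quick way to try this, on a throwaway dataset (names are only placeholders):

  # run the same guest workload at a couple of recordsizes and compare latency
  zfs create -o recordsize=16K tank/vmtest
  zfs set recordsize=64K tank/vmtest    # only affects blocks written after the change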

 

zambanini

Patron
Joined
Sep 11, 2013
Messages
479
On one of my big FreeNAS boxes (96GB RAM, 15 SAS disks, ZIL) I had to remove the L2ARC because the SSD was too slow. What workload do you expect, and which and how many VMware guests?
 

diehard

Contributor
Joined
Mar 21, 2013
Messages
162
An S3700 works great as a SLOG, but if you need really high I/O with sync on, you should be looking at a STEC ZeusRAM, or for less money an HGST s840Z (specifically made for ZIL usage, but I've never seen them benchmarked).
 

fuzzbawl

Cadet
Joined
Apr 5, 2014
Messages
5
On one of my big FreeNAS boxes (96GB RAM, 15 SAS disks, ZIL) I had to remove the L2ARC because the SSD was too slow. What workload do you expect, and which and how many VMware guests?

We will be starting out with 10 VMs on the datastore. The plan is to move up to 25. About 50% of the VMs are low I/O, 40% are medium I/O and the other 10% are probably considered high I/O. I'm using FreeNAS 9.2.1.6 and ESXi 5.5 currently. I might try loading just 5 VMs onto the datastore and see how it performs with those.
 

diehard

Contributor
Joined
Mar 21, 2013
Messages
162
I think one of those new Intel DC P3700 PCIe-based SSDs would be faster than the Zeus because of its interface. As long as you don't squirt more than 36 petabytes through it in under 5 years.... :)
I don't believe there is NVMe support in FreeNAS yet. It looks very intriguing as it eliminates some of the latency of AHCI, but I still think the lower-latency DDR2 on the Zeus would make it a bit better for a SLOG.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
There is no NVMe support. It's scheduled for FreeNAS 9.3, but as it isn't actually in there yet, it may or may not end up in 9.3.
 