Testing if my SSDs are a viable option for L2ARC

Status
Not open for further replies.

viniciusferrao

Contributor
Joined
Mar 30, 2013
Messages
192
Hi guys,

I'm opening a new thread since the old one was blown :)

I have a good ZFS machine: 2x Xeon E5-2620 and 128GB of ECC RAM.

My disks are 24x Seagate SATA 3TB 7200RPM (model: ATA ST3000DM001-1CH1 CC24) in a stripe-of-mirrors configuration, so there are 12 vdevs of 2 disks each in one zpool.

My SSDs are 2x Kingston V300 120GB SATA (model: SV300S37A120G), configured as L2ARC.

I really believe they are slowing down my pool. The question is: how do I prove it?

At the moment I'm running some benchmarks. I've removed the SSDs from the disk zpool and created a new zpool with the two of them striped, giving 240GB of usable space.

All iozone tests are run with this command:
Code:
iozone -+w 0 -+y 0 -+C 0 -a -r 4096 -s 240G


I chose a 240GB file size to exceed what the ARC can hold in RAM, and a 4MB record size. Why 4MB? I don't know.
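For reference, the sizes in that command work out as follows. This is only a small sketch of the arithmetic; it assumes that `-s 240G` means 240 GiB and that `-r 4096` is given in KB, which is consistent with the "File size set to 251658240 KB" and "Record Size 4096 KB" lines iozone prints.

```python
KIB_PER_GIB = 1024 * 1024  # iozone reports sizes in KB (1 GiB = 1048576 KB)


def gib_to_kib(gib):
    """Convert a size in GiB to the KB figure iozone prints."""
    return gib * KIB_PER_GIB


file_kib = gib_to_kib(240)   # -s 240G
record_mib = 4096 / 1024     # -r 4096 (KB), expressed in MiB
ram_gib = 128                # installed RAM, from the first post

print(file_kib)              # 251658240
print(record_mib)            # 4.0
print(240 / ram_gib)         # 1.875, so the file is a bit under twice the RAM size
```

The same conversion gives 209715200 KB for the 200G runs and 335544320 KB for the 320G run.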

Here are some results, first on the SSD pool and then on the disk pool.
Code:
Iozone: Performance Test of File I/O
       Version $Revision: 3.420 $
Compiled for 64 bit mode.
Build: freebsd 
 
Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
            Al Slater, Scott Rhine, Mike Wisner, Ken Goss
            Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
            Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
            Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
            Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
            Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
            Vangel Bojaxhi, Ben England, Vikentsi Lapa.
 
Run began: Wed Apr  9 00:18:30 2014
 
Dedup activated 0 percent.
Dedupe within & across 0 percent.
Dedupe within 0 percent.
Auto Mode
Record Size 4096 KB
File size set to 251658240 KB
Command line used: iozone -+w 0 -+y 0 -+C 0 -a -r 4096 -s 240G
Output is in Kbytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
                                                            random  random    bkwd   record   stride                                   
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
       251658240    4096 2732317 1388588  3317126  3487875 2139369 2165839 2184993  6870975  2309243  2440627  1531569 2423249  2587110


Disk pool:
Code:
[root@storage] /mnt/pool0# iozone -+w 0 -+y 0 -+C 0 -a -r 4096 -s 240G
Iozone: Performance Test of File I/O
       Version $Revision: 3.420 $
Compiled for 64 bit mode.
Build: freebsd 
 
Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
            Al Slater, Scott Rhine, Mike Wisner, Ken Goss
            Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
            Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
            Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
            Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
            Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
            Vangel Bojaxhi, Ben England, Vikentsi Lapa.
 
Run began: Wed Apr  9 00:43:59 2014
 
Dedup activated 0 percent.
Dedupe within & across 0 percent.
Dedupe within 0 percent.
Auto Mode
Record Size 4096 KB
File size set to 251658240 KB
Command line used: iozone -+w 0 -+y 0 -+C 0 -a -r 4096 -s 240G
Output is in Kbytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
                                                            random  random    bkwd   record   stride                                   
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
       251658240    4096 2568673 1253497  2887512  3032571  331192 1875399  727754  6479734  1054113  2360520  1215763 1912467  2093973


There are some things that I really don't get. In the sequential tests the disk pool performed almost the same as the SSDs, which is bad in my opinion. I was expecting much more from those SSDs, even these sh***y ones. At random read we do see a huge improvement, roughly 6.5 times faster (2139369 vs. 331192 KB/s), but it isn't sufficient. I will benchmark a single SSD with iozone to see if I get better results.
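To put numbers on that comparison, the two result rows can be divided column by column. The sketch below is only an illustration: the column names are my reading of the two-line iozone header, and `parse_row` is a hypothetical helper, not part of iozone.

```python
# Column names as read from the iozone header (after the KB and reclen fields).
COLUMNS = [
    "write", "rewrite", "read", "reread", "random read", "random write",
    "bkwd read", "record rewrite", "stride read",
    "fwrite", "frewrite", "fread", "freread",
]


def parse_row(row):
    """Map an iozone result row to {test name: KB/s}, skipping file size and record length."""
    values = [int(v) for v in row.split()]
    return dict(zip(COLUMNS, values[2:]))


# Result rows copied from the two runs above (compression still enabled).
ssd = parse_row("251658240 4096 2732317 1388588 3317126 3487875 2139369 2165839 "
                "2184993 6870975 2309243 2440627 1531569 2423249 2587110")
disk = parse_row("251658240 4096 2568673 1253497 2887512 3032571 331192 1875399 "
                 "727754 6479734 1054113 2360520 1215763 1912467 2093973")

# Ratio > 1 means the SSD pool was faster for that test.
speedup = {name: ssd[name] / disk[name] for name in COLUMNS}
for name in COLUMNS:
    print(f"{name:15s} {speedup[name]:5.2f}x")
```

Sequential read comes out at about 1.15x and random read at about 6.5x in favour of the SSD pool.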

There's another problem: some serious compression is happening during the iozone tests, even with the options "-+w 0 -+y 0 -+C 0", which are supposed to generate incompressible data.
 

eraser

Contributor
Joined
Jan 4, 2013
Messages
147
viniciusferrao said:
Theres another problem, some serious compression is happening during iozone tests, even with the options "-+w 0 -+y 0 -+C 0" to generate incompressible data.

Please try "-+w 1 -+y 1 -+C 1" instead. (Edit: you need to use ioZone v3.424 to take full advantage of this.)

I will go update the wiki.

Edit: added which version of ioZone is required for the recommended parameters.
 

viniciusferrao

Contributor
Joined
Mar 30, 2013
Messages
192
No problem, eraser... I've just disabled compression on the pool for the benchmark.

Anyway, it would be great to find a way to run iozone with incompressible data, if that's possible.
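As a rough illustration of why the data pattern matters, the sketch below compares how well an all-zero buffer and a random buffer compress. It uses Python's `zlib` purely as a stand-in; it is not the algorithm ZFS uses and says nothing about what iozone actually writes. On the pool itself, the `compressratio` property shows what ZFS really achieved.

```python
import os
import zlib


def compression_ratio(data):
    """Return original size divided by compressed size (higher means more compressible)."""
    return len(data) / len(zlib.compress(data))


size = 1024 * 1024                 # 1 MiB test buffers
zeros = bytes(size)                # highly compressible pattern
random_bytes = os.urandom(size)    # effectively incompressible

print(f"zeros:  {compression_ratio(zeros):.1f}x")
print(f"random: {compression_ratio(random_bytes):.2f}x")
```

If a benchmark writes data closer to the first buffer than the second, a compressed pool ends up moving far fewer bytes than the reported file size, which inflates the throughput figures.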

I'll post more results after this round.
 

viniciusferrao

Contributor
Joined
Mar 30, 2013
Messages
192
More results:

SSD pool without compression:
Code:
[root@storage] /mnt/ssdtest# iozone -+w 0 -+y 0 -+C 0 -a -r 4096 -s 200G
Iozone: Performance Test of File I/O
      Version $Revision: 3.420 $
Compiled for 64 bit mode.
Build: freebsd
 
Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
            Al Slater, Scott Rhine, Mike Wisner, Ken Goss
            Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
            Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
            Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
            Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
            Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
            Vangel Bojaxhi, Ben England, Vikentsi Lapa.
 
Run began: Wed Apr  9 16:06:12 2014
 
Dedup activated 0 percent.
Dedupe within & across 0 percent.
Dedupe within 0 percent.
Auto Mode
Record Size 4096 KB
File size set to 209715200 KB
Command line used: iozone -+w 0 -+y 0 -+C 0 -a -r 4096 -s 200G
Output is in Kbytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
                                                            random  random    bkwd  record  stride                                 
              KB  reclen  write rewrite    read    reread    read  write    read  rewrite    read  fwrite frewrite  fread  freread
      209715200    4096  794492  566144  722605  733060  593072  288812  763302  6406607  870476  316128  439961  699051  701288


Disk pool without compression:
Code:
[root@storage] /mnt/pool0# iozone -+w 0 -+y 0 -+C 0 -a -r 4096 -s 200G
Iozone: Performance Test of File I/O
      Version $Revision: 3.420 $
Compiled for 64 bit mode.
Build: freebsd
 
Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
            Al Slater, Scott Rhine, Mike Wisner, Ken Goss
            Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
            Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
            Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
            Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
            Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
            Vangel Bojaxhi, Ben England, Vikentsi Lapa.
 
Run began: Wed Apr  9 14:41:59 2014
 
Dedup activated 0 percent.
Dedupe within & across 0 percent.
Dedupe within 0 percent.
Auto Mode
Record Size 4096 KB
File size set to 209715200 KB
Command line used: iozone -+w 0 -+y 0 -+C 0 -a -r 4096 -s 200G
Output is in Kbytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
                                                            random  random    bkwd  record  stride                                 
              KB  reclen  write rewrite    read    reread    read  write    read  rewrite    read  fwrite frewrite  fread  freread
      209715200    4096  795177  703034  1493073  1512012  113844  748816  492634  6535497  649467  775932  750880 1314659  1448270


Analysing those benchmarks, we can see that the SSDs are shitty at sequential reads compared with the disk pool: about half the throughput (722605 vs. 1493073 KB/s). The only notable advantage is in random read (593072 vs. 113844 KB/s).

With those results, I'm unsure whether I should keep those SSDs as L2ARC or just remove them. @cyberjock, can you assist?

EDIT: Pool with SSD as L2ARC.
Code:
[root@storage] /mnt/pool0# iozone -+w 0 -+y 0 -+C 0 -a -r 4096 -s 320G
        Iozone: Performance Test of File I/O
                Version $Revision: 3.420 $
                Compiled for 64 bit mode.
                Build: freebsd
 
        Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
                     Al Slater, Scott Rhine, Mike Wisner, Ken Goss
                     Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
                     Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
                     Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
                     Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
                     Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
                     Vangel Bojaxhi, Ben England, Vikentsi Lapa.
 
        Run began: Wed Apr  9 18:08:02 2014
 
        Dedup activated 0 percent.
        Dedupe within & across 0 percent.
        Dedupe within 0 percent.
        Auto Mode
        Record Size 4096 KB
        File size set to 335544320 KB
        Command line used: iozone -+w 0 -+y 0 -+C 0 -a -r 4096 -s 320G
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
                                                            random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
       335544320    4096  793090  780088  1475692  1505028   93169  748138  399793  6555507   581936   760913   773669 1442482  1459475
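The three uncompressed runs can be lined up side by side. The sketch below is only an illustration: the column names are my reading of the iozone header, the helper names are hypothetical, and the L2ARC run used a 320G file while the other two used 200G, so the comparison is not strictly like for like.

```python
COLUMNS = [
    "write", "rewrite", "read", "reread", "random read", "random write",
    "bkwd read", "record rewrite", "stride read",
    "fwrite", "frewrite", "fread", "freread",
]

# Result rows copied from the three runs without compression.
RUNS = {
    "ssd": "209715200 4096 794492 566144 722605 733060 593072 288812 763302 "
           "6406607 870476 316128 439961 699051 701288",
    "disk": "209715200 4096 795177 703034 1493073 1512012 113844 748816 492634 "
            "6535497 649467 775932 750880 1314659 1448270",
    "disk+l2arc": "335544320 4096 793090 780088 1475692 1505028 93169 748138 "
                  "399793 6555507 581936 760913 773669 1442482 1459475",
}


def parse_row(row):
    """Map an iozone result row to {test name: KB/s}, skipping file size and record length."""
    values = [int(v) for v in row.split()]
    return dict(zip(COLUMNS, values[2:]))


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


results = {label: parse_row(row) for label, row in RUNS.items()}
change = {
    name: percent_change(results["disk"][name], results["disk+l2arc"][name])
    for name in COLUMNS
}

# Throughput in MiB/s per run, then the change from "disk" to "disk+l2arc".
print(f"{'test':15s} " + "  ".join(f"{label:>10s}" for label in RUNS) + "   change")
for name in COLUMNS:
    cells = "  ".join(f"{results[label][name] / 1024:10.0f}" for label in RUNS)
    print(f"{name:15s} {cells}   {change[name]:+6.1f}%")
```

In these numbers the sequential tests are within about 1% with and without the L2ARC, while random read is roughly 18% lower in the L2ARC run.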
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Oh boy.. someone called for me.

To be honest (and this isn't personal), I try to avoid giving advice on things like this because it's a lot of "this-for-that" tradeoffs and is quite complex. Far more complex than 99% of people will ever truly appreciate. Not to mention I try to keep this stuff to my paid support, because it's far easier to discuss things on the phone or Skype than to write out a book for every user with your problem (and believe me, we get these every 2 or 3 days). I've started writing down some (only some) of the special information for L2ARCs (and ZILs). It's currently about 25 pages, and I'm expecting it to end up over 100. Remember, I said *some* back there, since it's a work in progress. LOL. It's extremely complex and there is no "one stop shop" for all of the answers.

To touch on a few of your problems:

1. Sequential reads should never be faster with an L2ARC. That's not what an L2ARC is for, and that's not how it works. (This alone tells me you are ill-equipped to deal with your problem. It's nothing personal, but that's probably the #1 newbie mistake with L2ARCs.)
2. Your benchmarking methodology needs a lot of work. There's a lot going on and a laundry list of mistakes, so I'm sorry, but I don't want to go into detail because I'd spend the rest of the day trying to explain it. Not to mention that I'd need more information before I could even list all of the potential mistakes you made. I'm pretty sure you've made mistakes that can't be confirmed from the information provided, because so many people make the same mistakes you are making. So I'm betting there's more, and that's all I'll say about that.
3. I'm not a fan of Kingston. I don't think they are the best choice for a ZIL or L2ARC, but they should work just fine regardless, especially as an L2ARC. I've never had a client use Kingston before, but looking at the specs of the V300 I'd have to expect a performance increase even in your situation.
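One way to see whether an L2ARC is actually serving reads, independent of any benchmark, is to look at its hit and miss counters. The sketch below is a minimal illustration, assuming the FreeBSD `kstat.zfs.misc.arcstats.l2_hits` and `kstat.zfs.misc.arcstats.l2_misses` sysctl counters are available on the system; the helper names are hypothetical and the sample values are made up, not taken from the server in this thread.

```python
def parse_sysctl(text):
    """Parse 'name: value' lines, as printed by sysctl, into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats


def l2arc_hit_ratio(stats):
    """Fraction of L2ARC lookups that were hits (0.0 if there were no lookups)."""
    hits = stats["kstat.zfs.misc.arcstats.l2_hits"]
    misses = stats["kstat.zfs.misc.arcstats.l2_misses"]
    total = hits + misses
    return hits / total if total else 0.0


# Hypothetical sample values for illustration only.
sample = """\
kstat.zfs.misc.arcstats.l2_hits: 1500
kstat.zfs.misc.arcstats.l2_misses: 8500
"""

ratio = l2arc_hit_ratio(parse_sysctl(sample))
print(f"L2ARC hit ratio: {ratio:.1%}")
```

Sampling the counters before and after a period of the real workload, and comparing the deltas, says more than a single cumulative reading.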

When dealing with all the different aspects of ZFS (many of which you may not even be aware of), it's like a balanced scale. If you lean too much on one side of the scale, you sacrifice something on the other. This alone accounts for 99.999% of the problems, because people can't appreciate all of the relationships well enough to arrive at a setup that will really do them some good. It's super easy to lean one way inappropriately and see pool performance go down despite buying bigger, faster, and more expensive hardware. I've learned a lot from everyone else's screw-ups, because people keep making the same ones: they think they know what they are doing, but they don't. I also see relationships in real-world situations from dozens of people since I basically live here, and that's very handy because most people won't get that much real-world experience in a decade of working with ZFS.

Now, to be honest, I'm going to disregard your benchmarking because it's not really useful the way you are doing it. But those SSDs should definitely provide some improvement. If they aren't, I don't have any advice without digging very deep into your server: how it's set up on the software side, what kind of workload it has, how much of it, etc. It's not something I really want to jump into because it can quickly turn into a 2-3 day event. Like I said, it's far more complex than you probably realize.

So that's why I keep my nose out of these threads (and hadn't posted until you asked for me by name). The result is always the same: like in other threads, the OP gets pissed because I don't want to write a 50-page explanation and doesn't understand why I'm saying they are wrong. So I just avoid these kinds of things. If you want to do some paid support, feel free to PM me. I'm in IRC almost all the time too. If this is for a business, you will almost certainly be impressed with what I provide and what I charge. Pretty much everything can be done remotely with SSH, TeamViewer, Skype, and/or phone calls.

Other than that, I don't have any good advice to provide. You appear to be doing everything right on the surface. But deep down something is clearly wrong.
 