Is there a Comprehensive Performance testing guide?

sfcredfox · Sep 4, 2014

Greetings,

I'm looking for a comprehensive guide to properly analyzing the performance of a FreeNAS system. Maybe it's already out there and I can't find it.

Please read whole novel before you reply with 'You didn't look hard enough, I knew exactly what to search for...'

Questions:
Are there any posts I've missed that give a start to finish analysis that explains the process, what commands to use, and how to use them (Not a post where a few commands were used, but with no explanation of why or what they mean, see background)?

I am looking for things like:

The overall process (How to properly test, not get meaningless numbers)?
Reason why commands are used, and how to use them in different scenarios?
What their results actually mean (since so many people say some results are artificial)?
What commands are used simultaneously and why (E.G. - avoid using ARC when you're trying to test disk)?

If there is no already created guide for this, can someone compose a response that would teach someone from start to finish how you would like to see them conduct a performance review?

Background / Further Discussion
There are a ton of forum posts where people talk about their specific performance issue, and some tests they did to show some metrics for some reason, but its hard to understand how to apply that if you don't know what they are supposed to be doing and showing.
http://forums.freenas.org/index.php?threads/write-performance-issues-mid-level-zfs-setup.13372/
https://wiki.freebsd.org/ZFSTuningGuide (this was a good start)
http://forums.freenas.org/index.php?threads/some-insights-into-slog-zil-with-zfs-on-freenas.13633/ (tremendously helpful to understanding differences)

After reviewing the above references, manual, the noob PPT guide, and dozens of posts, I still don't have a great grasp on where to start for conducting a true performance analysis. ARC plays a part, so if you improperly use a command like DD for example, are you truly testing disk performance or your ability to write some stuff into memory? Maybe it looks really good because the file was too small to fill the buffer/group/forget the name that gets written to ZIL and pool. These examples may be totally wrong, but that's the point of asking for help.

I have seen many commands used for various reasons: DD,xDD,ioStat,ArcStat,output files, etc. The man pages give a basic descritpion and a ton of variables/inputs/etc, but for a non FreeBSD person, it's like telling a person to use ifconfig to setup their network interface, but not knowing it needs an IP address, subnet mask, DF, etc, and maybe not even knowing what those are. Found out that DD can wipe out data too when doing it to dev of=/dev/da1. Oops.

Further questions:
How do you decide if numbers are good or bad?
How do you fully decide if you need a SLOG? What should you be doing/seeing/etc?
How do you performance test a SLOG (Two mirrored SSDs Vs a RAID1+0 spindle disk, mentioned in JGreco's post)?
How do you know

Google/Forums only gets me so far since I don't know what I don't know.

This information would also be useful in being able to determine if you need a SLOG for the ZIL or not. Many posts, and the noob guide say if you don't know if you need one, you probably don't. How do you know if you do?

iSCSI is all sync heavy and will tax the system much harder than not, so do you automatically need to look at a SLOG with iSCSI? How do you test/review a SLOGs performance? Same questions for ARC and L2ARC. I don't have RAM maxed out, so I won't even be thinking about it, but how do you review and interpret the performance of ARC?

Again, this might be out there already, but google and forum searching has only gotten me more questions. I need some guidence/mentoring.

vikingboy · Sep 4, 2014

I'd be interested in some guides for benchmarking FreeNAS to provide a degree of comfort my system is achieving the performance it should. Im trying to apply a tiered methodical approach benchmarking components along the way, so single disk, array of disks in parallel, HBA cards vs pcie bottlenecks, CPU and network....I haven't got very far with numbers that marry what I was expecting though hence why Id be interested in some guidance.

diehard · Sep 4, 2014

We should ask Cyberjo.. sorry i mean PrincessPeach about how he feels bout benchmarking ZFS.

vikingboy · Sep 4, 2014

yeah, I know what he'll rant...that new avatar doesn't fool me...that princess still has sharp teeth behind that cute smile! :)

I do think there is value in benchmarks, synthetic or real world production. However, FreeNAS is definitely complicated so its pretty easy to produce numbers which either mean nothing or can mislead. Thats what the OP was asking for, some clarity on how any why. There are threads buried in this board which I've stumbled upon that have been helpful in understanding what and how to benchmark certain aspects of FreeNAS performance, they just need pulling together. I suspect once I've finished building my system I'll try and assemble my notes into something useful.

diehard · Sep 4, 2014

I can address the iSCSI issue if it will help at all. iSCSI itself isnt necessarily "sync heavy" , it's generally the thing that it is connected to (ESXi in my case) inability to know what needs to be a sync write and what doesn't. This is what you will find most articles on and why most end up forcing sync writes. I don't believe iSCSI connected to a target running directly on a filesystem (aka MS iSCSI Initiator with NTFS) will have any more sync requests than any other protocol. JGreco would know more about this , however.

Forced sync on a VM(like that on ESXi) will require an SLOG for any sort of adequate performance.

zilstat can help you determine the need for an SLOG, arcstat.py and arcsumary.py can help you determine the need for more ARC.

sfcredfox · Sep 5, 2014

diehard said:
Forced sync on a VM(like that on ESXi) will require an SLOG for any sort of adequate performance.

zilstat can help you determine the need for an SLOG, arcstat.py and arcsumary.py can help you determine the need for more ARC.

zilstat is definitely on my list of things to learn how to properly use. I will be doing a datastore for ESXi 5.5.U1.

I am extremely interested in comparing performance with no SLOG to a build with (unless done stupidly, should obviously be better). More specifically in assessing whether spindled disks will help or hurt performance when used as a SLOG (gets into the stupid part). Based on jgrecos post, he mentioned doing RAID with cache as a possible alternative to SSDs.

It's obvious that SSD is the advised/preferred/supported choice if you can and would out perform any spindle disk, but how far off is an alternative setup like he suggested? Is it REALLY a viable option, or just sounds cool? Maybe it's been done before, not sure yet, checking for posts. REF: http://forums.freenas.org/index.php?threads/some-insights-into-slog-zil-with-zfs-on-freenas.13633/

cyberjock · Sep 5, 2014

Yeah, spindle based SLOGs are kind of pointless. The whole purpose of the SLOG is to buy a small but VERY fast disk in relation to the disks in your pool. WD 10k and Seagate Cheetahs were big before SSD came out. SSDs are so much faster than all spinning rust disks that it's basically foolish to do it except to say "see, I did it!"

diehard · Sep 5, 2014

You talking about using a RAID controller cache w/ BBU ? I've never tried, jgreco hasnt really given much more info about the feasibility of that, but he is posting lately so he might have some incites on how well it went. I'm still waiting for NVMe + Intel SD P3500 to hopefully dethrone the STEC ZuesRAM as the go-to high end SLOG.

jgreco · Sep 5, 2014

cyberjock said:
Yeah, spindle based SLOGs are kind of pointless. The whole purpose of the SLOG is to buy a small but VERY fast disk in relation to the disks in your pool. WD 10k and Seagate Cheetahs were big before SSD came out. SSDs are so much faster than all spinning rust disks that it's basically foolish to do it except to say "see, I did it!"

You could argue SSD based SLOGs are pointless for pretty much the same reason... there's latency in the system since it has to be a sync write.

If you can cough up a RAID with write cache, though, you eliminate some of the back-and-forth involving multiple controllers (HBA then HDD/SSD and back to HBA) and instead you get a very fast commit to BBU cache RAM that doesn't involve that second controller.

The downside to SSD based SLOG devices is that the good ones are relatively expensive, and the cheap ones wear out too quick. On that basis, I'd argue that spindle based SLOG's fronted by a BBU write cache are quite possibly THE way to go ... massive endurance and lower latency. However, in practice the write speeds of spinny rust are slower, so that also has to be taken into consideration.

From a practical point of view, I was amused when I figured out that since I *already* had a mirrored RAID1 with BBU write cache as an ESXi datastore, I could put a small vmdk on it and use it as SLOG for a filer that was running as a VM. At that point it was all gravy because it wasn't really costing anything extra, except perhaps reduced IOPS on the datastore for other VM's when the FreeNAS VM was busy doing SLOG writes.

It has very interesting performance characteristics too, because with the 1GB cache on the LSI2208, I can turn on sync=always and it can absorb several hundred MB of write traffic before filling the write cache, at which point it does slow down to the write speed of the underlying hard drives.

sfcredfox · Sep 5, 2014

vikingboy said:
I'd be interested in some guides for benchmarking FreeNAS to provide a degree of comfort my system is achieving the performance it should. Im trying to apply a tiered methodical approach benchmarking components along the way, so single disk, array of disks in parallel, HBA cards vs pcie bottlenecks, CPU and network....I haven't got very far with numbers that marry what I was expecting though hence why Id be interested in some guidance.

I am curious what results you have had so far. Since I've seen nothing else so far to counter your proposal of a methodical approach, I agree that it might also be the best way to start.

What tests have you been running against your disk subsystem?
DD tests against a single disk, and then the pool?
I was going to research running Seq/random against a single device (disk), then the pool.

Do you have any examples of command lines you where using you could post?
I can't decide what BS and Count sizes to use.

From there, I don't know where to look, ARC? ZIL? Etc.

sfcredfox · Sep 8, 2014

vikingboy said:
I'd be interested in some guides for benchmarking FreeNAS to provide a degree of comfort my system is achieving the performance it should. Im trying to apply a tiered methodical approach benchmarking components along the way, so single disk, array of disks in parallel, HBA cards vs pcie bottlenecks, CPU and network....I haven't got very far with numbers that marry what I was expecting though hence why Id be interested in some guidance.

So, did you find any good method for achieving this?
Which type of DD tests did you run?
Have you found any articles/posts that talk about using zero vs. rand vs. etc?

I found a blog a guy wrote that said he was doing something similar, I'll find the URL and edit it into this post.

Since no one responded and said a guide exists, I was thinking about writing a rough draft of the process we both might use and then let the real professions like jgreco and others turn it into something legit.

Maybe I'll use the word baseline instead of benchmark to try and appease some of the benchmark haters...

[Personal rambling]
I personally feel like there's a slight difference - to me a baseline is about establishing that you are not making terrible choices or installed something wrong, and a benchmark is to determine performance under very specific circumstances - like a particular work load.

IG - Are my single disks performing in line with 6G SAS? And has does my pool perform in varying configurations of vdevs - as compared to - How well will my system do running exchange as a VM on a ZFS datastore with this size write?
[/Personal rambling]

vikingboy · Sep 8, 2014

I've been wrestling with access privileges since I got my array up and running correctly which prevented me from summarising for you, sorry. Since testing Ive also got a bit of an understanding why FreeNAS is tough to benchmark - you have to know what sort of data to expect and look at your numbers critically, don't just accept them. Ive been meaning to document the steps I took to get the basic hardware verified as performing correctly because I hit a few stumbling blocks which the tests help me identify and resolve. I have little interest in ZIL/SLOGs for the meantime so thats a bit beyond my scope right now.

Important Announcement for the TrueNAS Community.

Is there a Comprehensive Performance testing guide?

sfcredfox

Patron

vikingboy

Explorer

diehard

Contributor

vikingboy

Explorer

diehard

Contributor

sfcredfox

Patron

cyberjock

Inactive Account

diehard

Contributor

jgreco

Resident Grinch

sfcredfox

Patron

sfcredfox

Patron

vikingboy

Explorer

Similar threads

Important Announcement for the TrueNAS Community.

Is there a Comprehensive Performance testing guide?

Patron

Explorer

Contributor

Explorer

Contributor

Patron

Inactive Account

Contributor

Resident Grinch

Patron

Patron

Explorer

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "Is there a Comprehensive Performance testing guide?"

Similar threads