expected high performance - got abysmal reads

IToaCoC

Cadet
Dear community, admins, zfs-heads and co-enthusiasts,


I am stuck with an unexpected performance problem that I haven't been able to solve for some time, and I now feel it's time to seek your help and ideas. So, what has happened so far?

After getting an HP MicroServer to respectable iSCSI read performance (10GbE, 500+ MB/s sequential reads inside a client VM on VMware, 800+ MB/s from a Windows client), I am now baffled by a full-fledged server setup that delivers reads of only 250 to 400 MB/s. The upper end is rare, though, and prone to dropping to 250 MB/s without warning and staying there through reboots etc. (I have noticed this behaviour several times with different setups; further details below).

Small server:
HP MicroServer Gen8
Xeon E3-1265L V2 @ 2.5 GHz
16 GB RAM
Intel 10GbE PCIe NIC
RAIDZ2 of 4x Intel SSD DC S3700 800 GB
still running FreeNAS 9.2.1.9 (without the experimental target or multithreaded mode)

Big server:
Supermicro mainboard X10DRH-IT
2x Xeon E5-2673 v3 @ 3.5 GHz
128 GB RAM
using the on-board 10GbE ports
3x LSI SAS 9300-8i (Avago? 12 Gb/s SAS, just to make sure I/O won't be a bottleneck)
22x Crucial M550 SSD
2x Samsung 845DC PRO for SLOG and L2ARC (the latter in order to ease the read/write load)
running FreeNAS 9.3 (I fetched the most recent update today)
purpose: VM storage for operating-system and database volumes; file services will reside on a separate NAS

I tried several setups (a command-line sketch of one layout follows this list):
- a 22-disk RAIDZ3
- a 3x7 RAIDZ3 (3 vdevs of 7 disks)
- a 2x9 RAIDZ3 (2 vdevs of 9 disks, max. 3 per controller)
- a 3x RAIDZ2 (3 vdevs of 6 disks, max. 2 per controller)
- a 21-disk mirrored setup (7 three-way mirrors, one disk per controller)
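
For clarity, this is roughly what the 3x RAIDZ2 variant looks like on the command line. The pool name and da device numbers are only placeholders, and in practice FreeNAS builds the pool through the GUI using GPT labels, so this is just to show the vdev structure:

# three RAIDZ2 vdevs of six disks each (device names are placeholders)
zpool create tank \
    raidz2 da0 da1 da2 da3 da4 da5 \
    raidz2 da6 da7 da8 da9 da10 da11 \
    raidz2 da12 da13 da14 da15 da16 da17
# verify the resulting layout
zpool status tank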

Originally I paid attention mostly to worst-case write performance (using sync=always), which didn't change between layouts as much as I had hoped, but at some point I noticed that reads hardly seem to differ at all, ranging only from 250 to 350 MB/s (and once 450).

Also, results (read and write) sometimes differed:
a. after a reboot
b. after a complete destroy and re-create using EXACTLY the same setup/configuration
c. after a completely new installation of freenas using the same version

I was also very careful to choose exactly the disks I intended to use, spreading them out over the different controllers to increase (theoretical) resilience.

The local performance seems to be much higher, so I suspect the problem lies somewhere in between: iSCSI performance or the network setup (I tuned some sysctls for 10GbE with only marginal to moderate effect). The Windows network card is identical to, and tuned identically to, the one from my earlier tests with the MicroServer. I've been at this for weeks now: configuring, testing, pondering, changing, cursing, googling (not necessarily in that order). I guess I should have come here sooner, but I wanted to get a solid base, some reliable numbers to build on; after hundreds of tests the numbers still won't add up.
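
For what it's worth, the 10GbE tuning was along the lines of the commonly suggested FreeBSD socket-buffer sysctls; the values below are only a sketch of the kind of change I mean, not my exact settings:

# enlarge the socket-buffer limits for 10GbE (illustrative values only)
sysctl kern.ipc.maxsockbuf=16777216
sysctl net.inet.tcp.sendbuf_max=16777216
sysctl net.inet.tcp.recvbuf_max=16777216
# keep TCP buffer auto-tuning enabled so buffers can grow up to those limits
sysctl net.inet.tcp.sendbuf_auto=1
sysctl net.inet.tcp.recvbuf_auto=1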

Performance testing was done with CrystalDiskMark (using between 1 and 4 GB of pseudo-random data) and AS SSD Benchmark. (By the way: AS SSD seems to flush the cache after each write, while CrystalDiskMark doesn't. I only noticed this by coincidence, because for testing I always set the zvol to sync=always or sync=disabled respectively, in order to get comparable results.)
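
For reference, toggling the zvol's sync property per test run looks like this; "tank/iscsi-zvol" is just a placeholder name:

# force every write to be committed to stable storage (SLOG) before it is acknowledged
zfs set sync=always tank/iscsi-zvol
# or: acknowledge writes immediately and rely on the regular transaction-group flush
zfs set sync=disabled tank/iscsi-zvol
# check the current setting
zfs get sync tank/iscsi-zvol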

Can anyone point me in a more promising direction where I should keep digging?

Thanks in advance!

Kjartan


PS: As an aside, I found that having all disks on one controller results in somewhat better performance than spreading the disks out over several controllers. Still, I like the extra resilience that spreading offers...
 

mav@

iXsystems
iXsystems
Spreading the load between several HBAs is generally good for reaching the highest IOPS, but unfortunately FreeBSD 9.x cannot take full advantage of it. The situation should change significantly after the migration to FreeBSD 10.x.

On the other hand, I am not sure why performance in a read test with a 4 GB test file would depend on the disks at all on a system with 128 GB of RAM. If I understand the configuration correctly, all the reads should be served from ARC without touching the disks. And your configuration should be more than enough to saturate a 10GbE link.

You are asking questions without giving much input data. Have you tried collecting disk (gstat -I 1s -p) and CPU (top -SHIz) usage statistics during the test?
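
In practice that just means keeping both tools open while the benchmark runs, e.g. in two SSH sessions; nothing here is specific to your setup:

# terminal 1: per-disk load, refreshed every second, physical providers only
# watch the %busy and ms/r columns -- one disk pinned near 100% points at the bottleneck
gstat -I 1s -p
# terminal 2: per-thread CPU usage, system threads included, idle threads hidden
# watch whether a single kernel/iSCSI thread saturates one core while the rest sit idle
top -SHIz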
 

IToaCoC

Cadet
Thank you for the hints, I'll do that as soon as I'm back in the office. All I can say (from running top on the command line most of the time) is that the CPU seems to be more or less idling until it comes to 4k random reads/writes (especially with a high queue depth, naturally). I also found that disabling hyper-threading doesn't change my numbers much. But I can give you more on that tomorrow.

Btw, is there a general how-to thread offering 'useful' ways to benchmark and get relevant results? E.g. which tests make sense for different setups, which tools/benchmarks are useful, what (and how) to monitor (e.g. the commands you mentioned), how to monitor SLOG usage, where to expect bottlenecks and how to recognize them, etc. Kind of the hows, whats and what-fors of benchmarking: how to stress different setups and what to look for in the results. Because I have a feeling I've been looking at the wrong figures during the last few days.
 

mav@

iXsystems
iXsystems
You are asking questions too big to cover in a single forum thread. There is an infinite variety of system loads and therefore an infinite variety of benchmarks. I have been doing iSCSI profiling for a year now and I am sure I have covered only a minor part of that variety. My typical approach to profiling is "search and destroy": search the internet for interesting load patterns, or patterns reported to be bad for FreeNAS, try to reproduce them in a maximally controllable lab environment, and then use different profiling mechanisms step by step to identify the bottlenecks.

What I would recommend, if you want to go into that magic/shady world of benchmarking, is to use only tests whose behavior you can understand and explain down to the last digit and request. That usually means using the simplest setups and the simplest tests. That way you will at least know what you are actually measuring and what all those numbers mean. And if you don't like the numbers you get, then, knowing what the test should do, you can move on to the next question: why does it not work as expected?
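
For example, one of the simplest tests of that kind is a plain sequential write and read of a single file, run locally on the server first and only then over iSCSI, so each layer can be judged on its own (the path below is just a placeholder):

# write an 8 GiB file of random data so compression cannot inflate the numbers
dd if=/dev/random of=/mnt/tank/testfile bs=1m count=8192
# read it back sequentially; dd reports the throughput when it finishes
dd if=/mnt/tank/testfile of=/dev/null bs=1m
# note: on a 128 GB RAM box the read will largely be served from ARC
# unless the file is much larger than RAM or the pool is exported/imported in between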
 

cyberjock

Inactive Account
I know the 9300-series controllers use a driver that was last listed as "barely alpha", so that could definitely be a problem. I would have avoided the 9300 series for any server where the data is important.
 

mav@

iXsystems
iXsystems
cyberjock said:
I know the 9300-series controllers use a driver that was last listed as "barely alpha", so that could definitely be a problem. I would have avoided the 9300 series for any server where the data is important.

I have been using a SAS9311-8i with the mpr driver in my lab for almost a year now without major problems. With the earlier version 03 firmware I did have some performance problems on SATA SSDs, but after updating the firmware to version 06 several months ago the problem went away. I have a report that LSI/Avago is now working on a driver update, so we may see version 08 in the base system soon.
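
If you want to check which firmware and driver revision your 9300-series cards are actually running, the mpr driver logs both when the HBA attaches; the sysctl name below may vary between FreeBSD versions, so treat it as a hint rather than a guarantee:

# driver and firmware revisions are printed when the HBA attaches
dmesg | grep -i mpr
# the firmware version is usually also exposed as a sysctl (device unit 0 assumed)
sysctl dev.mpr.0.firmware_version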
 