Some Interesting Performance Numbers on a New FreeNAS Server


HeloJunkie

Patron
Joined
Oct 15, 2014
Messages
300
OK, so after a bunch of reading and some suggestions from a couple of folks, I decided to put my new FreeNAS box through a series of tests using dd. I wanted to answer some questions about hard drive performance under various RAID configurations (both optimal and non-optimal), as well as how well the onboard Intel SATA ports on my new motherboard were working.

Here is my server configuration:

Supermicro Superserver 6028R-TRT
1 x X10DRi-T Supermicro Motherboard
1 x Intel Xeon E5-2650 V3 LGA2011-3 Haswell 10 Core 2.3GHz 25MB 5.0GT/s
4 x 16GB PC4-17000 DDR4 2133MHz Registered ECC Dual-Ranked 1.2V Memory
9 x 4TB Western Digital WD40EFRX Red NAS SATA Hard Drives
Dual 740 Watt Platinum Power Supplies
Dual APC 1500 UPSs (one for each power supply)
8GB USB Thumb Drive for booting

The X10DRi-T motherboard has two Intel X540-based 10Gb Ethernet adapters, an Intel C612 Express chipset, and ten SATA3 ports.



I wanted to test the read and write performance of various configurations of my server. I wasted a bunch of time doing this with IOzone but forgot to turn compression off, so I threw those numbers out and started fresh with dd and no compression.
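For anyone repeating this, here is a minimal sketch of turning compression off (assuming the pool/dataset is named vol1, as in my paths):

Code:
# dd writes zeros, which compress almost perfectly, so compression must be off
zfs set compression=off vol1
# verify the property actually changed
zfs get compression vol1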

There were some suggestions that a dedicated SATA card would be faster than the onboard SATA ports (my motherboard has ten of them). So I wanted to test not only my various pool configurations, but also the performance of the onboard ports themselves. The configurations were:

RAIDZ3 with 8 drives (considered non-optimal)
RAIDZ2 with 8 drives (considered non-optimal)
RAIDZ2 with 6 drives (considered optimal)
RAIDZ3 with 7 drives (considered optimal)
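For anyone following along, the command-line equivalent of one of these layouts looks roughly like this (a sketch; ada0 through ada7 are hypothetical device names, yours will differ):

Code:
# 8-drive RAIDZ2 test pool; swap raidz2 for raidz3 (or drop drives) for the other layouts
zpool create vol1 raidz2 ada0 ada1 ada2 ada3 ada4 ada5 ada6 ada7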

One very interesting thing I found was how the hard drives responded individually during the tests. Ericloewe suggested here that it was possible that some games were being played with the onboard SATA ports and their total capabilities.

Ericloewe:
However, there's talk the last four ports on the PCH may be obtained via port multipliers internally or that they're otherwise shady. There is no real data on this yet.

I submit he may be right, at least on my motherboard. Here is why I think that might be the case: when I was running my tests, I noticed that some hard drives appeared to have different performance values. This only happened on READs, not WRITEs, and only on two of the ports. Swapping hard drives resulted in the exact same results. So it appears (based on my limited knowledge of these Intel SATA ports) that something odd is going on with the onboard SATA ports.

I was running the following commands on a RAIDZ3 vol with 8 drives, no compression:
Code:
dd if=/dev/zero of=/mnt/vol1/testfile1 bs=10M count=30000   # write test (not shown on graph)
dd of=/dev/zero if=/mnt/vol1/testfile1 bs=10M count=30000   # read test
dd of=/dev/zero if=/mnt/vol1/testfile1 bs=10M count=30000   # read test
dd if=/dev/zero of=/mnt/vol1/testfile1 bs=10M count=30000   # write test


[Graph: per-drive read/write throughput for a disk on one SATA port]


At the exact same time, here is another disk on another port:

[Graph: per-drive read/write throughput for a disk on a second SATA port, same time window]


There is a significant difference. I see this behavior on two of the ten ports, and switching drives did not change the results.

So that is the first thing I found interesting. Maybe the motherboard is playing games with eight of the ten ports, or maybe with just two of them. Someone with a much better understanding of the hardware side of the house might be able to explain why the motherboard is acting like this.

OK, so I scratched my head but moved forward with my testing, using the following commands (suggested here):

Code:
dd if=/dev/zero of=/mnt/vol1/testfile bs=4M count=10000   # write test (~40GB of zeros)


and

Code:
dd of=/dev/zero if=/mnt/vol1/testfile bs=4M count=10000   # read test (reads the file back out)


I ran these tests three times each. The read numbers were off the charts. Someone suggested that with 64GB of RAM things were getting cached, so I later increased the block size from bs=4M to bs=10M (and with it the total file size). I am not sure, but the later tests tended to report more predictable read results.
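In hindsight, the thing to check is whether the test file is bigger than the ARC. A quick sketch for FreeNAS/FreeBSD using the standard ZFS sysctls:

Code:
# current ARC size, in bytes
sysctl kstat.zfs.misc.arcstats.size
# configured ARC ceiling, in bytes
sysctl vfs.zfs.arc_max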

I used the onboard SATA ports for one set of tests, then switched to an Areca 1222 SATA controller card that we had laying around for another set. The 1222 did not see my 4TB drives right away, so I upgraded the card's firmware and then it worked fine. I toyed with the idea of using this card, but it has horrible reviews, so this was just a test. What I probably need to do is buy an IBM M1015, but I am still not convinced the onboard ports won't work for me.

All compression was turned off for these tests, and I ran them against the following configurations, all with eight drives: RAID10, RAIDZ1, RAIDZ2, RAIDZ3 and RAID60. Here are the results:

[Graph: dd read/write results for RAID10, RAIDZ1, RAIDZ2, RAIDZ3 and RAID60, onboard ports vs. Areca 1222]


Well, that card is pretty much worthless, I would say, but in all fairness it was an old card!

Since my reads seemed WAY off the charts, I decided to change how I was testing. The next series of tests were run as follows:

Code:
dd if=/dev/zero of=/mnt/vol1/testfile1 bs=10M count=30000   # write test (~293GiB)
dd of=/dev/zero if=/mnt/vol1/testfile1 bs=10M count=30000   # read test


These tests resulted in much more reasonable read speeds. I didn't bother with the SATA controller card this time, opting instead to stick with my onboard ports.

This test used no compression, 8 drives, bs=10M and a count of 30000. I only ran RAIDZ2 and RAIDZ3 this round, as those are really what I am looking at running:
[Graph: dd results for RAIDZ2 and RAIDZ3, 8 drives, bs=10M count=30000]





And finally, just so I could see for myself whether the "optimal" layout (a power of two of data drives plus parity) was really more optimal (at least using dd), I ran one more test: RAIDZ2, 6 drives, no compression. The top results above show that the optimal configuration (in my case, 4 data drives + 2 parity drives for a total of six) actually seems to perform worse than a RAIDZ2 with 8 drives!




So after all of this testing, it looks like the motherboard provides pretty good performance, but I still think I may go with the IBM M1015 and rerun these tests to see what difference I might find.

I am very interested to know whether you FreeNAS gurus think these tests are reasonable, and whether dd is the best tool for these types of performance tests.

Thanks!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I'll give some info/advice for some of the things you noticed:

1. The fact that 2 SATA ports behaved differently is unexplainable to me. They all appear to share the same interconnect with the hardware, per page 18 of your motherboard's manual. It's possible the manual is in error, but I can't dispute what I don't understand. Of course, the block diagram is somewhat in conflict with the text, because the manual says:
Ten SATA 3.0 ports are located on the motherboard. Six SATA 3.0 ports (I-SATA 0-5) are supported by the Intel PCH C612. The other four SATA ports (S-SATA 0-3) are supported by the Intel PCH.

I'd tend to expect separate interconnects, but that's not what the diagram shows. So I tend to think the diagram is wrong (shame on Supermicro). But this doesn't change the fact that 2 ports behave differently from the other 8. You'd have to test all of the SATA ports simultaneously to determine whether a separate set of 4 ports is responsible.

I will say that internally at iXsystems, if you buy the highest-end server from us, you cannot use any of the hard drive bays in the head unit for anything except ZIL and L2ARC. Both of those are very sensitive to latency, and we know from experience that if you have multiple shelves of drives, the latency of those drives will be higher. Keep in mind that when I say higher, we're talking enough to seriously impede ZIL and L2ARC but not enough to affect your typical pool. The pool is going to have multiple milliseconds of latency due to the hard drives alone, while the ZIL and L2ARC are going to be flash devices, so for them even 0.5ms of latency is "extremely high". tl;dr: it takes time for the data to traverse the cables, and that adds up when you have multiple shelves and interconnects.

2. I had an Areca 1260ML-24 with 2GB of cache on-card. It's the last card of that generation, which is almost 10 years old now. The card performs very poorly in relation to an M1015. Despite having plenty of throughput, the bottleneck is latency (which was acceptable in that time frame). Today's technology is faster and lower-latency, which is why the 1222 sucked balls. I was astonished when I put a pool on either a current-gen controller (an M1015 and some other card I can't remember) or the old 1260, and the performance was vastly different; in my case I want to say it was something like 50% different. What was funny to me was that enabling the write and read cache on the Areca, which should have guaranteed superior performance for small writes, didn't help. I was shocked, because I would have expected the on-card RAM cache to make small writes screaming fast, since they'd go to the cache and later to the disk. But that's not what happened at all; it performed just about the same as the M1015.

3. Your read and write tests are basically worthless unless the test file is much larger than your ARC. For your RAM (64GB), I would say your numbers can't be trusted unless your write or read test was at least 128GB, and 256GB would be better. Your numbers went off the charts because everything was in ARC (RAM). So you weren't testing your pool at all; you were testing "how fast can I retrieve data from the ARC". The answer... "pretty freakin' fast". (Edit: I can't do basic math in my head apparently.) Also, it's always better for your dd block size to be a power of 2: 2MB, 4MB, 8MB, 16MB. Doing 10MB blocks was kind of weird and non-standard, but it probably didn't affect the numbers, since 10MB still ends up falling on block-size boundaries.

If you were to redo all the tests (which I think you should, because of the flaws I discussed in #3 above), I'd do the test with 256GB files and 8MB blocks. You could even do 32MB blocks if you wanted; I wouldn't expect the difference between 8MB and 32MB blocks to matter. But if you did 1GB block sizes, it could seriously hurt performance. When reading from the pool, the output device should be /dev/null. /dev/zero does accept writes, but I can't vouch for how much it might affect the benchmark results. Building a pool and testing it all on the motherboard ports as well as all on the M1015 might prove interesting.
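In other words, something along these lines (a sketch; 32768 x 8MiB works out to 256GiB):

Code:
# write test: 256GiB of zeros in 8MiB blocks
dd if=/dev/zero of=/mnt/vol1/testfile bs=8m count=32768
# read test: read the whole file back, discarding to /dev/null
dd if=/mnt/vol1/testfile of=/dev/null bs=8m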

Lastly, if you happen to have 10 identical drives, I would be very interested to see how all 10 drives perform on the motherboard simultaneously, and which "ports" perform together and such.
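Something like this would do it (a sketch with hypothetical device names; watch the per-disk numbers with gstat in another session):

Code:
# read ~10GiB from all ten disks at the same time
for d in ada0 ada1 ada2 ada3 ada4 ada5 ada6 ada7 ada8 ada9; do
    dd if=/dev/$d of=/dev/null bs=1m count=10240 &
done
wait   # let all the background reads finish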

Edit: Don't feel bad about your mistakes. You aren't the first one that has made those mistakes and you won't be the last either. ZFS is very special and you have to know how to benchmark ZFS properly to get information that is useful. ;)
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
cyberjock said:
1. The fact that 2 SATA ports behaved differently is unexplainable to me. [...] I'd tend to expect separate interconnects, but that's not what the diagram shows. [...]

Scroll down to the Chipset section.

Intel itself considers the last four SATA ports separate from the rest; in particular, they don't support fake-RAID. Intel has been rather tight-lipped about the implications, however, so just how they differ is open to speculation. Suggestions range from "they're driven by a SATA port multiplier" to "they're a separate controller, cut down to reduce costs".

Of course, the elephant in the room is "why two and not all four ports".
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I *seriously* doubt Intel would go with a SATA multiplier. That alone would make the ports so unreliable (and perform so poorly) that it would be *very* well documented *very* quickly. I'd have expected this to become common knowledge within just a few reviews of the product.

Remember, a SATA port multiplier takes the throughput of a single SATA link and splits it across however many ports you have. So in this case 1 link to 4 ports: a 6Gb/s SATA3 link is roughly 600MB/s of real throughput, so four busy ports would each see something like 150MB/s, which would be a death knell for anyone who wants to use those 4 ports for high-speed SSDs. Totally not something I'd expect from Intel.
 

HeloJunkie

Patron
Joined
Oct 15, 2014
Messages
300
Cyberjock -

Great information - thank you!

I suspected that I might have been reading from the ARC. My initial "reads" were in the 8GB range and I didn't even bother to graph those, but I thought once I went to bs=10M count=30000 the file would be big enough: 10MiB x 30000 = 300,000MiB, which is about 293GiB. Sure enough, when I go look at my test file, it shows 293G. I had assumed that this would exceed my ARC and force the system to actually use the pool, hence the reason my read numbers became more reasonable.

Am I misunderstanding something (other than that I should use a power of two for the block size)?

Code:
[root@freenas01] /mnt/vol1# ls -alh
total 307204489
drwxr-xr-x  3 root  wheel    4B Oct 20 19:56 ./
drwxr-xr-x  4 root  wheel  512B Oct 20 19:55 ../
drwxr-xr-x  6 root  wheel    6B Oct 20 19:55 .system/
-rw-r--r--  1 root  wheel  293G Oct 20 20:08 testfile1


OK, so here is what I am going to do next:

1) I have nine of the 4TB WD Red drives now and was testing with eight of them. Another identical drive is on its way to me, along with the IBM M1015. I will add the ninth and tenth drives to the system and rerun the tests as suggested, using the ten ports on the motherboard. Then I will install the M1015, rerun the same tests, and post the results. In the meantime, I am going to rerun my tests on the eight drives I have now with an 8MiB block size instead of 10MiB, to see if there is any difference!

Code:
dd if=/dev/zero of=/mnt/vol1/testfile1 bs=8m count=32000

Code:
dd of=/dev/null if=/mnt/vol1/testfile1 bs=8m count=32000



On the motherboard, two of the ports are marked as DOM compatible. I do not know if this has anything to do with it, and frankly I didn't map the hard drives to figure out which drive sits on which specific port. I will do that as well (see the sketch below), so I can check whether those ports behave differently. With the eight drives I have on there now, I AM NOT using the two SATA ports marked as DOM compatible.
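To do the mapping, I'll probably start with something like this (camcontrol is standard on FreeBSD/FreeNAS and shows which bus each disk hangs off; smartctl gives the serial number to match to the physical drive; ada0 here is just a hypothetical device name):

Code:
camcontrol devlist     # lists each disk with its scbus/target
smartctl -i /dev/ada0  # model and serial number for one device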

Thanks again for the great information, you guys. I enjoy the learning!
 


HeloJunkie

Patron
Joined
Oct 15, 2014
Messages
300
Thanks jgreco - Once I get done with my dd retesting I will try your script and see how it compares!
 

mstinaff

Dabbler
Joined
Jan 21, 2014
Messages
34
jgreco, you're missing a : after the ftp in your link ftp//ftp.sol.net/incoming/solnet-array-test-v2.sh

Also thanks for sharing this. I look forward to trying it on some new hardware we'll have in hand soon.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Hey would someone do me a favor?

Try embedding an FTP URL in a message and see if it removes the colon between the ftp and the // for you when doing a forum post.
 

Max the Dog

Cadet
Joined
Jun 26, 2013
Messages
5
wow, that's even more awesomely wrong: it is incorrectly rewriting plaintext URLs that aren't explicitly linked.
 

Max the Dog

Cadet
Joined
Jun 26, 2013
Messages
5
Now I'm confused. It does something ELSE wrong from a different account using Internet Exploder.

Oh actually it does the same thing, IE just helpfully prepends "http://" to the screwed-up URL for even more confusingness.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
In the spirit of further exploration, ftp://ftp.sol.net/incoming/solnet-array-test-v2.sh is not auto-linking for me under Firefox 32.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Idiotic XenForo. Well anyways guys "figure it out" and have fun with the tool.
 

mjws00

Guru
Joined
Jul 25, 2014
Messages
798
Thanks @jgreco.

Here is the script zipped up so XenForo doesn't puke. Hitting the .sh file directly in my browser via FTP sucks. The sticky is locked. When there is a better option, I'll nuke this or just leave it.
 

Attachments

  • solnet-array-test-v2.gz
    2.8 KB · Views: 291
  • solnet-array-test-v2.zip
    9.5 KB · Views: 288

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
jgreco said:
In the spirit of further exploration, ftp://ftp.sol.net/incoming/solnet-array-test-v2.sh is not auto-linking for me under Firefox 32.

Confirmed in Metro Windows 8.1 IE 11.
 