Another thread on performance...

Status
Not open for further replies.

Dotty

Contributor
Joined
Dec 10, 2016
Messages
125
Allow me to make a few statements, and please correct me wherever I'm off.
Let's assume my main use of FreeNAS is as a file server via SMB; the clients are Windows workstations and the files are big, for the most part.
(by big I mean 2-10 GB each, e.g. raw photos or videos)

Now, the statements:
The theoretical transfer rate on Gigabit LAN is about 125 MB/sec, so no matter how fast the HDDs on the workstation are or how efficient the FreeNAS box is, the transfer rate will never exceed that number.
(let's not involve caching, for the sake of simplicity)
Therefore, if I have a workstation with an SSD (capable of, let's say, 550 MB/sec) and whatever combination of HDDs on the FreeNAS, the fastest I will ever go is 125 MB/sec (of course, a little more or a little less, but pretty much around that number, sustained).
All of this assumes no one else is using the LAN or the FreeNAS box.
The moment two workstations try to pull a file from the FreeNAS, they start competing for the bandwidth, and if we assume the FreeNAS is not the bottleneck, then each user will get a transfer rate of 125 MB/sec divided by 2 (about 60 MB/sec). Please don't get picky with me over 58 MB/sec vs. 73 MB/sec; hopefully the theory is coming across.
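As a sanity check on those numbers (just my own back-of-the-envelope arithmetic):

1 Gbit/s = 1000 Mbit/s ÷ 8 ≈ 125 MB/sec raw; after Ethernet/IP/TCP and SMB overhead, a single client typically sees somewhere around 110-115 MB/sec sustained.
Two clients reading at once ≈ 125 ÷ 2 ≈ 62 MB/sec each, minus that same overhead.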

Now, let's say I want to make things faster, so I enable link aggregation on the FreeNAS box and LACP on the LAN switch. Contrary to what some people think, LACP on the FreeNAS does nothing to improve the speed of one single workstation: first because the workstation itself doesn't have LACP, and second because LACP doesn't increase the bandwidth of any one connection; it just "pools" the physical wires you have and balances connections among them.
Basically, one workstation will go to the LAN switch at Gigabit speed, connect to the FreeNAS on one of the LACP wires, and pull a big file at 125 MB/sec; if a few seconds later a second workstation comes along, LACP will put that traffic on the second wire of the FreeNAS link aggregation, also at 125 MB/sec. So with LACP over two wires you can theoretically have two workstations each pulling big files at 125 MB/sec (again, as long as the FreeNAS is not the bottleneck).
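For reference, this is roughly how the lagg looks on the FreeBSD/FreeNAS side (a sketch from memory; igb0/igb1 and the address are placeholders for whatever your box actually has, and on FreeNAS you would normally configure this through the GUI rather than the shell):

# build a LACP aggregate from two physical ports (placeholder names igb0/igb1)
ifconfig lagg0 create
ifconfig lagg0 laggproto lacp laggport igb0 laggport igb1
# bring it up with an address (placeholder subnet)
ifconfig lagg0 inet 192.168.1.50/24 up
# verify that both ports show up as ACTIVE in the aggregate
ifconfig lagg0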

By the way, in reality, for the tests I've done with Cisco LACP, the relationship of bandwidth to wires is not 1:1 but more like 3:2; basically, to get 125 MB/sec on each of two workstations you have to set up LACP with 3 wires, and so on.
I guess there is some sort of overhead in the balancing algorithm (I tried src-mac, dst-mac, src-ip, etc., all the same).
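These are the sort of IOS commands I was flipping between on the Catalyst side (a sketch; channel-group 1 is just a placeholder, and the load-balance options available depend on the switch model):

! pick the hash the port-channel uses to choose a member link
configure terminal
port-channel load-balance src-dst-ip
end
! check what is currently in use and how the bundle looks
show etherchannel load-balance
show etherchannel 1 summary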

Now, let's say we want to make file transfers really fast, so let's go to 10GbE: we go ahead and install a nice Mellanox NIC in the FreeNAS and another in one workstation, fiber, direct-connected to one another (no switch). Voila, we expect to hit a transfer rate of 1000 MB/sec, but it doesn't happen.
Part of the reason is that you have to have a drive in the workstation that can move data at that rate (a brand-new M.2 SSD, let's say), and the FreeNAS pool also needs to be capable of that transfer rate.
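One way I try to separate the network from the disks (a sketch; I'm assuming iperf3 is installed on both ends, and 10.0.0.1 is a placeholder for the FreeNAS side of the direct link):

# on the FreeNAS box: run an iperf3 server
iperf3 -s
# on the workstation: push traffic over the direct 10GbE link for 30 seconds
iperf3 -c 10.0.0.1 -t 30
# if one TCP stream can't fill the pipe, try a few in parallel
iperf3 -c 10.0.0.1 -t 30 -P 4

If iperf3 shows close to line rate, the remaining shortfall is in the disks or the filesystem rather than the NICs or drivers.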

Well, I was playing with that today with seven WD Red 3TB drives, but I can't pass 600 MB/sec (the WD Reds are rated at about 130 MB/sec, so a stripe should give me about 900 MB/sec, but I'm nowhere near that).
At some point I thought it was my workstation, so I started running read/write tests on the FreeNAS box itself with the dd command; same thing, I can't pass 600 MB/sec.
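In case anyone wants to reproduce it, the tests were along these lines (a sketch; the dataset path is a placeholder, and compression has to be off on that dataset or the numbers are meaningless):

# sequential write test: ~20 GB of zeros into the pool
dd if=/dev/zero of=/mnt/tank/test/ddfile bs=1m count=20000
# sequential read test: read the same file back and throw the data away
dd if=/mnt/tank/test/ddfile of=/dev/null bs=1m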

Then I thought it was FreeNAS, so I installed Solaris 11.3, configured a ZFS pool, and got the same thing (I even enabled SMB and tried from the Windows machine, but no difference).
I also installed OpenIndiana; same thing.

On all of those, if I run dd over a pool with only one HDD in the zpool I get 130 MB/sec, just as the WD Red specs say, but when I put 7 drives together my math doesn't work.
I'm installing FreeBSD 10 right now.
After that, I will install Ubuntu with an Areca RAID card that I know works excellently, and I will test performance again (I'm sure I've passed that mark with hardware RAID before in similar tests, but I don't really remember).

Am I making any sense with my assumptions and expectations?
Anyone out there with real-world experience on this?
How can one extrapolate the expected performance of a zpool based on the HDD specs and the number of disks in the pool?
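The rough rule of thumb I've been assuming (please correct it if it's wrong) is, for large sequential transfers:

stripe of N disks:      throughput ≈ N × (single-disk MB/sec)
RAIDZ1/2/3 of N disks:  throughput ≈ (N − number of parity disks) × (single-disk MB/sec)

both minus some ZFS/controller overhead, and with random I/O scaling with the number of vdevs rather than the number of disks.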
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Now, let's say I want to make things faster, so I enable link aggregation on the FreeNAS box and LACP on the LAN switch. Contrary to what some people think, LACP on the FreeNAS does nothing to improve the speed of one single workstation: first because the workstation itself doesn't have LACP, and second because LACP doesn't increase the bandwidth of any one connection; it just "pools" the physical wires you have and balances connections among them.
Basically, one workstation will go to the LAN switch at Gigabit speed, connect to the FreeNAS on one of the LACP wires, and pull a big file at 125 MB/sec; if a few seconds later a second workstation comes along, LACP will put that traffic on the second wire of the FreeNAS link aggregation, also at 125 MB/sec. So with LACP over two wires you can theoretically have two workstations each pulling big files at 125 MB/sec (again, as long as the FreeNAS is not the bottleneck).

By the way, in reality, for the tests I've done with Cisco LACP, the relationship of bandwidth to wires is not 1:1 but more like 3:2; basically, to get 125 MB/sec on each of two workstations you have to set up LACP with 3 wires, and so on.
I guess there is some sort of overhead in the balancing algorithm (I tried src-mac, dst-mac, src-ip, etc., all the same).
The rule of thumb is generally that link aggregation is useful starting around 10 users - I'm sure this will vary according to the specific solution implemented.
Now, let's say we want to make file transfers really fast, so let's go to 10GbE: we go ahead and install a nice Mellanox NIC in the FreeNAS and another in one workstation, fiber, direct-connected to one another (no switch). Voila, we expect to hit a transfer rate of 1000 MB/sec, but it doesn't happen.
Part of the reason is that you have to have a drive in the workstation that can move data at that rate (a brand-new M.2 SSD, let's say), and the FreeNAS pool also needs to be capable of that transfer rate.
The NIC drivers may also be causing trouble.
Well, I was playing with that today with seven WD Red 3TB drives, but I can't pass 600 MB/sec (the WD Reds are rated at about 130 MB/sec, so a stripe should give me about 900 MB/sec, but I'm nowhere near that).
At some point I thought it was my workstation, so I started running read/write tests on the FreeNAS box itself with the dd command; same thing, I can't pass 600 MB/sec.
Doesn't seem unreasonable. Note that reads from ARC are essentially instantaneous and L2ARC is very fast, so more RAM and more L2ARC can massively help your performance (don't forget that every ARC/L2ARC hit is one less read access, leaving the array free for writes).
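If you want to see how much the ARC is actually helping during a test run, the raw counters are exposed via sysctl on FreeBSD/FreeNAS (a quick sketch, not a full analysis):

# current ARC size plus hit/miss counters; compare before and after a test
sysctl kstat.zfs.misc.arcstats.size
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses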

What pool layout are you using with those seven drives?
 

Dotty

Contributor
Joined
Dec 10, 2016
Messages
125
The rule of thumb is generally that link aggregation is useful starting around 10 users - I'm sure this will vary according to the specific solution implemented.
What metrics support the idea that it only helps after about 10 users? I've done LACP on Solaris and VMware boxes, and you see the benefit as soon as two users try to pull data from the server at the same time. It is very easy to test, by the way.

The NIC drivers may also be causing trouble.
I can't pass 600 MB/sec running dd on the FreeNAS itself, with no network involved.

Doesn't seem unreasonable. Note that reads from ARC are essentially instantaneous and L2ARC is very fast, so more RAM and more L2ARC can massively help your performance (don't forget that every ARC/L2ARC hit is one less read access, leaving the array free for writes).
I have 32 GB of ECC RAM in this particular box (it is a Dell Precision T7400 dual Xeon, and I can take it to 128 GB of RAM using the risers, but I wonder if that is really the problem? More than 32 GB of RAM to increase the transfer speed of a 2-10 GB file?)
By the way, I think I was too tired yesterday when I wrote the original post: the HBA is not a 9211, it is a SAS9200-8e (8-port external, 6 Gb/s), and the HDDs are in an external Areca ARC-4036 enclosure connected via a Mini-SAS SFF-8088 cable.
I'm going to do some reading today; I wonder if that single Mini-SAS cable is the bottleneck.

What pool layout are you using with those seven drives?
I started with RAIDZ but ended up with a stripe; I thought writing/reading the parity would slow things down.
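For reference, the two layouts I compared were created more or less like this (a sketch; "tank" and the da0-da6 device names are placeholders for my actual pool and disks, and on FreeNAS the GUI does this for you):

# 7-disk stripe (no redundancy, purely for the throughput test)
zpool create tank da0 da1 da2 da3 da4 da5 da6
# versus a single 7-disk RAIDZ1 vdev
zpool create tank raidz da0 da1 da2 da3 da4 da5 da6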

I finished installing OpenIndiana; I'll test the scenario there today, and then I will move to regular hardware RAID and check again.
 

Dotty

Contributor
Joined
Dec 10, 2016
Messages
125
I meant "finished installing FreeBSD" (OpenIndiana was yesterday).
Same thing with FreeBSD ZFS.

I read up on my JBOD expander box and HBA; their ports are all multilane x4, and I'm using a multilane Mini-SAS cable, so technically one external cable should pass 600 MB/sec times 4 (2400 MB/sec).
Just in case, I set up SAS groups on the JBOD and connected dual Mini-SAS cables (the HBA and the JBOD have multiple ports and support grouping, so you can split them and put 4 HDDs on each connector).
Same thing.
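Spelling out the math on that cable (my own arithmetic, and it assumes all four lanes actually negotiate at 6 Gb/s):

4 lanes × 6 Gb/s = 24 Gb/s raw
after 8b/10b encoding: 4 × ~600 MB/sec ≈ 2400 MB/sec usable

so a single x4 Mini-SAS link shouldn't be what caps me at 600 MB/sec, unless only one lane is actually in use.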

I will install Ubuntu with hardware RAID later today. Geez! I might even install Windows 10, configure Storage Spaces as a stripe via PowerShell, and test :smile:
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Not 100% sure, but I think parity only comes into play when the checksum doesn't match the data. If I'm correct, RAIDZ should pretty much match stripe for sequential read performance.

Have you read this?
http://open-zfs.org/wiki/Performance_tuning

Dotty

Contributor
Joined
Dec 10, 2016
Messages
125
Not 100% sure, but I think parity only comes into play when the checksum doesn't match the data. If I'm correct, RAIDZ should pretty much match stripe for sequential read performance.

Have you read this?
http://open-zfs.org/wiki/Performance_tuning

Looks like good reading, thanks! I will definitely go through it tonight.
About parity, I might be wrong, but 7 HDDs in a stripe should be faster than 7 HDDs configured with parity, no matter the situation, even if the checksum matches the data at all times.
That's what I did: I thought parity was slowing me down a bit, so I reconfigured as a stripe, but it didn't make a difference.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
7 HDDs in a stripe should be faster than 7 HDDs configured with parity
I think you're correct in theory, with the difference in streaming bandwidth being 1/7. This assumes the HBA or underlying data bus is not a bottleneck.

Of course, this is for streaming reads. Clearly parity will slow down writes to some degree.
 

Dotty

Contributor
Joined
Dec 10, 2016
Messages
125
I was curious how Windows would fare, using the same JBOD box and the same HDDs from my previous tests.
I get about 920 MB/sec on sequential reads, and almost the same on writes.
That's consistent with the WD Red specs (+/- 130 MB/sec multiplied by 7 HDDs).

(I'm using an EVGA SR-X dual Xeon E5-2687W with 96 GB RAM and an Areca 1882-ix-24, booting from 4 Corsair Gforce SSDs in RAID 0, and the test runs over the 7 HDDs in my JBOD, connected to the external port of the Areca card, also configured as RAID 0.)

I'll go with Ubuntu later, then I might test Solaris 11 on this machine, with and without hardware RAID, and see what happens.
 

Dotty

Contributor
Joined
Dec 10, 2016
Messages
125
[root@freenas] /mnt/volume1# dd if=/dev/zero of=/mnt/volume1/performance/testfile7 bs=128000 count=800000
800000+0 records in
800000+0 records out
102400000000 bytes transferred in 102.251009 secs (1001457110 bytes/sec)

I'm testing over a dataset called "performance" with compression OFF.
128k block size, since I saw that's the default for ZFS. I have 96 GB of RAM, so I'm testing with a file bigger than that, just to make sure I won't hit the cache.
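(To double-check that the dataset really has compression off, something like this works; volume1/performance is just how my dataset happens to be named:)

zfs get compression,recordsize volume1/performance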

Now we are talking.
This is the SR-X, with the Areca RAID card working as JBOD.

I guess my LSI card is giving me trouble on the other machine, but I'm glad the cables, HDDs, and expander box are not the bottleneck.
I'll keep testing tomorrow.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
I'm getting confused, but just to double-check: you should not use the Areca RAID card with ZFS. It will have bad performance and cause data consistency problems.
 

wtfR6a

Explorer
Joined
Jan 9, 2016
Messages
88
Sounds about right to me. My ten-drive Z2 topped out at circa 800-850 MB/s. There's overhead to getting all the ZFS goodness.
 

Dotty

Contributor
Joined
Dec 10, 2016
Messages
125
I'm getting confused, but just to double-check: you should not use the Areca RAID card with ZFS. It will have bad performance and cause data consistency problems.
Long story short: I was using the LSI, but I was not passing 600 MB/sec, so I eventually started testing with the Areca card set to passthrough; that was my last post.
Not using RAID on the Areca card.
I will start testing the LSI again today.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Long story short: I was using the LSI, but I was not passing 600 MB/sec, so I eventually started testing with the Areca card set to passthrough; that was my last post.
Not using RAID on the Areca card.
I will start testing the LSI again today.
So you initialized each disk in the RAID card BIOS as a single disk, right? If that's what you did, it doesn't work well with ZFS. If you have to use the RAID card's BIOS for anything, you're doing something wrong. You need just a plain HBA card. I have an LSI 9211-8i and get 900 MB/s with my array of WD Reds. I can't think of a bottleneck that sits at 600 MB/s.
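A quick way to see how the disks are being presented to FreeBSD is something like the following (da0 is just an example device name):

# list every disk the OS sees and which controller it hangs off
camcontrol devlist
# if the drive is passed through cleanly, SMART should work against it directly
smartctl -a /dev/da0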
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Dell Precision T7400 dual Xeon

That's an old Core 2-based Xeon system, using an FSB and FB-DIMMs. That could entirely be the original limitation.

But since you essentially switched the entire system, you haven't really isolated it beyond a black-box A/B test.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I can't think of a bottleneck that sits at 600 MB/s.

A single SATA 3 lane is roughly that (6 Gb/s) ... if the external drive bay were somehow presenting itself as some kind of SATA port multiplier, that could be it.
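One way to check (assuming the box is running FreeBSD/FreeNAS) is to look at the probe messages, which report the negotiated transfer rate per disk:

# the negotiated rate shows up in each disk's probe line
dmesg | grep "^da"
# a line like "da0: 600.000MB/s transfers" means a full 6 Gb/s link to that disk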
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
A single SATA 3 lane is roughly that (6 Gb/s) ... if the external drive bay were somehow presenting itself as some kind of SATA port multiplier, that could be it.
Yeah, that is kind of close: 6 Gb/s is 750 MB/s raw, or about 600 MB/s of actual data once you account for the 8b/10b encoding overhead.
 

Dotty

Contributor
Joined
Dec 10, 2016
Messages
125
So you initialized each disk in the RAID card BIOS as a single disk, right? If that's what you did, it doesn't work well with ZFS. If you have to use the RAID card's BIOS for anything, you're doing something wrong. You need just a plain HBA card. I have an LSI 9211-8i and get 900 MB/s with my array of WD Reds. I can't think of a bottleneck that sits at 600 MB/s.
Areca has a setting for "RAID or JBOD"; if you choose "JBOD" it makes all the HDDs passthrough and presents them to the OS, and that's how I configured them.
(It is basically the same as setting passthrough for each HDD individually.)

There is the option to do what you said and pass each HDD's volume to the OS, but that's not what I did.

I used the Areca card for testing just to rule out the LSI card. I know the Areca doesn't do well with FreeNAS; the SMART implementation (either in FreeNAS or in the Areca firmware) seems to not play well with the other, but I'm not testing SMART, just speed.
And actually, on speed, the Areca performed way better than the LSI (again, Areca in JBOD mode, no RAID at all).
 

Dotty

Contributor
Joined
Dec 10, 2016
Messages
125
Yeah, that is kind of close: 6 Gb/s is 750 MB/s raw, or about 600 MB/s of actual data once you account for the 8b/10b encoding overhead.
Like I said before, the JBOD I'm using is external, but it is SATA 3 with a 4x-multilane Mini-SAS connection. SATA 3 is 600 MB/sec per lane, therefore I should be able to see 2400 MB/sec (that's the "4x" part).
I was able to get 1000 MB/sec with that external enclosure and 7 WD Red HDDs configured as a stripe; those HDDs are rated at a little more than 140 MB/sec each, so 7 of them should give you around the 1000 MB/sec I saw.
So far my only problem is that I was not able to get those numbers with an LSI card that is supposed to be SATA III and has a Mini-SAS port "supposedly" rated as 4x multilane (a card recommended for FreeNAS in several places).

I've been stuck with work the last few days; I think I have to wait until the weekend to continue testing.
 