52TB build, need help


JoeV

Cadet
Joined
Sep 13, 2013
Messages
7
OK, I built two monster FreeNAS machines for work: 24 x 3TB Seagate Constellation SATA disks, a Supermicro chassis with a 24-slot hot-swap SAS/SATA backplane, 2 x 256GB enterprise SSDs, 2 x Intel Xeon E5-2620 CPUs, 98GB of RAM, and Emulex 10Gb Ethernet adapters.

So they're kind of monster machines. Local performance is just awesome, but I can't seem to make the network part work very well. I can run some standard dd tests on the ZFS pool and the performance is amazing. I'm using the SSDs for log and cache. But serving up NFS over the 10Gb interfaces is only getting me 10MB/s. I have tried autotune on and off; off seems to get me slightly better performance.

I know NFS plus ZFS is kind of a big bag of tricks, and trying to research how to improve this is frustrating because all the replies to people seem to be "add more RAM, ZFS loves RAM" or "add an SSD." Well, seeing as I have 256GB of SSD, almost 100GB of RAM, lots of CPU cores, and cards that are natively supported, what am I missing?

I played around with FreeNAS 8.3 and was able to disable sync on NFS, which I know is a performance boost but at the cost of data integrity if power fails. I upgraded to 9.1, imported my ZFS pool, then upgraded my ZFS version; I'm getting about the same really slow performance. Yes, I am using VMware, but come on, 10MB/s on a 10Gb connection? ZFS seems to be performing fine on its own: almost 1GB/s write speeds, and more than that for reads, when it's local. Compression is turned off and no dedup is going on. I am trying to keep my setup very simple.

I know I am leaving a bunch of stuff out and will need to answer some other questions about my setup, but here at least is a zpool status of my volume. Please let me know what I can do. Do I have poorly optimized NIC drivers? Do I need to add tuning? I did build two of these systems and tried some tests going from system1 to system2 over a crossover connection on the 10Gb cards with jumbo frames enabled; the NFS speed was a little better, but still way, way less than I should be getting. Based on the local file system benchmarks, I should be able to nearly saturate the full 10Gb links.

OK, here's hoping you folks can help me. I see folks with machines with way less oomph than mine getting at least 70-100 MB/s, so I am hoping someone can find my silver bullet. If you need more details, ask and I will try to respond quickly.

  pool: volume1
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Sun Aug 4 02:00:02 2013
config:

        NAME                                            STATE     READ WRITE CKSUM
        volume1                                         ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/85001b27-de81-11e2-9a47-0090fa1ee116  ONLINE       0     0     0
            gptid/855f386b-de81-11e2-9a47-0090fa1ee116  ONLINE       0     0     0
            gptid/85bccda0-de81-11e2-9a47-0090fa1ee116  ONLINE       0     0     0
            gptid/8618ed3c-de81-11e2-9a47-0090fa1ee116  ONLINE       0     0     0
            gptid/867582a6-de81-11e2-9a47-0090fa1ee116  ONLINE       0     0     0
            gptid/86d3d83c-de81-11e2-9a47-0090fa1ee116  ONLINE       0     0     0
            gptid/8731e68f-de81-11e2-9a47-0090fa1ee116  ONLINE       0     0     0
            gptid/878da0ef-de81-11e2-9a47-0090fa1ee116  ONLINE       0     0     0
            gptid/87e9258d-de81-11e2-9a47-0090fa1ee116  ONLINE       0     0     0
            gptid/8845c405-de81-11e2-9a47-0090fa1ee116  ONLINE       0     0     0
            gptid/88a54226-de81-11e2-9a47-0090fa1ee116  ONLINE       0     0     0
            gptid/8901d5e1-de81-11e2-9a47-0090fa1ee116  ONLINE       0     0     0
            gptid/895fc7ba-de81-11e2-9a47-0090fa1ee116  ONLINE       0     0     0
            gptid/89bda854-de81-11e2-9a47-0090fa1ee116  ONLINE       0     0     0
            gptid/8a1bc16f-de81-11e2-9a47-0090fa1ee116  ONLINE       0     0     0
            gptid/8a7b89cb-de81-11e2-9a47-0090fa1ee116  ONLINE       0     0     0
            gptid/8adac5db-de81-11e2-9a47-0090fa1ee116  ONLINE       0     0     0
            gptid/8b3d0104-de81-11e2-9a47-0090fa1ee116  ONLINE       0     0     0
            gptid/8b9e3d11-de81-11e2-9a47-0090fa1ee116  ONLINE       0     0     0
            gptid/8bfd6826-de81-11e2-9a47-0090fa1ee116  ONLINE       0     0     0
            gptid/8c5f07c3-de81-11e2-9a47-0090fa1ee116  ONLINE       0     0     0
            gptid/8cc04ecb-de81-11e2-9a47-0090fa1ee116  ONLINE       0     0     0
        logs
          gptid/8eb6eccf-de81-11e2-9a47-0090fa1ee116    ONLINE       0     0     0
        cache
          gptid/8e728c27-de81-11e2-9a47-0090fa1ee116    ONLINE       0     0     0
        spares
          gptid/8db7f36c-de81-11e2-9a47-0090fa1ee116    AVAIL
          gptid/8e1afc87-de81-11e2-9a47-0090fa1ee116    AVAIL

errors: No known data errors
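
For reference, the local dd tests I mentioned were along these lines (the exact block size, count, and test file name are from memory, so treat them as approximate), plus the sync change I tried on 8.3:

# local write test against the pool (compression is off, so /dev/zero is fine);
# /mnt/volume1/testfile is just an example path
dd if=/dev/zero of=/mnt/volume1/testfile bs=1M count=100000

# local read test of the same file
dd if=/mnt/volume1/testfile of=/dev/null bs=1M

# roughly what I did on 8.3 to disable sync writes for NFS
# (faster, but risks losing in-flight data on power failure)
zfs set sync=disabled volume1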
 

JoeV

Cadet
Joined
Sep 13, 2013
Messages
7
Sorry, I should mention what I am actually using these things for. One system runs a big CIFS share that is a backup target; it gets data over a WAN connection, so even slower-than-average speeds are fine because it's limited by the WAN anyway. So one machine is doing exactly what I wanted and seems to be working well. I did a lot of failure testing and was very happy with how easily ZFS recovers from disk failures, etc. The other machine is basically for my lab. I am trying to use the second system for VMware, and because my whole VMware environment is NFS already, I wanted to stick to that standard and not mix in iSCSI. I really love what I can do with ZFS (snapshots, replication, etc.), but the network performance is turning me a bit sour. I am used to working with NetApp and other higher-end storage systems, and I really want this to work so I can say goodbye to buying $100,000 overpriced storage systems.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
First, get rid of jumbo frames. If your hardware doesn't support them well (and I'm talking about the network infrastructure too), performance can tank.

Second, spares are useless in FreeNAS; they won't come online on their own, so I'd get rid of them.

I've never used Emulex, but I'd have gone for Intel cards instead. Intel cards "just work" and perform great. They are the cream of the crop for networking on FreeBSD (and Linux and Windows, IMO).

I'm assuming the ESXi server is different hardware? What are its specs?

What happens if you do an iperf test between your FreeNAS server and the ESXi host or something?
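
Something along these lines would do it (the 192.168.1.10 below is just a placeholder for whatever IP the FreeNAS box has):

# on the FreeNAS box, start an iperf server
iperf -s

# from the other machine (the ESXi host, the other FreeNAS box, whatever),
# point iperf at the FreeNAS box and let it run for a minute
iperf -c 192.168.1.10 -i 1 -t 60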
 

JoeV

Cadet
Joined
Sep 13, 2013
Messages
7
The network infrastructure is Cisco Nexus 5K, with the VMware environment hosted on Cisco UCS blades. I had a budget, and the Intel cards were way out of it for this task, almost 3 times the cost of the Emulex. I know the spares don't come online on their own, but they allow me to start a rebuild from 500 miles away, so I have chosen to leave them this way. Jumbo frames are used on all of our storage networks; I have a NetApp 3210 on this same network, and having everything configured for jumbo frames is recommended and fully supported end to end. For my first test I put a small Windows 7 VM on the NFS share, ran some very basic 50% read / 50% write tests, and was getting about 400 IOPS and around 10MB/s. I vMotioned the VM back to my NetApp, ran the same tests, and got over 20,000 IOPS while nearly maxing out my 10Gb link. I know comparing these two systems is basically irrelevant, but I mention it to show that the network equipment on my storage network is top notch. I should mention ESXi is 5.0 Update 1. End to end, the storage network is 10Gb.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
So what happens if you set up the iperf daemon on your FreeNAS server and run an iperf test from another machine? I want to see some proof that the FreeNAS machine can actually dish out data at high rates. Not that I don't believe your network infrastructure is top notch, but I don't particularly trust anything from no-name brands (like Emulex).

And the reason the Intel costs more than the Emulex (my nickname for that brand is "emu-sux") is that they really are that much better.

To be honest, I'm pointing at the NIC for the moment because I really, really don't trust them. I've been around this game before with Emulex, and it was the NIC pretty much every time.
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
Holy crap. 22 drives in a single Z2 vdev? That's crazy. If I'm not mistaken, vdevs that large can cause performance issues. I certainly wouldn't want to have to resilver a drive in a vdev that large; I've heard it can literally take weeks, or never finish at all.

As far as I'm aware, jumbo frames are pretty much required on 10 gig. Even with enterprise switches and the like, I've heard you don't get more than about 3-4 gigabit on standard 1500-byte frames. That was with Intel 520/540 NICs. I have no idea about Emulex.
 

hotalot

Dabbler
Joined
Dec 23, 2011
Messages
19
Titan_rw is right; it's crazy to have 22 drives in only one vdev. The speed of a vdev, regardless of the number of disks, is roughly that of the slowest disk in the vdev. Seagate Constellations can saturate a gigabit card, and that is about the speed you can expect from such a vdev. To get better speeds you could set up a pool with 4 RAIDZ2 vdevs; in that case you can expect more than 400MB/s. To max out your 10GbE card you need mirrors; as a test, you could make a pool with 12 mirrors.
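
From the command line the two layouts would look roughly like this (da0-da23 and the pool name "tank" are just placeholders; on FreeNAS you would build this through the GUI volume manager rather than by hand):

# 4 x 6-disk RAIDZ2 vdevs
zpool create tank \
  raidz2 da0 da1 da2 da3 da4 da5 \
  raidz2 da6 da7 da8 da9 da10 da11 \
  raidz2 da12 da13 da14 da15 da16 da17 \
  raidz2 da18 da19 da20 da21 da22 da23

# 12 x 2-way mirrors, for maximum random I/O
zpool create tank \
  mirror da0 da1  mirror da2 da3  mirror da4 da5  mirror da6 da7 \
  mirror da8 da9  mirror da10 da11  mirror da12 da13  mirror da14 da15 \
  mirror da16 da17  mirror da18 da19  mirror da20 da21  mirror da22 da23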
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
The "official" Sun documentation recommends against vdevs larger than 11 disk(RAIDZ3). That being said, I'm running an 18 disk RAIDZ3 without any problems at all. But, because I knew my loading( music and videos) I knew I wouldn't have significant issues. If I planned to do anything else with the pool, I'd definitely have issues.

Additionally, the simple fact that its 1 vdev and not 2+ is going to hinder performance(at least, I expect it won't be able to do 10Gb). I figured we'd deal with his networking issue before the others. :P
 

Starpulkka

Contributor
Joined
Apr 9, 2013
Messages
179
I was also a little troubled when I saw your 22-disk RAIDZ2 vdev for data. I almost posted this reply before noticing that someone else also thinks it's crazy. You should calculate your parity percentage and fault-tolerance percentage; right now it's even 1.5 times lower than typical RAIDZ1 setups (and we all know RAIDZ1 will fail on huge disks when the time comes...). But on the bright side, the pool is superfast. =)
edit:
If I were you, I would first pull those SSDs out of the machine, then make a smaller vdev (like a 5-disk RAIDZ1) and test LAN speeds. If the LAN speed is as expected, then it's your RAM (and ZFS needs tweaking, or don't use the SSDs); if the LAN speed is still slow, I would look into the Emulex, try different cables, and check whether the other machine is even able to pull those speeds from your NAS.
 

hotalot

Dabbler
Joined
Dec 23, 2011
Messages
19
I was also a little troubled when I saw your 22-disk RAIDZ2 vdev for data. I almost posted this reply before noticing that someone else also thinks it's crazy. You should calculate your parity percentage and fault-tolerance percentage; right now it's even 1.5 times lower than typical RAIDZ1 setups (and we all know RAIDZ1 will fail on huge disks when the time comes...). But on the bright side, the pool is superfast. =)
edit:
If I were you, I would first pull those SSDs out of the machine, then make a smaller vdev (like a 5-disk RAIDZ1) and test LAN speeds. If the LAN speed is as expected, then it's your RAM (and ZFS needs tweaking, or don't use the SSDs); if the LAN speed is still slow, I would look into the Emulex, try different cables, and check whether the other machine is even able to pull those speeds from your NAS.

This!
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
But on the bright side, the pool is superfast. =)

It potentially won't be fast though.

The one huge vdev means that all 22 disks have to be 'touched' for every operation (not technically true with variable stripe sizes, but bear with me). All 22 disks are 'tied together' performance-wise, so even large sequential writes depend on all 22 drives staying 'in sync'. You always get slight variances between hard drives, even when they're the same model. With that many drives in a single vdev you can have latency problems: sometimes one drive takes slightly longer than another, and it won't always be the same drive that's 'lagging'.

And random read I/O will always be limited to the speed of a single disk with one big vdev. Your 22 disks will have the random read performance of no more than 1 disk, and probably less if you consider the above. Case in point: I get better random read performance from my 2-disk mirror than from my 11-disk Z3. The Z3 will smoke the mirror in sustained sequential streaming, of course, but in pure random I/O the mirror does better.
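
As a rough back-of-the-envelope (the ~100 random read IOPS per 7200 RPM disk is an assumption, not a measurement):

# random read IOPS scale with the number of vdevs, not the number of disks
 1 x 22-disk RAIDZ2 vdev  ->  ~1 x 100 =  ~100 IOPS
 4 x 6-disk RAIDZ2 vdevs  ->  ~4 x 100 =  ~400 IOPS
12 x 2-way mirrors        -> ~12 x 100 = ~1200 IOPS (and mirror reads can be served by either disk)

The exact numbers will vary with caching and queue depth, but the scaling is the point.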

11-wide vdevs are the recommended maximum for a reason. As I said above, resilvering really large vdevs can be problematic. I remember a post on another forum where someone had to back up and redo the pool because of a failed disk in a really large vdev; the resilver was never completing, even after quite a long time.
 

Starpulkka

Contributor
Joined
Apr 9, 2013
Messages
179
Yes, it potentially won't be fast; I should have used something other than a smiley at the end. It was meant to be, what's the word, funny or sarcastic. But never mind, you're right: even a directory listing on a 60%-full pool would be a pain in the ass. You can tweak the ZFS wait times a bit for pools with more devices, but again, that's foolish and crazy, and also problematic if you need to read/write big or small files at random.
 

JoeV

Cadet
Joined
Sep 13, 2013
Messages
7
OK, sorry for the delay; I got back into the lab today. Here is a basic network test: I ran a twinax cable between my backup01 and backup02 boxes and set up a basic class B network so as not to conflict with any other networks. Also as a note, backup01 is running FreeNAS 8.3.1 and my backup02 box is on the latest stable release, 9.1.0. The iperf readouts are below.

As for the large set of disks: before I decided to do this, I put over 15TB of data on the array and then pulled a drive to simulate a failure, replacing it with a blank, never-used disk. The resilvering process took a little over 5 hours. So yes, if the set were almost full it would take a long time, but to be honest that didn't seem very long to me; I was expecting my test to take days. The data on there was a mix of SQL DBs and large VMware backup sets. I have had hardware RAIDs take 40+ hours to rebuild. (Not good, but depending on the data, etc., it might be OK.)

[root@backup01] ~# iperf -c 172.16.0.1 -i 1 -t 60
------------------------------------------------------------
Client connecting to 172.16.0.1, TCP port 5001
TCP window size: 32.0 KByte (default)
------------------------------------------------------------
[ 3] local 172.16.0.2 port 14881 connected with 172.16.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 1000 MBytes 8.39 Gbits/sec
[ 3] 1.0- 2.0 sec 923 MBytes 7.74 Gbits/sec
[ 3] 2.0- 3.0 sec 886 MBytes 7.44 Gbits/sec
[ 3] 3.0- 4.0 sec 905 MBytes 7.59 Gbits/sec
[ 3] 4.0- 5.0 sec 930 MBytes 7.80 Gbits/sec
[ 3] 5.0- 6.0 sec 947 MBytes 7.95 Gbits/sec
[ 3] 6.0- 7.0 sec 955 MBytes 8.01 Gbits/sec
[ 3] 7.0- 8.0 sec 943 MBytes 7.91 Gbits/sec
[ 3] 8.0- 9.0 sec 927 MBytes 7.78 Gbits/sec
[ 3] 9.0-10.0 sec 908 MBytes 7.62 Gbits/sec
[ 3] 10.0-11.0 sec 884 MBytes 7.41 Gbits/sec
[ 3] 11.0-12.0 sec 835 MBytes 7.00 Gbits/sec
[ 3] 12.0-13.0 sec 827 MBytes 6.94 Gbits/sec
[ 3] 13.0-14.0 sec 872 MBytes 7.31 Gbits/sec
[ 3] 14.0-15.0 sec 896 MBytes 7.51 Gbits/sec
[ 3] 15.0-16.0 sec 918 MBytes 7.70 Gbits/sec
[ 3] 16.0-17.0 sec 934 MBytes 7.84 Gbits/sec
[ 3] 17.0-18.0 sec 885 MBytes 7.42 Gbits/sec
[ 3] 18.0-19.0 sec 883 MBytes 7.41 Gbits/sec
[ 3] 19.0-20.0 sec 886 MBytes 7.44 Gbits/sec
[ 3] 20.0-21.0 sec 882 MBytes 7.40 Gbits/sec
[ 3] 21.0-22.0 sec 811 MBytes 6.81 Gbits/sec
[ 3] 22.0-23.0 sec 860 MBytes 7.22 Gbits/sec
[ 3] 23.0-24.0 sec 842 MBytes 7.07 Gbits/sec
[ 3] 24.0-25.0 sec 824 MBytes 6.91 Gbits/sec
[ 3] 25.0-26.0 sec 811 MBytes 6.80 Gbits/sec
[ 3] 26.0-27.0 sec 807 MBytes 6.77 Gbits/sec
[ 3] 27.0-28.0 sec 814 MBytes 6.83 Gbits/sec
[ 3] 28.0-29.0 sec 823 MBytes 6.90 Gbits/sec
[ 3] 29.0-30.0 sec 880 MBytes 7.38 Gbits/sec
[ 3] 30.0-31.0 sec 923 MBytes 7.74 Gbits/sec
[ 3] 31.0-32.0 sec 910 MBytes 7.63 Gbits/sec
[ 3] 32.0-33.0 sec 702 MBytes 5.89 Gbits/sec
[ 3] 33.0-34.0 sec 859 MBytes 7.21 Gbits/sec
[ 3] 34.0-35.0 sec 870 MBytes 7.30 Gbits/sec
[ 3] 35.0-36.0 sec 864 MBytes 7.25 Gbits/sec
[ 3] 36.0-37.0 sec 848 MBytes 7.11 Gbits/sec
[ 3] 37.0-38.0 sec 826 MBytes 6.93 Gbits/sec
[ 3] 38.0-39.0 sec 817 MBytes 6.85 Gbits/sec
[ 3] 39.0-40.0 sec 819 MBytes 6.87 Gbits/sec
[ 3] 40.0-41.0 sec 812 MBytes 6.81 Gbits/sec
[ 3] 41.0-42.0 sec 820 MBytes 6.88 Gbits/sec
[ 3] 42.0-43.0 sec 870 MBytes 7.30 Gbits/sec
[ 3] 43.0-44.0 sec 880 MBytes 7.39 Gbits/sec
[ 3] 44.0-45.0 sec 896 MBytes 7.52 Gbits/sec
[ 3] 45.0-46.0 sec 881 MBytes 7.39 Gbits/sec
[ 3] 46.0-47.0 sec 914 MBytes 7.67 Gbits/sec
[ 3] 47.0-48.0 sec 873 MBytes 7.32 Gbits/sec
[ 3] 48.0-49.0 sec 866 MBytes 7.26 Gbits/sec
[ 3] 49.0-50.0 sec 848 MBytes 7.11 Gbits/sec
[ 3] 50.0-51.0 sec 835 MBytes 7.00 Gbits/sec
[ 3] 51.0-52.0 sec 825 MBytes 6.92 Gbits/sec
[ 3] 52.0-53.0 sec 834 MBytes 7.00 Gbits/sec
[ 3] 53.0-54.0 sec 919 MBytes 7.71 Gbits/sec
[ 3] 54.0-55.0 sec 1.01 GBytes 8.72 Gbits/sec
[ 3] 55.0-56.0 sec 849 MBytes 7.12 Gbits/sec
[ 3] 56.0-57.0 sec 851 MBytes 7.14 Gbits/sec
[ 3] 57.0-58.0 sec 870 MBytes 7.30 Gbits/sec
[ 3] 58.0-59.0 sec 882 MBytes 7.40 Gbits/sec
[ 3] 59.0-60.0 sec 886 MBytes 7.43 Gbits/sec
[ 3] 0.0-60.0 sec 51.2 GBytes 7.32 Gbits/sec
 