Creating a Set of Benchmarks


cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
A lot of people don't understand this logic, but I think it's a very effective way to look at things...

When I am trying to decide how something "might work," I always ask myself, "What's the absolute smartest way to do it?" To an extent, it's more a reflection of "how would I do it," but that doesn't matter. If you imagine the smartest possible design, you'll figure out one of three things:

1. They designed it that way (woohoo). Easiest to deal with.
2. They didn't design it that way on purpose (in which case I know I have a knowledge gap... so I start seeking out what I don't know).
3. They didn't design it that way by accident. Usually this means either you out-smarted them because of newer technology, or they outsmarted you with something superior to your design.

Actually, I just realized that this is slightly modified from Albert Einstein's insight into how God created the universe. He said "If I were God how would I design the universe?" and went from there. Quite interesting indeed!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
In his RAID-Greed article, he has almost convinced me to use eight 2-disk mirrors.

My problem with mirrors is that you have only 2 "copies" of your data. With a RAIDZ2 you technically have 3 "copies". I'm not a big fan of RAIDZ1 for the same reason. Let me explain before I get hung with a noose...

With a mirror, if a file's data gets corrupted due to a bad sector (or a disk is lost), then ZFS simply goes to the other disk. No problem. But what if the other disk also has a bad sector (or worse, starts failing)? You may be in for trouble. This also applies to those pesky hard drive specs for non-recoverable read errors per bit: WD Greens are rated at <1 in 10^14 bits, which works out to about 11,641GB (11.6TB). That's getting alarmingly close to how big hard drives have become. If you bought a 5TB drive, you could statistically expect something like a 50% chance of hitting a read error on a sector that is actually perfectly good!
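
Here's the rough back-of-the-envelope math behind that figure. This is just a sketch that assumes the vendor's URE spec applies uniformly across one full read of the drive, which is a simplification:

Code:
awk 'BEGIN {
  ure_bits    = 1e14    # WD Green spec: <1 non-recoverable error per 10^14 bits read
  drive_bytes = 5e12    # example: a 5TB drive
  printf "URE threshold: about %.1f GiB\n", ure_bits / 8 / 1024^3
  printf "Expected read errors in one full pass of the drive: %.2f\n", drive_bytes * 8 / ure_bits
}'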

With a RAIDZ2, if a file's data gets corrupted due to a bad sector (or a disk is lost), then ZFS uses the parity data. No problem. But what if there's a second bad sector (or worse, a second disk starts failing)? Still no problem! You still have a second set of parity. The same goes for the non-recoverable read errors per bit.

To me, if I only need the space of 1 or 2 disks, I go with a mirror. That makes the most sense. But if I'm going to deal with more than 4 disks, it seems logical to go to RAIDZ2 for the extra protection (not to mention the extra usable space). Of course, if performance is actually an issue, then you may be stuck paying for mirrors.
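
For a quick feel of the space tradeoff, here's an example with numbers I made up (8 drives of 3TB each, ignoring metadata and slop overhead):

Code:
awk 'BEGIN {
  n = 8; size_tb = 3
  printf "Striped 2-disk mirrors: %d TB usable (half the raw space)\n", n / 2 * size_tb
  printf "Single RAIDZ2 vdev:     %d TB usable (all but two disks)\n", (n - 2) * size_tb
}'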
 

bollar

Patron
Joined
Oct 28, 2012
Messages
411
The dd test without compression yielded a write speed of 320MB/sec and a read speed of 366MB/sec:
Code:
[root@freenas] /mnt/bollar/temp# dd if=/dev/zero of=tmp.dat bs=2048k count=50k
51200+0 records in
51200+0 records out
107374182400 bytes transferred in 320.022525 secs (335520703 bytes/sec)
[root@freenas] /mnt/bollar/temp# dd of=/dev/zero if=tmp.dat bs=2048k count=50k
51200+0 records in
51200+0 records out
107374182400 bytes transferred in 279.758637 secs (383810071 bytes/sec)
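
For anyone repeating this, the reason compression has to be off is that dd reads from /dev/zero, and all-zero data compresses to practically nothing, which inflates the numbers. Roughly what I did (assuming temp is its own dataset; adjust the name to your layout, or set it on the pool's root dataset):

Code:
zfs get compression bollar/temp       # check the current setting
zfs set compression=off bollar/temp   # turn it off for the benchmark
# afterwards, put it back to whatever you normally run:
# zfs set compression=on bollar/temp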

After I reconfigured the machine by removing a processor and 16GB of RAM and replacing the two RAIDZ2 vdevs with eight striped mirrors, I got a substantial improvement, as you would expect:

443MB/sec write & 747MB/sec read
Code:
[root@freenas] /mnt/bollar/temp# dd if=/dev/zero of=tmp.dat bs=2048k count=50k
51200+0 records in
51200+0 records out
107374182400 bytes transferred in 231.077620 secs (464667164 bytes/sec)
[root@freenas] /mnt/bollar/temp# dd of=/dev/zero if=tmp.dat bs=2048k count=50k
51200+0 records in
51200+0 records out
107374182400 bytes transferred in 137.032585 secs (783566788 bytes/sec)


Otherwise, though, this system is now real-world limited by processor, RAM and maybe the ZIL. The best I have been able to coax out of AFP loads is 150MB/sec. I have been able to do better with NFS, but that's not a load we typically use here. The speed is actually adequate, but since this box isn't in production yet, I'm still able to do some tuning.
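
One cheap way I plan to check the "maybe ZIL" part is to temporarily disable sync writes on the test dataset and re-run the AFP load. Strictly a diagnostic, not something to leave on; it only matters if the workload actually issues sync writes, and it assumes a ZFS version new enough to have the sync property (dataset name is just my layout):

Code:
zfs get sync bollar/temp            # see the current setting (standard by default)
zfs set sync=disabled bollar/temp   # re-run the test with the ZIL out of the picture
zfs set sync=standard bollar/temp   # put it back when done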
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
So I'll say this: you've probably learned a lot from all of your experimenting and playing with things. I love that "stage" of a new project I start working on. But, from my first post:

I think a dd test is about the best you're going to get for raw ZFS performance. For instance, my main server does something like 450MB/sec in a dd test with 2 vdevs of 8 drives in RAIDZ2 (note: I didn't follow the power-of-two-plus-parity rule of thumb because we already had these drives). Since I have 2x1Gb Intel NICs, even maxing out both would yield only 266MB/sec. So technically, my CPU is far more powerful than needed for any expected load. But if you throw in trying to do a scrub WHILE using the system, that can affect performance too. I typically don't try to do heavy loading when I know a scrub is going on.

I would say that a dd result that is 200% or more of your maximum possible network bandwidth is optimal. That allows for delays due to seek times while hopefully still yielding the highest network performance.
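
To put rough numbers on that rule of thumb (these are just example figures using raw wire speed and ignoring protocol overhead):

Code:
awk 'BEGIN {
  nics = 2; gbit_each = 1                # example: 2 x 1Gb NICs
  wire = nics * gbit_each * 1000 / 8     # ~250MB/sec raw wire speed, the same ballpark as the ~266MB/sec above
  printf "Max network throughput: ~%d MB/sec\n", wire
  printf "dd target at 2x that:   ~%d MB/sec\n", wire * 2
}'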

So now you can see the big picture: as long as you get fast enough speeds to hit your networking limit, that's about the best you can do without a heavy analysis of your data type. I'd say that unless you are running all-iSCSI machines, using the machine strictly for backups, or running a database, you are probably better off with some "standard" configuration of mirrors, RAIDZ or RAIDZ2. Trying to squeeze out more performance with more disks, less redundancy, or ZIL/L2ARC devices is completely pointless unless you have a deep understanding of ZFS. I'd bet that if you started reading the ZFS code today, you might have that deep an understanding in 2-3 years. Yes, that deep.

So congrats on your experiment, I really hope you had fun (I always do). And now you are filled with a bunch of knowledge that possibly simplifies down to "faster than the NIC is all that matters". LOL
 

bollar

Patron
Joined
Oct 28, 2012
Messages
411
So I'll say this: you've probably learned a lot from all of your experimenting and playing with things. I love that "stage" of a new project I start working on.

I do like this part -- and I also like to measure my results with both real-world and theoretical tests.

So congrats on your experiment, I really hope you had fun (I always do). And now you are filled with a bunch of knowledge that possibly simplifies down to "faster than the NIC is all that matters". LOL

Alas, I'm not yet faster than the NICs, I'm just faster than my load requirements!

Look at this example:

I have four Intel NICs bonded in 802.3ad, eight Mac clients, various minimal-traffic Linux & Unix clients, and the two FreeNAS boxes.

- The FreeNAS boxes are doing a replication task on igb3. Completely unloaded with other traffic, they do this task at an extremely stable 800Mbps.
- If you turn on AFP and start some Time Machine backups, it drops to a reasonably steady 600Mbps. This Time Machine traffic appears to be on igb0.
- If I introduce some large file reads from multiple clients, I can saturate a NIC (igb2), but the replication task throughput drops to 300Mbps. I do need to understand why some of those clients weren't on igb0 or igb1.

Meanwhile, in this scenario the system load increases to ~5, so we know that's part of the limiting factor.

I have a little time left, so my next questions to answer will be around the NICs -- why loads from three different clients all land on the same NIC, and why I never see traffic on igb1 (FreeNAS and the router both show the port as up).
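
The tools I plan to start with are nothing exotic, just the standard FreeBSD ones (the igb names are from my box, and I'm assuming the bond shows up as lagg0):

Code:
ifconfig lagg0          # shows the member ports and their LACP status flags
netstat -I igb1 -b      # cumulative per-interface byte counters -- a silent igb1 should stand out
systat -ifstat 1        # live per-interface throughput while a transfer is running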

And maybe if I get bored over the holiday break, we'll see what happens if we reintroduce processor, RAM & ZIL to the equation.

 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
LACP doesn't work quite like that. There are quite a few forum posts from people who have asked the same question, and the answers are almost always the same.

I will say that I learned this lesson the hard way: I bought the equipment and then found out I had it all wrong. Thankfully it was for home use, so while I was out the money, I gained the experience.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Also, do you REALLY expect to max out all of your NICs at the same time? If the answer is yes and you don't want to be bottlenecked you should have at least 3 vdevs, possibly 4+.
 

bollar

Patron
Joined
Oct 28, 2012
Messages
411
LACP doesn't work quite like that. There are quite a few forum posts from people who have asked the same question, and the answers are almost always the same.
I will say that I learned this lesson the hard way: I bought the equipment and then found out I had it all wrong. Thankfully it was for home use, so while I was out the money, I gained the experience.
How do you think I think it works?

I'm well aware that any client is assigned one NIC and can't have throughput in excess of the NIC's capacity. But that doesn't explain the situation where I have a NIC that isn't used at all, for several days now -- even though it's marked as UP on the switch and in FreeNAS. That needs further investigation.
 

bollar

Patron
Joined
Oct 28, 2012
Messages
411
Also, do you REALLY expect to max out all of your NICs at the same time? If the answer is yes and you don't want to be bottlenecked you should have at least 3 vdevs, possibly 4+.

Well, gee. In the message you're replying to, I already said I'm using eight mirrored, striped vdevs.

Aside from the NIC issue, which I already know I need to understand better, I think I'm CPU-bound under these conditions as much as anything:


freenasread.png


freenascpu.png


freenasload.png
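
For reference, these are the commands I've been watching alongside the FreeNAS reporting graphs above, all standard FreeBSD tools:

Code:
top -SH                 # per-thread CPU view -- shows whether the AFP/ZFS threads are pegging cores
gstat -p                # per-disk busy percentage -- tells you if the pool is the bottleneck instead
vmstat 1                # overall CPU, interrupt and context-switch picture, once per second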
 

bollar

Patron
Joined
Oct 28, 2012
Messages
411
But that doesn't explain the situation where I have a NIC that isn't used at all, for several days now -- even though it's marked as UP on the switch and in FreeNAS. That needs further investigation.

The answer to this question was fortunately the obvious one: faulty cabling. I've now run a replacement cable and I'm rerunning some tests. Preliminarily I'm sitting at 2.0Gbps, and I'm also reaching the limit of what the Mac clients I have for testing can push, so I'm not sure I can do any better -- at least not in the test environment.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Well, gee. In the message you're replying to, I already said I'm using eight mirrored, striped vdevs.

Yes, but there is also a big difference between 8 vdevs mirrored and 8 drives in a RAID0, then mirrored. You didn't really specify which, although I'd assume the first one, but the second one is probably better for performance reasons. Also, a lot of people will completely fail to differentiate between the two situations.

I just saw you also said you have a bad network cable. That can definitely cause... problems. I saw that igb1 was only getting up to 70Mbit, so I was wondering if it was being limited to a 100Mbit connection or what.

So what are your new top networking speeds with the working cable?

Now that I'm in front of a keyboard, I'll explain a little from my last post. AFAIK, FreeNAS can receive data on as many ports as you set up for the lagg, but FreeNAS can only send data out a single Ethernet port. In essence, you can't send more than 1Gb/sec of traffic out. At least, that was my understanding of it. When I tried LACP at home it flopped for me. After doing a lot of research, I came to the conclusion that I wanted my dual NICs on two different IPs for home use. I've got a list of 2-3 topics I'd love to put into a presentation; I just don't have the knowledge to write them and expect 100% accuracy.
 

bollar

Patron
Joined
Oct 28, 2012
Messages
411
Yes, but there is also a big difference between 8 vdevs mirrored and 8 drives in a RAID0, then mirrored. You didn't really specify which, although I'd assume the first one, but the second one is probably better for performance reasons. Also, a lot of people will completely fail to differentiate between the two situations.

I just saw you also said you have a bad network cable. That can definitely cause... problems. I saw that igb1 was only getting up to 70Mbit, so I was wondering if it was being limited to a 100Mbit connection or what.

So what are your new top networking speeds with the working cable?

Now that I'm in front of a keyboard, I'll explain a little from my last post. AFAIK, FreeNAS can receive data on as many ports as you set up for the lagg, but FreeNAS can only send data out a single Ethernet port. In essence, you can't send more than 1Gb/sec of traffic out. At least, that was my understanding of it. When I tried LACP at home it flopped for me. After doing a lot of research, I came to the conclusion that I wanted my dual NICs on two different IPs for home use. I've got a list of 2-3 topics I'd love to put into a presentation; I just don't have the knowledge to write them and expect 100% accuracy.

Actually, igb1 was 70 bps, not Mbps, so you're absolutely right -- problems.

So, for performance: I was able to get 2.0Gbps from FreeNAS to three Macs, each reading a separate 50-60GB file. Sending the same files back from the Macs to FreeNAS, I got 1.2Gbps. Two of the Macs reported read speeds in excess of 1Gbps and the laptop about half that. Writes were reported slower on all machines. Writing caused huge CPU and load spikes on the NAS, so there's probably an opportunity to improve that. This test was using AFP, but I got 90-95% of that on iSCSI.

Now, the important bit is that I only need to exceed FireWire 800 performance for an end-user win -- and we were only getting 50-60Mbps on FireWire. We've tried some video editing with Final Cut Pro via iSCSI on the NAS and it's very crisp compared to what we're used to.

As for the lagg, FreeNAS was transmitting at 2.0Gbps across the four NICs -- my understanding of 802.3ad is that the server can deliver up to one NIC's worth of speed to each client, with some other limitations. But as you already know, there are some significant caveats to this, and for many users it just doesn't make sense.
 