Upgrading 9.1.0 -> 9.1.1: SSH transfer speed poor over 10GbE

Matt Reynolds (Dabbler, joined Sep 27, 2013, 10 messages)
Hi all,

I have two FreeNAS boxes: one is for production and the second is a replica only. The production server uses a 10Gb NIC, while the replica has only a 1Gb NIC. When I was running FreeNAS 9.1.0, the transfer speed for ZFS replication was pretty much maxing out the 1Gb connection.

After upgrading both boxes to 9.1.1, my replication speed has plummeted to an average of 200Mb/sec. This appears to be something to do with transferring over SSH: copying files over SMB I'm seeing 1Gb speeds, but when I use scp to copy files between the servers as a test it sits at 200Mb/sec.

If I change the scp cipher to arcfour, I can push ~560Mb/sec, but the transfer stalls every time at 17% with the 5GB ISO I'm using as a test.
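For reference, the test looks roughly like this; the ISO path, user and hostname are just placeholders for my setup:

# copy the test ISO to the replica, forcing the arcfour cipher instead of the default
scp -c arcfour /mnt/tank/test.iso root@replica:/mnt/backup/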

As a test, I unplugged the 10Gb port on the production server and configured the 1Gb port on the same network card. Using the arcfour cipher I can still only get ~560Mb/sec, but it doesn't stall and the transfer completes happily. It's not an ideal speed, but it shows that the replica server can handle the transfer (the CPU on the replica server does sit at 90% for the transfer process).

So I'm hoping you guys can point me in the right direction on where to look next. The transfer speeds were fine in 9.1.0, so I don't know what has changed in 9.1.1. Any advice on what to try would be greatly appreciated.

Server specs are...

Production Server:
CPU: Intel(R) Xeon CPU E5-2609 0 @ 2.40GHz
RAM: 32GB
NIC: Ethernet Controller 10 Gigabit X540-AT2

Replica Server:
CPU: Intel(R) Core 2 CPU 6400 @ 2.13GHz
RAM: 6GB
NIC: 82546EB Gigabit Ethernet Controller

Thanks!
Matt
 

cyberjock (Inactive Account, joined Mar 25, 2012, 19,526 messages)
I'm not sure how much data you have, but let me give you some advice on RAM.

1. The minimum for ZFS is 8GB. While you can get by with less, you can expect performance and reliability problems. Many users have had a server panic and lost their pool due to insufficient RAM, so be wary. As a backup server you could probably get by with less than a production server, but I still wouldn't go below 8GB of RAM for any reason. And honestly, I'd probably go with 16GB, just because the cost difference between 8GB and 16GB isn't big.
2. Keep in mind that there's a rule of thumb of 1GB of RAM per TB of storage space. For me, I was able to get by with a little less, and my pool performed fine for quite a while below that limit. But one day, out of the blue, performance was horrible. I upgraded my RAM and instantly got back the performance I used to get.

So I'll agree that 9.1.1 might be the culprit since it probably needs more RAM than 9.1.0, but I'd look at upgrading your RAM before you look too far. And depending on the size of your pool, you might need a lot more than 6GB of RAM. Unfortunately, this means you are looking at new hardware for the replica server since that CPU is limited to 8GB of RAM.
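If you want a quick look at how much memory ZFS is actually grabbing versus what the box has, something like the following from an SSH session should do it. The sysctl names are the ones FreeBSD's ZFS port exposes, so treat the exact OIDs as an assumption for your build:

# total physical RAM in bytes
sysctl hw.physmem

# current ARC size in bytes
sysctl kstat.zfs.misc.arcstats.size

# configured ARC ceiling
sysctl vfs.zfs.arc_max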

Also, your CPU is quite old. It was a good processor in 2006, but in 2013 you're really expecting too much from hardware that old. The good thing is that as soon as you move to any i3/i5/i7 you are looking at much higher RAM limits. Even 1st-gen i3s are smoking fast for ZFS if you give them sufficient RAM.

I realize you may want to dismiss RAM as a possible cause because you've found a correlation between the SSH encryption method and performance, but don't be fooled by the complexity of ZFS for a moment. On the plus side, upgrading to any i3/i5/i7 kills two birds with one stone... you're guaranteed a faster CPU and guaranteed a higher RAM limit.

Good luck.

Edit: I will say that ZFS replication is probably the fastest way to move data from one ZFS-based server to another. It is not unreasonable to expect to hit network-limit speeds with replication. But my guess is you are expecting too much from hardware that is too old.
 

Matt Reynolds (Dabbler, joined Sep 27, 2013, 10 messages)
Thanks cyberjock!

I was planning on increasing the RAM in the replica server next week to see if that made any difference.

Still, I find it odd that I can sustain a better speed over 1Gb than over 10Gb. Any thoughts on that?

Edit: Also thanks for your reply. Nice to see an informative response!
 

cyberjock (Inactive Account, joined Mar 25, 2012, 19,526 messages)
Still, I find it odd that I can sustain a better speed over 1Gb than over 10Gb. Any thoughts on that?

Not specifically. When you don't have enough RAM, services compete; some crash occasionally, some frequently. Basically the whole house of cards comes crashing down, because things the software doesn't anticipate, such as out-of-RAM errors, start happening. It's one of the reasons why we hound the hell out of people in the forums when they go very short on RAM for their pool size, or hound them for a RAM test when they don't use ECC RAM. Both of those cause problems that aren't always apparent, but they make troubleshooting and logically trying to diagnose an issue (like what you are doing) pretty much worthless.

You haven't mentioned how big your pool is, but if it's more than 10TB or so you really do need more than 8GB of RAM. 8GB is the minimum recommended to use ZFS at all. I had to have 20GB of RAM to get performance that was more than about 5MB/sec on my pool. :P

How big is your pool?
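If you're not sure, something like this over SSH will show it; your pool and dataset names will obviously differ:

# one line per pool: total size, allocated and free space
zpool list

# per-dataset used and available space
zfs list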
 

Matt Reynolds (Dabbler, joined Sep 27, 2013, 10 messages)
Sorry about the delayed reply!

I have a pool that's sitting at ~2TB right now. However, I'm fairly convinced it's the replica server that is the problem. As you said, the more RAM the better, and I think it needs some love on that front.

After some testing today, I found that one of the drives in the RAIDZ on the replica server is underperforming considerably (note: I'm only using RAIDZ on the replica server, not the production server). Running diskinfo on that particular drive returns only a 20-30MB/sec transfer rate on the inside of the disk, while the other drives average double that or more. So I'm thinking I jumped to a false conclusion in blaming the FreeNAS upgrade, and it's in fact a faulty drive.
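In case it's useful, this is roughly how I'm benchmarking each drive; the device name is just an example from my box:

# diskinfo's seek/transfer benchmark reports rates at the outside, middle and inside of the disk
diskinfo -t /dev/ada1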

I'm not sure how ZFS works, but would transfer speeds drop to the lowest common denominator and stay there? Or should I be seeing a more erratic transfer speed as it writes data across the drives?

Also, I came across one of your earlier posts about caching on hardware RAID cards harming performance for ZFS, and that you found disabling the card's cache greatly improved ZFS performance. Is that still something you recommend? The post is here: http://forums.freenas.org/threads/disable-cache-flush.12253/

Thanks muchly cyberjock! You're a wealth of info dude!
 

cyberjock (Inactive Account, joined Mar 25, 2012, 19,526 messages)
So I'm thinking I jumped to a false conclusion in blaming the FreeNAS upgrade, and it's in fact a faulty drive.

I'd say that 99% of the time, when people upgrade, they notice things they hadn't noticed before because they weren't that in tune with their server, and they falsely blame the upgrade. Correlation gets mistaken for causation frequently, but it's still false.

I'm not sure how ZFS works, but would transfer speeds drop to the lowest common denominator and stay there? Or should I be seeing a more erratic transfer speed as it writes data across the drives?

Well, in theory any given vdev will only perform as fast as its slowest drive. The reality is that you get to cheat if you have a system with extra RAM and the prefetch cache gives you good hits. But a single slow drive really can drag the whole vdev down with it, as you are seeing.
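If you want to watch it happen, something like this on the replica while a replication is running will show per-disk throughput in real time, and the slow disk usually sticks out. The pool name here is just an example:

# per-vdev and per-disk I/O statistics, refreshed every second
zpool iostat -v tank 1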

If you think one drive is performing abnormally poorly compared to the other disks, I'd check the SMART attributes and run SMART tests on the disk to see if it's failing. Typically a drive that suddenly starts performing slowly is a failing drive.
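Roughly, from an SSH session; swap in the device name of your slow disk (ada1 here is just an example):

# dump the SMART attributes and the self-test log
smartctl -a /dev/ada1

# kick off a long (extended) self-test; check on it later with -a
smartctl -t long /dev/ada1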

Also, I came across one of your earlier posts about caching on hardware RAID cards harming performance for ZFS, and that you found disabling the card's cache greatly improved ZFS performance. Is that still something you recommend? The post is here: http://forums.freenas.org/threads/disable-cache-flush.12253/

I'm not sure why you think that a cache can matter for your situation. Based on the information you've provided you're using the on-board controller, and those have no cache, so there's nothing to enable or disable. External caches add a lot of complexity for ZFS, so unless you have a RAID card with a cache that you haven't mentioned I won't go into much detail here. There's just too much to explain, and it's not likely to be your problem (look at that oddball disk that is performing slower for now).
 

Matt Reynolds (Dabbler, joined Sep 27, 2013, 10 messages)
Thanks cyberjock!

I'm not sure why you think that a cache can matter for your situation. Based on the information you've provided you're using the on-board controller, and those have no cache, so there's nothing to enable or disable. External caches add a lot of complexity for ZFS, so unless you have a RAID card with a cache that you haven't mentioned I won't go into much detail here. There's just too much to explain, and it's not likely to be your problem (look at that oddball disk that is performing slower for now).


Sorry, I wasn't implying it had anything to do with my problem, just asking whether you still recommend it for the current version of ZFS in general. My production server is using a PERC H800 w/1GB cache, and having come across that post of yours I thought I'd throw the question out there into the thread. Apologies for the confusion!
 

cyberjock (Inactive Account, joined Mar 25, 2012, 19,526 messages)
Ah. Well, if you are using hardware RAID, you are a bad boy. It's discussed in the FreeNAS manual that ZFS and hardware RAID shouldn't be mixed; there are performance and reliability reasons. :P
 