SOLVED Problem copying large file onto FreeNAS

Status
Not open for further replies.

Blues Guy

Explorer
Joined
Dec 1, 2014
Messages
69
Hi, everyone!

I have been running my FreeNAS 9.3 server since a few days before the 9.3 release and have never had problems copying files (one Windows 8.1 Pro 64-bit client). However, I never seemed to exceed 600 Mbit/sec. I did some measurements with iperf and always hit the 600 Mbit/sec barrier. The problem was the Realtek network chip on the client, which I replaced with an Intel Gigabit CT network card. Now I hit 950 Mbit/sec, which is really fine.

-----------------
[ 17] 0.0-102.2 sec 1.13 GBytes 95.0 Mbits/sec
[ 7] 0.0-102.2 sec 1.13 GBytes 94.9 Mbits/sec
[ 8] 0.0-102.2 sec 1.13 GBytes 94.9 Mbits/sec
[ 9] 0.0-102.2 sec 1.13 GBytes 94.9 Mbits/sec
[ 10] 0.0-102.2 sec 1.13 GBytes 94.9 Mbits/sec
[ 11] 0.0-102.2 sec 1.13 GBytes 94.9 Mbits/sec
[ 12] 0.0-102.2 sec 1.13 GBytes 94.9 Mbits/sec
[ 15] 0.0-102.2 sec 1.13 GBytes 94.9 Mbits/sec
[ 13] 0.0-102.2 sec 1.13 GBytes 94.9 Mbits/sec
[ 14] 0.0-102.2 sec 1.13 GBytes 94.9 Mbits/sec
[SUM] 0.0-102.2 sec 11.3 GBytes 949 Mbits/sec
-----------------
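
As a quick sanity check on the iperf output above, the per-stream rates can be summed and compared with the reported [SUM] line. This is only a minimal sketch: the regular expression and the `parse_iperf` helper are hypothetical names written for this output layout, not part of iperf itself.

```python
import re

# The iperf summary lines from the post above, copied verbatim.
IPERF_OUTPUT = """\
[ 17] 0.0-102.2 sec 1.13 GBytes 95.0 Mbits/sec
[ 7] 0.0-102.2 sec 1.13 GBytes 94.9 Mbits/sec
[ 8] 0.0-102.2 sec 1.13 GBytes 94.9 Mbits/sec
[ 9] 0.0-102.2 sec 1.13 GBytes 94.9 Mbits/sec
[ 10] 0.0-102.2 sec 1.13 GBytes 94.9 Mbits/sec
[ 11] 0.0-102.2 sec 1.13 GBytes 94.9 Mbits/sec
[ 12] 0.0-102.2 sec 1.13 GBytes 94.9 Mbits/sec
[ 15] 0.0-102.2 sec 1.13 GBytes 94.9 Mbits/sec
[ 13] 0.0-102.2 sec 1.13 GBytes 94.9 Mbits/sec
[ 14] 0.0-102.2 sec 1.13 GBytes 94.9 Mbits/sec
[SUM] 0.0-102.2 sec 11.3 GBytes 949 Mbits/sec
"""

# Hypothetical pattern for this specific layout: stream id (or SUM),
# interval, transferred volume, and rate in Mbits/sec.
LINE = re.compile(
    r"\[\s*(\d+|SUM)\]\s+\S+ sec\s+\S+ GBytes\s+([\d.]+) Mbits/sec"
)


def parse_iperf(text):
    """Return (list of per-stream rates, reported SUM rate) in Mbits/sec."""
    streams = []
    reported = None
    for match in LINE.finditer(text):
        ident, rate = match.group(1), float(match.group(2))
        if ident == "SUM":
            reported = rate
        else:
            streams.append(rate)
    return streams, reported


streams, reported_sum = parse_iperf(IPERF_OUTPUT)
computed_sum = sum(streams)
print(f"{len(streams)} streams, computed {computed_sum:.1f}, reported {reported_sum}")
```

The ten parallel streams share the link almost evenly, and their total agrees with the reported aggregate to within rounding.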

So far, so good. If I try to copy a large file (230GB .vhdx container file), the write performance is usually around 100MB/sec, but it occasionally drops to 20-30MB/sec, only to rise again a bit later. After some time, it drops to zero while the hard drives in my server keep working (the hard-drive LED is lit and you can hear them). Eventually, Windows cancels the transfer with an unknown error (I think this is because the server doesn't accept new data until a timeout is reached).

I am running a RAIDZ1 with 4 * 2TB drives (Samsung Spinpoint).

The pool is fine:
-----------------
pool: brain
state: ONLINE
scan: scrub repaired 0 in 7h10m with 0 errors on Mon Feb 16 08:11:00 2015
config:

NAME STATE READ WRITE CKSUM
brain ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
gptid/ccfbb34a-7dff-11e4-8be5-2c4138a81e5e ONLINE 0 0 0
gptid/cd89f054-7dff-11e4-8be5-2c4138a81e5e ONLINE 0 0 0
gptid/ce0f6c16-7dff-11e4-8be5-2c4138a81e5e ONLINE 0 0 0
gptid/ceb6462b-7dff-11e4-8be5-2c4138a81e5e ONLINE 0 0 0

errors: No known data errors
-----------------
The debug.log shows nothing at all (except the usual autosnap messages).

smartctl -a shows no errors (here is one of my drives, the others look the same):
-----------------
SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 21305 -
# 2 Short offline Completed without error 00% 21137 -
# 3 Short offline Completed without error 00% 20969 -
# 4 Short offline Completed without error 00% 20801 -
# 5 Short offline Completed without error 00% 20633 -
# 6 Short offline Completed without error 00% 20465 -
# 7 Short offline Completed without error 00% 20297 -
# 8 Short offline Completed without error 00% 20130 -
# 9 Short offline Completed without error 00% 19962 -
#10 Short offline Completed without error 00% 19794 -
#11 Short offline Completed without error 00% 19626 -
#12 Short offline Completed without error 00% 19334 -
#13 Short offline Completed without error 00% 18906 -
-----------------

Any ideas what my problem might be? My hardware specs are in the footer.
 

Blues Guy

Explorer
Joined
Dec 1, 2014
Messages
69
I took some screenshots. In screenshot 1, you can see the stable file transfer (in the left half of the screenshot) getting a bit slower (right half of the picture). In screenshot 5, you can see it collapsing completely.
 

Attachments

  • Durchsatz1.PNG
    Durchsatz1.PNG
    14.7 KB · Views: 611
  • Durchsatz5.PNG
    Durchsatz5.PNG
    23.3 KB · Views: 633

Blues Guy

Explorer
Joined
Dec 1, 2014
Messages
69
Okay, I think I solved the problem. It wasn't the server, it was the Intel Gigabit CT. With the onboard Realtek chip, it works.
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
Go into your network adapter settings and limit or turn off the following:
- Interrupt Moderation Rate.
- Flow Control.
- Adaptive Inter-Frame Spacing.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
What kind of Xeon system ends up with a Realtek onboard? :confused:
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
I just re-read the OP's original message.

"The problem was the realtek-networkchip on the client" - I didn't see it the first time either.

What kind of Xeon system ends up with a Realtek onboard? :confused:
 

Blues Guy

Explorer
Joined
Dec 1, 2014
Messages
69
Go into your network settings and limit or turn off the following:
- Interrupt Moderation Rate.
- Flow Control.
- Adaptive Inter-Frame Spacing.
Thank you, I will try this when I get home. Sounds promising! :smile:
 

Blues Guy

Explorer
Joined
Dec 1, 2014
Messages
69
I just re-read the OP's original message.

"The problem was the realtek-networkchip on the client" - I didn't see it the first time either.
I should have been clearer on that, sorry.

The server has an onboard Intel PRO/1000 network chip, and my client has a Sabertooth X58 mainboard with a Realtek chip.
 

Blues Guy

Explorer
Joined
Dec 1, 2014
Messages
69
Go into your network settings and limit or turn off the following:
- Interrupt Moderation Rate.
- Flow Control.
- Adaptive Inter-Frame Spacing.
Thank you very much for your input. I turned off these three functions, which led to a throughput of about 40MB/sec, but no more stability.
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
Can you tell us more about your setup?
For instance, is the copy from your PC to FreeNAS? How is your pool/dataset configured? Any type of compression, deduplication, or replication on FreeNAS?
What is the capacity of the pool?
Are you using tunables?
Anything we should know about the network setup? Jumbo frames, QoS...?
 

Blues Guy

Explorer
Joined
Dec 1, 2014
Messages
69
Of course, here is some information about my setup:

The client is a Windows 8.1 Pro 64-bit machine with a 500GB Samsung SSD, 12GB DDR3 RAM (1600MHz) and an ASUS Sabertooth X58 mainboard (Intel chipset, Realtek NIC). The CPU is a quad-core Intel i7 at 3.2GHz. Patch level is up to date.

Cables are Cat6

The switch is an ASUS RT-AC66U (Dualcore Intel Atom, 4 Gigabit LAN-ports, 1 Gigabit WAN-port), firmware is up to date.

There are no further switches in between. iperf shows a consistent 950Mbit/sec.

The server is an HP Z210 workstation with a 3.4GHz quad-core Xeon (E3-1270), 16GB ECC RAM and 4 SpinPoint drives (2TB, 5400rpm, 3.5-inch) in a RAIDZ1. The scrub is clean, and so is the SMART data. The server runs release 9.3 with the latest update.
The pool is about 61% full, with no hot spare and 2TB of free space. The performance test shows about 240-300MB/sec, and scrubbing runs at about the same speed.
I'm not using any tunables, jumbo frames, or QoS. IP addresses are all assigned by DHCP, with the DHCP server always handing out the same address to FreeNAS. IP addresses are all in the 192.168.2.x range.
Ping times are all <1ms.
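
The figures in this thread mix Mbit/sec (iperf) and MB/sec (Windows copy dialog), so a small conversion helps put them side by side. The sketch below assumes decimal units (1 GB = 1000 MB) and ignores protocol overhead; the function names are made up for illustration.

```python
def mbit_to_mbyte(mbit_per_s):
    """Convert a rate in Mbit/sec to MB/sec (8 bits per byte)."""
    return mbit_per_s / 8


def transfer_minutes(size_gb, rate_mb_per_s):
    """Minutes needed to move size_gb at a sustained rate, assuming 1 GB = 1000 MB."""
    return size_gb * 1000 / rate_mb_per_s / 60


link_mb_per_s = mbit_to_mbyte(950)         # iperf result from this thread
fast_minutes = transfer_minutes(230, 100)  # 230GB file at the "good" rate
slow_minutes = transfer_minutes(230, 40)   # same file at the reduced rate

print(f"link ceiling: {link_mb_per_s:.2f} MB/sec")
print(f"230GB at 100MB/sec: {fast_minutes:.1f} min")
print(f"230GB at 40MB/sec: {slow_minutes:.1f} min")
```

Under these assumptions, a copy running at around 100MB/sec is already close to what the measured link can carry, so the stalls, rather than the peak rate, are the part worth investigating.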
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
The AC66U has a MIPS processor, not an Atom. It's also irrelevant, since the switching is done in hardware.
 

zambanini

Patron
Joined
Sep 11, 2013
Messages
479
I would connect the FN box and your client directly for a small test, just to be sure.
 

Blues Guy

Explorer
Joined
Dec 1, 2014
Messages
69
I did, same outcome.
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
Can you run "gstat" via a ssh session to see if the hard drives are maxing out during copy?

Code:
gstat -f /


as well as checking CPU usage:

Code:
top

Once top is displaying processes, press "Shift + P" to display per-CPU statistics.

Are any of the "idle" counts dropping close to 0%?

You didn't say whether the copy was from the client to FreeNAS. Copying a 230GB file to or from a 500GB SSD doesn't leave much room.
Do you have the latest SSD firmware? I think last year Samsung released an update for their PRO series, where read performance could be affected by wear-leveling bugs.
Are you using deduplication on any of your datasets?
What does the ZFS ARC graph look like under Reporting during the transfer?
Can you also monitor activity on your client machine, such as HDD throughput/latency and CPU/thread usage?
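
One way to look at the client-side disk in isolation is to read the source file sequentially, with no network involved, and log the read rate per time window. The following is only a rough sketch: `measure_read_throughput` and `find_stalls` are hypothetical helper names, the window length and stall threshold are arbitrary assumptions, and repeated runs may be served from the operating system's file cache rather than the drive.

```python
import time


def measure_read_throughput(path, chunk_size=4 * 1024 * 1024, window_s=1.0):
    """Read path sequentially and return (elapsed_seconds, MB/sec) samples."""
    samples = []
    start = window_start = time.monotonic()
    window_bytes = 0
    # buffering=0 avoids Python-level buffering; the OS cache still applies.
    with open(path, "rb", buffering=0) as source:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            window_bytes += len(chunk)
            now = time.monotonic()
            elapsed = now - window_start
            if elapsed >= window_s and elapsed > 0:
                samples.append((now - start, window_bytes / elapsed / 1e6))
                window_start, window_bytes = now, 0
    # Record whatever is left in the final, partial window.
    now = time.monotonic()
    elapsed = now - window_start
    if window_bytes and elapsed > 0:
        samples.append((now - start, window_bytes / elapsed / 1e6))
    return samples


def find_stalls(samples, threshold_mb_per_s=1.0):
    """Return the timestamps of windows whose rate fell below the threshold."""
    return [t for t, rate in samples if rate < threshold_mb_per_s]
```

If the read rate from the source drive stays steady here while the network copy still stalls, the source disk is less likely to be the bottleneck.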
 

Blues Guy

Explorer
Joined
Dec 1, 2014
Messages
69
Thank you for all the great ideas, I just did exactly what you proposed.

I forgot to mention one thing. The client has a 500GB SSD for the operating system and an external HDD (2TB Seagate) connected via USB 3.0. I'm copying the huge file from the external drive to the FreeNAS server. The SSD was bought after the firmware problem was corrected. I downloaded the Samsung tool, which confirmed that the firmware is not affected, and I'm not having any problems with the SSD. But a good idea! :smile:

I made you two screenshots, both showing the gstat and top-output as well as the windows taskmanager (with HDD activity of the external drive, CPU activity of the client) and the windows copying-window with the current data rate.

In the first screenshot, you see the beginning of the transfer, where the data rate is close to 100MB/sec, all the server CPUs are between 20% and 40% busy and the gstat output shows 30-40% disk activity. The values stay like this during the "ok phase" of the file transfer. On the client side, the disk activity is around 40%, and CPU activity stays at around 5-10%.

In the second screenshot (transfer dropping to 0), the client-CPU-activity drops to around 2%, HDD disc-activity drops to around zero and on the server, all CPUs are nearly idle. Server disc activity drops to around 20% and a bit later to around 0. But the server disc latency numbers never get higher than 50ms under load and 10ms when the transfer is stalling.

One funny thing is the arc-cache. The cache size drops from 11GB to around 1.5GB. I attached a screenshot of that, too.

I still think it's a client driver issue. One thing I noticed is that when the transfer rates are high, the mouse cursor doesn't move as smoothly as it normally does. Also, when saving the screenshots, the "Save as" window took around 10 seconds to open, which isn't normal.

I also tried to turn off my antivirus-program (Norton Antivirus), which didn't change anything.
 

Attachments

  • ok.JPG
    ok.JPG
    402.2 KB · Views: 703
  • Not ok.JPG
    Not ok.JPG
    398.7 KB · Views: 730
  • arc.JPG
    arc.JPG
    33.3 KB · Views: 577

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
I agree with your assessment, I don't think FreeNAS is the issue. It could be an issue with Samba, but nothing seems to point there either.
I think the issue is with your client.
The mouse not moving smoothly is probably caused by USB 2.0 sharing the same PCIe lane as the one used by the USB 3.0 hard drive. That, or your motherboard has a USB hub connecting both mouse and HDD. Not an issue, I think. However, the mouse is supposed to interface with Windows via the USB 2.0 interrupt transfer mode, which must occur every 125us and is therefore supposed to be guaranteed, while HDD transfer is done via bulk transfer and can account for up to 80-90% of the USB 2.0 bandwidth. Here, though, the HDD transfer goes through USB 3.0, which is a full-duplex 5Gbit/s link. The processor, hub, or any other chipset/IC handling those links may be stretched too thin, or Windows is having some resource issues.

The dropping ARC graph is not the issue. What I think is happening is that FreeNAS is trying to cache the data and, during the copy process, flushes the old data and replaces it with the new data. However, when the copy is interrupted, the content of the cache doesn't point to any existing and complete set of data, so the cache is flushed.
I see the same ARC behavior on my system, but no drop in transfer.
I suspect your USB 3.0 is the cause of the problem, possibly fighting for some resources with the SSD.
Can you check whether the USB 3.0 device is resetting for some reason?
Can you check the error logs in Windows 8? What do you find under "Event Viewer"? Are there any errors or warnings that Windows has detected?
 

Blues Guy

Explorer
Joined
Dec 1, 2014
Messages
69
I think you might be on to something there. The external drive may indeed be the problem. I initially ruled it out, because the drive activity was way below 100%, but the USB controller might indeed be fighting with the NIC or confusing Windows. I tried to copy large files from my SSD (large being 1GB) and it went smoothly. Then I was very late for work, so I had to take off. I have a 30GB VMware image that I will take home from work and copy to my SSD. I'll keep you updated when I get home.

Thanks for all your effort and advice! :smile:
 

Blues Guy

Explorer
Joined
Dec 1, 2014
Messages
69
Okay, I tried to copy a 30GB file from a USB 3.0 stick (260MB/sec read, 170MB/sec write to/from the SSD) as well as from the internal SSD, but the problem remains.

The Windows Event Viewer shows one network-related warning:
Event ID 27 from e1iexpress. Windows doesn't have a textual message for that ID.

A quick google search showed: http://www.eventid.net/display-eventid-27-source-e1iexpress-eventno-11300-phase-1.htm
So that means "Network link disconnected". But since this message appears once on every wake-up from standby, I don't think it has anything to do with my errors.

By the way: copying a large file from one dataset to another with the Windows client (so, copying from one share to the other) works flawlessly at about 82MB/sec.

I just realized I didn't answer an earlier question: I don't use deduplication on the dataset I'm copying the big image files to/from, and I only use the standard lz4 compression on all datasets.
 