Poor ZFS performance with directories that have huge numbers of files in them


mikesm

Dabbler
Joined
Mar 20, 2013
Messages
36
Hi. I am converting over from a hardware RAID solution running on Windows Server 2008 to FreeNAS with a couple of 8x2TB arrays. One array is built from WD20EADS drives in RAIDZ2, and the other from HGST 5K3000s, all hooked up via multiple M1015 HBAs flashed to IT mode plus the motherboard's onboard LSI HBA. The system is built around a Supermicro X8ST3-F motherboard with an i7-950 CPU and 24GB of RAM. Dual onboard Intel gigabit controllers connect it to the rest of the home network. It's not virtualized; it runs on bare metal.

The primary purpose of the system is media storage and streaming. It will hold (like the system before it) ripped music and videos, and is also the backend store for a SageTV DVR, with about 7000 recordings.

The system is accessed over CIFS, and performance is pretty decent, 50-60 MB/s, though I really would like to see it go at line rate. But the biggest gap is directory performance. Any time I open one of the DVR directories, which might have 4000 recordings, each with an associated properties file and .edl file (which tells the system where commercials have been detected for autoskip), it takes more than 30 seconds to get a directory listing, and interactions with the filesystem seem sluggish because of slow directory lookups.

Is there something I can do about this? The Windows Server system did not have this sort of problem, and directory operations were really fast. Is this a Samba issue or a ZFS issue, or something else? I have the CIFS server configured with AIO set to 2096, large file RW on, DOS file attributes, etc. It's really a problem, and I may go back to hardware RAID and Windows Server if I can't get it resolved. Any ideas?
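For reference, as I understand it those GUI settings roughly map to smb.conf auxiliary parameters along these lines (the exact mapping is my assumption; the values just echo what I set):

aio read size = 2096
aio write size = 2096
large readwrite = yes
store dos attributes = yes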

My apologies if I missed something obvious - this is my first FreeNAS build.

Thanks in advance!
Mike
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Did you look at the STICKY in the Sharing section of the forum?

I read recently in the Samba documentation that using AIO with Windows Vista and newer is not recommended.

I know that using some firewall/antivirus software on your workstations can cause slow directory listings.

Also, enabling powerd on my server caused some latency issues with getting directory listings. My server seemed too aggressive about dropping into low power states and didn't always wake up promptly to give me the directory listing. But if I was already doing something that kept the CPU out of its lowest power states, the directory listing would come back in a reasonable time frame.
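If you want to keep powerd but make it less aggressive, something along these lines in /etc/rc.conf on a plain FreeBSD box (on FreeNAS you'd do the equivalent through the GUI) is worth a try; hiadaptive is just an example mode, see powerd(8) for the rest:

powerd_enable="YES"
powerd_flags="-a hiadaptive -n adaptive"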
 

mikesm

Dabbler
Joined
Mar 20, 2013
Messages
36
Thanks for the link! I apologize for missing it!

I updated the configuration to protosd's settings as described in post 7 of that thread. This made performance much better, especially for copies across the gigabit network, which now run at near wire speed. Directory browsing is fast too, at least on the first access to a directory when the file system is quiet.

But I am still seeing some issues, especially when 2-3 DVR streams (at most 19 Mbps each) are being written or read. In that case, I still see some long delays when trying to browse a large directory on FreeNAS via CIFS. I have 32 GB in the system and the reporting indicates I have plenty of RAM. Is the read-ahead prefetching causing a problem for directory browsing operations? Any other reason I am seeing this behavior? I assume it's not normal?

How long does ZFS hold onto directory information in RAM before flushing it?


Thx
Mike
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Is the read-ahead prefetching causing a problem for directory browsing operations?
The pre-fetch generally gives a significant performance increase. I wouldn't expect it to cause the slowdown you're describing. How long is the slowdown? Are we talking 2 seconds or 20 seconds?
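If you really want to rule prefetch out, there is a loader tunable you can flip just as a test (I wouldn't leave it off permanently, since it normally helps):

vfs.zfs.prefetch_disable=1

Set that in /boot/loader.conf (or as a tunable in the GUI) and reboot, then see if the directory listings behave any differently.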

Any other reason I am seeing this behavior? I assume it's not normal?
It may be normal for your configuration. That would be something you'd have to figure out for yourself, unfortunately. It's definitely not normal for the servers I've built. I do know that when I had this problem from Windows clients and used a Linux machine on the network instead, the problem magically went away (not sure why, just an observation). I have no idea what Linux does differently from Windows, but I've sometimes wondered if Windows is doing something unexpected and that was part of the problem.

How long does ZFS hold onto directory information in RAM before flushing it?
It holds onto whatever information it can fit into its cache. Since you have 32GB, how long that metadata stays cached depends on how much reading and writing you are doing on the server.
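If you want to see roughly how much of the ARC is holding metadata versus data, these sysctls give a ballpark (exact names can vary a bit between FreeBSD versions):

sysctl vfs.zfs.arc_meta_used
sysctl vfs.zfs.arc_meta_limit
sysctl kstat.zfs.misc.arcstats.size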
 

mikesm

Dabbler
Joined
Mar 20, 2013
Messages
36
Well, it turns out that I don't think this problem is a ZFS problem at all. I logged into the server with the shell, and issued the following command:

time ls -l -R /mnt/* > /dev/null

It took about 5 seconds to traverse all the directories with tons of files in them, but I don't see this kind of speed via CIFS. So it appears the issue is somehow related to how Samba serves up this directory information to the Windows clients, and not the core file system itself.

Given I have applied protosd's tweaks to CIFS already, what more can I do to track down the slowness in browsing large directories?

Thanks!
Mike
 

AlainD

Contributor
Joined
Apr 7, 2013
Messages
145
Well, it turns out that I don't think this problem is a ZFS problem at all. I logged into the server with the shell, and issued the following command:

time ls -l -R /mnt/* > /dev/null

It took about 5 seconds to traverse all the directories with tons of files in them, but I don't see this kind of speed via CIFS. So it appears the issue is somehow related to how Samba serves up this directory information to the Windows clients, and not the core file system itself.

Given I have applied protosd's tweaks to CIFS already, what more can I do to track down the slowness in browsing large directories?

Thanks!
Mike
Hi

I also see the shell command being a lot faster. Maybe CIFS is sorting the output and not doing it in the most efficient way.

Alain
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Hi

I also see the shell command being a lot faster. Maybe CIFS is sorting the output and not doing it in the most efficient way.

Alain

Unless you loaded the modules, Samba doesn't sort the directory information. The sorting is done on your workstation.
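For what it's worth, the server-side sorting I'm referring to comes from a VFS module; if you ever added something like the line below to your auxiliary parameters, removing it would be the first thing to try (dirsort is just the example module I know of):

vfs objects = dirsort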
 

AlainD

Contributor
Joined
Apr 7, 2013
Messages
145
Unless you loaded the modules, Samba doesn't sort the directory information. The sorting is done on your workstation.

Yes, but there's something strange going on. An "ls -alR /mnt/* >/dev/null" is an order of magnitude faster than a simple dir (in a 20k+ entry directory) from a Windows client, even when an "ls ..." has just been run on the server and I can hear the disks working...

I applied the suggestions from the "standard" advice thread, except for setting up the cron job, but I will add that now.
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
This is a pointless comparison. There are at least a couple of threads where the very same logic was argued and it's not valid. It's like comparing apples and golf balls. "ls" is a local unix command, and dir is on Windows, and you have a bunch of Samba crap in between sending stuff over the network. You cannot use "ls" to compare, it's just not the same.
 

AlainD

Contributor
Joined
Apr 7, 2013
Messages
145
This is a pointless comparison. There are at least a couple of threads where the very same logic was argued and it's not valid. It's like comparing apples and golf balls. "ls" is a local unix command, and dir is on Windows, and you have a bunch of Samba crap in between sending stuff over the network. You cannot use "ls" to compare, it's just not the same.

I know it's not the same, but the difference is huge. It also makes it plausible that it's not a ZFS "problem", but rather something linked to CIFS or Samba (the specific implementation).
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I know it's not the same, but the difference is huge. It also makes it plausible that it's not a ZFS "problem", but rather something linked to CIFS or Samba (the specific implementation).

You are correct, but there is room for A LOT of blame. Sharing files is not exactly a simple thing to implement. You need some rather complex software and hardware throughout the entire path the data takes. Then throw in even more complexity because you probably have antivirus installed and some kind of software firewall, and who gets the blame for the latency? I know I don't have any slowdowns at all in Linux, where I did for some very large directories in Windows.

The point ProtoSD was making is that you can't make the comparison, because listing directories locally and listing a directory on a different PC go through a whole different set of code, which may or may not cause slowdowns or speedups. All the ls command does is prove that you can list the directory locally. Big deal when you want to view the directory remotely.
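If you want a test that's closer to apples-to-apples, time a recursive listing through Samba itself with smbclient from the FreeNAS shell or another Unix box; the share and user names below are placeholders:

time smbclient //freenas/dvr -U youruser -c 'recurse; ls' > /dev/null

That exercises the Samba code path without a Windows client or its antivirus in the picture.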
 

mikesm

Dabbler
Joined
Mar 20, 2013
Messages
36
Well, the main reason I did the comparison was just to see if ZFS had issues. Being able to do a local directory walk that fast means that the file system itself was performing fine, and that I needed to look elsewhere.

If it had been really slow, I would have looked at the disks etc. for faults.
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
Well, the main reason I did the comparison was just to see if ZFS had issues. Being able to do a local directory walk that fast means that the file system itself was performing fine, and that I needed to look elsewhere.

If it had been really slow, I would have looked at the disks etc. for faults.

It's a fine test for testing the system locally / ZFS (which never turns out to be the problem), it's just useless for testing CIFS performance. It never hurts to check the disks for faults since a significant amount of the time that's what the problem turns out to be. In your case I think it's CIFS related.
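Quick ways to rule the pool and disks out are something like the following (the device name is just an example; yours may show up as da0, da1, etc. behind the M1015s, so repeat for each disk):

zpool status -v
smartctl -a /dev/ada0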
 

mikesm

Dabbler
Joined
Mar 20, 2013
Messages
36
It's a fine test for testing the system locally / ZFS (which never turns out to be the problem), it's just useless for testing CIFS performance. It never hurts to check the disks for faults since a significant amount of the time that's what the problem turns out to be. In your case I think it's CIFS related.

I think the data clearly points to that. ;) Is this likely to get better in Samba 4? My experience with Samba in the past under Linux was a little underwhelming. The kernel-based CIFS support in Solaris seems to be better designed, but then you have to deal with all the Solaris issues.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
So if you're doing 4000 video files with metadata files, that's what, maybe 12,000 files? Doesn't CIFS do a fairly heavyweight lookup when you pull a directory listing? And if you need to do a round trip for each request, that seems like it'd take a Real Long Time. Run a tcpdump and "netstat 1" and see what is going on. Too lazy to do any experiments from my phone, sorry heh
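Something along these lines would do it; the interface name is a guess, so adjust for your box:

tcpdump -i em0 -s 0 -w /tmp/dirlist.pcap port 445 or port 139
netstat 1

Then open a big directory from a Windows client and look at how many round trips the listing takes and whether the per-second traffic stalls.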

 

mikesm

Dabbler
Joined
Mar 20, 2013
Messages
36
So if you're doing 4000 video files with metadata files, that's what, maybe 12,000 files? Doesn't CIFS do a fairly heavyweight lookup when you pull a directory listing? And if you need to do a round trip for each request, that seems like it'd take a Real Long Time. Run a tcpdump and "netstat 1" and see what is going on. Too lazy to do any experiments from my phone, sorry heh


Well, if you are telling me CIFS is not necessarily designed well, that's not hard for me to believe. However, when I had these same directories on a hardware RAID controller attached to a Windows 2008 file server, I never had these problems. So it's clear to me that things can work well if implemented properly, even using CIFS.

Thx
Mike
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Well, if you are telling me CIFS is not necessarily designed well, that's not hard for me to believe. However, when I had these same directories on a hardware RAID controller attached to a Windows 2008 file server, I never had these problems. So it's clear to me that things can work well if implemented properly, even using CIFS.

Thx
Mike

The way Windows works, the way Linux works, and the way FreeBSD works are not even close to the same. They use totally different file systems, have totally different notions of "good vs. bad", and so on.

You know what's really weird? I have a directory with almost 120k files. It takes about 30 seconds to load on Windows. Do you know how long it takes to load on my Linux Mint machine (which has lower specs by a long shot)? Less than 2 seconds.
 