System hangs when copying and deleting at the same time (Windows clients)

sideswipe001

Cadet
Joined
Jan 28, 2019
Messages
5
Hello all! I just set up FreeNAS for the first time for use in my home. It's basically working, but I've noticed one odd annoyance, and I'm trying to figure out how to resolve it. So first, information about my system:

FreeNAS-11.2-RELEASE-U1
Intel(R) Celeron(R) CPU G3930
Supermicro Micro ATX DDR4 LGA 1151 Motherboards X11SSM-F-O
16 GB ECC RAM
7x 3TB WD Red HDD, single pool (RAIDZ1)
1x 1Gbit NIC

This is meant to be a low-power NAS, primarily used for Plex and other home storage (documents, pictures, etc). The Plex server is NOT running on the NAS; it simply connects to the shares.

Here's the odd thing I've noticed:
While messing around re-encoding some of my movies, I found that I could make the SMB shares totally unresponsive by deleting a movie while simultaneously copying a different one over. Often this was from two different clients, but it happens from the same one as well. Generally it would happen like this:

1) Begin to copy over a large-ish movie file (15GB or so) from a Windows client computer. Usually maxes out the 1Gbit connection pretty stably.
2) You can still browse the shares fine, so go find a different movie (about the same size) and delete it off the NAS, from the client.
3) All the shares hang now. Within a few seconds, the currently active transfer stops copying. If any reads from the server are active, they stop. The file does not appear to be deleted yet. You can no longer browse any shares on the NAS. I believe the web interface still responds at this time.
4) Between 10 and 60 seconds later (this fluctuates) the file disappears (the delete happens) and the transfer starts up again. The NAS becomes responsive again.

This is very reproducible on my system.
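To put a number on the hang in steps 3 and 4, a client could poll a share and record how long each directory listing takes. This is only a minimal sketch: `probe_listing` and `longest_stall` are hypothetical helper names, and the share path is whatever drive letter or UNC path the share is mapped to on the client. The client may cache directory listings, so short stalls could be hidden.

```python
import os
import time


def probe_listing(path, samples=60, interval=1.0):
    """List `path` repeatedly and return the time each listing took, in seconds."""
    latencies = []
    for _ in range(samples):
        start = time.monotonic()
        os.listdir(path)  # blocks for as long as the share is unresponsive
        latencies.append(time.monotonic() - start)
        time.sleep(interval)
    return latencies


def longest_stall(latencies):
    """Return the slowest listing, or 0.0 if nothing was measured."""
    return max(latencies, default=0.0)
```

Start the probe against the share, kick off the copy and the delete, and compare `longest_stall(...)` with the 10 to 60 seconds observed by hand.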

So my question is:

Is this due to insufficient hardware (not enough RAM, slow processor, slow HDDs) or is this a configuration issue? I haven't really done much except install the OS and set up the pool/shares, so things should basically be at default configuration. Has anyone seen this and have an idea what I can do to fix it? It doesn't affect my normal daily usage, but it can get annoying when doing any serious work on my library.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
There's an OpenZFS bug/commit pair, #6569, "large file delete can starve out write ops"

https://www.illumos.org/issues/6569

That would seem to match up with this behavior, but I'd expect this fix to have been implemented in the FreeBSD implementation given that the commit is from 2016; that said, I didn't see a tunable similar to zfs_per_txg_dirty_frees_percent on my machine. I'm wondering if maybe the delete is starving out the txg contents as described.

SSH into your system, we're gonna run some DTrace up in here.

Create a file dirty.d using vi or nano and dump this code in there:

Code:
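/* Remember the pool (dsl_pool_t) whose transaction group is about to sync. */
txg-syncing
{
        this->dp = (dsl_pool_t *)arg0;
}

/* Only report for the pool named on the command line ($$1). */
txg-syncing
/this->dp->dp_spa->spa_name == $$1/
{
        /* Dirty data in this txg versus the zfs_dirty_data_max limit, in MB. */
        printf("%4dMB of %4dMB used", this->dp->dp_dirty_total / 1024 / 1024,
            `zfs_dirty_data_max / 1024 / 1024);
}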


Then from a shell do dtrace -s dirty.d YourPool and wait. You'll see a bunch of lines that look like:

Code:
dtrace: script 'dirty.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  4  56342                 none:txg-syncing   62MB of 4096MB used
  4  56342                 none:txg-syncing   64MB of 4096MB used
  5  56342                 none:txg-syncing   64MB of 4096MB used


Start a copy to the system, note the first number, and then do the delete. If it suddenly rockets up to "4096MB of 4096MB" or a similarly high ratio, then the deletes are starving out the incoming writes.
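If the output is long, a small script can pick out the ratio for you. This is a minimal sketch that assumes the output was saved to a file or captured as a list of lines; `dirty_ratios`, `looks_starved`, and the 0.9 threshold are hypothetical choices for illustration, not anything ZFS defines.

```python
import re

# Matches the "NNMB of NNMB used" tail printed by dirty.d.
LINE_RE = re.compile(r"(\d+)MB of\s+(\d+)MB used")


def dirty_ratios(lines):
    """Yield (used_mb, max_mb, ratio) for every txg-syncing line found."""
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # header lines and blanks carry no measurement
        used, limit = int(match.group(1)), int(match.group(2))
        if limit == 0:
            continue  # avoid dividing by zero on a malformed line
        yield used, limit, used / limit


def looks_starved(lines, threshold=0.9):
    """True if any txg used at least `threshold` of the dirty data limit."""
    return any(ratio >= threshold for _, _, ratio in dirty_ratios(lines))
```

For example, `looks_starved(open("dirty.log"))` on a saved capture (the file name is made up here) tells you whether any txg got close to the limit.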

Edit: I see this is your first post. Talk about jumping in with both feet! :P
 

sideswipe001

Yes it is! Thankfully I've got a good amount of experience in Linux/Command line stuff, if not FreeNAS. I'll give this a shot when I get home from work and see. Thanks for the help.
 

sideswipe001

Okay, so I tested it out. (Sorry for the delay; the last few days were crazy)

No go on the starvation. Here's what I got:

  1  63786                 none:txg-syncing  242MB of 1616MB used
  1  63786                 none:txg-syncing  790MB of 1616MB used
  0  63786                 none:txg-syncing  497MB of 1616MB used
  1  63786                 none:txg-syncing   11MB of 1616MB used   <-- This is after the delete started.
  1  63786                 none:txg-syncing   23MB of 1616MB used
  1  63786                 none:txg-syncing   20MB of 1616MB used
  1  63786                 none:txg-syncing    1MB of 1616MB used
  1  63786                 none:txg-syncing   18MB of 1616MB used
  1  63786                 none:txg-syncing   18MB of 1616MB used


So basically the number dropped to almost nothing once the delete started. The first few lines are from the copy that I started before the delete. I'll also note that I managed to max it out (1616/1616) while creating a copy of the file to delete and pulling a different movie to my desktop at the same time, and even then I could still browse the share.

Also, the "nothing going on" hang lasted for about 30 seconds. I counted it out, though it was a rather large file I tested with - 48GB.

Any more suggestions on where to look next?
 

HoneyBadger

Hmm. It might be that "delete this data from disk" isn't counted as the same type of "dirty data", so the delete could still be starving out the incoming writes without showing up in that number.

To rule out the network, the next step I'd suggest is to run the copy remotely as before, but SSH in and delete a large file locally.

If you don't stall out, it's a network/SMB issue. If you still stall out, it's something internal to the system or ZFS.
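To get a number for the local delete instead of eyeballing it, something like the following could be run on the NAS over SSH while the remote copy is in progress. It is only a sketch: `timed_delete` is a hypothetical helper, and it measures how long the delete call takes to return, which is not necessarily when ZFS has finished freeing the blocks.

```python
import os
import time


def timed_delete(path):
    """Delete `path` and return (size_in_bytes, seconds_until_the_call_returned)."""
    size = os.path.getsize(path)
    start = time.monotonic()
    os.remove(path)
    return size, time.monotonic() - start
```

Compare the elapsed time with how long the share stalls when the same size of file is deleted over SMB.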
 

sideswipe001

Test has been run. If I delete it from the terminal, the copy continues without ANY issue. It slowed down briefly (meaning it went from 112 MB/s solid to 95 MB/s or so for a second or two) but that's not unexpected to me. So it appears to have something to do with SMB.

Is there any way to tune that, options to pass in or something? If I need to, I can certainly SSH over whenever I need to delete something, but ideally I'd like to get the SMB share to work smoothly instead.
 