First off, I'd like to thank everyone who responds to this with helpful pointers. I realize no one has to contribute to my issue, and I appreciate those who do. I've spent some time searching these forums, Reddit and Google, and I haven't been able to find anything similar to my experience. It's possible I've missed a post with the solution, and I'd appreciate whoever points it out!
With that out of the way, I'll do my best to give some background, describe what I'm seeing, as well as my hypothesis on what might be at issue.
Background -
1) This isn't my first rodeo, I've put FreeNAS systems in place at work and at home running on enterprise systems as well as commodity platforms. I've never really had any issues with FreeNAS, once the configuration is complete.
2) This build is pretty low end, but it's got a purpose it's filling, and prior to the changes outlined below it was running flawlessly for over 9 months.
3) Due to unique worldwide circumstances (COVID) I moved this box from an AD environment (work) to my home to update hardware and load some data.
The system is a lower end build designed to house transient backups as well as a large media collection served out on emby. It's got 16GB of RAM, a modest 4th gen Intel quad core processor, 2x4TB drives and was connected via an iSCSI target to a 32TB array. It's got a 1 Gig Ethernet adapter and a 4x 1Gig Intel Ethernet card in one of the PCIe slots. I brought it home to change the boot drive to dual 200GB Intel S3700s I had lying around, to add a 10Gb NIC, and to add 2x10TB drives to it.
Initially I had issues with AD-based logins. I'm not sure what the cache duration was/is or how FreeNAS deals with things, but it was easy enough to add a local account. Once I did so, everything worked as expected. Using the two internal drives, the system would write at theoretical Ethernet speeds (~115MB/sec - no jumbo frames on the home NW) and was performing well. I backed up around 600GB of data to it and loaded another 100GB of files over SMB with no issues. I did, however, make a few changes to my home network and noticed that AT&T is pushing a "search domain" down through their boundary device.
On day two, I backed up the config, placed one S3700 drive in the system, loaded the same version of FreeNAS that was on the previous boot media (FreeNAS-11.3-U1), rebooted and restored the config. Everything came up fine on the system, including the web interface and the emby jail. Emby is running fine and at full speed. CPU utilization hovers between 0-6% during some light testing. Memory consumption under testing never exceeds 8GB, with services at about 1.7GB and ZFS cache using the rest.
Now on to the issue -
A) From multiple (3) Windows 10 PCs, connecting to the array can take anywhere from 30 to 90 seconds before Windows presents the credentials prompt. Often this will cause a timeout or error on the Windows side, but with a few attempts (usually 2) I can log in, and browsing the share is very quick.
B) Copy performance is spotty at best. What usually happens is that a large single-file copy will start out very fast and then stall for 5-30 seconds, dropping to no transmission at all. It will then resume at full speed. I've attached a screenshot showing a 1.8GB file copy and the dip in transmission speed. While it doesn't look like it's stalling to zero bps in the screenshot, watching it live, it does. This was not the behavior that I saw while it was at work, or at home prior to the boot media update and a few NW changes. With multiple files it seems to stall on each file transmission.
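To pin down where in a transfer the stalls in (B) happen, a rough sketch like the one below could copy a file in fixed-size chunks and record any chunk that takes unusually long. The function name, the chunk size and the 2 second threshold are my own placeholders, not anything from FreeNAS or Windows. The timings only reflect when each write call returns on the client, so client-side caching may shift where a stall shows up; setting `sync=True` asks the OS to flush after every chunk, which is slower but should reduce that effect.

```python
import os
import time


def copy_with_stall_log(src, dst, chunk_size=1024 * 1024,
                        stall_seconds=2.0, sync=False):
    """Copy src to dst in chunks.

    Returns (total_bytes, stalls), where stalls is a list of
    (offset_bytes, seconds) for every chunk that took at least
    stall_seconds to read and write.
    """
    stalls = []
    offset = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            start = time.perf_counter()
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            if sync:
                # Push this chunk out of the client-side buffers before timing.
                fout.flush()
                os.fsync(fout.fileno())
            elapsed = time.perf_counter() - start
            if elapsed >= stall_seconds:
                stalls.append((offset, elapsed))
            offset += len(chunk)
    return offset, stalls
```

Run against a path on the share (for example a mapped drive letter, which is an assumption about the setup), stalls clustered at offset 0 would point at file creation, while stalls spread through the file would point elsewhere.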
What I've done so far -
1) I've double checked all settings
2) I've run a scrub on the main drive
3) I've tried to boot from the original media (this failed oddly on 2 attempts... and I've not tried to resolve this as I'd rather use the HW that I have in place)
4) I've tried to set the SMB bind interface, but all I see in the settings is a "--".
What I haven't done so far -
1) I haven't tested AFP login or performance
2) I haven't tried CIFS from a Linux box
Hypothesis -
I believe this is a network or FreeNAS configuration problem. I think this for two reasons. First, I had everything working really well at home using a local account: logins were quick and transfers held close to theoretical speeds. Second, since updating my NW equipment I'm now getting a pushed "domain" of attlocal.net from AT&T Gig fiber. This seems to impact Zeroconf and possibly other dynamic browsing on my network. Prior to this I'd run "domain less", with .local being the preferred and working method of connection. This would certainly be a candidate for the very slow presentation of the login prompt under Windows. As for the copy speed, as you can see it's FULL speed once it resumes, but I believe the initial high speed is a misleading Windows trick (an outbound queue or something) and that the stalled connection is really at the beginning of the file creation on the FreeNAS system. If browsing weren't lightning fast with directories containing hundreds/thousands of files, I'd be convinced this was the case.
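One way to test the name resolution half of this hypothesis would be to time the lookup and the TCP connect to the SMB port separately for each name a client might try. The sketch below is only that, a sketch: the hostname `freenas` in the usage note is a placeholder for whatever the box is actually called, the helper names are made up, and whether a `.local` name resolves through the OS resolver at all depends on the client's mDNS support.

```python
import socket
import time


def candidate_names(host, search_domain):
    """Return the name variants a client might try for the same box."""
    return [host, f"{host}.local", f"{host}.{search_domain}"]


def timed(fn, *args):
    """Run fn(*args) and return (seconds, result_or_exception)."""
    start = time.perf_counter()
    try:
        result = fn(*args)
    except OSError as exc:  # covers lookup failures, refusals and timeouts
        result = exc
    return time.perf_counter() - start, result


def resolve(name):
    """Look the name up through the OS resolver; return the first address."""
    return socket.getaddrinfo(name, 445, proto=socket.IPPROTO_TCP)[0][4][0]


def connect(address, port=445, timeout=5.0):
    """Open and immediately close a TCP connection to the SMB port."""
    with socket.create_connection((address, port), timeout=timeout):
        return "connected"


def report(host, search_domain):
    """Print lookup and connect timings for each name variant."""
    for name in candidate_names(host, search_domain):
        secs, addr = timed(resolve, name)
        print(f"{name}: lookup {secs:.2f}s -> {addr}")
        if isinstance(addr, str):
            secs, outcome = timed(connect, addr)
            print(f"  port 445 connect {secs:.2f}s -> {outcome}")
```

Calling `report("freenas", "attlocal.net")` from one of the affected PCs would show whether the long wait sits in the lookup of one particular name variant or in the connection itself. If every lookup and connect is quick, the delay is more likely inside the SMB session setup than in DNS or Zeroconf.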
If you've made it this far - congratulations... and thanks. Unfortunately, I don't know what else I can do to try and resolve this. I'm going to keep working and test AFP as a next step.