Network drops under load

wisepass

Dabbler
Joined
May 26, 2018
Messages
13
Apologies all, but I wonder if you might be able to help me with the following problem.

Whenever my server is put under load (see below), its network connection suddenly stops. This is exceptionally consistent and will happen within minutes of a reboot if the system is put under enough load.

"Under load" can mean a single fast file transfer to my main computer's SSD via SMB, multiple file transfers between different clients, or transcoding to separate Emby client(s), usually after about 20 minutes. It can even occur with no transcoding through Emby, though less frequently.

The problem has never occurred, to my knowledge, when writing to the server, even from multiple clients. Nor has it occurred on writes from the server to the HDD on my main PC (which tend to transfer at a slower rate).

The server itself keeps running happily locally (i.e. connected to its own monitor and keyboard) and responds to commands. A reboot resets the problem.

I've been trying to troubleshoot the problem myself, though I'm not particularly familiar with FreeNAS and its logs.

When I use "tail -f /var/log/messages" I get the following recurring message:

home msk0: watchdog timeout
home kernel: msk0: link state changed to DOWN
home kernel: msk0: link state changed to DOWN
home kernel: msk0: link state changed to UP
home kernel: msk0: link state changed to UP

This repeats pretty much exactly every 30 seconds.
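For anyone wanting to quantify how often this happens, here is a minimal Python sketch that counts watchdog timeouts and link flaps per interface. It assumes log lines shaped like the excerpt above; the function name `summarize` and the sample text are illustrative only, not part of FreeNAS.

```python
import re
from collections import Counter

# Sample lines in the same shape as the /var/log/messages excerpt above.
SAMPLE = """\
home msk0: watchdog timeout
home kernel: msk0: link state changed to DOWN
home kernel: msk0: link state changed to DOWN
home kernel: msk0: link state changed to UP
home kernel: msk0: link state changed to UP
"""

# Assumed message formats, taken from the excerpt in this thread.
WATCHDOG = re.compile(r"\b(\w+): watchdog timeout")
LINK = re.compile(r"\b(\w+): link state changed to (UP|DOWN)")


def summarize(lines):
    """Count watchdog timeouts and link state changes per interface."""
    counts = Counter()
    for line in lines:
        m = WATCHDOG.search(line)
        if m:
            counts[(m.group(1), "watchdog timeout")] += 1
            continue
        m = LINK.search(line)
        if m:
            counts[(m.group(1), "link " + m.group(2))] += 1
    return counts


if __name__ == "__main__":
    for (iface, event), n in sorted(summarize(SAMPLE.splitlines()).items()):
        print(iface, event, n)
```

To run it against the live log instead of the sample, pass an open file handle for /var/log/messages to `summarize`, since it only needs an iterable of lines.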

Also I’ve bought an Intel EXPI9301CTBLK NIC and popped this in and the problem is exactly the same (though I haven’t disabled the inbuilt NIC since I wasn’t sure how to do this or if it would make a difference)

The problem seems to be reduced and happen less frequently if I tick for autotune to be enabled, but I suspect this is just artificially limiting system load.

I've run memtest86 for several cycles and it's reported no errors.

I'm running FreeNAS 11 with the following hardware. It meets the minimum hardware recommendations, but I know it's not perfect, and I apologise; it was an old PC I was keen to turn into a FreeNAS server. I'm planning on adding backup drives and regular backups once I get it stable. It's not meant to be an enterprise server but a simple home server, though I'm obviously keen for it to be reliable:

  • FreeNAS-11.1-U4
  • Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
  • Shuttle SP35P2 - Intel® P35 Mainboard
    Onboard - Marvell 88E8056 NIC (and Intel EXPI9301CTBLK – see above)
    Shuttle 400W Power Supply
  • Kingston 8GB (4x KVR800D2N6/2G) non-ECC RAM
  • 3x WD-RED 3 TB NAS Hard Drive – WD30EFRX
Any help would be greatly appreciated...
 

wisepass

Dabbler
Joined
May 26, 2018
Messages
13
On initial testing the above seems to have fixed my network drops as well, thanks. So it seems to be an issue with the Marvell NIC in FreeBSD. I read something about a network storm on the FreeBSD forums.

NB: on more extensive testing over some hours, despite 7 Emby streams and concurrent file transfers, my server hasn't missed a beat!
To anyone else with a Marvell NIC, I would suggest looking into the above links and thanking the OP.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Also I’ve bought an Intel EXPI9301CTBLK NIC and popped this in and the problem is exactly the same (though I haven’t disabled the inbuilt NIC since I wasn’t sure how to do this or if it would make a difference)
You need to have your network traffic on the Intel NIC and disable the Marvell interface. There have been many problems reported with the Marvell network hardware. If the problem persists after the Intel NIC is the only connection in use, then we can explore other options.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
This hardware is a bit older than what is recommended. It shouldn't be a show stopper, but you might want to consider going to something a little newer, when budget permits.

Here is some useful reading if you want to learn more:

Slideshow explaining VDev, zpool, ZIL and L2ARC
https://forums.freenas.org/index.ph...ning-vdev-zpool-zil-and-l2arc-for-noobs.7775/

Terminology and Abbreviations Primer
https://forums.freenas.org/index.php?threads/terminology-and-abbreviations-primer.28174/

FreeNAS® Quick Hardware Guide
https://forums.freenas.org/index.php?resources/freenas®-quick-hardware-guide.7/

Hardware Recommendations Guide (Rev 1e) 2017-05-06
https://forums.freenas.org/index.php?resources/hardware-recommendations-guide.12/
 

wisepass

Dabbler
Joined
May 26, 2018
Messages
13
You need to have your network traffic on the Intel NIC and disable the Marvell interface. There have been many problems reported with the Marvell network hardware. If the problem persists after the Intel NIC is the only connection in use, then we can explore other options.

As in the above link, the tunable "hw.msk.msi_disable" with Value = 1 seemed to fix my old Marvell network card dropping out. Thanks, depasseg.

See this link for more info - https://forum.netgate.com/topic/45928/kernel-msk1-watchdog-timeout/2

Thanks, Chris, for the info above; I'll keep looking into it as I start to use FreeNAS more.

 

wisepass

Dabbler
Joined
May 26, 2018
Messages
13
Reporting back after the above... Unfortunately it didn't work, so I'm wondering if anyone can still help.

I applied all the above settings to try to fix the msk timeout issue, but they just seemed to reduce the problem rather than fix it. At its best it was occurring about once per day and forcing me to reset my system.

So I gave up on the inbuilt Marvell NIC and tried to focus on the Intel EXPI9301CTBLK I had already bought. I disabled the inbuilt NIC with "ifconfig msk0 down", run as a script on startup.

Unfortunately I still seem to be getting exactly the same symptoms as before and the problem continues. The msk0 interface still appears in the FreeNAS reporting section and I believe I can see the driver being loaded at startup, though it doesn't appear in the interfaces section.

Is there any way the same driver being loaded could be causing the issue?

Is there another way to disable the interface? I've looked for a BIOS setting but haven't been able to find one.

I can't recreate the issue quite as easily as previously, so I'm not 100% sure it's an msk0 timeout message as before (supposedly an interrupt issue), but it's acting exactly the same even though the network is running through the em0 interface: under load the network stops while the system seems to be responding fine. I run it headless, though, so when it happens I can't easily connect a monitor to check.

Any ideas would help, thanks again!

I'm currently trying adding if_msk_load="NO" to loader.conf, but the interface still appears in the reporting section of FreeNAS as above, and I'm not sure yet whether the problem still exists...
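To keep track of which of the settings tried in this thread are actually present, here is a minimal Python sketch that parses loader.conf-style text. The sample contents and the helper name `parse_loader_conf` are hypothetical, and the sketch only reads text; it does not tell you whether the kernel honoured a setting. Note that a `*_load` variable only controls whether the loader loads a kernel module, so it has no effect on a driver that is compiled into the kernel. Whether that is the case for msk on FreeNAS is an assumption I haven't verified.

```python
def parse_loader_conf(text):
    """Return {name: value} for simple name="value" lines, ignoring comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip('"')
    return settings


# Hypothetical contents, for illustration only; the two names are the
# settings mentioned in this thread.
SAMPLE = '''\
# settings tried for the msk0 watchdog timeout
hw.msk.msi_disable="1"
if_msk_load="NO"
'''

if __name__ == "__main__":
    found = parse_loader_conf(SAMPLE)
    for name in ("hw.msk.msi_disable", "if_msk_load"):
        print(name, "=", found.get(name, "(not set)"))
```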

I get the below from ifconfig...

em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=2098<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>
	ether 68:05:ca:01:f7:f5
	hwaddr 68:05:ca:01:f7:f5
	inet 192.168.0.20 netmask 0xffffff00 broadcast 192.168.0.255
	nd6 options=9<PERFORMNUD,IFDISABLED>
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
msk0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=c011a<TXCSUM,VLAN_MTU,VLAN_HWTAGGING,TSO4,VLAN_HWTSO,LINKSTATE>
	ether 00:30:1b:bc:bc:7d
	hwaddr 00:30:1b:bc:bc:7d
	nd6 options=9<PERFORMNUD,IFDISABLED>
	media: Ethernet autoselect
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
	inet 127.0.0.1 netmask 0xff000000
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	groups: lo
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	ether 02:ff:a7:90:5d:00
	nd6 options=1<PERFORMNUD>
	groups: bridge
	id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
	maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
	root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
	member: epair0a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 5 priority 128 path cost 2000
	member: em0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 1 priority 128 path cost 20000
epair0a: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8<VLAN_MTU>
	ether 02:a7:90:00:05:0a
	hwaddr 02:a7:90:00:05:0a
	nd6 options=1<PERFORMNUD>
	media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
	status: active
	groups: epair
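To read the output above at a glance, here is a minimal Python sketch that pulls each interface's flags and status line out of ifconfig text. It assumes the layout shown in this post; the function name `parse_ifconfig` and the trimmed sample are illustrative only. On the excerpt it reports em0 as UP with status "active", and msk0 as present but without the UP flag and with no status line listed, i.e. the interface is down but the device is still attached.

```python
import re

# An interface block starts with "name: flags=HEX<FLAG,FLAG,...>".
# Bridge "member: em0 flags=..." lines do not match, because "flags="
# must follow the colon and space directly.
HEADER = re.compile(r"^(\S+): flags=[0-9a-fA-F]+<([^>]*)>")
STATUS = re.compile(r"^status: (.+)$")


def parse_ifconfig(text):
    """Return {interface: {"flags": [...], "status": str or None}}."""
    interfaces = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()  # works with or without leading indentation
        m = HEADER.match(line)
        if m:
            current = m.group(1)
            flags = m.group(2).split(",") if m.group(2) else []
            interfaces[current] = {"flags": flags, "status": None}
            continue
        m = STATUS.match(line)
        if m and current:
            interfaces[current]["status"] = m.group(1)
    return interfaces


# Trimmed excerpt of the ifconfig output posted above.
SAMPLE = """\
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
msk0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
media: Ethernet autoselect
"""

if __name__ == "__main__":
    for name, info in parse_ifconfig(SAMPLE).items():
        print(name, "UP" in info["flags"], info["status"])
```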
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Just popping in here, please let us know if you have already done this...

1. In the BIOS, disable your onboard NIC, you can't just disable it in FreeNAS. You will only be using your Intel NIC, I have this same NIC and it works great.
2. Remove any tunable you added for this problem. The Intel NIC should work perfectly without any modifications.
3. If the problem remains then you probably have a network issue, a poor cable or something else causing your issue. Using a new Ethernet cable, directly connect your NAS to your main computer. Test it out and verify the connection doesn't drop. If the connection remains good then troubleshoot your network; maybe it's your switch or the cable between your switch and the FreeNAS machine. If you still have the issue then try another Ethernet cable to rule it all out. I have to assume that your main computer is reliable; you need to start somewhere.
4. If you still have the drops then you must have a hardware issue with your FreeNAS computer.

Post your comments.
 

wisepass

Dabbler
Joined
May 26, 2018
Messages
13
Just popping in here, please let us know if you have already done this...

1. In the BIOS, disable your onboard NIC, you can't just disable it in FreeNAS. You will only be using your Intel NIC, I have this same NIC and it works great.
2. Remove any tunable you added for this problem. The Intel NIC should work perfectly without any modifications.
3. If the problem remains then you probably have a network issue, a poor cable or something else causing your issue. Using a new Ethernet cable, directly connect your NAS to your main computer. Test it out and verify the connection doesn't drop. If the connection remains good then troubleshoot your network; maybe it's your switch or the cable between your switch and the FreeNAS machine. If you still have the issue then try another Ethernet cable to rule it all out. I have to assume that your main computer is reliable; you need to start somewhere.
4. If you still have the drops then you must have a hardware issue with your FreeNAS computer.

Post your comments.

Thanks for the reply, I do appreciate it. I've responded below. My aim is to properly disable the Marvell interface and/or driver in FreeNAS.

1. There is no option in the BIOS (that I can see) to disable this NIC.

NB: I've had a good look through this online manual for the BIOS and it doesn't seem to have that option: https://docplayer.net/32327576-Xpc-user-guide-for-the-sp35p2.html

2. I've already removed the tunable and any other added options.

3. I've already looked at this. I've tried connecting directly to my computer and the issue still occurs. It also occurs with other systems on the network, including multiple devices streaming media from the NAS, and sometimes when just transferring files. It also occurs in multiple places within my network and behind different routers. I still suspect it's the Marvell NIC and the "msk0 timeout" (lots of sources when googled), since the symptoms are the same. This is a well-described problem with this driver; some describe it as a network storm or interrupt storm. I've read other posts where people have installed a new NIC and the problem continued, and I've not seen a solution posted anywhere.

4. Indeed, I think it is a hardware problem with the Marvell NIC and likely its driver: a well-described one, and one I can't see anyone who reported it having completely resolved. One would think it should be as simple as disabling the interface or driver, but as stated I've tried "ifconfig msk0 down" and this certainly isn't working. I believe I see the driver loading on boot, and the reporting section certainly still shows msk0 present (with no network traffic).

I'm currently trying to see if adding if_msk_load="NO" to loader.conf solves the issue, but the interface still appears in the reporting section in FreeNAS. It hasn't crashed as of yet; I'll give it a try over the coming few weeks and report back.

Any other ideas to disable the driver / interface?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I looked at the user manuals, they suck. I didn't see any BIOS option or jumper to disable the onboard LAN. The user guide doesn't always list everything, so if you have actually gone through the BIOS then I believe you.

Not sure what else would help here, maybe you have the option to change the IRQ for the onboard NIC? Hopefully your last tweak will resolve the issue for you.

Good Luck!
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Shuttle SP35P2 - Intel® P35 Mainboard
Onboard - Marvell 88E8056 NIC (and Intel EXPI9301CTBLK – see above)
Shuttle 400W Power Supply
That is quite old (LGA775 / Core 2 Duo) and it is probably time to look at something newer that is properly compatible. You don't need to come up to the most modern hardware, but something a bit newer than that would be a big step forward.
 

wisepass

Dabbler
Joined
May 26, 2018
Messages
13
I looked at the user manuals, they suck. I didn't see any BIOS option or jumper to disable the onboard LAN. The user guide doesn't always list everything, so if you have actually gone through the BIOS then I believe you.

Not sure what else would help here, maybe you have the option to change the IRQ for the onboard NIC? Hopefully your last tweak will resolve the issue for you.

Good Luck!

I'd thought of possibly looking into the IRQ section, but if I just change its IRQ, will that disable it? I suspect it won't fix the problem, though I'm not 100% sure.

I hadn't thought of a jumper! I'll see how things go for now. I've been blasting the thing with everything I can and it's holding up fine at present, but it's a strange problem and it seemed to occur more randomly recently. If it's still not stable I'll pull things out and see if I can find a jumper. Thanks for the help so far. I just thought I was being stupid and lacking in Linux knowledge, and that there might be a really simple method for disabling hardware I wasn't thinking of.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
just thought I was being stupid and lacking in linux knowledge
FreeNAS is FreeBSD Unix, not Linux, but with heavy customization to make it into an appliance.
 

wisepass

Dabbler
Joined
May 26, 2018
Messages
13
FreeNAS is FreeBSD Unix, not Linux, but with heavy customization to make it into an appliance.

My bad... Unix-based, sorry!

NB - As an update the above didn't work and I can't locate a jumper for the board so I'm giving up!
 

taaangy234

Dabbler
Joined
Dec 7, 2016
Messages
18
I'm running FreeNAS on an ESXi 6.7 box. Whenever I transfer a big file that takes more than 10 seconds, the ESXi host NIC drops. Either force-rebooting the host or reseating the network cable re-establishes the connection. I have dual onboard Intel NICs.
 

taaangy234

Dabbler
Joined
Dec 7, 2016
Messages
18
Update! The onboard dual Intel Corporation 82575EB NIC for some reason doesn't work. I went ahead and installed the Intel Corporation 82571EB dual NIC card I have, so I now have 4 ports connected. Within my ESXi vSwitch0 I added the two new NIC uplinks, then removed the two onboard NIC uplinks, and that works!!! I can't explain why an onboard Intel NIC is also giving issues.
 

Jumanjii1

Cadet
Joined
Feb 23, 2020
Messages
1
This advice still works well. I had the same problem while moving large quantities of data to my newly installed FreeNAS 11.3 server. I had finally tired of Windows Server 2003 Datacenter on the machine and decided to try FreeNAS on it. Everything was going great, and I was starting to move data over to a couple of terabyte hard drives in a mirrored configuration, when I started getting that "msk0: watchdog timeout" error. After rebooting the server and again moving data over to it, the error came back. I looked around the net and wound up on this site for a fix. All it took was unplugging the network cable from the Marvell interface port, plugging it into the Intel port, and reconfiguring the network to run on this new port, and away I went. The server's been reliable enough since that I can continue to move data over to it.

Thanks everybody
 