Hi EzEkil and @whodat, I too have this e1000e "Detected Hardware Unit Hang" with the Intel I219-V Ethernet interface on my ASRock H670M Pro RS motherboard.
I know it's been some time for you, but have you gotten past this issue?
I am currently running TrueNAS SCALE 23.10.0.1.
Code:
Jan 17 06:00:03 serveur4 systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
Jan 17 06:00:03 serveur4 systemd[1]: sysstat-collect.service: Deactivated successfully.
Jan 17 06:00:03 serveur4 systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
Jan 17 06:00:15 serveur4 kernel: e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
TDH <c6>
TDT <f3>
next_to_use <f3>
next_to_clean <c5>
buffer_info[next_to_clean]:
time_stamp <101fb00cb>
next_to_watch <c6>
jiffies <101fb02b0>
next_to_watch.status <0>
MAC Status <40080083>
PHY Status <796d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
Jan 17 06:00:17 serveur4 kernel: e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
TDH <c6>
TDT <f3>
next_to_use <f3>
next_to_clean <c5>
buffer_info[next_to_clean]:
time_stamp <101fb00cb>
next_to_watch <c6>
jiffies <101fb04a8>
next_to_watch.status <0>
MAC Status <40080083>
PHY Status <796d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
Jan 17 06:00:19 serveur4 kernel: e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
TDH <c6>
TDT <f3>
next_to_use <f3>
next_to_clean <c5>
buffer_info[next_to_clean]:
time_stamp <101fb00cb>
next_to_watch <c6>
jiffies <101fb0698>
next_to_watch.status <0>
MAC Status <40080083>
PHY Status <796d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
Jan 17 06:00:21 serveur4 kernel: e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
TDH <c6>
TDT <f3>
next_to_use <f3>
next_to_clean <c5>
buffer_info[next_to_clean]:
time_stamp <101fb00cb>
next_to_watch <c6>
jiffies <101fb0890>
next_to_watch.status <0>
MAC Status <40080083>
PHY Status <796d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
Jan 17 06:00:22 serveur4 kernel: ------------[ cut here ]------------
Jan 17 06:00:22 serveur4 kernel: NETDEV WATCHDOG: enp0s31f6 (e1000e): transmit queue 0 timed out
Jan 17 06:00:22 serveur4 kernel: WARNING: CPU: 6 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x207/0x210
Jan 17 06:00:22 serveur4 kernel: Modules linked in: rpcsec_gss_krb5(E) nls_ascii(E) nls_cp437(E) vfat(E) fat(E) xt_tcpudp(E) nft_log(E) nft_limit(E) xt_limit(E) xt_NFLOG(E) nfnetlink_log(E) xt_physdev(E) veth(E) tls(E) xt_multiport(E) xt_addrtype(E) ip_vs_rr(E) dummy(E) ipt_REJECT(E) nf_reject_ipv4(E) ip_set_hash_ipport(E) xt_nat(E) xt_ipvs(E) xt_set(E) ip_vs(E) ip_set_hash_ip(E) ip_set_hash_net(E) ip_set(E) xt_MASQUERADE(E) nft_chain_nat(E) xt_mark(E) xt_conntrack(E) xt_comment(E) nft_compat(E) nf_tables(E) nfnetlink(E) iptable_filter(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) overlay(E) br_netfilter(E) vhost_net(E) vhost(E) vhost_iotlb(E) tap(E) tun(E) scst_vdisk(OE) isert_scst(OE) iscsi_scst(OE) scst(OE) rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) dlm(E) nvme_fabrics(E) binfmt_misc(E) bridge(E) stp(E) llc(E) ntb_netdev(E) ntb_transport(E) ntb_split(E) ntb(E) ioatdma(E) dca(E) essiv(E) authenc(E) dm_crypt(E) snd_hda_codec_hdmi(E) snd_hda_codec_realtek(E)
Jan 17 06:00:22 serveur4 kernel: snd_hda_codec_generic(E) ledtrig_audio(E) intel_rapl_msr(E) intel_rapl_common(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) snd_sof_pci_intel_tgl(E) kvm_intel(E) snd_sof_intel_hda_common(E) snd_sof_intel_hda(E) kvm(E) snd_sof_pci(E) irqbypass(E) snd_sof_xtensa_dsp(E) snd_sof(E) snd_sof_utils(E) ghash_clmulni_intel(E) snd_soc_hdac_hda(E) snd_hda_ext_core(E) sha512_ssse3(E) snd_soc_acpi_intel_match(E) snd_soc_acpi(E) sha512_generic(E) i915(E) snd_soc_core(E) snd_compress(E) aesni_intel(E) snd_hda_intel(E) snd_intel_dspcfg(E) crypto_simd(E) drm_buddy(E) cryptd(E) snd_hda_codec(E) rapl(E) drm_display_helper(E) intel_cstate(E) snd_hda_core(E) cec(E) snd_hwdep(E) mei_hdcp(E) intel_uncore(E) rc_core(E) snd_pcm(E) ttm(E) wmi_bmof(E) snd_timer(E) pcspkr(E) iTCO_wdt(E) mei_me(E) snd(E) drm_kms_helper(E) intel_pmc_bxt(E) iTCO_vendor_support(E) i2c_algo_bit(E) watchdog(E) mei(E) soundcore(E) ee1004(E) intel_pmc_core(E) acpi_pad(E) acpi_tad(E) evdev(E) joydev(E) button(E)
Jan 17 06:00:22 serveur4 kernel: sg(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) drm(E) sunrpc(E) fuse(E) loop(E) efi_pstore(E) dm_mod(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) zfs(POE) spl(OE) efivarfs(E) raid10(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) raid1(E) raid0(E) multipath(E) linear(E) md_mod(E) sd_mod(E) ses(E) enclosure(E) hid_generic(E) usbhid(E) hid(E) nvme(E) nvme_core(E) t10_pi(E) ahci(E) ahciem(E) mpt3sas(E) xhci_pci(E) libahci(E) crc64_rocksoft(E) raid_class(E) crc64(E) e1000e(E) crc_t10dif(E) scsi_transport_sas(E) xhci_hcd(E) crct10dif_generic(E) crc32_pclmul(E) libata(E) i2c_i801(E) intel_lpss_pci(E) ptp(E) crc32c_intel(E) crct10dif_pclmul(E) i2c_smbus(E) pps_core(E) usbcore(E) intel_lpss(E) scsi_mod(E) crct10dif_common(E) usb_common(E) idma64(E) scsi_common(E) video(E) wmi(E)
Jan 17 06:00:22 serveur4 kernel: CPU: 6 PID: 0 Comm: swapper/6 Tainted: P OE 6.1.55-production+truenas #2
Jan 17 06:00:22 serveur4 kernel: Hardware name: To Be Filled By O.E.M. H670M Pro RS/H670M Pro RS, BIOS 9.02 09/07/2022
Jan 17 06:00:22 serveur4 kernel: RIP: 0010:dev_watchdog+0x207/0x210
Jan 17 06:00:22 serveur4 kernel: Code: 00 e9 40 ff ff ff 48 89 df c6 05 91 35 3f 01 01 e8 6e de fa ff 44 89 e9 48 89 de 48 c7 c7 38 a6 3f 89 48 89 c2 e8 69 af 88 ff <0f> 0b e9 22 ff ff ff 66 90 0f 1f 44 00 00 55 53 48 89 fb 48 8b 6f
Jan 17 06:00:22 serveur4 kernel: RSP: 0018:ffffae8c002b0e80 EFLAGS: 00010286
Jan 17 06:00:22 serveur4 kernel: RAX: 0000000000000000 RBX: ffff9af715130000 RCX: 0000000000000000
Jan 17 06:00:22 serveur4 kernel: RDX: 0000000000000104 RSI: ffffffff8938a806 RDI: 00000000ffffffff
Jan 17 06:00:22 serveur4 kernel: RBP: ffff9af715130488 R08: 0000000000000000 R09: ffffae8c002b0cf0
Jan 17 06:00:22 serveur4 kernel: R10: 0000000000000003 R11: ffffffff89ad40a8 R12: ffff9af7151303dc
Jan 17 06:00:22 serveur4 kernel: R13: 0000000000000000 R14: ffffffff888117b0 R15: ffff9af715130488
Jan 17 06:00:22 serveur4 kernel: FS: 0000000000000000(0000) GS:ffff9b064f780000(0000) knlGS:0000000000000000
Jan 17 06:00:22 serveur4 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 17 06:00:22 serveur4 kernel: CR2: 00007ff2324f3000 CR3: 0000000afb410000 CR4: 0000000000752ee0
Jan 17 06:00:22 serveur4 kernel: PKRU: 55555554
Jan 17 06:00:22 serveur4 kernel: Call Trace:
Jan 17 06:00:22 serveur4 kernel: <IRQ>
Jan 17 06:00:22 serveur4 kernel: ? __warn+0x7d/0xc0
Jan 17 06:00:22 serveur4 kernel: ? dev_watchdog+0x207/0x210
Jan 17 06:00:22 serveur4 kernel: ? report_bug+0xe6/0x170
Jan 17 06:00:22 serveur4 kernel: ? irq_work_queue+0xa/0x50
Jan 17 06:00:22 serveur4 kernel: ? handle_bug+0x41/0x70
Jan 17 06:00:22 serveur4 kernel: ? exc_invalid_op+0x13/0x60
Jan 17 06:00:22 serveur4 kernel: ? asm_exc_invalid_op+0x16/0x20
Jan 17 06:00:22 serveur4 kernel: ? pfifo_fast_reset+0x140/0x140
Jan 17 06:00:22 serveur4 kernel: ? dev_watchdog+0x207/0x210
Jan 17 06:00:22 serveur4 kernel: ? pfifo_fast_reset+0x140/0x140
Jan 17 06:00:22 serveur4 kernel: call_timer_fn+0x24/0x130
Jan 17 06:00:22 serveur4 kernel: __run_timers+0x21c/0x2a0
Jan 17 06:00:22 serveur4 kernel: run_timer_softirq+0x2b/0x50
Jan 17 06:00:22 serveur4 kernel: __do_softirq+0xed/0x2fe
Jan 17 06:00:22 serveur4 kernel: __irq_exit_rcu+0xc7/0x130
Jan 17 06:00:22 serveur4 kernel: sysvec_apic_timer_interrupt+0x9e/0xc0
Jan 17 06:00:22 serveur4 kernel: </IRQ>
Jan 17 06:00:22 serveur4 kernel: <TASK>
Jan 17 06:00:22 serveur4 kernel: asm_sysvec_apic_timer_interrupt+0x16/0x20
Jan 17 06:00:22 serveur4 kernel: RIP: 0010:cpuidle_enter_state+0xde/0x420
Jan 17 06:00:22 serveur4 kernel: Code: 00 00 31 ff e8 33 f9 97 ff 45 84 ff 74 16 9c 58 0f 1f 40 00 f6 c4 02 0f 85 25 03 00 00 31 ff e8 68 6c 9e ff fb 0f 1f 44 00 00 <45> 85 f6 0f 88 85 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d
Jan 17 06:00:22 serveur4 kernel: RSP: 0018:ffffae8c00197e90 EFLAGS: 00000246
Jan 17 06:00:22 serveur4 kernel: RAX: ffff9b064f780000 RBX: ffff9b064f7bbe00 RCX: 0000000000000000
Jan 17 06:00:22 serveur4 kernel: RDX: 0000000000000006 RSI: ffffffff8938a806 RDI: ffffffff89364511
Jan 17 06:00:22 serveur4 kernel: RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000026c27b0a
Jan 17 06:00:22 serveur4 kernel: R10: 0000000000000018 R11: 0000000000002448 R12: ffffffff89b9efa0
Jan 17 06:00:22 serveur4 kernel: R13: 000079280728e19d R14: 0000000000000003 R15: 0000000000000000
Jan 17 06:00:22 serveur4 kernel: cpuidle_enter+0x29/0x40
Jan 17 06:00:22 serveur4 kernel: do_idle+0x20c/0x2b0
Jan 17 06:00:22 serveur4 kernel: cpu_startup_entry+0x19/0x20
Jan 17 06:00:22 serveur4 kernel: start_secondary+0x130/0x150
Jan 17 06:00:22 serveur4 kernel: secondary_startup_64_no_verify+0xe5/0xeb
Jan 17 06:00:22 serveur4 kernel: </TASK>
Jan 17 06:00:22 serveur4 kernel: ---[ end trace 0000000000000000 ]---
Jan 17 06:00:22 serveur4 kernel: e1000e 0000:00:1f.6 enp0s31f6: Reset adapter unexpectedly
Jan 17 06:00:22 serveur4 kernel: br0: port 1(enp0s31f6) entered disabled state
Jan 17 06:00:26 serveur4 kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jan 17 06:00:26 serveur4 kernel: br0: port 1(enp0s31f6) entered blocking state
Jan 17 06:00:26 serveur4 kernel: br0: port 1(enp0s31f6) entered listening state
Jan 17 06:00:27 serveur4 k3s[6163]: {"level":"warn","ts":"2024-01-17T06:00:27.856-0500","logger":"etcd-client","caller":"v3@v3.5.7-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000590000/kine.sock","attempt":0,"error":"rpc error: code = Unknown desc = no such table: dbstat"}
Jan 17 06:00:28 serveur4 kernel: br0: port 1(enp0s31f6) received tcn bpdu
Jan 17 06:00:28 serveur4 kernel: br0: topology change detected, propagating
Jan 17 06:00:28 serveur4 kernel: br0: port 1(enp0s31f6) received tcn bpdu
Jan 17 06:00:28 serveur4 kernel: br0: topology change detected, propagating
Jan 17 06:00:41 serveur4 kernel: br0: port 1(enp0s31f6) entered learning state
Jan 17 06:00:56 serveur4 kernel: br0: port 1(enp0s31f6) entered forwarding state
Jan 17 06:00:56 serveur4 kernel: br0: topology change detected, propagating
Jan 17 06:01:01 serveur4 k3s[6163]: time="2024-01-17T06:01:01-05:00" level=info msg="COMPACT compactRev=9430489 targetCompactRev=9431002 currentRev=9432002"
Jan 17 06:01:01 serveur4 k3s[6163]: time="2024-01-17T06:01:01-05:00" level=info msg="COMPACT deleted 513 rows from 513 revisions in 5.763759ms - compacted to 9431002/9432002"
On my system, the driver seems to detect the hang and recover from it automatically, so the impact is much smaller. I get disconnections across the network for a few seconds, and then everything returns to normal for a few hours or a full day. The strangest thing is that when the error occurs, it occurs at exactly the same time of day: 6:01 AM and/or 8:01 AM (in other words, once or twice per day). That is when a VM (a Proxmox Backup Server) syncs its datastore over the network, so it is a peak of network activity. However, it is not the only time of day when this system sees peaks of network activity.
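In case it helps anyone check for the same timing pattern, here is a rough Python sketch that tallies the hang reports by hour of day and shows how long the stuck descriptor has been pending. It assumes log lines in the same syslog-style format as my paste above. The function names (`count_hangs_by_hour`, `stall_jiffies`) are just my own helpers, not part of any tool. The stall is reported in jiffies only, since converting to seconds depends on the kernel's HZ setting, which I am not assuming here.

```python
import re
from collections import Counter

# Matches lines such as:
#   Jan 17 06:00:15 serveur4 kernel: e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
# and captures the hour of day. The format is assumed from the log pasted above.
HANG_RE = re.compile(
    r"^\w{3}\s+\d{1,2}\s+(\d{2}):\d{2}:\d{2}\s+\S+\s+kernel:.*e1000e.*"
    r"Detected Hardware Unit Hang"
)


def count_hangs_by_hour(lines):
    """Return a Counter mapping hour of day ("00".."23") to hang report count."""
    counts = Counter()
    for line in lines:
        match = HANG_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def stall_jiffies(time_stamp_hex, jiffies_hex):
    """Difference between the 'jiffies' and 'time_stamp' fields of one report.

    Both fields are printed in hex without a 0x prefix. The result is in
    jiffies; converting to seconds would require knowing the kernel's HZ.
    """
    return int(jiffies_hex, 16) - int(time_stamp_hex, 16)


# Two hang headers and one unrelated line, copied from the log above.
sample = [
    "Jan 17 06:00:15 serveur4 kernel: e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:",
    "Jan 17 06:00:17 serveur4 kernel: e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:",
    "Jan 17 06:00:22 serveur4 kernel: e1000e 0000:00:1f.6 enp0s31f6: Reset adapter unexpectedly",
]
hours = count_hangs_by_hour(sample)

# time_stamp and jiffies taken from the first report in the log above.
first_stall = stall_jiffies("101fb00cb", "101fb02b0")
```

To run it on a real system, you could pass `sys.stdin` to `count_hangs_by_hour` and pipe in the output of `journalctl -k`, assuming its default output looks like the paste above. In my log, `time_stamp` stays at `101fb00cb` across all four reports while `jiffies` keeps advancing, so it looks like the same transmit descriptor stays stuck until the watchdog resets the adapter.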