[SOLVED] Pool import fails with panic: VERIFY message

Okeur75

Dabbler
Joined
Nov 16, 2022
Messages
36
I'm glad we've got the pool mounted - what's the NIC bug that's choking out transfers? The ASRock X399 series says "Intel NIC" on their page but doesn't specify in any more detail. The only issues I've ever had with Intel cards have been with the i225-V
I thought it was the NIC, but it still crashes after I installed a dedicated PCI network card.

I get the following error message after a few minutes of transfer; the only workaround is to reboot the "old" server that I'm copying the data from.
Rebooting the receiving server or the internet box has no effect.

Code:
ssh_dispatch_run_fatal: Connection to 192.168.2.26 port 22: message authentication code incorrect
rsync: connection unexpectedly closed (37993623202 bytes received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(228) [receiver=3.2.3]
rsync: connection unexpectedly closed (110500 bytes received so far) [generator]
rsync error: unexplained error (code 255) at io.c(228) [generator=3.2.3]
rsync: [generator] write error: Broken pipe (32)


dmesg is flooded with
Code:
[  844.127170] pcieport 0000:00:01.1: AER: Corrected error received: 0000:00:00.0
[  844.128413] pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
[  844.129393] pcieport 0000:00:01.1:   device [1022:1453] error status/mask=00000080/00006000
[  844.130313] pcieport 0000:00:01.1:    [ 7] BadDLLP


and lspci returns this "device" for 01.1


Code:
root@truenas[/Storage/Emby-DS/Films]# lspci -n -vvv -s 0000:00:01.1
00:01.1 0604: 1022:1453 (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin ? routed to IRQ 28
        IOMMU group: 1
        Bus: primary=00, secondary=01, subordinate=07, sec-latency=0
        I/O behind bridge: 00001000-00002fff [size=8K]
        Memory behind bridge: fa000000-fa3fffff [size=4M]
        Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff [disabled]
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] Express (v2) Root Port (Slot+), MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0
                        ExtTag+ RBE+
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                LnkCap: Port #1, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 <64us
                        ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s (ok), Width x4 (ok)
                        TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
                        Slot #0, PowerLimit 0.000W; Interlock- NoCompl+
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
                        Changed: MRL- PresDet- LinkState+
                RootCap: CRSVisible+
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible+
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR-
                         10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- LN System CLS Not Supported, TPHComp- ExtTPHComp- ARIFwd+
                         AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS-
                DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis- LTR- OBFF Disabled, ARIFwd-
                         AtomicOpsCtl: ReqEn- EgressBlck-
                LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
                         EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee00000  Data: 0000
        Capabilities: [c0] Subsystem: 1022:1453
        Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+
        Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
                RootCmd: CERptEn+ NFERptEn+ FERptEn+
                RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd-
                         FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0
                ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000
        Capabilities: [270 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-
                LaneErrStat: LaneErr at lane: 0 1 2 3
        Capabilities: [2a0 v1] Access Control Services
                ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+
                ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
        Capabilities: [370 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2- PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+ L1_PM_Substates+
                L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
                L1SubCtl2:
        Capabilities: [380 v1] Downstream Port Containment
                DpcCap: INT Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 6, DL_ActiveErr+
                DpcCtl: Trigger:1 Cmpl- INT+ ErrCor- PoisonedTLP- SwTrigger- DL_ActiveErr-
                DpcSta: Trigger- Reason:00 INT- RPBusy- TriggerExt:00 RP PIO ErrPtr:1f
                Source: 0000
        Capabilities: [3c4 v1] Designated Vendor-Specific: Vendor=1022 ID=0001 Rev=1 Len=44 <?>
        Kernel driver in use: pcieport
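
To see which port is reporting the corrected errors and how often, the AER lines from dmesg can be tallied per device. This is only a minimal sketch that assumes the log format shown above; `count_aer_corrected` is a hypothetical helper name, and other kernel versions may word the message differently.

```shell
#!/bin/sh
# Count AER "Corrected error received" lines per reporting port.
# Reads kernel log text on stdin, so it works on live dmesg output
# or on a saved log file.
count_aer_corrected() {
    # Matches lines such as:
    #   pcieport 0000:00:01.1: AER: Corrected error received: 0000:00:00.0
    # and prints "<port> <count>" for every port seen.
    awk '/AER: Corrected error received/ {
             for (i = 1; i <= NF; i++)
                 if ($i == "pcieport") {
                     port = $(i + 1)
                     sub(/:$/, "", port)   # drop the trailing colon
                     n[port]++
                 }
         }
         END { for (p in n) print p, n[p] }' | sort
}
```

Typical use would be `dmesg | count_aer_corrected`. A count that keeps climbing on a single port points at the link behind that port (slot, riser, or card) rather than at the network.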
 

Okeur75

Edit: It was the internet box not handling the traffic. Connecting both servers through a small switch works, and now everything is copying.

I'll mark this thread as resolved. Thank you @jgreco and @HoneyBadger for the help!

Root cause: Probably a defective PCI card (SAS3008) combined with deduplication being enabled
Fix: Mount the pool read-only through the CLI, copy everything to another server, then get rid of the original pool

Advice:
If you want to test deduplication, do it on a dedicated pool, not just on a dedicated dataset.
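
For anyone landing here with the same panic, the fix above can be sketched roughly as follows. The pool name `Storage` comes from the logs in this thread; the source path, the destination host, and the `run` / `rescue_pool` helpers are placeholders I made up for illustration, so adjust them to your setup. By default the script only prints the commands; set `RUN=1` to actually execute them.

```shell
#!/bin/sh
# Sketch of the recovery path: import the pool read-only, then copy data off.
# POOL, SRC and DEST are placeholders (assumptions), not values to copy blindly.
POOL="${POOL:-Storage}"
SRC="${SRC:-/Storage/}"
DEST="${DEST:-backup@newserver:/mnt/NewPool/}"

run() {
    # Print the command unless RUN=1, so nothing touches the pool by accident.
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

rescue_pool() {
    # readonly=on asks ZFS to import without writing to the pool, which
    # should keep it away from the ddt_sync code path that panics.
    run zpool import -o readonly=on "$POOL"
    # -a preserves permissions and times; --partial keeps interrupted files
    # so a dropped connection does not restart large transfers from zero.
    run rsync -a --partial "$SRC" "$DEST"
}
```

Calling `rescue_pool` with `RUN` unset is a dry run that shows the two commands. Once the copy has been verified on the other server, the original pool can be destroyed and rebuilt.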

Some logs for SEO :
Code:
metaslab.c:2422:metaslab_load_impl(): metaslab_load: txg 11537026, spa Storage, vdev_id 0, ms_id 264, smp_length 57712, unflushed_allocs 688128, unflushed_frees 205886832640, freed 0, defer 0 + 0, unloaded time 91390 ms, loading_time 16ms, ms_max_size 274

panic: VERIFY(ddt_object_remove(ddt, otype, oclass, dde, tx) == 0) failed

cpuid = 0
time = 1671110509
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b
vpanic() at vpanic+0x17b
spl_panic() at spl_panic+0x3a
ddt_sync() at ddt_sync+0xc93
dsl_scan_sync() at dsl_scan_sync+0x6c9
spa_sync() at spa_sync+0xac7
txg_sync_thread() at txg_sync_thread+0x413
fork_exit() at fork_exit+0x7e
fork_trampoline() at fork_trampoline+0xe
KDB: enter: panic
[ thread pid 22 tid 101036 ]
Stopped at kdb_enter+0x37: movq
db:0:kdb.enter.default> write cn_mute 1
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Root cause: Probably a defective PCI card (SAS3008) combined with deduplication being enabled
Fix: Mount the pool read-only through the CLI, copy everything to another server, then get rid of the original pool

Advice:
If you want to test deduplication, do it on a dedicated pool, not just on a dedicated dataset.

Glad you've worked this all out. Hopefully now you can get to the fun part of enjoying your NAS.

Happy Holidays!
 