Critical error disappears on restart

Status
Not open for further replies.

clement

Dabbler
Joined
Feb 2, 2015
Messages
10
Hi, I built a FreeNAS box a year ago on a Supermicro motherboard with 8 GB of ECC RAM and 3x 1 TB WD Red drives in RAIDZ1 (I know it's not the most secure setup, but it fits my needs since the data isn't very important).
However, after each of the last 3 scrubs I get a critical error saying a disk is faulty (not always the same disk). When I check the SMART test output everything is fine; then I reboot FreeNAS and the error disappears.

To find out whether a disk was really in trouble, last time, after the error disappeared, I rebooted 3 times, disconnecting a different disk each time, and the pool was accessible each time. So in my opinion all the disks are fine, right?
Is this a known bug, or should I worry that something deeper is going on?
Thanks
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
  1. What version are you running (have you applied updates since you built it a year ago)?
  2. In regards to SMART Tests, are you running the "Long Self Tests" or just looking at results for the "Short Self Tests"?
  3. If possible, post (In Code Tags) output of: dmesg
 

clement

Dabbler
Joined
Feb 2, 2015
Messages
10
Hi, I'm on 9.3 Stable and update regularly every two weeks; currently I'm up to date. This morning, after a long SMART test, it gave me a degraded pool again, yet the output of the SMART test is OK. I just restarted the NAS and everything is fine again.
zpool status gives me this:
Code:
pool: zfs
state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
    attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
    using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub canceled on Sun Feb  7 02:22:50 2016
config:

    NAME                                            STATE     READ WRITE CKSUM
    zfs                                             ONLINE       0     0     0
      raidz1-0                                      ONLINE       0     0     0
        gptid/aae4046f-5df1-11e5-b5d1-00304867cb9e  ONLINE       0     0     2
        gptid/ab5d8500-5df1-11e5-b5d1-00304867cb9e  ONLINE       0     0     0
        gptid/ac0774bd-5df1-11e5-b5d1-00304867cb9e  ONLINE       0     0     0
    cache
      gptid/dda3435f-5df1-11e5-b5d1-00304867cb9e    ONLINE       0     0     0

errors: No known data errors
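As a rough illustration of how to read the output above, here is a minimal Python sketch that pulls out the devices with non-zero READ/WRITE/CKSUM counters from `zpool status` text. The function name `devices_with_errors` is hypothetical, and the parser assumes plain integer counters (it does not handle abbreviated counts), so treat it as a sketch rather than a robust tool.

```python
def devices_with_errors(zpool_status_output: str) -> dict:
    """Map device name -> (read, write, cksum) for rows with any non-zero counter.

    Assumption: counters are plain integers; rows that do not look like
    'NAME STATE READ WRITE CKSUM' data rows are skipped.
    """
    flagged = {}
    for line in zpool_status_output.splitlines():
        fields = line.split()
        # Data rows have exactly five columns and the last three are numeric.
        if len(fields) == 5 and all(f.isdigit() for f in fields[2:]):
            read, write, cksum = (int(f) for f in fields[2:])
            if read or write or cksum:
                flagged[fields[0]] = (read, write, cksum)
    return flagged


# Sample rows copied from the zpool status output in this post.
sample = """\
    NAME                                            STATE     READ WRITE CKSUM
    zfs                                             ONLINE       0     0     0
      raidz1-0                                      ONLINE       0     0     0
        gptid/aae4046f-5df1-11e5-b5d1-00304867cb9e  ONLINE       0     0     2
        gptid/ab5d8500-5df1-11e5-b5d1-00304867cb9e  ONLINE       0     0     0
"""

print(devices_with_errors(sample))
```

On this sample only the first gptid device is reported, with a CKSUM count of 2.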



and the output of dmesg is:
Code:
Copyright (c) 1992-2014 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 9.3-RELEASE-p31 #0 r288272+33bb475: Wed Feb  3 02:19:35 PST 2016
    root@build3.ixsystems.com:/tank/home/stable-builds/FN/objs/os-base/amd64/tank/home/stable-builds/FN/FreeBSD/src/sys/FREENAS.amd64 amd64
gcc version 4.2.1 20070831 patched [FreeBSD]
CPU: Intel(R) Xeon(R) CPU            5140  @ 2.33GHz (2333.38-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x6f6  Family = 0x6  Model = 0xf  Stepping = 6
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x4e3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant, performance statistics
real memory  = 9126805504 (8704 MB)
avail memory = 8238002176 (7856 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <PTLTD       APIC  >
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
cpu0 (BSP): APIC ID:  0
cpu1 (AP): APIC ID:  1
WARNING: VIMAGE (virtualized network stack) is a highly experimental feature.
ioapic0 <Version 2.0> irqs 0-23 on motherboard
kbd1 at kbdmux0
cryptosoft0: <software crypto> on motherboard
aesni0: No AESNI support.
padlock0: No ACE support.
acpi0: <PTLTD      XSDT> on motherboard
acpi0: Power Button (fixed)
unknown: I/O range not supported
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
attimer0: <AT timer> port 0x40-0x43,0x50-0x53 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 2.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> at device 4.0 on pci0
pci2: <ACPI PCI bus> on pcib2
uhci0: <Intel 82801I (ICH9) USB controller> port 0x1820-0x183f irq 16 at device 26.0 on pci0
usbus0 on uhci0
ehci0: <Intel 82801I (ICH9) USB 2.0 controller> mem 0xfc500000-0xfc5003ff irq 18 at device 26.7 on pci0
usbus1: EHCI version 1.0
usbus1 on ehci0
pcib3: <ACPI PCI-PCI bridge> irq 16 at device 28.0 on pci0
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> irq 16 at device 28.4 on pci0
pci4: <ACPI PCI bus> on pcib4
em0: <Intel(R) PRO/1000 Network Connection 7.4.2> port 0x2000-0x201f mem 0xfc100000-0xfc11ffff irq 16 at device 0.0 on pci4
em0: Using an MSI interrupt
em0: Ethernet address: 00:30:48:67:cb:9e
pcib5: <ACPI PCI-PCI bridge> irq 17 at device 28.5 on pci0
pci5: <ACPI PCI bus> on pcib5
em1: <Intel(R) PRO/1000 Network Connection 7.4.2> port 0x3000-0x301f mem 0xfc200000-0xfc21ffff irq 17 at device 0.0 on pci5
em1: Using an MSI interrupt
em1: Ethernet address: 00:30:48:67:cb:9f
uhci1: <Intel 82801I (ICH9) USB controller> port 0x1840-0x185f irq 23 at device 29.0 on pci0
usbus2 on uhci1
uhci2: <Intel 82801I (ICH9) USB controller> port 0x1860-0x187f irq 22 at device 29.1 on pci0
usbus3 on uhci2
uhci3: <Intel 82801I (ICH9) USB controller> port 0x1880-0x189f irq 21 at device 29.2 on pci0
usbus4 on uhci3
ehci1: <Intel 82801I (ICH9) USB 2.0 controller> mem 0xfc500400-0xfc5007ff irq 23 at device 29.7 on pci0
usbus5: EHCI version 1.0
usbus5 on ehci1
pcib6: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci6: <ACPI PCI bus> on pcib6
vgapci0: <VGA-compatible display> port 0x4000-0x407f mem 0xfa000000-0xfbffffff,0xfc000000-0xfc03ffff at device 1.0 on pci6
vgapci0: Boot video device
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel ICH9 SATA300 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x18d0-0x18df,0x18c0-0x18cf at device 31.2 on pci0
ata0: <ATA channel> at channel 0 on atapci0
ata1: <ATA channel> at channel 1 on atapci0
atapci1: <Intel ICH9 SATA300 controller> port 0x1c28-0x1c2f,0x1c1c-0x1c1f,0x1c20-0x1c27,0x1c18-0x1c1b,0x18f0-0x18ff,0x18e0-0x18ef irq 18 at device 31.5 on pci0
ata2: <ATA channel> at channel 0 on atapci1
ata3: <ATA channel> at channel 1 on atapci1
acpi_button0: <Power Button> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
ichwd0 on isa0
ichwd0: resuming after hardware watchdog timeout
wbwd0: <Winbond 83627DHG IC ver. 5> at port 0x2e-0x2f on isa0
orm0: <ISA Option ROM> at iomem 0xc0000-0xc7fff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
coretemp0: <CPU On-Die Thermal Sensors> on cpu0
est0: <Enhanced SpeedStep Frequency Control> on cpu0
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 728072806000728
device_attach: est0 attach returned 6
coretemp1: <CPU On-Die Thermal Sensors> on cpu1
est1: <Enhanced SpeedStep Frequency Control> on cpu1
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 728072806000728
device_attach: est1 attach returned 6
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
Timecounters tick every 1.000 msec
ipfw2 (+ipv6) initialized, divert enabled, nat enabled, default to accept, logging disabled
usbus0: 12Mbps Full Speed USB v1.0
usbus1: 480Mbps High Speed USB v2.0
usbus2: 12Mbps Full Speed USB v1.0
usbus3: 12Mbps Full Speed USB v1.0
usbus4: 12Mbps Full Speed USB v1.0
usbus5: 480Mbps High Speed USB v2.0
ugen0.1: <Intel> at usbus0
uhub0: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
ugen1.1: <Intel> at usbus1
uhub1: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
ugen2.1: <Intel> at usbus2
uhub2: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
ugen3.1: <Intel> at usbus3
uhub3: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus3
ugen4.1: <Intel> at usbus4
uhub4: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus4
ugen5.1: <Intel> at usbus5
uhub5: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus5
uhub0: 2 ports with 2 removable, self powered
uhub2: 2 ports with 2 removable, self powered
uhub3: 2 ports with 2 removable, self powered
uhub4: 2 ports with 2 removable, self powered
uhub1: 2 ports with 2 removable, self powered
uhub5: 6 ports with 6 removable, self powered
ugen1.2: <SanDisk> at usbus1
umass0: <SanDisk Cruzer Fit, class 0/0, rev 2.10/1.00, addr 2> on usbus1
umass0:  SCSI over Bulk-Only; quirks = 0x8100
umass0:5:0:-1: Attached to scbus5
ugen1.3: <SanDisk> at usbus1
umass1: <SanDisk Cruzer Fit, class 0/0, rev 2.00/1.27, addr 3> on usbus1
umass1:  SCSI over Bulk-Only; quirks = 0x8100
umass1:6:1:-1: Attached to scbus6
da0 at umass-sim0 bus 0 scbus5 target 0 lun 0
da0: <SanDisk Cruzer Fit 1.00> Removable Direct Access SCSI-6 device
da0: Serial Number 4C530013300703111082
da0: 40.000MB/s transfers
da0: 7632MB (15630336 512 byte sectors: 255H 63S/T 972C)
da0: quirks=0x2<NO_6_BYTE>
da1 at umass-sim1 bus 1 scbus6 target 0 lun 0
da1: <SanDisk Cruzer Fit 1.27> Removable Direct Access SCSI-6 device
da1: Serial Number 4C530007521220122075
da1: 40.000MB/s transfers
da1: 7633MB (15633408 512 byte sectors: 255H 63S/T 973C)
da1: quirks=0x2<NO_6_BYTE>
ada0 at ata0 bus 0 scbus0 target 0 lun 0
ada0: <SAMSUNG SSD 830 Series CXM03B1Q> ATA-9 SATA 3.x device
ada0: Serial Number S0XXNEAC614239
ada0: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada0: 61057MB (125045424 512 byte sectors: 16H 63S/T 16383C)
ada0: quirks=0x1<4K>
ada0: Previously was known as ad0
ada1 at ata1 bus 0 scbus1 target 0 lun 0
ada1: <WDC WD10EFRX-68PJCN0 82.00A82> ATA-9 SATA 3.x device
ada1: Serial Number WD-WCC4J2KY5863
ada1: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada1: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada1: quirks=0x1<4K>
ada1: Previously was known as ad2
ada2 at ata1 bus 0 scbus1 target 1 lun 0
ada2: <WDC WD10EFRX-68PJCN0 82.00A82> ATA-9 SATA 3.x device
ada2: Serial Number WD-WCC4J7RHC3EH
ada2: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada2: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada2: quirks=0x1<4K>
ada2: Previously was known as ad3
ada3 at ata3 bus 0 scbus3 target 0 lun 0
ada3: <WDC WD10EFRX-68PJCN0 82.00A82> ATA-9 SATA 3.x device
ada3: Serial Number WD-WCC4J5YS0Z7N
ada3: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada3: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada3: quirks=0x1<4K>
ada3: Previously was known as ad6
SMP: AP CPU #1 Launched!
Timecounter "TSC-low" frequency 1166687917 Hz quality 1000
GEOM: ada3: the secondary GPT table is corrupt or invalid.
GEOM: ada3: using the primary only -- recovery suggested.
Trying to mount root from zfs:freenas-boot/ROOT/FreeNAS-9.3-STABLE-201602031011 []...
GEOM_RAID5: Module loaded, version 1.3.20140711.62 (rev f91e28e40bf7)
GEOM_ELI: Device ada1p1.eli created.
GEOM_ELI: Encryption: AES-XTS 256
GEOM_ELI:     Crypto: software
GEOM_ELI: Device ada2p1.eli created.
GEOM_ELI: Encryption: AES-XTS 256
GEOM_ELI:     Crypto: software
GEOM_ELI: Device ada3p1.eli created.
GEOM_ELI: Encryption: AES-XTS 256
GEOM_ELI:     Crypto: software
vboxdrv: fAsync=0 offMin=0x3cd offMax=0x81f
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
bridge0: Ethernet address: 02:ab:9a:c1:bf:00
bridge0: link state changed to UP
em0: promiscuous mode enabled
epair0a: Ethernet address: 02:e0:17:00:0c:0a
epair0b: Ethernet address: 02:e0:17:00:0d:0b
epair0a: link state changed to UP
epair0b: link state changed to UP
em0: link state changed to DOWN
epair0a: promiscuous mode enabled
ng_ether_ifnet_arrival_event: can't re-name node epair0b
em0: link state changed to UP
epair1a: Ethernet address: 02:41:e8:00:0d:0a
epair1b: Ethernet address: 02:41:e8:00:0e:0b
epair1a: link state changed to UP
epair1b: link state changed to UP
epair1a: promiscuous mode enabled
ng_ether_ifnet_arrival_event: can't re-name node epair1b
epair2a: Ethernet address: 02:d0:d3:00:0e:0a
epair2b: Ethernet address: 02:d0:d3:00:0f:0b
epair2a: link state changed to UP
epair2b: link state changed to UP
epair2a: promiscuous mode enabled
ng_ether_ifnet_arrival_event: can't re-name node epair2b
epair3a: Ethernet address: 02:d4:da:00:0f:0a
epair3b: Ethernet address: 02:d4:da:00:10:0b
epair3a: link state changed to UP
epair3b: link state changed to UP
epair3a: promiscuous mode enabled
ng_ether_ifnet_arrival_event: can't re-name node epair3b
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
arpresolve: can't allocate llinfo for 192.168.0.254
epair4a: Ethernet address: 02:67:d7:00:10:0a
epair4b: Ethernet address: 02:67:d7:00:11:0b
epair4a: link state changed to UP
epair4b: link state changed to UP
epair4a: promiscuous mode enabled
ng_ether_ifnet_arrival_event: can't re-name node epair4b
epair5a: Ethernet address: 02:71:aa:00:11:0a
epair5b: Ethernet address: 02:71:aa:00:12:0b
epair5a: link state changed to UP
epair5b: link state changed to UP
epair5a: promiscuous mode enabled
ng_ether_ifnet_arrival_event: can't re-name node epair5b
epair6a: Ethernet address: 02:94:73:00:12:0a
epair6b: Ethernet address: 02:94:73:00:13:0b
epair6a: link state changed to UP
epair6b: link state changed to UP
epair6a: promiscuous mode enabled
ng_ether_ifnet_arrival_event: can't re-name node epair6b
arp: 192.168.0.16 moved from 02:94:73:00:12:0a to 00:30:48:67:cb:9e on epair6b
arp: 192.168.0.16 moved from 02:71:aa:00:11:0a to 00:30:48:67:cb:9e on epair5b
arp: 192.168.0.16 moved from 02:67:d7:00:10:0a to 00:30:48:67:cb:9e on epair4b
arp: 192.168.0.16 moved from 02:d4:da:00:0f:0a to 00:30:48:67:cb:9e on epair3b
arp: 192.168.0.16 moved from 02:d0:d3:00:0e:0a to 00:30:48:67:cb:9e on epair2b
arp: 192.168.0.16 moved from 02:41:e8:00:0d:0a to 00:30:48:67:cb:9e on epair1b
arp: 192.168.0.16 moved from 02:e0:17:00:0c:0a to 00:30:48:67:cb:9e on epair0b
bridge1: Ethernet address: 02:ab:9a:c1:bf:01
epair7a: Ethernet address: 02:71:ba:00:14:0a
epair7b: Ethernet address: 02:71:ba:00:15:0b
epair7a: link state changed to UP
epair7b: link state changed to UP
epair7a: promiscuous mode enabled
bridge1: link state changed to UP
ng_ether_ifnet_arrival_event: can't re-name node epair7b
pid 1875 (syslog-ng), uid 0: exited on signal 6 (core dumped)


I'm going to disconnect a working disk to check and will keep you posted.
Thanks
 

clement

Dabbler
Joined
Feb 2, 2015
Messages
10
Hi, a little update.
First of all, the problem turned out to be very different from what I initially thought. It's definitely not a bug, so feel free to move the post to another section.


I rebooted and re-scrubbed this morning and still had: UDMA_CRC_Error_Count 0x0032 200 156 000 Old_age Always - 61
when I ran smartctl -a /dev/ada1, as well as a degraded state after the scrub. So I used a different SATA cable and moved it to a different SATA connector on the motherboard, then rebooted, and the "faulty" disk now shows up as ada3. I ran another scrub and a long SMART test, and here are the outputs:
Code:
[root@freenas] ~# smartctl -a /dev/ada3
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p31 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD10EFRX-68PJCN0
Serial Number:    WD-WCC4J2KY5863
LU WWN Device Id: 5 0014ee 2b652de23
Firmware Version: 82.00A82
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Feb 16 01:10:40 2016 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (  25)    The self-test routine was aborted by
                    the host.
Total time to complete Offline
data collection:         (13320) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      ( 152) minutes.
Conveyance self-test routine
recommended polling time:      (   5) minutes.
SCT capabilities:            (0x303d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   141   139   021    Pre-fail  Always       -       3925
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       59
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       3519
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       56
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       21
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       142
194 Temperature_Celsius     0x0022   114   106   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   156   000    Old_age   Always       -       61
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Aborted by host               90%      3517         -
# 2  Extended offline    Aborted by host               90%      3517         -
# 3  Extended offline    Completed without error       00%      3517         -
# 4  Short offline       Completed without error       00%      3514         -
# 5  Extended offline    Completed without error       00%      3502         -
# 6  Short offline       Completed without error       00%      3498         -
# 7  Short offline       Completed without error       00%       133         -
# 8  Short offline       Completed without error       00%       111         -
# 9  Short offline       Completed without error       00%        88         -
#10  Extended offline    Completed without error       00%        68         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


I understand that UDMA_CRC_Error_Count will never revert to 0, which makes sense to me. Is that correct?
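Since the counter is cumulative, what matters is the raw value in the last column. A minimal Python sketch for extracting it from `smartctl -a` text might look like the following; the helper name `smart_raw_value` is hypothetical, and it assumes the ten-column attribute table layout shown above with a plain integer in the RAW_VALUE column.

```python
def smart_raw_value(smartctl_output: str, attribute_name: str):
    """Return the RAW_VALUE of a named SMART attribute as int, or None.

    Assumption: attribute rows have ten whitespace-separated columns
    (ID, name, flag, value, worst, thresh, type, updated, when_failed, raw)
    and the raw value is a plain integer.
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] == attribute_name:
            raw = fields[9]
            return int(raw) if raw.isdigit() else None
    return None


# Two rows copied from the smartctl output above.
sample = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   156   000    Old_age   Always       -       61
"""

print(smart_raw_value(sample, "UDMA_CRC_Error_Count"))  # prints 61
```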

Here is the zpool status output after a scrub with the new SATA cable; it looks good to me:

Code:
zpool status
  pool: freenas-boot
state: ONLINE
  scan: scrub repaired 0 in 0h1m with 0 errors on Mon Jan 18 18:47:16 2016
config:

    NAME                                            STATE     READ WRITE CKSUM
    freenas-boot                                    ONLINE       0     0     0
      mirror-0                                      ONLINE       0     0     0
        gptid/d3f9780e-5de8-11e5-9c66-00304867cb9e  ONLINE       0     0     0
        da0p2                                       ONLINE       0     0     0

errors: No known data errors

  pool: zfs
state: ONLINE
  scan: scrub repaired 0 in 2h31m with 0 errors on Tue Feb 16 01:28:57 2016
config:

    NAME                                            STATE     READ WRITE CKSUM
    zfs                                             ONLINE       0     0     0
      raidz1-0                                      ONLINE       0     0     0
        gptid/aae4046f-5df1-11e5-b5d1-00304867cb9e  ONLINE       0     0     0
        gptid/ab5d8500-5df1-11e5-b5d1-00304867cb9e  ONLINE       0     0     0
        gptid/ac0774bd-5df1-11e5-b5d1-00304867cb9e  ONLINE       0     0     0
    cache
      gptid/dda3435f-5df1-11e5-b5d1-00304867cb9e    ONLINE       0     0     0

errors: No known data errors




HOWEVER: I checked gpart status. To be honest, I don't really understand what it means, and I don't even know why I checked it, but here is what I get:
Code:
[root@freenas] ~# gpart status
  Name   Status  Components
da0p1       OK  da0
da0p2       OK  da0
da1p1       OK  da1
da1p2       OK  da1
ada0p1       OK  ada0
ada1p1  CORRUPT  ada1
ada1p2  CORRUPT  ada1
ada2p1       OK  ada2
ada2p2       OK  ada2
ada3p1       OK  ada3
ada3p2       OK  ada3


I can see the CORRUPT status, but the weird thing is that the supposedly defective disk is now ada3 since I moved the SATA cable, so why is the CORRUPT on ada1?
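For reference, the CORRUPT rows can be picked out of the `gpart status` text with a few lines of Python. This is only a sketch: the function name `corrupt_partitions` is hypothetical, and it assumes the three-column layout (Name, Status, Components) shown above. It only lists what gpart reports and says nothing about the cause.

```python
def corrupt_partitions(gpart_status_output: str) -> list:
    """Return (partition, disk) pairs whose Status column reads CORRUPT.

    Assumption: each data row has exactly three whitespace-separated
    columns: Name, Status, Components.
    """
    results = []
    for line in gpart_status_output.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[1] == "CORRUPT":
            results.append((fields[0], fields[2]))
    return results


# Rows copied from the gpart status output above.
sample = """\
  Name   Status  Components
ada0p1       OK  ada0
ada1p1  CORRUPT  ada1
ada1p2  CORRUPT  ada1
ada2p1       OK  ada2
"""

print(corrupt_partitions(sample))
```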

Sorry for the long post and thanks for any help. I'll run a memtest tomorrow just to rule that out.
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
I would run a short and a long SMART test on ada3.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
You have a hardware failure of some kind. Either your disk or your SATA cable is bad. Configure long and short SMART tests and see if they pass.
 

clement

Dabbler
Joined
Feb 2, 2015
Messages
10
Yes, that's what I did, and I copied the output of the SMART tests above; UDMA_CRC_Error_Count is at 61. Now that I have changed the cable, I don't get any CKSUM errors on scrubs. I guess I should just replace that SATA cable and make sure UDMA_CRC_Error_Count doesn't go any higher, right?
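One way to keep an eye on that counter is to record a baseline per drive and flag any later increase. The Python sketch below assumes the counts have already been collected by some other means; the function name `crc_regressions` and the later reading of 64 are made up for illustration. It keys on the serial number rather than the device name, because the same drive (WD-WCC4J2KY5863) appeared as ada1 before the cable swap and as ada3 after it.

```python
def crc_regressions(baseline: dict, current: dict) -> dict:
    """Return {serial: increase} for drives whose CRC error count rose.

    Both arguments map a drive serial number to its UDMA_CRC_Error_Count
    raw value. Drives missing from the baseline are treated as starting at 0.
    """
    increases = {}
    for serial, count in current.items():
        previous = baseline.get(serial, 0)
        if count > previous:
            increases[serial] = count - previous
    return increases


# Baseline taken from this thread: the counter stood at 61 after the swap.
baseline = {"WD-WCC4J2KY5863": 61}

# Unchanged counter: nothing to report.
print(crc_regressions(baseline, {"WD-WCC4J2KY5863": 61}))

# Hypothetical later reading of 64: the counter grew by 3.
print(crc_regressions(baseline, {"WD-WCC4J2KY5863": 64}))
```

An empty result means the count has not moved since the baseline; any entry means new CRC errors have been logged on that drive's link.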
 

clement

Dabbler
Joined
Feb 2, 2015
Messages
10
By the way, SMART tests always passed, even with the bad cable and even when scrubs gave me a "degraded pool".
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
by the way, SMART tests always passed, even with the bad cable, and when scrubs would give me a "degraded pool"

That's normal: SMART tests check the disk surface (among other things); they don't check the SATA cable.
 

clement

Dabbler
Joined
Feb 2, 2015
Messages
10
Yes, that's what led me to replace the SATA cable. Glad I learned more about ZFS.
Thanks
 