AHCI timeouts

Status
Not open for further replies.

Durkatlon

Patron
Joined
Aug 19, 2011
Messages
414
I've tried to find information about this, and there appear to be similar issues with other FreeBSD-based configs, but I haven't found anything specific. Basically, I'm getting timeout errors on the component disks of my ZFS mirror.

Sometimes the timeouts are permanent, and sometimes the drive eventually comes back. If it doesn't come back, a reboot is the only recourse. I have an identical motherboard in use under FreeNAS 0.7 without these problems, but that one has different drives (Western Digital as opposed to Seagate).

The problem system is as follows:

Motherboard: Via EPIA SN10000G (C7 processor 1GHz)
Memory: 4GB (2x Kingston KVR677D2/2GR)
Disks: 2x 1.5TB Seagate. ZFS mirror.
FreeNAS 8.0.1-B4

Below I've pasted the output of "camcontrol identify" for one of the disks, my dmesg output and my loader.conf.

camcontrol identify
Code:
pass0: <ST1500DL003-9VT16L CC32> ATA-8 SATA 3.x device
pass0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)

protocol              ATA/ATAPI-8 SATA 3.x
device model          ST1500DL003-9VT16L
firmware revision     CC32
serial number         5YD1VX7B
WWN                   5000c5002f93cc0f
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 512, offset 0
LBA supported         268435455 sectors
LBA48 supported       2930277168 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA6 
media RPM             5900

Feature                      Support  Enabled   Value           Vendor
read ahead                     yes	yes
write cache                    yes	yes
flush cache                    yes	yes
overlap                        no
Tagged Command Queuing (TCQ)   no	no
Native Command Queuing (NCQ)   yes		32 tags
SMART                          yes	yes
microcode download             yes	yes
security                       yes	no
power management               yes	yes
advanced power management      no	no
automatic acoustic management  yes	yes	0/0x00	254/0xFE
media status notification      no	no
power-up in Standby            no	no
write-read-verify              yes	no	0/0x0
unload                         no	no
free-fall                      no	no
data set management (TRIM)     no
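
As a quick sanity check on the numbers above, here is a minimal sketch that turns the reported LBA48 sector count into a capacity. The function name `capacity` is just an illustration; the only inputs are the sector count and logical sector size from the camcontrol output.

```python
# Values copied from the camcontrol identify output above.
LBA28_SECTORS = 268435455      # "LBA supported"
LBA48_SECTORS = 2930277168     # "LBA48 supported"
LOGICAL_SECTOR_BYTES = 512     # "sector size logical 512"


def capacity(sectors, sector_bytes=LOGICAL_SECTOR_BYTES):
    """Return the drive capacity in bytes, whole MiB, and decimal TB."""
    total = sectors * sector_bytes
    return {"bytes": total, "MiB": total // 2**20, "TB": total / 1e12}


cap = capacity(LBA48_SECTORS)
print(cap["bytes"], cap["MiB"], round(cap["TB"], 2))

# The 28-bit LBA figure is simply the largest value a 28-bit address allows.
print(LBA28_SECTORS == 2**28 - 1)
```

The whole-MiB figure works out to 1430799, which matches the "1430799MB" that dmesg prints for ada0 and ada1, so the drive is at least being identified at its full size.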


dmesg output, with some example timeouts at the end:
Code:

Copyright (c) 1992-2011 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.2-RELEASE-p2 #0: Wed Jul 13 06:28:22 PDT 2011
    jpaetzel@servant.iXsystems.com:/b/home/jpaetzel/sf_freenas_build/obj.i386/i386/b/home/jpaetzel/sf_freenas_build/FreeBSD/src/sys/FREENAS.i386 i386
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: VIA C7 Processor 1000MHz (1009.88-MHz 686-class CPU)
  Origin = "CentaurHauls"  Id = 0x6d0  Family = 6  Model = d  Stepping = 0
  Features=0xa7c9bbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,CMOV,PAT,CLFLUSH,ACPI,MMX,FXSR,SSE,SSE2,TM,PBE>
  Features2=0x4181<SSE3,EST,TM2,xTPR>
  VIA Padlock Features=0xffcc<RNG,AES,AES-CTR,SHA1,SHA256,RSA>
real memory  = 4294967296 (4096 MB)
avail memory = 3403685888 (3246 MB)
ACPI APIC Table: <022708 APIC1721>
ioapic0 <Version 0.3> irqs 0-23 on motherboard
ioapic1 <Version 0.3> irqs 24-47 on motherboard
kbd1 at kbdmux0
netsmb_dev: loaded
cryptosoft0: <software crypto> on motherboard
acpi0: <022708 RSDT1721> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of fec00000, 1000 (3) failed
acpi0: reservation of fee00000, 1000 (3) failed
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, cfe00000 (3) failed
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
cpu0: <ACPI CPU> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
agp0: <VIA 3364 (P4M900) host to PCI bridge> on hostb0
agp0: aperture size is 128M
pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
vgapci0: <VGA-compatible display> mem 0xd8000000-0xdfffffff,0xfd000000-0xfdffffff irq 16 at device 0.0 on pci1
pcib2: <ACPI PCI-PCI bridge> irq 27 at device 2.0 on pci0
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> irq 31 at device 3.0 on pci0
pci3: <ACPI PCI bus> on pcib3
vge0: <VIA Networking Velocity Gigabit Ethernet> port 0xe800-0xe8ff mem 0xfeaffc00-0xfeaffcff irq 28 at device 0.0 on pci3
miibus0: <MII bus> on vge0
ip1000phy0: <IC Plus IP1001 10/100/1000 media interface> PHY 1 on miibus0
ip1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto
vge0: Ethernet address: 00:40:63:f4:95:ee
vge0: [ITHREAD]
ahci0: <VIA VT8251 AHCI SATA controller> port 0xdc00-0xdc07,0xd880-0xd883,0xd800-0xd807,0xd480-0xd483,0xd400-0xd40f mem 0xfcfffc00-0xfcffffff irq 21 at device 15.0 on pci0
ahci0: [ITHREAD]
ahci0: AHCI v1.00 with 4 3Gbps ports, Port Multiplier not supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich0: [ITHREAD]
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich1: [ITHREAD]
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich2: [ITHREAD]
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich3: [ITHREAD]
atapci0: <VIA 8251 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 15.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
uhci0: <VIA 83C572 USB controller> port 0xcc00-0xcc1f irq 20 at device 16.0 on pci0
uhci0: [ITHREAD]
usbus0: <VIA 83C572 USB controller> on uhci0
uhci1: <VIA 83C572 USB controller> port 0xd000-0xd01f irq 22 at device 16.1 on pci0
uhci1: [ITHREAD]
usbus1: <VIA 83C572 USB controller> on uhci1
uhci2: <VIA 83C572 USB controller> port 0xd080-0xd09f irq 21 at device 16.2 on pci0
uhci2: [ITHREAD]
usbus2: <VIA 83C572 USB controller> on uhci2
ehci0: <VIA VT6202 USB 2.0 controller> mem 0xfcfff800-0xfcfff8ff irq 22 at device 16.4 on pci0
ehci0: [ITHREAD]
usbus3: EHCI version 1.0
usbus3: <VIA VT6202 USB 2.0 controller> on ehci0
isab0: <PCI-ISA bridge> at device 17.0 on pci0
isa0: <ISA bus> on isab0
vr0: <VIA VT6102 Rhine II 10/100BaseTX> port 0xc800-0xc8ff mem 0xfcfff400-0xfcfff4ff irq 23 at device 18.0 on pci0
vr0: Quirks: 0x0
vr0: Revision: 0x7c
miibus1: <MII bus> on vr0
ukphy0: <Generic IEEE 802.3u media interface> PHY 1 on miibus1
ukphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
vr0: Ethernet address: 00:40:63:f4:95:ed
vr0: [ITHREAD]
acpi_button0: <Sleep Button> on acpi0
acpi_button1: <Power Button> on acpi0
pcib4: <ACPI Host-PCI bridge> on acpi0
pci128: <ACPI PCI bus> on pcib4
pcib5: <PCI-PCI bridge> at device 0.0 on pci128
pci130: <PCI bus> on pcib5
pcib6: <PCI-PCI bridge> at device 0.1 on pci128
pci129: <PCI bus> on pcib6
pci128: <multimedia, HDA> at device 1.0 (no driver attached)
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 3 flags 0x10 on acpi0
uart0: [FILTER]
uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
uart1: [FILTER]
pmtimer0 on isa0
orm0: <ISA Option ROM> at iomem 0xce000-0xcefff pnpid ORM0000 on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: parallel port not found.
est0: <Enhanced SpeedStep Frequency Control> on cpu0
p4tcc0: <CPU Frequency Thermal Control> on cpu0
Timecounter "TSC" frequency 1009877267 Hz quality 800
Timecounters tick every 1.000 msec
usbus0: 12Mbps Full Speed USB v1.0
usbus1: 12Mbps Full Speed USB v1.0
usbus2: 12Mbps Full Speed USB v1.0
usbus3: 480Mbps High Speed USB v2.0
ugen0.1: <VIA> at usbus0
uhub0: <VIA UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
ugen1.1: <VIA> at usbus1
uhub1: <VIA UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1
ugen2.1: <VIA> at usbus2
uhub2: <VIA UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
ugen3.1: <VIA> at usbus3
uhub3: <VIA EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
uhub0: 2 ports with 2 removable, self powered
uhub1: 2 ports with 2 removable, self powered
uhub2: 2 ports with 2 removable, self powered
uhub3: 6 ports with 6 removable, self powered
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <ST1500DL003-9VT16L CC32> ATA-8 SATA 3.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C)
ada1 at ahcich2 bus 0 scbus2 target 0 lun 0
ada1: <ST1500DL003-9VT16L CC32> ATA-8 SATA 3.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C)
ada2 at ata0 bus 0 scbus4 target 0 lun 0
ada2: <CF Card Ver6.04> ATA-5 device
ada2: 133.000MB/s transfers (UDMA6, PIO 512bytes)
ada2: 3811MB (7806960 512 byte sectors: 16H 63S/T 7745C)
Trying to mount root from ufs:/dev/ufs/FreeNASs1a
ZFS filesystem version 4
ZFS storage pool version 15
vge0: link state changed to UP
fuse4bsd: version 0.3.9-pre1, FUSE ABI 7.8
ahcich2: Timeout on slot 24 port 0
ahcich2: is 00000000 cs 01000000 ss 00000000 rs 01000000 tfd 50 serr 00000000
ahcich2: AHCI reset: device not ready after 31000ms (tfd = 00000080)
ahcich2: Timeout on slot 24 port 0
ahcich2: is 00000000 cs 01000000 ss 00000000 rs 01000000 tfd 80 serr 001a0000
ahcich0: Timeout on slot 22 port 0
ahcich0: is 00000000 cs 00400000 ss 00000000 rs 00400000 tfd 50 serr 00000000
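
For anyone trying to read the register dumps above, here is a rough decoding sketch. It assumes (based on my reading of the FreeBSD ahci driver, not on anything confirmed in this thread) that `cs` is the port's Command Issue bitmask, `ss` is the SATA Active bitmask, `rs` is the driver's own set of running slots, and the low byte of `tfd` is the ATA status register. The helper names are made up for illustration.

```python
import re

# ATA status register bits (low byte of the AHCI task file data, "tfd").
ATA_STATUS_BITS = {
    0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
    0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR",
}

DUMP_RE = re.compile(
    r"(?P<chan>ahcich\d+): is (?P<istat>[0-9a-f]{8}) cs (?P<cs>[0-9a-f]{8}) "
    r"ss (?P<ss>[0-9a-f]{8}) rs (?P<rs>[0-9a-f]{8}) "
    r"tfd (?P<tfd>[0-9a-f]+) serr (?P<serr>[0-9a-f]{8})"
)


def slots(mask):
    """List the command slot numbers whose bits are set in a 32-bit mask."""
    return [bit for bit in range(32) if (mask >> bit) & 1]


def status_flags(status):
    """Name the ATA status bits that are set."""
    return [name for bit, name in ATA_STATUS_BITS.items() if status & bit]


def decode(line):
    """Decode one 'is ... cs ... ss ... rs ... tfd ... serr ...' line, or return None."""
    m = DUMP_RE.search(line)
    if m is None:
        return None
    reg = {k: int(v, 16) for k, v in m.groupdict().items() if k != "chan"}
    return {
        "channel": m.group("chan"),
        "command_issue_slots": slots(reg["cs"]),
        "sata_active_slots": slots(reg["ss"]),
        "running_slots": slots(reg["rs"]),
        "status": status_flags(reg["tfd"] & 0xFF),
        "serr_nonzero": reg["serr"] != 0,
    }


first = decode("ahcich2: is 00000000 cs 01000000 ss 00000000 rs 01000000 tfd 50 serr 00000000")
second = decode("ahcich2: is 00000000 cs 01000000 ss 00000000 rs 01000000 tfd 80 serr 001a0000")
print(first)
print(second)
```

Under those assumptions, the first dump decodes to slot 24 still outstanding with the drive reporting ready (DRDY, DSC), which agrees with the "Timeout on slot 24" line. After the failed reset the status is BSY and `serr` is nonzero, which suggests the drive never came back from the reset rather than returning an error for the command.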
 

Durkatlon

Patron
Joined
Aug 19, 2011
Messages
414
Went over 10,000 char limit. Here is the loader.conf:

Code:
#
# Boot loader file for FreeNAS.  This relies on a hacked beastie.4th.
#
autoboot_delay="2"
loader_logo="freenas"
#Fix booting from USB device bug
kern.cam.boot_delay=10000

# GEOM support
geom_mirror_load="YES"
geom_stripe_load="YES"
geom_raid3_load="YES"
#geom_raid5_load="YES"
geom_gate_load="YES"
ntfs_load="YES"
smbfs_load="YES"

hw.hptrr.attach_generic=0

# Customization for buckyball
vm.kmem_size="512M"
vm.kmem_size_max="768M"
vfs.zfs.prefetch_disable=1
vfs.zfs.arc_max="40M"
vfs.zfs.vdev.cache.size="5M"
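
To make the memory tunables above easier to compare, here is a small sketch that reads `key="value"` lines in this style and converts the size suffixes to bytes. It is an illustrative parser written for this post (the function names are mine), not the loader's actual parsing rules, and it only handles the plain K/M/G suffixes used here.

```python
import re

SAMPLE = '''
#geom_raid5_load="YES"
kern.cam.boot_delay=10000
# Customization for buckyball
vm.kmem_size="512M"
vm.kmem_size_max="768M"
vfs.zfs.prefetch_disable=1
vfs.zfs.arc_max="40M"
vfs.zfs.vdev.cache.size="5M"
'''

SUFFIX = {"": 1, "K": 2**10, "M": 2**20, "G": 2**30}


def parse_loader_conf(text):
    """Collect key/value pairs, ignoring comments and blank lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (assumes no '#' in values)
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def to_bytes(value):
    """Convert a size such as '512M' to a number of bytes."""
    m = re.fullmatch(r"(\d+)([KMG]?)", value.upper())
    if m is None:
        raise ValueError("not a size: " + value)
    return int(m.group(1)) * SUFFIX[m.group(2)]


conf = parse_loader_conf(SAMPLE)
for key in ("vm.kmem_size", "vm.kmem_size_max", "vfs.zfs.arc_max", "vfs.zfs.vdev.cache.size"):
    print(key, to_bytes(conf[key]))
```

With these settings the ARC cap (40M) is a small fraction of the kernel memory limit (512M), so the ZFS memory tuning on this box is on the conservative side.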
 

Durkatlon

Patron
Joined
Aug 19, 2011
Messages
414
2 updates.

First, I tried changing the SATA controller's mode from AHCI to IDE. This changed the errors from AHCI timeouts to "Timeout waiting for write DRQ". This happens on both drives the first time they are accessed. It seemed like everything would be OK after that, but that's not quite true: if I reboot the NAS and then immediately start an rsync from a remote machine, there can be Input/Output errors.

Second, I replaced the Seagate Barracudas with Western Digital Caviar Green drives. Interestingly enough, I got one EADS and one EARS drive because a Fry's open-box apparently contained the wrong kind of drive. I set the ZFS mirror up with the "Force 4096 byte sectors" checkbox turned on, so the EADS would also use the larger sector size. After replacing the drives I also set the SATA controller back to AHCI mode.

I'm now copying everything back from a backup to the new drives, and so far so good. Haven't had any AHCI timeouts yet. We'll see what happens after I get all the data back on there and reboot it a million times and generally put it through its paces.
 

Tekkie

Patron
Joined
May 31, 2011
Messages
353
AHCI timeouts indicate that one of your drives is either dying or dead, because it's no longer responding to commands sent to it.

You should also see messages about some pass driver not being installed etc.
 

Durkatlon

Patron
Joined
Aug 19, 2011
Messages
414
Tekkie said:
AHCI timeouts indicate that one of your drives is either dying or dead, because it's no longer responding to commands sent to it.

You should also see messages about some pass driver not being installed etc.

This is something different. I am getting these errors with 4 separate, identical, brand-new drives. Something about this particular drive firmware and the way it interacts with FreeBSD8. Could even be the combination of the drives and the SATA controller. I have another motherboard coming in a few days and I'll experiment further with these to see if it's the drives or the controller.
 

Durkatlon

Patron
Joined
Aug 19, 2011
Messages
414
One more update on this. I built a second FreeNAS 8 system in which I put the same Seagate drives that were suffering timeouts on the first system with the Via EPIA board. The new system runs on an Asus E35M1-I Deluxe (FreeNAS 8.0.1-B4-amd64). I was able to import the zpool from the old system, and I have been running this system for about a week now without any timeouts.

At this point I suspect the SATA controller on the Via EPIA board is the culprit. Switching to Western Digital drives did not ultimately help on that board. The time-outs started happening there as well after I got the drives fully loaded with data. I am actually in the process of getting another E35M1-I to replace the mobo in the problem NAS.

What's interesting in all this is that I have 2 more systems that each have Via EPIA boards in them (an SN10000 and an SN18000 respectively), both running FreeNAS 0.7.2 (Shere). These systems have been humming along for well over a year with absolutely no problems.

There clearly is a problem between the SATA controller on the EPIA boards and FreeNAS 8 (and consequently FreeBSD 8.2), but the same problem does not seem to happen with the older baselines of FreeBSD.

If this problem could be addressed it would be great, but I suspect if it can be addressed at all it will fall to the FreeBSD driver developers, not the FreeNAS team.
 

Tekkie

Patron
Joined
May 31, 2011
Messages
353
Could it be that the SATA controller on that mobo is just bad?
 

Durkatlon

Patron
Joined
Aug 19, 2011
Messages
414
It's possible, but I've had this board for a while and use it for building temporary systems running various flavors of Linux. I've never seen these issues there.

Once I get the new E350 motherboard swapped in, I'll put FreeNAS 7 on this EPIA board and see if the problems persist or if they go away.
 

djoole

Contributor
Joined
Oct 3, 2011
Messages
158
I have the same problem :
Code:
Oct 10 00:17:51 nas kernel: ahcich3: Timeout on slot 19 port 0
Oct 10 00:17:51 nas kernel: ahcich3: is 00000000 cs 10000000 ss 1ff80000 rs 1ff80000 tfd 40 serr 00000000

HDDs: Samsung EcoGreen F4 (HD204UI 2TB, with patched firmware, in case you ask)
Mobo: Asus P8P67 R3.1 with Intel Cougar Point AHCI SATA controller

I hope I won't be having a lot of these timeouts/freezes!
 

Durkatlon

Patron
Joined
Aug 19, 2011
Messages
414
No good fixes for this problem unfortunately, djoole. I ended up abandoning that motherboard in favor of a different one, but continued to use the same drives etc. The problematic motherboard I have since used for other experimental setups with no problems. It appears to be an issue specifically with FreeBSD8.2-derived distributions. I use the same mobo with FreeNAS7 without any issues.
 

djoole

Contributor
Joined
Oct 3, 2011
Messages
158
Thanks for replying.
I read somewhere else that the problem doesn't occur with the "old AHCI driver".
Do you know where this old driver is, how to install it, and what I would lose compared to the current one?

I got another AHCI timeout today, this time on a Seagate drive, leading to a complete freeze of the pool. A soft reboot didn't work; I had to hard reboot the computer.

This isn't workable. A NAS simply needs to be reliable, and right now it's the opposite.
I'm starting to miss my Synology...
 

djoole

Contributor
Joined
Oct 3, 2011
Messages
158
Hi again.

Now it's worse: even after a fresh hard reset, the minute I try to access my pool (even read-only), the ada0 Seagate times out and the NAS freezes.

Now at boot, smartd reports errors on the ada0 drive.

I tried putting the disk in my PC and running a DOS SeaTools diagnostic, but no errors were found.

I think I have a compatibility problem with the mobo/drive/FreeNAS trio.

Maybe using the old AHCI driver would be a solution (as I read on other forums).

But I don't have a clue how to use the old AHCI driver.

Please help :(
 

Durkatlon

Patron
Joined
Aug 19, 2011
Messages
414
I think the old driver that people refer to is the one that came with FreeBSD 7.x (and by extension FreeNAS 0.7.x). One thing you could try is to switch your SATA controller to IDE mode in the BIOS. That should cause FreeNAS to not use AHCI. It's possible that your timeout message will change to "timeout waiting for write DRQ", but it's worth a shot.

By the way, your symptoms sound just like mine. In my case after the initial time-out and freeze the system would be fine, provided I didn't set a spindown delay on the drives. In other words, the drives would have to be set up to spin constantly. Once a drive spins down, the next time it spins up you'll likely get the same time-out message.
 

djoole

Contributor
Joined
Oct 3, 2011
Messages
158
I'll try to set the controller to IDE in the BIOS, thanks.
However, I have 2 AHCI controllers on the mobo:
- Intel Cougar with 6 SATA ports
- Marvell 9120 with 2 SATA ports

In FreeNAS 8, I have a zpool with 2 vdevs:
Code:
  pool: zepool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zepool      ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada4p2  ONLINE       0     0     0
            ada5p2  ONLINE       0     0     0
            ada6p2  ONLINE       0     0     0
            ada7p2  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada0p2  ONLINE       0     0     0
            ada1p2  ONLINE       0     0     0
            ada2p2  ONLINE       0     0     0
            ada3p2  ONLINE       0     0     0

errors: No known data errors


ada0 and ada1 are on the Marvell controller.
ada2 and ada3 (and ada 4 to 7) are on the Intel controller.
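
To see how the two controllers line up with the vdevs, here is a minimal sketch that parses the config section of the zpool status output above and tags each member disk with its controller. The controller mapping is copied from the two lines above; the parser relies on the indentation of this particular output and is only meant as an illustration.

```python
import re

ZPOOL_STATUS = """
  pool: zepool
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        zepool      ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada4p2  ONLINE       0     0     0
            ada5p2  ONLINE       0     0     0
            ada6p2  ONLINE       0     0     0
            ada7p2  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada0p2  ONLINE       0     0     0
            ada1p2  ONLINE       0     0     0
            ada2p2  ONLINE       0     0     0
            ada3p2  ONLINE       0     0     0

errors: No known data errors
"""

# From the post: ada0/ada1 on the Marvell controller, ada2..ada7 on the Intel one.
CONTROLLER = {"ada0": "Marvell", "ada1": "Marvell"}
CONTROLLER.update({"ada%d" % n: "Intel" for n in range(2, 8)})


def parse_vdevs(status_text):
    """Return [(vdev_name, [member, ...]), ...] based on row indentation."""
    vdevs, base, in_config = [], None, False
    for raw in status_text.expandtabs().splitlines():
        fields = raw.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True
            continue
        if not in_config:
            continue
        if not fields:          # a blank line ends the config table
            break
        indent = len(raw) - len(raw.lstrip())
        if base is None:
            base = indent       # first row is the pool itself
        elif indent == base + 2:
            vdevs.append((fields[0], []))
        elif indent > base + 2 and vdevs:
            vdevs[-1][1].append(fields[0])
    return vdevs


def controllers_per_vdev(vdevs):
    """For each vdev, the set of controllers its member disks sit on."""
    result = []
    for _name, members in vdevs:
        disks = [re.match(r"[a-z]+\d+", m).group(0) for m in members]
        result.append({CONTROLLER.get(d, "unknown") for d in disks})
    return result


vdevs = parse_vdevs(ZPOOL_STATUS)
print(controllers_per_vdev(vdevs))
```

This shows the first raidz1 sitting entirely on the Intel controller while the second spans both, so any controller-specific setting on the Marvell touches only two members of the second vdev.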

Is it okay if I switch only the Marvell controller to IDE mode?

And what exactly is the difference between running my SATA drives in IDE mode and AHCI mode?




Also, now at each boot, I have these messages:
Code:
Oct 11 19:44:09 nas smartd[1697]: Device: /dev/ada1, 315 Currently unreadable (pending) sectors
Oct 11 19:44:11 nas smartd[1697]: Device: /dev/ada1, 315 Offline uncorrectable sectors

What does it mean exactly? The SeaTools long test (which reads every sector of the drive) passed, so how is it possible that there are 315 bad sectors? Is it permanent?
I didn't have these bad sectors before the AHCI timeout issue started.
At the first freeze + reboot there were 192 bad sectors, and now 315...
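
To keep track of how these counts change between boots, a small sketch like the following could pull the numbers out of the smartd messages and report the growth. The regular expression is written against the two message formats shown above and is an assumption about nothing beyond them; the 192 figure is the count from the first freeze.

```python
import re

SMARTD_RE = re.compile(
    r"Device: (?P<dev>/dev/\w+), (?P<count>\d+) "
    r"(?P<what>Currently unreadable \(pending\)|Offline uncorrectable) sectors"
)

LOG = [
    # counts reported after the first freeze + reboot
    "nas smartd[1707]: Device: /dev/ada1, 192 Currently unreadable (pending) sectors",
    "nas smartd[1707]: Device: /dev/ada1, 192 Offline uncorrectable sectors",
    # counts reported at the most recent boot
    "Oct 11 19:44:09 nas smartd[1697]: Device: /dev/ada1, 315 Currently unreadable (pending) sectors",
    "Oct 11 19:44:11 nas smartd[1697]: Device: /dev/ada1, 315 Offline uncorrectable sectors",
]


def sector_history(lines):
    """Map (device, counter name) to the list of counts in log order."""
    history = {}
    for line in lines:
        m = SMARTD_RE.search(line)
        if m:
            key = (m.group("dev"), m.group("what"))
            history.setdefault(key, []).append(int(m.group("count")))
    return history


def growth(history):
    """Difference between the last and first count for each counter."""
    return {key: counts[-1] - counts[0] for key, counts in history.items() if len(counts) > 1}


print(growth(sector_history(LOG)))
```

Both counters grew by 123 between the two boots, so the drive is reporting more unreadable sectors over time rather than a fixed set.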
 

William Grzybowski

Wizard
iXsystems
Joined
May 27, 2011
Messages
1,754
Those SMART errors mean your ada1 disk is having problems with unreadable sectors (aka bad blocks); it is likely to be dying soon...

AHCI is a protocol that allows a lot of things, hotplug being one of them, so IDE means no hotplug... And yes, you can set it just for the Marvell, if that is the only controller having problems with FreeNAS.
 

djoole

Contributor
Joined
Oct 3, 2011
Messages
158
Here is the detailed hardware config of the NAS and the history of my problems, in case it helps.
Code:
Asus P8P67 R3.1  - FreeNAS 8.0.1 RC2
  |
  |-Marvell 9120 controler
  |  |
  |  |-6Gb/s--ada0--WDC WD20EADS-00S2B0 01.00A01 (WD Caviar Green)---\
  |  |-6Gb/s--ada1--ST32000542AS CC34 (Seagate LP)--------------------|
  |                                                                   |
  |-Intel AHCI Cougar Point controler                                 |--RAIDZ1--|
     |                                                                |          |
     |-6Gb/s--ada2--ST32000542AS CC34 (Seagate LP)--------------------|          |__zepool
     |-6Gb/s--ada3--ST32000542AS CC34 (Seagate LP)-------------------/           |
     |-3Gb/s--ada4--SAMSUNG HD204UI 1AQ10001 (EcoGreen F4)-----------\           |
     |-3Gb/s--ada5--SAMSUNG HD204UI 1AQ10001 (EcoGreen F4)------------|__RAIDZ1__|
     |-3Gb/s--ada6--SAMSUNG HD204UI 1AQ10001 (EcoGreen F4)------------|
     |-3Gb/s--ada7--SAMSUNG HD204UI 1AQ10001 (EcoGreen F4)-----------/

History :

- Only the 4 Samsung drives in the NAS
Huge transfer from the old NAS (Synology with the other 4 disks) to FreeNAS.
==> AHCI timeouts in a loop on ada4, like this:
Code:
ahcich4: Timeout on slot 24 port 0
ahcich4: is 00000000 cs 01000000 ss 00000000 rs 01000000 tfd 50 serr 00000000
ahcich4: AHCI reset: device not ready after 31000ms (tfd = 00000080)
ahcich4: Timeout on slot 24 port 0
ahcich4: is 00000000 cs 01000000 ss 00000000 rs 01000000 tfd 80 serr 001a0000

NAS frozen; nothing to do but a hard reboot.

- After the reboot, I launched a scrub to check the integrity of the pool.
The scrub lasted only seconds. From zpool status I saw there had been a resilver (??) on ada4.
I relaunched the scrub; it took much longer this time and finished without errors, although I noticed the number 5 under CKSUM for ada4.
Since this Samsung F4EG was manufactured in Feb 2011 (Aug 2011 for the 3 others), I decided to patch its firmware, just in case.

- Relaunched the Synology->FreeNAS transfer: no errors.

- Moved the 4 drives from the Synology into FreeNAS as a second raidz1, added to the pool zepool ( :) )

- Huge transfer to the pool
==> AHCI timeouts on ada1. NAS frozen, hard reboot, and after boot I got this message:
Code:
nas smartd[1707]: Device: /dev/ada1, 192 Currently unreadable (pending) sectors
nas smartd[1707]: Device: /dev/ada1, 192 Offline uncorrectable sectors

What does it mean? Has the drive been damaged?

- Launched smartctl -t long /dev/ada0 to test the drive's state
==> AHCI timeouts on ada1. NAS frozen, hard reboot, and after boot I got this message:
Code:
nas smartd[1707]: Device: /dev/ada1, 315 Currently unreadable (pending) sectors (+123)
nas smartd[1707]: Device: /dev/ada1, 315 Offline uncorrectable sectors (+123)

Is FreeNAS damaging my Seagate drive?

- I decided to shut down the NAS to protect my data while waiting for this problem to be solved. Before that, I tried to recover some files from the pool via SMB, but each time I tried to read them I got AHCI timeouts on ada1, freezing the NAS.

- I put the ada1 drive in my PC and ran a long test with DOS SeaTools. The test PASSED!
Now I don't understand anymore... are there bad sectors on the drive or not??

- I put the drive back into the NAS (taking the opportunity to change the SATA cable, which seemed cheaper than the others) and launched smartctl -t long /dev/ada1

And here I am, waiting for the long SMART test to finish. So far, no errors...

According to SeaTools the drive is healthy, so the problem is elsewhere... (too many variables!)

If the AHCI timeouts occur again, I'll try setting the motherboard's Marvell AHCI controller to IDE mode.

If my experience rings a bell and you have any leads, don't hesitate!
For now I'm pretty fed up; it's amazing how dependent you become on a NAS at home!
 

djoole

Contributor
Joined
Oct 3, 2011
Messages
158
William Grzybowski said:
Those SMART errors mean your ada1 disk is having problems with unreadable sectors (aka bad blocks); it is likely to be dying soon...

AHCI is a protocol that allows a lot of things, hotplug being one of them, so IDE means no hotplug... And yes, you can set it just for the Marvell, if that is the only controller having problems with FreeNAS.

So if there are bad blocks on the drive, how could SeaTools not find them?
Seagate is asking me to provide the SeaTools error code with the RMA in order to replace the drive...

And how come the drive didn't have any bad sectors before the AHCI timeouts started in FreeNAS?
I didn't know software could damage hardware :/

Well... I have to wait another 2 hours for the long SMART test to finish; we'll see.


I don't need hotplug, so IDE should be fine for me, as long as it doesn't decrease transfer speed.
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
djoole said:
So if there are bad blocks on the drive, how could SeaTools not find them?

In my experience, manufacturers' diagnostics *rarely* if *ever* find any real problems. It is my feeling that this is intentional, so they give the impression that their drives are better/more reliable. A few months ago I had a Hitachi drive that was making unusual clicking noises. I ran their diagnostics and they found nothing, but clearly there was a problem and I didn't trust the drive. Fortunately they didn't require an error code before I could request an RMA.

It's possible the system you had the drive in before didn't report the errors you're seeing now, but they were still there. While I agree FreeNAS needs some improvements, I really doubt it is damaging your disks. While scrubbing causes a lot of disk activity, I believe it will only cause a drive that is already about to fail to fail more quickly. Don't waste time: replace any disk you don't have confidence in, because it's not worth the risk of losing your data.
 

djoole

Contributor
Joined
Oct 3, 2011
Messages
158
Okay.
I'll create an RMA tomorrow :) (there is actually a special error code to provide that corresponds to "Failed other diagnostic tools, I'm confident it is a bad drive")

So I hope the AHCI timeouts I had on the Samsung were because of the buggy firmware, now patched, and the timeouts I had on the Seagate were because it's damaged.
I'll let you know.
For now, the long SMART test is still running with no timeouts; this is the first time the test has gotten this far (60% remaining).

But is there any way FreeNAS can continue to work with the damaged drive (I think I'll have to wait a few weeks before getting the new drive)? Can't it "mark" the damaged sectors so they aren't used anymore?
Do I have to run another scrub?
 

Durkatlon

Patron
Joined
Aug 19, 2011
Messages
414
Just my $0.02, but you'll probably find a new drive will have the same issues. If anything the AHCI problems are probably related to the SMART errors. I am using the drives that gave me timeouts with a different mobo with absolutely no problems. This whole timeout business is something odd about FreeBSD8.2.
 