freenas kernel lost device

Status
Not open for further replies.

huberte

Explorer
Joined
Sep 20, 2011
Messages
87
After a resilver (http://forums.freenas.org/showthread.php?4031-Daily-run-output-error-checking-status) prompted by a possibly bad disk, I constantly get I/O errors: FreeNAS suddenly loses my disks.

Code:
Dec  5 20:04:44 freenas kernel: (ada1:ata2:0:1:0): lost device
Dec  5 20:04:44 freenas kernel: (ada0:ata2:0:0:0): lost device
Dec  5 20:04:44 freenas root: ZFS: vdev I/O failure, zpool=gra1 path=/dev/gptid/199b99aa-e63a-11e0-9acd-14dae98ee84a offset=270336 size=8192 error=6
Dec  5 20:04:44 freenas root: ZFS: vdev I/O failure, zpool=gra1 path=/dev/gptid/199b99aa-e63a-11e0-9acd-14dae98ee84a offset=1498153754624 size=8192 error=6
Dec  5 20:04:44 freenas root: ZFS: vdev I/O failure, zpool=gra1 path=/dev/gptid/199b99aa-e63a-11e0-9acd-14dae98ee84a offset=1498154016768 size=8192 error=6
Dec  5 20:04:44 freenas root: ZFS: vdev I/O failure, zpool=gra1 path=/dev/gptid/61e4ac21-e63a-11e0-9acd-14dae98ee84a offset=270336 size=8192 error=6
Dec  5 20:04:44 freenas root: ZFS: vdev I/O failure, zpool=gra1 path=/dev/gptid/61e4ac21-e63a-11e0-9acd-14dae98ee84a offset=1498153754624 size=8192 error=6
Dec  5 20:04:44 freenas root: ZFS: vdev I/O failure, zpool=gra1 path=/dev/gptid/61e4ac21-e63a-11e0-9acd-14dae98ee84a offset=1498154016768 size=8192 error=6
Dec  5 20:04:44 freenas root: ZFS: zpool I/O failure, zpool=gra1 error=6
Dec  5 20:04:44 freenas root: ZFS: zpool I/O failure, zpool=gra1 error=6
Dec  5 20:04:44 freenas root: ZFS: zpool I/O failure, zpool=gra1 error=28
Dec  5 20:04:44 freenas last message repeated 2 times
Dec  5 20:04:44 freenas root: ZFS: vdev I/O failure, zpool=gra1 path= offset= size= error=
Dec  5 20:04:44 freenas root: ZFS: vdev I/O failure, zpool=gra1 path= offset=1686275686400 size=1024 error=6
Dec  5 20:04:44 freenas root: ZFS: vdev I/O failure, zpool=gra1 path= offset=1686279274496 size=4608 error=6


After that, I can't reboot cleanly (I have to hard-reset the machine); after the reset it works again.

I have no RAID card; this is FreeNAS amd64.

I changed the SATA cable, but the same issue occurred again tonight.

thanks
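
For readers triaging a similar log, here is a minimal Python sketch that tallies the `vdev I/O failure` lines per device path and error code. The function name `summarize_vdev_failures`, the regular expression, and the `ERRNO_NAMES` table are illustrative assumptions, not part of FreeNAS. On FreeBSD, errno 6 is ENXIO ("Device not configured"), which fits the `lost device` kernel lines, and errno 28 is ENOSPC.

```python
import re
from collections import Counter

# FreeBSD errno meanings for the two codes that appear in the log above.
ERRNO_NAMES = {
    6: "ENXIO (device not configured)",
    28: "ENOSPC (no space left on device)",
}

# Fields may be empty (the log contains one entry with nothing after each '=').
VDEV_RE = re.compile(
    r"ZFS: vdev I/O failure, zpool=(?P<pool>\S+) path=(?P<path>\S*) "
    r"offset=(?P<offset>\d*) size=(?P<size>\d*) error=(?P<error>\d*)"
)

def summarize_vdev_failures(lines):
    """Count vdev I/O failures per (device path, errno)."""
    counts = Counter()
    for line in lines:
        m = VDEV_RE.search(line)
        if not m or not m.group("error"):
            continue  # not a vdev failure line, or the error field is empty
        path = m.group("path") or "<no path logged>"
        counts[(path, int(m.group("error")))] += 1
    return counts

# A few lines copied from the log above.
sample_log = [
    "Dec  5 20:04:44 freenas kernel: (ada1:ata2:0:1:0): lost device",
    "Dec  5 20:04:44 freenas root: ZFS: vdev I/O failure, zpool=gra1 path=/dev/gptid/199b99aa-e63a-11e0-9acd-14dae98ee84a offset=270336 size=8192 error=6",
    "Dec  5 20:04:44 freenas root: ZFS: vdev I/O failure, zpool=gra1 path=/dev/gptid/61e4ac21-e63a-11e0-9acd-14dae98ee84a offset=270336 size=8192 error=6",
    "Dec  5 20:04:44 freenas root: ZFS: vdev I/O failure, zpool=gra1 path= offset= size= error=",
    "Dec  5 20:04:44 freenas root: ZFS: vdev I/O failure, zpool=gra1 path= offset=1686275686400 size=1024 error=6",
]

summary = summarize_vdev_failures(sample_log)
for (path, err), n in sorted(summary.items()):
    print(f"{n} x {path}: error={err} {ERRNO_NAMES.get(err, '')}")
```

Feeding the whole of /var/log/messages through the same function would show whether the failures cluster on the two devices that the kernel reported as lost.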
 

huberte

Explorer
Joined
Sep 20, 2011
Messages
87
Very strange. Is my USB stick faulty?

Code:

  pool: gra1
 state: ONLINE
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        gra1                                            ONLINE       1     5     0
          raidz1                                        ONLINE       4     9     0
            gptid/61e4ac21-e63a-11e0-9acd-14dae98ee84a  ONLINE       3    11     0
            gptid/199b99aa-e63a-11e0-9acd-14dae98ee84a  ONLINE       3    10     0
            gptid/19f779b3-e63a-11e0-9acd-14dae98ee84a  ONLINE       0     0     0
            ada3p2                                      ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x0>
        /mnt/gra1/.freenas


Please help...
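
As a rough aid for reading output like the above, the following Python sketch pulls the rows with nonzero READ/WRITE/CKSUM counters out of the `config:` table. The function name `devices_with_errors` is hypothetical, and the parser assumes plain integer counters in exactly five whitespace-separated columns; abbreviated counters that zpool may print for large values (such as `1.2K`) are simply skipped.

```python
def devices_with_errors(status_text):
    """Return {name: (read, write, cksum)} for config rows with any nonzero counter."""
    flagged = {}
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True  # the table starts after this header row
            continue
        # A data row has NAME STATE READ WRITE CKSUM with integer counters.
        if in_config and len(fields) == 5 and all(f.isdigit() for f in fields[2:]):
            read, write, cksum = (int(f) for f in fields[2:])
            if read or write or cksum:
                flagged[fields[0]] = (read, write, cksum)
    return flagged

# The config table from the output above.
sample_status = """
        NAME                                            STATE     READ WRITE CKSUM
        gra1                                            ONLINE       1     5     0
          raidz1                                        ONLINE       4     9     0
            gptid/61e4ac21-e63a-11e0-9acd-14dae98ee84a  ONLINE       3    11     0
            gptid/199b99aa-e63a-11e0-9acd-14dae98ee84a  ONLINE       3    10     0
            gptid/19f779b3-e63a-11e0-9acd-14dae98ee84a  ONLINE       0     0     0
            ada3p2                                      ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:
"""

flagged = devices_with_errors(sample_status)
for name, counters in flagged.items():
    print(name, counters)
```

For this output, the two leaf devices it flags are the same two gptid paths that appear in the `vdev I/O failure` log lines from the first post.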
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
These errors are not from your flash drive. How do you have your controller configured in your BIOS? If you have AHCI enabled (sometimes called RAID mode), try changing it to IDE mode and then run a scrub again. I'm assuming you're using the controller on your motherboard; if you're not, please post more info about your controller.

That's the first thing that comes to mind.
 

huberte

Explorer
Joined
Sep 20, 2011
Messages
87
Thanks a lot, ProtoSD.

1. AHCI
2. Motherboard controller.

Moreover, these lost-device events happen after a short period of time. I checked the FreeNAS boot messages and found these strange messages, which I don't think I saw before:

Code:
Synchronize cache failed
(...)
ataidle : the device does not support advanced power management


Boot also takes a very long time after each of these two messages:

Code:
Mounting local filesystems
(...)
Starting smartd


So for now I'm trying with APM disabled. Strange, since it worked before my resilvering...
 

huberte

Explorer
Joined
Sep 20, 2011
Messages
87
Same error: when I try to access my NAS after a sleep period, I get:

Code:
Dec 10 00:32:10 freenas kernel: (ada1:ata0:0:1:0): lost device
Dec 10 00:32:10 freenas kernel: (ada0:ata0:0:0:0): lost device


Strange that this started after replacing a bad drive.

Moreover (and again), I can't reboot FreeNAS after such an error; I have to do a hard reset...

Help needed again :(

EDIT: These lost device lines always appear when I access the pool after a sleep period. This never happened before the drive replacement.
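
To compare events like the two reported in this thread, a small Python sketch can group the kernel's `lost device` lines by timestamp and ATA channel. The function name `lost_devices_by_event` and the regular expression are assumptions based only on the log lines quoted here; other FreeBSD versions may format the line differently.

```python
import re
from collections import defaultdict

# Matches lines like:
#   Dec  5 20:04:44 freenas kernel: (ada1:ata2:0:1:0): lost device
LOST_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: "
    r"\((?P<dev>\w+):(?P<bus>\w+):\d+:\d+:\d+\): lost device"
)

def lost_devices_by_event(lines):
    """Group lost devices by (timestamp, channel)."""
    events = defaultdict(list)
    for line in lines:
        m = LOST_RE.match(line)
        if m:
            events[(m.group("ts"), m.group("bus"))].append(m.group("dev"))
    return dict(events)

# The four kernel lines quoted in this thread.
sample_lines = [
    "Dec  5 20:04:44 freenas kernel: (ada1:ata2:0:1:0): lost device",
    "Dec  5 20:04:44 freenas kernel: (ada0:ata2:0:0:0): lost device",
    "Dec 10 00:32:10 freenas kernel: (ada1:ata0:0:1:0): lost device",
    "Dec 10 00:32:10 freenas kernel: (ada0:ata0:0:0:0): lost device",
]

events = lost_devices_by_event(sample_lines)
for (ts, bus), devs in events.items():
    print(ts, bus, devs)
```

In both events quoted here, ada0 and ada1 disappear in the same second and sit on the same channel (ata2 in the first log, ata0 in the second).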
 

brejoc

Cadet
Joined
Jan 5, 2012
Messages
5
huberte, did you ever find a solution to this error? Seems like I'm having the same problem.
 

William Grzybowski

Wizard
iXsystems
Joined
May 27, 2011
Messages
1,754
This seems like a bug in ataidle. If you are experiencing that kind of issue, try not putting your disks to sleep... (or try 8.0.3-p1; I think it has ataidle 2.7, which might help solve those issues)
 

brejoc

Cadet
Joined
Jan 5, 2012
Messages
5
Thanks for your help, William! I'm not sure I understood you correctly. Is your first suggestion not to put my disks to sleep? I'm not sure this will help, because I was copying a lot of data when it happened, and I'm quite sure the disk had not been asleep for some time.

But if this happens again I will update to 8.0.3 and see if this fixes the problem.
 

huberte

Explorer
Joined
Sep 20, 2011
Messages
87
Hi, I tried everything. What finally worked was reinstalling FreeNAS: a fresh install, then importing the zpool.

No errors in the last 45 days.
 

Ken Almond

Dabbler
Joined
May 11, 2014
Messages
19
I'm using 5 of 6 SATA ports on an Intel DG33TL motherboard. The 6th is the red external SATA port. A year ago, when I set up 5x3TB Seagate drives in a single raidz1 using FreeNAS 8.3.1, I had some problems, replaced a disk, and had more problems (e.g. READ, WRITE, CKSUM errors). I got new cables and eventually moved the cable from a troubled disk/SATA port to the 6th (red external SATA) unused port on the motherboard, and everything smoothed out. At that time, I didn't really understand ZFS, scrub, or anything, just that I needed zeros in the READ/WRITE/CKSUM columns of zpool status.

9 months later, after significant use, zpool scrub started reporting problems. smartctl showed a disk with bad sectors like this:
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 8
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 8

So I replaced that disk and it resilvered OK, but then zpool scrubs started causing 'lost device' errors on a different drive.
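
The check described above (looking for nonzero raw values on the pending and uncorrectable sector attributes) can be sketched in Python as follows. The function name `nonzero_sector_attributes` and the `WATCHED` set are illustrative; the parser assumes the usual ten-column smartctl attribute table with the raw value in the last column, as in the two lines quoted in this post.

```python
# Attribute names taken from the smartctl lines quoted above.
WATCHED = {"Current_Pending_Sector", "Offline_Uncorrectable"}

def nonzero_sector_attributes(smart_lines):
    """Return {attribute_name: raw_value} for watched attributes with raw value > 0."""
    found = {}
    for line in smart_lines:
        fields = line.split()
        # Expected columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            found[name] = int(raw)
    return found

# The two attribute lines from this post.
sample_smart = [
    "197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 8",
    "198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 8",
]

bad = nonzero_sector_attributes(sample_smart)
print(bad)
```

Running it per disk makes it easy to see which drive in the pool is reporting pending or uncorrectable sectors before a scrub trips over them.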

I changed the motherboard setting from AHCI to IDE, and from xx to LEGACY IDE, to no avail. I changed cables and changed back to the unused SATA port... to no avail.
I replaced the disk, but a 'lost device' error interrupted the process.

Based on this blog, I upgraded to FreeNAS 9.2.1, and the resilver finished OK. I ran a scrub but got CKSUM errors; they could have come from the lost device above, I'm not sure. I set the motherboard to IDE and ran a scrub again... it showed (repairing)... and there were fewer than 1,000 CKSUM errors.

Now all seems clean. So FreeNAS 9.2.1 might have done the trick, OR 9.2.1 / IDE mode might have disturbed things on a 'flaky motherboard'? I don't know. I also don't know why things went south after 9 months of running OK, except that I did have a disk failure. Interestingly, both disks (the one with the sector failure and its replacement) had W1xxx serial numbers AND were running at 38 C, whereas the other 3 were running at 32 C and have Z serial numbers. The replacement disks have Zxxx serial numbers and I put a fan on them, so everything now runs at around 32 C.

I bought a new power supply but haven't tried it yet.
 