Need help troubleshooting "swap_pager: I/O error"

Status
Not open for further replies.

indivision

Guru
Joined
Jan 4, 2013
Messages
806
Hello.

Woke up this morning to find FreeNAS telling me that a drive had an unrecoverable error and should be checked out, possibly replaced. I could see in the log that it was drive "ada0", which I assume is the drive plugged into the first SATA port.

I have the data on the system backed up. So, I started moving some files around despite the warning. That generated a lot of these errors:

kernel: swap_pager: I/O error - pageout failed; blkno 608,size 65536, error 5ahcich0:
kernel: Error while READ LOG EXT
kernel: ahcich0: Error while READ LOG EXT
kernel: ahcich0: Error while READ LOG EXT
kernel: swap_pager: I/O error - pageout failed; blkno 624,size 65536, error 5

Questions:

1) What is 5ahcich0? Is that just an error code? Or, is it a reference to a drive? I'm running FreeNAS from a USB drive. So, I want to rule out a problem with that drive. Although I had the earlier error on ada0, these new error messages do not cite ada0. So, I don't know if it's just that drive or some larger problem.

2) The mention of swap makes me wonder if something isn't able to read/write to space on the USB drive. Or, is swap handled entirely in RAM? Or, am I just completely misunderstanding the error message?

3) How do I test the drive ada0 to determine whether or not it really needs to be replaced? Or, are the errors that I'm seeing strong enough evidence on their own to warrant replacing the drive?

Thanks in advance for any insight!

*** UPDATE:

In addition to the above, I'm now getting these errors:

kernel: re0: watchdog timeout
kernel: re0: link state changed to DOWN
kernel: re0: link state changed to UP

Related? Or, do I also have a network config issue? (This wasn't happening at all yesterday while performing the same operations.) This happened a bunch of times. But, it eventually looks to have crashed FreeNAS. The web GUI is gone and the direct screen just shows repeated "watchdog timeout" errors...
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
That makes it sound likely to be hardware failure. Check your power supply (voltages, fans, etc) and check for stalled fans in the system causing hot spots. Then try running something like memtest86 for a while.
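
On question 1 above: "5ahcich0" reads like two kernel messages run together, i.e. "error 5" from swap_pager followed by the start of an "ahcich0:" message (ahcich0 being the first AHCI channel, which is where ada0 would normally sit). Errno 5 is EIO, a generic I/O error. Here is a rough Python sketch that splits such a line apart. The regex layout is my assumption based only on the log lines posted above, and decode_pageout_error is a made-up helper name, not anything FreeNAS ships.

```python
import errno
import os
import re

# Log line copied from the first post; the run-together "5ahcich0:" is kept as-is.
LINE = ("kernel: swap_pager: I/O error - pageout failed; "
        "blkno 608,size 65536, error 5ahcich0:")

# Assumed layout: "blkno <n>,size <n>, error <errno>" followed by whatever
# the next kernel message managed to print on the same line.
PATTERN = re.compile(r"blkno (\d+),\s*size (\d+),\s*error (\d+)(.*)$")


def decode_pageout_error(line):
    """Split a swap_pager pageout line into its numeric fields (hypothetical helper)."""
    m = PATTERN.search(line)
    if m is None:
        return None
    code = int(m.group(3))
    return {
        "blkno": int(m.group(1)),
        "size": int(m.group(2)),
        "errno": code,
        # Symbolic name for the error number, e.g. EIO for 5.
        "errno_name": errno.errorcode.get(code, "unknown"),
        # Human-readable text; exact wording depends on the platform.
        "errno_text": os.strerror(code),
        # Leftover text that belongs to the next, interleaved message.
        "trailing": m.group(4).strip(),
    }


if __name__ == "__main__":
    print(decode_pageout_error(LINE))
```

If that reading is right, the swap errors and the ahcich0 errors point at the same SATA channel rather than at the USB boot drive.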
 

indivision

Guru
Joined
Jan 4, 2013
Messages
806
Thank you.

It is using a relatively low powered PSU: http://www.newegg.com/Product/Product.aspx?Item=N82E16817371033

It looks like it's fine when checked through the BIOS. But, maybe not when all the drives are going, etc. How do I monitor the wattage while FreeNAS is running?

I hooked the HDD fan directly to the PSU rather than to the MB. I didn't want the MB to slow the fan down because its own temperature was OK at a time when the HDDs might still need cooling. Could that cause power issues?
 

indivision

Guru
Joined
Jan 4, 2013
Messages
806
Hmm. A few strange turns:

I did a reboot and the drive was kicked out of the RAID entirely.

A few hours later, I did another reboot and the drive was back, showing the RAID as healthy again.

What tests would you suggest to determine whether it is ok to leave the drive or replace? (A replacement drive was already ordered to arrive in a few days.)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Hmm. A few strange turns:

I did a reboot and the drive was kicked out of the RAID entirely.

A few hours later, I did another reboot and the drive was back, showing the RAID as healthy again.

What tests would you suggest to determine whether it is ok to leave the drive or replace? (A replacement drive was already ordered to arrive in a few days.)

That sounds like a potentially flaky power supply or disk (or both!). I'd make sure you have a beefy power supply in there that is trustworthy. If the issue continues I'd do a SMART test on that disk and see what happens.

Keep in mind that if a disk is removed from a zpool and later re-added, it won't be "in sync" with the zpool until you do a scrub. A scrub also stresses the disk, so you may find that doing a scrub will kick the disk out again if it's failing. If it's a power supply issue, doing a scrub could be very bad because the higher power usage could cause other disks to be kicked out.
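
For the SMART test mentioned above, something like the following Python sketch collects the usual smartctl invocations in one place. It assumes smartctl (from smartmontools) is on the PATH and that the disk is /dev/ada0 as in this thread; smartctl_commands and run are hypothetical helper names. The long self-test runs in the background on the drive itself, so the self-test log has to be read later to see the result.

```python
import shutil
import subprocess


def smartctl_commands(device):
    """Return argv lists for a basic SMART check of one device (hypothetical helper)."""
    return {
        # Overall health self-assessment reported by the drive.
        "health": ["smartctl", "-H", device],
        # Vendor attribute table (pending / reallocated sector counts live here).
        "attributes": ["smartctl", "-A", device],
        # Start an extended (long) self-test; returns immediately.
        "start_long_test": ["smartctl", "-t", "long", device],
        # Read back the self-test log once the test has had time to finish.
        "selftest_log": ["smartctl", "-l", "selftest", device],
    }


def run(argv):
    """Run one command and return its stdout, or None if the tool is missing."""
    if shutil.which(argv[0]) is None:
        return None
    done = subprocess.run(argv, capture_output=True, text=True, check=False)
    return done.stdout


if __name__ == "__main__":
    cmds = smartctl_commands("/dev/ada0")
    for name in ("health", "attributes"):
        print(name, "->", run(cmds[name]))
```

Running it needs root on most systems, and the same four commands can just as well be typed by hand at the shell.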
 

indivision

Guru
Joined
Jan 4, 2013
Messages
806
Thank you.

Coincidentally, I had already ordered a replacement power supply because the current one is noisier than I had hoped. The replacement is 600W (vs. 380W).

I'll run the scrub once I get the new supply installed and see what happens. I don't have anything on the system yet that isn't also on multiple other disks. So, I can afford a surprise at this time.

FreeNAS supplied me with new messages in the log this morning:

Jan 6 18:05:51 mdat smartd[2549]: Device: /dev/ada0, 987 Currently unreadable (pending) sectors (changed -2)
Jan 6 18:05:51 mdat smartd[2549]: Device: /dev/ada0, 493 Offline uncorrectable sectors
Jan 6 18:05:51 mdat smartd[2549]: Device: /dev/ada0, 987 Currently unreadable (pending) sectors (changed -2)
Jan 6 18:05:51 mdat smartd[2549]: Device: /dev/ada0, 493 Offline uncorrectable sectors
Jan 6 18:35:50 mdat smartd[2549]: Device: /dev/ada0, 986 Currently unreadable (pending) sectors (changed -1)
Jan 6 18:35:50 mdat smartd[2549]: Device: /dev/ada0, 493 Offline uncorrectable sectors
Jan 6 18:35:50 mdat smartd[2549]: Device: /dev/ada0, previous self-test completed with error (read test element)
Jan 6 18:35:50 mdat smartd[2549]: Device: /dev/ada0, Self-Test Log error count increased from 3 to 4
Jan 6 18:35:50 mdat smartd[2549]: Device: /dev/ada0, 986 Currently unreadable (pending) sectors (changed -1)
Jan 6 18:35:50 mdat smartd[2549]: Device: /dev/ada0, 493 Offline uncorrectable sectors
Jan 6 18:35:50 mdat smartd[2549]: Device: /dev/ada0, previous self-test completed with error (read test element)
Jan 6 18:35:50 mdat smartd[2549]: Device: /dev/ada0, Self-Test Log error count increased from 3 to 4
Jan 6 19:05:50 mdat smartd[2549]: Device: /dev/ada0, 987 Currently unreadable (pending) sectors (changed +1)
Jan 6 19:05:50 mdat smartd[2549]: Device: /dev/ada0, 493 Offline uncorrectable sectors
Jan 6 19:05:50 mdat smartd[2549]: Device: /dev/ada0, 987 Currently unreadable (pending) sectors (changed +1)

....

Jan 7 06:05:54 mdat smartd[2549]: Device: /dev/ada0, 493 Offline uncorrectable sectors
Jan 7 06:35:50 mdat smartd[2549]: Device: /dev/ada0, 987 Currently unreadable (pending) sectors
Jan 7 06:35:50 mdat smartd[2549]: Device: /dev/ada0, 493 Offline uncorrectable sectors
Jan 7 06:35:52 mdat smartd[2549]: Device: /dev/ada0, Self-Test Log error count increased from 5 to 6
Jan 7 06:35:56 mdat smartd[2549]: Device: /dev/ada0, 987 Currently unreadable (pending) sectors
Jan 7 06:35:56 mdat smartd[2549]: Device: /dev/ada0, 493 Offline uncorrectable sectors
Jan 7 06:35:56 mdat smartd[2549]: Device: /dev/ada0, previous self-test completed with error (read test element)
Jan 7 06:35:58 mdat smartd[2549]: Device: /dev/ada0, Self-Test Log error count increased from 5 to 6

So, SMART continues to show "activity" on that same drive. As you note, it could be power, etc. But, I would think that it would then apply to the other drives on occasion too, no?
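
To see at a glance whether only one drive is affected, the smartd lines can be boiled down per device. This is a minimal Python sketch, assuming the message wording shown in the log above; the sample is a handful of lines copied from it, and summarize is a made-up helper name. It keeps the most recent value seen for each counter.

```python
import re
from collections import defaultdict

# A few lines copied from the log above, used as sample input.
SAMPLE = """\
Jan 6 18:05:51 mdat smartd[2549]: Device: /dev/ada0, 987 Currently unreadable (pending) sectors (changed -2)
Jan 6 18:05:51 mdat smartd[2549]: Device: /dev/ada0, 493 Offline uncorrectable sectors
Jan 6 18:35:50 mdat smartd[2549]: Device: /dev/ada0, 986 Currently unreadable (pending) sectors (changed -1)
Jan 6 19:05:50 mdat smartd[2549]: Device: /dev/ada0, 987 Currently unreadable (pending) sectors (changed +1)
Jan 7 06:35:52 mdat smartd[2549]: Device: /dev/ada0, Self-Test Log error count increased from 5 to 6
"""

# Sector-count messages: "Device: <dev>, <n> <kind> sectors".
COUNT = re.compile(
    r"Device: ([^,\s]+), (\d+) "
    r"(Currently unreadable \(pending\)|Offline uncorrectable) sectors"
)
# Self-test messages: "... error count increased from <a> to <b>".
SELFTEST = re.compile(
    r"Device: ([^,\s]+), Self-Test Log error count increased from (\d+) to (\d+)"
)


def summarize(log_text):
    """Return the latest counters per device found in smartd log text."""
    summary = defaultdict(dict)
    for line in log_text.splitlines():
        m = COUNT.search(line)
        if m:
            key = ("pending" if m.group(3).startswith("Currently")
                   else "offline_uncorrectable")
            summary[m.group(1)][key] = int(m.group(2))  # last value wins
            continue
        m = SELFTEST.search(line)
        if m:
            summary[m.group(1)]["selftest_errors"] = int(m.group(3))
    return dict(summary)


if __name__ == "__main__":
    for device, counters in summarize(SAMPLE).items():
        print(device, counters)
```

Fed the full log, any other drive with pending or uncorrectable sectors would show up as a second key; in the lines posted so far only /dev/ada0 appears.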
 

indivision

Guru
Joined
Jan 4, 2013
Messages
806
Installed new power supply. Ran a scrub and it kicked the same drive out of the RAID again.

I'm going to install the replacement drive as soon as it arrives. Which leads me to a follow up question:

I've read a lot about pools being messed up because a failing drive wasn't taken offline before being replaced. In my case, it doesn't show an option to offline the failed drive. It shows as "UNAVAIL" with REPLACE as the only option. Does that mean that my pool will be ok by just clicking replace and designating the newly installed drive? Or, will I need to do some other cleanup first to protect the pool structure?
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
Does that mean that my pool will be ok by just clicking replace and designating the newly installed drive?
You should be OK assuming your other drives are OK. I would be doing the replace under FreeNAS 8.3 in case you aren't there yet.
 

indivision

Guru
Joined
Jan 4, 2013
Messages
806
Ok. Thank you.

I'm on 8.3. Just started using FreeNAS a few days ago. Liking it so far. I consider the disk errors good in a way: more notification than I was getting out of the box with mdadm on Ubuntu.
 