swap_pager I/O error ... every 40 days???

Status
Not open for further replies.

Cube101

Cadet
Joined
Mar 3, 2014
Messages
3
I am fairly new to FreeNAS. I have had the machine running for 9 months. The first 5 or 6 months were fine, but now I get the attached error message on the console every 40 days or so. The message always refers to a swap_pager I/O error.

There is no menu, so I have to use the power button to switch the machine off and on again. It then boots up fine and works for about 40 days before crashing again.

Is this hardware or software?

Can anyone help me, please?
 

Attachments

  • freenas error.JPG
    314.9 KB · Views: 233

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Your ada0 device is going away. Possibly a scrub teasing it into failing. Do you have SMART tests set up on it? Suggest you assume the disk may be dying and back up your data, then poke at the disk to determine the issue.
 

Cube101

Cadet
Joined
Mar 3, 2014
Messages
3
Thanks for the advice.

The data is backed up daily, but we cannot afford any downtime on this mission-critical system.

I thought it might have something to do with the scrub, which is set to run every 35 days.

SMART tests are not set up. I will set them up tomorrow. Which type of test is best, and how often should I run it? Am I looking for errors and bad sectors?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Well, you best figure out how to afford some downtime, 'cause generally speaking, drive failures don't just magically get better. Unless you are super lucky and maybe having a cable fail or dodgy controller or something like that, it is likely your first disk there is dying.

We do long tests 3x a week and shorts every four hours.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
So what is likely happening is that you've never actually completed a scrub... in months. Yikes! Really nasty place to be!

Do a SMART long test on that disk; it's almost certainly going to fail. Then you can RMA it. If you already know it's out of warranty, then I'd just replace it.
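
For anyone following along, a minimal sketch of what a SMART long test looks like from the shell. It assumes the suspect disk is /dev/ada0 (the device named earlier in the thread) and that smartctl from smartmontools is installed. The helper name smart_long_test_cmds is made up for illustration; it only prints the commands so you can review them before running them as root.

```shell
#!/bin/sh
# Assumption: the suspect disk is ada0; override with DISK=/dev/adaN.
DISK="${DISK:-/dev/ada0}"

# Hypothetical helper: print the smartctl commands in the order you would run them.
smart_long_test_cmds() {
    disk="$1"
    echo "smartctl -t long $disk"      # start the extended (long) self-test
    echo "smartctl -l selftest $disk"  # read the self-test log once the test has finished
    echo "smartctl -A $disk"           # dump attributes (reallocated and pending sectors)
}

smart_long_test_cmds "$DISK"
```

The long test runs inside the drive in the background and can take hours on a large disk, so the self-test log is only worth reading after the estimated completion time that smartctl prints when the test starts.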
 

Cube101

Cadet
Joined
Mar 3, 2014
Messages
3
I have now performed a long test, and the report from smartctl says "Extended offline completed without error".

Do you guys think it is still the hard disk?

Could it be software? I am running 8.3.1 p2 x64.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Well, that's not horrible news. You might have some sort of issue with your cabling or HBA/SATA ports, given that it appears the host dropped a drive offline. Fortunately that is all stuff that can be tested.

You can try a read stress test of the drives by running "dd if=/dev/${foo} of=/dev/null bs=1048576" where ${foo} is the disk's device name. Allowing for any load on your filer, the times reported for each drive should be similar within a few percent and no errors should be reported on the console. You can refer to this sticky for further ideas for disk stress testing, but note that you do not want to write anything to the drives.
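
A rough sketch of how that dd read test could be wrapped so each drive gets the same treatment and a comparable elapsed time. The function name read_test and the device names ada0 to ada3 are assumptions for illustration, not something taken from the poster's system. It only ever reads from the target and discards the data in /dev/null.

```shell
#!/bin/sh
# Read-only pass over a device (or file): data is discarded, nothing is written.
read_test() {
    target="$1"
    start=$(date +%s)
    if dd if="$target" of=/dev/null bs=1048576; then
        status=ok
    else
        status=FAILED
    fi
    end=$(date +%s)
    # One summary line per target: name, result, elapsed seconds.
    echo "$target $status $((end - start))s"
    [ "$status" = ok ]
}

# Assumption: the pool disks are ada0..ada3; adjust to match your system, then uncomment.
# for d in ada0 ada1 ada2 ada3; do read_test "/dev/$d"; done
```

Compare the elapsed times across drives and watch the console while it runs; a drive that is much slower than its siblings, or that triggers console errors, is the one to look at first.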
 