My server crashed but I can still SSH in and do stuff

Status
Not open for further replies.

genBTC

Dabbler
Joined
Aug 11, 2017
Messages
33
The GUI is unresponsive and won't load.

dmesg is filled with endless lines of this:

swap_pager: I/O error - pagein failed; blkno 2376950,size 4096, error 6
uiomove_object: vm_obj 0xfffff80038400738 idx 0 valid 0 pager error 4
swap_pager: I/O error - pagein failed; blkno 2376950,size 4096, error 6
uiomove_object: vm_obj 0xfffff80038400738 idx 0 valid 0 pager error 4


This was the last thing I saw at the bottom of the GUI:
4fe87db308.png


I think my ada4 died, but why did it freeze up the whole system, and how do I recover?

The server is still running and presumably trying to recover, so I can still access it to type commands and do stuff. Also, I'm pretty competent, so fire away.
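
For anyone skimming a flood like this, here is a minimal Python sketch that tallies the swap_pager lines by block number and error code. It only uses the four lines pasted above. The function name `summarize_swap_errors` and the regex are assumptions made for illustration, not anything FreeNAS ships. The one fact it leans on is that errno 6 is ENXIO, which suggests the device backing that swap block is no longer reachable.

```python
import errno
import re
from collections import Counter

# Lines copied from the dmesg output quoted in this post.
DMESG_SAMPLE = """\
swap_pager: I/O error - pagein failed; blkno 2376950,size 4096, error 6
uiomove_object: vm_obj 0xfffff80038400738 idx 0 valid 0 pager error 4
swap_pager: I/O error - pagein failed; blkno 2376950,size 4096, error 6
uiomove_object: vm_obj 0xfffff80038400738 idx 0 valid 0 pager error 4
"""

# Pattern written to match the lines above; other message variants may differ.
SWAP_ERROR = re.compile(
    r"swap_pager: I/O error - (?P<op>\w+) failed; "
    r"blkno (?P<blkno>\d+),\s*size (?P<size>\d+), error (?P<err>\d+)"
)


def summarize_swap_errors(text):
    """Count swap_pager I/O errors per (operation, block number, errno name)."""
    counts = Counter()
    for line in text.splitlines():
        match = SWAP_ERROR.search(line)
        if match is None:
            continue  # e.g. the uiomove_object lines
        err = int(match.group("err"))
        name = errno.errorcode.get(err, f"errno {err}")
        counts[(match.group("op"), int(match.group("blkno")), name)] += 1
    return counts


summary = summarize_swap_errors(DMESG_SAMPLE)
for (op, blkno, name), n in summary.items():
    print(f"{op} blkno={blkno} {name}: {n} occurrence(s)")
```

Run against the sample, it reports the same block failing twice with ENXIO, which fits a single swap device going away rather than scattered bad blocks.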
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
What kind of media is your server booting from?
Is it a mirrored device?
Do you have a backup of your config?
What is your 'ada4' used in? Why do you suspect that device?
If you are able to SSH in, and the server is responsive, check zpool status and give us the results inside code tags.
 

genBTC

Dabbler
Joined
Aug 11, 2017
Messages
33
It's booting off a Kingston DataTraveler 16GB flash drive. I was literally about to build a mirror backup drive today after asking about it (the server has only been running 6 days). So no, I don't have a mirror.
I exported a database .db save at 5:37pm, but I was doing a lot of stuff between then and midnight.
The SSH connection has since shut itself down and died.
I suspect ada4 because that's the device I was reading from when it froze up, and also that green message says ada4. And it's a known sketchy drive. I put it all by itself in its own pool and vdev on purpose, just to suss out whether it was indeed faulty. I've written and read about 1TB back and forth 3 times now, and there were a few messages about an LBA error once, which is why I suspect the drive. But it never did anything remotely like this.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
FreeNAS currently stripes swap across all drives in your pool. It's stupid. And it means that if a drive dies, it's likely your system will die... just like you saw.

One solution is to not have swap on the drives, but there are good reasons to have swap on the drives. A better solution is to raid10 the swap across the drives.

Good news is this latter approach seems like it might be in FreeNAS 11.1
https://bugs.freenas.org/issues/23523

In the meantime, you can run this script to try to lessen the chances of this problem occurring *when* you have another disk failure
https://forums.freenas.org/index.ph...ny-used-swap-to-prevent-kernel-crashes.46206/

And as to your disk failure, you might want to establish whether it was the disk, the cable, the port, or something else that failed, and then rectify it (i.e. by replacing the disk).

Good luck :)
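
To make the difference concrete, here is a toy Python model of the two layouts. It is a sketch under loud assumptions: pages are placed round-robin, devices are paired into two-way mirrors in index order, and the 4-device, 8-page numbers are hypothetical. It is not how the FreeBSD swap allocator actually places pages; it only shows why one dead device hurts a spread-out layout and not a mirrored one.

```python
def lost_pages_interleaved(num_pages, num_devices, failed):
    """Pages go round-robin over all devices; any page on a failed device is lost."""
    failed = set(failed)
    return [page for page in range(num_pages) if page % num_devices in failed]


def lost_pages_mirrored(num_pages, num_devices, failed):
    """Devices are paired (0,1), (2,3), ...; a page is lost only if both halves failed."""
    if num_devices % 2:
        raise ValueError("this toy model needs an even number of devices")
    failed = set(failed)
    num_mirrors = num_devices // 2
    lost = []
    for page in range(num_pages):
        mirror = page % num_mirrors
        members = {2 * mirror, 2 * mirror + 1}
        if members <= failed:  # both copies gone
            lost.append(page)
    return lost


# Hypothetical numbers: 4 swap devices, 8 swapped-out pages, device 3 dies.
FAILED = {3}
interleaved = lost_pages_interleaved(8, 4, FAILED)
mirrored = lost_pages_mirrored(8, 4, FAILED)
print("interleaved layout loses pages:", interleaved)
print("mirrored layout loses pages:", mirrored)
```

In the interleaved case any process that needs one of the lost pages back gets an unrecoverable pagein error, which is the kind of failure shown in the dmesg output at the top of the thread.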
 

genBTC

Dabbler
Joined
Aug 11, 2017
Messages
33
I am 100% sure now: ada4 has failed. I arrived at the server and it was sitting at a BIOS error prompt of "SATA port E not detected".
So it must have taken the whole server down when the drive died.
That's not good.

Anyway, the drive is pretty bad. SMART attribute #5 (Reallocated) is down to 1% with a value of 4306 :p It just suddenly skyrocketed; it had like 14 or something for the first few terabytes written. That's why I'm testing it lol. Unfortunately I had started to put a fair bit of work into it, thinking it was about to be OK.

FreeNAS booted without the bad disk, so the USB boot drive is fine. So I can mirror it before it dies for real.
And it also booted WITH the bad disk, and recognized the ZFS pool at least.

EDIT: about the swap - that's weird. I'll read up.
I never expected a failed disk to take down the system. I guess it's not redundant in that way.
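
For reference, a rough Python sketch of reading one attribute row in `smartctl -A` column order and applying the usual rule that an attribute has failed once its normalized value is at or below its threshold. It assumes the "1%" above is the normalized VALUE and 4306 is the raw count. Everything else in the sample row (flag, worst, threshold, type, updated, when-failed) is a made-up placeholder, and the helper names are hypothetical.

```python
# Illustrative row only. VALUE (001) and RAW_VALUE (4306) come from the post;
# FLAG, WORST, THRESH, TYPE, UPDATED and WHEN_FAILED are invented placeholders.
SAMPLE_ROW = (
    "  5 Reallocated_Sector_Ct   0x0033   001   001   010"
    "    Pre-fail  Always   FAILING_NOW 4306"
)

COLUMNS = (
    "id", "name", "flag", "value", "worst",
    "thresh", "type", "updated", "when_failed", "raw",
)


def parse_attribute_row(row):
    """Split one attribute row into a dict keyed by column name."""
    fields = row.split(None, len(COLUMNS) - 1)
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    attr = dict(zip(COLUMNS, fields))
    for key in ("id", "value", "worst", "thresh"):
        attr[key] = int(attr[key])
    # Some drives append extra text after the raw count; keep the first token.
    attr["raw"] = int(attr["raw"].split()[0])
    return attr


def is_failing(attr):
    """True when the normalized value is at or below the vendor threshold."""
    return attr["value"] <= attr["thresh"]


attr = parse_attribute_row(SAMPLE_ROW)
print(attr["name"], "value", attr["value"], "raw", attr["raw"],
      "failing" if is_failing(attr) else "ok")
```

A jump in the raw count from around 14 to 4306 is worth acting on regardless of where the vendor threshold sits.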
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
It is important to test the disks before you put them into a zpool, and to only use disks that you can rely on. At the first sign of a fault, a disk should be replaced.

Sent from my SAMSUNG-SGH-I537 using Tapatalk
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Stux said: "FreeNAS currently stripes swap across all drives in your pool. It's stupid."
I don't think it's exactly striped--it's just FreeBSD doing what it does when there are multiple swap devices active (and I'd be surprised if every *nix didn't do pretty much the same thing). It does mean that the system is less robust than it could be (as we're seeing here), but I don't think that qualifies it as "stupid." It also isn't really a FreeNAS thing, though it's probably going to be more visible in FreeNAS than in most other *nix installations (as I'd expect that multiple swap devices is a fairly uncommon configuration).
 

genBTC

Dabbler
Joined
Aug 11, 2017
Messages
33
Chris Moore said: "It is important to test the disks before you put them into a zpool, and to only use disks that you can rely on. At the first sign of a fault, a disk should be replaced."

That's exactly what I was doing. Testing it. I had a 3x2TB primary RAIDZ1 pool with a log drive (ada0-3), and ada4 was separate. Little old ada4 was different. I knew ada4 was working but "sketchy", as I put it. It worked in the last system. I successfully dd-dumped and read 1TB of data off it (backed up to the zpool). It passed a full random wipe. Badblocks reported nothing. So at this point I regarded it as usable but not trustworthy (trust ME on this), and I tried to add that single disk on its lonesome, away from everything else, as a regular old non-ZFS disk. Couldn't. "Not allowed", from the GUI at least. Was I expected to force it, and create a new filesystem on /dev/ada4 and mount it as whatever kind of partition (ext4, NTFS) from the command line?

I also don't agree with its decision to touch that drive for swap. It was faulty, and this software likely killed it by using it for swap. Nah, jk, no biggie :) But that is probably what happened, IIUC.
It lasted at least 3 full drive writes and 3 full drive reads over the course of 6 days.
This is my testing period.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
genBTC said: "That's exactly what I was doing. Testing it."
Adding a drive to your NAS is not a good way to test a drive.
I am guessing you already saw this, but here is the community advice from our 'How To' section: Hard Drive Burn-In Testing
What I do to test a drive that I have questions about is boot DBAN and nuke it, in a computer all by itself, running a DoD short wipe with verify between passes.
A suggestion for next time. Still, if you are not confident in the disk, you shouldn't put it in. Also, in the GUI, there is a place to set the swap size. You can set it to zero, add a disk or set of disks, and then set it back to some other value. The default is 2. I set mine to 1 because I have 12 disks and that gives me 12 x the swap space. Still, I have removed a disk to replace it and crashed my system because it just happened to be using that disk when I pulled it. Don't do that either, if you have a system with hot-swap drives.
 

genBTC

Dabbler
Joined
Aug 11, 2017
Messages
33
Thank you, but I can't find that place in the GUI to set the swap size. How do I find it?

I didn't bother to take the drive out (to run DBAN), since this "NAS" is a multitasking computer after all. And I figured one random pass was enough, so as not to kill it outright, because it was fragile. (Testing is also the main thing I'm doing with the server at the moment.) Also, don't worry, it's not for production or anything crucial. I would not use such a drive for that. I get your point that a faulty drive should not even be in a FreeNAS system regardless. But I disagree. There are levels to failures. In a perfect world, I wouldn't want to use that drive when it had 14 sectors reallocated according to SMART #5 and nothing else wrong.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
The whole reason to do it is to kill the drive, and the reason to do it outside the NAS is so that its flaming death does not hurt the NAS.
If it survives the DoD wipe with verification on every pass, then it can be allowed in the NAS.
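
The idea of "write a pattern, then verify it on every pass" can be sketched in Python as below. This is not DBAN and not the DoD procedure, just a simplified write-then-read-back loop. It runs against a temporary file so it cannot hurt a real disk, and the patterns and sizes are arbitrary assumptions. On a real drive the read-back would also need to bypass caches to mean anything.

```python
import os
import tempfile


def write_and_verify(path, size_bytes, pattern, chunk_size=1 << 20):
    """Fill `path` with a repeated pattern, read it back, and return the
    offsets of chunks that do not match. The pattern restarts at the
    beginning of each chunk."""
    chunk = (pattern * (chunk_size // len(pattern) + 1))[:chunk_size]
    bad_offsets = []
    with open(path, "r+b") as f:
        written = 0
        while written < size_bytes:
            n = min(chunk_size, size_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())

        f.seek(0)
        offset = 0
        while offset < size_bytes:
            n = min(chunk_size, size_bytes - offset)
            if f.read(n) != chunk[:n]:
                bad_offsets.append(offset)
            offset += n
    return bad_offsets


# Stand-in target: a small temporary file instead of a real device.
SIZE = 3 * (1 << 20) + 123
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.truncate(SIZE)
    target = tmp.name

results = {}
try:
    for pattern in (b"\x00", b"\xff", b"\xaa\x55"):
        results[pattern] = write_and_verify(target, SIZE, pattern)
        print(pattern.hex(), "mismatching chunks:", results[pattern])
finally:
    os.remove(target)
```

A drive that is on its way out tends to show up here as mismatches or I/O errors during one of the passes, which is exactly what you want to happen on a test bench rather than inside the NAS.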

 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

genBTC

Dabbler
Joined
Aug 11, 2017
Messages
33
Now I know how important that setting is.
 