9.10 System crash with unexplained log

Status
Not open for further replies.

DeanNotDin

Dabbler
Joined
Mar 19, 2017
Messages
32
Hey everyone,

So after downgrading back to 9.10 from Corral I had about 2 weeks of blessed stability, until yesterday.

My server became unresponsive - file transfers stopped, jails crash, and the server was inaccessible from the web GUI.

I checked the on screen console and got the following disturbing log (picture attached)
Also, I got a volume degraded email and 3 hours after that the daily security emails. Until now, I did not reset the server and it is still offline.

Any ideas?
052c65e2d92453e9699893cafc31305f.jpg


Sent from my ONEPLUS A3000 using Tapatalk
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
We you got an email. What did it say? Looks like a usb failure to me.

Sent from my Nexus 5X using Tapatalk
 

DeanNotDin

Dabbler
Joined
Mar 19, 2017
Messages
32
We you got an email. What did it say? Looks like a usb failure to me.

I got 3 emails, and by them it seems that it wasn't a USB failure. The humb drive is connected and active.
Also, the volume degraded alarm is something that I have been getting for a while, resolves after a reboot, and SMART status for all drives is fine. Could be a defected SATA port on the server's motherboard.

The emails that I got are:
  1. Volume Degraded email
    Code:
    The volume Storage (ZFS) state is DEGRADED: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state.
  2. Daily security run output
    Code:
    freenas.local kernel log messages:
    arp: 192.168.1.89 moved from 00:5a:20:31:a1:f0 to 7c:dd:90:a5:a7:6a on epair0b
    arp: 192.168.1.89 moved from 00:5a:20:31:a1:f0 to 7c:dd:90:a5:a7:6a on epair0b
    arp: 192.168.1.89 moved from 00:5a:20:31:a1:f0 to 7c:dd:90:a5:a7:6a on epair0b
    arp: 192.168.1.89 moved from 7c:dd:90:a5:a7:6a to 00:5a:20:31:a1:f0 on epair0b
    arp: 192.168.1.89 moved from 00:5a:20:31:a1:f0 to 7c:dd:90:a5:a7:6a on epair0b
    arp: 192.168.1.89 moved from 00:5a:20:31:a1:f0 to 7c:dd:90:a5:a7:6a on epair0b
    arp: 192.168.1.89 moved from 7c:dd:90:a5:a7:6a to 00:5a:20:31:a1:f0 on epair0b
    arp: 192.168.1.89 moved from 00:5a:20:31:a1:f0 to 7c:dd:90:a5:a7:6a on epair0b
    arp: 192.168.1.89 moved from 00:5a:20:31:a1:f0 to 7c:dd:90:a5:a7:6a on epair0b
    arp: 192.168.1.89 moved from 7c:dd:90:a5:a7:6a to 00:5a:20:31:a1:f0 on epair0b
    arp: 192.168.1.89 moved from 00:5a:20:31:a1:f0 to 7c:dd:90:a5:a7:6a on epair0b
    arp: 192.168.1.89 moved from 7c:dd:90:a5:a7:6a to 00:5a:20:31:a1:f0 on epair0b
    arp: 192.168.1.89 moved from 00:5a:20:31:a1:f0 to 7c:dd:90:a5:a7:6a on epair0b
    arp: 192.168.1.89 moved from 00:5a:20:31:a1:f0 to 7c:dd:90:a5:a7:6a on epair0b
    arp: 192.168.1.89 moved from 7c:dd:90:a5:a7:6a to 00:5a:20:31:a1:f0 on epair0b
    arp: 192.168.1.47 moved from 00:5a:20:31:a1:f0 to 7c:dd:90:a5:a7:6a on epair0b
    arp: 192.168.1.47 moved from 00:5a:20:31:a1:f0 to 7c:dd:90:a5:a7:6a on epair0b
    arp: 192.168.1.47 moved from 00:5a:20:31:a1:f0 to 7c:dd:90:a5:a7:6a on epair0b
    arp: 192.168.1.47 moved from 00:5a:20:31:a1:f0 to 7c:dd:90:a5:a7:6a on epair0b
    arp: 192.168.1.47 moved from 00:5a:20:31:a1:f0 to 7c:dd:90:a5:a7:6a on epair0b
    arp: 192.168.1.47 moved from 00:5a:20:31:a1:f0 to 7c:dd:90:a5:a7:6a on epair0b
    arp: 192.168.1.47 moved from 00:5a:20:31:a1:f0 to 7c:dd:90:a5:a7:6a on epair0b
    arp: 192.168.1.47 moved from 00:5a:20:31:a1:f0 to 7c:dd:90:a5:a7:6a on epair0b
    swap_pager: indefinite wait buffer: bufobj: 0, blkno: 524298, size: 4096
    ahcich0: Timeout on slot 5 port 0
    ahcich0: is 00000000 cs 00000020 ss 00000000 rs 00000020 tfd c0 serr 00000000 cmd 0004c517
    (ada0:ahcich0:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
    (ada0:ahcich0:0:0:0): CAM status: Command timeout
    (ada0:ahcich0:0:0:0): Retrying command
    swap_pager: indefinite wait buffer: bufobj: 0, blkno: 524298, size: 4096
    swap_pager: indefinite wait buffer: bufobj: 0, blkno: 575437, size: 8192
    ahcich0: AHCI reset: device not ready after 31000ms (tfd = 00000080)
    swap_pager: indefinite wait buffer: bufobj: 0, blkno: 524298, size: 4096
    swap_pager: indefinite wait buffer: bufobj: 0, blkno: 575437, size: 8192
    swap_pager: indefinite wait buffer: bufobj: 0, blkno: 524298, size: 4096
    ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
    ada0: <WDC WD40EFRX-68WT0N0 80.00A80> s/n WD-WCC4E0701422 detached
    GEOM_ELI: g_eli_read_done() failed (error=6) ada0p1.eli[READ(offset=209502208, length=8192)]
    swap_pager: I/O error - pagein failed; blkno 575437,size 8192, error 6
    vm_fault: pager read error, pid 1132 (devd)
    GEOM_ELI: g_eli_read_done() failed (error=6) ada0p1.eli[READ(offset=36864, length=4096)]
    swap_pager: I/O error - pagein failed; blkno 524298,size 4096, error 6
    vm_fault: pager read error, pid 2524 (cnid_metad)
    GEOM_ELI: g_eli_read_done() failed (error=6) ada0p1.eli[READ(offset=73830400, length=8192)]
    swap_pager: I/O error - pagein failed; blkno 542314,size 8192, error 6
    vm_fault: pager read error, pid 1132 (devd)
    GEOM_ELI: g_eli_read_done() failed (error=6) ada0p1.eli[READ(offset=73830400, length=8192)]
    swap_pager: I/O error - pagein failed; blkno 542314,size 8192, error 6
    vm_fault: pager read error, pid 1132 (devd)
    GEOM_ELI: g_eli_read_done() failed (error=6) ada0p1.eli[READ(offset=32493568, length=4096)]
    swap_pager: I/O error - pagein failed; blkno 532222,size 4096, error 6
    vm_fault: pager read error, pid 1132 (devd)
    Failed to write core file for process devd (error 14)
    pid 1132 (devd), uid 0: exited on signal 11
    GEOM_ELI: g_eli_read_done() failed (error=6) ada0p1.eli[READ(offset=96215040, length=4096)]
    swap_pager: I/O error - pagein failed; blkno 547779,size 4096, error 6
    vm_fault: pager read error, pid 2524 (cnid_metad)
    Failed to write core file for process cnid_metad (error 14)
    pid 2524 (cnid_metad), uid 0: exited on signal 11
    GEOM_ELI: g_eli_read_done() failed (error=6) ada0p1.eli[READ(offset=211886080, length=4096)]
    swap_pager: I/O error - pagein failed; blkno 576019,size 4096, error 6
    vm_fault: pager read error, pid 79357 (afpd)
    pid 79357 (afpd), uid 1001: exited on signal 6
    GEOM_ELI: g_eli_read_done() failed (error=6) ada0p1.eli[READ(offset=115728384, length=8192)]
    swap_pager: I/O error - pagein failed; blkno 552543,size 8192, error 6
    vm_fault: pager read error, pid 8035 (zfsd)
    GEOM_ELI: g_eli_read_done() failed (error=6) ada0p1.eli[READ(offset=33107968, length=8192)]
    swap_pager: I/O error - pagein failed; blkno 532372,size 8192, error 6
    vm_fault: pager read error, pid 2502 (netatalk)
    GEOM_ELI: Device ada0p1.eli destroyed.
    GEOM_ELI: Detached ada0p1.eli on last close.
    swap_pager: I/O error - pagein failed; blkno 542723,size 4096, error 6
    vm_fault: pager read error, pid 2502 (netatalk)
    Failed to write core file for process netatalk (error 14)
    pid 2502 (netatalk), uid 0: exited on signal 11
    swap_pager: I/O error - pagein failed; blkno 552569,size 4096, error 6
    vm_fault: pager read error, pid 8035 (zfsd)
    Failed to write core file for process zfsd (error 14)
    pid 8035 (zfsd), uid 0: exited on signal 11
    (ada0:ahcich0:0:0:0): Periph destroyed
    swap_pager: I/O error - pagein failed; blkno 566473,size 20480, error 6
    vm_fault: pager read error, pid 15601 (transmission-daemon)
    pid 15601 (transmission-daemon), uid 921: exited on signal 11
    (aprobe0:ahcich0:0:0:0): NOP FLUSHQUEUE. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
    (aprobe0:ahcich0:0:0:0): CAM status: ATA Status Error
    (aprobe0:ahcich0:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT )
    (aprobe0:ahcich0:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff
    (aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted
    (aprobe0:ahcich0:0:0:0): NOP FLUSHQUEUE. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
    (aprobe0:ahcich0:0:0:0): CAM status: ATA Status Error
    (aprobe0:ahcich0:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT )
    (aprobe0:ahcich0:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff
    (aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted
    swap_pager: I/O error - pagein failed; blkno 576019,size 4096, error 6
    vm_fault: pager read error, pid 96534 (afpd)
    pid 96534 (afpd), uid 1001: exited on signal 6
    swap_pager: I/O error - pagein failed; blkno 576019,size 4096, error 6
    vm_fault: pager read error, pid 96560 (afpd)
    pid 96560 (afpd), uid 1001: exited on signal 6
    swap_pager: I/O error - pagein failed; blkno 576019,size 4096, error 6
    vm_fault: pager read error, pid 96561 (afpd)
    pid 96561 (afpd), uid 1001: exited on signal 6
    swap_pager: I/O error - pagein failed; blkno 551069,size 4096, error 6
    vm_fault: pager read error, pid 2847 (collectd)
    swap_pager: I/O error - pagein failed; blkno 539389,size 8192, error 6
    vm_fault: pager read error, pid 2847 (collectd)
    Failed to write core file for process collectd (error 14)
    pid 2847 (collectd), uid 0: exited on signal 11
    swap_pager: I/O error - pagein failed; blkno 576019,size 4096, error 6
    vm_fault: pager read error, pid 96562 (afpd)
    pid 96562 (afpd), uid 1001: exited on signal 6
    swap_pager: I/O error - pagein failed; blkno 548186,size 8192, error 6
    vm_fault: pager read error, pid 2545 (python2.7)
    pid 2545 (python2.7), uid 0: exited on signal 11
    swap_pager: I/O error - pagein failed; blkno 576019,size 4096, error 6
    vm_fault: pager read error, pid 96579 (afpd)
    pid 96579 (afpd), uid 1001: exited on signal 6
    swap_pager: I/O error - pagein failed; blkno 576019,size 4096, error 6
    vm_fault: pager read error, pid 96580 (afpd)
    pid 96580 (afpd), uid 1001: exited on signal 6
    swap_pager: I/O error - pagein failed; blkno 576019,size 4096, error 6
    vm_fault: pager read error, pid 96581 (afpd)
    pid 96581 (afpd), uid 1001: exited on signal 6
    swap_pager: I/O error - pagein failed; blkno 576019,size 4096, error 6
    vm_fault: pager read error, pid 96582 (afpd)
    pid 96582 (afpd), uid 1001: exited on signal 6
    swap_pager: I/O error - pagein failed; blkno 576019,size 4096, error 6
    vm_fault: pager read error, pid 96583 (afpd)
    pid 96583 (afpd), uid 1001: exited on signal 6
    swap_pager: I/O error - pagein failed; blkno 576019,size 4096, error 6
    vm_fault: pager read error, pid 96584 (afpd)
    pid 96584 (afpd), uid 1001: exited on signal 6
    arp: 192.168.1.47 moved from 00:5a:20:31:a1:f0 to 7c:dd:90:a5:a7:6a on epair0b
    swap_pager: I/O error - pagein failed; blkno 542673,size 16384, error 6
    vm_fault: pager read error, pid 2267 (smartd)
    swap_pager: I/O error - pagein failed; blkno 542668,size 20480, error 6
    vm_fault: pager read error, pid 2267 (smartd)
    Failed to write core file for process smartd (error 14)
    pid 2267 (smartd), uid 0: exited on signal 11
    swap_pager: I/O error - pagein failed; blkno 576019,size 4096, error 6
    vm_fault: pager read error, pid 96890 (afpd)
    pid 96890 (afpd), uid 1001: exited on signal 6
    swap_pager: I/O error - pagein failed; blkno 525281,size 40960, error 6
    vm_fault: pager read error, pid 196 (python2.7)
    swap_pager: I/O error - pagein failed; blkno 544436,size 32768, error 6
    vm_fault: pager read error, pid 196 (python2.7)
    Failed to write core file for process python2.7 (error 14)
    pid 196 (python2.7), uid 0: exited on signal 11
    swap_pager: I/O error - pagein failed; blkno 542206,size 12288, error 6
    vm_fault: pager read error, pid 194 (daemon)
    swap_pager: I/O error - pagein failed; blkno 542206,size 12288, error 6
    vm_fault: pager read error, pid 194 (daemon)
    swap_pager: I/O error - pagein failed; blkno 542206,size 12288, error 6
    vm_fault: pager read error, pid 194 (daemon)
    swap_pager: I/O error - pagein failed; blkno 542206,size 12288, error 6
    vm_fault: pager read error, pid 194 (daemon)
    swap_pager: I/O error - pagein failed; blkno 542175,size 8192, error 6
    vm_fault: pager read error, pid 194 (daemon)
    Failed to write core file for process daemon (error 14)
    pid 194 (daemon), uid 0: exited on signal 11
    swap_pager: I/O error - pagein failed; blkno 534273,size 40960, error 6
    vm_fault: pager read error, pid 2568 (python2.7)
    swap_pager: I/O error - pagein failed; blkno 542744,size 8192, error 6
    vm_fault: pager read error, pid 2568 (python2.7)
    Failed to write core file for process python2.7 (error 14)
    pid 2568 (python2.7), uid 0: exited on signal 11
    swap_pager: I/O error - pagein failed; blkno 576153,size 4096, error 6
    vm_fault: pager read error, pid 24518 (nginx)
    pid 24518 (nginx), uid 80: exited on signal 11
    swap_pager: I/O error - pagein failed; blkno 567547,size 8192, error 6
    vm_fault: pager read error, pid 2489 (nginx)
    swap_pager: I/O error - pagein failed; blkno 567547,size 8192, error 6
    vm_fault: pager read error, pid 2489 (nginx)
    swap_pager: I/O error - pagein failed; blkno 567547,size 8192, error 6
    vm_fault: pager read error, pid 2489 (nginx)
    swap_pager: I/O error - pagein failed; blkno 574999,size 4096, error 6
    vm_fault: pager read error, pid 2489 (nginx)
    Failed to write core file for process nginx (error 14)
    pid 2489 (nginx), uid 0: exited on signal 4
    swap_pager: I/O error - pagein failed; blkno 524609,size 4096, error 6
    vm_fault: pager read error, pid 15291 (Plex Media Server)
    pid 15291 (Plex Media Server), uid 972: exited on signal 11
    swap_pager: I/O error - pagein failed; blkno 557708,size 32768, error 6
    vm_fault: pager read error, pid 16255 (Plex Script Host)
    pid 16255 (Plex Script Host), uid 972: exited on signal 11
    swap_pager: I/O error - pagein failed; blkno 576019,size 4096, error 6
    vm_fault: pager read error, pid 96914 (afpd)
    pid 96914 (afpd), uid 1001: exited on signal 6
    arp: 192.168.1.47 moved from 00:5a:20:31:a1:f0 to 7c:dd:90:a5:a7:6a on epair0b
    arp: 192.168.1.89 moved from 7c:dd:90:a5:a7:6a to 00:5a:20:31:a1:f0 on epair0b
    
    -- End of security output --
  3. Daily run output
    Code:
    Checking status of zfs pools:
    NAME  SIZE  ALLOC  FREE  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
    Storage  10.9T  3.87T  7.01T  -  19%  35%  1.00x  DEGRADED  /mnt
    freenas-boot  29.5G  654M  28.9G  -  -  2%  1.00x  ONLINE  -
    
    pool: Storage
    state: DEGRADED
    status: One or more devices has been removed by the administrator.
    	Sufficient replicas exist for the pool to continue functioning in a
    	degraded state.
    action: Online the device using 'zpool online' or replace the device with
    	'zpool replace'.
    scan: resilvered 256G in 1h13m with 0 errors on Fri Apr 14 19:32:48 2017
    config:
    
    	NAME  STATE  READ WRITE CKSUM
    	Storage  DEGRADED  0  0  0
    	 raidz1-0  DEGRADED  0  0  0
    	 gptid/d456cbba-20ff-11e7-b130-e06995776367  ONLINE  0  0  0
    	 4195179654383725630  REMOVED  0  0  0  was /dev/gptid/47194161-2100-11e7-b130-e06995776367
    	 gptid/48242829-2100-11e7-b130-e06995776367  ONLINE  0  0  0
    
    errors: No known data errors
    
    -- End of daily output --
 

DeanNotDin

Dabbler
Joined
Mar 19, 2017
Messages
32
Sounds like it was using swap in one of the disks that died.
I didn't know that FreeNAS stores swap on the raid drives, I mean - I have 6gb of RAM and a 32gb flash drive with only ~600MB used, it would be more efficient to store the swap on the usb drive and not on the RAID drives.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
What's the plan for reliable swapping in FreeNAS 11?
 

DeanNotDin

Dabbler
Joined
Mar 19, 2017
Messages
32
Oh wow, I had no idea it was that stupid.
So solving the disconnecting drive is my issue then.
Replaced ports and cables, and the disconnecting drive is almost random, could it be a PS issue?

Sent from my ONEPLUS A3000 using Tapatalk
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Your system does not meet the minimum specs. Fix that before asking any more questions.

You had a drive fail and because you have an inadequate system it was using swap which should never happen on a healthy system. When the swap disappeared because of the disk failure system process are lost, causing the error message.

Sent from my Nexus 5X using Tapatalk
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
6GB is still below the 8GB minimum for FreeNAS.

Swapping to USB would really kill performance. FreeNAS puts a swap partition on each hard disk to facilitate swapping hard disks when the disk sizes aren't exactly the same size.

I didn't know that FreeNAS stores swap on the raid drives, I mean - I have 6gb of RAM and a 32gb flash drive with only ~600MB used, it would be more efficient to store the swap on the usb drive and not on the RAID drives.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
I didn't know that FreeNAS stores swap on the raid drives
Sounds like someone needs to RTFM.
I have 6gb of RAM
As already noted, this doesn't meet the minimum system requirements for FreeNAS.
it would be more efficient to store the swap on the usb drive and not on the RAID drives.
Try thinking about this some more. USB drives are notoriously unreliable, they're usually slower than spinning rust, and they're much smaller in capacity.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194

DeanNotDin

Dabbler
Joined
Mar 19, 2017
Messages
32
6GB is still below the 8GB minimum for FreeNAS.
6GB is still below the 8GB minimum for FreeNAS.
Your system does not meet the minimum specs. Fix that before asking any more questions.
I remember that 9.10 suggested a 4gb of RAM as a minimum requirement.
Anyway, I installed another 2GB stick and now have a total of 8GB RAM, And the problem still persists.

Now I'm getting every couple of minutes an email with an error:
Code:
  • CRITICAL: April 25, 2017, 6:41 p.m. - The volume Storage (ZFS) state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
I replaced the SATA and MOLEX cables, and changed the disks SATA ports. The issue still persists.

Any suggestions?
 

Attachments

  • Screen Shot 2017-04-25 at 18.42.17.png
    Screen Shot 2017-04-25 at 18.42.17.png
    46.7 KB · Views: 223

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I remember that 9.10 suggested a 4gb of RAM as a minimum requirement.
That is not true and and never was. The absolute minimum has always been 8GB since 9.3.
 

Vito Reiter

Wise in the Ways of Science
Joined
Jan 18, 2017
Messages
232
Sorry then, its my bad. Now I have 8gb of RAM and I erased 2 out of 4 jails.

Your failure was likely CAUSED by the lack of RAM in past-tense. You can't just add 2GB and expect it to work, just like you can't take a bullet out of your foot and expect it to be as if you were never shot. Your best bet is to try and repair the boot volume with the 8GB of RAM, and remount the volume. Also, see if that drive connects with any other drive on the same port, if another drive works on the same port then that disk has failed. Worst case scenario: Reinstall, import volume, and if disk still doesn't work replace it with a known good one and see if it picks up.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Tell us the rest of your hardware. I have a feeling since you didn't have minimum ram requirements that you are probably lacking in all the other areas also.

I want to know:
* motherboard
* CPU
* memory amount and brand
* psu
* chassie
* how are disks attached
* what layout are you using, zpool status output
* are you running smart tests and output from all disks for smartctl -a /dev/adaX

This should have all been posted in the first post so we can actually help you.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
When you hear hoofbeats in the distance, at least in North America, you should assume it's a horse, not a zebra. Your system is reporting a failed drive. The obvious conclusion is that the drive has failed--and if you've replaced the power and data cables, and tried plugging the data cable into a different SATA port, that makes the obvious conclusion that much more likely. Replace the drive, following the instructions in the manual.

The reason that RAM was relevant is that you were using swap space that was stored on the fail[ed|ing] drive, which probably wouldn't have happened if you'd had enough RAM. The corruption/loss of swap crashed the system.

Your best bet is to try and repair the boot volume with the 8GB of RAM
What does the boot volume have to do with anything?
 
Status
Not open for further replies.
Top