Running VirtualBox machine from SSD crashes system

Status
Not open for further replies.

John Gorst

Dabbler
Joined
Dec 18, 2014
Messages
15
System: HP microserver 54L, 8gb RAM, 2x3TB HDD as a mirror, 8GB USB boot, 1x64gb SSD (ada2), FreeNas 9.3 stable (last updated 19th May.... GUI unavailable at present so can't get exact version)

I successfully installed VirtualBox as a jail. To improve performance of the virtual machine (Windows 7, allocated only 1gb of RAM) I decided to store the virtual image on an SSD drive. (this is manually replicated to my mirrored 3tb HDD in case of failure of the single SSD).

This ran well for a week or two.

Now when I run the virtual machine it destabilises the WHOLE system!

Symptoms:
1) I got emails warning me there had been errors as follows in quick succession:
  • The volume ssd (ZFS) state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
  • The volume ssd (ZFS) state is ONLINE: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
  • The volume ssd (ZFS) state is UNAVAIL: One or more devices are faulted in response to IO failures.
  • The volume ssd (ZFS) state is UNAVAIL: One or more devices are faulted in response to IO failures. Device: /dev/ada2, unable to open device

2) The VirtualBox virtual machine obviously stopped working

3) The FreeNas web interface was not contactable at all.... eventually it came up with the error message

An error occurred.
Sorry, the page you are looking for is currently unavailable.
Please try again later.

If you are the system administrator of this resource then you should check theerror log for details.

Faithfully yours, nginx.
4) If I try to navigate to /mnt/ssd/vm then the SSH session freezes

5) If I reboot on the command line (shutdown -r now) it fails to restart.... which is problematic as I was away on holiday when this first happened!
On the console the error messages seem to indicate something is preventing reboot (see photo of console below)

5) The following is in /var/log/messages around time of 'melt down'
Code:
May 31 13:52:10 garage kernel: epair3b: promiscuous mode disabled
May 31 16:04:32 garage kernel: epair3a: link state changed to DOWN
May 31 16:04:32 garage kernel: epair3a: link state changed to DOWN
May 31 16:04:32 garage kernel: epair3b: link state changed to DOWN
May 31 16:04:32 garage kernel: epair3b: link state changed to DOWN
May 31 16:05:25 garage notifier: Performing sanity check on nginx configuration:
May 31 16:05:25 garage notifier: nginx: the configuration file /usr/local/etc/nginx/nginx.conf syntax is ok
May 31 16:05:25 garage notifier: nginx: configuration file /usr/local/etc/nginx/nginx.conf test is successful
May 31 16:05:26 garage epair3a: Ethernet address: 02:eb:36:00:0e:0a
May 31 16:05:26 garage epair3b: Ethernet address: 02:eb:36:00:0f:0b
May 31 16:05:26 garage kernel: epair3a: link state changed to UP
May 31 16:05:26 garage kernel: epair3a: link state changed to UP
May 31 16:05:26 garage kernel: epair3b: link state changed to UP
May 31 16:05:26 garage kernel: epair3b: link state changed to UP
May 31 16:05:26 garage kernel: epair3a: promiscuous mode enabled
May 31 16:05:26 garage kernel: ng_ether_ifnet_arrival_event: can't re-name node epair3b
May 31 16:05:26 garage kernel: ng_ether_ifnet_arrival_event: can't re-name node epair3b
May 31 16:05:52 garage kernel: ifa_del_loopback_route: deletion failed
May 31 16:05:52 garage Freed UMA keg (udp_inpcb) was not empty (20 items).  Lost 2 pages of memory.
May 31 16:05:52 garage Freed UMA keg (udpcb) was not empty (336 items).  Lost 2 pages of memory.
May 31 16:05:52 garage Freed UMA keg (tcptw) was not empty (1050 items).  Lost 21 pages of memory.
May 31 16:05:52 garage Freed UMA keg (tcp_inpcb) was not empty (1030 items).  Lost 103 pages of memory.
May 31 16:05:52 garage Freed UMA keg (sackhole) was not empty (101 items).  Lost 1 pages of memory.
May 31 16:05:52 garage Freed UMA keg (tcpcb) was not empty (36 items).  Lost 9 pages of memory.
May 31 16:05:52 garage Freed UMA keg (ripcb) was not empty (10 items).  Lost 1 pages of memory.
May 31 16:05:52 garage hhook_vnet_uninit: hhook_head type=1, id=1 cleanup required
May 31 16:05:52 garage hhook_vnet_uninit: hhook_head type=1, id=0 cleanup required
May 31 16:05:53 garage kernel: epair3b: promiscuous mode enabled
May 31 16:07:20 garage ahcich5: Timeout on slot 5 port 0
May 31 16:07:20 garage ahcich5: is 00000000 cs 00000000 ss 00007fe0 rs 00007fe0 tfd 50 serr 00000000 cmd 0020ee17
May 31 16:07:20 garage (ada2:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 48 5d 70 40 02 00 00 01 00 00
May 31 16:07:20 garage (ada2:ahcich5:0:0:0): CAM status: Command timeout
May 31 16:07:20 garage (ada2:ahcich5:0:0:0): Retrying command
May 31 16:07:26 garage swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1049934, size: 12288
May 31 16:07:46 garage swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1049934, size: 12288
May 31 16:07:51 garage ahcich5: AHCI reset: device not ready after 31000ms (tfd = 00000080)
May 31 16:08:06 garage swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1049934, size: 12288
May 31 16:08:21 garage ahcich5: Timeout on slot 14 port 0
May 31 16:08:21 garage ahcich5: is 00000000 cs 00004000 ss 00000000 rs 00004000 tfd 80 serr 00000000 cmd 0020ee17
May 31 16:08:21 garage (aprobe0:ahcich5:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
May 31 16:08:21 garage (aprobe0:ahcich5:0:0:0): CAM status: Command timeout
May 31 16:08:21 garage (aprobe0:ahcich5:0:0:0): Error 5, Retry was blocked
May 31 16:08:26 garage swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1049934, size: 12288
May 31 16:08:46 garage swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1049934, size: 12288
May 31 16:08:53 garage ahcich5: AHCI reset: device not ready after 31000ms (tfd = 00000080)
May 31 16:09:06 garage swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1049934, size: 12288
May 31 16:09:23 garage ahcich5: Timeout on slot 14 port 0
May 31 16:09:23 garage ahcich5: is 00000000 cs 00004000 ss 00000000 rs 00004000 tfd 80 serr 00000000 cmd 0020ee17
May 31 16:09:23 garage (aprobe0:ahcich5:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
May 31 16:09:23 garage (aprobe0:ahcich5:0:0:0): CAM status: Command timeout
May 31 16:09:23 garage (aprobe0:ahcich5:0:0:0): Error 5, Retry was blocked
May 31 16:09:23 garage ada2 at ahcich5 bus 0 scbus5 target 0 lun 0
May 31 16:09:23 garage ada2: <SanDisk SDSSDP064G 2.0.0> s/n 133280400064 detached
May 31 16:09:26 garage swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1049934, size: 12288
May 31 16:09:46 garage swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1049934, size: 12288
May 31 16:09:54 garage ahcich5: AHCI reset: device not ready after 31000ms (tfd = 00000080)
May 31 16:10:06 garage swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1049934, size: 12288
May 31 16:10:24 garage ahcich5: Timeout on slot 14 port 0
May 31 16:10:24 garage ahcich5: is 00000000 cs 00004000 ss 00000000 rs 00004000 tfd 80 serr 00000000 cmd 0020ee17
May 31 16:10:24 garage (ada2:ahcich5:0:0:0): SETFEATURES ENABLE RCACHE. ACB: ef aa 00 00 00 40 00 00 00 00 00 00
May 31 16:10:24 garage (ada2:ahcich5:0:0:0): CAM status: Command timeout
May 31 16:10:24 garage (ada2:ahcich5:0:0:0): Error 5, Periph was invalidated
May 31 16:10:26 garage swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1049934, size: 12288
May 31 16:10:46 garage swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1049934, size: 12288
May 31 16:11:14 garage ahcich5: AHCI reset: device not ready after 31000ms (tfd = 00000080)
May 31 16:11:14 garage swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1049934, size: 12288
May 31 16:11:14 garage ahcich5: Poll timeout on slot 14 port 0
May 31 16:11:14 garage ahcich5: is 00000000 cs 00004000 ss 00000000 rs 00004000 tfd 00 serr 00000000 cmd 0020ee17
May 31 16:11:14 garage (aprobe0:ahcich5:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
May 31 16:11:14 garage (aprobe0:ahcich5:0:0:0): CAM status: Command timeout
May 31 16:11:14 garage (aprobe0:ahcich5:0:0:0): Error 5, Retries exhausted
May 31 16:11:26 garage swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1049934, size: 12288
May 31 16:11:44 garage zfsd: Event exceeds event size limit of 8192 bytes.
May 31 16:11:44 garage zfsd: Truncated 12 characters from event.
May 31 16:11:44 garage zfsd: Event exceeds event size limit of 8192 bytes.
May 31 16:11:44 garage zfsd: Truncated 21 characters from event.
May 31 16:11:44 garage ahcich5: Timeout on slot 14 port 0
May 31 16:11:44 garage ahcich5: is 00000000 cs 00ffc000 ss 00ffc000 rs 00ffc000 tfd 00 serr 00000000 cmd 0020ee17
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): DSM TRIM. ACB: 06 01 00 00 00 40 00 00 00 00 01 00
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): CAM status: Unconditionally Re-queue Request
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): Error 5, Periph was invalidated
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 48 5d 70 40 02 00 00 01 00 00
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): CAM status: Command timeout
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): Error 5, Periph was invalidated
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 48 5e 70 40 02 00 00 01 00 00
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): CAM status: Unconditionally Re-queue Request
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): Error 5, Periph was invalidated
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 48 5f 70 40 02 00 00 01 00 00
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): CAM status: Unconditionally Re-queue Request
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): Error 5, Periph was invalidated
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 48 60 70 40 02 00 00 01 00 00
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): CAM status: Unconditionally Re-queue Request
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): Error 5, Periph was invalidated
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 48 61 70 40 02 00 00 01 00 00
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): CAM status: Unconditionally Re-queue Request
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): Error 5, Periph was invalidated
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 e8 50 38 70 40 02 00 00 00 00 00
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): CAM status: Unconditionally Re-queue Request
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): Error 5, Periph was invalidated
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 c8 50 39 70 40 02 00 00 00 00 00
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): CAM status: Unconditionally Re-queue Request
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): Error 5, Periph was invalidated
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 48 62 70 40 02 00 00 01 00 00
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): CAM status: Unconditionally Re-queue Request
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): Error 5, Periph was invalidated
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 48 63 70 40 02 00 00 01 00 00
May 31 16:11:44 garage zfsd: Event exceeds event size limit of 8192 bytes.
May 31 16:11:44 garage zfsd: Truncated 1 characters from event.
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): CAM status: Unconditionally Re-queue Request
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): Error 5, Periph was invalidated
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 48 64 70 40 02 00 00 01 00 00
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): CAM status: Unconditionally Re-queue Request
May 31 16:11:44 garage (ada2:ahcich5:0:0:0): Error 5, Periph was invalidated
May 31 16:11:44 garage zfsd: Event exceeds event size limit of 8192 bytes.
May 31 16:11:44 garage zfsd: Truncated 9 characters from event.
May 31 16:11:44 garage GEOM_ELI: g_eli_read_done() failed ada2p1.eli[READ(offset=11010048, length=61440)]
May 31 16:11:44 garage swap_pager: I/O error - pagein failed; blkno 1051266,size 61440, error 6
May 31 16:11:44 garage vm_fault: pager read error, pid 2565 (python2.7)
May 31 16:11:44 garage kernel: pid 2565 (python2.7), uid 0: exited on signal 11
May 31 16:11:46 garage swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1049934, size: 12288
May 31 16:12:06 garage swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1049934, size: 12288
May 31 16:12:33 garage ahcich5: AHCI reset: device not ready after 31000ms (tfd = 00000080)
May 31 16:12:33 garage ahcich5: Poll timeout on slot 23 port 0
May 31 16:12:33 garage ahcich5: is 00000000 cs 00800000 ss 00000000 rs 00800000 tfd 00 serr 00000000 cmd 0020f717
May 31 16:12:33 garage (aprobe0:ahcich5:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
May 31 16:12:33 garage (aprobe0:ahcich5:0:0:0): CAM status: Command timeout
May 31 16:12:33 garage (aprobe0:ahcich5:0:0:0): Error 5, Retries exhausted
May 31 16:12:33 garage swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1049934, size: 12288
May 31 16:12:53 garage swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1049934, size: 12288
May 31 16:13:03 garage ahcich5: Timeout on slot 23 port 0
May 31 16:13:03 garage ahcich5: is 00000000 cs 00800000 ss 00000000 rs 00800000 tfd 00 serr 00000000 cmd 0020f717
May 31 16:13:03 garage (ada2:ahcich5:0:0:0): READ_DMA. ACB: c8 00 e0 2a 00 40 00 00 00 00 18 00
May 31 16:13:03 garage (ada2:ahcich5:0:0:0): CAM status: Command timeout
May 31 16:13:03 garage (ada2:ahcich5:0:0:0): Error 5, Periph was invalidated
May 31 16:13:03 garage GEOM_ELI: g_eli_read_done() failed ada2p1.eli[READ(offset=5554176, length=12288)]
May 31 16:13:03 garage swap_pager: I/O error - pagein failed; blkno 1049934,size 12288, error 5
May 31 16:13:03 garage vm_fault: pager read error, pid 2560 (python2.7)
May 31 16:13:03 garage GEOM_ELI: Device ada2p1.eli destroyed.
May 31 16:13:03 garage GEOM_ELI: Detached ada2p1.eli on last close.
May 31 16:13:03 garage (ada2:ahcich5:0:0:0): Periph destroyed
May 31 16:13:03 garage swap_pager: I/O error - pagein failed; blkno 1051458,size 32768, error 6
May 31 16:13:03 garage vm_fault: pager read error, pid 2560 (python2.7)
May 31 16:13:03 garage kernel: Failed to write core file for process python2.7 (error 14)
May 31 16:13:03 garage kernel: Failed to write core file for process python2.7 (error 14)
May 31 16:13:03 garage kernel: pid 2560 (python2.7), uid 0: exited on signal 11
May 31 16:13:52 garage ahcich5: AHCI reset: device not ready after 31000ms (tfd = 00000080)
May 31 16:13:52 garage ahcich5: Poll timeout on slot 23 port 0
May 31 16:13:52 garage ahcich5: is 00000000 cs 00800000 ss 00000000 rs 00800000 tfd 00 serr 00000000 cmd 0020f717
May 31 16:13:52 garage (aprobe0:ahcich5:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
May 31 16:13:52 garage (aprobe0:ahcich5:0:0:0): CAM status: Command timeout
May 31 16:13:52 garage (aprobe0:ahcich5:0:0:0): Error 5, Retries exhausted


I have replicated the volume off the SSD to the spinning disks mirror without any errors. A scrub of the SSD reveals no errors. I can successfully run the VirtualBox image off the spinning disks without problems.

If this was off a single spinning disk I would say that the problem was a failing disc and the system lockup was because some of the virtual memory is on the failing disk?? But this is an SSD and is highly unlikely to fail? It also seems to work absolutely fine copying 20gb of information off it no problems (but I guess not writing to it).

Does anyone have any suggestions?

Should the failure of a non-system disk cause such erratic and unstable behaviour? Can I stop a particular disk being used for virtual memory?
 

Attachments

  • thumb_IMG_1087_1024.jpg
    thumb_IMG_1087_1024.jpg
    346.4 KB · Views: 264
Last edited:

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Does anyone have any suggestions?
  1. The error from FreeNAS was pretty unequivocal, i.e. that it has a problem with ada2. The precise root of that problem is unclear to me (drive, cable, port?)
  2. Before attempting to run multiple plugins, including resource hogs like Plex, CrashPlan and VirtualBox, install more than the minimum 8GB of RAM.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
I feel like I remember this guy from IRC. If it's the same guy, he says he's replaced both the drives, and the cables, for ada2, and no matter what drives he puts in there, it craps out with this error about ada2.

Forgive me if it's not the same guy.
 

John Gorst

Dabbler
Joined
Dec 18, 2014
Messages
15
Seems to be sorted after changing cables........ very odd! You wouldn't think that increasing the load would cause the cable to fail!
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
You wouldn't think that increasing the load would cause the cable to fail!
Why not? I would think elevated temperature resulting from increased load could easily result in worse behavior from marginal components.
 
Status
Not open for further replies.
Top