RancherOS VM didn't boot after changing interface IP

hatem5000

Dabbler
Joined
May 16, 2019
Messages
26
Hi folks,

I have a TrueNAS machine with a bhyve VM running RancherOS, with Docker and Portainer installed inside it.
Yesterday I wanted to change the eth0 interface IP of the RancherOS VM, so I ran this command in the RancherOS SSH session:

sudo ros config set rancher.network.interfaces.eth0.address 172.68.1.100/24

I forgot to change the gateway as well, and after rebooting the VM, RancherOS didn't boot at all.

So is there a way to revert the IP manually so it can boot properly?

Thanksss
Tom
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Yes, but not easily. You'll need to boot the VM into recovery mode. First, you'll need to find the grub.cfg file for the VM. Run midclt call vm.query | jq, and look for the grubconfig property to locate the VM's grub.cfg.

Next, edit the grub.cfg to enable recovery boot: add rancher.autologin=ttyS0 rancher.recovery=true to the end of the linux directive in the grub.cfg. Start a serial console session with cu -l /dev/nmdm1B. Note, you'll need to change the nmdm device number to match your VM's ID from the vm.query call above. Now, start the VM in the GUI, and it should come up in single-user mode. Fix your config, and shut down from within the VM. Type "~." to exit out of the serial console.

Remove those recovery boot directives from the grub.cfg, and start the VM again from the GUI. Your RancherOS should be back in business.
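
Putting that together, the sequence looks roughly like the sketch below. The VM ID (and therefore the nmdm device number), the grub.cfg path, and the addresses are placeholders, so adjust them to your setup:
Code:
# On the TrueNAS host: find the VM and its grub.cfg (grubconfig property)
midclt call vm.query | jq

# Edit that grub.cfg and append the recovery flags to the end of the linux line:
#   ... rancher.autologin=ttyS0 rancher.recovery=true

# Attach a serial console (replace 1 with your VM ID), then start the VM in the GUI
cu -l /dev/nmdm1B

# Inside the recovery shell: fix the address and set the missing gateway
# (the addresses here are examples only)
sudo ros config set rancher.network.interfaces.eth0.address 172.68.1.100/24
sudo ros config set rancher.network.interfaces.eth0.gateway 172.68.1.1
sudo shutdown -h now

# Back on the host: type "~." to leave cu, remove the recovery flags from
# grub.cfg, and start the VM again in the GUI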
 

hatem5000

Dabbler
Joined
May 16, 2019
Messages
26
Thanks for the steps.
I followed them and changed the interface IP back to the old one, but I'm getting these errors while booting the VM (see the attached screenshots). Is there something I'm missing?
 

Attachments

  • DAE49A4D-A0BD-4F87-86E0-9279D4EC7EAC.jpeg (288.4 KB)
  • 84B37B96-CD34-4465-950A-463244F7561B.jpeg (344.8 KB)
  • DC0B3CF1-9C68-4B38-9526-D54D639C8651.jpeg (319.4 KB)

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Ouch. The RancherOS boot disk has a corrupt ext4 file system. Try booting back into recovery mode, and then try to fsck the ext4 partition. If fsck can't fix it, your only option is to delete and rebuild the VM from scratch, which means reinstalling all your containers as well.
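
From the recovery shell, that would look roughly like this (assuming the boot disk shows up as /dev/sda with RancherOS on the first partition; check with lsblk first, and make sure the partition isn't mounted when you run fsck):
Code:
# Identify the boot disk and its partitions
lsblk -f

# Check and repair the ext4 file system (adjust the device to match yours)
sudo e2fsck -f -y /dev/sda1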

What I do to prevent this is to add a zvol as a second virtual disk device to the VM, with a GPT partition table and an ext4 partition reserved for Docker. Then I create /var/lib/rancher/conf/cloud-config.d/mounts.yml with these contents:
Code:
mounts:
- - /dev/sdb1
  - /var/lib/docker/mnt
  - ext4
  - ""
rancher:
  docker:
    graph: /var/lib/docker/mnt


Remember to create the /var/lib/docker/mnt directory for the mount point. The mounts will take effect on the next start of the VM. Now, all Docker containers and configuration live in the zvol, and any corruption in the boot RAW virtual disk doesn't affect my containers.
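
For reference, the one-time prep of the second disk from inside the VM looks roughly like this. /dev/sdb is an assumption, so confirm the device name with lsblk first, and note that parted may not be present in every RancherOS console, in which case partition the zvol with whatever tool you have handy:
Code:
# Partition the second virtual disk and format it for Docker
# (device name is an example; verify with lsblk before running)
sudo parted -s /dev/sdb mklabel gpt mkpart primary ext4 0% 100%
sudo mkfs.ext4 /dev/sdb1

# Create the mount point referenced by mounts.yml
sudo mkdir -p /var/lib/docker/mnt

After the next start of the VM you can check that Docker is really using the new location:
Code:
docker info | grep 'Docker Root Dir'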
 