System compromised?

Status
Not open for further replies.

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
I don't think this is quite the right place for this question, but it's the closest I can think of. If it belongs somewhere else, please let me know.

I woke up this morning to an apparently compromised FreeNAS box. The reporting graphs were showing 100% CPU usage, almost all of it (95%+) in "system" processes; the load average was over 200; and top showed a whole mess of 'sudo' processes, all owned by root. The console was scrolling the same message over and over, something to do with sudo. My IPMI web interface showed a sharp increase in power consumption starting about 2:00 am, so I suspect that's when whatever happened started.

Since the only outside access to anything on my machine is port 32400 forwarded from the Internet to my Plex jail, I stopped the Plex jail, with no immediate effect. The web GUI then became unresponsive, or at least slow enough that I wasn't willing to wait for it to respond, so I shut the system down from the console (option 14 on the menu, not a hard power off). On reboot, the console didn't immediately start scrolling messages, but the web UI wouldn't load any of the images, or in fact any of the alt text for the buttons in the top frame. I shut down the system again and left it shut down. I have not yet investigated whether there's any data loss or damage.

The system is running 9.10-20160426..., which was freshly installed onto an SSD, and then my configuration from my previous box (running a recent 9.3 build) was uploaded. The jails were manually installed using FreeBSD packages where available (CouchPotato and Urbackup were installed from source), and the VirtualBox jail was using the FreeNAS VirtualBox jail template.

I have a recursive snapshot task running on my main pool, though I hadn't gotten around to setting one up on the jails pool (but I just installed those yesterday anyway). I'm sure I could simply reinstall 9.10, upload my config, and possibly revert to last night's snapshot, and maybe rebuild my jails, but that wouldn't do anything toward figuring out what happened, which I'd like to do (if I don't know what happened, I don't have much of a chance of preventing it happening again).

Current status is that the server is shut down. When it was last up, the web GUI wouldn't load. I'm keeping it shut down until I have some plan of action to further investigate. I'm sure a debug file would be helpful, but I'm not sure how to generate one without web GUI access. How should I best proceed to figure out what happened?
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
What's your hardware, and what's the error message?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
The logs should still be available on the .system dataset, and they'd probably go a long way towards building a picture of what's going on.
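Something like this should turn them up (dataset names differ per system, so treat this as a sketch):
Code:
# Locate the system dataset and its syslog child, then look at the logs it
# holds. On a running system /var/log is normally already mounted from it.
zfs list -r -o name,mountpoint | grep -i '\.system'
ls -l /var/log/auth.log* /var/log/messages*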
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Slight update: I did bring the system back up, but it's disconnected from the Internet. The error message I mentioned (which repeats several times per second) is from /var/log/auth.log, which is 452 MB in size; the rotated logs from previous days are each under 10 KB (though of course they're bzip'd).

Poking through some of my data doesn't show any .gpg file extensions, which would be a sign of a cryptolocker attack as I understand it. The system as running now doesn't show any unusual processes or unreasonable system load.
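(For the record, the sweep was nothing fancier than something like this; the pool name is just an example:)
Code:
# Look for anything with a .gpg extension on the data pool
# (pool/dataset name is an example).
find /mnt/tank -type f -name '*.gpg' | head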
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,553
http://doc.freenas.org/9.10/freenas_cli.html#freenas-debug

It's possible that if someone compromised a jail, they could use it as a springboard to attack IPMI, the freenas webgui / API, or other services on the NAS. Especially if you reused SSH keys. I'd look back through /var/log/auth.log for the beginning point at which you start to see weird activity, then try to correlate with other log entries.
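Roughly along these lines (adjust the grep pattern to whatever the repeating message actually is; this is just a sketch):
Code:
# Find the earliest occurrence of the repeating message, including in the
# rotated logs, then compare its timestamp against /var/log/messages,
# the nginx logs, and the jail logs.
bzcat /var/log/auth.log.*.bz2 | grep 'sudo' | head -5
grep -n 'sudo' /var/log/auth.log | head -5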

Might be a good time to switch to using a VPN. :D
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
It's amazing what TFM will tell you if you just read it... Thanks for the pointer, @anodos. I ran freenas-debug -v, and it reports that "All Files, Directories and Symlinks in the system were verified successfully". But now it's getting curiouser and curiouser.

I can get into the shell from the console without issue. I can ssh into the server as root. But when I try to su from root to an unprivileged user, I get
Code:
[root@freenas2] ~# su dan
su: /bin/csh: Permission denied

When I try to ssh in as that user, which was possible previously, I get
Code:
root@pve:~# ssh dan@192.168.2.2
dan@192.168.2.2's password:
Last login: Wed Apr 27 19:43:01 2016 from 192.168.1.198
Could not chdir to home directory /mnt/tank/dan: Permission denied
/bin/csh: Permission denied
Connection to 192.168.2.2 closed.

Permissions look correct:
Code:
[root@freenas2] ~# ll /bin/csh
-r-xr-xr-x  2 root  wheel  382080 Apr 27 09:45 /bin/csh*

drwx------  3 dan  staff  4 Dec 22  2014 dan/
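
Just to rule out something further up the path or a mount-level restriction, checking each directory component and the mount flags would look roughly like this (just a sketch):
Code:
# Show mode, owner, and file flags for every component of the paths involved,
# then check whether anything is mounted noexec.
ls -ldo / /bin /bin/csh /mnt /mnt/tank /mnt/tank/dan
mount | grep -i noexec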
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Interestingly, the error in auth.log corresponds exactly in time to this bit in /var/log/messages:
Code:
Apr 28 01:28:23 freenas2        (pass17:mps0:0:25:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 154 command timeout cm 0xfffffe000115ea20 ccb 0xfffff81a71d22000
Apr 28 01:28:23 freenas2        (noperiph:mps0:0:4294967295:0): SMID 1 Aborting command 0xfffffe000115ea20
Apr 28 01:28:23 freenas2 mps0: Sending reset from mpssas_send_abort for target ID 25
Apr 28 01:28:25 freenas2 mountd[3056]: export request succeeded from 192.168.1.3
Apr 28 01:28:25 freenas2 mps0: mpssas_prepare_remove: Sending reset for target ID 25
Apr 28 01:28:25 freenas2 da17 at mps0 bus 0 scbus0 target 25 lun 0
Apr 28 01:28:25 freenas2 da17: <ATA WL6000GSA6457 0F.3> s/n WOL240336070         detached
Apr 28 05:28:27 freenas2 devd: Executing '[ -e /tmp/.sync_disk_done ] && LD_LIBRARY_PATH=/usr/local/lib /usr/local/bin/python /usr/local/www/freenasUI/tools/sync_disks.py'
Apr 28 01:28:27 freenas2        (da17:mps0:0:25:0): WRITE(16). CDB: 8a 00 00 00 00 02 9e eb 54 00 00 00 01 00 00 00 length 131072 SMID 957 terminated ioc 804b scsi 0 state c xfer 0
Apr 28 01:28:27 freenas2 mps0: (da17:mps0:0:25:0): WRITE(16). CDB: 8a 00 00 00 00 02 9e eb 54 00 00 00 01 00 00 00 
Apr 28 01:28:27 freenas2 IOCStatus = 0x4b while resetting device 0x1b
Apr 28 01:28:27 freenas2 (da17:mps0:0:25:0): CAM status: Unconditionally Re-queue Request
Apr 28 01:28:27 freenas2 mps0: (da17:Unfreezing devq for target ID 25
Apr 28 01:28:27 freenas2 mps0:0:mps0: 25:Unfreezing devq for target ID 25
Apr 28 01:28:27 freenas2 0): Error 5, Periph was invalidated
Apr 28 01:28:27 freenas2 (da17:mps0:0:25:0): Periph destroyed


That was a newly-purchased drive that I was testing, and which I think is going back to the vendor. The drive wasn't part of a pool or otherwise mounted in any way; it was just undergoing badblocks testing. There's nothing else anywhere close in time in messages, other than NFS mounts succeeding. I can't imagine how this could relate to the problem I was seeing, but the "Periph destroyed" is at exactly the same time as the auth.log messages start, which is a little too coincidental to just ignore.

I'm seeing a number of "permission denied" like the following in nginx-error.log, but they're from when I tried to access the web GUI, nothing around the time the auth.log messages started:
Code:
2016/04/28 05:47:52 [error] 41424#0: *26043 open() "/usr/local/www/freenasUI/static/images/ui/menu/system.png" failed (13: Permission denied), client: 192.168.1.216, server: localhost, request: "GET /static/images/ui/menu/system.png HTTP/1.1", host: "freenas2.familybrown.org", referrer: "https://freenas2.familybrown.org/"


More to follow; I need to leave in a few minutes. It's looking like something's happened to hose permissions in some way, but as far as I can see so far, data isn't affected.
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,553
You might also want to check logs on your plex jail. Since it's a jail you don't actually need to have it running to do this.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
I've checked the logs in all the jails. Absolutely nothing of interest anywhere close to 0130 today.

There's clearly something strange going on with permissions, and I don't know what it is. Both the inability of the nonprivileged user to log in, and the issues with the web GUI, seem to stem from this, but the permissions on all the relevant files appear to be correct. What's disconcerting about this is that, if this is related to one of the jails, the apparent attacker managed to break out of the jail, which I didn't think they were supposed to be able to do.
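One more angle to check: NFSv4 ACL entries or FreeBSD file flags wouldn't show up in a plain ls -l, so something like this (a sketch, using the paths from earlier):
Code:
# ACL entries can deny access even when the mode bits look fine;
# file flags are worth a glance too.
ls -lo /bin/csh
getfacl /bin/csh /mnt/tank/dan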
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,553
Perhaps it's just a combination of bad juju with a failing boot medium and a misbehaving application. :D

If your system was indeed compromised, that doesn't mean the attacker broke out of the jail (I'd rather use such a vulnerability to attack something of higher value than a home NAS). Theoretically, an attacker can easily leverage a compromised jail to attack the host system through more mundane methods. For instance, they can start running POST and GET requests from the jail against the FreeNAS webgui's API. If you reuse SSH keys (i.e., use the ssh client inside the jail to access other servers), an attacker can use them to access other systems on your home network. There was recently a user who ended up with his NAS encrypted by ransomware because a jail had an SSH key used to rsync data to the main NAS.

That said, it's probably a good idea to download your freenas config, install a fresh copy of FreeNAS, create a new plex jail, and possibly look into securing remote access through a VPN.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Bad juju and misbehaving application are possible, but (1) a scrub of the boot pool came up clean, and (2) the OS verification also came up clean, which together I'd think would rule out a failing boot device (there does appear to be a failing device, but there's no obvious way that it would be relevant to what's going on). The clean OS verification puzzles me a bit, as I'd think it'd catch whatever the permissions issue is, but nothing there. Fortunately I don't ssh from any of my jails, so there are no keys there to grab.

That said, it's probably a good idea to download your freenas config, install a fresh copy of FreeNAS, create a new plex jail, and possibly look into securing remote access through a VPN.
That's about where I'm at, though I'm not sure about the VPN in this case. As I understand it, port 32400 (or some other specified port) needs to be open to the Plex installation for the remote features to work (which I could live without, at least for a while), and I'm not sure how a local Plex client would deal with not being able to discover the server through plex.tv. Easy enough to test in the short term, though; I can just remove the port forward from my router.
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,553
Honestly, I have no experience with plex, so I can't comment on what will help in this situation. If your router allows more flexible firewall rules, you can filter the allowed ingress / egress IP addresses on your WAN interface for your Plex jail. For instance, in pfSense you can create an alias for allowed source IP addresses in your port-forward rules. This will only allow certain IP addresses to access your plex jail from the internet.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Even stranger: I just attempted to boot into the installer (ISO, mounted virtually via IPMI). This is the same ISO image I used to install yesterday, and installation ran without issue. The SHA256sum was just confirmed a few minutes ago. Here's the result now:
Screenshot from 2016-04-28 16-05-30.png

Time to burn it to a USB and boot from that, I guess.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
At this rate, by Tuesday, @anodos (next most-liked guy after me and @danb35) will suffer a mysterious failure on one of his servers that leads to more WTF events.

I propose we all pitch in and hire an exorcist. I know a guy.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Burned it to a USB stick; that didn't even show up as a bootable device. Downloaded the ISO and sha256 again, verified the sha256 (it was fine), and this time the installer booted and ran without apparent error. The new install is booting now, whereupon I'll upload the config (a copy taken right at midnight, so before whatever happened, happened) and see what happens.
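(For reference, verifying and writing the image comes down to roughly this on a Linux box; the ISO filename and the USB device node are placeholders, so double-check the device before running dd:)
Code:
# Compare against the published checksum, then write the image to the stick.
# /dev/sdX is a placeholder -- writing to the wrong device will destroy it.
sha256sum FreeNAS-9.10-RELEASE.iso
dd if=FreeNAS-9.10-RELEASE.iso of=/dev/sdX bs=1M && sync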

Edit: Who knows, we might need the exorcist yet.
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,553
It might be worthwhile to check the IPMI event logs for any detected hardware faults. Maybe run memtest as well.
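Something like this dumps the event log from the OS side, assuming ipmitool is available (the same data is usually under Event Log in the BMC's web interface):
Code:
# Load the IPMI driver if it isn't already (kldload just complains if it is),
# then dump the system event log.
kldload ipmi
ipmitool sel list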
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Good idea on the IPMI logs; I hadn't thought of that. Nothing in them, though, except for chassis intrusion when I had the case open a few times. I'd run memtest for a few days last week when I got the server, with no errors found (it is ECC RAM, which should make this a non-issue, though I'm awaiting @cyberjock's analysis of the faulty ECC stick he received).

The system is up and running now with the restored config file. I'm allowing SSH access for root via a single key only (and that only until I can figure out how to let Proxmox run pve-zsync as someone other than root on the remote box). I'll keep an eye on what happens.
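A possible way to tighten that key further (not something I've done yet, and I haven't tested whether pve-zsync tolerates it) would be authorized_keys options, roughly:
Code:
# /root/.ssh/authorized_keys -- limit what the Proxmox key can be used for.
# The key material and comment are placeholders.
no-port-forwarding,no-X11-forwarding,no-agent-forwarding ssh-rsa AAAA...example... pve-zsync@pve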
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,553
Hardware can do weird stuff.

This is mostly conjecture, but what you described doesn't fit with what I would expect from a network intrusion. I'd expect [1] your system to be really busy leading up to the compromise (as the attacker tries to force a buffer overflow or exploit some other vulnerability), followed by it [2] returning to a more-or-less normal state as he tries to move laterally on your network and compromise more systems, then (since you're a home user) [3] a lot of activity as he starts encrypting everything (because that's where the money is). Steps [1] and [2] are done in ways that minimally affect the filesystem to reduce the chances of detection, and obviously no one got to [3] since you still have your files. I think a hardware or software problem is more likely.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
That's at least somewhat encouraging. The system seems to be running fine for now. I think I'll turn off all the jails before bed tonight and make sure everything's fine in the morning without them (I'm not at all sure the problem had anything to do with them, but it at least eliminates a variable). I have badblocks running on the questionable drive again, hoping it will complete a full pass to get a true picture of the status of the drive.
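(The test itself is the usual destructive badblocks pass, roughly like this; da17 is the device name from the earlier log excerpt, so confirm it first, since -w overwrites the whole disk:)
Code:
# Destructive write-mode test with progress output on the suspect drive.
# The drive is not part of any pool, so overwriting it is fine here.
badblocks -b 4096 -ws /dev/da17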

One thing I did notice as I was going through the Plex jail was that there's a parameter in the Preferences.xml file called "disableRemoteSecurity", and by default with the FreeBSD packages, it's enabled. That gives anyone who wanders across your Plex server full access to your data. That's frankly an idiotic default setting, and I've now changed it.
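(Checking it from the host is a one-liner along these lines; the jail path is just an example of where the Plex data might live:)
Code:
# A value of "1" means remote security is disabled.
grep -o 'disableRemoteSecurity="[^"]*"' \
    '/mnt/jails/plex/usr/local/plexdata/Plex Media Server/Preferences.xml'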

Still unexplained:
  • Borked permissions on something in the OS (still don't know what), resulting in the inability of a nonprivileged user to log in; they don't show up despite both a boot pool scrub and a Verify Install coming back clean
  • The FreeNAS ISO didn't work, despite a matching sha256, either when mounted as a virtual device over IPMI or when burned to a USB stick; a fresh download worked fine.
 