Meltdown / Spectre Discussion

Status
Not open for further replies.

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
The stock market doesn't seem to care about it ...
 

Nick2253

Wizard
Joined
Apr 21, 2014
Messages
1,633
Holy guacamole, Batman!

Even at a 5% performance decrease, this is huge. I hope that once the details are revealed, there are only a few corner-case workloads that see the big-number decreases, but I'm afraid that common workloads, like virtualization and file serving will be hit hardest.
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
Saw that earlier today. Avoton bug, IME exploit, and now this. I guess life ain't always great at the top...
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
It could be pretty bad. I think the workaround has the effect of taking every operating system function call, and turning it into the equivalent of a context-switch.

Ten years ago, the AMD Phenom had a microcode workaround that crippled the TLB (translation lookaside buffer) and made WinRAR run 70% slower.

In this case, the impact is going to depend on how often your code is making system calls and/or triggering hypervisor code.

I would expect their stock to notice by tomorrow, whatever that turns out to be.
 
Joined
Apr 9, 2015
Messages
1,258
Sucks, but on the bright side, AMD is immune to this bug. Most of what I use is AMD-based other than my wife's laptop and my FreeNAS box. Speaking of which, I assume FreeBSD will be affected by this as well.

Should be interesting when all the current and slightly older Intel-based stuff runs at the same speed as, if not slower than, the AMD-based hardware once the software fixes are released.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
Mid term, couldn't the operating system have a process white-list that exempts some code from the fix? The question is whether you trust that code enough to do so.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Mid term, couldn't the operating system have a process white-list that exempts some code from the fix? The question is whether you trust that code enough to do so.

No. I'm going to try to describe this in a general computer-science way, so some of the specific terms and issues will be a little different; the goal is to get the concept across. So anyone who wants to "correct" me will get a Grinchly response.

So modern computing systems have a virtual memory system, which includes the ability to create a virtual address space for each process. This is how you can run several different processes, each with a 3GB address space, on a 32-bit system with 4GB of RAM... the virtual memory system can swap in pages from disk, or in some cases even share pages.

We also usually have a kernel, which is a portion of code that acts as the operating system, which has privileges to do ANYTHING to the system. Because this is dangerous, we create a second tier, for user-provided code, which doesn't have the sweeping privileges. Since users are bad and evil and out to crash your system, we require user programs to gateway risky things, such as talking to I/O devices, through the kernel.

The interface between user code and the kernel is called a "syscall." A syscall passes control from the user process, which made the call, to the kernel function that implements the requested functionality, such as "write data to file" or "read stuff from network."

Now, remember where I talked about virtual memory systems? Switching contexts between processes takes a certain amount of time, because the virtual memory system has to be reconfigured to run the new process. You can watch the rate at which a normal UNIX box does this by running "vmstat 1" and looking at the "cs" column. Usually it's in the hundreds-to-thousands-per-second range.

Because context switches take a bunch of time, we have traditionally used tricks to map the kernel into the same address space as each running process, which means that a CPU can switch from user mode to kernel mode and back without suffering a context switch and without needing to remap the virtual memory spaces. This is generally a good thing, it makes your system run faster. This is fine, because a CPU and the virtual memory system is supposed to provide protections against user accesses to memory that is marked as privileged, such as the kernel.

Intel fscked this up.

The specifics are unclear, but modern CPUs speculatively begin executing instructions several cycles ahead (a consequence of pipelining and out-of-order execution), and apparently there isn't sufficient logic in Intel's design to protect privileged memory during speculation. Apparently a clever attacker can cause a byte, maybe a few bytes, maybe even a page, of privileged memory to be read through careful sequencing of instructions with only user privileges. So this means that a user process can see the kernel's memory space. Much of that is boring, but it can also contain important stuff: encryption keys, information about other processes, etc.

So the problem is that you have a binary choice. You can trust user code not to be malicious, in which case things are fine as they are. This might be an okay decision in some cases. But for most of computing, it is a dangerous one to make, and the decision many or most will make is that the kernel needs to be protected. So we have to look at mitigation.

The problem is easily mitigated, FSVO "easily", by putting the kernel into its own virtual address space. This means that a user process doesn't know where the kernel's memory lives and cannot abuse the pipeline to peek into it.

The PROBLEM is that putting the kernel into its own virtual address space means that each time you make a syscall, you have to do a context switch and map out the user process, and map in the kernel, and then resume execution in the kernel, then reverse those steps to return back to the user process when done. The CPU cost to do this has been measured at between 5% and 70%, depending on the workload. I'm pretty sure that even 70% is not an upper bound.

This SUUUUUUUUCKKKKSSSS.

Right now, this is causing a massive panic in the world of cloud, where it is looking fairly likely that cloud resources are suddenly going to get noticeably slower, which means that the cloud is going to have to expand. Because there hasn't been a clear disclosure, those of us who do virtualization infrastructure are expecting that this is going to significantly impact hypervisors, and there's a good case to be made that guest operating systems can examine or even escape into the hypervisor management plane. The NSA had been rumored to have tools that were capable of VM escape for quite some time now, and perhaps this is the vulnerability that they were using. If so, shame on them. This is an IT train wreck.

Tell me if you do or don't understand what I've written. This is the kind of train wreck many of us have feared as CPUs have gone from a few thousand transistors (really!) to billions.
 

spotcatbug

Dabbler
Joined
Nov 6, 2017
Messages
43
Tell me if you do or don't understand what I've written.

Awesome explanation. Thanks. What was missing for my understanding was the bit about how currently the kernel does not have to be context-switched-in for syscalls. I just assumed that was happening all the time (the context switches). So now we're losing that apparently super-important optimization. Yikes.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
Makes sense to me. I still wonder though if it is possible for the syscall gateway to act based on who called it, or if that is so low-level it can't be known until after the kernel is mapped and starts running. Perhaps it can't, and that is what the PCID feature support is kind of about. (Obviously, Intel wouldn't have created a feature just for this bug.)
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Awesome explanation. Thanks. What was missing for my understanding was the bit about how currently the kernel does not have to be context-switched-in for syscalls. I just assumed that was happening all the time (the context switches). So now we're losing that apparently super-important optimization. Yikes.

Correct. I'm not totally clear on how this fix has been implemented, so it's possible that this isn't quite as bad as a full context switch, but anytime you mess with the virtual memory setup, that's a fairly heavy operation. I'm used to seeing this in the hundreds or thousands per second on the typical system. However, if you now have to do this for every I/O operation, ... wow that's going to be crazy shit.

Now, the thing is, this is an Intel bug. It isn't inherent to the platform, so AMD does not have this particular flaw, and also Intel can totally fix this in new silicon, which they absolutely will.

But this is really worse than Apple's Batterygate. For older CPUs, which quite likely means every Intel CPU out there, there isn't a microcode fix or other repair available. You don't have the option to go on down to Intel's Idiot Bar and have your CPU replaced for $29. Batteries are a part we've always *known* is subject to degradation over time and charge cycles, so in reality there's a lot of fuss and somewhat overwrought angst over that Apple issue; it's totally correctable, and while I understand the theory behind the "fraud" claim, well, eh, I'm ambivalent about that. But Intel and CPUs? Those Avotons? Those Xeon-Ds? You can't even replace those fsck'ers even if you could convince Intel to send you a replacement, which isn't likely to happen. This is a silicon-wasting shitstorm.

And thanks for the comment. I often write long messages and positive feedback encourages me to do more of it. ;-)
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Makes sense to me. I still wonder though if it is possible for the syscall gateway to act based on who called it, or if that is so low-level it can't be known until after the kernel is mapped and starts running. Perhaps it can't, and that is what the PCID feature support is kind of about. (Obviously, Intel wouldn't have created a feature just for this bug.)

That would be irrelevant. This doesn't appear to involve corrupting the syscall gateway mechanism. It's happening because the kernel is always mapped, as a performance optimization. This implies that all you really need to do is to set up the correct set of instructions inside a userspace process, and you can cause an access into kernel memory that SHOULD generate a security violation, but doesn't. This is the CPU itself doing the wrong thing. The "fix" is to set up a Berlin Wall at the syscall gateway that creates a strong separation between the virtual address spaces for user processes and for the kernel. It isn't actually a fix because it's just moving the kernel to someplace where it can't be inspected so easily.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
If you knew a process was white-listed, you could keep the kernel mapped for that process. But, on further thought, I think that wouldn't be wise because the bad process would just find a way to get the good process to load an altered DLL, etc. And on Windows, there are a lot of ways to attack another user-mode process.
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
I often write long messages and positive feedback encourages me to do more of it. ;-)
Well I for one encourage you to do more. You have a unique talent for explaining all things technical in a way that is easily understood.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
What's the hit going to be on virtualization? Amazon started patching for this about two weeks ago, and there have only been a couple of reports of speed degradation. Are they hiding it by upping their instances, or is it not as big a problem at the VM/hypervisor layer? Why are they patching at all?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
What's the hit going to be on virtualization? Amazon started patching for this about two weeks ago, and there have only been a couple of reports of speed degradation. Are they hiding it by upping their instances, or is it not as big a problem at the VM/hypervisor layer? Why are they patching at all?

It is unclear at this time, but I can tell you that virtualization people are preparing for it to suuuuuuuck. There is a lot of uncertainty at this point as to how bad the problem actually is, or even what precisely it is.

An existing virtual machine would probably need to be updated or rebuilt in order to see the performance impact of guest kernel updates to mitigate this problem.

All we really seem to know about the hypervisor side of this is that cloud services are in a bit of a panic about it. I've seen some credible reports that the issue is more of a slow information leak which is exploitable within the local OS, and that makes a certain kind of sense to me, because a hypervisor is trussed up around that, but doesn't map significant portions of itself into the VM's virtual address space, so there shouldn't be a big risk factor there. This interpretation of affairs seems to be at odds with some of the things that are being done to mitigate, so it is very possible that there's some as-yet not-fully-revealed impact on virtualization and hypervisors, or that people are overreacting, which is possible too.

I wish I could tell you which it was.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
For older CPUs, which quite likely means every Intel CPU out there, there isn't a microcode fix or other repair available.
This part surprises me. I would have expected this to be fixable in microcode - which leaves me wondering if the prefetch stuff is implemented in fixed-function logic or if the fix doesn't fit in available microcode space.

There are some rumors that the embargo on this whole story lifts on 2018-01-04, so hopefully we'll know details tomorrow.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
We'll probably find out tomorrow, when Xen announces.

If hardware is direct-mapped into the VM address space, and something else is mapped too but protected only by page-table permissions, then the same problem may exist, depending on how the hypervisor works internally.

I think the problem is too low-level to be addressed by microcode. Is it even a bug, or is it caused exclusively by the fact that the outcome varies by time? The only fix there would be to make everything slow. Right back where you started.
 
Joined
Feb 2, 2016
Messages
574
NSA had been rumored to have tools that were capable of VM escape for quite some time now, and perhaps this is the vulnerability that they were using.

NSA... vulnerability or feature? ;)

Thanks for the explainer. In broad terms, I knew what was going on and that it's really bad. It'll be nice to dial-in the specifics as more news is released.

Way long ago, Intel swapped the FDIV-impacted 60 MHz Pentium processor in my Gateway desktop for an unaffected 66 MHz part at no cost. I thought I had died and gone to heaven. I doubt they'll be able to replace all the chips affected by this error, even if they only replaced currently-marketed chips; they simply don't have the fabrication capacity. They're in a world of hurt, though this wouldn't be the first time they've bounced back.

Cheers,
Matt
 
