[Cerowrt-devel] Spectre and EBPF JIT

Fri Jan 5 10:35:45 EST 2018

One of the most troubling "overreactions" stems from the fact that the PoC by Google Project Zero describes an attack on hypervisor host memory under KVM.
The fine print, which the Project Zero description does not state very explicitly, is that the version of KVM that was attacked depended on the hypervisor being mapped into the linear address space of the guest kernel.
In a hypervisor that uses VMX extensions, the EPT in effect during guest execution doesn't even provide addressability to the hypervisor code and data. (I haven't inspected KVM's accelerated mode, but I can't see why it would have the EPT map non-guest memory. I know VMware does not.)

This is supported by a posting from QEMU about KVM, https://www.qemu.org/2018/01/04/spectre/ , which is again a little hard to understand if you don't know how VMX and EPTs work.

What this means is that older cloud VMs based on paravirtualization techniques (Xen, ancient QEMU, older VMware) may let a guest access hypervisor state via Spectre v1.

But newer so-called hardware-accelerated VMs based on VMX extensions and using the EPT are isolated to a much larger extent, making Spectre v1 pretty useless.

Thus, the "overreaction" is the claim that ALL VMs are problematic. This is very far from true. The hypervisors of hardware-accelerated VMs are not vulnerable to Meltdown or Spectre v2, and probably not to Spectre v1.

Of course, *within* a particular VM, the guest kernel and other processes are vulnerable. But no inter-VM path has been demonstrated, nor do any of the discussions explain any means of using speculative execution and branch misprediction between VMs running under different EPTs.

So for the cloud, and also for NFVs that are run on accelerated HVMs, the problem is either non-existent or yet to be discovered.

Of course the "press" wants everyone to be superafraid, so if they can say "KVM is affected" that causes the mob to start running for the exits!

Summary: hardware virtualization appears to be a pragmatic form of isolation that works. And thus many cloud providers are fine.

-----Original Message-----
From: "Jonathan Morton" <chromatix99 at gmail.com>
Sent: Friday, January 5, 2018 9:07am
To: "Dave Taht" <dave.taht at gmail.com>
Cc: "dpreed at deepplum.com" <dpreed at deepplum.com>, "Joel Wirāmu Pauling" <joel at aenertia.net>, cerowrt-devel at lists.bufferbloat.net
Subject: Re: [Cerowrt-devel] Spectre and EBPF JIT

> On 5 Jan, 2018, at 6:53 am, Dave Taht <dave.taht at gmail.com> wrote:
> 
> It took me a long while to digest that one. The branch predictor
> analysis of haswell was easiest to understand (and AMD claims to have
> an AI based one), and perhaps scrambling that at random intervals
> would help? (this stuff is now way above my pay grade)

Software mitigations for all three attacks have been developed during the "responsible disclosure" period.

Spectre v1: adding an LFENCE instruction (memory load fence) to JIT code performing a bounds-checked array read. This is basically a userspace fix for a userspace attack. Firefox just got this; Chrome undoubtedly will too, if it hasn't already.
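
As a rough illustration of the idea, here is a minimal C sketch of a bounds-checked read with a fence between the check and the load. The helper name checked_read and the SPECULATION_BARRIER macro are hypothetical and are not taken from any browser or JIT. It assumes an x86 target with SSE2 so that _mm_lfence() is available; on other targets the fallback is only a compiler barrier and is not a real mitigation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>              /* _mm_lfence() */
#define SPECULATION_BARRIER() _mm_lfence()
#else
/* Assumption: placeholder only. A compiler barrier does NOT stop
 * speculative loads in hardware; a real port needs a target-specific
 * instruction here. */
#define SPECULATION_BARRIER() __asm__ __volatile__("" ::: "memory")
#endif

/* Hypothetical helper: returns array[index] if index is in bounds,
 * otherwise returns fallback. */
static inline uint8_t checked_read(const uint8_t *array, size_t len,
                                   size_t index, uint8_t fallback)
{
    assert(array != NULL);

    if (index >= len)
        return fallback;

    /* The fence sits between the bounds check and the load, so the
     * load is not issued until the check has actually resolved. */
    SPECULATION_BARRIER();

    return array[index];
}
```

A caller would use checked_read(buf, buf_len, untrusted_index, 0) wherever an attacker-influenced index reaches an array. The fence costs something on every access, which is why it is placed only on such paths.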

Spectre v2: three different mitigations are appropriate for different families of CPU:

 https://lkml.org/lkml/2018/1/4/742

On AMD CPUs, the small risk that actually exists (because AMD's BTB is much less prone to poisoning than Intel's) is eliminated by adding LFENCE to privileged indirect branches. This has only a very small cost.

On Intel CPUs up to and including Broadwell (and from Silvermont onwards), a "retpoline" structure is necessary and sufficient. This has a bigger cost than LFENCE and is pretty ugly to look at, but it's still relatively minor.

On Skylake, Kaby Lake and Coffee Lake, something more exotic is required - I think it involves temporarily disabling the BTB during privileged indirect branches. That's *really* ugly, and involves tweaking poorly-documented MSRs.

Something similar in nature to the above should also work for affected ARM cores.
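
The per-family choice for Spectre v2 described above can be summarized as a small dispatch function. This is only a sketch of the summary in this message: the enum and function names are hypothetical, the family buckets are deliberately coarse, and the Skylake-and-later entry reflects my "I think" guess above rather than a confirmed mechanism.

```c
#include <assert.h>

/* Hypothetical coarse CPU buckets, following the families named above. */
enum cpu_family {
    CPU_AMD,
    CPU_INTEL_UP_TO_BROADWELL,   /* also Silvermont onwards */
    CPU_INTEL_SKYLAKE_OR_LATER   /* Skylake, Kaby Lake, Coffee Lake */
};

/* Hypothetical labels for the three mitigations discussed above. */
enum v2_mitigation {
    V2_LFENCE_BEFORE_INDIRECT_BRANCH,
    V2_RETPOLINE,
    V2_BTB_CONTROL_VIA_MSR       /* assumption: exact mechanism unconfirmed */
};

/* Maps a CPU bucket to the mitigation this message associates with it. */
static inline enum v2_mitigation pick_v2_mitigation(enum cpu_family family)
{
    switch (family) {
    case CPU_AMD:
        return V2_LFENCE_BEFORE_INDIRECT_BRANCH;
    case CPU_INTEL_UP_TO_BROADWELL:
        return V2_RETPOLINE;
    case CPU_INTEL_SKYLAKE_OR_LATER:
        return V2_BTB_CONTROL_VIA_MSR;
    }
    assert(!"unknown cpu_family");
    return V2_RETPOLINE;  /* conservative default if asserts are disabled */
}
```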

Meltdown: nothing is required for AMD CPUs. Unmapping the privileged addresses when returning to userspace is sufficient for Intel, but incurs a big performance overhead for syscalls. The same is likely true for any other affected CPUs.

 - Jonathan Morton
