From: "dpreed@deepplum.com" <dpreed@deepplum.com>
Date: Fri, 5 Jan 2018 10:35:45 -0500 (EST)
To: "Jonathan Morton" <chromatix99@gmail.com>
Cc: "Dave Taht" <dave.taht@gmail.com>, "Joel Wirāmu Pauling" <joel@aenertia.net>, cerowrt-devel@lists.bufferbloat.net
Subject: Re: [Cerowrt-devel] Spectre and EBPF JIT

One of the most troubling "overreactions" stems from the fact that the proof of concept (PoC) published by Google Project Zero describes an attack on hypervisor host memory under KVM.

In the fine print, and not very explicitly in the Project Zero description, is the fact that the version of KVM that was attacked depended on the hypervisor being mapped into the linear address space of the guest kernel.

In a hypervisor that uses the VMX extensions, the EPT (Extended Page Tables) in effect during guest execution doesn't even provide addressability to the hypervisor's code and data. (I haven't inspected KVM's accelerated mode, but I can't see why it would have the EPT map non-guest memory. I know VMware does not.)

This is validated by a posting from the QEMU project regarding KVM, https://www.qemu.org/2018/01/04/spectre/ - again a little hard to understand if you don't know how VMX and EPTs work.

What this means is that older cloud VMs based on paravirtualization techniques (Xen, ancient QEMU, older VMware) may be susceptible to leaking hypervisor state via Spectre v1.

But newer, so-called hardware-accelerated VMs, based on the VMX extensions and using the EPT, are isolated to a much larger extent, making Spectre v1 pretty useless against them.

Thus, the "overreaction" is the claim that ALL VMs are problematic. This is very far from true. The hypervisors under hardware-accelerated VMs are not vulnerable to Meltdown or Spectre v2, and probably not to Spectre v1.

Of course, *within* a particular VM, the guest kernel and other processes are vulnerable. But no inter-VM path has been demonstrated, nor do any of the discussions explain any means of using speculative execution and branch misprediction between VMs running under different EPTs.

So for the cloud, and also for NFVs that run on accelerated HVMs, the problem is either non-existent or yet to be discovered.

Of course the "press" wants everyone to be super-afraid, so if they can say "KVM is affected" the mob starts running for the exits!

Summary: hardware virtualization appears to be a pragmatic form of isolation that works. And thus many cloud providers are fine.
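
A quick way to confirm the EPT point on a given Linux host is to look at the kvm_intel module's "ept" parameter. A minimal C sketch, assuming the usual sysfs location for module parameters (if the file is missing, kvm_intel is probably not loaded):

    /* Minimal sketch: report whether the kvm_intel module has EPT enabled. */
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/sys/module/kvm_intel/parameters/ept", "r");
        if (!f) {
            perror("/sys/module/kvm_intel/parameters/ept");
            return 1;
        }
        int c = fgetc(f);
        fclose(f);
        /* 'Y' means guests run under EPT, i.e. hypervisor memory is simply
         * not present in the guest's hardware address translation. */
        printf("EPT enabled: %s\n", c == 'Y' ? "yes" : "no");
        return 0;
    }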

-----Original Message-----
From: "Jonathan Morton" <chromatix99@gmail.com>
Sent: Friday, January 5, 2018 9:07am
To: "Dave Taht" <dave.taht@gmail.com>
Cc: "dpreed@deepplum.com" <dpreed@deepplum.com>, "Joel Wirāmu Pauling" <joel@aenertia.net>, cerowrt-devel@lists.bufferbloat.net
Subject: Re: [Cerowrt-devel] Spectre and EBPF JIT

> On 5 Jan, 2018, at 6:53 am, Dave Taht <dave.taht@gmail.com> wrote:
>
> It took me a long while to digest that one. The branch predictor
> analysis of Haswell was easiest to understand (and AMD claims to have
> an AI-based one), and perhaps scrambling that at random intervals
> would help? (this stuff is now way above my pay grade)

Software mitigations for all three attacks have been developed during the "responsible disclosure" period.

Spectre v1: adding an LFENCE instruction (memory load fence) to JIT code performing a bounds-checked array read. This is basically a userspace fix for a userspace attack. Firefox just got this; Chrome undoubtedly will too, if it hasn't already.
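
Concretely, the pattern is roughly the following - a sketch only, not the code any particular browser ships. _mm_lfence() is the x86 intrinsic for LFENCE (a JIT would emit the instruction directly), and the array and function names are illustrative:

    /* Sketch of the Spectre v1 mitigation: serialize speculation between the
     * bounds check and the dependent load, so the CPU cannot speculatively
     * read out of bounds before the branch resolves. */
    #include <emmintrin.h>   /* _mm_lfence() */
    #include <stddef.h>
    #include <stdint.h>

    static uint8_t table[256];

    uint8_t checked_read(size_t index, size_t len)
    {
        if (index < len) {
            _mm_lfence();        /* speculation barrier after the bounds check */
            return table[index]; /* load can no longer run ahead of the check  */
        }
        return 0;
    }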

Spectre v2: three different mitigations are appropriate for different families of CPU:

https://lkml.org/lkml/2018/1/4/742

On AMD CPUs, the small risk that actually exists (because AMD's BTB is much less prone to poisoning than Intel's) is erased by adding LFENCE to privileged indirect branches. This has only a very small cost.

On Intel CPUs up to and including Broadwell (and from Silvermont onwards), a "retpoline" structure is necessary and sufficient. This has a bigger cost than LFENCE and is pretty ugly to look at, but it's still relatively minor.

On Skylake, Kaby Lake and Coffee Lake, something more exotic is required - I think it involves temporarily disabling the BTB during privileged indirect branches. That's *really* ugly, and involves tweaking poorly-documented MSRs.

Something similar in nature to the above should also work for affected ARM cores.

Meltdown: nothing is required for AMD CPUs. Unmapping the privileged addresses when returning to userspace is sufficient for Intel, but incurs a big performance overhead for syscalls. The same is likely true for any other affected CPUs.

 - Jonathan Morton
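
For reference, the "retpoline" structure mentioned above replaces an indirect branch with a call through a tiny thunk, so that the return-stack predictor, rather than the poisonable BTB, decides where speculation goes. A sketch modeled on the kernel's __x86_indirect_thunk_rax (the symbol and label names here are illustrative):

    /* Retpoline thunk for an indirect branch through %rax (AT&T syntax).
     * The "call" pushes the address of the capture loop; the real target is
     * then written over that return address, so "ret" jumps to it, while the
     * return-stack predictor speculates harmlessly into the loop. */
    __asm__(
        ".text\n"
        ".globl retpoline_rax_thunk\n"
        "retpoline_rax_thunk:\n"
        "    call 1f\n"           /* push address of label 0, jump to label 1   */
        "0:  pause\n"             /* mispredicted speculation is trapped here   */
        "    lfence\n"
        "    jmp 0b\n"
        "1:  mov %rax, (%rsp)\n"  /* overwrite return address with real target  */
        "    ret\n"               /* 'return' to the target held in %rax        */
    );

An indirect "call *%rax" is then emitted as a direct call to this thunk, and any mispredicted speculation ends up in the pause/lfence loop instead of at an attacker-chosen gadget.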