From: Benjamin Cronce
To: Dave Täht <dave@taht.net>
Cc: bloat <bloat@lists.bufferbloat.net>
Date: Tue, 5 Jan 2016 15:36:10 -0600
Subject: Re: [Bloat] Hardware upticks

On Tue, Jan 5, 2016 at 12:57 PM, Dave Täht wrote:
>
> On 1/5/16 10:27 AM, Jonathan Morton wrote:
> > Undoubtedly.
> > But that beefy quad-core CPU should be able to handle it
> > without them.
>
> Sigh. It's not just the CPU that matters. Context switch time, memory
> bus and I/O bus architecture, the intelligence or lack thereof of the
> network interface, and so on.
>
> To give a real-world case of stupidity in a hardware design: the Armada
> 385 in the Linksys platform connects tx and rx packet-related interrupts
> to a single interrupt line, requiring that tx and rx ring buffer cleanup
> (in particular) be executed on a single CPU, *at the same time, in a
> dedicated thread*.
>
> Saving a single pin (which doesn't even exist off chip) serializes
> tx and rx processing. DUMB. (I otherwise quite like much of the Marvell
> ethernet design and am looking forward to the Turris Omnia very much.)
>
> ...
>
> Context switch time is probably one of the biggest hidden nightmares in
> modern OOO CPU architectures - they only go fast in a straight line. I'd
> love to see a 1 GHz processor that could context switch in 5 cycles.

Seeing that most modern CPUs take thousands to tens of thousands of cycles to switch, 5 is essentially "instantly". Some of that overhead is shooting down the TLB, plus many layers of cache misses. You can't have separate virtual memory spaces and avoid a large switching overhead without devoting a lot of transistors to massive caches - and the larger the caches, the higher the latency.

Modern PC hardware can use soft interrupts to reduce hardware interrupts and context switching. My Intel i350 issues a steady ~150 interrupts per second per core regardless of network load, while maintaining ping times in the tens of microseconds.

I'm not sure what they could do with custom architectures; there will always be some context switching overhead, but they may be able to cache a few specific contexts, knowing that an embedded system will rarely have more than a few contexts doing the bulk of the work.
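For a ballpark on those "thousands of cycles": a classic lmbench-style ping-pong over a pair of pipes forces a scheduler switch on every hop, which gives a rough upper bound on switch cost. A sketch (POSIX-only, uses os.fork; the measured number includes pipe syscall overhead, so read it as an upper bound, not raw switch cost):

```python
# Rough lmbench-style context-switch cost estimate: parent and child
# ping-pong one byte over two pipes, forcing a context switch per hop.
# Includes pipe read/write syscall overhead, so treat the result as an
# upper bound on the raw switch cost.
import os
import time

ROUNDS = 20000

def measure_switch_ns(rounds=ROUNDS):
    p2c_r, p2c_w = os.pipe()   # parent -> child pipe
    c2p_r, c2p_w = os.pipe()   # child -> parent pipe
    pid = os.fork()
    if pid == 0:               # child: echo every byte straight back
        for _ in range(rounds):
            b = os.read(p2c_r, 1)
            os.write(c2p_w, b)
        os._exit(0)
    t0 = time.perf_counter_ns()
    for _ in range(rounds):
        os.write(p2c_w, b"x")
        os.read(c2p_r, 1)
    t1 = time.perf_counter_ns()
    os.waitpid(pid, 0)
    # Each round trip costs two switches (parent->child, child->parent).
    return (t1 - t0) / (rounds * 2)

if __name__ == "__main__":
    print(f"~{measure_switch_ns():.0f} ns per context switch (upper bound)")
```

On a 1 GHz part, 1 ns is one cycle, so dividing the printed figure by the clock period gives the cycle count being discussed above.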
>
> Having 4 cores responding to interrupts masks this latency somewhat
> when multiple sources of interrupts are contending... (but see above -
> you need dedicated interrupt lines per major source of interrupts for
> it to work)
>
> and the inherent context switch latency is still always there. (See
> Cheshire's rant.)
>
> The general-purpose "mainstream" processors no longer handling
> interrupts well is one of the market drivers towards specialized
> co-processors.
>
> ...
>
> Knowing Broadcom, there are probably so many invasive offloads, bugs
> and errata in this new chip that 90% of the features will never be
> used. But "forwarding un-inspected, un-firewalled packets in massive
> bulk so as to win a benchmark race" - ok, happy they are trying.
>
> Maybe they'll even publish a data sheet worth reading.
>
> > - Jonathan Morton

_______________________________________________
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat