From: Benjamin Cronce
To: Dave Täht <dave@taht.net>
Cc: bloat <bloat@lists.bufferbloat.net>
Date: Tue, 5 Jan 2016 15:36:10 -0600
Subject: Re: [Bloat] Hardware upticks

On Tue, Jan 5, 2016 at 12:57 PM, Dave Täht wrote:
>
> On 1/5/16 10:27 AM, Jonathan Morton wrote:
> > Undoubtedly.
> > But that beefy quad-core CPU should be able to handle it
> > without them.
>
> Sigh. It's not just the CPU that matters. Context switch time, memory
> bus and I/O bus architecture, the intelligence or lack thereof of the
> network interface, and so on.
>
> To give a real-world case of stupidity in a hardware design: the Armada
> 385 in the Linksys platform connects tx and rx packet-related interrupts
> to a single interrupt line, requiring that tx and rx ring buffer cleanup
> (in particular) be executed on a single CPU, *at the same time, in a
> dedicated thread*.
>
> Saving a single pin (which doesn't even exist off chip) serializes
> tx and rx processing. DUMB. (I otherwise quite like much of the Marvell
> ethernet design and am looking forward to the Turris Omnia very much.)
>
> ...
>
> Context switch time is probably one of the biggest hidden nightmares in
> modern OOO CPU architectures - they only go fast in a straight line. I'd
> love to see a 1 GHz processor that could context switch in 5 cycles.

Seeing that most modern CPUs take thousands to tens of thousands of cycles to switch, 5 is essentially "instantly". Some of that overhead is shooting down the TLB, plus many layers of cache misses. You can't have separate virtual memory spaces and avoid a large switching overhead without devoting a lot of transistors to massive caches - and the larger the caches, the higher the latency.

Modern PC hardware can use soft interrupts to reduce hardware interrupts and context switching. My Intel i350 issues a steady ~150 interrupts per second per core regardless of network load, while maintaining ping times in the tens of microseconds.

I'm not sure what they could do with custom architectures; there will always be some context switching overhead, but they may be able to cache a few specific contexts, knowing that an embedded system will rarely have more than a few contexts doing the bulk of the work.
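For a ballpark on those "thousands of cycles": a classic lmbench-style ping-pong over a pair of pipes forces a scheduler switch on every hop, which gives a rough upper bound on switch cost. A sketch (POSIX-only, uses os.fork; the measured number includes pipe syscall overhead, so read it as an upper bound, not raw switch cost):

```python
# Rough lmbench-style context-switch cost estimate: parent and child
# ping-pong one byte over two pipes, forcing a context switch per hop.
# Includes pipe read/write syscall overhead, so treat the result as an
# upper bound on the raw switch cost.
import os
import time

ROUNDS = 20000

def measure_switch_ns(rounds=ROUNDS):
    p2c_r, p2c_w = os.pipe()   # parent -> child pipe
    c2p_r, c2p_w = os.pipe()   # child -> parent pipe
    pid = os.fork()
    if pid == 0:               # child: echo every byte straight back
        for _ in range(rounds):
            b = os.read(p2c_r, 1)
            os.write(c2p_w, b)
        os._exit(0)
    t0 = time.perf_counter_ns()
    for _ in range(rounds):
        os.write(p2c_w, b"x")
        os.read(c2p_r, 1)
    t1 = time.perf_counter_ns()
    os.waitpid(pid, 0)
    # Each round trip costs two switches (parent->child, child->parent).
    return (t1 - t0) / (rounds * 2)

if __name__ == "__main__":
    print(f"~{measure_switch_ns():.0f} ns per context switch (upper bound)")
```

On a 1 GHz part, 1 ns is one cycle, so dividing the printed figure by the clock period gives the cycle count being discussed above.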
>
> Having 4 cores responding to interrupts masks this latency somewhat
> when multiple sources of interrupts are contending... (but see above -
> you need dedicated interrupt lines per major source of interrupts for
> it to work)
>
> and the inherent context switch latency is still always there. (See
> Cheshire's rant.)
>
> The general-purpose "mainstream" processors no longer handling
> interrupts well is one of the market drivers towards specialized
> co-processors.
>
> ...
>
> Knowing Broadcom, there are probably so many invasive offloads, bugs
> and errata in this new chip that 90% of the features will never be
> used. But "forwarding un-inspected, un-firewalled packets in massive
> bulk so as to win a benchmark race" - ok, happy they are trying.
>
> Maybe they'll even publish a data sheet worth reading.
>
> > - Jonathan Morton

_______________________________________________
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat