[Bloat] Hardware upticks

Tue Jan 5 16:36:10 EST 2016

On Tue, Jan 5, 2016 at 12:57 PM, Dave Täht <dave at taht.net> wrote:

>
>
> On 1/5/16 10:27 AM, Jonathan Morton wrote:
> > Undoubtedly.  But that beefy quad-core CPU should be able to handle it
> > without them.
>
> Sigh. It's not just the CPU that matters. Context switch time, memory
> bus and I/O bus architecture, the intelligence or lack thereof of the
> network interface, and so on.
>
> To give a real world case of stupidity in a hardware design - the Armada
> 385 in the Linksys platform connects tx and rx packet related interrupts
> to a single interrupt line, requiring that tx and rx ring buffer cleanup
> (in particular) be executed on a single cpu, *at the same time, in a
> dedicated thread*.
>
> Saving a single pin (which doesn't even exist off chip) serializes
> tx and rx processing. DUMB. (I otherwise quite like much of the Marvell
> ethernet design and am looking forward to the Turris Omnia very much)
>
> ...
>
> Context switch time is probably one of the biggest hidden nightmares in
> modern out-of-order (OOO) CPU architectures - they only go fast in a
> straight line. I'd love to see a 1 GHz processor that could context
> switch in 5 cycles.
>

Most modern CPUs take thousands to tens of thousands of cycles to switch
contexts, so 5 cycles is as good as "instantly". Some of that overhead
comes from shooting down the TLB and the many layers of cache misses that
follow. You can't have separate virtual memory spaces without paying a
large switching overhead, unless you devote a lot of transistors to
massive caches. And the larger the caches, the higher their latency.
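
To put rough numbers on the gap, here is a minimal sketch in Python. It
assumes a fixed 1 GHz clock (the figure from the quoted text) and simply
converts cycle counts to wall-clock time; the function name is made up for
illustration, and the 1,000 and 10,000 cycle figures are just the two ends
of the "thousands to tens of thousands" range, not measurements.

```python
def cycles_to_ns(cycles, clock_hz):
    """Convert a cycle count to nanoseconds at the given clock rate."""
    return cycles * 1e9 / clock_hz


CLOCK_HZ = 1_000_000_000  # assumed: the hypothetical 1 GHz processor

# 5 cycles is the wished-for switch time; the other two bracket the
# "thousands to tens of thousands" range mentioned above.
for cycles in (5, 1_000, 10_000):
    ns = cycles_to_ns(cycles, CLOCK_HZ)
    print(f"{cycles:>6} cycles at 1 GHz = {ns:>7.0f} ns")
```

At that clock, 5 cycles is 5 ns, while 10,000 cycles is 10 microseconds
per switch.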

Modern PC hardware can use soft interrupts to reduce hardware interrupts
and context switching. My Intel i350 issues a steady 150 or so interrupts
per second per core regardless of the network load, while maintaining ping
times in the tens of microseconds.
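
A per-core interrupt rate like that can be checked with a rough sketch
along these lines. It assumes a Linux system that exposes
/proc/interrupts, and that the NIC's interrupt lines carry the interface
name in their description; the function names and the "eth0" in the usage
note are placeholders for illustration, not anything from the setup
described above.

```python
import time


def per_cpu_counts(text, match):
    """Sum per-CPU interrupt counts over lines whose description contains match."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = [0] * ncpu
    for line in lines[1:]:
        _, _, rest = line.partition(":")
        fields = rest.split()
        # Fields after the per-CPU counters describe the interrupt source.
        if match not in " ".join(fields[ncpu:]):
            continue
        for cpu, value in enumerate(fields[:ncpu]):
            totals[cpu] += int(value)
    return totals


def rates(before, after, seconds):
    """Interrupts per second per CPU between two snapshots."""
    return [(a - b) / seconds for b, a in zip(before, after)]


def sample_rates(match, seconds=1.0, path="/proc/interrupts"):
    """Take two snapshots `seconds` apart and return per-CPU rates."""
    with open(path) as f:
        before = per_cpu_counts(f.read(), match)
    time.sleep(seconds)
    with open(path) as f:
        after = per_cpu_counts(f.read(), match)
    return rates(before, after, seconds)
```

Calling sample_rates("eth0") on a Linux box would return one rate per CPU
for that interface's interrupt lines; comparing the result idle and under
load shows whether the rate really stays flat.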

I'm not sure what could be done with custom architectures. There will
always be some context switching overhead, but an embedded design may be
able to cache a few specific contexts, knowing that the system will rarely
have more than a few contexts doing the bulk of the work.

>
> Having 4 cores responding to interrupts masks this latency somewhat
> when having multiple sources of interrupt contending... (but see above -
> you need dedicated interrupt lines per major source of interrupts for
> it to work)
>
> and the inherent context switch latency is still always there. (cue
> Cheshire's rant)
>
> The general purpose "mainstream" processors not handling interrupts well
> anymore is one of the market drivers towards specialized co-processors.
>
> ...
>
> Knowing Broadcom, there are probably so many invasive offloads, bugs
> and errata in this new chip that 90% of the features will never be
> used. But "forwarding un-inspected, un-firewalled, packets in massive
> bulk so as to win a benchmark race", ok, happy they are trying.
>
> Maybe they'll even publish a data sheet worth reading.
>
> >
> > - Jonathan Morton
> >
> >
> >
> > _______________________________________________
> > Bloat mailing list
> > Bloat at lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/bloat
> >