On Tue, Jan 5, 2016 at 12:57 PM, Dave Täht <dave@taht.net> wrote:


On 1/5/16 10:27 AM, Jonathan Morton wrote:
> Undoubtedly.  But that beefy quad-core CPU should be able to handle it
> without them.

Sigh. It's not just the CPU that matters. Context switch time, memory
bus and I/O bus architecture, the intelligence or lack thereof of the
network interface, and so on.

To give a real-world case of stupidity in a hardware design - the Armada
385 in the Linksys platform connects tx and rx packet-related interrupts
to a single interrupt line, requiring that tx and rx ring buffer cleanup
(in particular) be executed on a single CPU, *at the same time, in a
dedicated thread*.

Saving a single pin (which doesn't even exist off chip) serializes
tx and rx processing. DUMB. (I otherwise quite like much of the Marvell
ethernet design and am looking forward to the Turris Omnia very much.)

...

Context switch time is probably one of the biggest hidden nightmares in
modern out-of-order (OOO) CPU architectures - they only go fast in a
straight line. I'd love to see a 1 GHz processor that could context
switch in 5 cycles.

Given that most modern CPUs take thousands to tens of thousands of cycles to switch, 5 cycles is effectively "instantly". Some of that overhead comes from TLB shootdowns and misses across many layers of cache. You can't have separate virtual memory spaces without paying a large switching cost, unless you devote a lot of transistors to massive caches. And the larger the caches, the higher their latency.
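
To put rough numbers on that, here is a small sketch that converts cycle counts to wall-clock time. The 1 GHz clock is the hypothetical part mentioned above, and the cycle counts are only the ballpark figures quoted in this thread, not measurements.

```python
def cycles_to_ns(cycles: float, clock_hz: float) -> float:
    """Convert a cycle count to nanoseconds at the given clock rate."""
    return cycles * 1e9 / clock_hz


def switches_per_second(cycles_per_switch: float, clock_hz: float) -> float:
    """Upper bound on switches per second if a core did nothing else."""
    return clock_hz / cycles_per_switch


CLOCK_HZ = 1e9  # assumed: the hypothetical 1 GHz processor

for cycles in (5, 1_000, 10_000):  # ballpark figures from the discussion
    ns = cycles_to_ns(cycles, CLOCK_HZ)
    limit = switches_per_second(cycles, CLOCK_HZ)
    print(f"{cycles:>6} cycles = {ns:>6.0f} ns per switch, "
          f"at most {limit:,.0f} switches/s")
```

At 1 GHz, 5 cycles is 5 ns, while 1,000 to 10,000 cycles is 1 to 10 microseconds per switch, which is already in the same range as the ping times mentioned below.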

Modern PC hardware can use soft interrupts to reduce hardware interrupts and context switching. My Intel i350 issues a steady 150 or so interrupts per second per core regardless of network load, while maintaining ping times in the tens of microseconds.
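
For anyone who wants to check the interrupt rate on their own Linux box, a minimal sketch follows. It samples /proc/interrupts twice and reports the per-CPU rate for rows matching a name. The interface name "eth0" and the function names are assumptions for illustration; the row labels depend on the driver.

```python
import time


def per_cpu_counts(proc_interrupts_text: str, match: str) -> list:
    """Sum per-CPU interrupt counts over rows whose text contains `match`."""
    lines = proc_interrupts_text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = [0] * ncpu
    for line in lines[1:]:
        if match not in line:
            continue
        counts = line.split()[1:1 + ncpu]
        # Summary rows such as ERR: or MIS: carry fewer columns; skip them.
        if len(counts) < ncpu or not all(c.isdigit() for c in counts):
            continue
        for cpu, count in enumerate(counts):
            totals[cpu] += int(count)
    return totals


def per_cpu_rate(before: list, after: list, interval_s: float) -> list:
    """Interrupts per second per CPU between two samples."""
    return [(a - b) / interval_s for b, a in zip(before, after)]


def sample(match: str = "eth0", interval_s: float = 1.0) -> list:
    """Read /proc/interrupts twice, interval_s apart (Linux only)."""
    with open("/proc/interrupts") as f:
        before = per_cpu_counts(f.read(), match)
    time.sleep(interval_s)
    with open("/proc/interrupts") as f:
        after = per_cpu_counts(f.read(), match)
    return per_cpu_rate(before, after, interval_s)
```

Calling sample("eth0") under different traffic loads would show whether the rate stays flat, as described above, or scales with packet rate.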

I'm not sure what could be done with custom architectures. Context switching overhead will always be an issue, but they may be able to cache a few specific contexts, knowing that an embedded system will rarely have more than a few contexts doing the bulk of the work.
 

Having 4 cores responding to interrupts masks this latency somewhat
when multiple interrupt sources are contending... (but see above -
you need dedicated interrupt lines per major source of interrupts for
it to work)

and the inherent context switch latency is still always there. (sound
Cheshire's rant)
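
On the point about spreading interrupt sources across cores: on Linux, each IRQ with its own line can be pinned to a set of CPUs by writing a hex bitmask to /proc/irq/<n>/smp_affinity. The sketch below builds that mask. The helper names and any IRQ numbers passed in are hypothetical, and it assumes fewer than 32 CPUs so the mask fits in one hex group. It does nothing for hardware that shares a single line for tx and rx, as described above.

```python
def cpu_mask(cpus) -> str:
    """Return the hex bitmask string for a collection of CPU indices."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError(f"invalid CPU index: {cpu}")
        mask |= 1 << cpu
    return format(mask, "x")


def set_irq_affinity(irq: int, cpus) -> None:
    """Pin an IRQ to the given CPUs (needs root; IRQ number is hypothetical)."""
    with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
        f.write(cpu_mask(cpus))
```

For example, cpu_mask([1, 3]) gives "a", which selects CPUs 1 and 3, so separate tx and rx IRQs could each be given their own core.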

General-purpose "mainstream" processors no longer handling interrupts
well is one of the market drivers towards specialized co-processors.

...

Knowing Broadcom, there are probably so many invasive offloads, bugs
and errata in this new chip that 90% of the features will never be
used. But "forwarding un-inspected, un-firewalled packets in massive
bulk so as to win a benchmark race", ok, happy they are trying.

Maybe they'll even publish a data sheet worth reading.

>
> - Jonathan Morton
>
>
>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat
>