[Bloat] [Cake] is anyone playing with dpdk and vpp?

Wed Apr 27 15:50:27 EDT 2016

Probably not really relevant to this thread, but this was a very good
article on scaling Linux to many cores:

https://blog.acolyer.org/2016/04/26/the-linux-scheduler-a-decade-of-wasted-cores/

I still like the idea of making single-threaded CPUs better, but only
the Mill Computer effort even comes close to trying, effectively.

On Wed, Apr 27, 2016 at 12:45 PM, Dave Taht <dave.taht at gmail.com> wrote:
> On Wed, Apr 27, 2016 at 12:32 PM, Stephen Hemminger
> <stephen at networkplumber.org> wrote:
>> DPDK gets impressive performance on large systems (like 14M packets/sec per
>> core), but I'm not convinced on smaller systems.
>
> My take on dpdk has been mostly that it's a great way to heat data
> centers. Still I would really like to see these advanced algorithms
> (cake, pie, fq_codel, htb) tested on it at these higher speeds.
>
> And I still have great hope for cheap, FPGA-assisted designs that
> could one day be turned into ASICs, but not as much as I did last year
> when I first started fiddling with the meshsr onenetswitch. I really
> wish I could find a few good EEs to tackle making something
> fq_codel-like work on the netfpga project; the proof-of-concept Verilog
> already exists for DRR and AQM technologies.
>
>> Performance depends on having good CPU cache. I get poor performance on Atom
>> etc.
>
> I had hoped that the Rangeley-class Atoms would do better on dpdk, as
> they do I/O direct to cache. I am no longer sure which processors
> actually have that feature.
>
>> Also driver support is limited (mostly 10G and above)
>
> Well, as we push end-user class devices to 1GigE, we are having issues
> with overuse of offloads to get there, and in terms of PPS, pushing
> small packets is certainly becoming a problem, on ethernet and wifi. I
> would like to see a 100 dollar router that could do full PPS at that
> speed, feeding fiber and going over 802.11ac, and we are quite far from
> there. I see, for example, that meraki is using click (I think) to push
> more processing into userspace.
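
For a sense of what "full PPS" means at these speeds, here is a rough back-of-the-envelope sketch. It assumes minimum-size 64-byte Ethernet frames plus 20 bytes of preamble and inter-frame gap on the wire; the function names are made up for illustration and are not from any tool mentioned in the thread.

```python
# On-wire overhead per Ethernet frame: 8 bytes preamble + SFD, 12 bytes inter-frame gap.
ETH_OVERHEAD_BYTES = 8 + 12


def max_pps(link_bps, frame_bytes=64):
    """Theoretical maximum packets per second on an Ethernet link."""
    bits_on_wire = (frame_bytes + ETH_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def ns_per_packet(link_bps, frame_bytes=64):
    """Time budget per packet, in nanoseconds, at line rate."""
    return 1e9 / max_pps(link_bps, frame_bytes)


for name, bps in (("1GigE", 1_000_000_000), ("10GigE", 10_000_000_000)):
    print(f"{name}: {max_pps(bps):,.0f} pps, {ns_per_packet(bps):.1f} ns per packet")
```

At 1GigE that works out to roughly 1.49 million small packets per second, or about 672 ns per packet for the whole read-to-write path, which is why per-packet overhead dominates on small boxes.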
>
> Also the time for a packet to transit linux from read to write is
> "interesting". Last I looked it was something like 42 function calls
> in the path to "get there", and some of my benchmarks on both the c2
> and apu2 are showing that that time is significant enough for fq_codel
> to start kicking in to compensate. (It is kind of cool to see the
> packet processing adapt to the cpu load, actually, and I still long
> for timestamping on rx directly to adapt even better.)
>
> I have also acquired a mild dislike for seeing stuff like this, where
> the tx and rx rings are cleaned up in the same thread and there is only
> one interrupt line for both:
>
>   51:         18      59244     253350     314273   PCI-MSI 1572865-edge      enp3s0-TxRx-0
>   52:          5     484274     141746     197260   PCI-MSI 1572866-edge      enp3s0-TxRx-1
>   53:          9     152225      29943     436749   PCI-MSI 1572867-edge      enp3s0-TxRx-2
>   54:         22      54327     299670     360356   PCI-MSI 1572868-edge      enp3s0-TxRx-3
>   56:     525343     513165    2355680     525593   PCI-MSI 2097152-edge      ath10k_pci
>
> And the ath10k only uses one interrupt. Maybe I'm wrong in my
> assumptions, but I'd think that in today's multi-core environment
> processing tx and rx separately might be a win. (?)
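
To put numbers on how those interrupts spread across cores, here is a small sketch that parses /proc/interrupts-style text. The parser and its name are illustrative only; it assumes the usual layout (IRQ number, one count per CPU, then controller, trigger, and device name) and simply takes the last field as the name. The sample is two rows copied from the table above.

```python
def parse_interrupts(text):
    """Map each IRQ's device name to its list of per-CPU interrupt counts."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the CPU header row and anything that does not start with "NN:".
        if len(fields) < 3 or not fields[0].endswith(":"):
            continue
        per_cpu = []
        for field in fields[1:]:
            if not field.isdigit():
                break  # reached the controller name, e.g. "PCI-MSI"
            per_cpu.append(int(field))
        if per_cpu:
            counts[fields[-1]] = per_cpu
    return counts


SAMPLE = """\
  51:         18      59244     253350     314273   PCI-MSI 1572865-edge      enp3s0-TxRx-0
  56:     525343     513165    2355680     525593   PCI-MSI 2097152-edge      ath10k_pci
"""

for name, per_cpu in parse_interrupts(SAMPLE).items():
    total = sum(per_cpu)
    busiest = max(range(len(per_cpu)), key=per_cpu.__getitem__)
    share = per_cpu[busiest] / total
    print(f"{name}: total={total} busiest_cpu={busiest} share={share:.0%}")
```

On a live system the same function can be fed the contents of /proc/interrupts directly.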
>
> I keep hoping for on-board assist for routing table lookups on
> something, your classic CAM, for example. I saw today that there has
> been some work on getting source-specific routing into dpdk, which
> makes me happy:
>
> https://www.ietf.org/proceedings/95/slides/slides-95-hackathon-18.pdf
>
> which is, incidentally, where I found the reference to the vpp stuff.
>
> https://www.ietf.org/blog/author/jari/
>
>
>>
>> On Wed, Apr 27, 2016 at 12:28 PM, Aaron Wood <woody77 at gmail.com> wrote:
>>>
>>> I'm looking at DPDK for a project, but I think I can make substantial
>>> gains with just AF_PACKET + FANOUT and SO_REUSEPORT.  It's not clear to me
>>> yet how much DPDK is going to gain over those (and those can go a long way
>>> on higher-powered platforms).
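
For anyone who has not used it, a minimal sketch of the SO_REUSEPORT half of that idea: several UDP sockets bound to the same port, so that on Linux (3.9 and later) the kernel spreads incoming datagrams across them and each worker can own one socket. The helper name and the loopback address are assumptions for illustration; AF_PACKET fanout is not shown because it needs root.

```python
import socket


def open_reuseport_sockets(count, host="127.0.0.1", port=0):
    """Bind `count` UDP sockets to one address using SO_REUSEPORT."""
    socks = []
    for _ in range(count):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        # Must be set on every socket before bind() for the sharing to work.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        s.bind((host, port))
        # With port=0 the first bind picks a free port; reuse it for the rest.
        port = s.getsockname()[1]
        socks.append(s)
    return socks


workers = open_reuseport_sockets(4)
print("all workers bound to port", workers[0].getsockname()[1])
for s in workers:
    s.close()
```

In a real design each socket would be handed to its own thread or process, ideally pinned to a core.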
>>>
>>> On lower-end systems, I'm more suspicious of the memory bus (and the cache
>>> in particular) than I am of the raw CPU power.
>>>
>>> -Aaron
>>>
>>> On Wed, Apr 27, 2016 at 11:57 AM, Dave Taht <dave.taht at gmail.com> wrote:
>>>>
>>>> https://fd.io/technology seems to have come a long way.
>>>>
>>>> --
>>>> Dave Täht
>>>> Let's go make home routers and wifi faster! With better software!
>>>> http://blog.cerowrt.org
>>>
>>>
>>>
>>>
>>
>
>
>

-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org