[Cake] profiling using perf

Adrian Popescu adriannnpopescu at gmail.com
Mon Mar 11 10:49:37 EDT 2019


Hello,

On Sat, Mar 9, 2019 at 6:03 PM Toke Høiland-Jørgensen <toke at redhat.com>
wrote:

> Georgios Amanakis <gamanakis at gmail.com> writes:
>
> > Dear List,
> >
> > I made an effort to profile the performance of cake with perf in
> > openwrt. perf was run on a WRT1900ACS router while downloading
> > archlinux.iso via torrent in a LAN client. You can find the annotated
> > sch_cake.c in the attachment as well as a performance histogram of
> > sch_cake (percentages are relative to sch_cake). Hopefully people can
> > take a look at it, and see if there are performance concerns.
>
> Hmm, nothing immediately jumps out as low-hanging fruit to be harvested.
> It's not too surprising the 200+-line cake_dequeue() is where most time
> is spent, since that is where the bulk of the algorithm is implemented.
>
> And, well, there's nothing in there that can obviously be removed unless
> we want to drop features. I guess one could try to make it possible to
> disable features at compile time; but that carries quite a bit of
> complexity with it (for one, it needs testing with the combinatorial
> explosion of possible configurations), so don't think it's realistic.
> The only exception *might* be a compile time option to turn off those
> stats that are not needed for the algorithm to run...
>

The algorithm itself has probably been optimized over the years. It
might be a good idea to think of other ways to perform some
operations and simplify the algorithm. The code is probably not that
slow on a high-end CPU such as a Core i5 or anything faster.

The problem with the current implementation is that it can't
saturate a gigabit connection even on dual-core ARM routers clocked
above 1.2 GHz. Routers for home users are probably going to rely on
hardware offloads to saturate gigabit connections for a long time.
This doesn't mean cake is poorly optimized or poorly implemented.
It's just not a good fit for small embedded systems with small CPU
caches.
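
For a rough sense of scale, here is a back-of-the-envelope sketch of
the per-packet cycle budget at gigabit line rate. The numbers are
assumptions for illustration, not measurements: a single core doing
all the work, standard Ethernet framing overhead, and function names
made up for this example.

```python
# Back-of-the-envelope cycle budget per packet at line rate.
# Assumptions (illustrative, not measured): one core does all the work,
# and each Ethernet frame costs 38 extra bytes on the wire
# (14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap).

ETHERNET_OVERHEAD_BYTES = 14 + 4 + 8 + 12


def packets_per_second(link_bps, l2_payload_bytes):
    """Packets per second needed to fill the link with frames of this size."""
    wire_bits = (l2_payload_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_bps / wire_bits


def cycles_per_packet(cpu_hz, link_bps, l2_payload_bytes):
    """CPU cycles available per packet if one core must keep up with the link."""
    return cpu_hz / packets_per_second(link_bps, l2_payload_bytes)


if __name__ == "__main__":
    for payload in (1500, 46):  # full-size and minimum-size frames
        pps = packets_per_second(1e9, payload)
        budget = cycles_per_packet(1.2e9, 1e9, payload)
        print(f"{payload:5d}-byte payload: {pps:10.0f} pps, "
              f"{budget:8.0f} cycles/packet at 1.2 GHz")
```

With full-size frames that is roughly 81,000 packets per second and
about 14,800 cycles per packet for everything (driver, network stack
and qdisc) on one 1.2 GHz core. With minimum-size frames the budget
drops to about 800 cycles, where a few cache misses to DRAM could
plausibly consume a large part of it.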

Different data structures might help improve performance.

This is why I've run a bunch of tests over the last few weeks. My
conclusion is that the current version of cake can't deal with more
than 100 Mbps on ar71xx. mt7621 seems to go up to about 200 Mbps.

I was thinking of a few things to try:
- disable some stats and profile
- lower the number of queues from 1024 to 256
- look into profiling to figure out what's causing cache misses
- disable some features and profile again
- set up a lab for all this testing
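
On the queue count item, one cost of fewer queues is that more flows
end up sharing a queue. Below is a rough sketch of that trade-off. It
assumes a plain uniform hash with no collision avoidance; cake's
set-associative hashing should do better than this, so treat the
numbers as pessimistic. The function name and the flow counts are
made up for illustration.

```python
def shared_queue_probability(flows, queues):
    """Chance that a given flow shares its queue with at least one other
    flow, assuming every flow is hashed uniformly and independently."""
    if flows < 1 or queues < 1:
        raise ValueError("flows and queues must be positive")
    # Each of the other (flows - 1) flows misses our queue with
    # probability (1 - 1/queues); subtract the all-miss case from 1.
    return 1.0 - (1.0 - 1.0 / queues) ** (flows - 1)


if __name__ == "__main__":
    for queues in (1024, 256):
        for flows in (16, 64, 256):  # hypothetical active flow counts
            p = shared_queue_probability(flows, queues)
            print(f"{queues:4d} queues, {flows:3d} flows: "
                  f"{100 * p:5.1f}% chance of sharing a queue")
```

Whether the smaller table saves enough cache to be worth the extra
collisions is something only profiling on the actual hardware can
answer.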

It's hard to find the time to do all of this. There's a lot to learn
in the process.


>
> -Toke