Depends on how short a timescale you're talking about. Shared global state that is read and written at a high rate by multiple threads is bad enough on a single-package system, but once you get to something like an AMD Ryzen or a NUMA system, shared global state becomes really expensive. Accuracy is expensive: loosen the accuracy and you gain scalability.
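To make the "loosen the accuracy" trade-off concrete, here is a minimal sketch, assuming the shared state is just a global byte counter. Each core accumulates bytes locally and publishes to the shared counter only once a batch threshold (the hypothetical `batch_bytes` parameter) is reached. The global value can then lag the true total by less than cores × batch_bytes, but it is written far less often. `SharedCounter`, `LocalBatcher`, and `run` are illustrative names, not anything in CAKE. Python threads will not reproduce the cache-line contention of a real Ryzen or NUMA box; the sketch only shows the bookkeeping.

```python
import threading

class SharedCounter:
    """Hypothetical global byte counter; every add() writes shared state under a lock."""
    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0
        self.updates = 0  # number of times the shared state was written

    def add(self, n):
        with self._lock:
            self.value += n
            self.updates += 1

class LocalBatcher:
    """Hypothetical per-core front end: accumulates locally and only publishes
    to the shared counter once `batch_bytes` have built up."""
    def __init__(self, shared, batch_bytes):
        self.shared = shared
        self.batch_bytes = batch_bytes
        self.pending = 0

    def add(self, n):
        self.pending += n
        if self.pending >= self.batch_bytes:
            self.shared.add(self.pending)
            self.pending = 0

    def flush(self):
        # publish whatever is left so the final total is exact
        if self.pending:
            self.shared.add(self.pending)
            self.pending = 0

def run(batch_bytes, cores=4, packets=1000, size=1500):
    """Simulate `cores` threads each counting `packets` packets of `size` bytes."""
    shared = SharedCounter()

    def worker():
        local = LocalBatcher(shared, batch_bytes)
        for _ in range(packets):
            local.add(size)
        local.flush()

    threads = [threading.Thread(target=worker) for _ in range(cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared.value, shared.updates

print(run(1))      # (6000000, 4000): every packet touches shared state
print(run(65536))  # (6000000, 92): same total, ~2% of the shared writes
```

The final totals match; what changes is how many times the shared cache line would be written, and how stale a mid-run read of `value` could be.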

I would be interested in the pseudo-code, or a high-level description, of what state needs to be shared and how that state is used.

I was also thinking more of some kind of hybrid. Instead of a "token" representing a bucketed amount of bandwidth that can be used immediately, I was thinking of something more like a "future" of bandwidth that could be used. So instead of saying "here's a token of bandwidth", each core does its own deficit-mode bandwidth shaping, and when a core receives a token, it can temporarily increase its assigned shaping bandwidth. If I remember correctly, CAKE already supports having its bandwidth changed on the fly.
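The hybrid idea above can be sketched roughly as follows, purely as an illustration under stated assumptions. Each core runs its own simplified deficit-mode shaper, modeled here as a per-packet "next send time" advanced by the packet's transmission time at the current rate, so idle time is never banked as burst credit the way a token bucket would bank it. A received grant (the hypothetical `Grant` type, standing in for a "future" of bandwidth) raises that core's rate until the grant expires. `CoreShaper`, `Grant`, `packets_in_window`, and the nanosecond bookkeeping are all assumptions for illustration, not CAKE's internals or API.

```python
from dataclasses import dataclass, field

@dataclass
class Grant:
    """Hypothetical 'future' of bandwidth: extra rate valid until expires_ns."""
    extra_bps: int
    expires_ns: int

@dataclass
class CoreShaper:
    """Hypothetical per-core shaper; a simplified stand-in, not CAKE's code."""
    base_bps: int
    next_send_ns: int = 0
    grants: list = field(default_factory=list)

    def rate(self, now_ns):
        # drop expired grants, then add the live ones on top of the base rate
        self.grants = [g for g in self.grants if g.expires_ns > now_ns]
        return self.base_bps + sum(g.extra_bps for g in self.grants)

    def receive_grant(self, grant):
        self.grants.append(grant)

    def try_send(self, now_ns, size_bytes):
        """Send if allowed at now_ns; the next slot is computed from now_ns,
        so time spent idle is not saved up as burst credit."""
        if now_ns < self.next_send_ns:
            return False
        tx_ns = size_bytes * 8 * 1_000_000_000 // self.rate(now_ns)
        self.next_send_ns = now_ns + tx_ns
        return True

def packets_in_window(shaper, start_ns, end_ns, size_bytes=1500):
    """Event-driven loop: send whenever the shaper allows, until end_ns."""
    now, sent = start_ns, 0
    while True:
        now = max(now, shaper.next_send_ns)
        if now >= end_ns:
            return sent
        if shaper.try_send(now, size_bytes):
            sent += 1

core = CoreShaper(base_bps=12_000_000)              # 1500 B takes 1 ms
print(packets_in_window(core, 0, 10_000_000))       # 10
core.receive_grant(Grant(extra_bps=12_000_000, expires_ns=20_000_000))
print(packets_in_window(core, 10_000_000, 20_000_000))  # 20 while the grant is live
print(packets_in_window(core, 20_000_000, 30_000_000))  # 10 after it expires
```

Because the grant only changes the rate used to compute the next send time, the shared state is touched once per grant rather than once per packet, which is the scalability point made above.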

Of course, it may be simpler to say that CAKE is meant to be used on systems with no more than 8 cores and a non-NUMA CPU, where all cores share a low-latency cache.

On Mon, Mar 6, 2017 at 8:44 AM, Jonathan Morton <chromatix99@gmail.com> wrote:

> On 6 Mar, 2017, at 15:30, Benjamin Cronce <bcronce@gmail.com> wrote:
>
> You could treat it like task stealing, except each core can generate tokens that represent a quantum of bandwidth that is only valid for some interval.

You’re obviously thinking of a token-bucket based shaper here. CAKE uses a deficit-mode shaper which deliberately works a different way - it’s more accurate on short timescales, and this actually makes a positive difference in several important cases.

The good news is that there probably is a way to explicitly and efficiently share bandwidth in any desired ratio across different CAKE instances, assuming a shared-memory location can be established. I don’t presently have the mental bandwidth to actually try doing that, though.

- Jonathan Morton