[Cake] Using cake to shape 1000’s of users.
Jonathan Morton
chromatix99 at gmail.com
Sat Jul 28 04:06:17 EDT 2018
> There are some older backhaul routers still with 2.6.26.8(!) although those are being phased out so don’t count them. More current ones use 3.16.7 and there’s some discussion but I’m not sure what/when the upgrade plan is. I think the Internet router uses a more modern Debian 9 which is likely to have a 4.9 series, but I don’t have access to it.
>
> FreeNet might be unusual in that administration is distributed. Volunteer members administer backhaul routers which carry some number of customers, average a few dozen or so. These do routing/firewalling/monitoring and QoS, when they’re a bottleneck. They use either an ancient esfq (has per-IP fairness though) on older kernels and sfq when it’s not available, which is now more common. There are arguments (heated ones, I’ve heard) about centralizing administrative functions, including QoS, at or near the Internet gateway, but I think this would have to depend on over-provisioning the backhaul. I’ve volunteered to modernize the QoS when I can. If ISP flavored Cake doesn’t happen, it will end up being HTB+sfq|fq_codel|cake depending on what the kernel supports.
This sounds like a relatively complex network topology, in which there are a lot of different potential bottlenecks, depending on the dynamic state of the network. But it's encouraging to hear that you have *some* sort of solution in place, even if it's using rather old techniques at present.
I think we should treat wired and wireless backhaul separately here. By wireless I specifically mean shared-medium links which are, by design, half-duplex.
>> If all users will have the same link-layer technology (with the same overhead parameters), then these can be set globally - or if not, they can be set per-tier.
>
> Afaik almost all members use WiFi, possibly some Ethernet. There are no tiers and no per-member rate limits.
Okay, so straight away we're in a significantly different regime from CoverFire's situation, where the subscribers each have a defined link bandwidth (which may or may not be related to the capabilities of the underlying physical link). You simply need to share some backhaul link fairly between subscribers using it at any given moment.
I haven't yet considered whether this capability might fall naturally out of the implementation of a shaped-subscriber model set to infinite rate, or some other sane default. If not, a separate algorithm will be required for that case. But it's useful to have it on the radar.
>> Is the Diffserv support from Cake likely to be useful, and if so how flexible should the configuration be?
>
> Currently the root qdisc on backhaul routers is a prio with "bands 3 priomap 2 2 2 2 2 2 0 0 2 2 2 2 2 2 2 2". I don’t think we’d want DSCP for anything, with the possible exception of VoIP, and even then there might just be a special IP/port rule for the Asterisk server.
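For readers unfamiliar with the prio qdisc, the quoted priomap can be decoded as follows. This is only an illustrative sketch: the helper names are hypothetical, and the priority names are the TC_PRIO_* constants from linux/pkt_sched.h. When no filter classifies a packet, prio indexes the priomap with the low four bits of the packet's priority, and band 0 is dequeued first.

```python
# Linux packet priorities (TC_PRIO_* in <linux/pkt_sched.h>); other values are unnamed.
TC_PRIO_NAMES = {
    0: "besteffort",
    1: "filler",
    2: "bulk",
    4: "interactive-bulk",
    6: "interactive",
    7: "control",
}


def parse_priomap(spec):
    """Parse a 'bands N priomap ...' string into (bands, list of 16 band numbers)."""
    tokens = spec.split()
    bands = int(tokens[tokens.index("bands") + 1])
    start = tokens.index("priomap") + 1
    priomap = [int(t) for t in tokens[start:start + 16]]
    if len(priomap) != 16:
        raise ValueError("priomap needs exactly 16 entries")
    if any(b < 0 or b >= bands for b in priomap):
        raise ValueError("priomap refers to a band that does not exist")
    return bands, priomap


def priorities_in_band(priomap, band):
    """Return the priority values (0..15) that the priomap sends to a band."""
    return [prio for prio, b in enumerate(priomap) if b == band]


SPEC = "bands 3 priomap 2 2 2 2 2 2 0 0 2 2 2 2 2 2 2 2"
bands, priomap = parse_priomap(SPEC)
for band in range(bands):
    names = [TC_PRIO_NAMES.get(p, str(p)) for p in priorities_in_band(priomap, band)]
    print(f"band {band}: {names}")
```

Read this way, only the "interactive" and "control" priorities reach band 0, band 1 receives nothing from the priomap (so it is only reachable through filters), and everything else lands in band 2.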
Let's call this one a vote for "diffserv not required", since DRR++ copes well by prioritising sparse traffic.
>> And are there only a few discrete settings for bandwidth per user, or do we have to be more flexible to handle a BRAS environment?
>
> Mainly in this case only fairness between members is needed. There’s a db mapping members to their MAC addresses (usually one to one but not always).
Could you convert that DB to eBPF rules? This would let you use the same configuration interface as CoverFire's situation.
>> Is it also necessary to account per-user traffic accurately, or will an external tool be used for that?
>
> It would be better than what is done now (counting per-IP, which when a customer has multiple MACs makes it harder to interpret).
It appears that this can also be done within eBPF. Don't ask me the details of how; I haven't yet looked into what eBPF is actually capable of.
> Lastly, do you think a better shaper for point-to-point WiFi is within the scope of this project? This might be more needed for WISPs, but here’s why:
>
> - If we still have bottlenecks in the backhaul, we’ll need to keep doing QoS there, and for FreeNet that means soft rate limiting, because the WiFi devices have to keep running airOS for its management tools.
> - If you only rate limit on egress, you lose at least half your available throughput.
> - You can run egress and ingress through a common IFB, but in my testing, TCP RTT (rrul_be with --socket-stats) gets higher.
> - I find it works better to use HTB at the root and have HTB+cake as leaf queues, then TCP RTT is roughly cut in half.
> - But it would be even better if the shaper understood an approximation of airtime, because aggregate throughput changes based on the balance of up/down traffic in the case where up/down rates are stable but asymmetric, which FreeNet sometimes has.
> - And, I’d rather not have to use HTB, and be able to use the better deficit mode shaper in Cake.
>
> I know that WiFi has many complexities where soft rate limiting can’t ever be perfect without knowledge from the driver, but I still think it could be useful to have something that does a better approximation than what we have today. The question is, is that something that could be part of this project, or not… :)
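The airtime point in the list above can be made concrete with a rough model. This is a sketch under stated assumptions, not something taken from Cake or airOS: it assumes an ideal half-duplex link where each direction has a stable rate, airtime is the only shared resource, and per-packet overheads are ignored. The function names and the `margin` parameter are hypothetical.

```python
def aggregate_capacity(up_rate, down_rate, down_share):
    """Aggregate throughput of an idealised half-duplex link.

    up_rate / down_rate: throughput if the link carried only that direction.
    down_share: fraction of all bytes that travel in the down direction (0..1).
    """
    if up_rate <= 0 or down_rate <= 0:
        raise ValueError("rates must be positive")
    if not 0.0 <= down_share <= 1.0:
        raise ValueError("down_share must be between 0 and 1")
    # Airtime needed to move one unit of traffic with this up/down mix.
    airtime_per_unit = down_share / down_rate + (1.0 - down_share) / up_rate
    return 1.0 / airtime_per_unit


def shaper_rates(up_rate, down_rate, down_share, margin=0.9):
    """Split the aggregate capacity into (up, down) soft rate limits.

    margin is an assumed safety factor that keeps the shaper below the link rate.
    """
    total = margin * aggregate_capacity(up_rate, down_rate, down_share)
    return total * (1.0 - down_share), total * down_share


# Illustrative numbers only: 20 Mbit/s up, 60 Mbit/s down.
for share in (0.0, 0.5, 1.0):
    print(share, aggregate_capacity(20.0, 60.0, share))
```

With these illustrative rates, the aggregate is 20 Mbit/s for pure upload, 60 Mbit/s for pure download, and 30 Mbit/s for an even byte split, which is why a single fixed rate limit is either too loose or too tight as the traffic mix changes.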
In general, I think the make-wifi-fast team has a better handle on the subtleties of half-duplex links. If there's any way to get their work into the airOS devices, so much the better.
Otherwise, your problem basically boils down to choosing a suitable rate limit. You can update that limit in Cake's configuration dynamically from userspace (tc qdisc change…) without losing any packets or fairness state. Since the out-of-tree version of "normal" Cake already works with relatively old kernels, this seems like a good way to deploy a worthwhile improvement. If the traffic load on backhaul links is complex enough (more than several hundred *simultaneous* flows) to overstress Cake's flow-isolation capabilities, you might try using its "src-host" and "dst-host" modes instead of "flows" or "dual-src/dst".
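A minimal sketch of the userspace side, for illustration only. It assumes Cake is already installed as the root qdisc on the interface (a leaf under HTB would need the matching parent/handle instead of "root"), that the caller has some way of estimating a suitable rate, and that the interface name is just an example. The function names are hypothetical. Note that tc spells the isolation keywords "srchost" and "dsthost".

```python
import subprocess

# Flow-isolation keywords accepted by tc for cake.
ISOLATION_MODES = {
    "flowblind", "srchost", "dsthost", "hosts",
    "flows", "dual-srchost", "dual-dsthost", "triple-isolate",
}


def cake_change_cmd(dev, rate_kbit, isolation=None):
    """Build the tc command that changes the rate of an existing root cake qdisc."""
    if rate_kbit <= 0:
        raise ValueError("rate must be positive")
    cmd = ["tc", "qdisc", "change", "dev", dev, "root", "cake",
           "bandwidth", f"{int(rate_kbit)}kbit"]
    if isolation is not None:
        if isolation not in ISOLATION_MODES:
            raise ValueError(f"unknown isolation mode: {isolation}")
        cmd.append(isolation)
    return cmd


def apply_cake_rate(dev, rate_kbit, isolation=None):
    """Run the command; needs root privileges and a kernel with cake available."""
    subprocess.run(cake_change_cmd(dev, rate_kbit, isolation), check=True)


# Only print the command here; "eth0" and 50000 kbit are placeholder values.
print(" ".join(cake_change_cmd("eth0", 50000, "dsthost")))
```

A monitoring loop could call apply_cake_rate() whenever its estimate of the link rate changes; how that estimate is obtained is outside the scope of this sketch.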
But I think there's also a use for "ISP-type" Cake in your network, especially in the wired backhauls where link bandwidth is relatively predictable, and I suspect most of these will be in the core parts of your network where the traffic is most complex. As long as you can get around to running a suitable kernel version, there should also be no inherent problem with replicating the same dynamic adjustment of global shaper rate as "normal" Cake has, which will allow it to be used on some of your wireless links as well.
- Jonathan Morton