[Bloat] Thanks to developers / htb+fq_codel ISP shaper

Toke Høiland-Jørgensen toke at toke.dk
Thu Jan 21 06:14:31 EST 2021


Robert Chacon <robert.chacon at jackrabbitwireless.com> writes:

> Toke,
>
> Thank you very much for pointing me in the right direction.
> I am having some fun in the lab tinkering with the 'mq' qdisc and Jesper's
> xdp-cpumap-tc.
> It seems I will need to use iptables or nftables to filter packets to
> corresponding queues, since mq apparently cannot have u32 filters on its
> root.
> I will try to familiarize myself with iptables and nftables, and hopefully
> get it working soon and report back. Thank you!

Cool - adding in Jesper, maybe he has some input on this :)

-Toke
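
For readers following along, here is a minimal sketch of the layout being discussed: an 'mq' root qdisc, one independent HTB tree per hardware queue, and each customer pinned to exactly one queue with an fq_codel leaf. It only generates the tc command strings and does not run them. The Customer type, the queue_for() and build_tc_commands() helpers, the device name, the 7fff: root handle, the modulo-based queue assignment and the example addresses and rates are all assumptions for illustration; none of them is taken from LibreQoS or xdp-cpumap-tc. Steering packets into the matching queue and class (via XDP, nftables or something else) is a separate step and is not shown.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    ip: str         # customer address (hypothetical example data)
    rate_mbit: int  # allocated bandwidth in Mbit/s


def queue_for(ip: str, num_queues: int) -> int:
    """Pin a customer to one queue (1-based) so all of its traffic
    is shaped by a single HTB tree. Works for IPv4 and IPv6."""
    return int(ipaddress.ip_address(ip)) % num_queues + 1


def build_tc_commands(dev: str, num_queues: int,
                      customers: list[Customer]) -> list[str]:
    """Return tc commands for an mq root with one HTB tree per queue."""
    cmds = [f"tc qdisc replace dev {dev} root handle 7fff: mq"]

    # One HTB qdisc under each mq child class. tc handles are hexadecimal.
    for q in range(1, num_queues + 1):
        cmds.append(
            f"tc qdisc add dev {dev} parent 7fff:{q:x} handle {q:x}: htb"
        )

    # Per-queue counter for class minor numbers, starting at 0x10.
    next_minor = {q: 0x10 for q in range(1, num_queues + 1)}

    for c in customers:
        q = queue_for(c.ip, num_queues)
        classid = f"{q:x}:{next_minor[q]:x}"
        next_minor[q] += 1
        # HTB class caps the customer; fq_codel leaf manages the queue.
        cmds.append(
            f"tc class add dev {dev} parent {q:x}: classid {classid} "
            f"htb rate {c.rate_mbit}mbit ceil {c.rate_mbit}mbit"
        )
        cmds.append(f"tc qdisc add dev {dev} parent {classid} fq_codel")
    return cmds


if __name__ == "__main__":
    demo = [Customer("100.64.0.1", 25), Customer("100.64.0.2", 50)]
    for line in build_tc_commands("eth0", 2, demo):
        print(line)
```

Because a customer never spans two queues, each HTB tree can enforce its rates without coordinating with the trees on other queues. The trade-off is that the split is static: an uneven mix of heavy customers can still overload one queue.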


> On Fri, Jan 15, 2021 at 5:30 AM Toke Høiland-Jørgensen <toke at toke.dk> wrote:
>
>> Robert Chacon <robert.chacon at jackrabbitwireless.com> writes:
>>
>> >> Cool! What kind of performance are you seeing? The README mentions being
>> >> limited by the BPF hash table size, but can you actually shape 2000
>> >> customers on one machine? On what kind of hardware and at what rate(s)?
>> >
>> > On our production network our peak throughput is 1.5Gbps from 200
>> > clients, and it works very well.
>> > We use a simple consumer-class AMD 2700X CPU in production because
>> > utilization of the shaper VM is ~15% at 1.5Gbps load.
>> > Customers get reliably capped within ±2Mbps of their allocated
>> > htb/fq_codel bandwidth, which is very helpful to control network
>> > congestion.
>> >
>> > Here are some graphs from RRUL performed on our test bench hypervisor:
>> >
>> > https://raw.githubusercontent.com/rchac/LibreQoS/main/docs/fq_codel_1000_subs_4G.png
>> > In that example, bandwidth for the "subscriber" client VM was set to
>> > 4Gbps.
>> > 1000 IPv4 IPs and 1000 IPv6 IPs were in the filter hash table of
>> > LibreQoS.
>> > The test bench server has an AMD 3900X running Ubuntu in Proxmox. 4Gbps
>> > utilizes 10% of the VM's 12 cores. Paravirtualized VirtIO network drivers
>> > are used and most offloading types are enabled.
>> > In our setup, VM networking multiqueue isn't enabled (it kept disrupting
>> > traffic flow), so 6Gbps is probably the most it can achieve like this.
>> > Our qdiscs in this VM may be limited to one core because of that.
>>
>> I suspect the issue you had with multiqueue is that it requires per-CPU
>> partitioning on a per-customer basis to work well. This is possible to do
>> with XDP, as Jesper demonstrates here:
>>
>> https://github.com/netoptimizer/xdp-cpumap-tc
>>
>> With this it should be possible to scale the hardware queues across
>> multiple CPUs properly, and you should be able to go to much higher
>> rates by just throwing more CPU cores at it. At least on bare metal; not
>> sure if the VM virt-drivers have the needed support yet...
>>
>> -Toke
>>
>
>
> -- 
> *Robert Chacón* Owner
> *M* (915) 730-1472
> *E* robert.chacon at jackrabbitwireless.com
> *JackRabbit Wireless LLC*
> P.O. Box 222111
> El Paso, TX 79913
> *jackrabbitwireless.com* <http://jackrabbitwireless.com>

