Toke,

Thank you very much for pointing me in the right direction. I am having
some fun in the lab tinkering with the 'mq' qdisc and Jesper's
xdp-cpumap-tc. It seems I will need to use iptables or nftables to
filter packets into the corresponding queues, since mq apparently
cannot have u32 filters on its root. I will try to familiarize myself
with iptables and nftables, and hopefully get it working soon and
report back.

Thank you!

On Fri, Jan 15, 2021 at 5:30 AM Toke Høiland-Jørgensen wrote:

> Robert Chacon writes:
>
> >> Cool! What kind of performance are you seeing? The README mentions
> >> being limited by the BPF hash table size, but can you actually
> >> shape 2000 customers on one machine? On what kind of hardware and
> >> at what rate(s)?
> >
> > On our production network our peak throughput is 1.5Gbps from 200
> > clients, and it works very well.
> > We use a simple consumer-class AMD 2700X CPU in production because
> > utilization of the shaper VM is ~15% at 1.5Gbps load.
> > Customers get reliably capped within ±2Mbps of their allocated
> > htb/fq_codel bandwidth, which is very helpful to control network
> > congestion.
> >
> > Here are some graphs from RRUL performed on our test bench
> > hypervisor:
> > https://raw.githubusercontent.com/rchac/LibreQoS/main/docs/fq_codel_1000_subs_4G.png
> > In that example, bandwidth for the "subscriber" client VM was set
> > to 4Gbps. 1000 IPv4 IPs and 1000 IPv6 IPs were in the filter hash
> > table of LibreQoS.
> > The test bench server has an AMD 3900X running Ubuntu in Proxmox.
> > 4Gbps utilizes 10% of the VM's 12 cores. Paravirtualized VirtIO
> > network drivers are used and most offloading types are enabled.
> > In our setup, VM networking multiqueue isn't enabled (it kept
> > disrupting traffic flow), so 6Gbps is probably the most it can
> > achieve like this. Our qdiscs in this VM may be limited to one core
> > because of that.
>
> I suspect the issue you had with multiqueue is that it requires
> per-CPU partitioning on a per-customer basis to work well. This is
> possible to do with XDP, as Jesper demonstrates here:
>
> https://github.com/netoptimizer/xdp-cpumap-tc
>
> With this it should be possible to scale the hardware queues across
> multiple CPUs properly, and you should be able to go to much higher
> rates by just throwing more CPU cores at it. At least on bare metal;
> not sure if the VM virt-drivers have the needed support yet...
>
> -Toke

--
Robert Chacón
Owner
M (915) 730-1472
E robert.chacon@jackrabbitwireless.com
JackRabbit Wireless LLC
P.O. Box 222111
El Paso, TX 79913
jackrabbitwireless.com
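
[Editor's note: for concreteness, below is a minimal sketch of the kind
of mq-plus-mark-based-filtering setup discussed above. Everything in it
is illustrative: the interface name (eth0), the queue count (4), the
customer prefix (100.64.1.0/24), and all handles, marks, and rates are
made up; this is not LibreQoS's or xdp-cpumap-tc's actual
configuration.]

    IFACE=eth0    # hypothetical interface name; adjust to your NIC

    # mq only works as the root qdisc of a multiqueue device; it
    # exposes one class per hardware TX queue (7fff:1, 7fff:2, ...).
    tc qdisc replace dev "$IFACE" root handle 7fff: mq

    # Attach an independent HTB under each TX queue (4 queues assumed),
    # with an fq_codel catch-all class for unclassified traffic.
    for i in 1 2 3 4; do
        tc qdisc add dev "$IFACE" parent 7fff:$i handle $i: htb default 2
        tc class add dev "$IFACE" parent $i: classid $i:2 htb rate 1gbit
        tc qdisc add dev "$IFACE" parent $i:2 fq_codel
    done

    # Mark one (made-up) customer prefix in nftables...
    nft add table inet shaper
    nft add chain inet shaper post '{ type filter hook postrouting priority mangle ; }'
    nft add rule inet shaper post ip daddr 100.64.1.0/24 meta mark set 0x11

    # ...and match that mark with a fw filter on the first per-queue
    # HTB. Filters attach fine to the HTB children, even though mq's
    # root itself accepts none.
    tc class add dev "$IFACE" parent 1: classid 1:11 htb rate 50mbit ceil 50mbit
    tc qdisc add dev "$IFACE" parent 1:11 fq_codel
    tc filter add dev "$IFACE" parent 1: protocol ip handle 0x11 fw flowid 1:11

Note that the fw filter only classifies traffic within whichever
per-queue HTB the packet happens to reach; the kernel selects the TX
queue independently of the nftables mark. In practice, the classes and
filters would therefore have to be replicated under every mq child, or
each customer's traffic pinned to a known queue and CPU, which is
exactly the per-CPU partitioning that xdp-cpumap-tc provides.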