Toke,

Thank you very much for pointing me in the right direction. I am having
some fun in the lab tinkering with the 'mq' qdisc and Jesper's
xdp-cpumap-tc. It seems I will need to use iptables or nftables to
filter packets into the corresponding queues, since mq apparently
cannot have u32 filters on its root. I will try to familiarize myself
with iptables and nftables, and hopefully get it working soon and
report back.

Thank you!

On Fri, Jan 15, 2021 at 5:30 AM Toke Høiland-Jørgensen wrote:

> Robert Chacon writes:
>
> >> Cool! What kind of performance are you seeing? The README mentions
> >> being limited by the BPF hash table size, but can you actually
> >> shape 2000 customers on one machine? On what kind of hardware and
> >> at what rate(s)?
> >
> > On our production network our peak throughput is 1.5Gbps from 200
> > clients, and it works very well.
> > We use a simple consumer-class AMD 2700X CPU in production because
> > utilization of the shaper VM is ~15% at 1.5Gbps load.
> > Customers get reliably capped within ±2Mbps of their allocated
> > htb/fq_codel bandwidth, which is very helpful to control network
> > congestion.
> >
> > Here are some graphs from RRUL performed on our test bench
> > hypervisor:
> > https://raw.githubusercontent.com/rchac/LibreQoS/main/docs/fq_codel_1000_subs_4G.png
> > In that example, bandwidth for the "subscriber" client VM was set
> > to 4Gbps. 1000 IPv4 IPs and 1000 IPv6 IPs were in the filter hash
> > table of LibreQoS.
> > The test bench server has an AMD 3900X running Ubuntu in Proxmox.
> > 4Gbps utilizes 10% of the VM's 12 cores. Paravirtualized VirtIO
> > network drivers are used and most offloading types are enabled.
> > In our setup, VM networking multiqueue isn't enabled (it kept
> > disrupting traffic flow), so 6Gbps is probably the most it can
> > achieve like this. Our qdiscs in this VM may be limited to one core
> > because of that.
>
> I suspect the issue you had with multiqueue is that it requires
> per-CPU partitioning on a per-customer basis to work well. This is
> possible to do with XDP, as Jesper demonstrates here:
>
> https://github.com/netoptimizer/xdp-cpumap-tc
>
> With this it should be possible to scale the hardware queues across
> multiple CPUs properly, and you should be able to go to much higher
> rates by just throwing more CPU cores at it. At least on bare metal;
> not sure if the VM virt-drivers have the needed support yet...
>
> -Toke

--
Robert Chacón
Owner
M (915) 730-1472
E robert.chacon@jackrabbitwireless.com
JackRabbit Wireless LLC
P.O. Box 222111
El Paso, TX 79913
jackrabbitwireless.com
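
[Editor's note: for concreteness, below is a minimal sketch of the kind
of mq-plus-mark-based-filtering setup discussed above. Everything in it
is illustrative: the interface name (eth0), the queue count (4), the
customer prefix (100.64.1.0/24), and all handles, marks, and rates are
made up; this is not LibreQoS's or xdp-cpumap-tc's actual
configuration.]

    IFACE=eth0    # hypothetical interface name; adjust to your NIC

    # mq only works as the root qdisc of a multiqueue device; it
    # exposes one class per hardware TX queue (7fff:1, 7fff:2, ...).
    tc qdisc replace dev "$IFACE" root handle 7fff: mq

    # Attach an independent HTB under each TX queue (4 queues assumed),
    # with an fq_codel catch-all class for unclassified traffic.
    for i in 1 2 3 4; do
        tc qdisc add dev "$IFACE" parent 7fff:$i handle $i: htb default 2
        tc class add dev "$IFACE" parent $i: classid $i:2 htb rate 1gbit
        tc qdisc add dev "$IFACE" parent $i:2 fq_codel
    done

    # Mark one (made-up) customer prefix in nftables...
    nft add table inet shaper
    nft add chain inet shaper post '{ type filter hook postrouting priority mangle ; }'
    nft add rule inet shaper post ip daddr 100.64.1.0/24 meta mark set 0x11

    # ...and match that mark with a fw filter on the first per-queue
    # HTB. Filters attach fine to the HTB children, even though mq's
    # root itself accepts none.
    tc class add dev "$IFACE" parent 1: classid 1:11 htb rate 50mbit ceil 50mbit
    tc qdisc add dev "$IFACE" parent 1:11 fq_codel
    tc filter add dev "$IFACE" parent 1: protocol ip handle 0x11 fw flowid 1:11

Note that the fw filter only classifies traffic within whichever
per-queue HTB the packet happens to reach; the kernel selects the TX
queue independently of the nftables mark. In practice, the classes and
filters would therefore have to be replicated under every mq child, or
each customer's traffic pinned to a known queue and CPU, which is
exactly the per-CPU partitioning that xdp-cpumap-tc provides.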