Hey,
I mean it about the warnings: this isn't at all stable or debugged, and I can't promise that it won't unleash the nasal demons
(to use a popular C++ phrase). The name is descriptive! ;-)
With that said, I'm pretty happy so far:
* It runs only in the classifier - which xdp-cpumap-tc has nicely shunted onto a dedicated CPU. It has to run on both
the inbound and outbound classifiers, since otherwise it would only see half of each conversation (there's a minimal sketch of this in the first snippet after this list).
* It does assume that your ingress and egress CPUs are mapped to the same interface; I do that anyway in BracketQoS. Not doing
that opens up a potential world of pain, since writes to the shared maps would require a locking scheme - and with too much locking, you lose all of the benefit of using multiple CPUs in the first place. (The second snippet below shows the kind of shared flow map I mean.)
* It is pretty wasteful of RAM, but most of the shaper systems I've worked with have lots of it.
* I've been gradually removing features that I don't want for BracketQoS. A hypothetical future "useful to everyone" version wouldn't do that.
* Rate limiting is working, but I removed the requirement for a shared configuration provided from userland - so right now it's hard-coded to report at 1-second intervals per stream (see the third snippet below).
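To make the first bullet concrete, here's roughly what "runs in the classifier" means - a minimal sketch, not the real code (the section and function names are made up). The same program gets attached as a tc classifier on both the inbound and outbound interfaces, so both halves of each conversation pass through it:

#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

SEC("classifier")
int flow_tracker(struct __sk_buff *skb)
{
        /* Parse headers and update per-flow state here, then hand the
         * packet straight on to the cake/HTB queues untouched. */
        return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";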
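On the shared-map point: the per-flow state lives in a hash map along these lines (field names and sizing are my guesses, not the actual layout; it reuses the includes from the snippet above). Because xdp-cpumap-tc steers both directions of a customer's traffic to the same CPU, each entry is only ever touched from one CPU and no bpf_spin_lock is needed:

struct flow_key {
        __u32 src_ip;
        __u32 dst_ip;
        __u16 src_port;
        __u16 dst_port;
        __u8  protocol;
};

struct flow_state {
        __u64 bytes_seen;
        __u64 last_report_ns;
};

struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 1000000);   /* generous sizing - hence the RAM cost */
        __type(key, struct flow_key);
        __type(value, struct flow_state);
} flow_map SEC(".maps");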
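And the hard-coded 1-second reporting interval boils down to a per-flow check like this (again a sketch - the real event-sending path is more involved, and the interval would come from userland config in a general-purpose version):

#define REPORT_INTERVAL_NS 1000000000ULL        /* 1 second, currently fixed */

static __always_inline bool should_report(struct flow_state *state)
{
        __u64 now = bpf_ktime_get_ns();

        if (now - state->last_report_ns >= REPORT_INTERVAL_NS) {
                state->last_report_ns = now;
                return true;    /* time to send a performance event */
        }
        return false;           /* reported too recently - stay quiet */
}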
My testbed is currently three Hyper-V VMs: a simple "client" and "world", with a "shaper" VM in between running a slightly hacked-up LibreQoS.
iperf from "client" to "world" (with Libre set to allow 10gbit/s max, via a cake/HTB queue setup) is around 5 gbit/s at present, on my
test PC (the host is a core i7, 12th gen, 12 cores - 64gb RAM and fast SSDs)
Output currently consists of debug messages reading:
cpumap/0/map:4-1371 [000] D..2. 515.399222: bpf_trace_printk: (tc) Flow open event
cpumap/0/map:4-1371 [000] D..2. 515.399239: bpf_trace_printk: (tc) Send performance event (5,1), 374696
cpumap/0/map:4-1371 [000] D..2. 515.399466: bpf_trace_printk: (tc) Flow open event
cpumap/0/map:4-1371 [000] D..2. 515.399475: bpf_trace_printk: (tc) Send performance event (5,1), 247069
cpumap/0/map:4-1371 [000] D..2. 516.405151: bpf_trace_printk: (tc) Send performance event (5,1), 5217155
cpumap/0/map:4-1371 [000] D..2. 517.405248: bpf_trace_printk: (tc) Send performance event (5,1), 4515394
cpumap/0/map:4-1371 [000] D..2. 518.406117: bpf_trace_printk: (tc) Send performance event (5,1), 4481289
cpumap/0/map:4-1371 [000] D..2. 519.406255: bpf_trace_printk: (tc) Send performance event (5,1), 4255268
cpumap/0/map:4-1371 [000] D..2. 520.407864: bpf_trace_printk: (tc) Send performance event (5,1), 5249493
cpumap/0/map:4-1371 [000] D..2. 521.406664: bpf_trace_printk: (tc) Send performance event (5,1), 3795993
cpumap/0/map:4-1371 [000] D..2. 522.407469: bpf_trace_printk: (tc) Send performance event (5,1), 3949519
cpumap/0/map:4-1371 [000] D..2. 523.408126: bpf_trace_printk: (tc) Send performance event (5,1), 4365335
cpumap/0/map:4-1371 [000] D..2. 524.408929: bpf_trace_printk: (tc) Send performance event (5,1), 4154910
cpumap/0/map:4-1371 [000] D..2. 525.410048: bpf_trace_printk: (tc) Send performance event (5,1), 4405582
cpumap/0/map:4-1371 [000] D..2. 525.434080: bpf_trace_printk: (tc) Send flow event
cpumap/0/map:4-1371 [000] D..2. 525.482714: bpf_trace_printk: (tc) Send flow event
The timings haven't been tweaked yet. The (5,1) is the tc handle major/minor, allocated by the xdp-cpumap parent.
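(For reference, that major/minor pair is just the tc classid pulled apart - something like the fragment below, sitting inside the classifier function from the first snippet. This assumes the handle arrives in skb->priority; in the real code it may live in a different field, and bytes_this_interval is a stand-in for whatever counter actually gets reported.)

__u32 classid = skb->priority;              /* classid from the xdp-cpumap parent */
__u16 major   = (classid >> 16) & 0xFFFF;   /* the "5" in the trace above */
__u16 minor   = classid & 0xFFFF;           /* the "1" in the trace above */

/* bytes_this_interval: stand-in for the reported counter */
bpf_printk("(tc) Send performance event (%u,%u), %llu",
           major, minor, bytes_this_interval);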
I get pretty low latency between VMs; I'll set up a test with some real-world data very soon.
I plan to keep hacking away, but feel free to take a peek.
Thanks,
Herbert