<div dir="ltr"><div>I'd probably go with an HTB queue per target IP group, and not attach a <br></div><div>discipline to it - with only a ceiling set at the top. That'll do truly minimal <br></div><div>shaping, and you can still use cpumap-pping to get the data you want. <br></div><div>(The current branch I'm testing/working on also reports the local IP <br></div><div>address, which I'm finding pretty helpful). Otherwise, you're going to</div><div>make building both tools part of the setup process* and still have</div><div>to parse IP pairs for results. Hopefully, there's a decent Python</div><div>LPM Trie out there (to handle subnets and IPv6) to make that</div><div>easier.<br></div><div><br></div><div>I'm (obviously!) going to respectfully disagree with Toke on this one.</div><div>I didn't dive into cpumap-pping for fun; I tried *really hard* to work</div><div>with the original epping/xdp-pping. It's a great tool, really fantastic work. <br></div><div>It's also not really designed for the same purpose.</div><div><br></div><div>The original Polere pping is wonderful, but isn't going to scale - the</div><div>way it ingests packets isn't going to scale across multiple CPUs,</div><div>and having a core pegging 100% on a busy shaper box was</div><div>degrading overall performance. epping solves the scalability</div><div>issue wonderfully, and (rightly) remains focused on giving you</div><div>a complete report of all of the data is accessed while it was</div><div>running. If you want to run a monitoring session and see what's</div><div>going on, it's a *fantastic* way to do it - serious props there. I</div><div>benchmarked it at about 15 gbit/s on single-stream testing,</div><div>which is *really* impressive (no other BPF programs active,</div><div>no shaping).</div><div><br></div><div>The first issue I ran into is that stacking XDP programs isn't</div><div>all that well defined a process. You can make it work, but</div><div>it gets messy when both programs have setup/teardown</div><div>routines. I kinda, sorta managed to get the two running at</div><div>once, and it mostly worked. There *really* needs to be an</div><div>easier way that doesn't run headlong into Ubuntu's lovely</div><div>"you updated the kernel and tools, we didn't think you'd</div><div>need bpftool so we didn't include it" issues, adjusting scripts</div><div>until neither says "oops, there's already an XDP program</div><div>here! Bye!". I know that this is a pretty new thing, but the</div><div>tooling hasn't really caught up yet to make this a comfortable</div><div>process. I'm pretty sure I spent more time trying to run both</div><div>at once than it took to make a combined version that sort-of</div><div>ran. (I had a working version in an afternoon)<br></div><div><br></div><div>With the two literally concatenated (but compiled together),</div><div>it worked - but there was a noticeable performance cost. That's</div><div>where orthogonal design choices hit - epping/xdp-pping is</div><div>sampling everything (it can even go looking for DNS and ICMP!).</div><div>A QoE box *really* needs to go out of its way to avoid adding</div><div>any latency, otherwise you're self-defeating. A representative</div><div>sample is really all you need - while for epping's target,<br></div><div>a really detailed sample is what you need. When faced with</div><div>differing design goals like that, my response is always to</div><div>make a tool that very efficiently does what I need.</div><div><br></div><div>Combining the packet parsing** was the obvious low-hanging</div><div>fruit. It is faster, but not by very much. But I really hate</div><div>it when code repeats itself. It seriously set off my OCD</div><div>watching both find the IP header offset, determine protocol</div><div>(IPv4 vs IPv6), etc. Small performance win.<br></div><div><br></div><div>Bailing out as soon as we determine that we aren't looking</div><div>at a TCP packet was a big performance win. You can achieve</div><div>the same by carefully setting up the "config" for epping,</div><div>but there's not a lot of point in keeping the DNS/ICMP code</div><div>when it's not needed. Still a performance win, and not</div><div>needing to maintain a configuration (that will be the same</div><div>each time) makes setup easier.<br></div><div><br></div><div>Running by default on the TC (egress) rather than XDP</div><div>is a big win, too - but only after cpumap-tc has shunted</div><div>processing to the appropriate CPU. Now processing is</div><div>divided between CPUs, and cache locality is more likely</div><div>to happen - the packet we are reading is in the local</div><div>core's cache when cpumap-pping reads it, and there's</div><div>a decent chance it'll still be there (at least L2) by the time</div><div>it gets to the actual queue discipline.<br></div><div><br></div><div>Changing the reporting mechanism was a really big win,</div><div>in terms of performance and the tool aligning with what's</div><div>needed:</div><div>* Since xdp-cpumap has already done the work to determine</div><div> that a flow belongs in TC handle X:Y - and mapping RTT</div><div> performance to customer/circuit is *exactly* what we're</div><div> trying to do - it just makes sense to take that value and</div><div> use it as a key for the results.</div><div>* Since we don't care about every packet - rather, we want</div><div> a periodic representative sample - we can use an efficient</div><div> per TC handle circular buffer in which to store results.</div><div>* In turn, I realized that we could just *sample* rather than</div><div> continually churning the circular buffer. So each flow's</div><div> buffer has a capacity, and the monitor bails out once a flow</div><div> buffer is full of RTT results. Really big performance win.</div><div> "return" is a really fast call. :-) (The buffers are reset when</div><div> read)</div><div>* Perfmaps are great, but I didn't want to require a daemon</div><div> run (mapping the perfmap results) and in turn output</div><div> results in a LibreQoS-friendly format when a much simpler</div><div> mechanism gets the same result - without another program</div><div> sitting handling the mmap'd performance flows all the time.<br></div><div><br></div><div>So the result was really fast and does exactly what I need.</div><div>It's not meant to be "better" than the original; for the original's</div><div>purpose, it's not great. For rapidly building QoE metrics on</div><div>a live shaper box, with absolutely minimal overhead and a</div><div>focus on sipping the firehose rather than trying to drink it</div><div>all - it's about right.<br></div><div><br></div><div>Philosophically, I've always favored tools that do exactly</div><div>what I need.<br></div><div><br></div><div>Likewise, if someone would like to come up with a really</div><div>good recipe that runs both rather than a combined</div><div>program - that'd be awesome. If it can match the</div><div>performance of cpumap-pping, I'll happily switch</div><div>BracketQoS to use it.<br></div><div><br></div><div>You're obviously welcome to any of the code; if it can help</div><div>the original projects, that's wonderful. Right now, I don't</div><div>have the time to come up with a better way of layering</div><div>XDP/TC programs!<br></div><div></div><div><br></div><div>* - I keep wondering if I shouldn't roll some .deb packages</div><div>and a configurator to make setup easier!<br></div><div><br></div><div>** - there *really* should be a standard flow dissector. The <br></div><div>Linux traffic shaper's dissector can handle VLAN tags and <br></div><div>an MPLS header. xdp-cpumap-tc handles VLANs with <br></div><div>aplomb and doesn't touch MPLS. epping calls out to the <br></div><div>xdp-project's dissector which appears to handle</div><div>VLANs and also doesn't touch MPLS).</div><div><br></div><div>Thanks,</div><div>Herbert<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Nov 8, 2022 at 8:23 AM Toke Høiland-Jørgensen via LibreQoS <<a href="mailto:libreqos@lists.bufferbloat.net">libreqos@lists.bufferbloat.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Robert Chacón via LibreQoS <<a href="mailto:libreqos@lists.bufferbloat.net" target="_blank">libreqos@lists.bufferbloat.net</a>> writes:<br>
<br>
> I was hoping to add a monitoring mode which could be used before "turning<br>
> on" LibreQoS, ideally before v1.3 release. This way operators can really<br>
> see what impact it's having on end-user and network latency.<br>
><br>
> The simplest solution I can think of is to implement Monitoring Mode using<br>
> cpumap-pping as we already do - with plain HTB and leaf classes with no<br>
> CAKE qdisc applied, and with HTB and leaf class rates set to impossibly<br>
> high amounts (no plan enforcement). This would allow for before/after<br>
> comparisons of Nodes (Access Points). My only concern with this approach is<br>
> that HTB, even with rates set impossibly high, may not be truly<br>
> transparent. It would be pretty easy to implement though.<br>
><br>
> Alternatively we could use ePPing<br>
> <<a href="https://github.com/xdp-project/bpf-examples/tree/master/pping" rel="noreferrer" target="_blank">https://github.com/xdp-project/bpf-examples/tree/master/pping</a>> but I worry<br>
> about throughput and the possibility of latency tracking being slightly<br>
> different from cpumap-pping, which could limit the utility of a comparison.<br>
> We'd have to match IPs in a way that's a bit more involved here.<br>
><br>
> Thoughts?<br>
<br>
Well, this kind of thing is exactly why I think concatenating the two<br>
programs (cpumap and pping) into a single BPF program was a mistake:<br>
those are two distinct pieces of functionality, and you want to be able<br>
to run them separately, as your "monitor mode" use case shows. The<br>
overhead of parsing the packet twice is trivial compared to everything<br>
else those apps are doing, so I don't think the gain is worth losing<br>
that flexibility.<br>
<br>
So I definitely think using the regular epping is the right thing to do<br>
here. Simon is looking into improving its reporting so it can be<br>
per-subnet using a user-supplied configuration file for the actual<br>
subnets, which should hopefully make this feasible. I'm sure he'll chime<br>
in here once he has something to test and/or with any questions that pop<br>
up in the process.<br>
<br>
Longer term, I'm hoping all of Herbert's other improvements to epping<br>
reporting/formatting can make it into upstream epping, so LibreQoS can<br>
just use that for everything :)<br>
<br>
-Toke<br>
_______________________________________________<br>
LibreQoS mailing list<br>
<a href="mailto:LibreQoS@lists.bufferbloat.net" target="_blank">LibreQoS@lists.bufferbloat.net</a><br>
<a href="https://lists.bufferbloat.net/listinfo/libreqos" rel="noreferrer" target="_blank">https://lists.bufferbloat.net/listinfo/libreqos</a><br>
</blockquote></div>