That's true. The 12th gen does seem to have some "special" features... makes for a nice writing platform (this box is primarily my "write books and articles" machine). I'll be doing a wider test on a more normal platform, probably at the weekend (with real traffic, hence the delay - have to find a time in which I minimize disruption). On Wed, Oct 19, 2022 at 10:49 AM dan wrote: > Those 'efficiency' threads in Intel 12th gen should probably be addressed > as well. You can't turn them off in BIOS. > > On Wed, Oct 19, 2022 at 8:48 AM Robert Chacón via LibreQoS < > libreqos@lists.bufferbloat.net> wrote: > >> Awesome work on this! >> I suspect there should be a slight performance bump once Hyperthreading >> is disabled and efficient power management is off. >> Hyperthreading/SMT always messes with HTB performance when I leave it on. >> Thank you for mentioning that - I now went ahead and added instructions on >> disabling hyperthreading on the Wiki for new users. >> Super promising results! >> Interested to see what throughput is with xdp-cpumap-tc vs cpumap-pping. >> So far in your VM setup it seems to be doing very well. >> >> On Wed, Oct 19, 2022 at 8:06 AM Herbert Wolverson via LibreQoS < >> libreqos@lists.bufferbloat.net> wrote: >> >>> Also, I forgot to mention that I *think* the current version has removed >>> the requirement that the inbound >>> and outbound classifiers be placed on the same CPU. I know interduo was >>> particularly keen on packing >>> upload into fewer cores. I'll add that to my list of things to test. >>> >>> On Wed, Oct 19, 2022 at 9:01 AM Herbert Wolverson >>> wrote: >>> >>>> I'll definitely take a look - that does look interesting. I don't have >>>> X11 on any of my test VMs, but >>>> it looks like it can work without the GUI. >>>> >>>> Thanks! >>>> >>>> On Wed, Oct 19, 2022 at 8:58 AM Dave Taht wrote: >>>> >>>>> could I coax you to adopt flent? >>>>> >>>>> apt-get install flent netperf irtt fping >>>>> >>>>> You sometimes have to compile netperf yourself with --enable-demo on >>>>> some systems. >>>>> There are a bunch of python libs needed for the gui, but only on the >>>>> client. >>>>> >>>>> Then you can run a really gnarly test series and plot the results over >>>>> time. >>>>> >>>>> flent --socket-stats --step-size=.05 -t 'the-test-conditions' -H >>>>> the_server_name rrul # 110 other tests >>>>> >>>>> >>>>> On Wed, Oct 19, 2022 at 6:44 AM Herbert Wolverson via LibreQoS >>>>> wrote: >>>>> > >>>>> > Hey, >>>>> > >>>>> > Testing the current version ( >>>>> https://github.com/thebracket/cpumap-pping-hackjob ), it's doing >>>>> better than I hoped. This build has shared (not per-cpu) maps, and a >>>>> userspace daemon (xdp_pping) to extract and reset stats. >>>>> > >>>>> > My testing environment has grown a bit: >>>>> > * ShaperVM - running Ubuntu Server and LibreQoS, with the new >>>>> cpumap-pping-hackjob version of xdp-cpumap. >>>>> > * ExtTest - running Ubuntu Server, set as 100.64.1.1. Hosts an iperf >>>>> server. >>>>> > * ClientInt1 - running Ubuntu Server (minimal), set as 100.64.1.2. >>>>> Hosts iperf client. >>>>> > * ClientInt2 - running Ubuntu Server (minimal), set as 100.64.1.3. >>>>> Hosts iperf client. >>>>> > >>>>> > ClientInt1, ClientInt2 and one interface (LAN facing) of ShaperVM >>>>> are on a virtual switch. >>>>> > ExtTest and the other interface (WAN facing) of ShaperVM are on a >>>>> different virtual switch. >>>>> > >>>>> > These are all on a host machine running Windows 11, a core i7 12th >>>>> gen, 32 GB RAM and fast SSD setup.
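A side note on those 12th-gen 'efficiency' cores: since they can't be switched off in BIOS, one option is to restrict the cpumap CPU list to the performance cores only. A minimal sketch of finding them, assuming a kernel that exposes the hybrid-CPU sysfs nodes (/sys/devices/cpu_core/cpus for P-cores, /sys/devices/cpu_atom/cpus for E-cores) - this is not something LibreQoS or xdp-cpumap-tc reads today:

#include <stdio.h>

/* Print the P-core list on a hybrid Intel CPU so it can be fed to the
 * cpumap CPU assignment. The sysfs path is an assumption and only
 * exists on hybrid parts with a reasonably recent kernel. */
int main(void)
{
    char buf[256];
    FILE *f = fopen("/sys/devices/cpu_core/cpus", "r");

    if (!f) {
        perror("cpu_core sysfs node (not hybrid, or older kernel?)");
        return 1;
    }
    if (fgets(buf, sizeof(buf), f))
        printf("P-cores: %s", buf);   /* e.g. "0-11", cpulist format */
    fclose(f);
    return 0;
}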
>>>>> > >>>>> > TEST 1: DUAL STREAMS, LOW THROUGHPUT >>>>> > >>>>> > For this test, LibreQoS is configured: >>>>> > * Two APs, each with 5gbit/s max. >>>>> > * 100.64.1.2 and 100.64.1.3 set up as CPEs, each limited to about >>>>> 100mbit/s. They map to 1:5 and 2:5 respectively (separate CPUs). >>>>> > * Set to use Cake >>>>> > >>>>> > On each client, roughly simultaneously run: iperf -c 100.64.1.1 -t >>>>> 500 (for a long run). Running xdp_pping yields correct results: >>>>> > >>>>> > [ >>>>> > {"tc":"1:5", "avg" : 4, "min" : 3, "max" : 5, "samples" : 11}, >>>>> > {"tc":"2:5", "avg" : 4, "min" : 3, "max" : 5, "samples" : 11}, >>>>> > {}] >>>>> > >>>>> > Or when I waited a while to gather/reset: >>>>> > >>>>> > [ >>>>> > {"tc":"1:5", "avg" : 4, "min" : 3, "max" : 6, "samples" : 60}, >>>>> > {"tc":"2:5", "avg" : 4, "min" : 3, "max" : 5, "samples" : 60}, >>>>> > {}] >>>>> > >>>>> > The ShaperVM shows no errors, just periodic logging that it is >>>>> recording data. CPU is about 2-3% on two CPUs, zero on the others (as >>>>> expected). >>>>> > >>>>> > After 500 seconds of continual iperfing, each client reported a >>>>> throughput of 104 Mbit/sec and 6.06 GBytes of data transmitted. >>>>> > >>>>> > So for smaller streams, I'd call this a success. >>>>> > >>>>> > TEST 2: DUAL STREAMS, HIGH THROUGHPUT >>>>> > >>>>> > For this test, LibreQoS is configured: >>>>> > * Two APs, each with 5gbit/s max. >>>>> > * 100.64.1.2 and 100.64.1.3 set up as CPEs, each limited to 5Gbit/s! >>>>> Mapped to 1:5 and 2:5 respectively (separate CPUs). >>>>> > >>>>> > Run iperf -c 100.64.1.1 -t 500 on each client at the same time. >>>>> > >>>>> > xdp_pping shows results, too: >>>>> > >>>>> > [ >>>>> > {"tc":"1:5", "avg" : 4, "min" : 1, "max" : 7, "samples" : 58}, >>>>> > {"tc":"2:5", "avg" : 7, "min" : 3, "max" : 11, "samples" : 58}, >>>>> > {}] >>>>> > >>>>> > [ >>>>> > {"tc":"1:5", "avg" : 5, "min" : 4, "max" : 8, "samples" : 13}, >>>>> > {"tc":"2:5", "avg" : 8, "min" : 7, "max" : 10, "samples" : 13}, >>>>> > {}] >>>>> > >>>>> > The ShaperVM shows two CPUs pegging between 70 and 90 percent. >>>>> > >>>>> > After 500 seconds of continual iperfing, the clients reported >>>>> throughputs of 2.72 Gbits/sec (158 GBytes) and 3.89 Gbits/sec (226 GBytes) respectively. >>>>> > >>>>> > Maxing out HyperV like this is inducing a bit of latency (which is >>>>> to be expected), but it's not bad. I also forgot to disable hyperthreading, >>>>> and looking at the host performance it is sometimes running the second >>>>> virtual CPU on an underpowered "fake" CPU. >>>>> > >>>>> > So for two large streams, I think we're doing pretty well also! >>>>> > >>>>> > TEST 3: DUAL STREAMS, SINGLE CPU >>>>> > >>>>> > This test is designed to try and blow things up. It's the same as >>>>> test 2, but both CPEs are set to the same CPU (1), using TC handles 1:5 and >>>>> 1:6. >>>>> > >>>>> > ShaperVM CPU1 maxed out in the high 90s, the other CPUs were idle. >>>>> The pping stats start to show a bit of degradation in performance from >>>>> pounding it so hard: >>>>> > >>>>> > [ >>>>> > {"tc":"1:6", "avg" : 10, "min" : 9, "max" : 19, "samples" : 24}, >>>>> > {"tc":"1:5", "avg" : 10, "min" : 8, "max" : 18, "samples" : 24}, >>>>> > {}] >>>>> > >>>>> > For whatever reason, it smoothed out over time: >>>>> > >>>>> > [ >>>>> > {"tc":"1:6", "avg" : 10, "min" : 9, "max" : 12, "samples" : 50}, >>>>> > {"tc":"1:5", "avg" : 10, "min" : 8, "max" : 13, "samples" : 50}, >>>>> > {}] >>>>> > >>>>> > Surprisingly (to me), I didn't encounter errors.
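(The "1:5", "2:5" and "1:6" strings in the xdp_pping output are just the textual form of the 32-bit tc classid - major number in the upper 16 bits, minor in the lower 16 - which is presumably what the map key carries internally. A small sketch of the encoding using the standard kernel macros; the printf is purely illustrative:)

#include <stdio.h>
#include <linux/types.h>
#include <linux/pkt_sched.h>

/* Show how a classid like "1:5" packs into one __u32 tc handle. */
int main(void)
{
    __u32 handle = TC_H_MAKE(1 << 16, 5);                 /* "1:5" */

    printf("handle 0x%x major %u minor %u\n",
           handle, TC_H_MAJ(handle) >> 16, TC_H_MIN(handle));
    /* prints: handle 0x10005 major 1 minor 5 */
    return 0;
}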
Each client >>>>> achieved 2.22 Gbit/s, over 129 GBytes of data. >>>>> > >>>>> > TEST 4: DUAL STREAMS, 50 SUB-STREAMS >>>>> > >>>>> > This test is also designed to break things. Same as test 3, but >>>>> using iperf -c 100.64.1.1 -P 50 -t 120 - 50 substreams, to try and really >>>>> tax the flow tracking. (Shorter time window because I really wanted to go >>>>> and find coffee) >>>>> > >>>>> > ShaperVM CPU sat at around 80-97%, tending towards 97%. pping >>>>> results show that this torture test is worsening performance, and there's >>>>> always lots of samples in the buffer: >>>>> > >>>>> > [ >>>>> > {"tc":"1:6", "avg" : 23, "min" : 19, "max" : 27, "samples" : 49}, >>>>> > {"tc":"1:5", "avg" : 24, "min" : 19, "max" : 27, "samples" : 49}, >>>>> > {}] >>>>> > >>>>> > This test also ran better than I expected. You can definitely see >>>>> some latency creeping in as I make the system work hard. Each VM showed >>>>> around 2.4 Gbit/s in total performance at the end of the iperf session. >>>>> The added latency is expected - but I'm >>>>> not sure I expected quite that much. >>>>> > >>>>> > WHAT'S NEXT & CONCLUSION >>>>> > >>>>> > I noticed that I forgot to turn off efficient power management on my >>>>> VMs and host, and left Hyperthreading on by mistake. So that hurts overall >>>>> performance. >>>>> > >>>>> > The base system seems to be working pretty solidly, at least for >>>>> small tests. Next up, I'll be removing extraneous debug reporting code, >>>>> removing some code paths that don't do anything but report, and looking for >>>>> any small optimization opportunities. I'll then re-run these tests. Once >>>>> that's done, I hope to find a maintenance window on my WISP and try it with >>>>> actual traffic. >>>>> > >>>>> > I also need to re-run these tests without the pping system to >>>>> provide some before/after analysis. >>>>> > >>>>> > On Tue, Oct 18, 2022 at 1:01 PM Herbert Wolverson < >>>>> herberticus@gmail.com> wrote: >>>>> >> >>>>> >> It's probably not entirely thread-safe right now (ran into some >>>>> issues reading per_cpu maps back from userspace; hopefully, I'll get that >>>>> figured out) - but the commits I just pushed have it basically working on >>>>> single-stream testing. :-) >>>>> >> >>>>> >> Set up cpumap as usual, and periodically run xdp-pping. This gives >>>>> you per-connection RTT information in JSON: >>>>> >> >>>>> >> [ >>>>> >> {"tc":"1:5", "avg" : 5, "min" : 5, "max" : 5, "samples" : 1}, >>>>> >> {}] >>>>> >> >>>>> >> (With the extra {} because I'm not tracking the tail and haven't >>>>> done comma removal). The tool also empties the various maps used to gather >>>>> data, acting as a "reset" point. There's a max of 60 samples per queue, in >>>>> a ringbuffer setup (so newest will start to overwrite the oldest). >>>>> >> >>>>> >> I'll start trying to test on a larger scale now. >>>>> >> >>>>> >> On Mon, Oct 17, 2022 at 3:34 PM Robert Chacón < >>>>> robert.chacon@jackrabbitwireless.com> wrote: >>>>> >>> >>>>> >>> Hey Herbert, >>>>> >>> >>>>> >>> Fantastic work! Super exciting to see this coming together, >>>>> especially so quickly. >>>>> >>> I'll test it soon. >>>>> >>> I understand and agree with your decision to omit certain features >>>>> (ICMP tracking, DNS tracking, etc.) to optimize performance for our use case. >>>>> Like you said, in order to merge the functionality without a performance >>>>> hit, merging them is sort of the only way right now.
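(For reference, the 60-samples-per-queue ringbuffer described above amounts to something like the following; the struct and field names are illustrative, not the ones used in cpumap-pping-hackjob:)

#include <linux/types.h>

#define MAX_SAMPLES 60

/* Fixed-size per-queue RTT ring: once full, the newest sample
 * overwrites the oldest, so a reader always sees the latest <= 60
 * samples since the last reset. */
struct rtt_ring {
    __u32 rtt[MAX_SAMPLES];   /* e.g. RTTs in microseconds */
    __u32 next;               /* slot that will be written next */
};

static inline void ring_push(struct rtt_ring *r, __u32 rtt)
{
    r->rtt[r->next] = rtt;
    r->next = (r->next + 1) % MAX_SAMPLES;
}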
Otherwise there would >>>>> be a lot of redundancy and lost throughput for an ISP's use. Though >>>>> hopefully long term there will be a way to keep all projects working >>>>> independently but interoperably with a plugin system of some kind. >>>>> >>> >>>>> >>> By the way, I'm making some headway on LibreQoS v1.3. Focusing on >>>>> optimizations for high sub counts (8000+ subs) as well as stateful changes >>>>> to the queue structure. >>>>> >>> I'm working to set up a physical lab to test high throughput and >>>>> high client count scenarios. >>>>> >>> When testing beyond ~32,000 filters we get "no space left on >>>>> device" from xdp-cpumap-tc, which I think relates to the bpf map size >>>>> limitation you mentioned. Maybe in the coming months we can take a look at >>>>> that. >>>>> >>> >>>>> >>> Anyway, great work on the cpumap-pping program! Excited to see more >>>>> on this. >>>>> >>> >>>>> >>> Thanks, >>>>> >>> Robert >>>>> >>> >>>>> >>> On Mon, Oct 17, 2022 at 12:45 PM Herbert Wolverson via LibreQoS < >>>>> libreqos@lists.bufferbloat.net> wrote: >>>>> >>>> >>>>> >>>> Hey, >>>>> >>>> >>>>> >>>> My current (unfinished) progress on this is now available here: >>>>> https://github.com/thebracket/cpumap-pping-hackjob >>>>> >>>> >>>>> >>>> I mean it about the warnings: this isn't at all stable or debugged >>>>> - and I can't promise that it won't unleash the nasal demons >>>>> >>>> (to use a popular C++ phrase). The name is descriptive! ;-) >>>>> >>>> >>>>> >>>> With that said, I'm pretty happy so far: >>>>> >>>> >>>>> >>>> * It runs only on the classifier - which xdp-cpumap-tc has nicely >>>>> shunted onto a dedicated CPU. It has to run on both >>>>> >>>> the inbound and outbound classifiers, since otherwise it would >>>>> only see half the conversation. >>>>> >>>> * It does assume that your ingress and egress CPUs are mapped to >>>>> the same interface; I do that anyway in BracketQoS. Not doing >>>>> >>>> that opens up a potential world of pain, since writes to the >>>>> shared maps would require a locking scheme. Too much locking, and you lose >>>>> all of the benefit of using multiple CPUs to begin with. >>>>> >>>> * It is pretty wasteful of RAM, but most of the shaper systems >>>>> I've worked with have lots of it. >>>>> >>>> * I've been gradually removing features that I don't want for >>>>> BracketQoS. A hypothetical future "useful to everyone" version wouldn't do >>>>> that. >>>>> >>>> * Rate limiting is working, but I removed the requirement for a >>>>> shared configuration provided from userland - so right now it's always set >>>>> to report at 1 second intervals per stream. >>>>> >>>> >>>>> >>>> My testbed is currently 3 Hyper-V VMs - a simple "client" and >>>>> "world", and a "shaper" VM in between running a slightly hacked-up LibreQoS. >>>>> >>>> iperf from "client" to "world" (with Libre set to allow 10gbit/s >>>>> max, via a cake/HTB queue setup) is around 5 gbit/s at present, on my >>>>> >>>> test PC (the host is a core i7, 12th gen, 12 cores - 64 GB RAM and >>>>> fast SSDs) >>>>> >>>> >>>>> >>>> Output currently consists of debug messages reading: >>>>> >>>> cpumap/0/map:4-1371 [000] D..2. 515.399222: >>>>> bpf_trace_printk: (tc) Flow open event >>>>> >>>> cpumap/0/map:4-1371 [000] D..2. 515.399239: >>>>> bpf_trace_printk: (tc) Send performance event (5,1), 374696 >>>>> >>>> cpumap/0/map:4-1371 [000] D..2. 515.399466: >>>>> bpf_trace_printk: (tc) Flow open event >>>>> >>>> cpumap/0/map:4-1371 [000] D..2.
515.399475: >>>>> bpf_trace_printk: (tc) Send performance event (5,1), 247069 >>>>> >>>> cpumap/0/map:4-1371 [000] D..2. 516.405151: >>>>> bpf_trace_printk: (tc) Send performance event (5,1), 5217155 >>>>> >>>> cpumap/0/map:4-1371 [000] D..2. 517.405248: >>>>> bpf_trace_printk: (tc) Send performance event (5,1), 4515394 >>>>> >>>> cpumap/0/map:4-1371 [000] D..2. 518.406117: >>>>> bpf_trace_printk: (tc) Send performance event (5,1), 4481289 >>>>> >>>> cpumap/0/map:4-1371 [000] D..2. 519.406255: >>>>> bpf_trace_printk: (tc) Send performance event (5,1), 4255268 >>>>> >>>> cpumap/0/map:4-1371 [000] D..2. 520.407864: >>>>> bpf_trace_printk: (tc) Send performance event (5,1), 5249493 >>>>> >>>> cpumap/0/map:4-1371 [000] D..2. 521.406664: >>>>> bpf_trace_printk: (tc) Send performance event (5,1), 3795993 >>>>> >>>> cpumap/0/map:4-1371 [000] D..2. 522.407469: >>>>> bpf_trace_printk: (tc) Send performance event (5,1), 3949519 >>>>> >>>> cpumap/0/map:4-1371 [000] D..2. 523.408126: >>>>> bpf_trace_printk: (tc) Send performance event (5,1), 4365335 >>>>> >>>> cpumap/0/map:4-1371 [000] D..2. 524.408929: >>>>> bpf_trace_printk: (tc) Send performance event (5,1), 4154910 >>>>> >>>> cpumap/0/map:4-1371 [000] D..2. 525.410048: >>>>> bpf_trace_printk: (tc) Send performance event (5,1), 4405582 >>>>> >>>> cpumap/0/map:4-1371 [000] D..2. 525.434080: >>>>> bpf_trace_printk: (tc) Send flow event >>>>> >>>> cpumap/0/map:4-1371 [000] D..2. 525.482714: >>>>> bpf_trace_printk: (tc) Send flow event >>>>> >>>> >>>>> >>>> The times haven't been tweaked yet. The (5,1) is tc handle >>>>> major/minor, allocated by the xdp-cpumap parent. >>>>> >>>> I get pretty low latency between VMs; I'll set up a test with >>>>> some real-world data very soon. >>>>> >>>> >>>>> >>>> I plan to keep hacking away, but feel free to take a peek. >>>>> >>>> >>>>> >>>> Thanks, >>>>> >>>> Herbert >>>>> >>>> >>>>> >>>> On Mon, Oct 17, 2022 at 10:14 AM Simon Sundberg < >>>>> Simon.Sundberg@kau.se> wrote: >>>>> >>>>> >>>>> >>>>> Hi, thanks for adding me to the conversation. Just a couple of >>>>> quick >>>>> >>>>> notes. >>>>> >>>>> >>>>> >>>>> On Mon, 2022-10-17 at 16:13 +0200, Toke Høiland-Jørgensen wrote: >>>>> >>>>> > [ Adding Simon to Cc ] >>>>> >>>>> > >>>>> >>>>> > Herbert Wolverson via LibreQoS >>>>> writes: >>>>> >>>>> > >>>>> >>>>> > > Hey, >>>>> >>>>> > > >>>>> >>>>> > > I've had some pretty good success with merging xdp-pping ( >>>>> >>>>> > > >>>>> https://github.com/xdp-project/bpf-examples/blob/master/pping/pping.h ) >>>>> >>>>> > > into xdp-cpumap-tc ( >>>>> https://github.com/xdp-project/xdp-cpumap-tc ). >>>>> >>>>> > > >>>>> >>>>> > > I ported over most of the xdp-pping code, and then changed >>>>> the entry point >>>>> >>>>> > > and packet parsing code to make use of the work already done >>>>> in >>>>> >>>>> > > xdp-cpumap-tc (it's already parsed a big chunk of the >>>>> packet, no need to do >>>>> >>>>> > > it twice). Then I switched the maps to per-cpu maps, and had >>>>> to pin them - >>>>> >>>>> > > otherwise the two tc instances don't properly share data. >>>>> >>>>> > > >>>>> >>>>> >>>>> >>>>> I guess the xdp-cpumap-tc ensures that the same flow is >>>>> processed on >>>>> >>>>> the same CPU core at both ingress and egress.
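(For context, the two map flavours being weighed here look roughly like this in libbpf's BTF map syntax - a sketch only: the key/value layouts are placeholders, not the real cpumap-pping definitions. Pinning by name is what lets the separate ingress and egress tc attachments open the same map instance:)

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct flow_key   { __u32 saddr, daddr; __u16 sport, dport; };  /* placeholder */
struct flow_state { __u64 last_tsval_ns; __u32 last_rtt_us; };  /* placeholder */

/* Shared hash map: one copy of the state, fine as long as a given flow
 * is always handled on the same CPU (so no two CPUs race on one entry). */
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 65536);
    __type(key, struct flow_key);
    __type(value, struct flow_state);
    __uint(pinning, LIBBPF_PIN_BY_NAME);   /* pinned so both tc programs share it */
} flow_state_shared SEC(".maps");

/* Per-CPU variant: every CPU gets its own value slot, so memory use is
 * roughly n_cpus times higher, and a flow that hops CPUs sees stale state. */
struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_HASH);
    __uint(max_entries, 65536);
    __type(key, struct flow_key);
    __type(value, struct flow_state);
    __uint(pinning, LIBBPF_PIN_BY_NAME);
} flow_state_percpu SEC(".maps");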
Otherwise, if a >>>>> flow may >>>>> >>>>> be processed by different cores on ingress and egress, the >>>>> per-CPU maps >>>>> >>>>> will not really work reliably as each core will have a different >>>>> view >>>>> >>>>> on the state of the flow, if there's been a previous packet with >>>>> a >>>>> >>>>> certain TSval from that flow etc. >>>>> >>>>> >>>>> >>>>> Furthermore, if a flow is always processed on the same core (on >>>>> both >>>>> >>>>> ingress and egress) I think per-CPU maps may be a bit wasteful on >>>>> >>>>> memory. From my understanding the keys for per-CPU maps are still >>>>> >>>>> shared across all CPUs, it's just that each CPU gets its own >>>>> value. So >>>>> >>>>> all CPUs will then have their own data for each flow, but it's >>>>> only the >>>>> >>>>> CPU processing the flow that will have any relevant data for the >>>>> flow >>>>> >>>>> while the remaining CPUs will just have an empty state for that >>>>> flow. >>>>> >>>>> Under the same assumption that packets within the same flow are >>>>> always >>>>> >>>>> processed on the same core there should generally not be any >>>>> >>>>> concurrency issues with having a global (non-per-CPU) map either as >>>>> packets >>>>> >>>>> from the same flow cannot be processed concurrently then (and >>>>> thus no >>>>> >>>>> concurrent access to the same value in the map). I am however >>>>> still >>>>> >>>>> very unclear on whether there's any considerable performance impact >>>>> between >>>>> >>>>> global and per-CPU map versions if the same key is not accessed >>>>> >>>>> concurrently. >>>>> >>>>> >>>>> >>>>> > > Right now, output >>>>> >>>>> > > is just stubbed - I've still got to port the perfmap output >>>>> code. Instead, >>>>> >>>>> > > I'm dumping a bunch of extra data to the kernel debug pipe, >>>>> so I can see >>>>> >>>>> > > roughly what the output would look like. >>>>> >>>>> > > >>>>> >>>>> > > With debug enabled and just logging I'm now getting about >>>>> 4.9 Gbits/sec on >>>>> >>>>> > > single-stream iperf between two VMs (with a shaper VM in the >>>>> middle). :-) >>>>> >>>>> > >>>>> >>>>> > Just FYI, that "just logging" is probably the biggest source of >>>>> >>>>> > overhead, then. What Simon found was that sending the data >>>>> from kernel >>>>> >>>>> > to userspace is one of the most expensive bits of epping, at >>>>> least when >>>>> >>>>> > the number of data points goes up (which it does as additional >>>>> flows are >>>>> >>>>> > added). >>>>> >>>>> >>>>> >>>>> Yeah, reporting individual RTTs when there's lots of them (you >>>>> may get >>>>> >>>>> upwards of 1000 RTTs/s per flow) is not only problematic in >>>>> terms of >>>>> >>>>> direct overhead from the tool itself, but also becomes demanding >>>>> for >>>>> >>>>> whatever you use all those RTT samples for (i.e. need to log, >>>>> parse, >>>>> >>>>> analyze etc. a very large amount of RTTs). One way to deal with >>>>> that is >>>>> >>>>> of course to just apply some sort of sampling (the >>>>> -r/--rate-limit and >>>>> >>>>> -R/--rtt-rate options). >>>>> >>>>> > >>>>> >>>>> > > So my question: how would you prefer to receive this data? >>>>> I'll have to >>>>> >>>>> > > write a daemon that provides userspace control (periodic >>>>> cleanup as well as >>>>> >>>>> > > reading the performance stream), so the world's kinda our >>>>> oyster.
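(The "report at 1 second intervals per stream" behaviour mentioned earlier comes down to a check like the one below on the BPF side - a sketch with made-up names, not the actual cpumap-pping or epping code:)

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define REPORT_INTERVAL_NS 1000000000ULL   /* one report per flow per second */

struct flow_report_state {
    __u64 last_report_ns;
};

/* Return 1 if this RTT sample should be pushed to userspace, 0 if it
 * should just be folded into whatever per-queue aggregate the map keeps. */
static __always_inline int should_report(struct flow_report_state *st)
{
    __u64 now = bpf_ktime_get_ns();

    if (now - st->last_report_ns < REPORT_INTERVAL_NS)
        return 0;
    st->last_report_ns = now;
    return 1;
}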
I can >>>>> >>>>> > > stick to Kathie's original format (and dump it to a named >>>>> pipe, perhaps?), >>>>> >>>>> > > a condensed format that only shows what you want to use, an >>>>> efficient >>>>> >>>>> > > binary format if you feel like parsing that... >>>>> >>>>> > >>>>> >>>>> > It would be great if we could combine efforts a bit here so we >>>>> don't >>>>> >>>>> > fork the codebase more than we have to. I.e., if "upstream" >>>>> epping and >>>>> >>>>> > whatever daemon you end up writing can agree on data format >>>>> etc that >>>>> >>>>> > would be fantastic! Added Simon to Cc to facilitate this :) >>>>> >>>>> > >>>>> >>>>> > Briefly what I've discussed before with Simon was to have the >>>>> ability to >>>>> >>>>> > aggregate the metrics in the kernel (WiP PR [0]) and have a >>>>> userspace >>>>> >>>>> > utility periodically pull them out. What we discussed was >>>>> doing this >>>>> >>>>> > using an LPM map (which is not in that PR yet). The idea would >>>>> be that >>>>> >>>>> > userspace would populate the LPM map with the keys (prefixes) >>>>> they >>>>> >>>>> > wanted statistics for (in LibreQOS context that could be one >>>>> key per >>>>> >>>>> > customer, for instance). Epping would then do a map lookup >>>>> into the LPM, >>>>> >>>>> > and if it gets a match it would update the statistics in that >>>>> map entry >>>>> >>>>> > (keeping a histogram of latency values seen, basically). >>>>> Simon's PR >>>>> >>>>> > below uses this technique where userspace will "reset" the >>>>> histogram >>>>> >>>>> > every time it loads it by swapping out two different map >>>>> entries when it >>>>> >>>>> > does a read; this allows you to control the sampling rate from >>>>> >>>>> > userspace, and you'll just get the data since the last time >>>>> you polled. >>>>> >>>>> >>>>> >>>>> Thanks, Toke, for summarizing both the current state and the plan >>>>> going >>>>> >>>>> forward. I will just note that this PR (and all my other work >>>>> with >>>>> >>>>> ePPing/BPF-PPing/XDP-PPing/I-suck-at-names-PPing) will be more >>>>> or less >>>>> >>>>> on hold for a couple of weeks right now as I'm trying to finish >>>>> up a >>>>> >>>>> paper. >>>>> >>>>> >>>>> >>>>> > I was thinking that if we all can agree on the map format, >>>>> then your >>>>> >>>>> > polling daemon could be one userspace "client" for that, and >>>>> the epping >>>>> >>>>> > binary itself could be another; but we could keep >>>>> compatibility between >>>>> >>>>> > the two, so we don't duplicate effort. >>>>> >>>>> > >>>>> >>>>> > Similarly, refactoring of the epping code itself so it can be >>>>> plugged >>>>> >>>>> > into the cpumap-tc code would be a good goal... >>>>> >>>>> >>>>> >>>>> Should probably do that...at some point. In general I think it's >>>>> a bit >>>>> >>>>> of an interesting problem to think about how to chain multiple >>>>> XDP/tc >>>>> >>>>> programs together in an efficient way. Most XDP and tc programs >>>>> will do >>>>> >>>>> some amount of packet parsing and when you have many chained >>>>> programs >>>>> >>>>> parsing the same packets this obviously becomes a bit wasteful. >>>>> At the >>>>> >>>>> same time, it would be nice if one didn't need to manually merge >>>>> >>>>> multiple programs together into a single one like this to get >>>>> rid of >>>>> >>>>> this duplicated parsing, or at least make that process of >>>>> merging those >>>>> >>>>> programs as simple as possible.
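(The LPM-map aggregation described above could look roughly like the sketch below: userspace inserts one prefix per customer, and the BPF side looks up the packet's address and bumps a latency-histogram bucket in the matched entry. Struct layout, bucket count and names are assumptions here - the WiP PR [0] is the authoritative version:)

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define RTT_BUCKETS 32

struct ipv4_lpm_key {
    __u32 prefixlen;              /* LPM trie keys must start with the prefix length */
    __u32 addr;                   /* IPv4 address, network byte order */
};

struct rtt_hist {
    __u64 bucket[RTT_BUCKETS];    /* e.g. roughly log2-spaced RTT buckets */
};

struct {
    __uint(type, BPF_MAP_TYPE_LPM_TRIE);
    __uint(max_entries, 16384);   /* one entry per customer prefix */
    __type(key, struct ipv4_lpm_key);
    __type(value, struct rtt_hist);
    __uint(map_flags, BPF_F_NO_PREALLOC);   /* required for LPM tries */
    __uint(pinning, LIBBPF_PIN_BY_NAME);    /* so a poller can read/reset it */
} customer_rtt SEC(".maps");

static __always_inline void record_rtt(__u32 addr, __u32 rtt_us)
{
    struct ipv4_lpm_key key = { .prefixlen = 32, .addr = addr };
    struct rtt_hist *hist = bpf_map_lookup_elem(&customer_rtt, &key);
    __u32 b = 0;

    if (!hist)
        return;                   /* address not covered by any customer prefix */
    while (rtt_us > 1 && b < RTT_BUCKETS - 1) {   /* crude log2 bucketing */
        rtt_us >>= 1;
        b++;
    }
    __sync_fetch_and_add(&hist->bucket[b], 1);
}

A userspace poller would then iterate the pinned map (or swap in a fresh one, as in Simon's PR) and compute whatever per-customer summaries it wants since the last poll.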
>>>>> >>>>> >>>>> >>>>> > -Toke >>>>> >>>>> > >>>>> >>>>> > [0] https://github.com/xdp-project/bpf-examples/pull/59 >>>>> >>>>> >>>>> -- >>>>> Dave Täht CEO, TekLibre, LLC >> >> -- >> Robert Chacón >> CEO | JackRabbit Wireless LLC >> _______________________________________________ >> LibreQoS mailing list >> LibreQoS@lists.bufferbloat.net >> https://lists.bufferbloat.net/listinfo/libreqos