From: Robert Chacón
Date: Sat, 22 Oct 2022 08:47:36 -0600
To: Herbert Wolverson
Cc: libreqos@lists.bufferbloat.net
Subject: Re: [LibreQoS] In BPF pping - so far

Awesome work! It's really amazing how little additional CPU the TCP tracking adds. Super excited to start testing in production myself soon. Have a great restful morning with your daughter. 😌

On Sat, Oct 22, 2022, 8:32 AM Herbert Wolverson via LibreQoS <libreqos@lists.bufferbloat.net> wrote:

> This morning I tested cpu-pping with live customers! A little over 1,200 mapped IP addresses, about 600 Mbps of real traffic flowing through a big hierarchy of 52 sites. (600 is our "quiet time" traffic.)
>
> It started very well: the updated xdp-cpumap system dropped in place and the system worked as before. xdp_pping started to show data with correct mappings. CPU load from the mapping system is within 1% of where it was before.
>
> After about 20 minutes of continuous execution, it started to run into some scaling issues. The shaping system continued to run wonderfully, and CPU load was still fine. However, it stopped reporting latency data! A bit of debugging showed that once you exceed 16,384 in-flight TCP streams it isn't handling the "map full" situation gracefully - and clearing the map from userspace isn't working correctly. So I hacked away and hacked away.
>
> Anyway, it turns out that it does in fact work fine at that scale. There's just a one-line bug in the xdp_pping.c file. I forgot to actually *call* one line of packet cleanup code. Adding that, and everything was awesome.
>
> The entire patch that fixed it consists of adding one line:
> cleanup_packet_ts(packet_ts);
>
> Oops.
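For readers following along, a minimal sketch of what "clearing the map from userspace" can look like with libbpf is below: it walks a pinned timestamp map and deletes entries older than a cutoff. The pin path, key layout, and value layout here are illustrative assumptions, not the actual cpumap-pping definitions, and the in-kernel cleanup call mentioned above remains the cheaper option.

    /* sweep_packet_ts.c - illustrative userspace sweep of a pinned BPF hash map.
     * Build (assumed): gcc sweep_packet_ts.c -lbpf -o sweep_packet_ts */
    #include <bpf/bpf.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    /* Assumed key layout: flow identifier plus the TCP TSval being tracked. */
    struct packet_id {
        uint32_t saddr, daddr;
        uint16_t sport, dport;
        uint32_t tsval;
    };

    static uint64_t now_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts); /* same clock family as bpf_ktime_get_ns() */
        return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
    }

    int main(void)
    {
        /* Assumed pin path for the packet-timestamp map (value assumed to be a
         * u64 egress timestamp in nanoseconds). */
        int fd = bpf_obj_get("/sys/fs/bpf/tc/globals/packet_ts");
        if (fd < 0) {
            perror("bpf_obj_get");
            return 1;
        }

        const uint64_t max_age_ns = 2ull * 1000000000ull; /* expire entries older than 2 s */
        static struct packet_id to_del[16384];
        struct packet_id key, next;
        size_t n_del = 0;
        uint64_t stamp_ns;
        void *prev = NULL;

        /* Walk every key; collect stale ones first, then delete (deleting while
         * walking can make bpf_map_get_next_key skip or restart entries). */
        while (bpf_map_get_next_key(fd, prev, &next) == 0) {
            if (bpf_map_lookup_elem(fd, &next, &stamp_ns) == 0 &&
                now_ns() - stamp_ns > max_age_ns &&
                n_del < sizeof(to_del) / sizeof(to_del[0]))
                to_del[n_del++] = next;
            key = next;
            prev = &key;
        }
        for (size_t i = 0; i < n_del; i++)
            bpf_map_delete_elem(fd, &to_del[i]);

        printf("expired %zu stale entries\n", n_del);
        return 0;
    }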
> Anyway, with that in place it's running superbly. I did identify a couple of places in which it's being overly verbose with debug information, so I've patched that also.
>
> After reducing the overly eager warning about not being able to read a TCP header, CPU performance improved by another 2% on average.
>
> Longer-term (i.e. not on a Saturday morning, when I'd rather be playing with my daughter!), I think I'll look at raising some of the buffer sizes.
>
> Thanks,
> Herbert
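Raising "the buffer sizes" here would presumably mean bumping max_entries on the flow/timestamp maps so more than 16,384 packets can be in flight at once. As a rough illustration only (the map name, key type, and pinning choice are assumptions, not the actual cpumap-pping source), a BTF-style map definition on the BPF side looks like this:

    /* Fragment of a BPF-side map definition (compiled with clang -target bpf).
     * Doubling max_entries roughly doubles the kernel memory the map may use. */
    #include <linux/types.h>
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct packet_id {                       /* assumed key layout */
        __u32 saddr, daddr;
        __u16 sport, dport;
        __u32 tsval;
    };

    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 65536);          /* raised from 16384 */
        __type(key, struct packet_id);
        __type(value, __u64);                /* egress timestamp, ns */
        __uint(pinning, LIBBPF_PIN_BY_NAME); /* shared by both tc directions */
    } packet_ts SEC(".maps");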
> On Wed, Oct 19, 2022 at 11:13 AM Dave Taht <dave.taht@gmail.com> wrote:
>
>> PS - today's (free) p99 conference is *REALLY AWESOME*. https://www.p99conf.io/
>>
>> On Wed, Oct 19, 2022 at 9:13 AM Dave Taht <dave.taht@gmail.com> wrote:
>> >
>> > flent outputs a flent.gz file that I can parse and plot 20 different ways. Also the graphing tools work on osx.
>> >
>> > On Wed, Oct 19, 2022 at 9:11 AM Herbert Wolverson via LibreQoS <libreqos@lists.bufferbloat.net> wrote:
>> > >
>> > > That's true. The 12th gen does seem to have some "special" features... makes for a nice writing platform (this box is primarily my "write books and articles" machine). I'll be doing a wider test on a more normal platform, probably at the weekend (with real traffic, hence the delay - have to find a time in which I minimize disruption).
>> > >
>> > > On Wed, Oct 19, 2022 at 10:49 AM dan <dandenson@gmail.com> wrote:
>> > >>
>> > >> Those 'efficiency' threads in Intel 12th gen should probably be addressed as well. You can't turn them off in BIOS.
>> > >>
>> > >> On Wed, Oct 19, 2022 at 8:48 AM Robert Chacón via LibreQoS <libreqos@lists.bufferbloat.net> wrote:
>> > >>>
>> > >>> Awesome work on this!
>> > >>> I suspect there should be a slight performance bump once Hyperthreading is disabled and efficient power management is off. Hyperthreading/SMT always messes with HTB performance when I leave it on. Thank you for mentioning that - I now went ahead and added instructions on disabling hyperthreading on the Wiki for new users.
>> > >>> Super promising results!
>> > >>> Interested to see what throughput is with xdp-cpumap-tc vs cpumap-pping. So far in your VM setup it seems to be doing very well.
>> > >>>
>> > >>> On Wed, Oct 19, 2022 at 8:06 AM Herbert Wolverson via LibreQoS <libreqos@lists.bufferbloat.net> wrote:
>> > >>>>
>> > >>>> Also, I forgot to mention that I *think* the current version has removed the requirement that the inbound and outbound classifiers be placed on the same CPU. I know interduo was particularly keen on packing upload into fewer cores. I'll add that to my list of things to test.
>> > >>>>
>> > >>>> On Wed, Oct 19, 2022 at 9:01 AM Herbert Wolverson <herberticus@gmail.com> wrote:
>> > >>>>>
>> > >>>>> I'll definitely take a look - that does look interesting. I don't have X11 on any of my test VMs, but it looks like it can work without the GUI.
>> > >>>>>
>> > >>>>> Thanks!
>> > >>>>>
>> > >>>>> On Wed, Oct 19, 2022 at 8:58 AM Dave Taht <dave.taht@gmail.com> wrote:
>> > >>>>>>
>> > >>>>>> could I coax you to adopt flent?
>> > >>>>>>
>> > >>>>>> apt-get install flent netperf irtt fping
>> > >>>>>>
>> > >>>>>> You sometimes have to compile netperf yourself with --enable-demo on some systems. There are a bunch of python libs needed for the gui, but only on the client.
>> > >>>>>>
>> > >>>>>> Then you can run a really gnarly test series and plot the results over time.
>> > >>>>>>
>> > >>>>>> flent --socket-stats --step-size=.05 -t 'the-test-conditions' -H the_server_name rrul # 110 other tests
>> > >>>>>>
>> > >>>>>> On Wed, Oct 19, 2022 at 6:44 AM Herbert Wolverson via LibreQoS <libreqos@lists.bufferbloat.net> wrote:
>> > >>>>>> >
>> > >>>>>> > Hey,
>> > >>>>>> >
>> > >>>>>> > Testing the current version ( https://github.com/thebracket/cpumap-pping-hackjob ), it's doing better than I hoped. This build has shared (not per-cpu) maps, and a userspace daemon (xdp_pping) to extract and reset stats.
>> > >>>>>> >
>> > >>>>>> > My testing environment has grown a bit:
>> > >>>>>> > * ShaperVM - running Ubuntu Server and LibreQoS, with the new cpumap-pping-hackjob version of xdp-cpumap.
>> > >>>>>> > * ExtTest - running Ubuntu Server, set as 10.64.1.1. Hosts an iperf server.
>> > >>>>>> > * ClientInt1 - running Ubuntu Server (minimal), set as 10.64.1.2. Hosts an iperf client.
>> > >>>>>> > * ClientInt2 - running Ubuntu Server (minimal), set as 10.64.1.3. Hosts an iperf client.
>> > >>>>>> >
>> > >>>>>> > ClientInt1, ClientInt2 and one interface (LAN facing) of ShaperVM are on a virtual switch. ExtTest and the other interface (WAN facing) of ShaperVM are on a different virtual switch.
>> > >>>>>> >
>> > >>>>>> > These are all on a host machine running Windows 11: a core i7 12th gen, 32 GB RAM and a fast SSD setup.
>> > >>>>>> >
>> > >>>>>> > TEST 1: DUAL STREAMS, LOW THROUGHPUT
>> > >>>>>> >
>> > >>>>>> > For this test, LibreQoS is configured:
>> > >>>>>> > * Two APs, each with 5gbit/s max.
>> > >>>>>> > * 100.64.1.2 and 100.64.1.3 set up as CPEs, each limited to about 100mbit/s. They map to 1:5 and 2:5 respectively (separate CPUs).
>> > >>>>>> > * Set to use Cake.
>> > >>>>>> >
>> > >>>>>> > On each client, roughly simultaneously run: iperf -c 100.64.1.1 -t 500 (for a long run). Running xdp_pping yields correct results:
>> > >>>>>> >
>> > >>>>>> > [
>> > >>>>>> > {"tc":"1:5", "avg" : 4, "min" : 3, "max" : 5, "samples" : 11},
>> > >>>>>> > {"tc":"2:5", "avg" : 4, "min" : 3, "max" : 5, "samples" : 11},
>> > >>>>>> > {}]
>> > >>>>>> >
>> > >>>>>> > Or when I waited a while to gather/reset:
>> > >>>>>> >
>> > >>>>>> > [
>> > >>>>>> > {"tc":"1:5", "avg" : 4, "min" : 3, "max" : 6, "samples" : 60},
>> > >>>>>> > {"tc":"2:5", "avg" : 4, "min" : 3, "max" : 5, "samples" : 60},
>> > >>>>>> > {}]
>> > >>>>>> >
>> > >>>>>> > The ShaperVM shows no errors, just periodic logging that it is recording data. CPU is about 2-3% on two CPUs, zero on the others (as expected).
>> > >>>>>> >
>> > >>>>>> > After 500 seconds of continual iperfing, each client reported a throughput of 104 Mbit/sec and 6.06 GBytes of data transmitted.
>> > >>>>>> >
>> > >>>>>> > So for smaller streams, I'd call this a success.
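A side note for anyone decoding the "tc":"1:5" keys in that output: a tc classid/handle is a single 32-bit value with the major number in the upper 16 bits and the minor in the lower 16, which is how one u32 can travel between the BPF classifier and the reporting tool. A small standalone illustration (not code from xdp_pping itself):

    /* tc_handle.c - pack/unpack a tc "major:minor" handle the way the kernel
     * stores it (major in the top 16 bits, minor in the bottom 16). */
    #include <stdint.h>
    #include <stdio.h>

    static uint32_t tc_handle_make(uint16_t major, uint16_t minor)
    {
        return ((uint32_t)major << 16) | minor;
    }

    static void tc_handle_print(uint32_t handle, char *buf, size_t len)
    {
        snprintf(buf, len, "%u:%u", handle >> 16, handle & 0xFFFF);
    }

    int main(void)
    {
        char buf[16];
        uint32_t h = tc_handle_make(1, 5);   /* the CPE shaped as 1:5 above */
        tc_handle_print(h, buf, sizeof(buf));
        printf("0x%08x -> %s\n", h, buf);    /* prints: 0x00010005 -> 1:5 */
        return 0;
    }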
>> > >>>>>> >
>> > >>>>>> > TEST 2: DUAL STREAMS, HIGH THROUGHPUT
>> > >>>>>> >
>> > >>>>>> > For this test, LibreQoS is configured:
>> > >>>>>> > * Two APs, each with 5gb/s max.
>> > >>>>>> > * 100.64.1.2 and 100.64.1.3 set up as CPEs, each limited to 5Gbit/s! Mapped to 1:5 and 2:5 respectively (separate CPUs).
>> > >>>>>> >
>> > >>>>>> > Run iperf -c 100.64.1.1 -t 500 on each client at the same time.
>> > >>>>>> >
>> > >>>>>> > xdp_pping shows results, too:
>> > >>>>>> >
>> > >>>>>> > [
>> > >>>>>> > {"tc":"1:5", "avg" : 4, "min" : 1, "max" : 7, "samples" : 58},
>> > >>>>>> > {"tc":"2:5", "avg" : 7, "min" : 3, "max" : 11, "samples" : 58},
>> > >>>>>> > {}]
>> > >>>>>> >
>> > >>>>>> > [
>> > >>>>>> > {"tc":"1:5", "avg" : 5, "min" : 4, "max" : 8, "samples" : 13},
>> > >>>>>> > {"tc":"2:5", "avg" : 8, "min" : 7, "max" : 10, "samples" : 13},
>> > >>>>>> > {}]
>> > >>>>>> >
>> > >>>>>> > The ShaperVM shows two CPUs pegging between 70 and 90 percent.
>> > >>>>>> >
>> > >>>>>> > After 500 seconds of continual iperfing, the clients reported throughputs of 2.72 Gbits/sec (158 GBytes) and 3.89 Gbits/sec (226 GBytes).
>> > >>>>>> >
>> > >>>>>> > Maxing out Hyper-V like this is inducing a bit of latency (which is to be expected), but it's not bad. I also forgot to disable hyperthreading, and looking at the host performance it is sometimes running the second virtual CPU on an underpowered "fake" CPU.
>> > >>>>>> >
>> > >>>>>> > So for two large streams, I think we're doing pretty well also!
>> > >>>>>> >
>> > >>>>>> > TEST 3: DUAL STREAMS, SINGLE CPU
>> > >>>>>> >
>> > >>>>>> > This test is designed to try and blow things up. It's the same as test 2, but both CPEs are set to the same CPU (1), using TC handles 1:5 and 1:6.
>> > >>>>>> >
>> > >>>>>> > ShaperVM CPU1 maxed out in the high 90s, the other CPUs were idle. The pping stats start to show a bit of degradation in performance for pounding it so hard:
>> > >>>>>> >
>> > >>>>>> > [
>> > >>>>>> > {"tc":"1:6", "avg" : 10, "min" : 9, "max" : 19, "samples" : 24},
>> > >>>>>> > {"tc":"1:5", "avg" : 10, "min" : 8, "max" : 18, "samples" : 24},
>> > >>>>>> > {}]
>> > >>>>>> >
>> > >>>>>> > For whatever reason, it smoothed out over time:
>> > >>>>>> >
>> > >>>>>> > [
>> > >>>>>> > {"tc":"1:6", "avg" : 10, "min" : 9, "max" : 12, "samples" : 50},
>> > >>>>>> > {"tc":"1:5", "avg" : 10, "min" : 8, "max" : 13, "samples" : 50},
>> > >>>>>> > {}]
>> > >>>>>> >
>> > >>>>>> > Surprisingly (to me), I didn't encounter errors. Each client received 2.22 Gbit/s performance, over 129 GBytes of data.
>> > >>>>>> >
>> > >>>>>> > TEST 4: DUAL STREAMS, 50 SUB-STREAMS
>> > >>>>>> >
>> > >>>>>> > This test is also designed to break things. Same as test 3, but using iperf -c 100.64.1.1 -P 50 -t 120 - 50 substreams, to try and really tax the flow tracking. (Shorter time window because I really wanted to go and find coffee.)
>> > >>>>>> >
>> > >>>>>> > ShaperVM CPU sat at around 80-97%, tending towards 97%. pping results show that this torture test is worsening performance, and there's always lots of samples in the buffer:
>> > >>>>>> >
>> > >>>>>> > [
>> > >>>>>> > {"tc":"1:6", "avg" : 23, "min" : 19, "max" : 27, "samples" : 49},
>> > >>>>>> > {"tc":"1:5", "avg" : 24, "min" : 19, "max" : 27, "samples" : 49},
>> > >>>>>> > {}]
>> > >>>>>> >
>> > >>>>>> > This test also ran better than I expected. Each VM showed around 2.4 Gbit/s in total performance at the end of the iperf session. There's definitely some latency creeping in as I make the system work hard, which is expected - but I'm not sure I expected quite that much.
>> > >>>>>> >
>> > >>>>>> > WHAT'S NEXT & CONCLUSION
>> > >>>>>> >
>> > >>>>>> > I noticed that I forgot to turn off efficient power management on my VMs and host, and left Hyperthreading on by mistake. So that hurts overall performance.
>> > >>>>>> >
>> > >>>>>> > The base system seems to be working pretty solidly, at least for small tests. Next up, I'll be removing extraneous debug reporting code, removing some code paths that don't do anything but report, and looking for any small optimization opportunities. I'll then re-run these tests. Once that's done, I hope to find a maintenance window on my WISP and try it with actual traffic.
>> > >>>>>> >
>> > >>>>>> > I also need to re-run these tests without the pping system to provide some before/after analysis.
>> > >>>>>> >
>> > >>>>>> > On Tue, Oct 18, 2022 at 1:01 PM Herbert Wolverson <herberticus@gmail.com> wrote:
>> > >>>>>> >>
>> > >>>>>> >> It's probably not entirely thread-safe right now (ran into some issues reading per_cpu maps back from userspace; hopefully, I'll get that figured out) - but the commits I just pushed have it basically working on single-stream testing. :-)
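On the per-CPU map readback issue mentioned above: when userspace does a lookup on a BPF_MAP_TYPE_PERCPU_* map, the kernel returns one value per possible CPU in a single buffer, so the caller has to size the buffer accordingly and aggregate the per-CPU copies itself. A minimal sketch (the pin path, key type, and value type are assumptions for illustration):

    /* read_percpu.c - reading one key from a pinned per-CPU BPF map.
     * Illustrative only; build (assumed): gcc read_percpu.c -lbpf */
    #include <bpf/bpf.h>
    #include <bpf/libbpf.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int fd = bpf_obj_get("/sys/fs/bpf/tc/globals/rtt_samples"); /* assumed pin path */
        if (fd < 0) {
            perror("bpf_obj_get");
            return 1;
        }

        int ncpus = libbpf_num_possible_cpus();
        /* Per-CPU lookups return ncpus values; each slot is padded to 8 bytes,
         * so a u64 value type lines up exactly. */
        uint64_t *vals = calloc(ncpus, sizeof(uint64_t));
        uint32_t key = (1 << 16) | 5;        /* tc handle 1:5, assumed key type */

        if (bpf_map_lookup_elem(fd, &key, vals) == 0) {
            uint64_t total = 0;
            for (int i = 0; i < ncpus; i++)
                total += vals[i];            /* with flows pinned to one core,
                                                typically only one slot is non-zero */
            printf("1:5 -> %llu\n", (unsigned long long)total);
        }
        free(vals);
        return 0;
    }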
>> > >>>>>> >>
>> > >>>>>> >> Setup cpumap as usual, and periodically run xdp-pping. This gives you per-connection RTT information in JSON:
>> > >>>>>> >>
>> > >>>>>> >> [
>> > >>>>>> >> {"tc":"1:5", "avg" : 5, "min" : 5, "max" : 5, "samples" : 1},
>> > >>>>>> >> {}]
>> > >>>>>> >>
>> > >>>>>> >> (With the extra {} because I'm not tracking the tail and haven't done comma removal). The tool also empties the various maps used to gather data, acting as a "reset" point. There's a max of 60 samples per queue, in a ringbuffer setup (so newest will start to overwrite the oldest).
>> > >>>>>> >>
>> > >>>>>> >> I'll start trying to test on a larger scale now.
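To make the reporting side concrete, here is a rough sketch of how a 60-sample ring per tc handle could be summarized into that JSON, including comma handling that makes the trailing {} unnecessary. The struct layout and the demo data are assumptions for illustration, not the actual xdp_pping code:

    /* summarize.c - turn a per-queue ring of RTT samples into one JSON entry each. */
    #include <stdint.h>
    #include <stdio.h>

    #define RING_SLOTS 60

    struct rtt_ring {
        uint32_t tc_handle;               /* (major << 16) | minor */
        uint32_t count;                   /* total samples ever written */
        uint32_t samples_us[RING_SLOTS];  /* newest overwrites oldest */
    };

    /* Print one JSON object; returns 1 if something was printed. */
    static int print_entry(const struct rtt_ring *r, int first)
    {
        uint32_t n = r->count < RING_SLOTS ? r->count : RING_SLOTS;
        if (n == 0)
            return 0;
        uint64_t sum = 0;
        uint32_t min = r->samples_us[0], max = r->samples_us[0];
        for (uint32_t i = 0; i < n; i++) {
            uint32_t s = r->samples_us[i];
            sum += s;
            if (s < min) min = s;
            if (s > max) max = s;
        }
        printf("%s{\"tc\":\"%u:%u\", \"avg\" : %llu, \"min\" : %u, \"max\" : %u, \"samples\" : %u}",
               first ? "" : ",\n",
               r->tc_handle >> 16, r->tc_handle & 0xFFFF,
               (unsigned long long)(sum / n), min, max, n);
        return 1;
    }

    int main(void)
    {
        struct rtt_ring demo[2] = {       /* made-up sample data */
            { (1 << 16) | 5, 3, { 4, 3, 5 } },
            { (2 << 16) | 5, 2, { 4, 5 } },
        };
        int first = 1;
        printf("[\n");
        for (int i = 0; i < 2; i++)
            if (print_entry(&demo[i], first))
                first = 0;
        printf("\n]\n");
        return 0;
    }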
>> > >>>>>> >>
>> > >>>>>> >> On Mon, Oct 17, 2022 at 3:34 PM Robert Chacón <robert.chacon@jackrabbitwireless.com> wrote:
>> > >>>>>> >>>
>> > >>>>>> >>> Hey Herbert,
>> > >>>>>> >>>
>> > >>>>>> >>> Fantastic work! Super exciting to see this coming together, especially so quickly. I'll test it soon.
>> > >>>>>> >>> I understand and agree with your decision to omit certain features (ICMP tracking, DNS tracking, etc.) to optimize performance for our use case. Like you said, in order to merge the functionality without a performance hit, merging them is sort of the only way right now. Otherwise there would be a lot of redundancy and lost throughput for an ISP's use. Though hopefully long term there will be a way to keep all projects working independently but interoperably with a plugin system of some kind.
>> > >>>>>> >>>
>> > >>>>>> >>> By the way, I'm making some headway on LibreQoS v1.3. Focusing on optimizations for high sub counts (8000+ subs) as well as stateful changes to the queue structure. I'm working to set up a physical lab to test high throughput and high client count scenarios.
>> > >>>>>> >>> When testing beyond ~32,000 filters we get "no space left on device" from xdp-cpumap-tc, which I think relates to the bpf map size limitation you mentioned. Maybe in the coming months we can take a look at that.
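If that limit does turn out to be a BPF map's max_entries rather than a tc-side limit, one way to raise it without editing the BPF source is to override it from the loader before the object is loaded. A hedged sketch with libbpf; the object file name and map name below are placeholders, not the actual xdp-cpumap-tc names:

    /* bump_map.c - raise a map's max_entries at load time with libbpf. */
    #include <bpf/libbpf.h>
    #include <stdio.h>

    int main(void)
    {
        /* Placeholder object path; the real loader would point at its own .o file. */
        struct bpf_object *obj = bpf_object__open_file("xdp_prog_kern.o", NULL);
        if (!obj) {
            fprintf(stderr, "failed to open BPF object\n");
            return 1;
        }

        /* Placeholder map name; would need to be the real IP->CPU/classid map. */
        struct bpf_map *map = bpf_object__find_map_by_name(obj, "map_ip_to_cpu");
        if (!map) {
            fprintf(stderr, "map not found\n");
            return 1;
        }

        bpf_map__set_max_entries(map, 131072);  /* room well past 32k entries */

        if (bpf_object__load(obj)) {
            fprintf(stderr, "load failed\n");
            return 1;
        }
        printf("loaded with enlarged map\n");
        bpf_object__close(obj);
        return 0;
    }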
>> > >>>>>> >>>
>> > >>>>>> >>> Anyway great work on the cpumap-pping program! Excited to see more on this.
>> > >>>>>> >>>
>> > >>>>>> >>> Thanks,
>> > >>>>>> >>> Robert
>> > >>>>>> >>>
>> > >>>>>> >>> On Mon, Oct 17, 2022 at 12:45 PM Herbert Wolverson via LibreQoS <libreqos@lists.bufferbloat.net> wrote:
>> > >>>>>> >>>>
>> > >>>>>> >>>> Hey,
>> > >>>>>> >>>>
>> > >>>>>> >>>> My current (unfinished) progress on this is now available here: https://github.com/thebracket/cpumap-pping-hackjob
>> > >>>>>> >>>>
>> > >>>>>> >>>> I mean it about the warnings: this isn't at all stable or debugged - and I can't promise that it won't unleash the nasal demons (to use a popular C++ phrase). The name is descriptive! ;-)
>> > >>>>>> >>>>
>> > >>>>>> >>>> With that said, I'm pretty happy so far:
>> > >>>>>> >>>>
>> > >>>>>> >>>> * It runs only on the classifier - which xdp-cpumap-tc has nicely shunted onto a dedicated CPU. It has to run on both the inbound and outbound classifiers, since otherwise it would only see half the conversation.
>> > >>>>>> >>>> * It does assume that your ingress and egress CPUs are mapped to the same interface; I do that anyway in BracketQoS. Not doing that opens up a potential world of pain, since writes to the shared maps would require a locking scheme. Too much locking, and you lose all of the benefit of using multiple CPUs to begin with.
>> > >>>>>> >>>> * It is pretty wasteful of RAM, but most of the shaper systems I've worked with have lots of it.
>> > >>>>>> >>>> * I've been gradually removing features that I don't want for BracketQoS. A hypothetical future "useful to everyone" version wouldn't do that.
>> > >>>>>> >>>> * Rate limiting is working, but I removed the requirement for a shared configuration provided from userland - so right now it's always set to report at 1 second intervals per stream.
>> > >>>>>> >>>>
>> > >>>>>> >>>> My testbed is currently 3 Hyper-V VMs - a simple "client" and "world", and a "shaper" VM in between running a slightly hacked-up LibreQoS. iperf from "client" to "world" (with Libre set to allow 10gbit/s max, via a cake/HTB queue setup) is around 5 gbit/s at present, on my test PC (the host is a core i7, 12th gen, 12 cores - 64 GB RAM and fast SSDs).
>> > >>>>>> >>>>
>> > >>>>>> >>>> Output currently consists of debug messages reading:
>> > >>>>>> >>>>   cpumap/0/map:4-1371 [000] D..2. 515.399222: bpf_trace_printk: (tc) Flow open event
>> > >>>>>> >>>>   cpumap/0/map:4-1371 [000] D..2. 515.399239: bpf_trace_printk: (tc) Send performance event (5,1), 374696
>> > >>>>>> >>>>   cpumap/0/map:4-1371 [000] D..2. 515.399466: bpf_trace_printk: (tc) Flow open event
>> > >>>>>> >>>>   cpumap/0/map:4-1371 [000] D..2. 515.399475: bpf_trace_printk: (tc) Send performance event (5,1), 247069
>> > >>>>>> >>>>   cpumap/0/map:4-1371 [000] D..2. 516.405151: bpf_trace_printk: (tc) Send performance event (5,1), 5217155
>> > >>>>>> >>>>   cpumap/0/map:4-1371 [000] D..2. 517.405248: bpf_trace_printk: (tc) Send performance event (5,1), 4515394
>> > >>>>>> >>>>   cpumap/0/map:4-1371 [000] D..2. 518.406117: bpf_trace_printk: (tc) Send performance event (5,1), 4481289
>> > >>>>>> >>>>   cpumap/0/map:4-1371 [000] D..2. 519.406255: bpf_trace_printk: (tc) Send performance event (5,1), 4255268
>> > >>>>>> >>>>   cpumap/0/map:4-1371 [000] D..2. 520.407864: bpf_trace_printk: (tc) Send performance event (5,1), 5249493
>> > >>>>>> >>>>   cpumap/0/map:4-1371 [000] D..2. 521.406664: bpf_trace_printk: (tc) Send performance event (5,1), 3795993
>> > >>>>>> >>>>   cpumap/0/map:4-1371 [000] D..2. 522.407469: bpf_trace_printk: (tc) Send performance event (5,1), 3949519
>> > >>>>>> >>>>   cpumap/0/map:4-1371 [000] D..2. 523.408126: bpf_trace_printk: (tc) Send performance event (5,1), 4365335
>> > >>>>>> >>>>   cpumap/0/map:4-1371 [000] D..2. 524.408929: bpf_trace_printk: (tc) Send performance event (5,1), 4154910
>> > >>>>>> >>>>   cpumap/0/map:4-1371 [000] D..2. 525.410048: bpf_trace_printk: (tc) Send performance event (5,1), 4405582
>> > >>>>>> >>>>   cpumap/0/map:4-1371 [000] D..2. 525.434080: bpf_trace_printk: (tc) Send flow event
>> > >>>>>> >>>>   cpumap/0/map:4-1371 [000] D..2. 525.482714: bpf_trace_printk: (tc) Send flow event
>> > >>>>>> >>>>
>> > >>>>>> >>>> The times haven't been tweaked yet. The (5,1) is the tc handle major/minor, allocated by the xdp-cpumap parent. I get pretty low latency between VMs; I'll set up a test with some real-world data very soon.
>> > >>>>>> >>>>
>> > >>>>>> >>>> I plan to keep hacking away, but feel free to take a peek.
>> > >>>>>> >>>>
>> > >>>>>> >>>> Thanks,
>> > >>>>>> >>>> Herbert
>> > >>>>>> >>>>
>> > >>>>>> >>>> On Mon, Oct 17, 2022 at 10:14 AM Simon Sundberg <Simon.Sundberg@kau.se> wrote:
>> > >>>>>> >>>>>
>> > >>>>>> >>>>> Hi, thanks for adding me to the conversation. Just a couple of quick notes.
>> > >>>>>> >>>>>
>> > >>>>>> >>>>> On Mon, 2022-10-17 at 16:13 +0200, Toke Høiland-Jørgensen wrote:
>> > >>>>>> >>>>> > [ Adding Simon to Cc ]
>> > >>>>>> >>>>> >
>> > >>>>>> >>>>> > Herbert Wolverson via LibreQoS <libreqos@lists.bufferbloat.net> writes:
>> > >>>>>> >>>>> >
>> > >>>>>> >>>>> > > Hey,
>> > >>>>>> >>>>> > >
>> > >>>>>> >>>>> > > I've had some pretty good success with merging xdp-pping ( https://github.com/xdp-project/bpf-examples/blob/master/pping/pping.h ) into xdp-cpumap-tc ( https://github.com/xdp-project/xdp-cpumap-tc ).
>> > >>>>>> >>>>> > >
>> > >>>>>> >>>>> > > I ported over most of the xdp-pping code, and then changed the entry point and packet parsing code to make use of the work already done in xdp-cpumap-tc (it's already parsed a big chunk of the packet, no need to do it twice). Then I switched the maps to per-cpu maps, and had to pin them - otherwise the two tc instances don't properly share data.
>> > >>>>>> >>>>> > >
>> > >>>>>> >>>>>
>> > >>>>>> >>>>> I guess the xdp-cpumap-tc ensures that the same flow is processed on the same CPU core at both ingress and egress. Otherwise, if a flow may be processed by different cores on ingress and egress, the per-CPU maps will not really work reliably, as each core will have a different view on the state of the flow, whether there's been a previous packet with a certain TSval from that flow, etc.
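For anyone new to the pping approach being merged here: the RTT measurement works by remembering when a given TCP timestamp value (TSval) was seen leaving in one direction, and matching it against the echoed timestamp (TSecr) coming back in the other direction. A stripped-down, userspace-only toy of that matching logic follows; the real implementation lives in the BPF programs, keys on the flow as well as the timestamp, and stores its state in BPF maps rather than this little table:

    /* tsval_rtt.c - toy illustration of TSval/TSecr based RTT matching. */
    #include <stdint.h>
    #include <stdio.h>

    #define SLOTS 1024

    struct entry { uint32_t tsval; uint64_t sent_ns; int used; };
    static struct entry table[SLOTS];

    /* Outbound packet: remember when this TSval was first seen. */
    static void on_egress(uint32_t tsval, uint64_t now_ns)
    {
        struct entry *e = &table[tsval % SLOTS];
        if (!e->used || e->tsval != tsval) {
            e->tsval = tsval;
            e->sent_ns = now_ns;
            e->used = 1;
        }
    }

    /* Inbound packet: the peer echoes our TSval back as TSecr. */
    static int on_ingress(uint32_t tsecr, uint64_t now_ns, uint64_t *rtt_ns)
    {
        struct entry *e = &table[tsecr % SLOTS];
        if (e->used && e->tsval == tsecr) {
            *rtt_ns = now_ns - e->sent_ns;
            e->used = 0;            /* one RTT sample per timestamp value */
            return 1;
        }
        return 0;
    }

    int main(void)
    {
        uint64_t rtt;
        on_egress(1000, 50000000);            /* TSval 1000 sent at t = 50 ms   */
        if (on_ingress(1000, 54200000, &rtt)) /* echoed back at t = 54.2 ms     */
            printf("RTT = %.1f ms\n", rtt / 1e6);
        return 0;
    }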
>> > >>>>>> >>>>>
>> > >>>>>> >>>>> Furthermore, if a flow is always processed on the same core (on both ingress and egress), I think per-CPU maps may be a bit wasteful on memory. From my understanding the keys for per-CPU maps are still shared across all CPUs, it's just that each CPU gets its own value. So all CPUs will then have their own data for each flow, but it's only the CPU processing the flow that will have any relevant data for the flow, while the remaining CPUs will just have an empty state for that flow. Under the same assumption that packets within the same flow are always processed on the same core, there should generally not be any concurrency issues with having a global (non-per-CPU) map either, as packets from the same flow cannot be processed concurrently then (and thus no concurrent access to the same value in the map). I am however still very unclear on whether there's any considerable performance impact between global and per-CPU map versions if the same key is not accessed concurrently.
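Concretely, the two variants being weighed look almost identical on the BPF side; the difference is that the per-CPU type allocates one value per possible CPU for every key, while the global type holds a single value per key and relies on the flow staying on one core (or on explicit locking/atomics) for correctness. A sketch, with an assumed flow-state layout rather than the actual pping structures:

    /* Fragment of BPF-side map definitions (clang -target bpf).
     * struct flow_state is an assumed placeholder for the per-flow data. */
    #include <linux/types.h>
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct flow_state {
        __u32 last_tsval;
        __u64 last_seen_ns;
        __u32 min_rtt_us;
    };

    /* Per-CPU: every key carries one copy of the value per possible CPU.
     * Lock-free, but only the core that owns the flow ever fills its copy in. */
    struct {
        __uint(type, BPF_MAP_TYPE_PERCPU_HASH);
        __uint(max_entries, 16384);
        __type(key, __u64);                  /* assumed flow hash */
        __type(value, struct flow_state);
    } flow_state_percpu SEC(".maps");

    /* Global: one value per key. Fine without locks if ingress and egress of a
     * flow are pinned to the same CPU, as xdp-cpumap-tc arranges. */
    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 16384);
        __type(key, __u64);
        __type(value, struct flow_state);
    } flow_state_global SEC(".maps");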
>> > >>>>>> >>>>>
>> > >>>>>> >>>>> > > Right now, output is just stubbed - I've still got to port the perfmap output code. Instead, I'm dumping a bunch of extra data to the kernel debug pipe, so I can see roughly what the output would look like.
>> > >>>>>> >>>>> > >
>> > >>>>>> >>>>> > > With debug enabled and just logging I'm now getting about 4.9 Gbits/sec on single-stream iperf between two VMs (with a shaper VM in the middle). :-)
>> > >>>>>> >>>>> >
>> > >>>>>> >>>>> > Just FYI, that "just logging" is probably the biggest source of overhead, then. What Simon found was that sending the data from kernel to userspace is one of the most expensive bits of epping, at least when the number of data points goes up (which it does as additional flows are added).
>> > >>>>>> >>>>>
>> > >>>>>> >>>>> Yeah, reporting individual RTTs when there's lots of them (you may get upwards of 1000 RTTs/s per flow) is not only problematic in terms of direct overhead from the tool itself, but also becomes demanding for whatever you use all those RTT samples for (i.e. you need to log, parse, analyze etc. a very large amount of RTTs). One way to deal with that is of course to just apply some sort of sampling (the -r/--rate-limit and -R/--rtt-rate options).
>> > >>>>>> >>>>>
>> > >>>>>> >>>>> > > So my question: how would you prefer to receive this data? I'll have to write a daemon that provides userspace control (periodic cleanup as well as reading the performance stream), so the world's kinda our oyster. I can stick to Kathie's original format (and dump it to a named pipe, perhaps?), a condensed format that only shows what you want to use, an efficient binary format if you feel like parsing that...
>> > >>>>>> >>>>> >
>> > >>>>>> >>>>> > It would be great if we could combine efforts a bit here so we don't fork the codebase more than we have to. I.e., if "upstream" epping and whatever daemon you end up writing can agree on data format etc. that would be fantastic! Added Simon to Cc to facilitate this :)
>> > >>>>>> >>>>> >
>> > >>>>>> >>>>> > Briefly, what I've discussed before with Simon was to have the ability to aggregate the metrics in the kernel (WiP PR [0]) and have a userspace utility periodically pull them out. What we discussed was doing this using an LPM map (which is not in that PR yet). The idea would be that userspace would populate the LPM map with the keys (prefixes) they wanted statistics for (in LibreQOS context that could be one key per customer, for instance). Epping would then do a map lookup into the LPM, and if it gets a match it would update the statistics in that map entry (keeping a histogram of latency values seen, basically). Simon's PR below uses this technique, where userspace will "reset" the histogram every time it loads it by swapping out two different map entries when it does a read; this allows you to control the sampling rate from userspace, and you'll just get the data since the last time you polled.
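A rough sketch of what that LPM-keyed aggregation could look like on the BPF side, just to make the idea concrete; the prefix count, bucket layout, and IPv4-only key are assumptions here, and the real work is in the WiP PR referenced below:

    /* Fragment of a BPF-side LPM aggregation sketch (clang -target bpf). */
    #include <linux/types.h>
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct lpm_v4_key {
        __u32 prefixlen;            /* LPM keys must start with the prefix length */
        __u32 addr;                 /* IPv4 address, network byte order */
    };

    struct rtt_hist {
        __u64 bucket[32];           /* log2(us) latency histogram */
    };

    struct {
        __uint(type, BPF_MAP_TYPE_LPM_TRIE);
        __uint(map_flags, BPF_F_NO_PREALLOC);   /* required for LPM tries */
        __uint(max_entries, 16384);             /* e.g. one prefix per customer */
        __type(key, struct lpm_v4_key);
        __type(value, struct rtt_hist);
    } rtt_by_prefix SEC(".maps");

    /* Called from the classifier once an RTT sample (in microseconds) is known;
     * userspace pre-populates rtt_by_prefix with the prefixes it cares about. */
    static __always_inline void record_rtt(__u32 addr, __u32 rtt_us)
    {
        struct lpm_v4_key key = { .prefixlen = 32, .addr = addr };
        struct rtt_hist *h = bpf_map_lookup_elem(&rtt_by_prefix, &key);
        if (!h)
            return;                 /* no one asked for stats on this prefix */

        __u32 b = 0;
        while (b < 31 && (rtt_us >> (b + 1)))   /* bounded loop: b = floor(log2(rtt_us)) */
            b++;
        __sync_fetch_and_add(&h->bucket[b], 1);
    }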
>> > >>>>>> >>>>>
>> > >>>>>> >>>>> Thanks, Toke, for summarizing both the current state and the plan going forward. I will just note that this PR (and all my other work with ePPing/BPF-PPing/XDP-PPing/I-suck-at-names-PPing) will be more or less on hold for a couple of weeks right now as I'm trying to finish up a paper.
>> > >>>>>> >>>>>
>> > >>>>>> >>>>> > I was thinking that if we all can agree on the map format, then your polling daemon could be one userspace "client" for that, and the epping binary itself could be another; but we could keep compatibility between the two, so we don't duplicate effort.
>> > >>>>>> >>>>> >
>> > >>>>>> >>>>> > Similarly, refactoring of the epping code itself so it can be plugged into the cpumap-tc code would be a good goal...
>> > >>>>>> >>>>>
>> > >>>>>> >>>>> Should probably do that... at some point. In general I think it's a bit of an interesting problem to think about how to chain multiple XDP/tc programs together in an efficient way. Most XDP and tc programs will do some amount of packet parsing, and when you have many chained programs parsing the same packets this obviously becomes a bit wasteful. At the same time, it would be nice if one didn't need to manually merge multiple programs together into a single one like this to get rid of this duplicated parsing, or at least make that process of merging those programs as simple as possible.
>> > >>>>>> >>>>>
>> > >>>>>> >>>>> > -Toke
>> > >>>>>> >>>>> >
>> > >>>>>> >>>>> > [0] https://github.com/xdp-project/bpf-examples/pull/59
>> > >>>>>> >>>>>
>> > >>>>>> >>>>> When you send an e-mail to Karlstad University, we will process your personal data <https://www.kau.se/en/gdpr>.
>> > >>>
>> > >>> --
>> > >>> Robert Chacón
>> > >>> CEO | JackRabbit Wireless LLC
>>
>> --
>> This song goes out to all the folk that thought Stadia would work: https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
>> Dave Täht CEO, TekLibre, LLC
>
> _______________________________________________
> LibreQoS mailing list
> LibreQoS@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/libreqos