[LibreQoS] Integration system, aka fun with graph theory

Herbert Wolverson herberticus at gmail.com
Tue Nov 1 09:38:51 EDT 2022


Dave: in this case, I'm running inside the eBPF VM - so I'm already in
kernel space, but have a very limited set of functions available.
bpf_ktime_get_ns() seems to be the approved way to get the clock. There was
a big debate about it using the kernel's monotonic clock, which takes longer
to sample. I'm guessing they improved that, because I'm not seeing the
delay that some people were complaining about (it's not free, but it's also
a *lot* faster than the estimates I was finding).
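For a rough feel for what a clock read costs, here's a userspace sketch using Python's `time.monotonic_ns()`, which sits on top of the same CLOCK_MONOTONIC source that `bpf_ktime_get_ns()` reads. This is only an analogue (the function name is mine, and userspace overhead differs from the in-kernel helper), not a kernel-accurate benchmark:

```python
import time

def monotonic_read_cost(samples: int = 100_000) -> float:
    """Rough average cost, in nanoseconds, of one monotonic clock read.

    Includes loop overhead, so it's an upper bound on the per-read cost.
    """
    start = time.monotonic_ns()
    for _ in range(samples):
        time.monotonic_ns()
    elapsed = time.monotonic_ns() - start
    return elapsed / samples

if __name__ == "__main__":
    print(f"~{monotonic_read_cost():.0f} ns per monotonic clock read")
```

Numbers vary a lot by CPU and kernel, which matches the wide range of estimates floating around.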

> > Preseem's numbers are 0-74 green, 75-124 yellow, 125-200 red, and they
> > just consolidate everything >200 to 200, basically so there's no
> > 'terrible' color lol.
> I am sorry to hear those numbers are considered to be good.

It's interesting that you see adverts on Wisp Talk (the FB group) showing
"wow, half my APs are now green!" (and showing about 50% green, 25% yellow,
25% red). When we had Preseem, we always took "red" to mean "oh no,
something's really wrong" - and got to work fixing it. There were a couple
of distant (many hops down the chain) APs that struggled to stay yellow,
but red was always a sign for battle stations. I think that's part of why
WISPs suffer from customers who "jump ship as soon as something better
comes along" - I'd be jumping ship too, if my ISP expected me to "enjoy"
125-200 ms RTT
latency for any extended period of time (I'm pretty understanding about
"something went wrong, we're working on it").
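For concreteness, that banding reduces to a tiny threshold lookup. A sketch (the function name is mine; the thresholds are the ones quoted in this thread, with everything over 200 already clamped into "red"):

```python
def preseem_band(rtt_ms: float) -> str:
    """Map a measured RTT (ms) to the Preseem-style color band.

    Thresholds as quoted in this thread: 0-74 green, 75-124 yellow,
    125+ red (Preseem clamps anything over 200 to 200, so there is
    no 'terrible' band beyond red).
    """
    if rtt_ms < 75:
        return "green"
    if rtt_ms < 125:
        return "yellow"
    return "red"
```

A configurable version would just take the two thresholds as parameters, which is what the "these need to be configurable" point below amounts to.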

Geography does play a large part. I'll see if I can resurrect a tool I had
that turned RTT latency measurements into a Google Maps heatmap overlay
(updating, so you could see the orange/red areas moving when the network
suffered). It can be pretty tough to find a good upstream far from towns,
which affects everything. More than that, deep chains of backhauls add up - and
add up fast if you have any sort of congestion issue along the way. For
example:

   - We have a pretty decently connected upstream, averaging 8ms ping
   round-trip time to Cloudflare's DNS.
   - Going down our "hottest" path (60 GHz AF60 LR to a tower, and then
   another one to a 3,000-bed apartment complex - peaks at 900 Mbit/s every
   night; will peak at a lot more than that as soon as their check clears for
   some Siklu gear), we worked *stupidly hard* to keep the average ping
   time there at 9ms to Cloudflare's DNS. Even then, it's closer to 16ms when
   fully loaded. They are a topic for a future Cake discussion. :-)
   - We have a few clients connected directly off of the facility with the
   upstream - and they all get great RTT times (a mix of 5.8 and 3.6 CBRS;
   Wave coming as soon as it's in stock at the same time as the guy with the
   money being at a keyboard!).
   - Our largest (by # of customers) tower is 11 miles away, currently fed
   by 2 AirFiber 5XHD (ECMP balanced). We've worked really hard to keep that
   tower's average ping time to Cloudflare at 18ms. We have some nicer radios
   (the Cambium 400C is a beast) going in soon, which should help.
      - That tower feeds 4 micro-pops. The worst is near line-of-sight
      (trees) on a 3.6 GHz Medusa. It suffers a bit, at 33ms round-trip
      ping times to Cloudflare. The best averages 22ms ping times to
      Cloudflare.
   - We have a bunch more sites behind a 13 mile backhaul hop (followed by
   a 3 mile backhaul hop; geography meant going around a tree-covered ridge).
We've had a heck of a time getting that up to scratch; the AF5XHD kinda worked,
   but the experience was pretty wretched. They were the testbed for the
   Cambium 400C, and now average 22ms to Cloudflare.
      - There are 15 (!) small towers behind that one! We eventually got
      the most distant one to 35ms to Cloudflare pings - but ripped/replaced
      SO much hardware to get there. (Even then, customer experience at some
      of those sites isn't what I'd like; I just tried a ping test from a
      customer running a 2.4 GHz "elevated" Ubiquiti dish to an old ePMP
      1000 - at a tower 5 hops in. 45-50ms to Cloudflare. Not great.)

Physics dictates that the tiny towers, separated from the core by miles of
backhaul and hops between them, aren't going to perform as well as the
nearby ones. You *can* get them going well, but it's expensive and
time-consuming.
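The way those numbers stack up can be sketched as a toy model: start from the upstream's baseline RTT and add each backhaul hop's contribution. The function name and per-hop figures below are mine, loosely patterned on the examples above, not measurements:

```python
def cumulative_rtt(baseline_ms: float, hop_added_ms: list[float]) -> list[float]:
    """RTT to an upstream target as seen at each successive site in a chain.

    baseline_ms:  RTT from the core/upstream facility itself.
    hop_added_ms: latency each backhaul hop adds (propagation + queueing).
    """
    rtts = [baseline_ms]
    for added in hop_added_ms:
        rtts.append(rtts[-1] + added)
    return rtts

# Illustrative: 8 ms at the core, then five hops each adding a few ms.
print(cumulative_rtt(8.0, [10.0, 5.0, 7.0, 8.0, 12.0]))
# → [8.0, 18.0, 23.0, 30.0, 38.0, 50.0]
```

The point of the model is that a congested hop anywhere in the chain penalizes every site behind it, which is why the deep towers are so expensive to get green.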

One thing Preseem does pretty well is show daily reports in brightly
colored bars, which "gamifies" fixing the issue. If you have any gamers on
staff, they start to obsess over turning everything green. It's great. :-)

The other thing I keep running into is network management. A few years ago,
we bought a WISP with 20 towers and a few hundred customers (it was a
friendly "I'm getting too unwell to keep doing this" purchase). The guy who
set it up was pretty amazing; he had no networking experience whatsoever,
but was pretty good at building things. So he'd built most of the towers
himself, purely because he wanted to get better service out to some *very*
rural parts of Missouri (including a whole bunch of non-profits and
churches, which is our largest market). While it's impressive what he
pulled off, he'd still just lost 200 customers to an electric coop's fiber
build-out. His construction skills were awesome; his network skills - not
so much. He had one public IP, connected to a 100 Mbit/s connection at his
house. Every single tower (over a 60 mile spread) was connected to exactly
one other tower. Every tower had backhauls in bridge mode, connected to a
(Netgear consumer) switch at the tower. Every AP (all of them 2.4 GHz Bullet
M2) was in bridge mode with client isolation turned off, connected to an
assortment of CPEs (mostly AirGrid M2) - also in bridge mode. No DHCP; he
had every customer type in their 192.168.x.y address (he had the whole /16
setup on the one link; no VLANs). Speed limits were set by turning on
traffic shaping on the M2 CPEs... and he wondered why latency sometimes
resembled remote control of a Mars rover, or parts of the network would
randomly die when somebody accidentally plugged their net connection into
their router's LAN port. A couple of customers had foregone routers
altogether, and you could see their Windows networking broadcasts
traversing the network! I wish I could say that was unusual, but I've
helped a handful of WISPs in similar situations.

One of the first things we did was get Preseem running (after adding every
client into UNMS, as it was called then). That made a big difference, and
gave good visibility into how bad it was. Then it was a long process of
breaking the network down into routed chunks, enabling DHCP, replacing
backhauls (there were a bunch of times when towers were connected in the
order they were constructed, and never connected to a new tower a mile away
- but 20 miles down the chain), switching out Bullets, etc. Eventually, it
became a great network - and it's growing again. I'm not sure we could've
done
that without a) great visibility from monitoring platforms, and b) decades
of experience between us.

Longer-term, I'm hoping that we can help networks like that one. Great
shaping and visibility go a *long* way. Building up some "best practices"
and offering advice can go a *really long* way. (And good mapping makes a
big difference; I'm not all that far from releasing a generally usable
version of my LiDAR mapping suite - an ancient version is at
https://github.com/thebracket/rf-signals. You can get LiDAR data for about
2/3 of the US for free now.)



On Mon, Oct 31, 2022 at 10:32 PM Dave Taht <dave.taht at gmail.com> wrote:

> Calling rdtsc directly used to be even faster than gettimeofday
>
> https://github.com/dtaht/libv6/blob/master/erm/includes/get_cycles.h
>
> On Mon, Oct 31, 2022 at 2:20 PM Herbert Wolverson via LibreQoS
> <libreqos at lists.bufferbloat.net> wrote:
> >
> > I'd agree with color coding (when it exists - no rush, IMO) being
> configurable.
> >
> > From the "how much delay are we adding" discussion earlier, I thought
> I'd do a little bit of profiling of the BPF programs themselves. This is
> with the latest round of performance updates (
> https://github.com/thebracket/cpumap-pping/issues/2), so it's not
> measuring anything in production. I simply added a call to get the clock at
> the start, and again at the end - and log the difference. Measuring both
> XDP and TC BPF programs. (Execution goes (packet arrives)->(XDP cpumap
> sends it to the right CPU)->(egress)->(TC sends it to the right classifier,
> on the correct CPU and measures RTT latency). This is adding about two
> clock checks and a debug log entry to execution time, so measuring it is
> slowing it down.
> >
> > The results are interesting, and mostly tell me to try a different
> measurement system. I'm seeing a pretty wide variance. Hammering it with an
> iperf session and a queue capped at 5 gbit/s: most of the TC timings were
> 40 nanoseconds - not a packet that requires extra tracking, already in
> cache, so proceed. When the TCP RTT tracker fired and recorded a
> performance event, it peaked at 5,900 nanoseconds. So the tc xdp program
> seems to be adding a worst-case of 0.0059 ms to packet times. The XDP side
> of things is typically in the 300-400 nanosecond range, I saw a handful of
> worst-case numbers in the 3400 nanosecond range. So the XDP side is adding
> 0.0034 ms. So - assuming worst case (and keeping the overhead added by the
> not-so-great monitoring), we're adding 0.0093 ms to packet transit time
> with the BPF programs.
> >
> > With a much more sedate queue (ceiling 500 mbit/s), I saw much more
> consistent numbers. The vast majority of XDP timings were in the 75-150
> nanosecond range, and TC was a consistent 50-55 nanoseconds when it didn't
> have an update to perform - peaking very occasionally at 1500 nanoseconds.
> Only adding 0.0015 ms to packet times is pretty good.
> >
> > It definitely performs best on long streams, probably because the
> previous lookups are all in cache. This is also making me question the
> answer I found to "how long does it take to read the clock?" I'd seen
> ballpark estimates of 53 nanoseconds. Given that this reads the clock
> twice, that can't be right. (I'm *really* not sure how to measure that one)
> >
> > Again - not a great test (I'll have to learn the perf system to do this
> properly - which in turn opens up the potential for flame graphs and some
> proper tracing). Interesting ballpark, though.
> >
> > On Mon, Oct 31, 2022 at 10:56 AM dan <dandenson at gmail.com> wrote:
> >>
> >>
> >>
> >> On Sun, Oct 30, 2022 at 8:21 PM Dave Taht via LibreQoS <
> libreqos at lists.bufferbloat.net> wrote:
> >>>
> >>> How about the idea of "metaverse-ready" metrics, with one table that
> is preseem-like and another that's
> >>>
> >>> blue =  < 8ms
> >>> green = < 20ms
> >>> yellow = < 50ms
> >>> orange  = < 70ms
> >>> red = > 70ms
> >>
> >>
> >> These need to be configurable.  There are a lot of WISPs that would have
> everything orange/red.  We're considering anything under 100ms good on the
> rural plans.   Also keep in mind that if you're tracking latency via pping
> etc, then you need some buffer in there for the internet at large.  <70ms
> to Amazon is one thing, they're very well connected, but <70ms to most of
> the internet probably isn't very realistic and would make most charts look
> like poop.
> >
> > _______________________________________________
> > LibreQoS mailing list
> > LibreQoS at lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/libreqos
>
>
>
> --
> This song goes out to all the folk that thought Stadia would work:
>
> https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
> Dave Täht CEO, TekLibre, LLC
>