Dave: in this case, I'm running inside the eBPF VM - so I'm already in kernel space, but with a very limited set of functions available. bpf_ktime_get_ns() seems to be the approved way to read the clock. There was a big debate about it using the kernel's monotonic clock, which takes longer to sample. I'm guessing they've improved that, because I'm not seeing the delay that some people were complaining about (it's not free, but it's also a *lot* faster than the estimates I was finding).
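To give a flavor of the pattern (just a minimal sketch - the map name, the flow keying, and the eventual delta math are placeholders here, not my actual code):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Ingress timestamps, keyed by a flow hash (placeholder key in this sketch). */
struct {
    __uint(type, BPF_MAP_TYPE_LRU_HASH);
    __uint(max_entries, 65536);
    __type(key, __u64);
    __type(value, __u64);
} ts_map SEC(".maps");

SEC("xdp")
int stamp_packet(struct xdp_md *ctx)
{
    __u64 key = 0;                  /* real code would derive a flow key from the packet headers */
    __u64 now = bpf_ktime_get_ns(); /* kernel monotonic clock, sampled in-kernel, in nanoseconds */

    bpf_map_update_elem(&ts_map, &key, &now, BPF_ANY);
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

Userspace (or a matching egress program) then reads ts_map and subtracts timestamps to get a latency estimate; that part is left out above.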
> > Preseem's numbers are: 0-74 green, 75-124 yellow,
> > 125-200 red, and they just consolidate everything >200 to 200,
> > basically so there's no 'terrible' color lol.
> I am sorry to hear those numbers are considered to be good.
It's interesting that you see adverts on Wisp Talk (the FB group) showing "wow, half my APs are now green!" (and showing about 50% green, 25% yellow, 25% red). When we had Preseem, we always took "red" to mean "oh no, something's really wrong" - and got to work fixing it. There were a couple of distant (many hops down the chain) APs that struggled to stay yellow, but red was always a sign for battle stations. I think that's part of why WISPs suffer from customers who "jump ship as soon as something better comes along" - I'd be jumping ship too, if my ISP expected me to "enjoy" 125-200 ms RTT latency for any extended period of time (I'm pretty understanding about "something went wrong, we're working on it").
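(Just to pin down the bucketing from the quote above - a trivial sketch of how I read it, not Preseem's actual code:)

enum preseem_color { GREEN, YELLOW, RED };

static enum preseem_color bucket_rtt(unsigned int rtt_ms)
{
    if (rtt_ms > 200)
        rtt_ms = 200;   /* everything >200 is consolidated to 200 */
    if (rtt_ms <= 74)
        return GREEN;   /* 0-74 */
    if (rtt_ms <= 124)
        return YELLOW;  /* 75-124 */
    return RED;         /* 125-200 */
}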
Geography does play a large part. I'll see if I can resurrect a tool I had that turned RTT latency measurements into a Google Maps heatmap overlay (live-updating, so you could see the orange/red areas moving when the network suffered). It can be pretty tough to find a good upstream far from towns, which affects everything. More than that, deep chains of backhauls add up - and add up fast if you have any sort of congestion issue along the way. For example:
- We have a pretty decently connected upstream, averaging 8ms ping round-trip time to Cloudflare's DNS.
- Going down our "hottest" path (60 GHz AF60 LR to a tower, and then another one to a 3,000-bed apartment complex - peaks at 900 Mbit/s every night; will peak at a lot more than that as soon as their check clears for some Siklu gear), we worked stupidly hard to keep the average ping time there at 9ms to Cloudflare's DNS. Even then, it's closer to 16ms when fully loaded. They are a topic for a future Cake discussion. :-)
- We have a few clients connected directly off of the facility with the upstream - and they all get great RTT times (a mix of 5.8 GHz and 3.6 GHz CBRS; Wave is coming as soon as it's in stock at the same time as the guy with the money is at a keyboard!).
- Our largest (by # of customers) tower is 11 miles away, currently fed by 2 AirFiber 5XHD (ECMP balanced). We've worked really hard to keep that tower's average ping time to Cloudflare at 18ms. We have some nicer radios (the Cambium 400C is a beast) going in soon, which should help.
- That tower feeds 4 micro-pops. The worst is near line-of-sight (trees) on a 3.6 GHz Medusa. It suffers a bit at 33ms round-trip ping times to Cloudflare. The best averages 22ms ping times to Cloudflare.
- We have a bunch more sites behind a 13 mile backhaul hop (followed by a 3 mile backhaul hop; geography meant going around a tree-covered ridge). We've had a heck of a time getting that up to scratch; the AF5XHD kinda worked, but the experience was pretty wretched. They were the testbed for the Cambium 400C, and now average 22ms to Cloudflare.
- There are 15 (!) small towers behind that one! We eventually got the most distant one to 35ms to Cloudflare pings - but ripped/replaced SO much hardware to get there. (Even then, customer experience at some of those sites isn't what I'd like; I just tried a ping test from a customer running a 2.4 GHz "elevated" Ubiquiti dish to an old ePMP 1000 - at a tower 5 hops in. 45-50ms to Cloudflare. Not great.)
Physics dictates that the tiny towers, separated from the core by miles of backhaul and hops between them, aren't going to perform as well as the nearby ones. You can get them going well, but it's expensive and time-consuming.
One thing Preseem does pretty well is show daily reports in brightly colored bars, which "gamifies" fixing the issue. If you have any gamers on staff, they start to obsess over turning everything green. It's great. :-)
The other thing I keep running into is network management. A few years ago, we bought a WISP with 20 towers and a few hundred customers (it was a friendly "I'm getting too unwell to keep doing this" purchase). The guy who set it up was pretty amazing; he had no networking experience whatsoever, but was pretty good at building things. So he'd built most of the towers himself, purely because he wanted to get better service out to some *very* rural parts of Missouri (including a whole bunch of non-profits and churches, which are our largest market). While it's impressive what he pulled off, he'd still just lost 200 customers to an electric coop's fiber build-out. His construction skills were awesome; his network skills - not so much. He had 1 public IP, connected to a 100 Mbit/s connection at his house. Every single tower (over a 60 mile spread) was connected to exactly one other tower. Every tower had backhauls in bridge mode, connected to a (Netgear consumer) switch at the tower. Every AP (all of them 2.4 GHz Bullet M2s) was in bridge mode with client isolation turned off, connected to an assortment of CPEs (mostly AirGrid M2s) - also in bridge mode. No DHCP; he had every customer type in their 192.168.x.y address (he had the whole /16 set up on the one link; no VLANs). Speed limits were set by turning on traffic shaping on the M2 CPEs... and he wondered why latency sometimes resembled remote control of a Mars rover, or why parts of the network would randomly die when somebody accidentally plugged their net connection into their router's LAN port. A couple of customers had foregone routers altogether, and you could see their Windows networking broadcasts traversing the network! I wish I could say that was unusual, but I've helped a handful of WISPs in similar situations.
One of the first things we did was get Preseem running (after adding every client into UNMS, as it was called then). That made a big difference, and gave good visibility into how bad it was. Then it was a long process of breaking the network down into routed chunks, enabling DHCP, replacing backhauls (there were a bunch of cases where towers were connected in the order they were constructed, and never re-homed to a newer tower a mile away - instead remaining 20 miles down the chain), switching out Bullets, etc. Now it's a great network - and growing again. I'm not sure we could've done that without a) great visibility from monitoring platforms, and b) decades of experience between us.
Longer-term, I'm hoping that we can help networks like that one. Great shaping and visibility go a long way. Building up some "best practices" and offering advice can go a really long way. (And good mapping makes a big difference; I'm not all that far from releasing a generally usable version of my LiDAR mapping suite - an ancient version is here: https://github.com/thebracket/rf-signals. You can get LiDAR data for about 2/3 of the US for free now.)