[Bloat] [Starlink] [Rpm] [LibreQoS] [EXTERNAL] Re: Researchers Seeking Probe Volunteers in USA

Mon Mar 13 14:14:14 EDT 2023

Hi Dan,

> On Mar 13, 2023, at 18:26, dan <dandenson at gmail.com> wrote:
> 
> On Mon, Mar 13, 2023 at 10:36 AM Sebastian Moeller <moeller0 at gmx.de> wrote:
>> 
>> Hi Dan,
>> 
>> 
>>> On Mar 13, 2023, at 17:12, dan <dandenson at gmail.com> wrote:
>>> ...
>>> 
>>> High water mark on their router.
>> 
>>        [SM] Nope, my router is connected to my (bridged) modem via gigabit ethernet; without a traffic shaper there is never going to be any noticeable water mark on the router side... Sure, the modem will build up a queue, but alas it does not expose the length of that DSL queue to me... A high water mark on my traffic-shaped router informs me about my shaper setting (which I already know, after all I set it) but little about the capacity over the bottleneck link. And we are still talking about the easy egress direction; in the download direction Jeremy's question applies: is the achieved throughput I measure limited by the link's capacity, or are there simply not enough packets available/sent to fill the pipe?
>> 
> 
> And yet it can still see the flow of data on its ports.  The queue is
> irrelevant to the measurement of data across a port.

	I respectfully disagree: if, say, my modem had a 4 GB queue, I could theoretically burst ~4 GB worth of data at line rate into that buffer without learning anything about the modem-link capacity.
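	To put numbers on that, here is a minimal Python sketch using purely hypothetical values (a 100 Mbps DSL sync rate and a decimal 4 GB buffer, neither taken from any real modem). It computes how long the router's gigabit port could keep sending at full line rate before the modem's queue fills; for that entire time the router's interface counters report ~1000 Mbps, which says nothing about the 100 Mbps bottleneck.

```python
# Hypothetical numbers for illustration only: a gigabit Ethernet hand-off
# from the router into a modem whose DSL link drains at 100 Mbps, with an
# (unrealistically large) 4 GB modem buffer.
ETH_RATE_BPS = 1_000_000_000   # router -> modem sending rate (bits/s)
DSL_RATE_BPS = 100_000_000     # modem -> DSL drain rate (bits/s), assumed
BUFFER_BYTES = 4_000_000_000   # modem queue size (bytes), assumed


def seconds_until_buffer_full(send_bps: float, drain_bps: float,
                              buffer_bytes: float) -> float:
    """Time a sender at send_bps can keep transmitting at full speed before
    a buffer draining at drain_bps is full (infinite if it never fills)."""
    net_fill_bps = send_bps - drain_bps
    if net_fill_bps <= 0:
        return float("inf")
    return buffer_bytes * 8 / net_fill_bps


t = seconds_until_buffer_full(ETH_RATE_BPS, DSL_RATE_BPS, BUFFER_BYTES)
print(f"router port shows ~{ETH_RATE_BPS / 1e6:.0f} Mbps for {t:.1f} s")
```

	With these assumptions the router happily reports line rate for roughly 35.6 s; only once the buffer is full does its sending rate collapse to the DSL drain rate.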

>  turn off the
> shaper and run anything.  run your speed test.  don't look at the
> speed test results, just use it to generate some traffic.  you'll find
> your peak and where you hit the buffers on the DSL modem by measuring
> on the interface and measuring latency.  

	Peak of what, exactly? The peak sending rate of my router is well known: it's the 1 Gbps gross Ethernet rate...
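	For reference, a small Python sketch of what that 1 Gbps gross rate means as TCP payload. It assumes full-size 1500-byte IPv4 packets with TCP timestamps and standard Ethernet framing overhead (preamble/SFD, MAC header, FCS, inter-frame gap); VLAN tags or other encapsulations would lower the result further.

```python
# Per-frame on-the-wire Ethernet overhead (bytes): preamble+SFD (8),
# MAC header (14), FCS (4), inter-frame gap (12).
ETH_WIRE_OVERHEAD = 8 + 14 + 4 + 12  # 38 bytes


def tcp_goodput_bps(line_rate_bps: float, mtu: int = 1500, ip_hdr: int = 20,
                    tcp_hdr: int = 20, tcp_opts: int = 12) -> float:
    """Maximum TCP payload rate over Ethernet with full-size frames.
    Defaults assume IPv4, TCP timestamps (12 option bytes), no VLAN tag."""
    payload = mtu - ip_hdr - tcp_hdr - tcp_opts  # 1448 bytes with defaults
    on_wire = mtu + ETH_WIRE_OVERHEAD            # 1538 bytes with defaults
    return line_rate_bps * payload / on_wire


print(f"{tcp_goodput_bps(1e9) / 1e6:.1f} Mbps")
```

	So even a perfect sender tops out at roughly 941 Mbps of TCP payload on that port, well before any DSL bottleneck enters the picture.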

> That speed test isn't giving
> you this data any more than Disney+, other than you get to pick when
> it runs.

	Hrm, no, now we are back to actually saturating the link.

> 
>>> High water mark on our CPE, on our
>>> shaper, etc.  Modern services are very happy to burst traffic.
>> 
>>        [SM] Yes, this is also one of the reasons why too little buffering is problematic; I like the Nichols/Jacobson analogy of buffers as shock (burst) absorbers.
>> 
>>> Nearly
>>> every customer we have will hit the top of their service plan each
>>> day, even if only briefly and even if their average usage is quite
>>> low.  Customers on 600Mbps mmwave services have a usage chart that is
>>> flat lines and ~600Mbps blips.
>> 
>>        [SM] Fully agree. Most links are essentially idle most of the time, but that does not answer what instantaneous capacity is actually available, no?
> 
> yes, because most services burst.  That Roku Ultra or Apple TV is
> going to run a 'speed test' every time it goes to fill its
> buffer.

	[SM] Not really; given enough capacity, typical streaming protocols will not actually hit the ceiling. At least the ones I look at every now and then tend to stay well below the actual capacity of the link.

>  Windows and Apple updates are asking for everything.  Again,
> I'm measuring even the lowly grandma's house as consuming the entire
> connection for a few seconds before it sits idle for a minute.  That
> instantaneous capacity is getting used up so long as there is a
> device/connection on the network capable of using it up.

	[SM] But my problem is that on variable-rate links I want to measure the instantaneous capacity so that I can do adaptive admission control and avoid overfilling my modem's DSL buffers (I wish they would do something like BQL, but alas they don't).

> 
>> 
>>> 
>>> "  [SM] No ISP I know of publishes which periods are low, mid, high
>>> congestion so end-users will need to make some assumptions here (e.g.
>>> by looking at per day load graphs of big traffic exchanges like DE-CIX
>>> here https://www.de-cix.net/en/locations/frankfurt/statistics )"
>>> 
>>> You read this wrong.  Consumer routers run their daily speeds tests in
>>> the middle of the night.
>> 
>>        [SM] So on my Turris Omnia I run a speedtest roughly every 2 hours, exactly so I get coverage through low and high demand epochs. The only consumer router I know of that does repeated tests is the IQrouter, which as far as I know schedules them regularly so it can adjust the traffic shaper to still deliver acceptable responsiveness even during peak hour.
> 
> Consider this.   Customer under load, using their plan to the maximum,
> speed test fires up adding more constraint.  Speed test is a stress
> test, not a capacity test.

	[SM] With competent AQM (like cake on ingress and egress, configured for per-internal-IP isolation) I do not even notice whether a speedtest runs or not, and from the reported capacity I can estimate the concurrent load from other endhosts in my network.
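	The arithmetic behind that estimate is simple, as a rough Python sketch shows. It assumes the cake shaper is the bottleneck and stays saturated while the test runs (cake is work-conserving), so whatever part of the shaper rate the speedtest host did not get went to the other endhosts. Both inputs are values you supply; since speedtests report goodput while shaper rates are usually gross, the estimate is biased a bit high.

```python
def other_hosts_load_bps(shaper_rate_bps: float, speedtest_bps: float) -> float:
    """Estimate traffic from other endhosts while a speedtest runs.

    Assumption: the shaper is the bottleneck and saturated during the test,
    so total throughput ~= shaper rate, and the remainder after the
    speedtest host's share went to everyone else.
    """
    return max(0.0, shaper_rate_bps - speedtest_bps)
```

	For example, with the shaper set to 90 Mbps and a speedtest reporting 60 Mbps, roughly 30 Mbps went to other hosts; with per-internal-IP isolation the test host is at least guaranteed its fair share, which keeps the remainder meaningful.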

>  Speed test cannot return actual capacity
> because it's being used by other services AND the rest of the internet
> is in the way of accuracy as well, unless of course you prioritize the
> speed test and then you cause an effective outage or you're running a
> speed test on-net which isn't an 'internet' test, it's a network test.

	[SM] Conventional capacity tests give a decent enough estimate of current capacity to be useful; I could not care less that they are potentially not perfect, sorry. The question still is how to estimate capacity without loading the link...

> Guess what the only way to get an actual measure of the capacity is?
> my way.  measure what's passing the interface and measure what happens
> to a reliable latency test during that time.

	[SM] This is, respectfully, what we do in cake-autorate, but that requires an actual load and only accidentally detects the capacity, if a high enough load is sustained long enough to evoke a latency increase. But I knew that already; what you initially wrote sounded to me like you had a method to detect instantaneous capacity without needing to generate load. (BTW, in cake-autorate we do not generate an artificial load (only artificial/active latency probes) but use the organic user-generated traffic as the load generator*).

*) If all endhosts are idle we do not care much about the capacity, only if there is traffic; however, the quicker we can estimate the capacity, the tighter our controller can operate.
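	For readers unfamiliar with the approach, here is a heavily simplified Python sketch of a delay-based shaper-rate controller driven by organic load. It is NOT cake-autorate's actual logic or parameters (the real thing is considerably more nuanced); the RateController name, thresholds and step factors are assumptions chosen for illustration. Each interval it takes the achieved rate (e.g. from interface counters) and an RTT sample from an active latency probe, and only raises the shaper rate when traffic is actually close to the current setting and latency stays flat.

```python
from dataclasses import dataclass


@dataclass
class RateController:
    """Toy delay-based shaper-rate controller (illustrative only; NOT the
    actual cake-autorate algorithm). Rates in kbit/s."""
    min_rate: float             # never shape below this
    base_rate: float            # rate to drift back to when idle
    max_rate: float             # never shape above this
    rate: float                 # current shaper setting
    delay_thr_ms: float = 15.0  # RTT rise over baseline treated as bloat
    load_thr: float = 0.75      # achieved/shaper ratio counted as "loaded"
    step_up: float = 1.05       # multiplicative increase when loaded, no bloat
    step_down: float = 0.9      # multiplicative decrease on bloat

    def update(self, achieved_rate: float, rtt_ms: float,
               baseline_rtt_ms: float) -> float:
        bloat = (rtt_ms - baseline_rtt_ms) > self.delay_thr_ms
        loaded = achieved_rate >= self.load_thr * self.rate
        if bloat:
            # Organic traffic built a queue in the modem: back off.
            self.rate = max(self.min_rate, self.rate * self.step_down)
        elif loaded:
            # Enough organic load to probe for more capacity, latency is fine.
            self.rate = min(self.max_rate, self.rate * self.step_up)
        else:
            # Little traffic: nothing to learn about capacity; relax to base.
            self.rate += (self.base_rate - self.rate) * 0.1
        return self.rate
```

	Note that the idle branch learns nothing about capacity, which is exactly the footnote's point: the controller can only tighten around the true capacity as fast as organic traffic supplies load.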

> 
>> 
>> 
>>> Eero at 3am for example.  Netgear 230-430am.
>> 
>>        [SM] That sounds "special"; not a useless data point per se, but of limited utility during normal usage times.
> 
> In practical terms, useless.  Like measuring how freeway congestion
> affects commutes at 3am.

	[SM] That is not "useless", sorry; it gives me a lower bound for my commute (or allows me to estimate a lower bound on the duration of a transfer of a given size). But I agree it does little to inform me what to expect during peak hour.
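	In commute terms: the best-case (e.g. 3 AM) throughput puts a floor under how long a transfer takes, assuming that off-peak result really is the best the path delivers. A trivial Python sketch; the 10 GB size and 250 Mbps rate are hypothetical.

```python
def min_transfer_seconds(size_bytes: float, best_rate_bps: float) -> float:
    """Lower bound on transfer duration from the best observed throughput.
    At busier times the achievable rate is lower, so transfers take longer."""
    return size_bytes * 8 / best_rate_bps


# Hypothetical: a 10 GB download with a best-case measured rate of 250 Mbps.
print(min_transfer_seconds(10e9, 250e6))  # 320.0
```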

> 
>> 
>>> THAT is a bad measurement of the experience the consumer will have.
>> 
>>        [SM] Sure, but it still gives a usable reference for "what is the best my ISP actually delivers", even if the odds are stacked in its favour.
> 
> ehh...... what the ISP could deliver if all other considerations are
> removed.

	[SM] No, this is still a test of the real existing network...

>  I mean, it'd be a synthetic test in any other scenario and
> the only reason it's not is because it's on real hardware.  I don't
> have a single subscriber on network that can't get 2-3x their plan
> speed at 3am if I opened up their shaper.  Very narrow use case here
> from a consumer point of view.   Eero runs speed tests at 3am every
> single day on a few hundred subs on my network and they look AMAZING
> every time.  no surprise.

	[SM] While I defend some utility for such a test on principle, I agree that if eero only runs a single test, 3 AM is not the best time to do that, except for night owls.

> 
>> 
>>> It's essentially useless data for ...
>> 
>>        [SM] There is no single "service latency"; it really depends on the specific network paths to the remote end and back. Unless you are talking about the latency over the access link only; there we have a single number, but one of limited utility.
> 
> The intermediate hops are still useless to the consumer.  Only the
> latency to their door so to speak.  again, hop 2 to hop 3 on my
> network gives them absolutely nothing.

	[SM] I agree if these are mandatory hops I need to traverse every time, but if these are hosts I can potentially avoid, then this changes, even though I am now trying to game my ISP to some degree, which in the long run is a losing proposition.

> 
>> 
>> 
>>> My (ISP) latency from hop 2 to 3 on the network has
>> ...> > hops are which because they'll hidden in a tunnel/MPLS/etc.
>> 
>>        [SM] Yes, end-users can do little, but not nothing, e.g. one can often work around shitty peering by using a VPN to route one's packets into an AS that is well connected both with one's ISP and with one's remote ASes. And I accept your point about one-way testing: getting a remote site at the right location to do e.g. reverse traceroutes/mtrs is tricky (sometimes RIPE Atlas can help) to impossible (like with my ISP, which does not offer even simple looking-glass servers at all).
> 
> This is a REALLY narrow use case. Also, irrelevant.

	[SM] You would think so, wouldn't you ;). However, over here the T1 incumbent telco plays "peering games" and purposefully runs its transit links too hot, so in primetime traffic coming from content providers via transit suffers. The telco's idea here is to incentivize these content providers to buy "transit" from that telco, which happens to cost integer multiples of regular transit and hence will only ever be used to reach this telco's customers, if a content provider actually buys in.
As an end-user of that telco, I have three options:
a) switch content providers
b) switch ISP
c) route around my ISPs unfriendly peering

(Personally even though not directly affected by this I opted for b) and found a better connected yet still cheaper ISP).

I am not alone in this; actually a lot of gamers do something similar using gaming-oriented VPN services. But then gamers are a bit like audiophiles to me: some of the things they do look like cargo cult, but I digress.

>  Consumer can test
> to their target, to the datacenter, and datacenter to target and
> compare, and do that in reverse to get bi-directional latency.  

	[SM] I have been taught that this does not actually work, as the true return path is essentially invisible without a probe for a reverse traceroute at the remote server's site, no?

> per
> hop latency is still of zero benefit to them because they can't use
> that in any practical way.  

	[SM] And again I disagree: within reason I can diagnose congested path elements from an mtr. Say on a path across three ASes that at best takes 10 milliseconds, I see at primetime that from the link between AS1 and AS2 onwards all hops, including the endpoint, show an RTT of say 100 ms; I can then form the hypothesis that somewhere between AS1 and AS2 there is undue queue build-up. Practically this can mean I need to route my own traffic differently, either by VPN or by switching to a different application/content provider, hoping to avoid the apparently congested link. What I cannot do is fix the problem, that is true ;)
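	A rough Python sketch of that reasoning, applied to per-hop RTTs (e.g. averages or medians taken from an mtr report): look for the first hop where the RTT jumps and every subsequent hop, including the endpoint, stays elevated. A jump that does not persist is more often a router deprioritizing replies to probes addressed to itself than real forward-path queueing. The 30 ms threshold and the function name are assumptions, and since hop RTTs also include the unseen return path, the result is a hypothesis, not a diagnosis.

```python
from typing import Optional, Sequence


def first_persistent_jump(hop_rtts_ms: Sequence[float],
                          jump_ms: float = 30.0) -> Optional[int]:
    """Index of the first hop whose RTT exceeds the previous hop's by at
    least jump_ms AND where every later hop (including the endpoint) stays
    at least jump_ms above that pre-jump level; None if there is none."""
    for i in range(1, len(hop_rtts_ms)):
        before = hop_rtts_ms[i - 1]
        if hop_rtts_ms[i] - before >= jump_ms and all(
            r - before >= jump_ms for r in hop_rtts_ms[i:]
        ):
            return i
    return None


# Hypothetical primetime trace: ~10 ms path, queueing from hop index 4 on.
print(first_persistent_jump([1, 3, 5, 8, 102, 101, 104, 103]))  # 4
# A single slow-to-answer router that later hops do not inherit: no suspect.
print(first_persistent_jump([1, 3, 80, 6, 8, 9]))  # None
```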

> Like 1 in maybe a few thousand consumers
> might be able to use this data to identify the slow hop and find a
> datacenter before that hop to route around it and they get about 75%
> of the way with a traditional traceroute. and then of course they've
> added VPN overheads so are they really getting an improvement?

	[SM] In the German telco case, the peak rate to some datacenters/VPSes (single-homed at Cogent) dropped into the low Kbps range, while a VPN route-around brought it back into the high double-digit Mbps, so yes, the improvement can be tangible. Again, my solution to that possibility was to "vote with my feet" and change ISPs (a pity, because outside of that unpleasant peering/transit behaviour that telco is a pretty competent ISP; case in point, the transit links run too hot, but are competently managed to actually stay at the selected "temperature" and do not progress into total overload territory.)

> I'm not saying that testing is bad in any way, I'm saying that 'speed
> tests' as they are generally understood in this industry are a bad
> method.  

	[SM] +1, with that I can agree. But I see some mild improvements, e.g. Ookla reporting latency numbers from during the load phases. Sure, the chosen measure, the inter-quartile mean, is sub-optimal, but infinitely better than what they had before: no latency-under-load numbers at all.
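	To illustrate why an inter-quartile mean is a questionable summary for latency under load: it discards the slowest quarter of samples, which is exactly where bufferbloat shows up. A minimal Python sketch, using a simple trimmed definition (drop floor(n/4) samples from each end; Ookla's exact computation may differ) and made-up RTT samples in which a quarter of the probes saw 200+ ms:

```python
import statistics


def interquartile_mean(samples):
    """Mean of the middle half of the sorted samples (simple version that
    drops floor(n/4) samples from each end)."""
    s = sorted(samples)
    k = len(s) // 4
    return statistics.fmean(s[k:len(s) - k])


# Made-up RTTs (ms) during a loaded phase: 9 good samples, 3 bloated ones.
rtts = [20, 300, 21, 22, 250, 23, 24, 25, 200, 26, 27, 28]
print(interquartile_mean(rtts), max(rtts))  # 25.5 300
```

	The IQM reports a pristine 25.5 ms even though one probe in four took 200 ms or more; a high percentile (or the full distribution) would expose that.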

> Run 10 speed tests, get 10 results.  Run a speed test while
> netflix buffers, get a bad result.  Run a speed test from a weak wifi
> connection, get a bad result.  A tool that is inherently flawed
> because its methodology is flawed is of no use to find the truth.

	[SM] For most end users speedtests are the one convenient way of generating saturating loads. But saturating loads by themselves are not that useful.

> 
> If you troubleshoot your ISP based on speed tests you will be chasing
> your tail.  

	My most recent attempt was with mtr traces to document packet loss only when "packed" into one specific IP range, and that packet loss happens even without load at night, so no speedtest required (I did run a few speed.cloudflare.com tests, but mainly because they contain a very simple and short packet loss test that finishes a tad earlier than my go-to 'mtr -ezbw -c 1000 IP_HERE' packet loss test ;) )

> Meanwhile, that internet facing interface can see the true
> numbers the entire time.

	[SM] Only averaged over time...
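	A small Python sketch of why: interface byte counters (on Linux, e.g. /sys/class/net/<iface>/statistics/rx_bytes) are cumulative, so any rate derived from them is an average over the sampling interval. The traffic pattern below is hypothetical: a 200 ms burst at 1 Gbps followed by 800 ms of silence, sampled every 100 ms versus once per second.

```python
def rates_mbps(counter_samples, interval_s):
    """Per-interval rates (Mbps) from cumulative byte counters sampled
    every interval_s seconds."""
    return [
        (b1 - b0) * 8 / interval_s / 1e6
        for b0, b1 in zip(counter_samples, counter_samples[1:])
    ]


# 1 Gbps for 200 ms = 25 MB; counter sampled every 100 ms over one second.
fine = [0, 12_500_000, 25_000_000] + [25_000_000] * 8
coarse = [fine[0], fine[-1]]  # the same counter sampled once per second

print(max(rates_mbps(fine, 0.1)))    # ~1000 Mbps peak
print(max(rates_mbps(coarse, 1.0)))  # 200.0: the burst is averaged away
```

	The same bytes look like a saturated gigabit link or a lightly used 200 Mbps one depending purely on the sampling interval, and even 100 ms sampling cannot see sub-interval queue build-up.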

>  The consumer is pulling their full capacity
> on almost all links routinely even if briefly and can be nudged into
> pushing more a dozen ways (including a speed test).  The only thing
> lacking is a latency measurement of some sort.  Preseem and Libreqos's
> TCP measurements on the head end are awesome, but that's not available
> on the subscriber's side but if it were, there's the full testing
> suite.  how much peak data, what happened to latency.  If you could
> get data from the ISP's head end to diff you'd have internet vs isp
> latencies.    'speed test' is a stress test or a burn in test in
> effect.

	[SM] I still agree that "speedtests" are misnamed capacity tests and do have their value (e.g. over here the main determinant of internet access price is the headline capacity number). We even have an "official" capacity test blessed by the national regulatory agency that can be used to defend consumer rights against those ISPs that over-promise but under-deliver (few people do, though; as it happens, if your ISP generally delivers acceptable throughput and generally is not a d*ck, people are fine with not caring all too much). On the last point, I believe the more responsiveness an ISP link maintains under load, the fewer people will get unhappy about their internet experience, and without unhappiness most users, I presume, have better things to do than fight with their ISP. ;)

Regards
	Sebastian