[Starlink] [Rpm] [M-Lab-Discuss] misery metrics & consequences

Sebastian Moeller moeller0 at gmx.de
Sun Oct 23 09:52:57 EDT 2022


Hi David,


> On Oct 23, 2022, at 15:11, Dave Collier-Brown <dave.collier-Brown at indexexchange.com> wrote:
> 
> On 10/23/22 08:26, Sebastian Moeller wrote:
> 
>>         [SM] Kathy Nichols' pping (https://github.com/pollere/pping) might be an option, either on the ISP side or run on CPEs with some method to harvest the collected data from the ISP side. 
> Yes: I use pping to investigate occasional problems at work, but I was thinking more about home networks, where some big speed-changes happen and local congestion happens.

	[SM] Okay. In the context of cake-autorate (https://github.com/lynxthecat/CAKE-autorate/blob/main/README.md) we implemented flight-recorder-style logging that continuously keeps the last X (configurable) epochs and stores both the shaper rates and the achieved rates as well as the results from the latency probes. The script can be run with rate setting disabled to record the relevant data; the user just needs to remember to export the data after experiencing interesting/abnormal events. Sure, this does not have per-application resolution, but it should give some idea of current latency as well as current traffic. I will admit, though, that this logging is not exactly cheap CPU-wise and lacks the precision of packet captures... but it can be operated as a flight recorder where relevant events can be exported/stored post hoc...
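
A minimal sketch of the flight-recorder idea, in Python for illustration only. This is not how cake-autorate itself is written; the class name, field names, and units are assumptions made for this example. A bounded deque keeps only the most recent epochs, and the user exports the buffer after noticing a problem.

```python
from collections import deque
import json
import time


class FlightRecorder:
    """Keep only the most recent `max_epochs` samples in memory (hypothetical helper)."""

    def __init__(self, max_epochs):
        # A deque with maxlen silently discards the oldest entry when full.
        self.samples = deque(maxlen=max_epochs)

    def record(self, shaper_rate_kbps, achieved_rate_kbps, rtt_ms, timestamp=None):
        """Store one epoch: shaper rate, achieved rate, and a latency probe result."""
        self.samples.append({
            "t": time.time() if timestamp is None else timestamp,
            "shaper_rate_kbps": shaper_rate_kbps,
            "achieved_rate_kbps": achieved_rate_kbps,
            "rtt_ms": rtt_ms,
        })

    def export(self):
        """Return the buffered epochs as JSON lines, oldest first."""
        return "\n".join(json.dumps(s) for s in self.samples)


# Record five epochs into a recorder that only remembers three.
recorder = FlightRecorder(max_epochs=3)
for i in range(5):
    recorder.record(20000, 15000 + i, 20.0 + i, timestamp=float(i))
```

After the loop only the epochs with timestamps 2.0, 3.0 and 4.0 remain; calling recorder.export() after an interesting event would hand back just that recent window.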


> 
>> Protocols with fewer readable fields, like QUIC, would require special care to evaluate the spin bit, if that exists. Or just resort to active polling and ping* each CPE once per second or so (for a coarse resolution; you could increase the polling rate on detecting anomalies, thereby risking making congestion slightly worse). None of this will allow measuring congestion within the home network, but it might still be a worthwhile diagnostic to know that the access link is OK while the user reports latency issues.
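
One way the "poll faster on anomalies" idea could look, as a rough Python sketch. The function name, the thresholds, and the interval values are all assumptions chosen for illustration; the actual probe (ICMP or otherwise) is left out so that only the scheduling decision is shown.

```python
def next_poll_interval(rtt_ms, baseline_ms,
                       normal_interval_s=1.0,
                       fast_interval_s=0.2,
                       anomaly_factor=2.0):
    """Pick the delay before the next probe of one CPE.

    rtt_ms is the latest probe result in milliseconds, or None if the
    probe was lost. All default values are illustrative assumptions.
    """
    # Treat a lost probe or an RTT well above baseline as an anomaly
    # and poll faster, accepting slightly more measurement traffic.
    if rtt_ms is None or rtt_ms > anomaly_factor * baseline_ms:
        return fast_interval_s
    return normal_interval_s
```

With a 16 ms baseline, a probe result of 18 ms keeps the once-per-second cadence, while 80 ms or a lost probe switches to the faster interval until the RTT settles again.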
> If one has a good way to capture RTT and data rate for one problematic app, say zoom, then one could see that network problems were happening at the same time as lags and dropouts. 

	[SM] As above, logging all traffic is relatively easy; per-application or per-flow logging will require different tools or packet captures...

> ISPs would positively hate that, of course.

	[SM] Assuming they come out of this looking bad; if the outcome implies that the local WiFi is the root cause, ISPs might actually appreciate it ;)

Regards
	Sebastian

> 
> --dave
> 
> 
> 
> 
>> 
>> Regards
>>         Sebastian
>> 
>> *) I think there are dedicated devices available that can ping large numbers of IPs in a periodic fashion.
>> 
>> 
>> 
>>> --dave
>>> 
>>> 
>>> 
>>> On 10/23/22 07:57, Sebastian Moeller via Starlink wrote:
>>> 
>>>> Hi Glenn,
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Oct 23, 2022, at 02:17, Glenn Fishbine via Rpm <rpm at lists.bufferbloat.net> wrote:
>>>>> 
>>>>> As a classic dyed-in-the-wool empiricist: granted that you can identify "misery" factors, given a population of 1,000 users, how do you propose deriving a misery index for that population?
>>>>> 
>>>>> We can measure download, upload, ping, jitter pretty much without user intervention.  For the measurements you hypothesize, how do you automatically extract those indices without subjective user contamination?
>>>>> 
>>>>> I.e.  my download speed sucks. Measure the download speed.
>>>>> 
>>>>> My isp doesn't fix my problem. Measure what? How?
>>>>> 
>>>>> Human survey technology is 70+ years old and it still has problems figuring out how to correlate opinion with fact.
>>>>> 
>>>>> Without an objective measurement scheme that doesn't require human interaction, the misery index is a cool hypothesis with no way to link to actual data.  What objective measurements can be made?  Answer that and the index becomes useful. Otherwise it's just consumer whining.
>>>>> 
>>>>> Not trying to be combative here, in fact I like the concept you support, but I'm hard pressed to see how the concept can lead to data, and the data lead to policy proposals.
>>>>> 
>>>>> 
>>>>      [SM] So it seems that outside of seemingly simple-to-test throughput numbers*, the next most important quality number (or the most important, depending on subjective ranking) is how latency changes under "load". Absolute latency is also important, albeit static high latency can be worked around within limits, so the change under load seems more relevant.
>>>>      All of flent's RRUL test, Apple's networkQuality/RPM, and iperf2's bounceback test offer methods to assess latency change under load**, as do Waveform's bufferbloat test and even, to a degree, Ookla's speedtest.net. IMHO something like latency increase under load or Apple's responsiveness measure RPM (basically the inverse of the latency under load, calculated on a per-minute basis, so it scales in the typical higher-numbers-are-better way, unlike raw latency-under-load numbers where smaller is better) would be a suitable objective measure.
>>>>      IMHO what networkQuality is missing ATM is to measure and report the unloaded RPM as well as the loaded one: the first gives a measure of the static latency, the second of how well things keep working if capacity gets tight. It does report the base RTT, which can be converted to RPM. As an example:
>>>> 
>>>> macbook:~ user$ networkQuality -v
>>>> ==== SUMMARY ====
>>>> Upload capacity: 24.341 Mbps
>>>> Download capacity: 91.951 Mbps
>>>> Upload flows: 20
>>>> Download flows: 16
>>>> Responsiveness: High (2123 RPM)
>>>> Base RTT: 16
>>>> Start: 10/23/22, 13:44:39
>>>> End: 10/23/22, 13:44:53
>>>> OS Version: Version 12.6 (Build 21G115)
>>>> 
>>>> Here RPM 2123 corresponds to 60000/2123 = 28.26 ms latency under load, while the Base RTT of 16 ms corresponds to 60000/16 = 3750 RPM, so on this link load reduces the responsiveness by 3750-2123 = 1627 RPM, a reduction of 100-100*2123/3750 = 43.4%, and that is with competent AQM and scheduling on the router.
>>>> 
>>>> Without competent AQM/shaping I get:
>>>> ==== SUMMARY ====
>>>> Upload capacity: 15.101 Mbps
>>>> Download capacity: 97.664 Mbps
>>>> Upload flows: 20
>>>> Download flows: 12
>>>> Responsiveness: Medium (427 RPM)
>>>> Base RTT: 16
>>>> Start: 10/23/22, 13:51:50
>>>> End: 10/23/22, 13:52:06
>>>> OS Version: Version 12.6 (Build 21G115)
>>>> latency under load: 60000/427 = 140.52 ms
>>>> base RPM: 60000/16 = 3750 RPM
>>>> reduction RPM: 100-100*427/3750 = 88.6%
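
The conversions used in both examples can be written down as a small Python sketch. The function names are made up for this illustration and are not part of networkQuality; the only inputs are the "Responsiveness" and "Base RTT" values from the two summaries above.

```python
def rpm_from_rtt_ms(rtt_ms):
    """Round trips per minute for a given round-trip time in milliseconds."""
    return 60000.0 / rtt_ms


def rtt_ms_from_rpm(rpm):
    """Round-trip time in milliseconds implied by an RPM value."""
    return 60000.0 / rpm


def responsiveness_loss_percent(base_rtt_ms, loaded_rpm):
    """Percentage drop from the unloaded (base) RPM to the loaded RPM."""
    base_rpm = rpm_from_rtt_ms(base_rtt_ms)
    return 100.0 - 100.0 * loaded_rpm / base_rpm


# The two runs quoted above: same 16 ms base RTT, different loaded RPM.
for label, loaded_rpm in (("with AQM", 2123), ("without AQM", 427)):
    print(label,
          round(rtt_ms_from_rpm(loaded_rpm), 2), "ms under load,",
          round(responsiveness_loss_percent(16, loaded_rpm), 1), "% reduction")
```

Reporting the base RPM next to the loaded RPM makes the drop visible at a glance, which is the argument for showing two numbers rather than one.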
>>>> 
>>>> 
>>>> I understand Apple's desire to have a single reported number with a single qualifier (medium/high/...), because in the end a link is only reliably usable if responsiveness under load stays acceptable, but with two numbers it is easier to see what one's ISP could do to help. (I guess some ISPs might already be unhappy with the single number, so this needs some diplomacy/tact.)
>>>> 
>>>> Regards
>>>>      Sebastian
>>>> 
>>>> 
>>>> 
>>>> *) "Seemingly", as quite a few ISPs operate their own speedtest servers in their network and ignore customers not reaching the contracted rates to speedtest servers located in different ASs. As the product is called internet access, I am inclined to expect that my ISP maintains sufficient peering/transit capacity to reach the next tier of AS at my contracted rate (the EU legislature seems to agree, see Regulation (EU) 2015/2120).
>>>> 
>>>> **) Most do so by creating load themselves and measuring throughput at the same time; bounceback, IIUC, will focus on the latency measurement and leave the load generation optional (so it offers a mode to measure the responsiveness of a live network with minimal measurement traffic). @Bob, please correct me if this is wrong.
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Fri, Oct 21, 2022, 5:20 PM Dave Taht <dave.taht at gmail.com> wrote:
>>>>> One of the best talks I've ever seen on how to measure customer
>>>>> satisfaction properly just went up after the P99 Conference.
>>>>> 
>>>>> It's called Misery Metrics.
>>>>> 
>>>>> After going through a deep dive into why the ways we think and act on
>>>>> percentiles, bins, and other statistical methods for how we use the
>>>>> web and internet are *so wrong* (well worth watching and thinking
>>>>> about if you are relying on or creating network metrics today), it
>>>>> then points to the real metrics that matter to users and the ultimate
>>>>> success of an internet business: Timeouts, retries, misses, failed
>>>>> queries, angry phone calls, abandoned shopping carts and loss of
>>>>> engagement.
>>>>> 
>>>>> 
>>>>> 
>>>>> https://www.p99conf.io/session/misery-metrics-consequences/
>>>>> 
>>>>> 
>>>>> 
>>>>> The ending advice was - don't aim to make a specific percentile
>>>>> acceptable, aim for an acceptable % of misery.
>>>>> 
>>>>> I enjoyed the p99 conference more than any conference I've attended in years.
>>>>> 
>>>>> --
>>>>> This song goes out to all the folk that thought Stadia would work:
>>>>> 
>>>>> 
>>>>> https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
>>>>> 
>>>>> 
>>>>> Dave Täht CEO, TekLibre, LLC
>>>>> 
>>>>> _______________________________________________
>>>>> Rpm mailing list
>>>>> 
>>>>> 
>>>>> Rpm at lists.bufferbloat.net
>>>>> https://lists.bufferbloat.net/listinfo/rpm
>>>> _______________________________________________
>>>> Starlink mailing list
>>>> 
>>>> 
>>>> Starlink at lists.bufferbloat.net
>>>> https://lists.bufferbloat.net/listinfo/starlink
>>> --
>>> David Collier-Brown,         | Always do right. This will gratify
>>> System Programmer and Author | some people and astonish the rest
>>> dave.collier-brown at indexexchange.com |              -- Mark Twain
>>> 
> -- 
> David Collier-Brown,         | Always do right. This will gratify
> System Programmer and Author | some people and astonish the rest
> 
> dave.collier-brown at indexexchange.com |              -- Mark Twain


