From: Sebastian Moeller
Date: Wed, 6 May 2020 10:08:53 +0200
To: "David P. Reed"
Cc: Dave Täht, Make-Wifi-fast, Jannie Hanekom, Cake List, Sergey Fedorov, bloat, Toke Høiland-Jørgensen
Subject: Re: [Bloat] [Cake] [Make-wifi-fast] dslreports is no longer free
Message-Id: <5A0DCDB9-CF68-4C0D-896D-2C8F8B304FC8@gmx.de>
In-Reply-To: <1588518416.66682155@apps.rackspace.com>
References: <1588518416.66682155@apps.rackspace.com>
List-Id: General list for discussing Bufferbloat

Dear David,

Thanks for the elaboration below, and indeed I was not appreciating the full scope of the challenge.

> On May 3, 2020, at 17:06, David P. Reed wrote:
>
> Thanks Sebastian. I do agree that in many cases, reflecting the ICMP off the entry device that has the external IP address for the NAT gets most of the RTT measure, and if there's no queueing built up in the NAT device, that's a reasonable measure. But...

Yes, I see; I really hope that with IPv6 coming more and more online, and hence less NAT, end-to-end RTT measurements will be simpler in the future. But cue the people who will, for example, recommend dropping/ignoring ICMP in the name of security theater... It's the same mindset that basically recommends ignoring ICMP and/or IP timestamps because of "information leakage", while all the information that leaks for a standards-conformant host is the time since midnight UTC (and potentially an idea of how far off the local clock setting is)... I fail to understand the rationale/threat model behind eschewing this... For our purposes one-way timestamps would be most excellent to have, to be able to assess on which "leg" overload actually happens.

>
> However, if the router has "taken up the queueing delay" by rate limiting its uplink traffic to slightly less than the capacity (as with Cake and other TC shaping that isn't as good as cake), then there is a queue in the TC layer itself. This is what concerns me as a distortion in the measurement that can fool one into thinking the TC shaper is doing a good job, when in fact, lag under load may be quite high from inside the routed domain (the home).

As long as the shaper is instantiated on the NAT box, the latency probes reflected by that NAT box will also travel through the shaper. But now that you mention it: in SQM we do ingress shaping via an IFB and hence will also shape the incoming latency probes, but I have started to recommend doing ingress shaping as egress shaping on the LAN-facing interface of a router (to avoid the computational cost of the IFB redirection dance, and to allow people to use iptables for ingress*), and in such a configuration router-reflected/emitted WAN probes will avoid the ingress TC queues...

*) With nftables having a hook at ingress, that second rationale will become moot in the near future...

>
> As you point out this unmeasured queueing delay can also be a problem with WiFi inside the home. But it isn't limited to that.
>
> A badly set up shaping/congestion management subsystem inside the NAT can look "very good" in its echo of ICMP packets, but be terrible in response time to trivial HTTP requests from inside, or equally terrible in twitch games and video conferencing.

Good point, and one of Dave's pet peeves: in former times people recommended up-prioritizing ICMP packets to make RTT look good, falling exactly into the trap you described.

>
> So, for example, for tuning settings with "Cake" it is useless.

I believe that, at least for the way we instantiate things by default in SQM-scripts, we avoid that pitfall. What do you think @Toke?

>
> To be fair, usually the Access Provider has no control of what is done after the cable is terminated at the home, so as a way to decide if the provider is badly engineering its side, a ping from a server is a reasonable quality measure of the provider.

Most providers in Germany will try to steer customers to rent a wifi router from the ISP, so bloat in the wifi link would also be under the responsibility of the ISP to some degree, no?

>
> But not a good measure of the user experience, and if the provider provides the NAT box, even if it has a good shaper in it, like Cake or fq_codel, it will just confuse the user and create the opportunity for a "finger pointing" argument where neither side understands what is going on.
>
> This is why we need
>
> 1) a clear definition of lag under load that is from end-to-end in latency, and involves, ideally, independent traffic from multiple sources through the bottleneck.

I am all for it. In addition, in the past we also reasoned that this definition needs to be relatively simple so that it can be easily explained, to turn naive laypersons into informed amateurs ;) The multiple sources thing is something that dslreports did well: they typically tried to serve from multiple server sites and reported some stats per site. Now that it is basically gone, it becomes clear how much clue went into that speedtest; a pity that most of the competition did not follow their lead yet (I am especially looking at you, Ookla...).

>
> 2) ideally, a better way to localize where the queues are building up and present that to users and access providers.

Yes. Now how to do this robustly and reliably escapes me, albeit enabling one-way timestamps might help; then a saturating speedtest could be accompanied not by a conceptually "simple" ICMP echo request, but by a repeated traceroute that gets there-and-back delay measurements for the approximated path (approximated because of the complications of understanding traceroute results).

> The flent graphs are not interpretable by most non-experts.

And sometimes not even by experts ;)

> What we need is a simple visualization of a sketch-map of the path (like traceroute might provide) with queueing delay measures shown at key points that the user can understand.

I am on the fence: personally I would absolutely love that, but I am not sure how the rest of my family would receive something like that. I guess it depends on the simplicity of the representation and probably, following fast.com's lead, a way to also compress those expanded results into a reasonable one-number representation. I hate one-number representations for complex issues, but people generally will come up with one themselves if none is supplied. (And I get this; outside our areas of expertise we all prefer the world to be simple.)

Best Regards
Sebastian

> On Saturday, May 2, 2020 4:19pm, "Sebastian Moeller" said:
>
>> Hi David,
>>
>> in principle I agree, a NATed IPv4 ICMP probe will be at best reflected at the NAT router (CPE) (some commercial home gateways do not respond to ICMP echo requests in the name of security theatre). So it is pretty hard to measure the full end to end path in that configuration. I believe that IPv6 should make that easier/simpler in that NAT hopefully will be out of the path (but let's see what ingenuity ISPs will come up with).
>> Then again, traditionally the relevant bottlenecks often are a) the internet access link itself and there the CPE is in a reasonable position as a reflector on the other side of the bottleneck as seen from an internet server, b) the home network between CPE and end-host, often with variable rate wifi, here I agree reflecting echos at the CPE hides part of the issue.
>>
>>
>>
>>> On May 2, 2020, at 19:38, David P. Reed wrote:
>>>
>>> I am still a bit worried about properly defining "latency under load" for a NAT routed situation.
>>> If the test is based on ICMP Ping packets *from the server*, it will NOT be measuring the full path latency, and if the potential congestion is in the uplink path from the access provider's residential box to the access provider's router/switch, it will NOT measure congestion caused by bufferbloat reliably on either side, since the bufferbloat will be outside the ICMP Ping path.
>>
>> Puzzled, as I believe it is going to be the residential box that will respond here, or will it be the AFTRs for CG-NAT that reflect the ICMP echo requests?
>>
>>>
>>> I realize that a browser based speed test has to be basically run from the "server" end, because browsers are not that good at time measurement on a packet basis. However, there are ways to solve this and avoid the ICMP Ping issue, with a cooperative server.
>>>
>>> I once built a test that fixed this issue reasonably well. It carefully created a TCP based RTT measurement channel (over HTTP) that made the echo have to traverse the whole end-to-end path, which is the best and only way to accurately define lag under load from the user's perspective. The client end of an unloaded TCP connection can depend on TCP (properly prepared by getting it past slowstart) to generate a single packet response.
>>>
>>> This "TCP ping" is thus compatible with getting the end-to-end measurement on the server end of a true RTT.
>>>
>>> It's like tcp-traceroute tool, in that it tricks anyone in the middle boxes into thinking this is a real, serious packet, not an optional low priority packet.
>>>
>>> The same issue comes up with non-browser-based techniques for measuring true lag-under-load.
>>>
>>> Now as we move HTTP to QUIC, this actually gets easier to do.
>>>
>>> One other opportunity I haven't explored, but which is pregnant with potential is the use of WebRTC, which runs over UDP internally.
>>> Since JavaScript has direct access to create WebRTC connections (multiple ones), this makes detailed testing in the browser quite reasonable.
>>>
>>> And the time measurements can resolve well below 100 microseconds, if the JS is based on modern JIT compilation (Chrome, Firefox, Edge all compile to machine code speed if the code is restricted and in a loop). Then again, there is Web Assembly if you want to write C code that runs in the browser fast. WebAssembly is a low level language that compiles to machine code in the browser execution, and still has access to all the browser networking facilities.
>>
>> Mmmh, according to https://github.com/w3c/hr-time/issues/56 due to spectre side-channel vulnerabilities many browsers seemed to have lowered the timer resolution, but even the ~1ms resolution should be fine for typical RTTs.
>>
>> Best Regards
>> Sebastian
>>
>> P.S.: I assume that I simply do not see/understand the full scope of the issue at hand yet.
>>
>>
>>>
>>> On Saturday, May 2, 2020 12:52pm, "Dave Taht" said:
>>>
>>>> On Sat, May 2, 2020 at 9:37 AM Benjamin Cronce wrote:
>>>>>
>>>>>> Fast.com reports my unloaded latency as 4ms, my loaded latency as ~7ms
>>>>
>>>> I guess one of my questions is that with a switch to BBR netflix is going to do pretty well. If fast.com is using bbr, well... that excludes much of the current side of the internet.
>>>>
>>>>> For download, I show 6ms unloaded and 6-7 loaded. But for upload the loaded shows as 7-8 and I see it blip upwards of 12ms. But I am no longer using any traffic shaping. Any anti-bufferbloat is from my ISP. A graph of the bloat would be nice.
>>>>
>>>> The tests do need to last a fairly long time.
>>>>
>>>>> On Sat, May 2, 2020 at 9:51 AM Jannie Hanekom wrote:
>>>>>>
>>>>>> Michael Richardson :
>>>>>>> Does it find/use my nearest Netflix cache?
>>>>>>
>>>>>> Thankfully, it appears so. The DSLReports bloat test was interesting, but the jitter on the ~240ms base latency from South Africa (and other parts of the world) was significant enough that the figures returned were often unreliable and largely unusable - at least in my experience.
>>>>>>
>>>>>> Fast.com reports my unloaded latency as 4ms, my loaded latency as ~7ms and mentions servers located in local cities. I finally have a test I can share with local non-technical people!
>>>>>>
>>>>>> (Agreed, upload test would be nice, but this is a huge step forward from what I had access to before.)
>>>>>>
>>>>>> Jannie Hanekom
>>>>>>
>>>>>> _______________________________________________
>>>>>> Cake mailing list
>>>>>> Cake@lists.bufferbloat.net
>>>>>> https://lists.bufferbloat.net/listinfo/cake
>>>>
>>>>
>>>> --
>>>> Make Music, Not War
>>>>
>>>> Dave Täht
>>>> CTO, TekLibre, LLC
>>>> http://www.teklibre.com
>>>> Tel: 1-831-435-0729
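
P.S.: To make the remark about one-way timestamps a bit more concrete, here is a minimal Python sketch of how ICMP timestamp replies could help tell the two "legs" apart. It is only an illustration under assumptions: that the reflector answers ICMP timestamp requests in a standards-conformant way (RFC 792: originate, receive and transmit stamps in milliseconds since midnight UTC), and that the offset between the two clocks stays constant over the duration of the test. The function names and the probe values are made up; sending and receiving the actual probes is left out.

```python
from statistics import median

MS_PER_DAY = 24 * 60 * 60 * 1000  # RFC 792 timestamps: ms since midnight UTC


def ms_diff(later, earlier):
    """Signed difference in ms, wrapped into [-12 h, +12 h) to survive midnight."""
    half = MS_PER_DAY // 2
    return (later - earlier + half) % MS_PER_DAY - half


def leg_delays(originate, receive, transmit, arrival):
    """Apparent (forward, reverse) delays of one probe, in ms.

    originate/arrival are read from the local clock, receive/transmit from the
    reflector's clock, so each value still contains the unknown clock offset
    (with opposite sign on the two legs).
    """
    return ms_diff(receive, originate), ms_diff(arrival, transmit)


def induced_delay_per_leg(idle_probes, loaded_probes):
    """Median increase of each leg under load; a constant clock offset cancels."""
    idle = [leg_delays(*p) for p in idle_probes]
    loaded = [leg_delays(*p) for p in loaded_probes]
    fwd = median(f for f, _ in loaded) - median(f for f, _ in idle)
    rev = median(r for _, r in loaded) - median(r for _, r in idle)
    return fwd, rev


# Hypothetical probes (originate, receive, transmit, arrival) in ms since
# midnight UTC; the reflector's clock is assumed to run 5 s ahead.
idle_probes = [(1000, 6010, 6011, 1021), (1500, 6510, 6511, 1521)]
loaded_probes = [(2000, 7010, 7011, 2071), (2500, 7510, 7511, 2571)]
print(induced_delay_per_leg(idle_probes, loaded_probes))
```

With these made-up numbers the forward leg (local host towards the reflector) shows no increase, while the return leg picks up 50 ms under load, even though the raw per-leg values are dominated by the 5 s clock offset. Only the change relative to the idle baseline is meaningful here, not the absolute one-way delay.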