From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mout.gmx.net (mout.gmx.net [212.227.15.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "mout.gmx.net", Issuer "TeleSec ServerPass DE-1" (verified OK)) by huchra.bufferbloat.net (Postfix) with ESMTPS id 8DBBC21F644 for ; Sat, 26 Jul 2014 16:39:20 -0700 (PDT) Received: from hms-beagle.home.lan ([217.231.210.84]) by mail.gmx.com (mrgmx001) with ESMTPSA (Nemesis) id 0M5dMm-1WIN0Q2RQm-00xf4E; Sun, 27 Jul 2014 01:39:10 +0200 Content-Type: text/plain; charset=windows-1252 Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\)) From: Sebastian Moeller In-Reply-To: Date: Sun, 27 Jul 2014 01:39:08 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: <1E34489F-F863-41DC-8935-DE1B798B5D3E@gmx.de> References: <13144.1406313454@turing-police.cc.vt.edu> <36889fad276c5cdd1cd083d1c83f2265@lang.hm> <2483CF77-EE7D-4D76-ACC8-5CBC75D093A7@gmx.de> <93489218-DB72-4A74-96A4-F95AF4800BBE@gmx.de> To: David Lang X-Mailer: Apple Mail (2.1878.6) X-Provags-ID: V03:K0:PNVXa2F6u8ERYgZDHv5iXIZaRHAr8BOR8Um4RMxH9Ygx1EWrs0i JlN2DGdRGMx08r31g/m1eWarytIISEH5IHBD2f7QWBI2SdSHmAwvtojNcKfQqVjpD3LmqKt mDcA8fn6fTUvnp/yRAPhaxsLU1QpZTHtf+lTY8WXRcqojZ13ouHkv0FqI1cLfYvVcya8M+Z gO2gobUywhlu40ysaVBBw== Cc: cerowrt-devel@lists.bufferbloat.net Subject: Re: [Cerowrt-devel] Ideas on how to simplify and popularize bufferbloat control for consideration. 
X-BeenThere: cerowrt-devel@lists.bufferbloat.net X-Mailman-Version: 2.1.13 Precedence: list List-Id: Development issues regarding the cerowrt test router project List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 26 Jul 2014 23:39:21 -0000

Hi David,

On Jul 27, 2014, at 00:53 , David Lang wrote:

> On Sun, 27 Jul 2014, Sebastian Moeller wrote:
>
>> Hi David,
>>
>> On Jul 26, 2014, at 23:45 , David Lang wrote:
>>
>>> On Sat, 26 Jul 2014, Sebastian Moeller wrote:
>>>
>>>> On Jul 26, 2014, at 22:39 , David Lang wrote:
>>>>
>>>>> by how much tuning is required, I wasn't meaning how frequently to tune, but how close default settings can come to the performance of an expertly tuned setup.
>>>>
>>>> Good question.
>>>>
>>>>> Ideally the tuning takes into account the characteristics of the hardware of the link layer. If it's IP encapsulated in something else (ATM, PPPoE, VPN, VLAN tagging, ethernet with jumbo packet support for example), then you have overhead from the encapsulation that you would ideally take into account when tuning things.
>>>>>
>>>>> the question I'm talking about below is how much do you lose compared to the ideal if you ignore this sort of thing and just assume that the wire is dumb and puts the bits on it as you send them? By dumb I mean don't even allow for inter-packet gaps, don't measure the bandwidth, don't try to pace inbound connections by the timing of your acks, etc. Just run BQL and fq_codel and start the BQL sizes based on the wire speed of your link (Gig-E on the 3800) and shrink them based on long-term passive observation of the sender.
>>>>
>>>> As data talks, I just did a quick experiment with my ADSL2+ line at home.
>>>> The solid lines in the attached plot show the results for proper shaping with SQM (shaping to 95% of the link rates of downstream and upstream while taking the link-layer properties, that is ATM encapsulation and per-packet overhead, into account); the broken lines show the same system with the link-layer and per-packet overhead adjustments disabled, but still shaping to 95% of link rate (this is roughly equivalent to a 15% underestimation of the packet size). The actual test is netperf-wrapper's RRUL (4 TCP streams up, 4 TCP streams down while measuring latency with ping and UDP probes). As you can see from the plot, just getting the link-layer encapsulation wrong destroys latency under load badly. The host is ~52ms RTT away, and with fq_codel the ping time per leg is increased by just one codel target of 5ms each, resulting in a modest latency increase of ~10ms with proper shaping, for a total of ~65ms; with improper shaping RTTs increase to ~95ms (they almost double), so RTT increases by ~43ms. Also note how the extremes for the broken lines are much worse than for the solid lines. In short, I would estimate that a slight misjudgment (15%) results in almost an 80% increase of latency under load. In other words, getting the rates right matters a lot. (I should also note that in my setup there is a secondary router that limits RTT to max 300ms, otherwise the broken lines might look even worse...)
>>>
>>> what is the latency like without BQL and codel? the pre-bufferbloat version? (without any traffic shaping)
>>
>> So I just disabled SQM, and the plot looks almost exactly like the broken-line plot I sent before (~95ms RTT, up from 55ms unloaded, with single pings delayed for > 1000ms, just as with the broken line; with proper shaping even extreme pings stay < 100ms).
>> But as I said before, I need to run through my ISP-supplied primary router (not just a dumb modem) that also tries to bound the latencies under load to some degree. Actually, I just repeated the test connected directly to the primary router and get the same ~95ms average ping time with frequent extremes > 1000ms, so it looks like just getting the shaping wrong by 15% eradicates the buffer de-bloating efforts completely...
>
> just so I understand this completely
>
> you have
>
> debloated box <-> ISP router <-> ADSL <-> Internet <-> debloated server?

	Well, more like:

Macbook with dubious bloat-state -> wifi to de-bloated cerowrt box that shapes the traffic -> ISP router -> ADSL -> internet -> server

I assume that Dave debloated these servers well, but it should not really matter, as the problem is the buffers on both ends of the bottleneck ADSL link.

>
> and are you measuring the latency impact when uploading or downloading?

	No, I measure the impact on latency of saturating both up- and downlink, pretty much the worst-case scenario.

>
> I think a lot of people would be happy with 95ms average pings on a loaded connection, even with occasional outliers.

	No, that is too low an aim; this still is not usable for real-time applications, we should aim for base RTT plus 10ms. (For very slow links we need to cut some slack, but for > 3Mbps 10ms should be achievable.)

> It's far better than sustained multi-second ping times which is what I've seen with stock setups.

	True, but compared to multi-second ping times even <1000ms would be a really great improvement, though also not enough.

>
> but if no estimate is this bad, how bad is it if you use as your estimate the 'rated' speed of your DSL (i.e. what the ISP claims they are providing you) instead of the fully accurate speed that includes accounting for ATM encapsulation?

	Well, ~95ms with outliers > 1000ms, just as bad as no estimate.
	I shaped 5% below rated speed as reported by the DSL modem, so disabling the ATM link-layer adjustments (as shown in the broken lines in the plot) basically increased the effective shaped rate by ~13%, to effectively 107% of line rate; your proposal would be line rate with no link-layer adjustments, or effectively 110% of line rate. I do not feel like repeating this experiment right now, as I think the data so far shows that even with less misjudgment the bloat effect is fully visible. Not accounting for ATM framing carries a ~10% cost in link speed, as ATM packet size on the wire increases by >= ~10%.

>
> It's also worth figuring out if this problem would remain in place if you didn't have to go through the ISP router and were running fq_codel on that router.

	If the DSL modem were debloated, at least on upstream no shaping would be required any more; but that does not fix the need for downstream shaping (and bandwidth estimation) until the head-end gear is debloated...

> As long as fixing bufferbloat involves esoteric measurements and tuning, it's not going to be solved, but if it could be solved by people flashing openwrt onto their DSL router and then using the defaults, it could gain traction fairly quickly.

	But as there are only very few DSL modems with open sources (especially of the DSL chips), this is just as esoteric ;) Really, if equipment manufacturers could be convinced to take these issues seriously and actually fix their gear, that would be best. But this does not look like it is happening on the fast track. (Even DOCSIS developer CableLabs punted on requiring codel or fq_codel in DOCSIS modems, since they think that the required timestamps are too "expensive" on the device class they want to use for modems. They opted for PIE, much better than what we have right now but far away from my latency-under-load increase of 10ms...)
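The ~10-13% figures above fall straight out of ATM's 48-in-53 cell framing: every packet is carried in 53-byte cells of which only 48 bytes are payload, and the last cell is padded to a full cell. A minimal sketch of that arithmetic (assuming AAL5 encapsulation with its 8-byte trailer; the per_packet_overhead parameter is a placeholder for whatever encapsulation headers, e.g. PPPoA or PPPoE/LLC, the link actually adds):

```python
import math

ATM_CELL = 53       # bytes on the wire per ATM cell
ATM_PAYLOAD = 48    # usable payload bytes per cell
AAL5_TRAILER = 8    # AAL5 trailer, carried in the last cell

def atm_wire_size(packet_len, per_packet_overhead=0):
    """Bytes actually transmitted on an ADSL/ATM link for one packet.

    The packet plus encapsulation overhead plus the AAL5 trailer is
    segmented into 48-byte chunks, each sent in a 53-byte cell; the
    final cell is padded to full size.
    """
    payload = packet_len + per_packet_overhead + AAL5_TRAILER
    cells = math.ceil(payload / ATM_PAYLOAD)
    return cells * ATM_CELL

# A full-size 1500-byte packet with no extra encapsulation overhead:
full = atm_wire_size(1500)   # 32 cells -> 1696 bytes, ~13% expansion
# A small 64-byte ping gets padded out to two whole cells:
small = atm_wire_size(64)    # 2 cells -> 106 bytes, ~66% expansion
```

A shaper that ignores this framing therefore underestimates the wire size of every packet, worst of all for small packets, which is why the "rate only" shaping in the broken lines effectively runs above line rate.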
>
>>> I agree that going from 65ms to 95ms seems significant, but if the stock version goes up above 1000ms, then I think we are talking about things that are 'close'
>>
>> Well, if we include outliers (and we should, as enough outliers will quickly degrade the FPS and VoIP suitability of an otherwise responsive system), stock and improper shaping are in the >1000ms worst-case range, while proper SQM bounds this to 100ms.
>>
>>> assuming that latency under load without the improvements got >1000ms
>>>
>>> fast-slow (in ms)
>>> ideal = 10
>>> untuned = 43
>>> bloated > 1000
>>
>> The sign seems off, as fast < slow? I like this best ;)
>
> yep, I reversed fast/slow in all of these
>
>>> fast/slow
>>> ideal = 1.25
>>> untuned = 1.83
>>> bloated > 19
>>
>> But fast < slow, and hence this ratio should be <0?
>
> 1 not 0, but yes, this is really slow/fast
>
>>> slow/fast
>>> ideal = 0.8
>>> untuned = 0.55
>>> bloated = 0.05
>>
>> and this >0?
>
> and this is really fast/slow

	What about taking the latency difference and relating it to a reference time, like say the time a photon would take to travel once around the equator, or across the earth's diameter?

Best Regards
	Sebastian

>
> David Lang
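For what it's worth, the reference times Sebastian proposes in closing are easy to pin down (a sketch assuming the textbook values of ~40,075 km for the equatorial circumference and ~12,742 km for Earth's mean diameter, with light in vacuum):

```python
C_LIGHT = 299_792_458      # m/s, speed of light in vacuum
EQUATOR = 40_075_000       # m, Earth's equatorial circumference
DIAMETER = 12_742_000      # m, Earth's mean diameter

# Time for a photon to travel each reference distance, in milliseconds:
ms_around_equator = EQUATOR / C_LIGHT * 1000   # ~133.7 ms
ms_across_earth = DIAMETER / C_LIGHT * 1000    # ~42.5 ms
```

By that yardstick the ~43ms penalty of untuned shaping measured above is roughly one Earth-diameter of light travel, and a bloated >1000ms ping is many trips around the planet.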