From: Sebastian Moeller
Date: Thu, 19 Mar 2015 09:29:55 +0100
To: Alan Jenkins
Cc: cerowrt-devel
Subject: Re: [Cerowrt-devel] SQM and PPPoE, more questions than answers...

Hi Alan,

On Mar 18, 2015, at 23:14, Alan Jenkins wrote:

> Hi Seb
>
> I tested shaping on eth1 vs pppoe-wan, as it applies to ADSL (on Barrier
> Breaker + sqm-scripts). Maybe this is going back a bit and no longer
> interesting to read, but it seemed suspicious and interesting enough
> that I wanted to test it.
>
> My conclusion was 1) I should stick with pppoe-wan,

    Not a bad decision, especially given the recent changes to SQM to
make it survive transient pppoe-interface disappearances. Before those
changes the beauty of shaping on the ethernet device was that pppoe could
come and go while SQM stayed active and working; but thanks to your help
this problem seems fixed now.

> 2) the question really means: do you want to disable classification
> 3) I personally want to preserve the upload bandwidth and accept
> slightly higher latency.

    My question still is: is the bandwidth sacrifice really necessary,
or does this test just show a corner case in simple.qos that can be
fixed? I currently lack the time to tackle this effectively.

>
> On 15/10/14 01:03, Sebastian Moeller wrote:
>> Hi All,
>>
>> some more testing: On Oct 12, 2014, at 01:12, Sebastian Moeller wrote:
>
>>> 1) SQM on ge00 does not show a working egress classification in the
>>> RRUL test (no visible "banding"/stratification of the 4 different
>>> priority TCP flows), while SQM on pppoe-ge00 does show this
>>> stratification.
>
>> Using tc u32 filters makes it possible to actually dive into
>> PPPoE-encapsulated ipv4 and ipv6 packets and perform classification
>> on "pass-through" PPPoE packets (as encountered when starting SQM on
>> ge00 instead of pppoe-ge00, if the latter actually handles the wan
>> connection), so that one is solved (but see below).
>>
>>>
>>> 2) SQM on ge00 shows better latency under load (LUL): the LUL
>>> increases by ~2*fq_codel's target, so 10ms, while SQM on pppoe-ge00
>>> shows a LUL increase (LULI) roughly twice as large, around 20ms.
>>>
>>> I have no idea why that is; if anybody has an idea please chime in.
>
> I saw the same, though with a higher difference for egress rate.
> See first three files here:
>
> https://www.dropbox.com/sh/shwz0l7j4syp2ea/AAAxrhDkJ3TTy_Mq5KiFF3u2a?dl=0
>
> [netperf-wrapper noob puzzle: most of the ping lines vanish part-way
> through. Maybe I failed it somehow.]

    This is not your fault; the UDP probes netperf-wrapper uses do not
tolerate packet loss: once a packet is lost (I believe) the stream stops.
This is not ideal, but it gives a good quick indicator of packet loss for
sparse streams ;)

>
>> Once SQM on ge00 actually dives into the PPPoE packets and
>> applies/tests u32 filters, the LUL increases to be almost identical to
>> pppoe-ge00's if both ingress and egress classification are active and
>> do work. So it looks like the u32 filters I naively set up are quite
>> costly. Maybe there is a better way to set these up...
>
> Later you mentioned testing for coupling with egress rate. But you
> didn't test coupling with classification!

    True, I was interested in getting the 3-tier shaper to behave
sanely, so I did not look at the 1-tier simplest.qos.

>
> I switched from simple.qos to simplest.qos, and that achieved the lower
> latency on pppoe-wan. So I think your naive u32 filter setup wasn't the
> real problem.

    Erm, but simplest.qos does not use the relevant tc filters, so these
could still account for the issue; that, or some loss due to the 3 HTB
shapers...

>
> I did think ECN wouldn't be applied on eth1, and that would be the
> cause of the latency. But disabling ECN didn't affect it.
> See files 3 to 6:
>
> https://www.dropbox.com/sh/shwz0l7j4syp2ea/AAAxrhDkJ3TTy_Mq5KiFF3u2a?dl=0

    We typically only enable ECN on the downlink so far, under the
assumption that an ECN mark is a faster congestion signal to the receiver
than dropping the packet and then waiting for later packets to create
dupACKs; also, the router is typically close to the end-hosts and the
packets have already cleared the real bottleneck, so dropping them will
not improve the effective bandwidth use. On the uplink the reasoning
reverses: there, dropping instead of marking saves bandwidth for other
packets (and uplink bandwidth is often more precious), and since the
packets have basically just started their journey, the control loop can
still take a long time to complete and later hops can drop the packet
anyway. (I guess my current link is fast enough to activate ECN on the
uplink as well to see how that behaves, so I will try that for a bit...)

>
> I also admit surprise at fq_codel working within 20%/10ms on eth1. I
> thought it'd really hurt, by breaking the FQ part. Now I guess it
> doesn't. I still wonder about ECN marking, though I didn't check whether
> my endpoint is using ECN.
>
>>>
>>> 3) SQM on pppoe-ge00 has a roughly 20% higher egress rate than SQM on
>>> ge00 (with ingress more or less identical between the two). Also, 2)
>>> and 3) do not seem to be coupled: artificially reducing the egress
>>> rate on pppoe-ge00 to yield the same egress rate as seen on ge00
>>> does not reduce the LULI to the ge00-typical 10ms; it stays at 20ms.
>>>
>>> For this I also have no good hypothesis; any ideas?
>>
>> With classification fixed, the difference in egress rate shrinks to
>> ~10% instead of 20%, so this partly seems related to the
>> classification issue as well.
>
> My tests look like simplest.qos gives a lower egress rate, but not as
> low as eth1 (like 20% vs 40%). So that's also similar.
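    For reference, since the u32 filters keep coming up: what I mean by
"diving into" PPPoE frames looks roughly like the sketch below. This is
from memory, not the exact sqm-scripts code; the interface name, filter
priorities and HTB classids are illustrative. PPPoE session frames carry
ethertype 0x8864; the PPPoE header is 6 bytes, followed by 2 bytes of PPP
protocol (0x0021 = IPv4), so the IPv4 TOS/DSCP byte sits at offset 9
from the start of the PPPoE header:

```shell
# Illustrative u32 filters to classify PPPoE-encapsulated IPv4 by DSCP
# on the underlying ethernet device (ge00 here); assumes an HTB setup
# with 1:11 as the priority class and 1:13 as the bulk class.
TC=tc
IFACE=ge00

# EF (DSCP 46, TOS byte 0xb8) -> priority class
$TC filter add dev $IFACE parent 1:0 protocol 0x8864 prio 10 u32 \
    match u16 0x0021 0xffff at 6 \
    match u8 0xb8 0xff at 9 \
    flowid 1:11

# CS1 (DSCP 8, TOS byte 0x20) -> bulk class
$TC filter add dev $IFACE parent 1:0 protocol 0x8864 prio 11 u32 \
    match u16 0x0021 0xffff at 6 \
    match u8 0x20 0xff at 9 \
    flowid 1:13
```

Every egress packet has to be run through these extra matches, which
might help explain why my naive setup measurably costs bandwidth.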
>
>>> So the current choice is either to accept a noticeable increase in
>>> LULI (but note, some years ago even an average of 20ms most likely
>>> was rare in real life) or an equally noticeable decrease in egress
>>> bandwidth...
>>
>> I guess it is back to the drawing board to figure out how to speed up
>> the classification... and then revisit the PPPoE question again...
>
> so maybe the question is actually classification vs. not?
>
> + IMO slow asymmetric links don't want to lose more upload bandwidth
> than necessary. And I'm losing a *lot* in this test.
> + As you say, having only 20ms excess would still be a big improvement.
> We could ignore the bait of 10ms right now.
>
> vs
>
> - lowest latency I've seen testing my link. almost suspicious. looks
> close to 10ms average, when the dsl rate puts a lower bound of 7ms on
> the average.

    Curious: what is your link speed?

> - fq_codel honestly works miracles already. classification is the knob
> people had to use previously, who had enough time to twiddle it.
> - on netperf-wrapper plots the "banding" doesn't look brilliant on slow
> links anyway

    On slow links I always used to add "-s 0.8" (the slower the link,
the higher the number) to increase the temporal averaging window; this
reduces the accuracy of the display for the downlink, but at least allows
a better understanding of the uplink. I always wanted to see whether I
could teach netperf-wrapper to allow larger averaging windows after the
measurement, just for display purposes, but I am a total beginner with
python...
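    As a stop-gap for that post-hoc averaging idea one does not even
need to touch netperf-wrapper's python: if the ping samples are exported
to a plain two-column "timestamp value" file (that export step is an
assumption here; netperf-wrapper stores its results as JSON, so one
would have to extract the series first), a running mean over the last N
samples can be applied afterwards, e.g. with awk:

```shell
# Sketch: smooth a "timestamp value" series with a running mean over
# the last w samples (partial windows at the start divide by the number
# of samples seen so far). Input format is an assumption, see above.
smooth() {
    LC_ALL=C awk -v w="${1:-5}" '{
        buf[NR % w] = $2               # ring buffer of the last w values
        n = (NR < w) ? NR : w          # effective window size so far
        sum = 0
        for (i = 0; i < w; i++) sum += buf[i]
        printf "%s %.3f\n", $1, sum / n
    }'
}

# Example: window of 2 samples
printf '1 10\n2 20\n3 30\n4 40\n' | smooth 2
# -> 1 10.000
#    2 15.000
#    3 25.000
#    4 35.000
```

A larger window would mimic a bigger "-s" value, only applied at display
time instead of at measurement time.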
>
>
>> Regards Sebastian
>>
>>>
>>> Best Regards Sebastian
>>>
>>> P.S.: It turns out, at least on my link, that for shaping on
>>> pppoe-ge00 the kernel does not account for any header automatically,
>>> so I need to specify a per-packet overhead (PPOH) of 40 bytes (on an
>>> ADSL2+ link with ATM linklayer); when shaping on ge00 however (with
>>> the kernel still terminating the PPPoE link to my ISP) I only need
>>> to specify a PPOH of 26, as the kernel already adds the 14 bytes for
>>> the ethernet header...

    Please disregard this part; I need to implement better tests for
this instead of only relying on netperf-wrapper results ;)
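    (For completeness, should anyone want to poke at the overhead part
anyway: with tc, figures like the above would be passed to the shaper via
the size table. A rough sketch; the device names and the htb part are
just illustrative, and the 40/26 split simply reflects the kernel
already accounting for the 14-byte ethernet header on ge00.)

```shell
# Shaping on the PPPoE interface: the kernel adds no header for us, so
# the full 40-byte per-packet overhead must be specified:
tc qdisc add dev pppoe-ge00 root handle 1: \
    stab linklayer atm overhead 40 \
    htb default 12

# Shaping on the underlying ethernet device: the kernel already adds
# the 14-byte ethernet header, leaving 40 - 14 = 26 bytes:
tc qdisc add dev ge00 root handle 1: \
    stab linklayer atm overhead 26 \
    htb default 12
```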