From: Sebastian Moeller
To: cerowrt-devel
Date: Wed, 15 Oct 2014 02:03:37 +0200
Subject: Re: [Cerowrt-devel] SQM and PPPoE, more questions than answers...
Message-Id: <1895D16A-1B0F-48C7-B4B5-6FC84CA92F43@gmx.de>
List-Id: Development issues regarding the cerowrt test router project

Hi All,

some more testing:

On Oct 12, 2014, at 01:12 , Sebastian Moeller wrote:

> Hi,
>
> just to document my current understanding of using SQM on a router that also terminates a PPPoE WAN connection. We basically have two options: either set up SQM on the real interface (let’s call it ge00 like cerowrt does) or on the associated ppp device, pppoe-ge00. In theory both should produce the same results; in practice current SQM produces significantly different results.
> Let me enumerate the main differences that show up when testing with netperf-wrapper’s RRUL test:
>
> 1) SQM on ge00 does not show a working egress classification in the RRUL test (no visible “banding”/stratification of the 4 different priority TCP flows), while SQM on pppoe-ge00 does show this stratification.
>
> 	Now the reason for this is quite obvious once we take into account that on ge00 the kernel sees a packet that already contains a PPP header between the ethernet and IP headers and has a different ether_type field, and our diffserv filters currently ignore everything except straight ipv4 and ipv6 packets, so due to the unexpected/un-handled PPP header everything lands in the default priority class and hence there is no stratification. If we shape on pppoe-ge00 the kernel seems to do all processing before encapsulating the data with PPP, so all filters just work. In theory that should be relatively easy to fix (at least for the specific PPPoE case, I am unsure about a generic solution) by using offsets to try to access the TOS bits in PPP packets. Also most likely we face the same issue in other encapsulations that pass through cerowrt to some degree (except most of those will use an outer IP header from where we can scratch DSCPs…, but I digress)

	Using tc’s u32 filters makes it possible to actually dive into PPPoE-encapsulated ipv4 and ipv6 packets and perform classification on “pass-through” PPPoE packets (as encountered when starting SQM on ge00 instead of pppoe-ge00, if the latter actually handles the WAN connection), so that one is solved (but see below).

> 2) SQM on ge00 shows better latency under load (LUL): the LUL increases by ~2 * fq_codel’s target, so 10ms, while SQM on pppoe-ge00 shows a LUL increase (LULI) roughly twice as large, or around 20ms.
>
> 	I have no idea why that is, if anybody has an idea please chime in.

	Once SQM on ge00 actually dives into the PPPoE packets and applies/tests u32 filters, the LUL increases to be almost identical to pppoe-ge00’s, provided both ingress and egress classification are active and working. So it looks like the u32 filters I naively set up are quite costly. Maybe there is a better way to set these up...

> 3) SQM on pppoe-ge00 has a roughly 20% higher egress rate than SQM on ge00 (with ingress more or less identical between the two). Also 2) and 3) do not seem to be coupled: artificially reducing the egress rate on pppoe-ge00 to yield the same egress rate as seen on ge00 does not reduce the LULI to the ge00-typical 10ms, it stays at 20ms.
>
> 	For this I also have no good hypothesis, any ideas?

	With classification fixed the difference in egress rate shrinks to ~10% instead of 20%, so this partly seems related to the classification issue as well.

> So the current choice is either to accept a noticeable increase in LULI (but note that some years ago even an average of 20ms most likely was rare in real life) or an equally noticeable decrease in egress bandwidth…

	I guess it is back to the drawing board to figure out how to speed up the classification… and then revisit the PPPoE question again…

Regards
	Sebastian

> Best Regards
> 	Sebastian
>
> P.S.: It turns out, at least on my link, that for shaping on pppoe-ge00 the kernel does not account for any header automatically, so I need to specify a per-packet overhead (PPOH) of 40 bytes (on an ADSL2+ link with ATM linklayer); when shaping on ge00 however (with the kernel still terminating the PPPoE link to my ISP) I only need to specify a PPOH of 26 as the kernel already adds the 14 bytes for the ethernet header…
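
For reference, the offset arithmetic behind point 1) can be sketched in a few lines of Python. This is only an illustration of where the TOS/traffic-class bits sit inside a PPPoE session frame as seen on ge00; it is not the SQM filter setup itself. The function names and the test-frame helper are made up for this sketch, and it assumes an untagged ethernet frame (no VLAN header) carrying a PPPoE session packet with an uncompressed 2-byte PPP protocol field.

```python
import struct

ETH_HLEN = 14            # dst MAC (6) + src MAC (6) + ether_type (2)
PPPOE_HLEN = 6           # ver/type (1) + code (1) + session id (2) + length (2)
PPP_PROTO_LEN = 2        # PPP protocol field (assumed uncompressed)
ETH_P_PPP_SES = 0x8864   # ether_type of PPPoE session frames
PPP_IPV4 = 0x0021        # PPP protocol number for IPv4
PPP_IPV6 = 0x0057        # PPP protocol number for IPv6


def dscp_from_pppoe_frame(frame: bytes):
    """Return the DSCP of the inner IP packet, or None if the frame is
    not a PPPoE session frame carrying IPv4 or IPv6 (hypothetical helper)."""
    ip_off = ETH_HLEN + PPPOE_HLEN + PPP_PROTO_LEN  # inner IP header at byte 22
    if len(frame) < ip_off + 2:
        return None
    (ether_type,) = struct.unpack_from("!H", frame, 12)
    if ether_type != ETH_P_PPP_SES:
        return None
    (ppp_proto,) = struct.unpack_from("!H", frame, ETH_HLEN + PPPOE_HLEN)
    if ppp_proto == PPP_IPV4:
        # IPv4: the TOS byte is the second byte of the IP header.
        tos = frame[ip_off + 1]
    elif ppp_proto == PPP_IPV6:
        # IPv6: the traffic class straddles the first two bytes
        # (low nibble of byte 0, high nibble of byte 1).
        tos = ((frame[ip_off] & 0x0F) << 4) | (frame[ip_off + 1] >> 4)
    else:
        return None
    return tos >> 2  # DSCP is the upper six bits of TOS / traffic class


def build_test_frame(ether_type: int, ppp_proto: int, ip_first_two: bytes) -> bytes:
    """Build a dummy frame for experimenting (hypothetical helper, zeroed MACs)."""
    eth = bytes(12) + struct.pack("!H", ether_type)
    pppoe = bytes([0x11, 0x00]) + struct.pack("!HH", 1, 0)
    return eth + pppoe + struct.pack("!H", ppp_proto) + ip_first_two + bytes(18)


if __name__ == "__main__":
    ef_v4 = build_test_frame(ETH_P_PPP_SES, PPP_IPV4, bytes([0x45, 0xB8]))
    print(dscp_from_pppoe_frame(ef_v4))
```

As far as I understand, tc's u32 matches count offsets from the end of the ethernet header, so in filter terms the PPP protocol field would sit at offset 6 and the IPv4 TOS byte at offset 9 (22 - 14 + 1), compared with offset 1 for plain IPv4.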