[Cerowrt-devel] SQM and PPPoE, more questions than answers...
Alan Jenkins
alan.christopher.jenkins at gmail.com
Wed Mar 18 18:14:09 EDT 2015
Hi Seb
I tested shaping on eth1 vs pppoe-wan, as it applies to ADSL. (On
Barrier Breaker + sqm-scripts). Maybe this is going back a bit & no
longer interesting to read. But it seemed suspicious & interesting
enough that I wanted to test it.
My conclusion was: 1) I should stick with pppoe-wan; 2) the question
really means "do you want to disable classification?"; 3) I personally want
to preserve the upload bandwidth and accept slightly higher latency.
On 15/10/14 01:03, Sebastian Moeller wrote:
> Hi All,
>
> some more testing: On Oct 12, 2014, at 01:12 , Sebastian Moeller
> <moeller0 at gmx.de> wrote:
>> 1) SQM on ge00 does not show a working egress classification in the
>> RRUL test (no visible “banding”/stratification of the 4 different
>> priority TCP flows), while SQM on pppoe-ge00 does show this
>> stratification.
> Using tc's u32 filter makes it possible to actually dive into
> PPPoE encapsulated ipv4 and ipv6 packets and perform classification
> on “pass-through” PPPoE packets (as encountered when starting SQM on
> ge00 instead of pppoe-ge00, if the latter actually handles the wan
> connection), so that one is solved (but see below).
>
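[Editor's sketch, not from the thread: to show what "diving into" a PPPoE packet involves, here is a minimal Python illustration of the offsets a classifier has to skip before it reaches the DSCP bits. It assumes an untagged Ethernet frame (no VLAN tag) carrying a PPPoE session packet. The function name is hypothetical, and this is not the u32 filter setup Sebastian used.]

```python
import struct

ETH_HDR_LEN = 14         # dst MAC (6) + src MAC (6) + EtherType (2)
PPPOE_HDR_LEN = 6        # ver/type (1) + code (1) + session id (2) + length (2)
PPP_PROTO_LEN = 2        # PPP protocol field that follows the PPPoE header

ETH_P_PPP_SES = 0x8864   # EtherType of PPPoE session frames
PPP_IPV4 = 0x0021        # PPP protocol number for IPv4
PPP_IPV6 = 0x0057        # PPP protocol number for IPv6


def dscp_from_pppoe_frame(frame: bytes):
    """Return the DSCP of the IP packet inside a PPPoE session frame, or None."""
    ip_offset = ETH_HDR_LEN + PPPOE_HDR_LEN + PPP_PROTO_LEN
    if len(frame) < ip_offset + 2:
        return None  # too short to contain the first two IP header bytes
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_PPP_SES:
        return None  # not a PPPoE session frame
    (ppp_proto,) = struct.unpack_from("!H", frame, ETH_HDR_LEN + PPPOE_HDR_LEN)
    ip = frame[ip_offset:]
    if ppp_proto == PPP_IPV4:
        # IPv4: DSCP is the upper six bits of the second header byte (TOS).
        return ip[1] >> 2
    if ppp_proto == PPP_IPV6:
        # IPv6: the traffic class straddles the first two header bytes.
        traffic_class = ((ip[0] & 0x0F) << 4) | (ip[1] >> 4)
        return traffic_class >> 2
    return None
```

[The point is only that the classifier must match the EtherType and the PPP protocol field and then read at a fixed offset of 22 bytes into the frame. On pppoe-ge00 the same bits sit at the start of the packet the qdisc sees.]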
>>
>> 2) SQM on ge00 shows better latency under load (LUL), the LUL
>> increases by ~2*fq_codel's target, so 10ms, while SQM on pppoe-ge00
>> shows a LUL-increase (LULI) roughly twice as large or around 20ms.
>>
>> I have no idea why that is, if anybody has an idea please chime
>> in.
I saw the same, though with a larger difference in egress rate. See the
first three files here:
https://www.dropbox.com/sh/shwz0l7j4syp2ea/AAAxrhDkJ3TTy_Mq5KiFF3u2a?dl=0
[netperf-wrapper noob puzzle: most of the ping lines vanish part-way
through. Maybe I failed it somehow.]
> Once SQM on ge00 actually dives into the PPPoE packets and
> applies/tests u32 filters the LUL increases to be almost identical to
> pppoe-ge00’s if both ingress and egress classification are active and
> do work. So it looks like the u32 filters I naively set up are quite
> costly. Maybe there is a better way to set these up...
Later you mentioned testing for coupling with egress rate. But you
didn't test coupling with classification!
I switched from simple.qos to simplest.qos, and that achieved the lower
latency on pppoe-wan. So I think your naive u32 filter setup wasn't the
real problem.
I did think ECN wouldn't be applied on eth1, and that would be the cause
of the latency. But disabling ECN didn't affect it. See files 3 to 6:
https://www.dropbox.com/sh/shwz0l7j4syp2ea/AAAxrhDkJ3TTy_Mq5KiFF3u2a?dl=0
I also admit surprise at fq_codel working within 20%/10ms on eth1. I
thought it'd really hurt, by breaking the FQ part. Now I guess it
doesn't. I still wonder about ECN marking, though I didn't check my
endpoint is using ECN.
>>
>> 3) SQM on pppoe-ge00 has a rough 20% higher egress rate than SQM on
>> ge00 (with ingress more or less identical between the two). Also 2)
>> and 3) do not seem to be coupled, artificially reducing the egress
>> rate on pppoe-ge00 to yield the same egress rate as seen on ge00
>> does not reduce the LULI to the ge00 typical 10ms, but it stays at
>> 20ms.
>>
>> For this I also have no good hypothesis, any ideas?
>
> With classification fixed the difference in egress rate shrinks to
> ~10% instead of 20, so this partly seems related to the
> classification issue as well.
My tests look like simplest.qos gives a lower egress rate, but not as
low as eth1. (Like 20% vs 40%). So that's also similar.
>> So the current choice is either to accept a noticeable increase in
>> LULI (but note some years ago even an average of 20ms most likely
>> was rare in real life) or an equally noticeable decrease in
>> egress bandwidth…
>
> I guess it is back to the drawing board to figure out how to speed up
> the classification… and then revisit the PPPoE question again…
So maybe the question is actually classification vs. no classification?
+ IMO slow asymmetric links don't want to lose more upload bandwidth
than necessary. And I'm losing a *lot* in this test.
+ As you say, having only 20ms excess would still be a big
improvement. We could ignore the bait of 10ms right now.
vs
- lowest latency I've seen testing my link. almost suspicious. looks
close to 10ms average, when the dsl rate puts a lower bound of 7ms on
the average.
- fq_codel honestly works miracles already. classification is the knob
people had to use previously, who had enough time to twiddle it.
- on netperf-wrapper plots the "banding" doesn't look brilliant on slow
links anyway
> Regards Sebastian
>
>>
>> Best Regards Sebastian
>>
>> P.S.: It turns out, at least on my link, that for shaping on
>> pppoe-ge00 the kernel does not account for any header
>> automatically, so I need to specify a per-packet-overhead (PPOH) of
>> 40 bytes (on an ADSL2+ link with ATM linklayer); when shaping on
>> ge00 however (with the kernel still terminating the PPPoE link to
>> my ISP) I only need to specify a PPOH of 26 as the kernel already
>> adds the 14 bytes for the ethernet header…
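[Editor's sketch, not from the thread: the overhead arithmetic in the P.S. can be checked with a few lines of Python. ATM carries 48 payload bytes in every 53-byte cell and pads the last cell, so the shaper has to round up to whole cells. The function name is hypothetical. The `+ 14` on the ge00 side takes the P.S.'s figures at face value, i.e. it assumes the only difference between the two interfaces is the Ethernet header the kernel already counts.]

```python
ATM_CELL_BYTES = 53      # size of one ATM cell on the wire
ATM_PAYLOAD_BYTES = 48   # payload bytes carried per cell


def atm_wire_bytes(kernel_len: int, overhead: int) -> int:
    """Bytes occupied on an ATM link by a packet the kernel sees as kernel_len."""
    total = kernel_len + overhead
    cells = -(-total // ATM_PAYLOAD_BYTES)  # ceiling division: last cell is padded
    return cells * ATM_CELL_BYTES


ip_len = 1492  # example packet size; an assumption, not a value from the thread

# Shaping on pppoe-ge00: kernel sees the bare IP packet, PPOH of 40.
on_pppoe = atm_wire_bytes(ip_len, 40)

# Shaping on ge00: kernel already counts 14 bytes of Ethernet header
# (per the P.S.), so a PPOH of 26 is enough.
on_eth = atm_wire_bytes(ip_len + 14, 26)

print(on_pppoe, on_eth)
```

[Both calls account for the same number of cells, which is why the two overhead settings are equivalent. With the wrong overhead the cell count can be off by one per packet, which matters most for small packets such as ACKs.]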