[Cerowrt-devel] SQM and PPPoE, more questions than answers...

Alan Jenkins alan.christopher.jenkins at gmail.com
Wed Mar 18 18:14:09 EDT 2015


Hi Seb

I tested shaping on eth1 vs pppoe-wan, as it applies to ADSL (on 
Barrier Breaker + sqm-scripts).  Maybe this is going back a bit and is no 
longer interesting to read, but it seemed suspicious and interesting 
enough that I wanted to test it.

My conclusion was: 1) I should stick with pppoe-wan, 2) the question 
really comes down to whether you want to disable classification, and 3) I 
personally want to preserve the upload bandwidth and accept slightly 
higher latency.


On 15/10/14 01:03, Sebastian Moeller wrote:
> Hi All,
>
> some more testing:
>
> On Oct 12, 2014, at 01:12, Sebastian Moeller
> <moeller0 at gmx.de> wrote:

>> 1) SQM on ge00 does not show a working egress classification in the
>> RRUL test (no visible “banding”/stratification of the 4 different
>> priority TCP flows), while SQM on pppoe-ge00 does show this
>> stratification.

> Using tc u32 filters makes it possible to actually dive into
> PPPoE-encapsulated IPv4 and IPv6 packets and perform classification
> on "pass-through" PPPoE packets (as encountered when starting SQM on
> ge00 instead of pppoe-ge00, if the latter actually handles the WAN
> connection), so that one is solved (but see below).
>
>>
>> 2) SQM on ge00 shows better latency under load (LUL): the LUL
>> increases by ~2*fq_codel's target, so 10ms, while SQM on pppoe-ge00
>> shows a LUL increase (LULI) roughly twice as large, or around 20ms.
>>
>> I have no idea why that is; if anybody has an idea, please chime
>> in.

I saw the same, though with a larger difference in egress rate.  See 
the first three files here:

https://www.dropbox.com/sh/shwz0l7j4syp2ea/AAAxrhDkJ3TTy_Mq5KiFF3u2a?dl=0

[netperf-wrapper noob puzzle: most of the ping lines vanish part-way 
through.  Maybe I did something wrong.]

> Once SQM on ge00 actually dives into the PPPoE packets and
> applies/tests u32 filters, the LUL increase becomes almost identical
> to pppoe-ge00's if both ingress and egress classification are active
> and actually work. So it looks like the u32 filters I naively set up
> are quite costly. Maybe there is a better way to set these up...

Later you mentioned testing for coupling with egress rate.  But you 
didn't test coupling with classification!

I switched from simple.qos to simplest.qos, and that achieved the lower 
latency on pppoe-wan.  So I think your naive u32 filter setup wasn't the 
real problem.

I did suspect ECN marking wouldn't be applied on eth1, and that this 
would be the cause of the latency difference.  But disabling ECN didn't 
affect it.  See files 3 to 6:

https://www.dropbox.com/sh/shwz0l7j4syp2ea/AAAxrhDkJ3TTy_Mq5KiFF3u2a?dl=0

I also admit surprise that fq_codel works within 20%/10ms on eth1.  I 
thought it would really hurt by breaking the FQ part; I guess it 
doesn't.  I still wonder about ECN marking, though I didn't check 
whether my endpoint uses ECN.
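Whether FQ survives on eth1 depends on whether the flow hash reaches inside the PPPoE header. The toy Python sketch below makes that concrete under stated assumptions: CRC32 stands in for the kernel's hash, 1024 is fq_codel's default number of flow queues, and the frame builder is hypothetical. Hashing only the outer Ethernet header lumps every flow into one queue, while the inner 5-tuple separates them. It does not establish what the kernel on this router actually does.

```python
import struct
import zlib

N_BUCKETS = 1024  # fq_codel's default number of flow queues
ETH_HLEN, PPPOE_HLEN, PPP_PROTO_LEN = 14, 6, 2


def outer_key(frame: bytes) -> bytes:
    """What a PPPoE-unaware hash could see: MAC addresses and EtherType only."""
    return frame[:ETH_HLEN]


def inner_key(frame: bytes) -> bytes:
    """5-tuple of an IPv4 TCP/UDP packet carried inside a PPPoE session frame."""
    ip = ETH_HLEN + PPPOE_HLEN + PPP_PROTO_LEN
    ihl = (frame[ip] & 0x0F) * 4
    proto = frame[ip + 9]
    addrs = frame[ip + 12:ip + 20]        # source + destination address
    ports = frame[ip + ihl:ip + ihl + 4]  # source + destination port
    return addrs + bytes([proto]) + ports


def bucket(key: bytes) -> int:
    """Stand-in hash (CRC32, not the kernel's) reduced to a queue index."""
    return zlib.crc32(key) % N_BUCKETS


def make_tcp_frame(src_port: int) -> bytes:
    """Hypothetical PPPoE frame with a bare IPv4/TCP header, 10.0.0.1 -> 192.0.2.1:443."""
    eth = bytes(12) + struct.pack("!H", 0x8864)
    pppoe = bytes([0x11, 0x00]) + struct.pack("!HH", 1, 42)  # 42 = PPP(2) + IP(20) + TCP(20)
    ppp = struct.pack("!H", 0x0021)
    iphdr = (bytes([0x45, 0x00]) + bytes(7) + bytes([6]) + bytes(2)
             + bytes([10, 0, 0, 1]) + bytes([192, 0, 2, 1]))
    tcphdr = struct.pack("!HH", src_port, 443) + bytes(16)
    return eth + pppoe + ppp + iphdr + tcphdr


frames = [make_tcp_frame(port) for port in range(40001, 40005)]
outer_buckets = {bucket(outer_key(f)) for f in frames}
inner_keys = {inner_key(f) for f in frames}
print(len(outer_buckets), "queue(s) from outer headers;", len(inner_keys), "distinct inner flows")
# → 1 queue(s) from outer headers; 4 distinct inner flows
```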

>>
>> 3) SQM on pppoe-ge00 has a roughly 20% higher egress rate than SQM on
>> ge00 (with ingress more or less identical between the two). Also, 2)
>> and 3) do not seem to be coupled: artificially reducing the egress
>> rate on pppoe-ge00 to yield the same egress rate as seen on ge00
>> does not reduce the LULI to the ge00-typical 10ms; it stays at
>> 20ms.
>>
>> For this I also have no good hypothesis, any ideas?
>
> With classification fixed the difference in egress rate shrinks to
> ~10% instead of 20, so this partly seems related to the
> classification issue as well.

My tests suggest simplest.qos gives a lower egress rate, but not as 
low as eth1 (roughly 20% vs 40%).  So that's also similar.

>> So the current choice is either to accept a noticeable increase in
>> LULI (but note that some years ago even an average of 20ms was most
>> likely rare in real life) or an equally noticeable decrease in
>> egress bandwidth…
>
> I guess it is back to the drawing board to figure out how to speed up
> the classification… and then revisit the PPPoE question again…

So maybe the question is actually classification vs. no classification?

  + IMO slow asymmetric links don't want to lose more upload bandwidth 
than necessary.  And I'm losing a *lot* in this test.
  + As you say, having only 20ms excess would still be a big 
improvement.  We could ignore the lure of 10ms for now.

vs

  - Lowest latency I've seen testing my link; almost suspicious.  It 
looks close to a 10ms average, when the DSL rate puts a lower bound of 
7ms on the average.
  - fq_codel honestly works miracles already.  Classification is the 
knob people had to use before, if they had the time to twiddle it.
  - On netperf-wrapper plots, the "banding" doesn't look brilliant on 
slow links anyway.


> Regards Sebastian
>
>>
>> Best Regards Sebastian
>>
>> P.S.: It turns out, at least on my link, that for shaping on
>> pppoe-ge00 the kernel does not account for any header
>> automatically, so I need to specify a per-packet overhead (PPOH) of
>> 40 bytes (on an ADSL2+ link with ATM linklayer); when shaping on
>> ge00, however (with the kernel still terminating the PPPoE link to
>> my ISP), I only need to specify a PPOH of 26, as the kernel already
>> adds the 14 bytes for the ethernet header…
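The overhead arithmetic in the P.S. can be checked with a small Python sketch. It assumes the usual ATM/AAL5 rule that each packet plus its per-packet overhead is padded up to whole 48-byte cell payloads, each sent as a 53-byte cell, and takes the 40/26/14 figures from the P.S. as given. `atm_wire_bytes` is a hypothetical helper, not part of sqm-scripts or tc.

```python
import math

ATM_CELL_BYTES = 53     # one ATM cell on the wire
ATM_PAYLOAD_BYTES = 48  # payload carried per cell
ETH_HLEN = 14           # header the kernel already counts on ge00 (per the P.S.)


def atm_wire_bytes(kernel_len: int, overhead: int) -> int:
    """Bytes the ATM link spends on one packet: pad len + overhead up to whole cells."""
    cells = math.ceil((kernel_len + overhead) / ATM_PAYLOAD_BYTES)
    return cells * ATM_CELL_BYTES


ip_len = 1500
on_pppoe = atm_wire_bytes(ip_len, 40)             # pppoe-ge00: kernel sees the bare IP packet
on_ge00 = atm_wire_bytes(ip_len + ETH_HLEN, 26)   # ge00: 14 bytes already counted, 26 added
print(on_pppoe, on_ge00)  # → 1749 1749
```

Both settings describe the same 33-cell packet, which is why 26 on ge00 matches 40 on pppoe-ge00. A bare 40-byte ACK (20-byte IPv4 + 20-byte TCP header) still costs two cells, `atm_wire_bytes(40, 40) == 106`, so the relative framing cost is largest for small packets.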


