[Cerowrt-devel] SQM and PPPoE, more questions than answers...

David Lang david at lang.hm
Wed Mar 18 22:43:08 EDT 2015


On Wed, 18 Mar 2015, Alan Jenkins wrote:

>> Once SQM on ge00 actually dives into the PPPoE packets and
>> applies/tests u32 filters the LUL increases to be almost identical to
>> pppoe-ge00’s if both ingress and egress classification are active and
>> do work. So it looks like the u32 filters I naively set up are quite
>> costly. Maybe there is a better way to set these up...
>
> Later you mentioned testing for coupling with egress rate.  But you didn't 
> test coupling with classification!
>
> I switched from simple.qos to simplest.qos, and that achieved the lower 
> latency on pppoe-wan.  So I think your naive u32 filter setup wasn't the real 
> problem.
>
> I did think ECN wouldn't be applied on eth1, and that would be the cause of 
> the latency.  But disabling ECN didn't affect it.  See files 3 to 6:
>
> https://www.dropbox.com/sh/shwz0l7j4syp2ea/AAAxrhDkJ3TTy_Mq5KiFF3u2a?dl=0
>
> I also admit surprise at fq_codel working within 20%/10ms on eth1.  I thought 
> it'd really hurt, by breaking the FQ part.  Now I guess it doesn't.  I still 
> wonder about ECN marking, though I didn't check my endpoint is using ECN.

ECN should never increase latency. If it has any effect, it should improve 
latency, because the sender slows down when some hop along the path is 
overloaded instead of sending the packets anyway and having them sit in a 
buffer for a while. This doesn't decrease actual throughput either (although a 
test that doesn't wait for all the packets to arrive at the far end will make 
it look as if throughput decreased).
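
To make the measurement point concrete, here is a rough toy model (not a 
real TCP or ECN implementation). It assumes a single bottleneck with an 
unbounded buffer, one sender that ignores congestion, and one idealized sender 
that ECN has slowed to exactly the link rate. The function name and the rates 
are made up for illustration.

```python
def bottleneck_test(send_rate_bps, link_rate_bps, duration_s):
    """Toy fluid model: one sender, one bottleneck, unbounded buffer."""
    sent_bits = send_rate_bps * duration_s
    # The link can never deliver faster than its own rate.
    delivered_bits = min(send_rate_bps, link_rate_bps) * duration_s
    # Whatever was sent but not delivered is still sitting in the buffer.
    backlog_bits = sent_bits - delivered_bits
    return {
        "sender_view_bps": sent_bits / duration_s,
        "receiver_view_bps": delivered_bits / duration_s,
        "final_queue_delay_s": backlog_bits / link_rate_bps,
    }


# Hypothetical numbers: 1 Mbit/s bottleneck, 10 second test.
no_ecn = bottleneck_test(1_200_000, 1_000_000, 10)    # sender overshoots
with_ecn = bottleneck_test(1_000_000, 1_000_000, 10)  # sender matches link

for name, result in (("no ECN", no_ecn), ("with ECN", with_ecn)):
    print(name, result)
```

In this model both runs deliver the same rate at the far end. Only a test that 
counts bytes at the sender sees the overshooting run as "faster", and that run 
also ends with a large backlog sitting in the buffer.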

>>> 
>>> 3) SQM on pppoe-ge00 has a rough 20% higher egress rate than SQM on
>>> ge00 (with ingress more or less identical between the two). Also 2)
>>> and 3) do not seem to be coupled, artificially reducing the egress
>>> rate on pppoe-ge00 to yield the same egress rate as seen on ge00
>>> does not reduce the LULI to the ge00 typical 10ms, but it stays at
>>> 20ms.
>>> 
>>> For this I also have no good hypothesis, any ideas?
>> 
>> With classification fixed the difference in egress rate shrinks to
>> ~10% instead of 20, so this partly seems related to the
>> classification issue as well.
>
> My tests look like simplest.qos gives a lower egress rate, but not as low as 
> eth1.  (Like 20% vs 40%).  So that's also similar.
>
>>> So the current choice is either to accept a noticeable increase in
>>> LULI (but note some years ago even an average of 20ms most likely
>>> was rare in the real life) or an equally noticeable decrease in
>>> egress bandwidth…
>> 
>> I guess it is back to the drawing board to figure out how to speed up
>> the classification… and then revisit the PPPoE question again…
>
> so maybe the question is actually classification v.s. not?
>
> + IMO slow asymmetric links don't want to lose more upload bandwidth than 
> necessary.  And I'm losing a *lot* in this test.
> + As you say, having only 20ms excess would still be a big improvement.  We 
> could ignore the bait of 10ms right now.
>
> vs
>
> - lowest latency I've seen testing my link. almost suspicious. looks close 
> to 10ms average, when the dsl rate puts a lower bound of 7ms on the average.
> - fq_codel honestly works miracles already. classification is the knob 
> people had to use previously, who had enough time to twiddle it.

That's what most people find when they try it. Classification is less a 
throughput vs. latency tradeoff than a way of giving absolute priority to some 
types of traffic. But unless you are really up against your bandwidth limit, 
this seldom matters in the real world. As long as latency is kept low, 
everything works, so you don't need to give VoIP priority over other traffic or 
things like that.
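
A back-of-the-envelope sketch of why: the time a small VoIP packet waits is 
roughly the serialization time of the bytes queued ahead of it. The 1 Mbit/s 
uplink, the 1500-byte packets and the queue depths below are assumptions 
picked for illustration, not measurements from this thread.

```python
def wait_ms(bytes_ahead, link_rate_bps):
    """Time to serialize the bytes queued ahead of a packet, in ms."""
    return bytes_ahead * 8 * 1000 / link_rate_bps


UPLINK_BPS = 1_000_000  # assumed uplink rate
PACKET_BYTES = 1500     # assumed full-size packet

# Strict priority: at worst one bulk packet is already on the wire.
priority_wait = wait_ms(1 * PACKET_BYTES, UPLINK_BPS)
# Short, well-managed queue (assumed 2 packets ahead), no priority.
short_queue_wait = wait_ms(2 * PACKET_BYTES, UPLINK_BPS)
# Bloated unmanaged buffer (assumed 100 packets ahead), no priority.
bloated_wait = wait_ms(100 * PACKET_BYTES, UPLINK_BPS)

print(priority_wait, short_queue_wait, bloated_wait)
```

With these assumed numbers the gap between priority and a short queue is a 
single packet time, while the gap between a short queue and a bloated one is 
two orders of magnitude.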

David Lang

