[Cerowrt-devel] SQM and PPPoE, more questions than answers...

Sebastian Moeller moeller0 at gmx.de
Thu Mar 19 05:58:28 EDT 2015


Hi Alan,

On Mar 19, 2015, at 10:42 , Alan Jenkins <alan.christopher.jenkins at gmail.com> wrote:

> On 19/03/15 08:29, Sebastian Moeller wrote:
>> Hi Alan,
>> 
>> 
>> On Mar 18, 2015, at 23:14 , Alan Jenkins <alan.christopher.jenkins at gmail.com> wrote:
>> 
>>> Hi Seb
>>> 
>>> I tested shaping on eth1 vs pppoe-wan, as it applies to ADSL.  (On Barrier Breaker + sqm-scripts).  Maybe this is going back a bit & no longer interesting to read.  But it seemed suspicious & interesting enough that I wanted to test it.
>>> 
>>> My conclusion was 1) I should stick with pppoe-wan,
>> 	Not a bad decision, especially given the recent changes to SQM that let it survive transient pppoe-interface disappearances. Before those changes, the beauty of shaping on the ethernet device was that pppoe could come and go while SQM stayed active and working; thanks to your help that problem now seems fixed.
> I'd say your help and my selfish prodding :).
> 
>>> 2) the question really means do you want to disable classification
>>> 3) I personally want to preserve the upload bandwidth and accept slightly higher latency.
>> 	My question still is: is the bandwidth sacrifice really necessary, or is this test just showing a corner case in simple.qos that can be fixed? I currently lack the time to tackle this effectively.
> Yep ok (no complaint).
> 
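	For reference, the knob behind that question is just the script choice in SQM's configuration. A minimal sketch of how I would switch it (option names as I remember them from current sqm-scripts, so please double-check against your version):

    # pick the single-queue setup; use 'simple.qos' to keep the three-tier classification
    uci set sqm.@queue[0].script='simplest.qos'
    uci commit sqm
    /etc/init.d/sqm restart
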
>>> [netperf-wrapper noob puzzle: most of the ping lines vanish part-way through.  Maybe I failed it somehow.]
>> 	This is not your fault; the UDP probes netperf-wrapper uses do not tolerate packet loss, so once a packet is lost (I believe) the stream stops. This is not ideal, but it gives a quick indicator of packet loss for sparse streams ;)
> Heh, thanks.
> 
>>> My tests suggest that simplest.qos gives a lower egress rate, though not as low as shaping on eth1 (roughly a 20% vs. 40% reduction).  So that's also similar.
>>> 
>>>>> So the current choice is either to accept a noticeable increase in
>>>>> LULI (though note that a few years ago even an average of 20ms was
>>>>> most likely rare in real life) or an equally noticeable decrease in
>>>>> egress bandwidth…
>>>> I guess it is back to the drawing board to figure out how to speed up
>>>> the classification… and then revisit the PPPoE question again…
>>> so maybe the question is actually classification vs. not?
>>> 
>>> + IMO slow asymmetric links don't want to lose more upload bandwidth than necessary.  And I'm losing a *lot* in this test.
>>> + As you say, having only 20ms excess would still be a big improvement.  We could ignore the bait of 10ms right now.
>>> 
>>> vs
>>> 
>>> - lowest latency I've seen testing my link. almost suspicious. looks close to 10ms average, when the dsl rate puts a lower bound of 7ms on the average.
>> 	Curious: what is your link speed?
> 
> dsl sync 912k up
> shaped at 850
> fq_codel auto target says => 14.5ms <=
> 
> MTU time is
> (1500 * 8) bit / 912 kbit/s = 0.0132 s
> so if the link is filled with MTU packets, there's a hard 7 ms lower bound on the average ICMP ping increase vs. an empty link
> and the same logic says that in achieving that average you also get >= 7 ms of jitter

	Ah, I see: a packet either gets the link immediately or has to wait for up to a full packet transmit time, so on average it waits about half of one.

> 
> 
> (or 6.5ms, but since my download rate is about 10x better, 6.5 + 0.65 ~= 7).
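
	Just to spell out the arithmetic for the archive (a quick back-of-the-envelope check, assuming a 1500 byte IP MTU and the rates you quote; the 1538 byte frame in the last line is merely my guess at what the auto-target roughly corresponds to):

    # one full-MTU transmit time at the 912 kbit/s sync rate
    echo '1500*8/912000' | bc -l                         # ~0.0132 s, i.e. ~13.2 ms
    # a packet waits on average half of that on the uplink, plus half an MTU time on the ~10x faster downlink
    echo '1500*8/912000/2 + 1500*8/9120000/2' | bc -l    # ~0.0072 s, the ~7 ms floor on the average
    # one ~1538 byte ethernet frame at the 850 kbit/s shaped rate
    echo '1538*8/850000' | bc -l                         # ~0.0145 s, same ballpark as the 14.5 ms auto-target
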
> 
>>> - fq_codel honestly works miracles already. classification is the knob that people who had enough time to twiddle it had to use previously.
>>> - on netperf-wrapper plots the "banding" doesn't look brilliant on slow links anyway
>> 	On slow links I always used to add “-s 0.8” (the slower the link, the higher the number) to increase the temporal averaging window; this reduces the accuracy of the display for the downlink, but at least allows a better understanding of the uplink. I always wanted to see whether I could teach netperf-wrapper to allow larger averaging windows after the measurement, just for display purposes, but I am a total beginner with python...
>> 
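	For the record, that translates into an invocation along these lines (the host is just one of the public netperf servers as an example, and the exact option spellings are from memory, so please check --help):

    # -s/--step-size widens the sampling interval (seconds); -l sets the test length
    netperf-wrapper -H netperf-eu.bufferbloat.net -l 300 -s 0.8 rrul
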
>>>>> P.S.: It turns out, at least on my link, that for shaping on
>>>>> pppoe-ge00 the kernel does not account for any header
>>>>> automatically, so I need to specify a per-packet-overhead (PPOH) of
>>>>> 40 bytes (on an ADSL2+ link with ATM linklayer); when shaping on
>>>>> ge00 however (with the kernel still terminating the PPPoE link to
>>>>> my ISP) I only need to specify a PPOH of 26 as the kernel already
>>>>> adds the 14 bytes for the ethernet header…
>> 	Please disregard this part; I need to implement better tests for this instead of only relying on netperf-wrapper results ;)
> </troll-for-information>.  Apart from kernel code, I did wonder how this was tested :).

	Oh, quite roughly… At that time I was only limited by my DSLAM (now I have a lower throttle in the BRAS that is somewhat hard to measure), and I realized I could get decent RRUL results with egress shaping at 100% if the encapsulation and per-packet overhead were set correctly. Increasing the per-packet overhead above the theoretical value did not affect latency or bandwidth (it should have affected bandwidth, but the change was too small to measure), while decreasing it below the correct value noticeably increased the LULI during RRUL runs. The issue is that I did not collect enough runs to be certain about the LULI I measured, even though my current hypothesis is that the kernel does not account for the ethernet header on a pppoe interface… Also, this can partly be tested on the router itself with a bit of tc magic that someone once used to show me that the kernel does account for the 14 bytes on ethernet interfaces; I just need to find my notes from that experiment again (I fear they were lost when my btrfs raid5 disintegrated… they call btrfs raid5 experimental for a reason ;) )
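
	The rough idea of that check, reconstructed from memory, goes something like the following (a sketch only, not the actual commands; interface names, packet sizes and the ping target are examples, and it only makes sense on an otherwise quiet link):

    # put a fresh qdisc on the ethernet device so its counters start at zero
    # (note: this replaces whatever SQM set up, so only do it for a quick test)
    tc qdisc replace dev ge00 root fq_codel
    # push a known number of equal-sized packets through it; -s 100 gives 128 byte IP packets
    ping -c 100 -s 100 8.8.8.8 > /dev/null
    # bytes/packets in the stats should come out near 142 if the 14 byte ethernet header is counted
    tc -s qdisc show dev ge00
    # repeating the same on pppoe-ge00 should show roughly 128 bytes per packet if, as I suspect,
    # the kernel adds no header there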

Best Regards
	Sebastian

> 
> Thanks again
> Alan



