[Cerowrt-devel] Update to "Setting up SQM for CeroWrt 3.10" web page. Comments needed.

Sat Dec 28 14:54:23 EST 2013

Hi Fred,

On Dec 28, 2013, at 15:27 , Fred Stratton <fredstratton at imap.cc> wrote:

> 
> On 28/12/13 13:42, Sebastian Moeller wrote:
>> Hi Fred,
>> 
>> 
>> On Dec 28, 2013, at 12:09 , Fred Stratton <fredstratton at imap.cc> wrote:
>> 
>> 
>>> The UK consensus fudge factor has always been 85 per cent of the rate achieved, not 95 or 99 per cent.
>>> 
>> 	I know that the recommendations have been lower in the past. I think this is partly because, before Jesper Brouer's and Russell Stuart's work to properly account for ATM "quantization", people typically had to deal with a ~10% rate tax for the 5-byte per-cell overhead (48-byte payload in 53-byte cells, 90.57% usable rate), plus an additional 5% to stochastically account for the padding of the last cell and the per-packet overhead. Both of these affect the effective goodput far more for small than for large packets, so the 85% never worked well for all packet sizes. My hypothesis now is that, since we can and do properly account for these effects of ATM framing, we can afford to start with a fudge factor of 90% or even 95%. As far as I know, the recommended fudge factors are never explained by more than "this works empirically"...
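A minimal bash sketch of the ATM arithmetic described above, assuming AAL5 framing in which a packet plus its per-packet overhead is padded up to whole 48-byte cell payloads, each carried in a 53-byte cell. The function name `atm_wire_bytes` is illustrative only, not part of tc or CeroWrt.

```shell
#!/bin/bash
# Sketch: bytes a single packet occupies on an ATM (AAL5) link.
# Assumption: the overhead argument already includes every encapsulation
# byte (including the AAL5 trailer), as in the tc-stab overhead table.

ATM_CELL_BYTES=53      # size of one ATM cell on the wire
ATM_PAYLOAD_BYTES=48   # payload carried by one cell

# atm_wire_bytes <ip_packet_bytes> <per_packet_overhead_bytes>
atm_wire_bytes() {
    local len=$(( $1 + $2 ))
    # integer ceiling division: the last cell is padded up to 48 bytes
    local cells=$(( (len + ATM_PAYLOAD_BYTES - 1) / ATM_PAYLOAD_BYTES ))
    echo $(( cells * ATM_CELL_BYTES ))
}

# Usable fraction of the raw cell rate before any per-packet effects
awk -v p="$ATM_PAYLOAD_BYTES" -v c="$ATM_CELL_BYTES" \
    'BEGIN { printf "cell payload efficiency: %.2f%%\n", 100 * p / c }'

for size in 64 1500; do
    echo "${size}-byte packet, 40-byte overhead -> $(atm_wire_bytes "$size" 40) bytes on the wire"
done
```

With a 40-byte overhead a 1500-byte packet needs 33 cells rather than 32: 1540 bytes spill 4 bytes into the last cell, which then carries 44 bytes of padding.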
> 
> The fudge factors are totally empirical. IF you are proposing a more formal approach, I shall try a 90 per cent fudge factor, although 'current rate' varies here.

	My hypothesis is that we can get away with less fudge as we have a better handle on the actual wire size. Personally, I do start at 95% to figure out the trade-off between bandwidth loss and latency increase.

>> 
>>> Devices express 2 values: the sync rate - or 'maximum rate attainable' - and the dynamic value of 'current rate'.
>>> 
>> 	The actual data rate is the relevant information for shaping. DSL modems often report the link capacity as "maximum rate attainable" or some such, while the actual bandwidth is limited by contract to a rate below what the line would support (often this bandwidth reduction is performed on the PPPoE link to the BRAS).
>> 
>> 
>>> As the sync rate is fairly stable for any given installation - ADSL or Fibre - this could be used as a starting value, decremented by the traditional 15 per cent of 'overhead', and the 85 per cent fudge factor applied to that.
>>> 
>> 	I would like to propose to use the "current rate" as starting point, as 'maximum rate attainable' >= 'current rate'.
> 
> 'current rate' is still a sync rate, and so is conventionally viewed as 15 per cent above the unmeasurable actual rate.

	No, no, the current rate really is the current link capacity between modem and DSLAM (or CPE and CTS); only this rate typically refers to the raw ATM stream, so we have to subtract all the additional layers until we reach the IP layer...
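That subtraction can be sketched in bash, assuming the modem's "current rate" is the raw ATM cell rate, that all traffic consists of full-size 1500-byte packets, and that the overhead is 40 bytes (PPPoE, LLC). The rates are the ADSL2+ 16402/2558 kbit/s example mentioned elsewhere in this thread and the 90% factor is Fred's proposed starting point; the function names are hypothetical.

```shell
#!/bin/bash
# Sketch: estimate IP-layer throughput from an ATM sync ("current") rate.
# Assumptions: the sync rate counts whole 53-byte cells; every packet has
# the same size; integer kbit/s arithmetic (results are truncated).

# ip_goodput_kbit <sync_kbit> <ip_packet_bytes> <overhead_bytes>
ip_goodput_kbit() {
    local sync=$1 pkt=$2 overhead=$3
    local cells=$(( (pkt + overhead + 47) / 48 ))   # cells per packet
    # fraction of the cell stream that is IP payload: pkt / (cells * 53)
    echo $(( sync * pkt / (cells * 53) ))
}

# apply_fudge <rate_kbit> <percent>
apply_fudge() {
    echo $(( $1 * $2 / 100 ))
}

down=$(ip_goodput_kbit 16402 1500 40)
up=$(ip_goodput_kbit 2558 1500 40)
echo "IP-layer rate for 1500-byte packets: ${down}/${up} kbit/s"
echo "with a 90% fudge factor: $(apply_fudge "$down" 90)/$(apply_fudge "$up" 90) kbit/s"
```

Under these assumptions only about 85.8% of the sync rate (1500/1749) is left for IP packets, which is close to the traditional 85% figure.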

> As you are proposing a new approach, I shall take 90 per cent of 'current rate' as a starting point.

	I would love to learn how that works out for you. Because for all my theories about why 85% was used, the proof still is in the (plum-) pudding...

> 
> No one in the UK uses SRA currently. One small ISP used to.

	That is sad, because on paper SRA looks like a good feature to have (lower bandwidth sure beats synchronization loss).

> The ISP I currently use has Dynamic Line Management, which changes target SNR constantly.

	Now that is much better, as we should neither notice nor care; I assume that this happens on layers below ATM even.

> The DSLAM is made by Infineon.  
> 
> 
>> 
>>> Fibre - FTTC - connections can suffer quite large download speed fluctuations over the 200 - 500 metre link to the MSAN.  This phenomenon is not confined to ADSL links.
>>> 
>> 	On the actual xDSL link? As far as I know no telco actually uses SRA (seamless rate adaptation), so the current link speed will only get lower, not higher, and I would expect a relatively stable current rate (it might take a while, a few days, to slowly degrade to the highest link speed supported under all conditions, but I hope you still get my point).
> I understand the point, but do not think it is the case, from data I have seen, but cannot find now, unfortunately.

	I see; maybe my assumption here is wrong. I would love to see data, though, before changing my hypothesis.

>> 
>>> 
>>> An alternative speed test is something like this
>>> 
>>> 
>>> http://download.bethere.co.uk/downloadMeter.html
>>> 
>>> 
>>> which, as Be has been bought by Sky, may not exist after the end of April 2014.
>>> 
>> 	But if we recommend running speed tests, we really need to advise our users to start several concurrent up- and downloads to independent servers to actually measure the bandwidth of our bottleneck link; often a single server connection will not saturate a link (I seem to recall that with TCP it is guaranteed to only reach 75% or so averaged over time, is that correct?).
>> 	But I think this is not the proper way to set the bandwidth for the shaper, because upstream of our link to the ISP we have no guaranteed bandwidth at all and can just hope the ISP is doing the right thing AQM-wise.
>> 
> 
> I quote the Be site as an alternative to a java based approach. I would be very happy to see your suggestion adopted.
>> 
>> 
>> 
>>> 	• [What is the proper description here?] If you use PPPoE (but not over ADSL/DSL link), PPPoATM, or bridging that isn’t Ethernet, you should choose [what?] and set the Per-packet Overhead to [what?]
>>> 
>>> For a PPPoA service, the PPPoA link is treated as PPPoE on the second device, here running ceroWRT.
>>> 
>> 	This still means you should specify the PPPoA overhead, not PPPoE.
> 
> I shall try the PPPoA overhead.

	Great, let me know how that works.

>> 
>>> The packet overhead values are written in the dubious man page for tc_stab.
>>> 
>> 	The only real flaw in that man page, as far as I know, is the fact that it indicates that the kernel will account for the 18-byte Ethernet header automatically, while the kernel does no such thing (which I hope to change).
> It mentions link layer types as 'atm', 'ethernet' and 'adsl'. There is no reference anywhere to the last. I do not see its relevance.

	If you have a look inside the source code for tc and the kernel, you will notice that atm and adsl are aliases for the same thing. I just think that we should keep calling it ATM, since that is the layer in the stack that causes most of the difficulty in judging the usable link rate; adsl just happens to use ATM exclusively.

>> 
>>> Sebastian has a potential alternative method of formal calculation.
>>> 
>> 	So, I have no formal calculation method available, but an empirical way of detecting ATM quantization as well as measuring the per packet overhead of an ATM link. 
>> 	The idea is to measure the RTT of ICMP packets of increasing length and then display the distribution of RTTs by ICMP packet length. On an ATM carrier we expect to see a step function with steps 48 bytes apart; on a non-ATM carrier we expect a smooth ramp instead. We then compare the residuals of a linear fit to the data with the residuals of the best step-function fit; the fit with the lower residuals "wins". Attached you will find an example of this approach: ping data in red (median of NNN repetitions for each ICMP packet size), linear fit in blue, and best staircase fit in green. Notice that the data starts somewhere within a 48-byte ATM cell. Since the ATM encapsulation overhead is at most 44 bytes, and we know the IP and ICMP overhead of the ping probe, we can calculate the overhead preceding the IP header, which is what needs to be put into the overhead field in the GUI. (Note where the green line intersects the y-axis at 0 bytes packet size? This is where the IP header starts; the "missing" part of this ATM cell is the overhead.)
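The last step of that reasoning, going from the position of a step in the staircase to the per-packet overhead, can be sketched in bash. This is not Sebastian's fitting code (which runs in Octave/MATLAB); it assumes the fit has already identified the smallest ICMP payload size (as passed to `ping -s`) at which an extra ATM cell appears, that the probe adds an 8-byte ICMP header and a 20-byte IPv4 header, and that the overhead lies between 0 and 47 bytes. All function names are hypothetical.

```shell
#!/bin/bash
# Sketch: recover the ATM per-packet overhead from a step in the RTT staircase.
# Assumptions: IPv4 (20-byte header), ICMP echo (8-byte header), and an
# overhead of 0..47 bytes, so the answer is unique modulo one 48-byte cell.

ICMP_IP_HEADERS=28   # 20 bytes IPv4 + 8 bytes ICMP

# atm_cells <icmp_payload_bytes> <overhead_bytes>: cells used by one probe
atm_cells() {
    echo $(( ($1 + ICMP_IP_HEADERS + $2 + 47) / 48 ))
}

# first_step <overhead> <min_size> <max_size>: simulate a sweep and print the
# first payload size that needs one more cell than the size before it
first_step() {
    local overhead=$1 size=$(( $2 + 1 ))
    while [ "$size" -le "$3" ]; do
        if [ "$(atm_cells "$size" "$overhead")" -gt "$(atm_cells $(( size - 1 )) "$overhead")" ]; then
            echo "$size"
            return 0
        fi
        size=$(( size + 1 ))
    done
    return 1
}

# overhead_from_step <step_size>: at a step, size + 28 + overhead is exactly
# one byte past a multiple of 48, so overhead = (1 - size - 28) mod 48
overhead_from_step() {
    local r=$(( (1 - $1 - ICMP_IP_HEADERS) % 48 ))
    echo $(( (r + 48) % 48 ))
}

for oh in 10 32 40; do
    step=$(first_step "$oh" 16 116)
    echo "overhead ${oh}: first step at ${step} bytes -> recovered $(overhead_from_step "$step")"
done
```

Every later step lies exactly 48 bytes further on (for a 40-byte overhead the next one is at 77 bytes), so any single step yields the same overhead; the simulated sweep uses the same 16 to 116 byte range as the measurement script in this thread.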
>> 
> 
> You are curve fitting. This is calculation.

	I see, that is certainly a valid way to look at it, just one that had not occurred to me.

>> 
>> 
>> 
>> 
>> 
>> 
>> 	Believe it or not, this method works reasonably well (I tested it successfully with one bridged, LLC/SNAP RFC-1483/2684 connection (overhead 32 bytes), and several PPPoE, LLC (overhead 40) connections (from ADSL1 @ 3008/512 to ADSL2+ @ 16402/2558)). But it takes a relatively long time to measure the ping train, especially at the higher rates… and it requires ping time stamps with decent resolution (which rules out Windows), and my naive data acquisition script creates really large raw data files. I guess I should post the code somewhere so others can test and improve it.
>> 	Fred I would be delighted to get a data set from your connection, to test a known different encapsulation. 
>> 
> 
> I shall try this. If successful, I shall initially pass you the raw data.

	Great, but be warned this will be hundreds of megabytes. (For production use the measurement script would need to prune the generated log file down to the essential values… and potentially store the data in binary)

> I have not used MatLab since the 1980s.

	Lucky you. I sort of have to use MATLAB in my day job and hence am most "fluent" in matlabese, but the code should also work with Octave (I tested version 3.6.4), so it should be relatively easy to run the analysis yourself. That said, I would love to get a copy of the ping sweep :)

>> 
>>> TYPICAL OVERHEADS
>>>        The following values are typical for different adsl scenarios (based on
>>>        [1] and [2]):
>>> 
>>>        LLC based:
>>>            PPPoA - 14 (PPP - 2, ATM - 12)
>>>            PPPoE - 40+ (PPPoE - 8, ATM - 18, ethernet 14, possibly FCS - 4+padding)
>>>            Bridged - 32 (ATM - 18, ethernet 14, possibly FCS - 4+padding)
>>>            IPoA - 16 (ATM - 16)
>>> 
>>>        VC Mux based:
>>>            PPPoA - 10 (PPP - 2, ATM - 8)
>>>            PPPoE - 32+ (PPPoE - 8, ATM - 10, ethernet 14, possibly FCS - 4+padding)
>>>            Bridged - 24+ (ATM - 10, ethernet 14, possibly FCS - 4+padding)
>>>            IPoA - 8 (ATM - 8)
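The typical values from that table can be encoded as a small lookup, sketched below. The function name and argument spellings are hypothetical; the numbers are the minimum values from the tc-stab man page excerpt above, and where the table marks a value with "+", an Ethernet frame check sequence plus padding may add further bytes.

```shell
#!/bin/bash
# Sketch: typical tc-stab overheads for ADSL encapsulations, taken from the
# tc-stab man page table quoted above. "+" entries are lower bounds, since an
# Ethernet FCS of 4 bytes plus padding may also be present.

# typical_overhead <llc|vcmux> <pppoa|pppoe|bridged|ipoa>
typical_overhead() {
    case "$1/$2" in
        llc/pppoa)     echo 14 ;;
        llc/pppoe)     echo 40 ;;   # table says 40+
        llc/bridged)   echo 32 ;;
        llc/ipoa)      echo 16 ;;
        vcmux/pppoa)   echo 10 ;;
        vcmux/pppoe)   echo 32 ;;   # table says 32+
        vcmux/bridged) echo 24 ;;   # table says 24+
        vcmux/ipoa)    echo 8 ;;
        *) echo "unknown encapsulation: $1/$2" >&2; return 1 ;;
    esac
}

echo "VC Mux PPPoA: $(typical_overhead vcmux pppoa) bytes"
```

For a VC Mux based PPPoA link this gives 10 bytes, rather than the 18 mentioned next in the thread.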
>>> 
>>> 
>>> For VC Mux based PPPoA, I am currently using an overhead of 18 for the PPPoE setting in ceroWRT.
>>> 
>> 	Yeah we could put this list into the wiki, but how shall a typical user figure out which encapsulation is used? And good luck in figuring out whether the frame check sequence (FCS) is included or not…
>> BTW, regarding the 18: I predict that if PPPoE is only used between cerowrt and the "modem" or gateway, your effective overhead should be 10 bytes. I would love it if you could run the following against your link at night (also attached):
>> 
>> #!/bin/bash
>> # TODO: use seq or bash to generate a list of the requested sizes (to allow for non-equidistantly spaced sizes)
>> 
>> TECH=ADSL2	# just to give some meaning to the ping trace file name
>> # Finding a proper target IP is somewhat of an art: traceroute to a remote site
>> # and pick the nearest host that reliably responds to pings with the smallest variation in ping times.
>> TARGET=${1:?"usage: $0 TARGET_IP"}	# the IP against which to run the ICMP pings
>> DATESTR=$(date +%Y%m%d_%H%M%S)	# to allow multiple sequential records
>> LOG=ping_sweep_${TECH}_${DATESTR}.txt
>> 
>> 
>> # By default non-root ping will only send one packet per second, so work around that by calling ping independently for each packet.
>> # Empirically figure out the shortest period that still gives the standard ping time (to avoid being slow-pathed by our target).
>> PINGPERIOD=0.01	# in seconds
>> PINGSPERSIZE=10000
>> 
>> # Start size, needed to find the per-packet overhead dependent on the ATM encapsulation.
>> # To reliably show ATM quantization one would like to see at least two steps, so cover a range > 2 ATM cells (so > 96 bytes).
>> SWEEPMINSIZE=16	# 64-bit systems seem to require 16 bytes of payload to include a timestamp...
>> SWEEPMAXSIZE=116
>> 
>> n_SIZES=$(( SWEEPMAXSIZE - SWEEPMINSIZE + 1 ))	# number of ICMP payload sizes per sweep (both ends included)
>> 
>> i_sweep=0
>> i_size=0
>> 
>> echo "Running ICMP RTT measurement against: ${TARGET} (${n_SIZES} sizes, ${PINGSPERSIZE} repetitions each)"
>> while [ ${i_sweep} -lt ${PINGSPERSIZE} ]
>> do
>>     (( i_sweep++ ))
>>     echo "Current iteration: ${i_sweep}"
>>     # now loop from sweepmin to sweepmax
>>     i_size=${SWEEPMINSIZE}
>>     while [ ${i_size} -le ${SWEEPMAXSIZE} ]
>>     do
>> 	echo "${i_sweep}. repetition of ping size ${i_size}"
>> 	# background each ping so the next probe is sent after PINGPERIOD instead of after the reply
>> 	ping -c 1 -s ${i_size} ${TARGET} >> ${LOG} &
>> 	(( i_size++ ))
>> 	# we need a sleep binary that allows non-integer times (GNU sleep is fine, as is sleep on Mac OS X 10.8.4)
>> 	sleep ${PINGPERIOD}
>>     done
>> done
>> wait	# let the last background pings finish writing to the log
>> echo "Done... ($0)"
>> 
>> 
>> This will try to run 10000 repetitions for ICMP packet sizes from 16 to 116 bytes, running for (10000 * 101 * 0.01 / 60 =) 168 minutes, but you should be able to stop it with Ctrl-C if you are not patient enough. With your link I would estimate that 3000 repetitions should be plenty, but if you could run it overnight that would be great, and then ~3 hours should not matter much.
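The runtime estimate can be reproduced with integer arithmetic by expressing the ping period in milliseconds. This is a hypothetical helper; it counts only the sleep between pings and ignores the time the script itself takes.

```shell
#!/bin/bash
# Sketch: expected duration of the ping sweep, in whole minutes (truncated).
# Assumption: runtime is dominated by the sleep between pings.

# sweep_minutes <repetitions> <min_size> <max_size> <period_ms>
sweep_minutes() {
    local sizes=$(( $3 - $2 + 1 ))   # both ends of the size range are pinged
    echo $(( $1 * sizes * $4 / 60000 ))
}

echo "10000 repetitions: ~$(sweep_minutes 10000 16 116 10) minutes"
echo "3000 repetitions:  ~$(sweep_minutes 3000 16 116 10) minutes"
```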
>> 	And then run the attached code in Octave or MATLAB. Invoke it with "tc_stab_parameter_guide_03('path/to/the/data/file/you/created/name_of_said_file')". The parser will run on the first invocation and is really, really slow, but further invocations should be faster. If issues arise, let me know; I am happy to help.
>> 
>> 
>>> Were I to use a single directly connected gateway, I would input a suitable value for PPPoA in that openWRT firmware.
>>> 
>> 	I think you should do that right now.
> 
> The firmware has not yet been released.
>> 
>>> In theory, I might need to use a negative value, but the current kernel does not support that.
>>> 
>> 	If you use tc_stab, negative overheads are fully supported, only htb_private has overhead defined as unsigned integer and hence does not allow negative values.
> 
> Jesper Brouer posted about this. I thought he was referring to tc_stab.

	I recall having a discussion with Jesper about this topic, where he agreed that tc_stab was not affected, only htb_private.

Best Regards
	Sebastian

>> 
>>> I have used many different arbitrary values for overhead. All appear to have little effect.
>>> 
>> 	So the issue here is that only at small packet sizes do the overhead and last-cell padding eat a disproportionate amount of your bandwidth (64-byte packet plus 44-byte overhead plus 47-byte worst-case cell padding: 100*(44+47+64)/64 = 242% effective packet size relative to what the shaper estimated). At typical packet sizes the max error (44 bytes of missing overhead plus potentially misjudged cell padding of 47 bytes) adds up to a theoretical 100*(44+47+1500)/1500 = 106% effective packet size relative to what the shaper estimated. It is obvious that at 1500-byte packets the whole ATM issue can easily be dismissed by just reducing the link rate by ~10% for the 48-in-53 framing and an additional ~6% for overhead and cell padding. But once you mix smaller packets into your traffic, say for VoIP, the misjudged effective wire size will kill your ability to control the queueing. Note that the common wisdom of shaping down to 85% might stem from the ~15% ATM "tax" on 1500-byte packets...
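The size misjudgment can be tabulated with a short bash sketch. Unlike the percentages above, it compares the full on-the-wire size (whole 53-byte cells, so the 48-in-53 cell tax is included) with the bare IP packet size that a shaper without link-layer accounting would assume; the 44-byte overhead is the worst case mentioned above, and the function name is hypothetical.

```shell
#!/bin/bash
# Sketch: how much larger a packet is on an ATM link than the IP packet size
# assumed by a shaper without ATM link-layer accounting (integer percent).

# wire_percent <ip_packet_bytes> <overhead_bytes>: wire size as % of IP size
wire_percent() {
    local cells=$(( ($1 + $2 + 47) / 48 ))   # whole cells, last one padded
    echo $(( 100 * cells * 53 / $1 ))
}

for size in 64 576 1500; do
    echo "${size} bytes: $(wire_percent "$size" 44)% of the estimated size"
done
```

The spread between small and large packets (roughly 248% versus 116% here) is what a single fixed percentage reduction cannot capture.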
>> 
>> 
>>> As I understand it, the current recommendation is to use tc_stab in preference to htb_private. I do not know the basis for this value judgement.
>>> 
>> 	In short: tc_stab allows negative overheads, and tc_stab works with HTB, TBF and HFSC, while htb_private only works with HTB. Currently htb_private has two advantages: it will estimate the per-packet overhead correctly if GSO (generic segmentation offload) is enabled, and it will produce exact ATM link layer estimates for all possible packet sizes. In practice almost everyone uses an MTU of 1500 or less for their internet access, making both htb_private advantages effectively moot. (Plus, if no one beats me to it, I intend to address both theoretical shortcomings of tc_stab next year.)
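For reference, a tc-stab size table is attached to a qdisc when the qdisc is created. The bash sketch below only prints such a command and does not run it; the interface name `ge00`, the HTB `default 12` class, and the `mtu 2047 tsize 512` table parameters are illustrative assumptions, and `linklayer adsl` would be equivalent to `linklayer atm`. Since the overhead goes through tc-stab, a negative value can be passed as well.

```shell
#!/bin/bash
# Sketch: print (not execute) a tc command that puts an HTB root qdisc behind
# a tc-stab size table for an ATM/ADSL link. Run the printed line as root.
# Assumptions: interface name, HTB default class and stab table sizes are
# placeholders chosen for illustration.

# stab_htb_cmd <device> <overhead_bytes>
stab_htb_cmd() {
    local dev=$1 overhead=$2
    echo "tc qdisc add dev ${dev} root handle 1: stab mtu 2047 tsize 512 mpu 0 overhead ${overhead} linklayer atm htb default 12"
}

stab_htb_cmd ge00 40
```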
>> 
>> Best Regards
>> 	Sebastian
>> 
>> 
>>> 
>>> 
>>> 
>>> 
>>> On 28/12/13 10:01, Sebastian Moeller wrote:
>>> 
>>>> Hi Rich,
>>>> 
>>>> great! A few comments:
>>>> 
>>>> Basic Settings:
>>>> [Is 95% the right fudge factor?] I think that ideally, if we can precisely measure the usable link rate, even 99% of that should work out well to keep the queue in our device. I assume that, due to the difficulties in measuring and accounting for link properties such as link layer and overhead, people typically rely on setting the shaped rate a bit lower than required to stochastically/empirically account for the link properties. I predict that if we give the shaper a correct description of the link properties, we should be fine with 95% shaping. Note though that it is not trivial on an ADSL link to get the actually usable bit rate from the modem, so 95% of what can be deduced from the modem or the ISP's invoice might be a decent proxy…
>>>> 
>>>> [Do we have a recommendation for an easy way to tell if it's working? Perhaps a link to a new Quick Test for Bufferbloat page. ] The linked page looks like a decent probe for buffer bloat.
>>>> 
>>>> 
>>>> 
>>>>> Basic Settings - the details...
>>>>> 
>>>>> CeroWrt is designed to manage the queues of packets waiting to be sent across the slowest (bottleneck) link, which is usually your connection to the Internet.
>>>>> 
>>>>> 
>>>> 	I think we can only actually control the first link to the ISP, which often happens to be the bottleneck. At a typical DSLAM (xDSL head end station) the cumulative bandwidth sold to the customers is larger than the backbone connection (this is called over-subscription and is almost guaranteed to be the case in every DSLAM), which typically is not a problem, as people typically do not use their internet connection that much. My point being that we cannot really control congestion in the DSLAM's uplink (as we have no idea what the reserved rate per customer is in the worst case, if there is any).
>>>> 
>>>> 
>>>> 
>>>>> CeroWrt can automatically adapt to network conditions to improve the delay/latency of data without any settings.
>>>>> 
>>>>> 
>>>> 	Does this describe the default fq_codels on each interface (except fib?)?
>>>> 
>>>> 
>>>> 
>>>>> However, it can do a better job if it knows more about the actual link speeds available. You can adjust this setting by entering link speeds that are a few percent below the actual speeds. 
>>>>> 
>>>>> Note: it can be difficult to get an accurate measurement of the link speeds. The speed advertised by your provider is a starting point, but your experience often won't meet their published specs. You can also use a speed test program or web site like http://speedtest.net to estimate actual operating speeds.
>>>>> 
>>>>> 
>>>> 	While this approach is commonly recommended on the internet, I do not believe that it is that useful. Between a user and the speedtest site there are a number of potential congestion points that can affect (reduce) the throughput, like bad peering. That said, the speedtests will report something <= the actual link speed and hence be conservative (interactivity stays great at 90% of link rate as well as at 80%, so underestimating the bandwidth within reason does not affect the latency gains from traffic shaping, it just sacrifices a bit more bandwidth; and given the difficulty of actually measuring the attainable bandwidth, this might effectively have been a decent recommendation even though the theory behind it seems flawed).
>>>> 
>>>> 
>>>> 
>>>>> Be sure to make your measurement when network is quiet, and others in your home aren’t generating traffic.
>>>>> 
>>>>> 
>>>> 	This is great advice.
>>>> 
>>>> I would love to comment further, but reloading http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310 just returns a blank page, and I have not been able to get back to the page since yesterday evening… I will have a look later to see whether the page resurfaces…
>>>> 
>>>> Best
>>>> 	Sebastian
>>>> 
>>>> 
>>>> On Dec 27, 2013, at 23:09 , Rich Brown <richb.hanover at gmail.com> wrote:
>>>> 
>>>> 
>>>> 
>>>>>> You are a very good writer and I am on a tablet.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> Thanks!
>>>>> 
>>>>> 
>>>>>> I'll take a pass at the wiki tomorrow.
>>>>>> 
>>>>>> The shaper does up and down was my first thought...
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> Everyone else… Don’t let Dave hog all the fun! Read the tech note and give feedback!
>>>>> 
>>>>> Rich
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Dec 27, 2013 10:48 AM, "Rich Brown" <richb.hanover at gmail.com> wrote:
>>>>>> I updated the page to reflect the 3.10.24-8 build, and its new GUI pages.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> There are still lots of open questions. Comments, please.
>>>>>> 
>>>>>> Rich
>>>>>> _______________________________________________
>>>>>> Cerowrt-devel mailing list
>>>>>> 
>>>>>> 
>>>>>> Cerowrt-devel at lists.bufferbloat.net
>>>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>