* [Cerowrt-devel] Update to "Setting up SQM for CeroWrt 3.10" web page. Comments needed.
@ 2013-12-27 18:48 Rich Brown
2013-12-27 19:53 ` Dave Taht
0 siblings, 1 reply; 14+ messages in thread
From: Rich Brown @ 2013-12-27 18:48 UTC (permalink / raw)
To: cerowrt-devel
I updated the page to reflect the 3.10.24-8 build, and its new GUI pages.
http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310
There are still lots of open questions. Comments, please.
Rich
* Re: [Cerowrt-devel] Update to "Setting up SQM for CeroWrt 3.10" web page. Comments needed.
2013-12-27 18:48 [Cerowrt-devel] Update to "Setting up SQM for CeroWrt 3.10" web page. Comments needed Rich Brown
@ 2013-12-27 19:53 ` Dave Taht
2013-12-27 22:09 ` Rich Brown
0 siblings, 1 reply; 14+ messages in thread
From: Dave Taht @ 2013-12-27 19:53 UTC (permalink / raw)
To: Richard E. Brown; +Cc: cerowrt-devel
[-- Attachment #1: Type: text/plain, Size: 621 bytes --]
You are a very good writer and I am on a tablet.
I'll take a pass at the wiki tomorrow.
The shaper does up and down was my first thought...
On Dec 27, 2013 10:48 AM, "Rich Brown" <richb.hanover@gmail.com> wrote:
> I updated the page to reflect the 3.10.24-8 build, and its new GUI pages.
>
>
> http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310
>
> There are still lots of open questions. Comments, please.
>
> Rich
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-devel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>
* Re: [Cerowrt-devel] Update to "Setting up SQM for CeroWrt 3.10" web page. Comments needed.
2013-12-27 19:53 ` Dave Taht
@ 2013-12-27 22:09 ` Rich Brown
2013-12-28 10:01 ` Sebastian Moeller
0 siblings, 1 reply; 14+ messages in thread
From: Rich Brown @ 2013-12-27 22:09 UTC (permalink / raw)
To: Dave Taht; +Cc: cerowrt-devel
[-- Attachment #1: Type: text/plain, Size: 759 bytes --]
> You are a very good writer and I am on a tablet.
>
Thanks!
> I'll take a pass at the wiki tomorrow.
>
> The shaper does up and down was my first thought...
>
Everyone else… Don’t let Dave hog all the fun! Read the tech note and give feedback!
Rich
> On Dec 27, 2013 10:48 AM, "Rich Brown" <richb.hanover@gmail.com> wrote:
> I updated the page to reflect the 3.10.24-8 build, and its new GUI pages.
>
> http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310
>
> There are still lots of open questions. Comments, please.
>
> Rich
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-devel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel
* Re: [Cerowrt-devel] Update to "Setting up SQM for CeroWrt 3.10" web page. Comments needed.
2013-12-27 22:09 ` Rich Brown
@ 2013-12-28 10:01 ` Sebastian Moeller
2013-12-28 11:09 ` Fred Stratton
2013-12-28 14:27 ` Rich Brown
0 siblings, 2 replies; 14+ messages in thread
From: Sebastian Moeller @ 2013-12-28 10:01 UTC (permalink / raw)
To: Rich Brown; +Cc: cerowrt-devel
Hi Rich,
great! A few comments:
Basic Settings:
[Is 95% the right fudge factor?] I think that ideally, if we can precisely measure the usable link rate, even 99% of that should work out well to keep the queue in our device. I assume that, due to the difficulties in measuring and accounting for the link properties such as link layer and overhead, people typically rely on setting the shaped rate a bit lower than required, to stochastically/empirically account for those properties. I predict that if we give the shaper a correct description of the link properties we should be fine with 95% shaping. Note though, it is not trivial on an ADSL link to get the actually usable bit rate from the modem, so 95% of what can be deduced from the modem or the ISP's invoice might be a decent proxy…
[Do we have a recommendation for an easy way to tell if it's working? Perhaps a link to a new Quick Test for Bufferbloat page. ] The linked page looks like a decent probe for buffer bloat.
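For illustration, a minimal sketch of such a quick test, assuming a Unix-like host with ping and curl available; the download URL is only a placeholder and must be replaced by a large file on a fast server:
#! /bin/bash
# Minimal bufferbloat probe sketch: compare RTT on an idle link with RTT under load.
TARGET=8.8.8.8                       # any stable host to ping
URL="http://example.com/largefile"   # placeholder: substitute a real large download
echo "Baseline RTT (idle link):"
ping -c 10 ${TARGET} | tail -1
echo "RTT under load:"
curl -s -o /dev/null ${URL} &        # load the downlink in the background
LOADPID=$!
ping -c 30 ${TARGET} | tail -1       # a much larger RTT here points to bufferbloat
kill ${LOADPID} 2>/dev/null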
> Basic Settings - the details...
>
> CeroWrt is designed to manage the queues of packets waiting to be sent across the slowest (bottleneck) link, which is usually your connection to the Internet.
I think we can only actually control the first link to the ISP, which often happens to be the bottleneck. At a typical DSLAM (xDSL head-end station) the cumulative bandwidth sold to the customers is larger than the backbone connection (this is called over-subscription and is almost guaranteed to be the case in every DSLAM); it is usually not a problem, since most people do not use their internet connection that heavily. My point being: we cannot really control congestion in the DSLAM's uplink (as we have no idea what the reserved rate per customer is in the worst case, if there is any).
> CeroWrt can automatically adapt to network conditions to improve the delay/latency of data without any settings.
Does this describe the default fq_codels on each interface (except fib?)?
> However, it can do a better job if it knows more about the actual link speeds available. You can adjust this setting by entering link speeds that are a few percent below the actual speeds.
>
> Note: it can be difficult to get an accurate measurement of the link speeds. The speed advertised by your provider is a starting point, but your experience often won't meet their published specs. You can also use a speed test program or web site like http://speedtest.net to estimate actual operating speeds.
While this approach is commonly recommended on the internet, I do not believe that it is that useful. Between a user and the speedtest site there are a number of potential congestion points that can reduce the throughput, like bad peering. That said, the speedtest will report something <= the actual link speed and hence be conservative (interactivity stays great at 90% of link rate as well as at 80%, so underestimating the bandwidth within reason does not affect the latency gains from traffic shaping, it just sacrifices a bit more bandwidth; and given the difficulty of actually measuring the attainable bandwidth, it might effectively have been a decent recommendation even though the theory behind it seems flawed).
> Be sure to make your measurement when network is quiet, and others in your home aren’t generating traffic.
This is great advice.
I would love to comment further, but reloading http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310 has just returned a blank page since yesterday evening and I cannot get back to the page… I will have a look later to see whether it resurfaces…
Best
Sebastian
On Dec 27, 2013, at 23:09 , Rich Brown <richb.hanover@gmail.com> wrote:
>> You are a very good writer and I am on a tablet.
>>
> Thanks!
>> I'll take a pass at the wiki tomorrow.
>>
>> The shaper does up and down was my first thought...
>>
> Everyone else… Don’t let Dave hog all the fun! Read the tech note and give feedback!
>
> Rich
>
>> On Dec 27, 2013 10:48 AM, "Rich Brown" <richb.hanover@gmail.com> wrote:
>> I updated the page to reflect the 3.10.24-8 build, and its new GUI pages.
>>
>> http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310
>>
>> There are still lots of open questions. Comments, please.
>>
>> Rich
>> _______________________________________________
>> Cerowrt-devel mailing list
>> Cerowrt-devel@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-devel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel
* Re: [Cerowrt-devel] Update to "Setting up SQM for CeroWrt 3.10" web page. Comments needed.
2013-12-28 10:01 ` Sebastian Moeller
@ 2013-12-28 11:09 ` Fred Stratton
2013-12-28 13:42 ` Sebastian Moeller
2013-12-28 14:27 ` Rich Brown
1 sibling, 1 reply; 14+ messages in thread
From: Fred Stratton @ 2013-12-28 11:09 UTC (permalink / raw)
To: Sebastian Moeller, Richard E. Brown, cerowrt-devel
[-- Attachment #1: Type: text/plain, Size: 7393 bytes --]
The UK consensus fudge factor has always been 85 per cent of the rate
achieved, not 95 or 99 per cent.
Devices express 2 values: the sync rate - or 'maximum rate attainable' -
and the dynamic value of 'current rate'.
As the sync rate is fairly stable for any given installation - ADSL or
Fibre - it could be used as a starting value, decremented by the
traditional 15 per cent of 'overhead', with the 85 per cent fudge factor
applied to that.
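To make that arithmetic concrete, a small sketch; the 8000 kbit/s sync rate below is purely illustrative:
# Illustrative only: assume an ADSL sync rate of 8000 kbit/s.
SYNC_KBIT=8000
AFTER_OVERHEAD=$(( SYNC_KBIT * 85 / 100 ))    # minus the traditional 15 per cent 'overhead' -> 6800
SHAPED_KBIT=$(( AFTER_OVERHEAD * 85 / 100 ))  # 85 per cent fudge factor applied to that -> 5780
echo "Suggested shaper rate: ${SHAPED_KBIT} kbit/s"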
Fibre - FTTC - connections can suffer quite large download speed
fluctuations over the 200 - 500 metre link to the MSAN. This phenomenon
is not confined to ADSL links.
An alternative speed test is something like this
http://download.bethere.co.uk/downloadMeter.html
which, as Be has been bought by Sky, may not exist after the end of
April 2014.
* [What is the proper description here?] If you use PPPoE (but not over ADSL/DSL link), PPPoATM, or bridging that isn't Ethernet, you should choose [what?] and set the Per-packet Overhead to [what?]
For a PPPoA service, the PPPoA link is treated as PPPoE on the second
device, here running CeroWrt.
The packet overhead values are written in the dubious man page for
tc_stab. Sebastian has a potential alternative method of formal calculation.
TYPICAL OVERHEADS
The following values are typical for different adsl scenarios (based on [1] and [2]):
LLC based:
PPPoA - 14 (PPP - 2, ATM - 12)
PPPoE - 40+ (PPPoE - 8, ATM - 18, ethernet 14, possibly FCS - 4+padding)
Bridged - 32 (ATM - 18, ethernet 14, possibly FCS - 4+padding)
IPoA - 16 (ATM - 16)
VC Mux based:
PPPoA - 10 (PPP - 2, ATM - 8)
PPPoE - 32+ (PPPoE - 8, ATM - 10, ethernet 14, possibly FCS - 4+padding)
Bridged - 24+ (ATM - 10, ethernet 14, possibly FCS - 4+padding)
IPoA - 8 (ATM - 8)
For VC Mux based PPPoA, I am currently using an overhead of 18 for the
PPPoE setting in CeroWrt.
Were I to use a single directly connected gateway, I would input a
suitable value for PPPoA in that OpenWrt firmware. In theory, I might
need to use a negative value, but the current kernel does not support that.
I have used many different arbitrary values for overhead. All appear to
have little effect.
As I understand it, the current recommendation is to use tc_stab in
preference to htb_private. I do not know the basis for this value judgement.
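For reference, a rough sketch of what a tc_stab invocation can look like; the interface name ge00, the overhead of 40, and the 2430 kbit/s uplink are assumptions, and the stab clause mirrors the one printed by Sebastian's script further down:
# Sketch only: egress shaping with ATM link-layer compensation via tc stab.
tc qdisc add dev ge00 root handle 1: \
    stab mtu 2048 tsize 128 overhead 40 linklayer atm \
    htb default 10
tc class add dev ge00 parent 1: classid 1:10 htb rate 2300kbit   # ~95% of an assumed 2430 kbit/s uplink
tc qdisc add dev ge00 parent 1:10 fq_codel                       # fq_codel leaf, as CeroWrt uses by default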
On 28/12/13 10:01, Sebastian Moeller wrote:
> Hi Rich,
>
> great! A few comments:
>
> Basic Settings:
> [Is 95% the right fudge factor?] I think that ideally, if we can precisely measure the usable link rate, even 99% of that should work out well to keep the queue in our device. I assume that, due to the difficulties in measuring and accounting for the link properties such as link layer and overhead, people typically rely on setting the shaped rate a bit lower than required, to stochastically/empirically account for those properties. I predict that if we give the shaper a correct description of the link properties we should be fine with 95% shaping. Note though, it is not trivial on an ADSL link to get the actually usable bit rate from the modem, so 95% of what can be deduced from the modem or the ISP's invoice might be a decent proxy…
>
> [Do we have a recommendation for an easy way to tell if it's working? Perhaps a link to a new Quick Test for Bufferbloat page. ] The linked page looks like a decent probe for buffer bloat.
>
>> Basic Settings - the details...
>>
>> CeroWrt is designed to manage the queues of packets waiting to be sent across the slowest (bottleneck) link, which is usually your connection to the Internet.
> I think we can only actually control the first link to the ISP, which often happens to be the bottleneck. At a typical DSLAM (xDSL head-end station) the cumulative bandwidth sold to the customers is larger than the backbone connection (this is called over-subscription and is almost guaranteed to be the case in every DSLAM); it is usually not a problem, since most people do not use their internet connection that heavily. My point being: we cannot really control congestion in the DSLAM's uplink (as we have no idea what the reserved rate per customer is in the worst case, if there is any).
>
>> CeroWrt can automatically adapt to network conditions to improve the delay/latency of data without any settings.
> Does this describe the default fq_codels on each interface (except fib?)?
>
>> However, it can do a better job if it knows more about the actual link speeds available. You can adjust this setting by entering link speeds that are a few percent below the actual speeds.
>>
>> Note: it can be difficult to get an accurate measurement of the link speeds. The speed advertised by your provider is a starting point, but your experience often won't meet their published specs. You can also use a speed test program or web site like http://speedtest.net to estimate actual operating speeds.
> While this approach is commonly recommended on the internet, I do not believe that it is that useful. Between a user and the speedtest site there are a number of potential congestion points that can reduce the throughput, like bad peering. That said, the speedtest will report something <= the actual link speed and hence be conservative (interactivity stays great at 90% of link rate as well as at 80%, so underestimating the bandwidth within reason does not affect the latency gains from traffic shaping, it just sacrifices a bit more bandwidth; and given the difficulty of actually measuring the attainable bandwidth, it might effectively have been a decent recommendation even though the theory behind it seems flawed).
>
>> Be sure to make your measurement when network is quiet, and others in your home aren’t generating traffic.
> This is great advice.
>
> I would love to comment further, but reloading http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310 has just returned a blank page since yesterday evening and I cannot get back to the page… I will have a look later to see whether it resurfaces…
>
> Best
> Sebastian
>
>
> On Dec 27, 2013, at 23:09 , Rich Brown <richb.hanover@gmail.com> wrote:
>
>>> You are a very good writer and I am on a tablet.
>>>
>> Thanks!
>>> I'll take a pass at the wiki tomorrow.
>>>
>>> The shaper does up and down was my first thought...
>>>
>> Everyone else… Don’t let Dave hog all the fun! Read the tech note and give feedback!
>>
>> Rich
>>
>>> On Dec 27, 2013 10:48 AM, "Rich Brown" <richb.hanover@gmail.com> wrote:
>>> I updated the page to reflect the 3.10.24-8 build, and its new GUI pages.
>>>
>>> http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310
>>>
>>> There are still lots of open questions. Comments, please.
>>>
>>> Rich
>>> _______________________________________________
>>> Cerowrt-devel mailing list
>>> Cerowrt-devel@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>> _______________________________________________
>> Cerowrt-devel mailing list
>> Cerowrt-devel@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-devel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel
* Re: [Cerowrt-devel] Update to "Setting up SQM for CeroWrt 3.10" web page. Comments needed.
2013-12-28 11:09 ` Fred Stratton
@ 2013-12-28 13:42 ` Sebastian Moeller
2013-12-28 14:27 ` Fred Stratton
0 siblings, 1 reply; 14+ messages in thread
From: Sebastian Moeller @ 2013-12-28 13:42 UTC (permalink / raw)
To: Fred Stratton; +Cc: cerowrt-devel
[-- Attachment #1: Type: text/plain, Size: 5194 bytes --]
Hi Fred,
On Dec 28, 2013, at 12:09 , Fred Stratton <fredstratton@imap.cc> wrote:
> The UK consensus fudge factor has always been 85 per cent of the rate achieved, not 95 or 99 per cent.
I know that the recommendations have been lower in the past; I think this is partly because, before Jesper Brouer's and Russell Stuart's work to properly account for ATM "quantization", people typically had to deal with a ~10% rate tax for the 5-byte per-cell overhead (48 bytes of payload in 53-byte cells, a 90.57% usable rate), plus an additional 5% to stochastically account for the padding of the last cell and the per-packet overhead, both of which affect the effective goodput far more for small packets than for large ones, so the 85% never worked well for all packet sizes. My hypothesis now is that since we can and do properly account for these effects of ATM framing, we can afford to start with a fudge factor of 90% or even 95%. As far as I know the recommended fudge factors are never explained by more than "this works empirically"...
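To illustrate that packet-size dependence, a small sketch; the 100- and 1500-byte packet sizes and the 40 bytes of per-packet overhead are just example figures:
#! /bin/bash
# Why a fixed percentage fits ATM poorly: padding of the last 48-byte cell costs
# small packets proportionally much more than large ones.
atm_wire_cost() {
    payload=$1; overhead=$2
    total=$(( payload + overhead ))
    cells=$(( (total + 47) / 48 ))   # round up to whole ATM cells
    wire=$(( cells * 53 ))           # each cell occupies 53 bytes on the wire
    pct=$(( payload * 100 / wire ))
    echo "${payload}-byte packet + ${overhead} bytes overhead -> ${cells} cells, ${wire} bytes on the wire (~${pct}% goodput)"
}
atm_wire_cost 100 40    # small packet: only ~62% of the wire bytes are payload
atm_wire_cost 1500 40   # full-size packet: ~85%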
>
> Devices express 2 values: the sync rate - or 'maximum rate attainable' - and the dynamic value of 'current rate'.
The actual data rate is the relevant information for shaping; DSL modems often report the link capacity as "maximum rate attainable" or some such, while the actual bandwidth is limited by contract to a rate below what the line would support (often this bandwidth reduction is performed on the PPPoE link to the BRAS).
>
> As the sync rate is fairly stable for any given installation - ADSL or Fibre - it could be used as a starting value, decremented by the traditional 15 per cent of 'overhead', with the 85 per cent fudge factor applied to that.
I would like to propose to use the "current rate" as starting point, as 'maximum rate attainable' >= 'current rate'.
>
> Fibre - FTTC - connections can suffer quite large download speed fluctuations over the 200 - 500 metre link to the MSAN. This phenomenon is not confined to ADSL links.
On the actual xDSL link? As far as I know no telco actually uses SRA (seamless rate adaptation), so the current link speed will only get lower, not higher; I would therefore expect a relatively stable current rate (it might take a while, a few days, to slowly degrade to the highest link speed supported under all conditions, but I hope you still get my point).
>
>
> An alternative speed test is something like this
>
> http://download.bethere.co.uk/downloadMeter.html
>
> which, as Be has been bought by Sky, may not exist after the end of April 2014.
But if we recommend running speed tests, we really need to advise our users to start several concurrent up- and downloads to independent servers to actually measure the bandwidth of our bottleneck link; often a single server connection will not saturate a link (I seem to recall that a single TCP connection is only guaranteed to reach 75% or so averaged over time, is that correct?).
But I think this is not the proper way to set the bandwidth for the shaper, because upstream of our link to the ISP we have no guaranteed bandwidth at all and can just hope the ISP is doing the right thing AQM-wise.
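A rough sketch of that kind of measurement (download direction only; the mirror URLs are placeholders for large files on independent, well-connected servers):
#! /bin/bash
# Load the downlink with several concurrent downloads so that no single slow
# server dominates the estimate; sum the reported rates afterwards.
URLS="http://mirror1.example.net/big.iso http://mirror2.example.net/big.iso http://mirror3.example.net/big.iso"
for url in ${URLS}; do
    curl -s -o /dev/null -w "%{speed_download} bytes/s ${url}\n" "${url}" &
done
wait   # add up the per-connection rates to approximate the attainable downlink rate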
>
> • [What is the proper description here?] If you use PPPoE (but not over ADSL/DSL link), PPPoATM, or bridging that isn’t Ethernet, you should choose [what?] and set the Per-packet Overhead to [what?]
>
> For a PPPoA service, the PPPoA link is treated as PPPoE on the second device, here running ceroWRT.
This still means you should specify the PPPoA overhead, not PPPoE.
> The packet overhead values are written in the dubious man page for tc_stab.
The only real flaw in that man page, as far as I know, is the fact that it indicates that the kernel will account for the 18byte ethernet header automatically, while the kernel does no such thing (which I hope to change).
> Sebastian has a potential alternative method of formal calculation.
So, I have no formal calculation method available, but an empirical way of detecting ATM quantization as well as measuring the per packet overhead of an ATM link.
The idea is to measure the RTT of ICMP packets of increasing length and then display the distribution of RTTs by ICMP packet length; on an ATM carrier we expect to see a step function with steps 48 bytes apart, while for a non-ATM carrier we expect to see a smooth ramp. We then compare the residuals of a linear fit of the data with the residuals of the best step-function fit; the fit with the lower residuals "wins". Attached you will find an example of this approach, ping data in red (median of NNN repetitions for each ICMP packet size), linear fit in blue, and best staircase fit in green. You can see that the data starts somewhere in a 48-byte ATM cell. Since the ATM encapsulation overhead is at most 44 bytes and we know the IP and ICMP overhead of the ping probe, we can calculate the overhead preceding the IP header, which is what needs to be put in the overhead field in the GUI. (Note where the green line intersects the y-axis at 0 bytes packet size? That is where the IP header starts; the "missing" part of this ATM cell is the overhead.)
[-- Attachment #2: PastedGraphic-1.tiff --]
[-- Type: image/tiff, Size: 50820 bytes --]
[-- Attachment #3: Type: text/plain, Size: 1887 bytes --]
Believe it or not, this method works reasonably well (I tested it successfully with one Bridged, LLC/SNAP RFC-1483/2684 connection (overhead 32 bytes) and several PPPoE, LLC (overhead 40) connections, from ADSL1 @ 3008/512 to ADSL2+ @ 16402/2558). But it takes a relatively long time to measure the ping train, especially at the higher rates… and it requires ping timestamps with decent resolution (which rules out Windows), and my naive data acquisition script creates really large raw data files. I guess I should post the code somewhere so others can test and improve it.
Fred, I would be delighted to get a data set from your connection, to test a known different encapsulation.
> TYPICAL OVERHEADS
> The following values are typical for different adsl scenarios (based on
> [1] and [2]):
>
> LLC based:
> PPPoA - 14 (PPP - 2, ATM - 12)
> PPPoE - 40+ (PPPoE - 8, ATM - 18, ethernet 14, possibly FCS - 4+padding)
> Bridged - 32 (ATM - 18, ethernet 14, possibly FCS - 4+padding)
> IPoA - 16 (ATM - 16)
>
> VC Mux based:
> PPPoA - 10 (PPP - 2, ATM - 8)
> PPPoE - 32+ (PPPoE - 8, ATM - 10, ethernet 14, possibly FCS - 4+padding)
> Bridged - 24+ (ATM - 10, ethernet 14, possibly FCS - 4+padding)
> IPoA - 8 (ATM - 8)
>
>
> For VC Mux based PPPoA, I am currently using an overhead of 18 for the PPPoE setting in ceroWRT.
Yeah we could put this list into the wiki, but how shall a typical user figure out which encapsulation is used? And good luck in figuring out whether the frame check sequence (FCS) is included or not…
BTW, regarding the 18: I predict that if PPPoE is only used between cerowrt and the "modem" or gateway, your effective overhead should be 10 bytes; I would love it if you could run the following against your link at night (also attached):
[-- Attachment #4: ping_sweeper5_fs.sh --]
[-- Type: application/octet-stream, Size: 1747 bytes --]
#! /bin/bash
# TODO use seq or bash to generate a list of the requested sizes (to allow for non-equidistantly spaced sizes)
#
TECH=ADSL2 # just a name...
# finding a proper target IP is somewhat of an art, just traceroute a remote site
# and find the nearest host reliably responding to pings showing the smallest variation of ping times
TARGET=${1} # which ip to ping
DATESTR=`date +%Y%m%d_%H%M%S` # to allow multiple sequential records
LOG=ping_sweep_${TECH}_${DATESTR}.txt
# by default non-root ping will only send one packet per second, so work around that by calling ping independently for each packet
# empirically figure out the shortest period still giving the standard ping time (to avoid being slow-pathed by our target)
PINGPERIOD=0.01 # in seconds
PINGSPERSIZE=10000
# Start, needed to find the per packet overhead dependent on the ATM encapsulation
# to reliably show ATM quantization one would like to see at least two steps, so cover a range > 2 ATM cells (so > 96 bytes)
SWEEPMINSIZE=16 # 64bit systems seem to require 16 bytes of payload to include a timestamp...
SWEEPMAXSIZE=116
n_SWEEPS=`expr ${SWEEPMAXSIZE} - ${SWEEPMINSIZE}`
i_sweep=0
i_size=0
echo "Running ICMP RTT measurement against: ${TARGET}"
while [ ${i_sweep} -lt ${PINGSPERSIZE} ]
do
(( i_sweep++ ))
echo "Current iteration: ${i_sweep}"
# now loop from sweepmin to sweepmax
i_size=${SWEEPMINSIZE}
while [ ${i_size} -le ${SWEEPMAXSIZE} ]
do
echo "${i_sweep}. repetition of ping size ${i_size}"
ping -c 1 -s ${i_size} ${TARGET} >> ${LOG} &
(( i_size++ ))
# we need a sleep binary that allows non integer times (GNU sleep is fine as is sleep of macosx 10.8.4)
sleep ${PINGPERIOD}
done
done
#tail -f ${LOG}
echo "Done... ($0)"
[-- Attachment #5: Type: text/plain, Size: 2295 bytes --]
This will try to run 10000 repetitions for ICMP packet sizes from 16 to 116 bytes, taking (10000 * 101 * 0.01 / 60 =) 168 minutes, but you should be able to stop it with ctrl-c if you are not patient enough; for your link I would estimate that 3000 repetitions should be plenty, but if you could run it overnight that would be great, and then ~3 hours should not matter much.
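A usage sketch, assuming the script was saved as ping_sweeper5_fs.sh; 192.0.2.1 is a placeholder for the nearest hop that reliably answers pings:
traceroute 8.8.8.8                       # pick the closest hop that answers pings consistently
bash ./ping_sweeper5_fs.sh 192.0.2.1     # placeholder IP: substitute the hop found above
# The script writes ping_sweep_ADSL2_<datestamp>.txt, which the Octave/MATLAB script below reads.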
And then run the following attached code in Octave or MATLAB:
[-- Attachment #6: tc_stab_parameter_guide_03.m --]
[-- Type: application/octet-stream, Size: 35609 bytes --]
function [ output_args ] = tc_stab_parameter_guide_03( sweep_fqn, up_Kbit, down_Kbit )
%TC_STAB_PARAMETER_GUIDE estimate ATM quantization and per-packet overhead from a ping sweep
% try to read in the result from a ping sweep run
% sweep_fqn: the log file of the ping sweep against the first hop after
% the DSL link
% up_Kbit: the uplink rate in Kilobits per second
% down_Kbit: the downlink rate in Kilobits per second
%
% TODO:
% find whether the carrier is ATM quantized (via FFT?)
% test whether the best stair fits better than a simple linear regression
% line?
% if yes:
% what is the RTT step (try to deduce the combined up and down rates from this)
% estimate the best MTU for the estimated protocol stack (how to test this?)
% 1) estimate the largest MTU that avoids fragmentation (default 1500 - 28 should be largest without fragmentation)
% 2) estimate the largest MTU that does not have padding in the last
% ATM cell, for this pick the MTU that no partial ATM cell remains
%
% DONE:
% Allow for holes in the ping data (missing sizes)
% make sure all sizes are filled (use NaN for empty ones???)
% maybe require to give the nominal up and down rates, to estimate the
% RTT stepsize
% try to figure out the overhead for each packet
%
%Thoughts:
% ask about IPv4 or IPv6 (what about tunneling?)
% the sweep should be taken directly connected to the modem to reduce
% non-ATM routing delays
dbstop if error;
if ~(isoctave)
timestamps.(mfilename).start = tic;
else
tic();
end
disp(['Starting: ', mfilename]);
output_args = [];
% control options
show_mean = 0; % the means are noisier than the medians
show_median = 1; % the median seems the way to go
show_min = 1; % the min should be the best measure, but in the ATM test sweep it is too variable
show_max = 0; % only useful for debugging
show_sem = 0; % give some estimate of the variance
show_ci = 1; % show the confidence interval of the mean, if the mean is shown
ci_alpha = 0.05; % alpha for confidence interval calculation
use_measure = 'median';
use_processed_results = 1;
max_samples_per_size = []; % if not empty only use maximally that many samples per size
% if not specified we try to estimate the per cell RTT from the data
default_up_Kbit = [];
default_down_KBit = [];
if (nargin == 0)
sweep_fqn = '';
sweep_fqn = fullfile(pwd, 'ping_sweep_ATM.txt'); % was Bridged, LLC/SNAP RFC-1483/2684 connection (overhead 32 bytes - 14 = 18)
sweep_fqn = fullfile(pwd, 'ping_sweep_ATM_20130610_234707.txt'); % telekom PPPOE, LLC, overhead 40!
sweep_fqn = fullfile(pwd, 'ping_sweep_ATM_20130618_233008.txt'); % telekom PPPOE
sweep_fqn = fullfile(pwd, 'ping_sweep_ATM_20130620_234659.txt'); % telekom PPPOE
sweep_fqn = fullfile(pwd, 'ping_sweep_ATM_20130618-20.txt'); % telekom PPPOE
% sweep_fqn = fullfile(pwd, 'ping_sweep_CABLE_20120426_230227.txt');
% sweep_fqn = fullfile(pwd, 'ping_sweep_CABLE_20120801_001235.txt');
if isempty(sweep_fqn)
[sweep_name, sweep_dir] = uigetfile('ping*.txt');
sweep_fqn = fullfile(sweep_dir, sweep_name);
end
up_Kbit = default_up_Kbit;
down_Kbit = default_down_KBit;
end
if (nargin == 1)
up_Kbit = default_up_Kbit;
down_Kbit = default_down_KBit;
end
if (nargin == 2)
down_Kbit = default_down_KBit;
end
%ATM
quantum.byte = 48; % ATM packets are always 53 bytes, 48 thereof payload
quantum.bit = quantum.byte * 8;
ATM_cell.byte = 53;
ATM_cell.bit = ATM_cell.byte * 8;
% known packet size offsets in bytes
offsets.IPv4 = 20; % assume no IPv4 options are used, IP 6 would be 40bytes?
offsets.IPv6 = 40; % not used yet...
offsets.ICMP = 8; % ICMP header
offsets.ethernet = 14; % ethernet header
offset.ATM.max_encapsulation_bytes = 44; % see http://ace-host.stuart.id.au/russell/files/tc/tc-atm/
MTU = 1500; % the nominal MTU to the ping host should be 1500, but might be lower if using a VPN
% fragmentation will cause an additional relatively large increase in RTT (not necessarily registered to the ATM cells)
% that will confuse the ATM quantisation offset detector, so exclude all
% ping sizes that are potentially affected by fragmentation
max_ping_size_without_fragmentation = MTU + offsets.ethernet - offsets.IPv4 - offset.ATM.max_encapsulation_bytes;
% unknown offsets is what we need to figure out to feed tc-stab...
[sweep_dir, sweep_name] = fileparts(sweep_fqn);
cur_parsed_data_mat = [sweep_fqn(1:end-4), '.mat'];
if (use_processed_results && ~isempty(dir(cur_parsed_data_mat)))
disp(['Loading processed ping data from ', cur_parsed_data_mat]);
load(cur_parsed_data_mat, 'ping');
else
% read in the result from a ping sweep
disp(['Processing ping data from ', sweep_fqn]);
ping = parse_ping_output(sweep_fqn);
if isempty(ping)
disp('No useable ping data found, exiting...');
return
end
save(cur_parsed_data_mat, 'ping');
end
% analyze the data
min_ping_size = min(ping.data(:, ping.cols.size)) - offsets.ICMP;
disp(['Minimum size of ping payload used: ', num2str(min_ping_size), ' bytes.']);
known_overhead = offsets.IPv4; % ping reports the ICMP header already included in size
ping.data(:, ping.cols.size) = ping.data(:, ping.cols.size) + known_overhead; % we know we used IPv4 so add the 20 bytes already, so that size are relative to the start of the IP header
size_list = unique(ping.data(:, ping.cols.size)); % this is the number of different sizes, but there might be holes/missing sizes
max_pingsize = max(size_list);
per_size.header = {'size', 'mean', 'median', 'min', 'max', 'std', 'n', 'sem', 'ci'};
per_size.cols = get_column_name_indices(per_size.header);
per_size.data = zeros([max_pingsize, length(per_size.header)]) / 0; % NaNs
per_size.data(:, per_size.cols.size) = (1:1:max_pingsize);
if ~isempty(max_samples_per_size)
disp(['Analysing only the first ', num2str(max_samples_per_size), ' samples.']);
end
for i_size = 1 : length(size_list)
cur_size = size_list(i_size);
cur_size_idx = find(ping.data(:, ping.cols.size) == cur_size);
if ~isempty(max_samples_per_size)
n_selected_samples = min([length(cur_size_idx), max_samples_per_size]);
cur_size_idx = cur_size_idx(1:n_selected_samples);
%disp(['Analysing only the first ', num2str(max_samples_per_size), ' samples of ', num2str(length(cur_size_idx))]);
end
per_size.data(cur_size, per_size.cols.mean) = mean(ping.data(cur_size_idx, ping.cols.time));
per_size.data(cur_size, per_size.cols.median) = median(ping.data(cur_size_idx, ping.cols.time));
per_size.data(cur_size, per_size.cols.min) = min(ping.data(cur_size_idx, ping.cols.time));
per_size.data(cur_size, per_size.cols.max) = max(ping.data(cur_size_idx, ping.cols.time));
per_size.data(cur_size, per_size.cols.std) = std(ping.data(cur_size_idx, ping.cols.time), 0);
per_size.data(cur_size, per_size.cols.n) = length(cur_size_idx);
per_size.data(cur_size, per_size.cols.sem) = per_size.data(cur_size, per_size.cols.std) / sqrt(length(cur_size_idx));
per_size.data(cur_size, per_size.cols.ci) = calc_cihw(per_size.data(cur_size, per_size.cols.std), per_size.data(cur_size, per_size.cols.n), ci_alpha);
end
clear ping % with large data sets 32bit matlab will run into memory issues...
figure('Name', sweep_name);
hold on;
legend_str = {};
if (show_mean)
% means
legend_str{end + 1} = 'mean';
plot(per_size.data(:, per_size.cols.size), per_size.data(:, per_size.cols.mean), 'Color', [0 1 0 ]);
if (show_sem)
legend_str{end + 1} = '+sem';
legend_str{end + 1} = '-sem';
plot(per_size.data(:, per_size.cols.size), per_size.data(:, per_size.cols.mean) - per_size.data(:, per_size.cols.sem), 'Color', [0 0.66 0]);
plot(per_size.data(:, per_size.cols.size), per_size.data(:, per_size.cols.mean) + per_size.data(:, per_size.cols.sem), 'Color', [0 0.66 0]);
end
if (show_ci)
legend_str{end + 1} = '+ci';
legend_str{end + 1} = '-ci';
plot(per_size.data(:, per_size.cols.size), per_size.data(:, per_size.cols.mean) - per_size.data(:, per_size.cols.ci), 'Color', [0 0.37 0]);
plot(per_size.data(:, per_size.cols.size), per_size.data(:, per_size.cols.mean) + per_size.data(:, per_size.cols.ci), 'Color', [0 0.37 0]);
end
end
if(show_median)
% median +- standard error of the mean, confidence interval would be
% better
legend_str{end + 1} = 'median';
plot(per_size.data(:, per_size.cols.size), per_size.data(:, per_size.cols.median), 'Color', [1 0 0]);
if (show_sem)
legend_str{end + 1} = '+sem';
legend_str{end + 1} = '-sem';
plot(per_size.data(:, per_size.cols.size), per_size.data(:, per_size.cols.median) - per_size.data(:, per_size.cols.sem), 'Color', [0.66 0 0]);
plot(per_size.data(:, per_size.cols.size), per_size.data(:, per_size.cols.median) + per_size.data(:, per_size.cols.sem), 'Color', [0.66 0 0]);
end
if(show_min)
% minimum, should be cleanest, but for the test data set looks quite sad...
legend_str{end + 1} = 'min';
plot(per_size.data(:, per_size.cols.size), per_size.data(:, per_size.cols.min), 'Color', [0 0 1]);
end
if(show_max)
% maximum, only really useful for debugging
legend_str{end + 1} = 'max';
plot(per_size.data(:, per_size.cols.size), per_size.data(:, per_size.cols.max), 'Color', [0 0 0.66]);
end
end
title(['If this plot shows a (noisy) step function with a stepping ~', num2str(quantum.byte), ' bytes then the data carrier is quantised, make sure to use tc-stab']);
xlabel('Approximate packet size [bytes]');
ylabel('ICMP round trip times (ping RTT) [ms]');
legend(legend_str, 'Location', 'NorthWest');
hold off;
% potentially clean up the data, by interpolating values with large sem
% from the neighbours or replacing those with NaNs?
% if the size of the ping packet exceeds the MTU the ping packets gets
% fragmented the step over this ping size will cause a RTT increaser >> one
% RTT_quantum, so exclude all sizes potentially affected by this from the
% search space, (for now assume that the route to the ping host actually can carry 1500 byte MTUs...)
measured_pingsize_idx = find(~isnan(per_size.data(:, per_size.cols.(use_measure))));
tmp_idx = find(measured_pingsize_idx <= max_ping_size_without_fragmentation);
last_non_fragmented_pingsize = measured_pingsize_idx(tmp_idx(end));
ping_sizes_for_linear_fit = measured_pingsize_idx(tmp_idx);
% fit a line to the data, to estimate the RTT per byte
[p, S] = polyfit(per_size.data(ping_sizes_for_linear_fit, per_size.cols.size), per_size.data(ping_sizes_for_linear_fit, per_size.cols.(use_measure)), 1);
RTT_per_byte = p(end - 1);
fitted_line = polyval(p, per_size.data(ping_sizes_for_linear_fit, per_size.cols.size), S);
input_data = per_size.data(ping_sizes_for_linear_fit, per_size.cols.(use_measure));
% estimate the goodness of the linear fit the same way as for the stair
% function
linear_cumulative_difference = sum(abs(input_data - fitted_line));
% figure
% hold on
% plot(per_size.data(ping_sizes_for_linear_fit, per_size.cols.size), per_size.data(ping_sizes_for_linear_fit, per_size.cols.(use_measure)), 'Color', [0 1 0]);
% plot(per_size.data(ping_sizes_for_linear_fit, per_size.cols.size), fitted_line, 'Color', [1 0 0]);
% hold off
% based on the linear fit we can estimate the average RTT per ATM cell
estimated_RTT_quantum_ms = RTT_per_byte * 48;
% the RTT should equal the average RTT increase per ATM quantum
% estimate the RTT step size
% at ADSL down 3008kbit/sec up 512kbit/sec we expect, this does not include
% processing time
if ~isempty(down_Kbit) && ~isempty(up_Kbit) % both rates are needed to estimate the expected per-cell RTT
expected_RTT_quantum_ms = (ATM_cell.bit / (down_Kbit * 1024) + ATM_cell.bit / (up_Kbit * 1024) ) * 1000; % this estimate is rather a lower bound for fastpath , so search for best fits
else
expected_RTT_quantum_ms = estimated_RTT_quantum_ms;
end
disp(['lower bound estimate for one ATM cell RTT based of specified up and downlink is ', num2str(expected_RTT_quantum_ms), ' ms.']);
disp(['estimate for one ATM cell RTT based on linear fit of the ping sweep data is ', num2str(estimated_RTT_quantum_ms), ' ms.']);
% lets search from expected_RTT_quantum_ms to 1.5 * expected_RTT_quantum_ms
% in steps of expected_RTT_quantum_ms / 100
% to allow for interleaved ATM set ups increase the search space up to 32
% times best fastpath RTT estimate, 64 interleave seems to add 25ms to the
% latency, but this only
RTT_quantum_list = (expected_RTT_quantum_ms / 2 : expected_RTT_quantum_ms / 100 : 32 * expected_RTT_quantum_ms);
quantum_list = (1 : 1 : quantum.byte);
% BRUTE FORCE search of best fitting stair...
differences = zeros([length(RTT_quantum_list) length(quantum_list)]);
cumulative_differences = differences;
all_stairs = zeros([length(RTT_quantum_list) length(quantum_list) length(per_size.data(1:last_non_fragmented_pingsize, per_size.cols.(use_measure)))]);
for i_RTT_quant = 1 : length(RTT_quantum_list)
cur_RTT_quant = RTT_quantum_list(i_RTT_quant);
for i_quant = 1 : quantum.byte
[differences(i_RTT_quant, i_quant), cumulative_differences(i_RTT_quant, i_quant), all_stairs(i_RTT_quant, i_quant, :)] = ...
get_difference_between_data_and_stair( per_size.data(1:last_non_fragmented_pingsize, per_size.cols.size), per_size.data(1:last_non_fragmented_pingsize, per_size.cols.(use_measure)), ...
quantum_list(i_quant), quantum.byte, 0, cur_RTT_quant );
end
end
% for the initial test DSL set the best x_offset was 21, corresponding to 32 bytes overhead before the IP header.
[min_cum_diff, min_cum_diff_idx] = min(cumulative_differences(:));
[min_cum_diff_row_idx, min_cum_diff_col_idx] = ind2sub(size(cumulative_differences),min_cum_diff_idx);
best_difference = differences(min_cum_diff_row_idx, min_cum_diff_col_idx);
disp(['Best staircase fit cumulative difference is: ', num2str(cumulative_differences(min_cum_diff_row_idx, min_cum_diff_col_idx))]);
disp(['Best linear fit cumulative difference is: ', num2str(linear_cumulative_difference)]);
% judge the quantization
if (cumulative_differences(min_cum_diff_row_idx, min_cum_diff_col_idx) < linear_cumulative_difference)
% stair fits better than line
quant_string = ['Quantized ATM carrier LIKELY (cumulative residual: stair fit ', num2str(cumulative_differences(min_cum_diff_row_idx, min_cum_diff_col_idx)), ' linear fit ', num2str(linear_cumulative_difference), ')'];
else
quant_string = ['Quantized ATM carrier UNLIKELY (cumulative residual: stair fit ', num2str(cumulative_differences(min_cum_diff_row_idx, min_cum_diff_col_idx)), ' linear fit ', num2str(linear_cumulative_difference), ')'];
end
disp(quant_string);
disp(['remaining ATM cell length after ICMP header is ', num2str(quantum_list(min_cum_diff_col_idx)), ' bytes.']);
disp(['ICMP RTT of a single ATM cell is ', num2str(RTT_quantum_list(min_cum_diff_row_idx)), ' ms.']); % the RTT dimension is indexed by the row of the difference matrix
% as first approximation use the ATM cell offset and known offsets (ICMP
% IPv4 min_ping_size) to estimate the number of cells used for per paket
% overhead
% this assumes that no ATM related overhead is >= ATM cell size
% -1 to account for matlab 1 based indices
% what is the offset in the 2nd ATM cell
n_bytes_overhead_2nd_cell = quantum.byte - (quantum_list(min_cum_diff_col_idx) - 1); % just assume we can not fit all overhead into one cell...
% what is the known overhead size for the first data point:
tmp_idx = find(~isnan(per_size.data(:, per_size.cols.mean)));
known_overhead_first_ping_size = tmp_idx(1);
%pre_IP_overhead = quantum.byte + (n_bytes_overhead_2nd_cell - known_overhead); % this is the one we are after in the end
pre_IP_overhead = quantum.byte + (n_bytes_overhead_2nd_cell - known_overhead_first_ping_size); % this is the one we are after in the end
disp(' ');
disp(['Estimated overhead preceding the IP header: ', num2str(pre_IP_overhead), ' bytes']);
figure('Name', 'Comparing ping data with fits');
hold on
legend_str = {'ping_data', 'fitted_stair', 'fitted_line'};
plot(per_size.data(1:last_non_fragmented_pingsize, per_size.cols.size), per_size.data(1:last_non_fragmented_pingsize, per_size.cols.(use_measure)), 'Color', [1 0 0]);
plot(per_size.data(1:last_non_fragmented_pingsize, per_size.cols.size), squeeze(all_stairs(min_cum_diff_row_idx, min_cum_diff_col_idx, :)) + best_difference, 'Color', [0 1 0]);
fitted_line = polyval(p, per_size.data(1:last_non_fragmented_pingsize, per_size.cols.size), S);
plot(per_size.data(1:last_non_fragmented_pingsize, per_size.cols.size), fitted_line, 'Color', [0 0 1]);
title({['Estimated RTT per quantum: ', num2str(RTT_quantum_list(min_cum_diff_row_idx)), ' ms; ICMP data offset in quantum ', num2str(quantum_list(min_cum_diff_col_idx)), ' bytes'];...
['Estimated overhead preceding the IP header: ', num2str(pre_IP_overhead), ' bytes'];...
quant_string});
xlabel('Approximate packet size [bytes]');
ylabel('ICMP round trip times (ping RTT) [ms]');
if (isoctave)
legend(legend_str, 'Location', 'NorthWest');
else
%annotation('textbox', [0.0 0.95 1.0 .05], 'String', ['Estimated overhead preceeding the IP header: ', num2str(pre_IP_overhead), ' bytes'], 'FontSize', 9, 'Interpreter', 'none', 'Color', [1 0 0], 'LineStyle', 'none');
legend(legend_str, 'Interpreter', 'none', 'Location', 'NorthWest');
end
hold off
% use http://ace-host.stuart.id.au/russell/files/tc/tc-atm/ to present the
% most likely ATM encapsulation for a given overhead and present a recommendation
% for the tc stab invocation
display_protocol_stack_information(pre_IP_overhead);
% now turn this into tc-stab recommendations:
disp(['Add the following to the egress root qdisc:']);
% disp(' ');
disp(['A) Assuming the router connects over ethernet to the DSL-modem:']);
disp(['stab mtu 2048 tsize 128 overhead ', num2str(pre_IP_overhead), ' linklayer atm']); % currently tc stab does not account for the ethernet header
% disp(['stab mtu 2048 tsize 128 overhead ', num2str(pre_IP_overhead - offsets.ethernet), ' linklayer atm']);
% disp(' ');
% disp(['B) Assuming the router connects via PPP and non-ethernet to the modem:']);
% disp(['stab mtu 2048 tsize 128 overhead ', num2str(pre_IP_overhead), ' linklayer atm']);
disp(' ');
% on ingress do not exclude the ethernet header?
disp(['Add the following to the ingress root qdisc:']);
disp(' ');
disp(['A) Assuming the router connects over ethernet to the DSL-modem:']);
disp(['stab mtu 2048 tsize 128 overhead ', num2str(pre_IP_overhead), ' linklayer atm']);
disp(' ');
if ~(isoctave)
timestamps.(mfilename).end = toc(timestamps.(mfilename).start);
disp([mfilename, ' took: ', num2str(timestamps.(mfilename).end), ' seconds.']);
else
toc
end
% and now the other end of the data, what is the max MTU for the link and
% what is the best ATM cell aligned MTU
disp('Done...');
return
end
function [ ping_data ] = parse_ping_output( ping_log_fqn )
%PARSE_PING_OUTPUT read the output of a ping run/sweep
% for further processing
% TODO:
% use a faster parser, using strtok is quite expensive
%
if ~(isoctave)
timestamps.parse_ping_output.start = tic;
else
tic();
end
verbose = 0;
n_rows_to_grow_table_by = 10000; % grow table increment to avoid excessive memory copy ops
ping_data = [];
cur_sweep_fd = fopen(ping_log_fqn, 'r');
if (cur_sweep_fd == -1)
disp(['Could not open ', ping_log_fqn, '.']);
if isempty(dir(ping_log_fqn))
disp('Reason: file does not seem to exist at the given directory...')
end
return
end
ping_data.header = {'size', 'icmp_seq', 'ttl', 'time'};
ping_data.cols = get_column_name_indices(ping_data.header);
ping_data.data = zeros([n_rows_to_grow_table_by, length(ping_data.header)]);
cur_data_lines = 0;
cur_lines = 0;
% skip the first line
% PING netblock-75-79-143-1.dslextreme.com (75.79.143.1): (16 ... 1000)
% data bytes
header_line = fgetl(cur_sweep_fd);
while ~feof(cur_sweep_fd)
% grow the data table if need be
if (size(ping_data.data, 1) == cur_data_lines)
if (verbose)
disp('Growing ping data table...');
end
ping_data.data = [ping_data.data; zeros([n_rows_to_grow_table_by, length(ping_data.header)])];
end
cur_line = fgetl(cur_sweep_fd);
if ~(mod(cur_lines, 1000))
disp([num2str(cur_lines +1), ' lines parsed...']);
end
cur_lines = cur_lines + 1;
[first_element, remainder] = strtok(cur_line);
first_element_as_number = str2double(first_element);
if isempty(first_element) || strcmp('Request', first_element) || strcmp('---', first_element)
% skip empty lines explicitly
continue;
end
% the following will not work for merged ping
%if strmatch('---', first_element)
% %we reached the end of sweeps
% break;
%end
% now read in the data
% 30 bytes from 75.79.143.1: icmp_seq=339 ttl=63 time=14.771 ms
if ~isempty(first_element_as_number)
% get the next element
[tmp_next_item, tmp_remainder] = strtok(remainder);
if strcmp(tmp_next_item, 'bytes')
if ~(mod(cur_data_lines, 1000))
disp(['Milestone ', num2str(cur_data_lines +1), ' ping packets reached...']);
end
cur_data_lines = cur_data_lines + 1;
% size of the ICMP package
ping_data.data(cur_data_lines, ping_data.cols.size) = first_element_as_number;
% now process the remainder
while ~isempty(remainder)
[next_item, remainder] = strtok(remainder);
equality_pos = strfind(next_item, '=');
% data items are name+value pairs
if ~isempty(equality_pos);
cur_key = next_item(1: equality_pos - 1);
cur_value = str2double(next_item(equality_pos + 1: end));
switch cur_key
% busybox ping and macosx ping return different key names
case {'seq', 'icmp_seq'}
ping_data.data(cur_data_lines, ping_data.cols.icmp_seq) = cur_value;
case 'ttl'
ping_data.data(cur_data_lines, ping_data.cols.ttl) = cur_value;
case 'time'
ping_data.data(cur_data_lines, ping_data.cols.time) = cur_value;
end
end
end
else
% skip this line
if (verbose)
disp(['Skipping: ', cur_line]);
end
end
else
if (verbose)
disp(['Ping output: ', cur_line, ' not handled yet...']);
end
end
end
% remove empty lines
if (size(ping_data.data, 1) > cur_data_lines)
ping_data.data = ping_data.data(1:cur_data_lines, :);
end
disp(['Found ', num2str(cur_data_lines), ' ping packets in ', ping_log_fqn]);
% clean up
fclose(cur_sweep_fd);
if ~(isoctave)
timestamps.parse_ping_output.end = toc(timestamps.parse_ping_output.start);
disp(['Parsing took: ', num2str(timestamps.parse_ping_output.end), ' seconds.']);
else
toc
end
return
end
function [ difference , cumulative_difference, stair_y ] = get_difference_between_data_and_stair( data_x, data_y, x_size, stair_x_step_size, y_offset, stair_y_step_size )
% 130619sm: handle NaNs in data_y (marker for missing ping sizes)
% x_size is the flat part of the first stair, that is quantum minus the
% offset
% TODO: understand the offset issue and simplify this function
% extrapolate the stair towards x = 0 again
debug = 0;
difference = [];
tmp_idx = find(~isnan(data_y));
x_start_val_idx = tmp_idx(1);
x_start_val = data_x(x_start_val_idx);
x_end_val = data_x(end); % data_x is sorted...
% construct stair
stair_x = data_x;
proto_stair_y = zeros([x_end_val 1]); % we need the final value in
% make sure the x_size values do not exceed the step size...
if (x_size > stair_x_step_size)
if mod(x_size, stair_x_step_size) == 0
x_size = stair_x_step_size;
else
x_size = mod(x_size, stair_x_step_size);
end
end
%stair_y_step_idx = (x_start_val + x_size : stair_x_step_size : x_end_val);
%% we really want steps registered to x_start_val
%stair_y_step_idx = (mod(x_start_val, stair_x_step_size) + x_size : stair_x_step_size : x_end_val);
stair_y_step_idx = (mod(x_start_val + x_size, stair_x_step_size) : stair_x_step_size : x_end_val);
if stair_y_step_idx(1) == 0
stair_y_step_idx(1) = [];
end
proto_stair_y(stair_y_step_idx) = stair_y_step_size;
stair_y = cumsum(proto_stair_y);
if (debug)
figure
hold on;
title(['x offset used: ', num2str(x_size), ' with quantum ', num2str(stair_x_step_size)]);
plot(data_x, data_y, 'Color', [0 1 0]);
plot(stair_x, stair_y, 'Color', [1 0 0]);
hold off;
end
% missing ping sizes are filled with NaNs, so skip those
notnan_idx = find(~isnan(data_y));
% estimate the best y_offset for the stair
difference = sum(abs(data_y(notnan_idx) - stair_y(notnan_idx))) / length(data_y(notnan_idx));
% calculate the cumulative difference between stair and data...
cumulative_difference = sum(abs(data_y(notnan_idx) - (stair_y(notnan_idx) + difference)));
return
end
% function [ stair ] = build_stair(x_vector, x_size, stair_x_step_size, y_offset, stair_y_step_size )
% stair = [];
%
% return
% end
function [columnnames_struct, n_fields] = get_column_name_indices(name_list)
% return a structure with each field for each member if the name_list cell
% array, giving the position in the name_list, then the columnnames_struct
% can serve to address the columns, so the functions assigning values
% to the columns do not have to care too much about the positions, and it
% becomes easy to add fields.
n_fields = length(name_list);
for i_col = 1 : length(name_list)
cur_name = name_list{i_col};
columnnames_struct.(cur_name) = i_col;
end
return
end
function [ci_halfwidth_vector] = calc_cihw(std_vector, n, alpha)
%calc_ci : calculate the half width of the confidence interval (for 1 - alpha)
% the t_value lookup depends on alpha and the samplesize n; the relevant
% calculation of the degree of freedom is performed inside calc_t_val.
% ci_halfwidth = t_val(alpha, n-1) * std / sqrt(n)
% Each group's CI ranges from mean - ci_halfwidth to mean + ci_halfwidth, so
% the calling function has to perform this calculation...
%
% INPUTS:
% std_vector: vector containing the standard deviations of all requested
% groups
% n: number of samples in each group, if the groups have different
% samplesizes, specify each groups samplesize in a vector
% alpha: the desired maximal uncertainty/error in the range of [0, 1]
% OUTPUT:
% ci_halfwidth_vector: vector containing the confidence intervals half width
% for each group
% calc_t_val return one sided t-values, for the desired two sidedness one has
% to half the alpha for the table lookup
cur_alpha = alpha / 2;
% if n is scalar use same n for all elements of std_vec
if isscalar(n)
t_ci = calc_t_val(cur_alpha, n);
ci_halfwidth_vector = std_vector * t_ci / sqrt(n);
% if n is a vector, prepare a matching vector of t_ci values
elseif isvector(n)
t_ci_vector = n;
% this is probably ugly, but calc_t_val only accepts scalars.
for i_pos = 1 : length(n)
t_ci_vector(i_pos) = calc_t_val(cur_alpha, n(i_pos));
end
ci_halfwidth_vector = std_vector .* t_ci_vector ./ sqrt(n);
end
return
end
%-----------------------------------------------------------------------------
function [t_val] = calc_t_val(alpha, n)
% the t value for the given alpha and n
% so call with the n of the sample, not with degrees of freedom
% see http://mathworld.wolfram.com/Studentst-Distribution.html for formulas
% return values follow Bortz, Statistik fuer Sozialwissenschaftler, Springer
% 1999, table D page 775. That is it returns one sided t-values.
% primary author S. Moeller
% TODO:
% sidedness of t-value???
% basic error checking
if nargin < 2
error('alpha and n have to be specified...');
end
% probability of error
tmp_alpha = alpha ;%/ 2;
if (tmp_alpha < 0) || (tmp_alpha > 1)
msgbox('alpha has to be taken from [0, 1]...');
t_val = NaN;
return
end
if tmp_alpha == 0
t_val = -Inf;
return
elseif tmp_alpha ==1
t_val = Inf;
return
end
% degree of freedom
df = n - 1;
if df < 1
%msgbox('The n has to be >= 2 (=> df >= 1)...');
% disp('The n has to be >= 2 (=> df >= 1)...');
t_val = NaN;
return
end
% only calculate each (alpha, df) combination once, store the results
persistent t_val_array;
% create the t_val_array
if ~iscell(t_val_array)
t_val_array = {[NaN;NaN]};
end
% search for the (alpha, df) tupel, avoid calculation if already stored
if iscell(t_val_array)
% cell array of 2d arrays containing alpha / t_val pairs
if df <= length(t_val_array)
% test whether the required alpha, t_val tupel exists
if ~isempty(t_val_array{df})
% search for alpha
tmp_array = t_val_array{df};
alpha_index = find(tmp_array(1,:) == tmp_alpha);
if any(alpha_index)
t_val = tmp_array(2, alpha_index);
return
end
end
else
% grow t_val_array to length of n
missing_cols = df - length(t_val_array);
for i_missing_cols = 1: missing_cols
t_val_array{end + 1} = [NaN;NaN];
end
end
end
% check the sign
cdf_sign = 1;
if (1 - tmp_alpha) == 0.5
t_val = 0; % the t-distribution's CDF equals 0.5 exactly at t = 0
elseif (1 - tmp_alpha) < 0.5 % the t-cdf is point symmetric around (0, 0.5)
cdf_sign = -1;
tmp_alpha = 1 - tmp_alpha; % this will be undone later
end
% init some variables
n_iterations = 0;
delta_t = 1;
last_alpha = 1;
higher_t = 50;
lower_t = 0;
% find a t-value pair around the desired alpha value
while norm_students_cdf(higher_t, df) < (1 - tmp_alpha);
lower_t = higher_t;
higher_t = higher_t * 2;
end
% search the t value for the given alpha...
while (n_iterations < 1000) && (abs(delta_t) >= 0.0001)
n_iterations = n_iterations + 1;
% get the test_t (TODO linear interpolation)
% higher_alpha = norm_students_cdf(higher_t, df);
% lower_alpha = norm_students_cdf(lower_t, df);
test_t = lower_t + ((higher_t - lower_t) / 2);
cur_alpha = norm_students_cdf(test_t, df);
% just in case we hit the right t spot on...
if cur_alpha == (1 - tmp_alpha)
t_crit = test_t;
break;
% probably we have to search for the right t
elseif cur_alpha < (1 - tmp_alpha)
% test_t is the new lower_t
lower_t = test_t;
%higher_t = higher_t; % this stays as is...
elseif cur_alpha > (1 - tmp_alpha)
%
%lower_t = lower_t; % this stays as is...
higher_t = test_t;
end
delta_t = higher_t - lower_t;
last_alpha = cur_alpha;
end
t_crit = test_t;
% set the return value, correct for negative t values
t_val = t_crit * cdf_sign;
if cdf_sign < 0
tmp_alpha = 1 - tmp_alpha;
end
% store the alpha, n, t_val tupel in t_val_array
pos = size(t_val_array{df}, 2);
t_val_array{df}(1, (pos + 1)) = tmp_alpha;
t_val_array{df}(2, (pos + 1)) = t_val;
return
end
%-----------------------------------------------------------------------------
function [scaled_cdf] = norm_students_cdf(t, df)
% calculate the cdf of students distribution for a given degree of freedom df,
% and all given values of t, then normalize the result
% the extreme values depend on the values of df!!!
% get min and max by calculating values for extreme t-values (e.g. -10000000,
% 10000000)
extreme_cdf_vals = students_cdf([-10000000, 10000000], df);
tmp_cdf = students_cdf(t, df);
scaled_cdf = (tmp_cdf - extreme_cdf_vals(1)) /...
(extreme_cdf_vals(2) - extreme_cdf_vals(1));
return
end
%-----------------------------------------------------------------------------
function [cdf_value_array] = students_cdf(t_value_array, df)
%students_cdf: calc the cumulative density function for a t-distribution
% Calculate the CDF value for each value t of the input array
% see http://mathworld.wolfram.com/Studentst-Distribution.html for formulas
% INPUTS: t_value_array: array containing the t values for which to
% calculate the cdf
% df: degree of freedom; equals n - 1 for the t-distribution
cdf_value_array = 0.5 +...
((betainc(1, 0.5 * df, 0.5) / beta(0.5 * df, 0.5)) - ...
(betainc((df ./ (df + t_value_array.^2)), 0.5 * df, 0.5) /...
beta(0.5 * df, 0.5))) .*...
sign(t_value_array);
return
end
%-----------------------------------------------------------------------------
function [t_prob_dist] = students_pf(df, t_arr)
% calculate the probability function for students t-distribution
t_prob_dist = (df ./ (df + t_arr.^2)).^((1 + df) / 2) /...
(sqrt(df) * beta(0.5 * df, 0.5));
% % calculate and scale the cdf by hand...
% cdf = cumsum(t_prob_dist);
% discrete_t_cdf = (cdf - min(cdf)) / (max(cdf) - min(cdf));
% % numerically get the t-value for the given alpha
% tmp_index = find(discrete_t_cdf > (1 - tmp_alpha));
% t_crit = t(tmp_index(1));
return
end
function in = isoctave ()
persistent inout;
if isempty(inout),
inout = exist('OCTAVE_VERSION','builtin') ~= 0;
end;
in = inout;
return;
end
function [] = display_protocol_stack_information(pre_IP_overhead)
% use [1] http://ace-host.stuart.id.au/russell/files/tc/tc-atm/ to present the
% most likely ATM protocol stack setup for a given overhead so the user can
% compare with his prior knowledge
% how much data fits into ATM cells without padding? 32 cells would be 1536
% which is larger than the 1500 max MTU for ethernet
ATM_31_cells_proto_MTU = 31 * 48; % according to [1] 31 cells are the optimum for all protocol stacks
ATM_32_cells_proto_MTU = 32 * 48; % should be best for case 44
disp(' ');
disp('According to http://ace-host.stuart.id.au/russell/files/tc/tc-atm/');
disp(['', num2str(pre_IP_overhead), ' bytes overhead indicate']);
switch pre_IP_overhead
case 8
disp('Connection: IPoA, VC/Mux RFC-2684');
disp('Protocol (bytes): ATM AAL5 SAR (8) : Total 8');
overhead_bytes_around_MTU = 8;
overhead_bytes_in_MTU = 0;
case 16
disp('Connection: IPoA, LLC/SNAP RFC-2684');
disp('Protocol (bytes): ATM LLC (3), ATM SNAP (5), ATM AAL5 SAR (8) : Total 16');
overhead_bytes_around_MTU = 16;
overhead_bytes_in_MTU = 0;
case 24
disp('Connection: Bridged, VC/Mux RFC-1483/2684');
disp('Protocol (bytes): Ethernet Header (14), ATM pad (2), ATM AAL5 SAR (8) : Total 24');
overhead_bytes_around_MTU = 24;
overhead_bytes_in_MTU = 0;
case 28
disp('Connection: Bridged, VC/Mux+FCS RFC-1483/2684');
disp('Protocol (bytes): Ethernet Header (14), Ethernet PAD [8] (0), Ethernet Checksum (4), ATM pad (2), ATM AAL5 SAR (8) : Total 28');
overhead_bytes_around_MTU = 28;
overhead_bytes_in_MTU = 0;
case 32
disp('Connection: Bridged, LLC/SNAP RFC-1483/2684');
disp('Protocol (bytes): Ethernet Header (14), ATM LLC (3), ATM SNAP (5), ATM pad (2), ATM AAL5 SAR (8) : Total 32');
overhead_bytes_around_MTU = 32;
overhead_bytes_in_MTU = 0;
disp('OR');
disp('Connection: PPPoE, VC/Mux RFC-2684');
disp('Protocol (bytes): PPP (2), PPPoE (6), Ethernet Header (14), ATM pad (2), ATM AAL5 SAR (8) : Total 32');
overhead_bytes_around_MTU = 24;
overhead_bytes_in_MTU = 8;
case 36
disp('Connection: Bridged, LLC/SNAP+FCS RFC-1483/2684');
disp('Protocol (bytes): Ethernet Header (14), Ethernet PAD [8] (0), Ethernet Checksum (4), ATM LLC (3), ATM SNAP (5), ATM pad (2), ATM AAL5 SAR (8) : Total 36');
overhead_bytes_around_MTU = 36;
overhead_bytes_in_MTU = 0;
disp('OR');
disp('Connection: PPPoE, VC/Mux+FCS RFC-2684');
disp('Protocol (bytes): PPP (2), PPPoE (6), Ethernet Header (14), Ethernet PAD [8] (0), Ethernet Checksum (4), ATM pad (2), ATM AAL5 SAR (8) : Total 36');
overhead_bytes_around_MTU = 28;
overhead_bytes_in_MTU = 8;
case 10
disp('Connection: PPPoA, VC/Mux RFC-2364');
disp('Protocol (bytes): PPP (2), ATM AAL5 SAR (8) : Total 10');
overhead_bytes_around_MTU = 8;
overhead_bytes_in_MTU = 2;
case 14
disp('Connection: PPPoA, LLC RFC-2364');
disp('Protocol (bytes): PPP (2), ATM LLC (3), ATM LLC-NLPID (1), ATM AAL5 SAR (8) : Total 14');
overhead_bytes_around_MTU = 12;
overhead_bytes_in_MTU = 2;
case 40
disp('Connection: PPPoE, LLC/SNAP RFC-2684');
disp('Protocol (bytes): PPP (2), PPPoE (6), Ethernet Header (14), ATM LLC (3), ATM SNAP (5), ATM pad (2), ATM AAL5 SAR (8) : Total 40');
overhead_bytes_around_MTU = 32;
overhead_bytes_in_MTU = 8;
case 44
disp('Connection: PPPoE, LLC/SNAP+FCS RFC-2684');
disp('Protocol (bytes): PPP (2), PPPoE (6), Ethernet Header (14), Ethernet PAD [8] (0), Ethernet Checksum (4), ATM LLC (3), ATM SNAP (5), ATM pad (2), ATM AAL5 SAR (8) : Total 44');
overhead_bytes_around_MTU = 36;
overhead_bytes_in_MTU = 8;
otherwise
disp('a protocol stack this program does NOT know (yet)...');
end
disp(' ');
return;
end
[-- Attachment #7: Type: text/plain, Size: 7789 bytes --]
. Invoke with "tc_stab_parameter_guide_03('path/to/the/data/file/you/created/name_of_said_file')". The parser will run on the first invocation and is really, really slow, but further invocations should be faster. If issues arise, let me know, I am happy to help.
>
> Were I to use a single directly connected gateway, I would input a suitable value for PPPoA in that openWRT firmware.
I think you should do that right now.
> In theory, I might need to use a negative value, but the current kernel does not support that.
If you use tc_stab, negative overheads are fully supported, only htb_private has overhead defined as unsigned integer and hence does not allow negative values.
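To illustrate (a sketch only; the interface name and the HTB details are placeholders, not what the SQM scripts actually install):

# tc_stab attaches a size table in front of the shaper; its overhead field
# is signed, so a value such as -4 is accepted as well as 40
tc qdisc add dev ge00 root handle 1: stab linklayer atm overhead 40 htb default 10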
> I have used many different arbitrary values for overhead. All appear to have little effect.
So the issue here is that only at small packet sizes do the overhead and last cell padding eat a disproportionate amount of your bandwidth (64 byte packet plus 44 byte overhead plus 47 byte worst case cell padding: 100* (44+47+64)/64 = 242% effective packet size to what the shaper estimated), at typical packet sizes the max error (44 bytes missing overhead and potentially misjudged cell padding of 47 bytes adds up to a theoretical 100*(44+47+1500)/1500 = 106% effective packet size to what the shaper estimated). It is obvious that at 1500 byte packets the whole ATM issue can be easily dismissed with just reducing the link rate by ~10% for the 48 in 53 framing and an additional ~6% for overhead and cell padding. But once you mix smaller packets in your traffic for say VoIP, the effective wire size misjudgment will kill your ability to control the queueing. Note that the common wisdom of shaping down to 85% might stem from the ~15% ATM "tax" on 1500 byte traffic size...
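As a quick sanity check of these numbers: the ATM link layer accounting rounds each packet plus overhead up to whole 48 byte cells and then counts 53 bytes per cell on the wire, so in shell arithmetic (44 bytes is the worst case overhead used above; the results come out slightly above 242%/106% because they include the 5 byte cell headers as well):

atm_wire_size() { echo $(( ( ($1 + $2 + 47) / 48 ) * 53 )); }   # bytes on the wire for payload $1 and overhead $2
atm_wire_size 64 44     # -> 159 bytes, ~248% of a 64 byte packet
atm_wire_size 1500 44   # -> 1749 bytes, ~117% of a 1500 byte packet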
> As I understand it, the current recommendation is to use tc_stab in preference to htb_private. I do not know the basis for this value judgement.
In short: tc_stab allows negative overheads, tc_stab works with HTB, TBF, HFSC while htb_private only works with HTB. Currently htb_private has two advantages: it will estimate the per packet overhead correctly if GSO (generic segmentation offload) is enabled and it will produce exact ATM link layer estimates for all possible packet sizes. In practice almost everyone uses an MTU of 1500 or less for their internet access making both htb_private advantages effectively moot. (Plus if no one beats me to it I intend to address both theoretical shortcomings of tc_stab next year).
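For comparison, with htb_private the same information goes into the rate specification of the HTB class itself (again only a sketch with placeholder interface name and rates):

# overhead/linklayer become part of HTB's own rate tables; this works for
# HTB only, and the overhead field is unsigned here
tc class add dev ge00 parent 1: classid 1:10 htb rate 2000kbit overhead 40 linklayer atm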
Best Regards
Sebastian
>
>
>
>
>
> On 28/12/13 10:01, Sebastian Moeller wrote:
>> Hi Rich,
>>
>> great! A few comments:
>>
>> Basic Settings:
>> [Is 95% the right fudge factor?] I think that ideally, if we get can precisely measure the useable link rate even 99% of that should work out well, to keep the queue in our device. I assume that due to the difficulties in measuring and accounting for the link properties as link layer and overhead people typically rely on setting the shaped rate a bit lower than required to stochastically/empirically account for the link properties. I predict that if we get a correct description of the link properties to the shaper we should be fine with 95% shaping. Note though, it is not trivial on an adel link to get the actually useable bit rate from the modem so 95% of what can be deduced from the modem or the ISP's invoice might be a decent proxy…
>>
>> [Do we have a recommendation for an easy way to tell if it's working? Perhaps a link to a new Quick Test for Bufferbloat page. ] The linked page looks like a decent probe for buffer bloat.
>>
>>
>>> Basic Settings - the details...
>>>
>>> CeroWrt is designed to manage the queues of packets waiting to be sent across the slowest (bottleneck) link, which is usually your connection to the Internet.
>>>
>> I think we can only actually control the first link to the ISP, which often happens to be the bottleneck. At a typical DSLAM (xDSL head end station) the cumulative sold bandwidth to the customers is larger than the back bone connection (which is called over-subscription and is almost guaranteed to be the case in every DSLAM) which typically is not a problem, as typically people do not use their internet that much. My point being we can not really control congestion in the DSLAM's uplink (as we have no idea what the reserved rate per customer is in the worst case, if there is any).
>>
>>
>>> CeroWrt can automatically adapt to network conditions to improve the delay/latency of data without any settings.
>>>
>> Does this describe the default fq_codels on each interface (except fib?)?
>>
>>
>>> However, it can do a better job if it knows more about the actual link speeds available. You can adjust this setting by entering link speeds that are a few percent below the actual speeds.
>>>
>>> Note: it can be difficult to get an accurate measurement of the link speeds. The speed advertised by your provider is a starting point, but your experience often won't meet their published specs. You can also use a speed test program or web site like
>>> http://speedtest.net
>>> to estimate actual operating speeds.
>>>
>> While this approach is commonly recommended on the internet, I do not believe that it is that useful. Between a user and the speediest site there are a number of potential congestion points that can affect (reduce) the throughput, like bad peering. Now that said, the speed tests will report something <= the actual link speed and hence be conservative (interactivity stays great at 90% of link rate as well as 80% so underestimating the bandwidth within reason does not affect the latency gains from traffic shaping it just sacrifices a bit more bandwidth; and given the difficulty to actually measure the actually attainable bandwidth might have been effectively a decent recommendation even though the theory of it seems flawed)
>>
>>
>>> Be sure to make your measurement when network is quiet, and others in your home aren’t generating traffic.
>>>
>> This is great advice.
>>
>> I would love to comment further, but after reloading
>> http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310
>> just returns a blank page and I can not get back to the page as of yesterday evening… I will have a look later to see whether the page resurfaces…
>>
>> Best
>> Sebastian
>>
>>
>> On Dec 27, 2013, at 23:09 , Rich Brown
>> <richb.hanover@gmail.com>
>> wrote:
>>
>>
>>>> You are a very good writer and I am on a tablet.
>>>>
>>>>
>>> Thanks!
>>>
>>>> Ill take a pass at the wiki tomorrow.
>>>>
>>>> The shaper does up and down was my first thought...
>>>>
>>>>
>>> Everyone else… Don’t let Dave hog all the fun! Read the tech note and give feedback!
>>>
>>> Rich
>>>
>>>
>>>> On Dec 27, 2013 10:48 AM, "Rich Brown" <richb.hanover@gmail.com>
>>>> wrote:
>>>> I updated the page to reflect the 3.10.24-8 build, and its new GUI pages.
>>>>
>>>>
>>>> http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310
>>>>
>>>>
>>>> There are still lots of open questions. Comments, please.
>>>>
>>>> Rich
>>>> _______________________________________________
>>>> Cerowrt-devel mailing list
>>>>
>>>> Cerowrt-devel@lists.bufferbloat.net
>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>> _______________________________________________
>>> Cerowrt-devel mailing list
>>>
>>> Cerowrt-devel@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>> _______________________________________________
>> Cerowrt-devel mailing list
>>
>> Cerowrt-devel@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Cerowrt-devel] Update to "Setting up SQM for CeroWrt 3.10" web page. Comments needed.
2013-12-28 10:01 ` Sebastian Moeller
2013-12-28 11:09 ` Fred Stratton
@ 2013-12-28 14:27 ` Rich Brown
2013-12-28 20:24 ` Sebastian Moeller
1 sibling, 1 reply; 14+ messages in thread
From: Rich Brown @ 2013-12-28 14:27 UTC (permalink / raw)
To: Sebastian Moeller; +Cc: cerowrt-devel
Hi Sebastian,
> I would love to comment further, but after reloading http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310 just returns a blank page and I can not get back to the page as of yesterday evening… I will have a look later to see whether the page resurfaces…
I’m not sure what happened to this page for you. It’s available now (at least to me) at that URL…
Rich
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Cerowrt-devel] Update to "Setting up SQM for CeroWrt 3.10" web page. Comments needed.
2013-12-28 13:42 ` Sebastian Moeller
@ 2013-12-28 14:27 ` Fred Stratton
2013-12-28 19:54 ` Sebastian Moeller
0 siblings, 1 reply; 14+ messages in thread
From: Fred Stratton @ 2013-12-28 14:27 UTC (permalink / raw)
To: Sebastian Moeller, cerowrt-devel
[-- Attachment #1: Type: text/plain, Size: 18712 bytes --]
On 28/12/13 13:42, Sebastian Moeller wrote:
> Hi Fred,
>
>
> On Dec 28, 2013, at 12:09 , Fred Stratton <fredstratton@imap.cc> wrote:
>
>> The UK consensus fudge factor has always been 85 per cent of the rate achieved, not 95 or 99 per cent.
> I know that the recommendations have been lower in the past; I think this is partly because before Jesper Brouer's and Russell Stuart's work to properly account for ATM "quantization" people typically had to deal with a ~10% rate tax for the 5byte per cell overhead (48 byte payload in 53 byte cells 90.57% useable rate) plus an additional 5% to stochastically account for the padding of the last cell and the per packet overhead both of which affect the effective goodput way more for small than large packets, so the 85% never worked well for all packet sizes. My hypothesis now is since we can and do properly account for these effects of ATM framing we can afford to start with a fudge factor of 90% or even 95%. As far as I know the recommended fudge factors are never ever explained by more than "this works empirically"...
The fudge factors are totally empirical. IF you are proposing a more
formal approach, I shall try a 90 per cent fudge factor, although
'current rate' varies here.
>
>> Devices express 2 values: the sync rate - or 'maximum rate attainable' - and the dynamic value of 'current rate'.
> The actual data rate is the relevant information for shaping, often DSL modems report the link capacity as "maximum rate attainable" or some such, while the actual bandwidth is limited to a rate below what the line would support by contract (often this bandwidth reduction is performed on the PPPoE link to the BRAS).
>
>> As the sync rate is fairly stable for any given installation - ADSL or Fibre - this could be used as a starting value. decremented by the traditional 15 per cent of 'overhead'. and the 85 per cent fudge factor applied to that.
> I would like to propose to use the "current rate" as starting point, as 'maximum rate attainable' >= 'current rate'.
'current rate' is still a sync rate, and so is conventionally viewed as
15 per cent above the unmeasurable actual rate. As you are proposing a
new approach, I shall take 90 per cent of 'current rate' as a starting
point.
No one in the UK uses SRA currently. One small ISP used to. The ISP I
currently use has Dynamic Line Management, which changes target SNR
constantly. The DSLAM is made by Infineon.
>
>> Fibre - FTTC - connections can suffer quite large download speed fluctuations over the 200 - 500 metre link to the MSAN. This phenomenon is not confined to ADSL links.
> On the actual xDSL link? As far as I know no telco actually uses SRA (seamless rate adaptation or so) so the current link speed will only get lower not higher, so I would expect a relative stable current rate (it might take a while, a few days to actually slowly degrade to the highest link speed supported under all conditions, but I hope you still get my point)
I understand the point, but do not think it is the case, from data I
have seen, but cannot find now, unfortunately.
>
>>
>> An alternative speed test is something like this
>>
>> http://download.bethere.co.uk/downloadMeter.html
>>
>> which, as Be has been bought by Sky, may not exist after the end of April 2014.
> But, if we recommend to run speed tests we really need to advise our users to start several concurrent up- and downloads to independent servers to actually measure the bandwidth of our bottleneck link; often a single server connection will not saturate a link (I seem to recall that with TCP it is guaranteed to only reach 75% or so averaged over time, is that correct?).
> But I think this is not the proper way to set the bandwidth for the shaper, because upstream of our link to the ISP we have no guaranteed bandwidth at all and just can hope the ISP is doing the right thing AQM-wise.
I quote the Be site as an alternative to a java based approach. I would
be very happy to see your suggestion adopted.
>
>
>> • [What is the proper description here?] If you use PPPoE (but not over ADSL/DSL link), PPPoATM, or bridging that isn’t Ethernet, you should choose [what?] and set the Per-packet Overhead to [what?]
>>
>> For a PPPoA service, the PPPoA link is treated as PPPoE on the second device, here running ceroWRT.
> This still means you should specify the PPPoA overhead, not PPPoE.
I shall try the PPPoA overhead.
>
>> The packet overhead values are written in the dubious man page for tc_stab.
> The only real flaw in that man page, as far as I know, is the fact that it indicates that the kernel will account for the 18byte ethernet header automatically, while the kernel does no such thing (which I hope to change).
It mentions link layer types as 'atm', 'ethernet' and 'adsl'. There is no
reference anywhere to the last. I do not see its relevance.
>
>> Sebastian has a potential alternative method of formal calculation.
> So, I have no formal calculation method available, but an empirical way of detecting ATM quantization as well as measuring the per packet overhead of an ATM link.
> The idea is to measure the RTT of ICMP packets of increasing length and then display the distribution of RTTs by ICMP packet length; on an ATM carrier we expect to see a step function with steps 48 bytes apart, while on a non-ATM carrier we expect to rather see a smooth ramp. We then compare the residuals of a linear fit of the data with the residuals of the best step function fit to the data; the fit with the lower residuals "wins". Attached you will find an example of this approach, ping data in red (median of NNN repetitions for each ICMP packet size), linear fit in blue, and best staircase fit in green. You notice that data starts somewhere in a 48 byte ATM cell. Since the ATM encapsulation overhead is maximally 44 bytes and we know the IP and ICMP overhead of the ping probe we can calculate the overhead preceding the IP header, which is what needs to be put in the overhead field in the GUI. (Note where the green line intersects the y-axis at 0 bytes packet size? this is where the IP header starts, the "missing" part of this ATM cell is the overhead).
You are curve fitting. This is calculation.
>
>
>
>
>
> Believe it or not, this method works reasonably well (I tested successfully with one Bridged, LLC/SNAP RFC-1483/2684 connection (overhead 32 bytes), and several PPPOE, LLC, (overhead 40) connections (from ADSL1 @ 3008/512 to ADSL2+ @ 16402/2558)). But it takes a relatively long time to measure the ping train especially at the higher rates… and it requires ping time stamps with decent resolution (which rules out windows) and my naive data acquisition script creates really large raw data files. I guess I should post the code somewhere so others can test and improve it.
> Fred I would be delighted to get a data set from your connection, to test a known different encapsulation.
I shall try this. If successful, I shall initially pass you the raw
data. I have not used MatLab since the 1980s.
>
>> TYPICAL OVERHEADS
>> The following values are typical for different adsl scenarios (based on
>> [1] and [2]):
>>
>> LLC based:
>> PPPoA - 14 (PPP - 2, ATM - 12)
>> PPPoE - 40+ (PPPoE - 8, ATM - 18, ethernet 14, possibly FCS - 4+padding)
>> Bridged - 32 (ATM - 18, ethernet 14, possibly FCS - 4+padding)
>> IPoA - 16 (ATM - 16)
>>
>> VC Mux based:
>> PPPoA - 10 (PPP - 2, ATM - 8)
>> PPPoE - 32+ (PPPoE - 8, ATM - 10, ethernet 14, possibly FCS - 4+padding)
>> Bridged - 24+ (ATM - 10, ethernet 14, possibly FCS - 4+padding)
>> IPoA - 8 (ATM - 8)
>>
>>
>> For VC Mux based PPPoA, I am currently using an overhead of 18 for the PPPoE setting in ceroWRT.
> Yeah we could put this list into the wiki, but how shall a typical user figure out which encapsulation is used? And good luck in figuring out whether the frame check sequence (FCS) is included or not…
> BTW 18, I predict that if PPPoE is only used between cerowrt and the "modem' or gateway your effective overhead should be 10 bytes; I would love if you could run the following against your link at night (also attached
>
>
> ):
>
> #! /bin/bash
> # TODO use seq or bash to generate a list of the requested sizes (to allow for non-equidistantly spaced sizes)
>
> #.
> TECH=ADSL2 # just to give some meaning to the ping trace file name
> # finding a proper target IP is somewhat of an art, just traceroute a remote site.
> # and find the nearest host reliably responding to pings showing the smallest variation of pingtimes
> TARGET=${1} # the IP against which to run the ICMP pings
> DATESTR=`date +%Y%m%d_%H%M%S`    # to allow multiple sequential records
> LOG=ping_sweep_${TECH}_${DATESTR}.txt
>
>
> # by default non-root ping will only send one packet per second, so work around that by calling ping independently for each packet
> # empirically figure out the shortest period still giving the standard ping time (to avoid being slow-pathed by our target)
> PINGPERIOD=0.01    # in seconds
> PINGSPERSIZE=10000
>
> # Start, needed to find the per packet overhead dependent on the ATM encapsulation
> # to reliably show ATM quantization one would like to see at least two steps, so cover a range > 2 ATM cells (so > 96 bytes)
> SWEEPMINSIZE=16    # 64bit systems seem to require 16 bytes of payload to include a timestamp...
> SWEEPMAXSIZE=116
>
> n_SWEEPS=`expr ${SWEEPMAXSIZE} - ${SWEEPMINSIZE}`
>
> i_sweep=0
> i_size=0
>
> echo "Running ICMP RTT measurement against: ${TARGET}"
> while [ ${i_sweep} -lt ${PINGSPERSIZE} ]
> do
> (( i_sweep++ ))
> echo "Current iteration: ${i_sweep}"
> # now loop from sweepmin to sweepmax
> i_size=${SWEEPMINSIZE}
> while [ ${i_size} -le ${SWEEPMAXSIZE} ]
> do
> echo "${i_sweep}. repetition of ping size ${i_size}"
> ping -c 1 -s ${i_size} ${TARGET} >> ${LOG} &\
> (( i_size++ ))
> # we need a sleep binary that allows non integer times (GNU sleep is fine as is sleep of macosx 10.8.4)
> sleep ${PINGPERIOD}
> done
> done
> echo "Done... ($0)"
>
>
> This will try to run 10000 repetitions for ICMP packet sizes from 16 to 116 bytes running (10000 * 101 * 0.01 / 60 =) 168 minutes, but you should be able to stop it with ctrl c if you are not patient enough, with your link I would estimate that 3000 should be plenty, but if you could run it overnight that would be great and then ~3 hours should not matter much.
> And then run the following attached code in octave or matlab
>
>
> . Invoke with "tc_stab_parameter_guide_03('path/to/the/data/file/you/created/name_of_said_file')". The parser will run on the first invocation and is really, really slow, but further invocations should be faster. If issues arise, let me know, I am happy to help.
>
>> Were I to use a single directly connected gateway, I would input a suitable value for PPPoA in that openWRT firmware.
> I think you should do that right now.
The firmware has not yet been released.
>
>> In theory, I might need to use a negative value, but the current kernel does not support that.
> If you use tc_stab, negative overheads are fully supported, only htb_private has overhead defined as unsigned integer and hence does not allow negative values.
Jesper Brouer posted about this. I thought he was referring to tc_stab.
>
>> I have used many different arbitrary values for overhead. All appear to have little effect.
> So the issue here is that only at small packet sizes do the overhead and last cell padding eat a disproportionate amount of your bandwidth (64 byte packet plus 44 byte overhead plus 47 byte worst case cell padding: 100* (44+47+64)/64 = 242% effective packet size to what the shaper estimated), at typical packet sizes the max error (44 bytes missing overhead and potentially misjudged cell padding of 47 bytes adds up to a theoretical 100*(44+47+1500)/1500 = 106% effective packet size to what the shaper estimated). It is obvious that at 1500 byte packets the whole ATM issue can be easily dismissed with just reducing the link rate by ~10% for the 48 in 53 framing and an additional ~6% for overhead and cell padding. But once you mix smaller packets in your traffic for say VoIP, the effective wire size misjudgment will kill your ability to control the queueing. Note that the common wisdom of shaping down to 85% might stem from the ~15% ATM "tax" on 1500 byte traffic size...
>
>> As I understand it, the current recommendation is to use tc_stab in preference to htb_private. I do not know the basis for this value judgement.
> In short: tc_stab allows negative overheads, tc_stab works with HTB, TBF, HFSC while htb_private only works with HTB. Currently htb_private has two advantages: it will estimate the per packet overhead correctly if GSO (generic segmentation offload) is enabled and it will produce exact ATM link layer estimates for all possible packet sizes. In practice almost everyone uses an MTU of 1500 or less for their internet access making both htb_private advantages effectively moot. (Plus if no one beats me to it I intend to address both theoretical shortcomings of tc_stab next year).
>
> Best Regards
> Sebastian
>
>>
>>
>>
>>
>> On 28/12/13 10:01, Sebastian Moeller wrote:
>>> Hi Rich,
>>>
>>> great! A few comments:
>>>
>>> Basic Settings:
>>> [Is 95% the right fudge factor?] I think that ideally, if we get can precisely measure the useable link rate even 99% of that should work out well, to keep the queue in our device. I assume that due to the difficulties in measuring and accounting for the link properties as link layer and overhead people typically rely on setting the shaped rate a bit lower than required to stochastically/empirically account for the link properties. I predict that if we get a correct description of the link properties to the shaper we should be fine with 95% shaping. Note though, it is not trivial on an adel link to get the actually useable bit rate from the modem so 95% of what can be deduced from the modem or the ISP's invoice might be a decent proxy…
>>>
>>> [Do we have a recommendation for an easy way to tell if it's working? Perhaps a link to a new Quick Test for Bufferbloat page. ] The linked page looks like a decent probe for buffer bloat.
>>>
>>>
>>>> Basic Settings - the details...
>>>>
>>>> CeroWrt is designed to manage the queues of packets waiting to be sent across the slowest (bottleneck) link, which is usually your connection to the Internet.
>>>>
>>> I think we can only actually control the first link to the ISP, which often happens to be the bottleneck. At a typical DSLAM (xDSL head end station) the cumulative sold bandwidth to the customers is larger than the back bone connection (which is called over-subscription and is almost guaranteed to be the case in every DSLAM) which typically is not a problem, as typically people do not use their internet that much. My point being we can not really control congestion in the DSLAM's uplink (as we have no idea what the reserved rate per customer is in the worst case, if there is any).
>>>
>>>
>>>> CeroWrt can automatically adapt to network conditions to improve the delay/latency of data without any settings.
>>>>
>>> Does this describe the default fq_codels on each interface (except fib?)?
>>>
>>>
>>>> However, it can do a better job if it knows more about the actual link speeds available. You can adjust this setting by entering link speeds that are a few percent below the actual speeds.
>>>>
>>>> Note: it can be difficult to get an accurate measurement of the link speeds. The speed advertised by your provider is a starting point, but your experience often won't meet their published specs. You can also use a speed test program or web site like
>>>> http://speedtest.net
>>>> to estimate actual operating speeds.
>>>>
>>> While this approach is commonly recommended on the internet, I do not believe that it is that useful. Between a user and the speediest site there are a number of potential congestion points that can affect (reduce) the throughput, like bad peering. Now that said, the speed tests will report something <= the actual link speed and hence be conservative (interactivity stays great at 90% of link rate as well as 80% so underestimating the bandwidth within reason does not affect the latency gains from traffic shaping it just sacrifices a bit more bandwidth; and given the difficulty to actually measure the actually attainable bandwidth might have been effectively a decent recommendation even though the theory of it seems flawed)
>>>
>>>
>>>> Be sure to make your measurement when network is quiet, and others in your home aren’t generating traffic.
>>>>
>>> This is great advice.
>>>
>>> I would love to comment further, but after reloading
>>> http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310
>>> just returns a blank page and I can not get back to the page as of yesterday evening… I will have a look later to see whether the page resurfaces…
>>>
>>> Best
>>> Sebastian
>>>
>>>
>>> On Dec 27, 2013, at 23:09 , Rich Brown
>>> <richb.hanover@gmail.com>
>>> wrote:
>>>
>>>
>>>>> You are a very good writer and I am on a tablet.
>>>>>
>>>>>
>>>> Thanks!
>>>>
>>>>> Ill take a pass at the wiki tomorrow.
>>>>>
>>>>> The shaper does up and down was my first thought...
>>>>>
>>>>>
>>>> Everyone else… Don’t let Dave hog all the fun! Read the tech note and give feedback!
>>>>
>>>> Rich
>>>>
>>>>
>>>>> On Dec 27, 2013 10:48 AM, "Rich Brown" <richb.hanover@gmail.com>
>>>>> wrote:
>>>>> I updated the page to reflect the 3.10.24-8 build, and its new GUI pages.
>>>>>
>>>>>
>>>>> http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310
>>>>>
>>>>>
>>>>> There are still lots of open questions. Comments, please.
>>>>>
>>>>> Rich
>>>>> _______________________________________________
>>>>> Cerowrt-devel mailing list
>>>>>
>>>>> Cerowrt-devel@lists.bufferbloat.net
>>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>>> _______________________________________________
>>>> Cerowrt-devel mailing list
>>>>
>>>> Cerowrt-devel@lists.bufferbloat.net
>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>> _______________________________________________
>>> Cerowrt-devel mailing list
>>>
>>> Cerowrt-devel@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
[-- Attachment #2: Type: text/html, Size: 23938 bytes --]
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Cerowrt-devel] Update to "Setting up SQM for CeroWrt 3.10" web page. Comments needed.
2013-12-28 14:27 ` Fred Stratton
@ 2013-12-28 19:54 ` Sebastian Moeller
2013-12-28 20:09 ` Fred Stratton
0 siblings, 1 reply; 14+ messages in thread
From: Sebastian Moeller @ 2013-12-28 19:54 UTC (permalink / raw)
To: Fred Stratton; +Cc: cerowrt-devel
Hi Fred,
On Dec 28, 2013, at 15:27 , Fred Stratton <fredstratton@imap.cc> wrote:
>
> On 28/12/13 13:42, Sebastian Moeller wrote:
>> Hi Fred,
>>
>>
>> On Dec 28, 2013, at 12:09 , Fred Stratton
>> <fredstratton@imap.cc>
>> wrote:
>>
>>
>>> The UK consensus fudge factor has always been 85 per cent of the rate achieved, not 95 or 99 per cent.
>>>
>> I know that the recommendations have been lower in the past; I think this is partly because before Jesper Brouer's and Russell Stuart's work to properly account for ATM "quantization" people typically had to deal with a ~10% rate tax for the 5byte per cell overhead (48 byte payload in 53 byte cells 90.57% useable rate) plus an additional 5% to stochastically account for the padding of the last cell and the per packet overhead both of which affect the effective goodput way more for small than large packets, so the 85% never worked well for all packet sizes. My hypothesis now is since we can and do properly account for these effects of ATM framing we can afford to start with a fudge factor of 90% or even 95%. As far as I know the recommended fudge factors are never ever explained by more than "this works empirically"...
>
> The fudge factors are totally empirical. IF you are proposing a more formal approach, I shall try a 90 per cent fudge factor, although 'current rate' varies here.
My hypothesis is that we can get away with less fudge as we have a better handle on the actual wire size. Personally, I do start at 95% to figure out the trade-off between bandwidth loss and latency increase.
>>
>>> Devices express 2 values: the sync rate - or 'maximum rate attainable' - and the dynamic value of 'current rate'.
>>>
>> The actual data rate is the relevant information for shaping, often DSL modems report the link capacity as "maximum rate attainable" or some such, while the actual bandwidth is limited to a rate below what the line would support by contract (often this bandwidth reduction is performed on the PPPoE link to the BRAS).
>>
>>
>>> As the sync rate is fairly stable for any given installation - ADSL or Fibre - this could be used as a starting value. decremented by the traditional 15 per cent of 'overhead'. and the 85 per cent fudge factor applied to that.
>>>
>> I would like to propose to use the "current rate" as starting point, as 'maximum rate attainable' >= 'current rate'.
>
> 'current rate' is still a sync rate, and so is conventionally viewed as 15 per cent above the unmeasurable actual rate.
No no, the current rate really is the current link capacity between modem and DSLAM (or CPE and CTS), only this rate typically is for the raw ATM stream, so we have to subtract all the additional layers until we reach the IP layer...
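As a rough sketch of that subtraction in shell arithmetic (3008 kbit/s is just an example sync rate, and the 95% is the starting fudge factor I mentioned above):

SYNC_KBIT=3008                         # raw ATM rate reported by the modem
IP_KBIT=$(( SYNC_KBIT * 48 / 53 ))     # strip the 5 byte header of every 53 byte cell -> 2724
SHAPED_KBIT=$(( IP_KBIT * 95 / 100 ))  # apply the 95% starting point -> 2587
echo "shape to ~${SHAPED_KBIT} kbit/s; per packet overhead and cell padding are left to the link layer settings"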
> As you are proposing a new approach, I shall take 90 per cent of 'current rate' as a starting point.
I would love to learn how that works out for you. Because for all my theories about why 85% was used, the proof still is in the (plum-) pudding...
>
> No one in the UK uses SRA currently. One small ISP used to.
That is sad, because on paper SRA looks like a good feature to have (lower bandwidth sure beats synchronization loss).
> The ISP I currently use has Dynamic Line Management, which changes target SNR constantly.
Now that is much better, as we should neither notice nor care; I assume that this happens on layers below ATM even.
> The DSLAM is made by Infineon.
>
>
>>
>>> Fibre - FTTC - connections can suffer quite large download speed fluctuations over the 200 - 500 metre link to the MSAN. This phenomenon is not confined to ADSL links.
>>>
>> On the actual xDSL link? As far as I know no telco actually uses SRA (seamless rate adaptation or so) so the current link speed will only get lower not higher, so I would expect a relative stable current rate (it might take a while, a few days to actually slowly degrade to the highest link speed supported under all conditions, but I hope you still get my point)
> I understand the point, but do not think it is the case, from data I have seen, but cannot find now, unfortunately.
I see, maybe my assumption here is wrong, I would love to see data though before changing my hypothesis.
>>
>>>
>>> An alternative speed test is something like this
>>>
>>>
>>> http://download.bethere.co.uk/downloadMeter.html
>>>
>>>
>>> which, as Be has been bought by Sky, may not exist after the end of April 2014.
>>>
>> But, if we recommend to run speed tests we really need to advise our users to start several concurrent up- and downloads to independent servers to actually measure the bandwidth of our bottleneck link; often a single server connection will not saturate a link (I seem to recall that with TCP it is guaranteed to only reach 75% or so averaged over time, is that correct?).
>> But I think this is not the proper way to set the bandwidth for the shaper, because upstream of our link to the ISP we have no guaranteed bandwidth at all and just can hope the ISP is doing the right thing AQM-wise.
>>
>
> I quote the Be site as an alternative to a java based approach. I would be very happy to see your suggestion adopted.
>>
>>
>>
>>> • [What is the proper description here?] If you use PPPoE (but not over ADSL/DSL link), PPPoATM, or bridging that isn’t Ethernet, you should choose [what?] and set the Per-packet Overhead to [what?]
>>>
>>> For a PPPoA service, the PPPoA link is treated as PPPoE on the second device, here running ceroWRT.
>>>
>> This still means you should specify the PPPoA overhead, not PPPoE.
>
> I shall try the PPPoA overhead.
Great, let me know how that works.
>>
>>> The packet overhead values are written in the dubious man page for tc_stab.
>>>
>> The only real flaw in that man page, as far as I know, is the fact that it indicates that the kernel will account for the 18byte ethernet header automatically, while the kernel does no such thing (which I hope to change).
>> It mentions link layer types as 'atm', 'ethernet' and 'adsl'. There is no reference anywhere to the last. I do not see its relevance.
If you have a look inside the source code for tc and the kernel, you will notice that atm and adsl are aliases for the same thing. I just think that we should keep naming the thing ATM since that is the problematic layer in the stack that causes most of the useable link rate misjudgements, adsl just happens to use ATM exclusively.
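In other words these two stab specifications should request exactly the same link layer accounting (placeholder interface again, and of course only one of them would actually be installed):

tc qdisc add dev ge00 root handle 1: stab linklayer atm overhead 40 htb default 10
tc qdisc add dev ge00 root handle 1: stab linklayer adsl overhead 40 htb default 10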
>>
>>> Sebastian has a potential alternative method of formal calculation.
>>>
>> So, I have no formal calculation method available, but an empirical way of detecting ATM quantization as well as measuring the per packet overhead of an ATM link.
>> The idea is to measure the RTT of ICMP packets of increasing length and then display the distribution of RTTs by ICMP packet length; on an ATM carrier we expect to see a step function with steps 48 bytes apart, while on a non-ATM carrier we expect to rather see a smooth ramp. We then compare the residuals of a linear fit of the data with the residuals of the best step function fit to the data; the fit with the lower residuals "wins". Attached you will find an example of this approach, ping data in red (median of NNN repetitions for each ICMP packet size), linear fit in blue, and best staircase fit in green. You notice that data starts somewhere in a 48 byte ATM cell. Since the ATM encapsulation overhead is maximally 44 bytes and we know the IP and ICMP overhead of the ping probe we can calculate the overhead preceding the IP header, which is what needs to be put in the overhead field in the GUI. (Note where the green line intersects the y-axis at 0 bytes packet size? this is where the IP header starts, the "missing" part of this ATM cell is the overhead).
>>
>
> You are curve fitting. This is calculation.
I see, that is certainly a valid way to look at it, just one that had not occurred to me.
>>
>>
>>
>>
>>
>>
>> Believe it or not, this method works reasonably well (I tested successfully with one Bridged, LLC/SNAP RFC-1483/2684 connection (overhead 32 bytes), and several PPPOE, LLC, (overhead 40) connections (from ADSL1 @ 3008/512 to ADSL2+ @ 16402/2558)). But it takes a relatively long time to measure the ping train especially at the higher rates… and it requires ping time stamps with decent resolution (which rules out windows) and my naive data acquisition script creates really large raw data files. I guess I should post the code somewhere so others can test and improve it.
>> Fred I would be delighted to get a data set from your connection, to test a known different encapsulation.
>>
>
> I shall try this. If successful, I shall initially pass you the raw data.
Great, but be warned this will be hundreds of megabytes. (For production use the measurement script would need to prune the generated log file down to the essential values… and potentially store the data in binary)
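Just to sketch the kind of pruning I mean (the attached Octave parser still expects the raw log, so this is only an illustration, not a step to run before the analysis):

# keep only the payload size and the RTT of each reply, e.g.
# "24 bytes from ...: icmp_seq=1 ttl=55 time=13.4 ms" -> "24 13.4"
awk '/bytes from/ { gsub("time=", "", $(NF-1)); print $1, $(NF-1) }' "${LOG}" > "${LOG%.txt}_pruned.txt"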
> I have not used MatLab since the 1980s.
Lucky you, I sort of have to use matlab in my day job and hence am most "fluent" in matlabese, but the code should also work with octave (I tested version 3.6.4) so it should be relatively easy to run the analysis yourself. That said, I would love to get a copy of the ping sweep :)
>>
>>> TYPICAL OVERHEADS
>>> The following values are typical for different adsl scenarios (based on
>>> [1] and [2]):
>>>
>>> LLC based:
>>> PPPoA - 14 (PPP - 2, ATM - 12)
>>> PPPoE - 40+ (PPPoE - 8, ATM - 18, ethernet 14, possibly FCS - 4+padding)
>>> Bridged - 32 (ATM - 18, ethernet 14, possibly FCS - 4+padding)
>>> IPoA - 16 (ATM - 16)
>>>
>>> VC Mux based:
>>> PPPoA - 10 (PPP - 2, ATM - 8)
>>> PPPoE - 32+ (PPPoE - 8, ATM - 10, ethernet 14, possibly FCS - 4+padding)
>>> Bridged - 24+ (ATM - 10, ethernet 14, possibly FCS - 4+padding)
>>> IPoA - 8 (ATM - 8)
>>>
>>>
>>> For VC Mux based PPPoA, I am currently using an overhead of 18 for the PPPoE setting in ceroWRT.
>>>
>> Yeah we could put this list into the wiki, but how shall a typical user figure out which encapsulation is used? And good luck in figuring out whether the frame check sequence (FCS) is included or not…
>> BTW 18, I predict that if PPPoE is only used between cerowrt and the "modem' or gateway your effective overhead should be 10 bytes; I would love if you could run the following against your link at night (also attached
>>
>>
>>
>> ):
>>
>> #! /bin/bash
>> # TODO use seq or bash to generate a list of the requested sizes (to allow for non-equidistantly spaced sizes)
>>
>> #.
>> TECH=ADSL2 # just to give some meaning to the ping trace file name
>> # finding a proper target IP is somewhat of an art, just traceroute a remote site.
>> # and find the nearest host reliably responding to pings showing the smallest variation of pingtimes
>> TARGET=${1} # the IP against which to run the ICMP pings
>> DATESTR=`date +%Y%m%d_%H%M%S`    # to allow multiple sequential records
>> LOG=ping_sweep_${TECH}_${DATESTR}.txt
>>
>>
>> # by default non-root ping will only send one packet per second, so work around that by calling ping independently for each packet
>> # empirically figure out the shortest period still giving the standard ping time (to avoid being slow-pathed by our target)
>> PINGPERIOD=0.01    # in seconds
>> PINGSPERSIZE=10000
>>
>> # Start, needed to find the per packet overhead dependent on the ATM encapsulation
>> # to reliably show ATM quantization one would like to see at least two steps, so cover a range > 2 ATM cells (so > 96 bytes)
>> SWEEPMINSIZE=16    # 64bit systems seem to require 16 bytes of payload to include a timestamp...
>> SWEEPMAXSIZE=116
>>
>> n_SWEEPS=`expr ${SWEEPMAXSIZE} - ${SWEEPMINSIZE}`
>>
>> i_sweep=0
>> i_size=0
>>
>> echo "Running ICMP RTT measurement against: ${TARGET}"
>> while [ ${i_sweep} -lt ${PINGSPERSIZE} ]
>> do
>> (( i_sweep++ ))
>> echo "Current iteration: ${i_sweep}"
>> # now loop from sweepmin to sweepmax
>> i_size=${SWEEPMINSIZE}
>> while [ ${i_size} -le ${SWEEPMAXSIZE} ]
>> do
>> echo "${i_sweep}. repetition of ping size ${i_size}"
>> ping -c 1 -s ${i_size} ${TARGET} >> ${LOG} &\
>> (( i_size++ ))
>> # we need a sleep binary that allows non integer times (GNU sleep is fine as is sleep of macosx 10.8.4)
>> sleep ${PINGPERIOD}
>> done
>> done
>> echo "Done... ($0)"
>>
>>
>> This will try to run 10000 repetitions for ICMP packet sizes from 16 to 116 bytes running (10000 * 101 * 0.01 / 60 =) 168 minutes, but you should be able to stop it with ctrl c if you are not patient enough, with your link I would estimate that 3000 should be plenty, but if you could run it overnight that would be great and then ~3 hours should not matter much.
>> And then run the following attached code in octave or matlab
>>
>>
>>
>> . Invoke with "tc_stab_parameter_guide_03('path/to/the/data/file/you/created/name_of_said_file')". The parser will run on the first invocation and is really, really slow, but further invocations should be faster. If issues arise, let me know, I am happy to help.
>>
>>
>>> Were I to use a single directly connected gateway, I would input a suitable value for PPPoA in that openWRT firmware.
>>>
>> I think you should do that right now.
>
> The firmware has not yet been released.
>>
>>> In theory, I might need to use a negative value, but the current kernel does not support that.
>>>
>> If you use tc_stab, negative overheads are fully supported, only htb_private has overhead defined as unsigned integer and hence does not allow negative values.
>
> Jesper Brouer posted about this. I thought he was referring to tc_stab.
I recall having a discussion with Jesper about this topic, where he agreed that tc_stab was not affected, only htb_private.
Best Regards
Sebastian
>>
>>> I have used many different arbitrary values for overhead. All appear to have little effect.
>>>
>> So the issue here is that only at small packet sizes do the overhead and last cell padding eat a disproportionate amount of your bandwidth (64 byte packet plus 44 byte overhead plus 47 byte worst case cell padding: 100* (44+47+64)/64 = 242% effective packet size to what the shaper estimated), at typical packet sizes the max error (44 bytes missing overhead and potentially misjudged cell padding of 47 bytes adds up to a theoretical 100*(44+47+1500)/1500 = 106% effective packet size to what the shaper estimated). It is obvious that at 1500 byte packets the whole ATM issue can be easily dismissed with just reducing the link rate by ~10% for the 48 in 53 framing and an additional ~6% for overhead and cell padding. But once you mix smaller packets in your traffic for say VoIP, the effective wire size misjudgment will kill your ability to control the queueing. Note that the common wisdom of shaping down to 85% might stem from the ~15% ATM "tax" on 1500 byte traffic size...
>>
>>
>>> As I understand it, the current recommendation is to use tc_stab in preference to htb_private. I do not know the basis for this value judgement.
>>>
>> In short: tc_stab allows negative overheads, tc_stab works with HTB, TBF, HFSC while htb_private only works with HTB. Currently htb_private has two advantages: it will estimate the per packet overhead correctly if GSO (generic segmentation offload) is enabled and it will produce exact ATM link layer estimates for all possible packet sizes. In practice almost everyone uses an MTU of 1500 or less for their internet access making both htb_private advantages effectively moot. (Plus if no one beats me to it I intend to address both theoretical shortcomings of tc_stab next year).
>>
>> Best Regards
>> Sebastian
>>
>>
>>>
>>>
>>>
>>>
>>> On 28/12/13 10:01, Sebastian Moeller wrote:
>>>
>>>> Hi Rich,
>>>>
>>>> great! A few comments:
>>>>
>>>> Basic Settings:
>>>> [Is 95% the right fudge factor?] I think that ideally, if we get can precisely measure the useable link rate even 99% of that should work out well, to keep the queue in our device. I assume that due to the difficulties in measuring and accounting for the link properties as link layer and overhead people typically rely on setting the shaped rate a bit lower than required to stochastically/empirically account for the link properties. I predict that if we get a correct description of the link properties to the shaper we should be fine with 95% shaping. Note though, it is not trivial on an adel link to get the actually useable bit rate from the modem so 95% of what can be deduced from the modem or the ISP's invoice might be a decent proxy…
>>>>
>>>> [Do we have a recommendation for an easy way to tell if it's working? Perhaps a link to a new Quick Test for Bufferbloat page. ] The linked page looks like a decent probe for buffer bloat.
>>>>
>>>>
>>>>
>>>>> Basic Settings - the details...
>>>>>
>>>>> CeroWrt is designed to manage the queues of packets waiting to be sent across the slowest (bottleneck) link, which is usually your connection to the Internet.
>>>>>
>>>>>
>>>> I think we can only actually control the first link to the ISP, which often happens to be the bottleneck. At a typical DSLAM (xDSL head end station) the cumulative sold bandwidth to the customers is larger than the back bone connection (which is called over-subscription and is almost guaranteed to be the case in every DSLAM) which typically is not a problem, as typically people do not use their internet that much. My point being we can not really control congestion in the DSLAM's uplink (as we have no idea what the reserved rate per customer is in the worst case, if there is any).
>>>>
>>>>
>>>>
>>>>> CeroWrt can automatically adapt to network conditions to improve the delay/latency of data without any settings.
>>>>>
>>>>>
>>>> Does this describe the default fq_codels on each interface (except fib?)?
>>>>
>>>>
>>>>
>>>>> However, it can do a better job if it knows more about the actual link speeds available. You can adjust this setting by entering link speeds that are a few percent below the actual speeds.
>>>>>
>>>>> Note: it can be difficult to get an accurate measurement of the link speeds. The speed advertised by your provider is a starting point, but your experience often won't meet their published specs. You can also use a speed test program or web site like
>>>>>
>>>>> http://speedtest.net
>>>>>
>>>>> to estimate actual operating speeds.
>>>>>
>>>>>
>>>> While this approach is commonly recommended on the internet, I do not believe that it is that useful. Between a user and the speediest site there are a number of potential congestion points that can affect (reduce) the throughput, like bad peering. Now that said, the speed tests will report something <= the actual link speed and hence be conservative (interactivity stays great at 90% of link rate as well as 80% so underestimating the bandwidth within reason does not affect the latency gains from traffic shaping it just sacrifices a bit more bandwidth; and given the difficulty to actually measure the actually attainable bandwidth might have been effectively a decent recommendation even though the theory of it seems flawed)
>>>>
>>>>
>>>>
>>>>> Be sure to make your measurement when network is quiet, and others in your home aren’t generating traffic.
>>>>>
>>>>>
>>>> This is great advice.
>>>>
>>>> I would love to comment further, but after reloading
>>>>
>>>> http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310
>>>>
>>>> just returns a blank page and I can not get back to the page as of yesterday evening… I will have a look later to see whether the page resurfaces…
>>>>
>>>> Best
>>>> Sebastian
>>>>
>>>>
>>>> On Dec 27, 2013, at 23:09 , Rich Brown
>>>>
>>>> <richb.hanover@gmail.com>
>>>>
>>>> wrote:
>>>>
>>>>
>>>>
>>>>>> You are a very good writer and I am on a tablet.
>>>>>>
>>>>>>
>>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>>> Ill take a pass at the wiki tomorrow.
>>>>>>
>>>>>> The shaper does up and down was my first thought...
>>>>>>
>>>>>>
>>>>>>
>>>>> Everyone else… Don’t let Dave hog all the fun! Read the tech note and give feedback!
>>>>>
>>>>> Rich
>>>>>
>>>>>
>>>>>
>>>>>> On Dec 27, 2013 10:48 AM, "Rich Brown" <richb.hanover@gmail.com>
>>>>>>
>>>>>> wrote:
>>>>>> I updated the page to reflect the 3.10.24-8 build, and its new GUI pages.
>>>>>>
>>>>>>
>>>>>>
>>>>>> http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310
>>>>>>
>>>>>>
>>>>>>
>>>>>> There are still lots of open questions. Comments, please.
>>>>>>
>>>>>> Rich
>>>>>> _______________________________________________
>>>>>> Cerowrt-devel mailing list
>>>>>>
>>>>>>
>>>>>> Cerowrt-devel@lists.bufferbloat.net
>>>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>>>> _______________________________________________
>>>>> Cerowrt-devel mailing list
>>>>>
>>>>>
>>>>> Cerowrt-devel@lists.bufferbloat.net
>>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>>> _______________________________________________
>>>> Cerowrt-devel mailing list
>>>>
>>>>
>>>> Cerowrt-devel@lists.bufferbloat.net
>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Cerowrt-devel] Update to "Setting up SQM for CeroWrt 3.10" web page. Comments needed.
2013-12-28 19:54 ` Sebastian Moeller
@ 2013-12-28 20:09 ` Fred Stratton
2013-12-28 20:29 ` Sebastian Moeller
0 siblings, 1 reply; 14+ messages in thread
From: Fred Stratton @ 2013-12-28 20:09 UTC (permalink / raw)
To: Sebastian Moeller, cerowrt-devel
On 28/12/13 19:54, Sebastian Moeller wrote:
> Hi Fred,
>
>
> On Dec 28, 2013, at 15:27 , Fred Stratton <fredstratton@imap.cc> wrote:
>
>> On 28/12/13 13:42, Sebastian Moeller wrote:
>>> Hi Fred,
>>>
>>>
>>> On Dec 28, 2013, at 12:09 , Fred Stratton
>>> <fredstratton@imap.cc>
>>> wrote:
>>>
>>>
>>>> The UK consensus fudge factor has always been 85 per cent of the rate achieved, not 95 or 99 per cent.
>>>>
>>> I know that the recommendations have been lower in the past; I think this is partly because before Jesper Brouer's and Russell Stuart's work to properly account for ATM "quantization" people typically had to deal with a ~10% rate tax for the 5byte per cell overhead (48 byte payload in 53 byte cells 90.57% useable rate) plus an additional 5% to stochastically account for the padding of the last cell and the per packet overhead both of which affect the effective goodput way more for small than large packets, so the 85% never worked well for all packet sizes. My hypothesis now is since we can and do properly account for these effects of ATM framing we can afford to start with a fudge factor of 90% or even 95%. As far as I know the recommended fudge factors are never ever explained by more than "this works empirically"...
>> The fudge factors are totally empirical. IF you are proposing a more formal approach, I shall try a 90 per cent fudge factor, although 'current rate' varies here.
> My hypothesis is that we can get away with less fudge as we have a better handle on the actual wire size. Personally, I do start at 95% to figure out the trade-off between bandwidth loss and latency increase.
You are now saying something slightly different. You are implying now
that you are starting at 95 per cent, and then reducing the nominal
download speed until you achieve an unspecified endpoint.
>
>>>> Devices express 2 values: the sync rate - or 'maximum rate attainable' - and the dynamic value of 'current rate'.
>>>>
>>> The actual data rate is the relevant information for shaping, often DSL modems report the link capacity as "maximum rate attainable" or some such, while the actual bandwidth is limited to a rate below what the line would support by contract (often this bandwidth reduction is performed on the PPPoE link to the BRAS).
>>>
>>>
>>>> As the sync rate is fairly stable for any given installation - ADSL or Fibre - this could be used as a starting value. decremented by the traditional 15 per cent of 'overhead'. and the 85 per cent fudge factor applied to that.
>>>>
>>> I would like to propose to use the "current rate" as starting point, as 'maximum rate attainable' >= 'current rate'.
>> 'current rate' is still a sync rate, and so is conventionally viewed as 15 per cent above the unmeasurable actual rate.
> No no, the current rate really is the current link capacity between modem and DSLAM (or CPE and CTS), only this rate typically is for the raw ATM stream, so we have to subtract all the additional layers until we reach the IP layer...
You are saying the same thing as I am.
>
>> As you are proposing a new approach, I shall take 90 per cent of 'current rate' as a starting point.
> I would love to learn how that works out for you. Because for all my theories about why 85% was used, the proof still is in the (plum-) pudding...
>
>> No one in the UK uses SRA currently. One small ISP used to.
> That is sad, because on paper SRA looks like a good feature to have (lower bandwidth sure beats synchronization loss).
>
>> The ISP I currently use has Dynamic Line Management, which changes target SNR constantly.
> Now that is much better, as we should neither notice nor care; I assume that this happens on layers below ATM even.
>
>> The DSLAM is made by Infineon.
>>
>>
>>>> Fibre - FTTC - connections can suffer quite large download speed fluctuations over the 200 - 500 metre link to the MSAN. This phenomenon is not confined to ADSL links.
>>>>
>>> On the actual xDSL link? As far as I know no telco actually uses SRA (seamless rate adaptation or so), so the current link speed will only get lower, not higher, so I would expect a relatively stable current rate (it might take a while, a few days, to actually slowly degrade to the highest link speed supported under all conditions, but I hope you still get my point)
>> I understand the point, but do not think it is the case, from data I have seen, but cannot find now, unfortunately.
> I see, maybe my assumption here is wrong, I would love to see data though before changing my hypothesis.
>
>>>> An alternative speed test is something like this
>>>>
>>>>
>>>> http://download.bethere.co.uk/downloadMeter.html
>>>>
>>>>
>>>> which, as Be has been bought by Sky, may not exist after the end of April 2014.
>>>>
>>> But, if we recommend running speed tests we really need to advise our users to start several concurrent up- and downloads to independent servers to actually measure the bandwidth of our bottleneck link; often a single server connection will not saturate a link (I seem to recall that with TCP it is guaranteed to only reach 75% or so averaged over time, is that correct?).
>>> But I think this is not the proper way to set the bandwidth for the shaper, because upstream of our link to the ISP we have no guaranteed bandwidth at all and just can hope the ISP is doing the right thing AQM-wise.
>>>
>> I quote the Be site as an alternative to a java based approach. I would be very happy to see your suggestion adopted.
>>>
>>>
>>>> • [What is the proper description here?] If you use PPPoE (but not over ADSL/DSL link), PPPoATM, or bridging that isn’t Ethernet, you should choose [what?] and set the Per-packet Overhead to [what?]
>>>>
>>>> For a PPPoA service, the PPPoA link is treated as PPPoE on the second device, here running ceroWRT.
>>>>
>>> This still means you should specify the PPPoA overhead, not PPPoE.
>> I shall try the PPPoA overhead.
> Great, let me know how that works.
>
>>>> The packet overhead values are written in the dubious man page for tc_stab.
>>>>
>>> The only real flaw in that man page, as far as I know, is the fact that it indicates that the kernel will account for the 18byte ethernet header automatically, while the kernel does no such thing (which I hope to change).
>> It mentions link layer types as 'atm', 'ethernet' and 'adsl'. There is no reference anywhere to the last. I do not see its relevance.
> If you have a look inside the source code for tc and the kernel, you will notice that atm and adsl are aliases for the same thing. I just think that we should keep naming the thing ATM since that is the problematic layer in the stack that causes most of the useable link rate misjudgements; adsl just happens to use ATM exclusively.
I have reviewed the source. I see what you mean.
>
>>>> Sebastian has a potential alternative method of formal calculation.
>>>>
>>> So, I have no formal calculation method available, but an empirical way of detecting ATM quantization as well as measuring the per packet overhead of an ATM link.
>>> The idea is to measure the RTT of ICMP packets of increasing length and then display the distribution of RTTs by ICMP packet length; on an ATM carrier we expect to see a step function with steps 48 bytes apart, while for a non-ATM carrier we expect to see a smooth ramp instead. By comparing the residuals of a linear fit of the data with the residuals of the best step-function fit to the data, the fit with the lower residuals "wins". Attached you will find an example of this approach, ping data in red (median of NNN repetitions for each ICMP packet size), linear fit in blue, and best staircase fit in green. You notice that the data starts somewhere in a 48 byte ATM cell. Since the ATM encapsulation overhead is maximally 44 bytes and we know the IP and ICMP overhead of the ping probe, we can calculate the overhead preceding the IP header, which is what needs to be put in the overhead field in the GUI. (Note where the green line intersects the y-axis at 0 bytes packet size? This is where the IP header starts; the "missing" part of this ATM cell is the overhead).
>>>
>> You are curve fitting. This is calculation.
> I see, that is certainly a valid way to look at it, just one that had not occurred to me.
>
>>>
>>>
>>>
>>>
>>>
>>> Believe it or not, this method works reasonably well (I tested successfully with one Bridged, LLC/SNAP RFC-1483/2684 connection (overhead 32 bytes), and several PPPoE, LLC (overhead 40) connections (from ADSL1 @ 3008/512 to ADSL2+ @ 16402/2558)). But it takes a relatively long time to measure the ping train, especially at the higher rates… and it requires ping time stamps with decent resolution (which rules out windows), and my naive data acquisition script creates really large raw data files. I guess I should post the code somewhere so others can test and improve it.
>>> Fred I would be delighted to get a data set from your connection, to test a known different encapsulation.
>>>
>> I shall try this. If successful, I shall initially pass you the raw data.
> Great, but be warned this will be hundreds of megabytes. (For production use the measurement script would need to prune the generated log file down to the essential values… and potentially store the data in binary)
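> As a very rough sketch of what such pruning could look like (not part of the actual measurement script, and assuming the standard Linux ping output format "NN bytes from ...: icmp_seq=... ttl=... time=XX ms"), something like this already condenses the log to one "reply size / minimum RTT" pair per probe size, which is enough to eyeball the 48 byte steps:
>
> # condense the raw ping log; the real staircase fit is still done by the octave code
> awk '/bytes from/ {
> split($0, a, "time=");   # the RTT follows "time="
> rtt = a[2] + 0;          # "+ 0" strips the trailing " ms"
> size = $1;               # reply size as reported by ping
> if (!(size in min) || rtt < min[size]) min[size] = rtt;
> }
> END { for (s in min) print s, min[s] }' ping_sweep_ADSL2_*.txt | sort -n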
>
>> I have not used MatLab since the 1980s.
> Lucky you, I sort of have to use matlab in my day job and hence am most "fluent" in matlabese, but the code should also work with octave (I tested version 3.6.4) so it should be relatively easy to run the analysis yourself. That said, I would love to get a copy of the ping sweep :)
>
>>>> TYPICAL OVERHEADS
>>>> The following values are typical for different adsl scenarios (based on
>>>> [1] and [2]):
>>>>
>>>> LLC based:
>>>> PPPoA - 14 (PPP - 2, ATM - 12)
>>>> PPPoE - 40+ (PPPoE - 8, ATM - 18, ethernet 14, possibly FCS - 4+padding)
>>>> Bridged - 32 (ATM - 18, ethernet 14, possibly FCS - 4+padding)
>>>> IPoA - 16 (ATM - 16)
>>>>
>>>> VC Mux based:
>>>> PPPoA - 10 (PPP - 2, ATM - 8)
>>>> PPPoE - 32+ (PPPoE - 8, ATM - 10, ethernet 14, possibly FCS - 4+padding)
>>>> Bridged - 24+ (ATM - 10, ethernet 14, possibly FCS - 4+padding)
>>>> IPoA - 8 (ATM - 8)
>>>>
>>>>
>>>> For VC Mux based PPPoA, I am currently using an overhead of 18 for the PPPoE setting in ceroWRT.
>>>>
>>> Yeah we could put this list into the wiki, but how shall a typical user figure out which encapsulation is used? And good luck in figuring out whether the frame check sequence (FCS) is included or not…
>>> BTW, regarding the 18: I predict that if PPPoE is only used between cerowrt and the "modem" or gateway, your effective overhead should be 10 bytes; I would love it if you could run the following against your link at night (also attached
>>>
>>>
>>>
>>> ):
>>>
>>> #! /bin/bash
>>> # TODO use seq or bash to generate a list of the requested sizes (to allow for non-equidistantly spaced sizes)
>>>
>>> #.
>>> TECH=ADSL2 # just to give some meaning to the ping trace file name
>>> # finding a proper target IP is somewhat of an art: just traceroute a remote site
>>> # and find the nearest host reliably responding to pings showing the smallest variation of ping times
>>> TARGET=${1} # the IP against which to run the ICMP pings
>>> DATESTR=`date +%Y%m%d_%H%M%S` # to allow multiple sequential records
>>> LOG=ping_sweep_${TECH}_${DATESTR}.txt
>>>
>>>
>>> # by default non-root ping will only send one packet per second, so work around that by calling ping independently for each packet
>>> # empirically figure out the shortest period still giving the standard ping time (to avoid being slow-pathed by our target)
>>> PINGPERIOD=0.01 # in seconds
>>> PINGSPERSIZE=10000
>>>
>>> # Start size, needed to find the per packet overhead dependent on the ATM encapsulation
>>> # to reliably show ATM quantization one would like to see at least two steps, so cover a range > 2 ATM cells (so > 96 bytes)
>>> SWEEPMINSIZE=16 # 64bit systems seem to require 16 bytes of payload to include a timestamp...
>>> SWEEPMAXSIZE=116
>>>
>>> n_SWEEPS=`expr ${SWEEPMAXSIZE} - ${SWEEPMINSIZE}`
>>>
>>> i_sweep=0
>>> i_size=0
>>>
>>> echo "Running ICMP RTT measurement against: ${TARGET}"
>>> while [ ${i_sweep} -lt ${PINGSPERSIZE} ]
>>> do
>>> (( i_sweep++ ))
>>> echo "Current iteration: ${i_sweep}"
>>> # now loop from sweepmin to sweepmax
>>> i_size=${SWEEPMINSIZE}
>>> while [ ${i_size} -le ${SWEEPMAXSIZE} ]
>>> do
>>> echo "${i_sweep}. repetition of ping size ${i_size}"
>>> ping -c 1 -s ${i_size} ${TARGET} >> ${LOG} &
>>> (( i_size++ ))
>>> # we need a sleep binary that allows non-integer times (GNU sleep is fine, as is sleep on macosx 10.8.4)
>>> sleep ${PINGPERIOD}
>>> done
>>> done
>>> echo "Done... ($0)"
>>>
>>>
>>> This will try to run 10000 repetitions for ICMP packet sizes from 16 to 116 bytes, running (10000 * 101 * 0.01 / 60 =) 168 minutes, but you should be able to stop it with ctrl-c if you are not patient enough; with your link I would estimate that 3000 repetitions should be plenty, but if you could run it over night that would be great, and then ~3 hours should not matter much.
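>>> For example, assuming you saved the script as ping_sweep.sh (the name is just a placeholder) and picked a nearby hop that answers pings reliably:
>>>
>>> chmod +x ping_sweep.sh
>>> ./ping_sweep.sh 192.0.2.1 # replace 192.0.2.1 with your chosen target IP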
>>> And then run the following attached code in octave or matlab
>>>
>>>
>>>
>>> Invoke with "tc_stab_parameter_guide_03('path/to/the/data/file/you/created/name_of_said_file')". The parser will run on the first invocation and is really, really slow, but further invocations should be faster. If issues arise, let me know, I am happy to help.
>>>
>>>
>>>> Were I to use a single directly connected gateway, I would input a suitable value for PPPoA in that openWRT firmware.
>>>>
>>> I think you should do that right now.
>> The firmware has not yet been released.
>>>> In theory, I might need to use a negative value, but the current kernel does not support that.
>>>>
>>> If you use tc_stab, negative overheads are fully supported, only htb_private has overhead defined as unsigned integer and hence does not allow negative values.
>> Jesper Brouer posted about this. I thought he was referring to tc_stab.
> I recall having a discussion with Jesper about this topic, where he agreed that tc_stab was not affected, only htb_private.
Reading what was said on 23rd August, you corrected his error in
interpretation.
>>>> I have used many different arbitrary values for overhead. All appear to have little effect.
>>>>
>>> So the issue here is that only at small packet sizes do the overhead and last-cell padding eat a disproportionate amount of your bandwidth (64 byte packet plus 44 byte overhead plus 47 byte worst-case cell padding: 100*(44+47+64)/64 = 242% effective packet size relative to what the shaper estimated); at typical packet sizes the max error (44 bytes of missing overhead and potentially misjudged cell padding of 47 bytes) adds up to a theoretical 100*(44+47+1500)/1500 = 106% effective packet size relative to what the shaper estimated. It is obvious that at 1500 byte packets the whole ATM issue can be easily dismissed by just reducing the link rate by ~10% for the 48-in-53 framing and an additional ~6% for overhead and cell padding. But once you mix smaller packets into your traffic, for say VoIP, the effective wire-size misjudgment will kill your ability to control the queueing. Note that the common wisdom of shaping down to 85% might stem from the ~15% ATM "tax" on 1500 byte traffic...
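>>> To make the quantization concrete, here is a tiny sketch (my own illustration, not part of any script; the 40 byte overhead is just an example value) that prints the true ATM wire size for a few IP packet sizes:
>>>
>>> # ATM sends ceil((packet + overhead) / 48) cells of 53 bytes each
>>> overhead=40
>>> for size in 64 200 576 1500; do
>>> cells=$(( (size + overhead + 47) / 48 ))
>>> wire=$(( cells * 53 ))
>>> echo "IP packet ${size} bytes -> ${cells} cells = ${wire} bytes on the wire ($(( 100 * wire / size ))% of nominal)"
>>> done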
>>>
>>>
>>>> As I understand it, the current recommendation is to use tc_stab in preference to htb_private. I do not know the basis for this value judgement.
>>>>
>>> In short: tc_stab allows negative overheads, and tc_stab works with HTB, TBF and HFSC, while htb_private only works with HTB. Currently htb_private has two advantages: it will estimate the per-packet overhead correctly if GSO (generic segmentation offload) is enabled, and it will produce exact ATM link layer estimates for all possible packet sizes. In practice almost everyone uses an MTU of 1500 or less for their internet access, making both htb_private advantages effectively moot. (Plus, if no one beats me to it, I intend to address both theoretical shortcomings of tc_stab next year).
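>>> For reference, a tc_stab based setup boils down to something like this (a sketch only; the device name, rate and overhead are example values, not a recommendation):
>>>
>>> # tell the kernel about ATM framing and 40 bytes of per-packet overhead, then shape with HTB + fq_codel
>>> tc qdisc add dev ge00 root handle 1: stab linklayer atm overhead 40 htb default 11
>>> tc class add dev ge00 parent 1: classid 1:11 htb rate 2430kbit
>>> tc qdisc add dev ge00 parent 1:11 fq_codel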
>>>
>>> Best Regards
>>> Sebastian
>>>
>>>
>>>>
>>>>
>>>>
>>>> On 28/12/13 10:01, Sebastian Moeller wrote:
>>>>
>>>>> Hi Rich,
>>>>>
>>>>> great! A few comments:
>>>>>
>>>>> Basic Settings:
>>>>> [Is 95% the right fudge factor?] I think that ideally, if we get can precisely measure the useable link rate even 99% of that should work out well, to keep the queue in our device. I assume that due to the difficulties in measuring and accounting for the link properties as link layer and overhead people typically rely on setting the shaped rate a bit lower than required to stochastically/empirically account for the link properties. I predict that if we get a correct description of the link properties to the shaper we should be fine with 95% shaping. Note though, it is not trivial on an adel link to get the actually useable bit rate from the modem so 95% of what can be deduced from the modem or the ISP's invoice might be a decent proxy…
>>>>>
>>>>> [Do we have a recommendation for an easy way to tell if it's working? Perhaps a link to a new Quick Test for Bufferbloat page. ] The linked page looks like a decent probe for buffer bloat.
>>>>>
>>>>>
>>>>>
>>>>>> Basic Settings - the details...
>>>>>>
>>>>>> CeroWrt is designed to manage the queues of packets waiting to be sent across the slowest (bottleneck) link, which is usually your connection to the Internet.
>>>>>>
>>>>>>
>>>>> I think we can only actually control the first link to the ISP, which often happens to be the bottleneck. At a typical DSLAM (xDSL head end station) the cumulative sold bandwidth to the customers is larger than the backbone connection (which is called over-subscription and is almost guaranteed to be the case in every DSLAM); this typically is not a problem, as people typically do not use their internet that much. My point being: we cannot really control congestion in the DSLAM's uplink (as we have no idea what the reserved rate per customer is in the worst case, if there is any).
>>>>>
>>>>>
>>>>>
>>>>>> CeroWrt can automatically adapt to network conditions to improve the delay/latency of data without any settings.
>>>>>>
>>>>>>
>>>>> Does this describe the default fq_codels on each interface (except fib?)?
>>>>>
>>>>>
>>>>>
>>>>>> However, it can do a better job if it knows more about the actual link speeds available. You can adjust this setting by entering link speeds that are a few percent below the actual speeds.
>>>>>>
>>>>>> Note: it can be difficult to get an accurate measurement of the link speeds. The speed advertised by your provider is a starting point, but your experience often won't meet their published specs. You can also use a speed test program or web site like
>>>>>>
>>>>>> http://speedtest.net
>>>>>>
>>>>>> to estimate actual operating speeds.
>>>>>>
>>>>>>
>>>>> While this approach is commonly recommended on the internet, I do not believe that it is that useful. Between a user and the speedtest site there are a number of potential congestion points that can affect (reduce) the throughput, like bad peering. Now, that said, the speedtests will report something <= the actual link speed and hence be conservative (interactivity stays great at 90% of link rate as well as at 80%, so underestimating the bandwidth within reason does not affect the latency gains from traffic shaping, it just sacrifices a bit more bandwidth; and given the difficulty of actually measuring the attainable bandwidth, this might have been effectively a decent recommendation even though the theory behind it seems flawed)
>>>>>
>>>>>
>>>>>
>>>>>> Be sure to make your measurement when network is quiet, and others in your home aren’t generating traffic.
>>>>>>
>>>>>>
>>>>> This is great advice.
>>>>>
>>>>> I would love to comment further, but after reloading
>>>>>
>>>>> http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310
>>>>>
>>>>> just returns a blank page and I can not get back to the page as of yesterday evening… I will have a look later to see whether the page resurfaces…
>>>>>
>>>>> Best
>>>>> Sebastian
>>>>>
>>>>>
>>>>> On Dec 27, 2013, at 23:09 , Rich Brown
>>>>>
>>>>> <richb.hanover@gmail.com>
>>>>>
>>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>>>> You are a very good writer and I am on a tablet.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>>
>>>>>>> Ill take a pass at the wiki tomorrow.
>>>>>>>
>>>>>>> The shaper does up and down was my first thought...
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> Everyone else… Don’t let Dave hog all the fun! Read the tech note and give feedback!
>>>>>>
>>>>>> Rich
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Dec 27, 2013 10:48 AM, "Rich Brown" <richb.hanover@gmail.com>
>>>>>>>
>>>>>>> wrote:
>>>>>>> I updated the page to reflect the 3.10.24-8 build, and its new GUI pages.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> There are still lots of open questions. Comments, please.
>>>>>>>
>>>>>>> Rich
>>>>>>> _______________________________________________
>>>>>>> Cerowrt-devel mailing list
>>>>>>>
>>>>>>>
>>>>>>> Cerowrt-devel@lists.bufferbloat.net
>>>>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>>>>> _______________________________________________
>>>>>> Cerowrt-devel mailing list
>>>>>>
>>>>>>
>>>>>> Cerowrt-devel@lists.bufferbloat.net
>>>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>>>> _______________________________________________
>>>>> Cerowrt-devel mailing list
>>>>>
>>>>>
>>>>> Cerowrt-devel@lists.bufferbloat.net
>>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Cerowrt-devel] Update to "Setting up SQM for CeroWrt 3.10" web page. Comments needed.
2013-12-28 14:27 ` Rich Brown
@ 2013-12-28 20:24 ` Sebastian Moeller
2013-12-28 20:31 ` Fred Stratton
0 siblings, 1 reply; 14+ messages in thread
From: Sebastian Moeller @ 2013-12-28 20:24 UTC (permalink / raw)
To: Rich Brown; +Cc: cerowrt-devel
Hi Rich,
On Dec 28, 2013, at 15:27 , Rich Brown <richb.hanover@gmail.com> wrote:
> Hi Sebastian,
>
>> I would love to comment further, but after reloading http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310 just returns a blank page and I can not get back to the page as of yesterday evening… I will have a look later to see whether the page resurfaces…
>
> I’m not sure what happened to this page for you. It’s available now (at least to me) at that URL…
Well, it is back for me as well,
>
> Rich
So without much further ado...
> Queueing Discipline - the details...
>
> CeroWrt is the proof-of-concept for the CoDel and fq_codel algorithms that prevent large flows of data (downloads, videos, etc.) from affecting applications that use a small number of small packets. The default of fq_codel and the simple.qos script work very well for most people.
>
> [What are the major features of the simple.qos, simplest.qos, and drr.qos scripts?]
simple.qos has a shaper and three classes with different priorities (see the sketch below)
simplest.qos has a shaper and just one class for all traffic
drr.qos, no idea yet, I have not tested it nor looked at it closely
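To make that concrete, the three-class variant roughly looks like the following (an illustrative shell sketch only, not the actual simple.qos code; the interface name and rates are made up):

# toy three-tier egress structure: one HTB shaper, three priority classes, fq_codel on each leaf
IFACE=ge00
UPLINK=2430 # kbit/s, example value
tc qdisc add dev ${IFACE} root handle 1: htb default 12
tc class add dev ${IFACE} parent 1: classid 1:1 htb rate ${UPLINK}kbit
tc class add dev ${IFACE} parent 1:1 classid 1:11 htb rate $(( UPLINK / 3 ))kbit ceil ${UPLINK}kbit prio 1
tc class add dev ${IFACE} parent 1:1 classid 1:12 htb rate $(( UPLINK / 3 ))kbit ceil ${UPLINK}kbit prio 2
tc class add dev ${IFACE} parent 1:1 classid 1:13 htb rate $(( UPLINK / 3 ))kbit ceil ${UPLINK}kbit prio 3
tc qdisc add dev ${IFACE} parent 1:11 fq_codel
tc qdisc add dev ${IFACE} parent 1:12 fq_codel
tc qdisc add dev ${IFACE} parent 1:13 fq_codel
# simplest.qos instead has a single class with one fq_codel underneath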
>
> Explicit Congestion Notification (ECN) is a mechanism for notifying a sender that its packets are encountering congestion and that the sender should slow its packet delivery rate. We recommend that you turn ECN off for the Upload (outbound, egress) direction, because fq_codel handles and drops packets before the bottleneck, providing the congestion signal to local senders.
Well, we recommend disabling egress ECN because marked packets still need to go over the slow bottleneck link. Dropping these instead frees up the egress queue and will allow faster reactivity on the slow uplink. With a slow enough uplink, every packet counts...
> For the Download (inbound, ingress) link, we recommend you turn ECN on so that CeroWrt can inform the remote sender that it has detected congestion.
The same signaling is achieved by dropping the packet and not sending an ACK packet for that data, but this takes a bit longer as it relies on some timer in the sender.
> [Is this still relevant? Arriving packets have already cleared the bottleneck, and hence dropping has no bandwidth advantage anymore. ]
I think it still is relevant.
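In qdisc terms, what the GUI's ECN settings amount to is roughly this (a sketch; ge00 and ifb4ge00 are example egress and ingress devices):

tc qdisc add dev ge00 root fq_codel noecn # upload: drop rather than mark, frees space on the slow uplink
tc qdisc add dev ifb4ge00 root fq_codel ecn # download: mark with ECN instead of dropping already-delivered packets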
>
> If you make your own queue setup script, you can pass parameters to them using the "Dangerous Configuration" strings. The name forewarns you.
Well the dangerous string is just appended to the tc command that sets up the queuing disciplines, so you can use this to modify the existing invocation, say by changing values away from implicit defaults. Like in Fred's case where he added "target 25ms" in the egress string to change the target from the 5ms default.
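So with Fred's string the final egress invocation ends up looking roughly like this (illustrative only; the real command depends on the script chosen):

# the advanced option string is pasted verbatim onto the end of the fq_codel setup line
tc qdisc add dev ge00 parent 1:11 fq_codel noecn target 25ms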
> 3. Link Layer Adaptation
>
> You must set the Link Layer Adaptation options correctly so that CeroWrt can perform its best with VoIP, gaming, and other protocols that rely on short packets. The general rule for selecting the Link Layer Adaption is:
>
> • If you use any kind of DSL/ADSL connection to the Internet (that is, if you get your internet service through the telephone line), you should choose the "ATM" item.
ADSL is the keyword here, people on VDSL most likely will not need to set ATM, but ethernet.
> Leave the Per-packet Overhead set to zero.
I know I am quite wobbly on this topic, but we should recommend to use 40 as default here if ATM was selected.
> • [What is the proper description here?] If you use PPPoE (but not over ADSL/DSL link)
You will have at least 8 bytes of overhead, probably more. Unfortunately I have no idea how to measure the overhead on non-ATM links.
> , PPPoATM, or bridging that isn’t Ethernet, you should choose [what?] and set the Per-packet Overhead to [what?]
I have to pass, maybe someone with such a link can chime in here? Then again these setups should be rare enough to just punt (we could let the users know they are on their own and ask for the conclusion they reached to incorporate into the wiki).
> • If you use Ethernet, Cable modem, Fiber, or other kind of connection to the Internet, you should choose “none (default)”.
The decision tree should be: if you have no ATM carrier and you do not know of any per-packet overhead, you should select none.
> If you cannot tell what kind of link you have, first try the ATM choice and run the Quick Test for Bufferbloat. If the results are good, you’re done.
This will not really work: on non-ATM links selecting ATM will overestimate the wire size of packets, thereby retaining excellent latency even at high nominal shaped ratios (it should even work well at 105% of link capacity). To really test for this we would need a test that measures the link capacity for different packet sizes, but I digress.
> You can also try the other link layer adaptations to see which performs better.
So for a real ATM link, selecting link layer ATM should allow specifying a higher shaping percentage than the 90% for large and 85% for small packets.
>
>
> Link Layer Adaptation - the details…
>
> It is especially important to set this on links that use ATM framing (almost all DSL/ADSL links do), because ATM adds five additional bytes of overhead to a 48-byte frame. Unless you tell CeroWrt to account for the ATM framing bytes, short packets will appear to take longer to send than expected, and CeroWrt will penalize that traffic.
>
> CeroWrt can also account for the overhead imposed by PPPoE, PPPoATM and other links when you select that option.
Besides the nasty 48-in-53 issue, each packet also carries some ATM header overhead; just how much depends on the actual encapsulation used (Fred sent a useful short list of the different encapsulations).
>
> Ethernet, Cable Modems, Fiber generally do not need any kind of link layer adaptation.
>
> The "Advanced Link Layer" choices are relevant if you are sending packets larger than 1500 bytes (this is unusual for most home setups.)
Actually the defaults will be good up to 2048 byte packets (including overhead) so even baby jumbo frames are covered :)
>
> [What to say about the options on the maximal size, number of entries, and minimal packet size?]
I would give a link to the tc stab man page; we just expose the values to be passed to stab here. I really assume there is nothing that needs changing in here unless the user knows exactly why he wants a change.
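For the curious, those GUI fields map onto tc's stab parameters roughly like this (a sketch; the numbers are placeholders, not recommended values):

# "maximal size" -> mtu, "number of entries" -> tsize, "minimal packet size" -> mpu
tc qdisc add dev ge00 root stab mtu 2048 tsize 128 mpu 0 overhead 40 linklayer atm htb default 11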
>
> [What to say about tc_stab vs htb_private?]
Basically, unless you know better, stick to tc_stab.
Best Regards
Sebastian
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Cerowrt-devel] Update to "Setting up SQM for CeroWrt 3.10" web page. Comments needed.
2013-12-28 20:09 ` Fred Stratton
@ 2013-12-28 20:29 ` Sebastian Moeller
2013-12-28 20:36 ` Fred Stratton
0 siblings, 1 reply; 14+ messages in thread
From: Sebastian Moeller @ 2013-12-28 20:29 UTC (permalink / raw)
To: Fred Stratton; +Cc: cerowrt-devel
Hi Fred,
On Dec 28, 2013, at 21:09 , Fred Stratton <fredstratton@imap.cc> wrote:
>
> On 28/12/13 19:54, Sebastian Moeller wrote:
>> Hi Fred,
>>
>>
>> On Dec 28, 2013, at 15:27 , Fred Stratton <fredstratton@imap.cc> wrote:
>>
>>> On 28/12/13 13:42, Sebastian Moeller wrote:
>>>> Hi Fred,
>>>>
>>>>
>>>> On Dec 28, 2013, at 12:09 , Fred Stratton
>>>> <fredstratton@imap.cc>
>>>> wrote:
>>>>
>>>>
>>>>>> The UK consensus fudge factor has always been 85 per cent of the rate achieved, not 95 or 99 per cent.
>>>>>
>>>> I know that the recommendations have been lower in the past; I think this is partly because before Jesper Brouer's and Russels Stuart's work to properly account for ATM "quantization" people typically had to deal with a ~10% rate tax for the 5byte per cell overhead (48 byte payload in 53 byte cells 90.57% useable rate) plus an additional 5% to stochastically account for the padding of the last cell and the per packet overhead both of which affect the effective good put way more for small than large packets, so the 85% never worked well for all packet sizes. My hypothesis now is since we can and do properly account for these effects of ATM framing we can afford to start with a fudge factor of 90% or even 95% percent. As far as I know the recommended fudge factors are never ever explained by more than "this works empirically"...
>>>> The fudge factors are totally empirical. If you are proposing a more formal approach, I shall try a 90 per cent fudge factor, although 'current rate' varies here.
>> My hypothesis is that we can get away with less fudge as we have a better handle on the actual wire size. Personally, I do start at 95% to figure out the trade-off between bandwidth loss and latency increase.
>
> You are now saying something slightly different. You are implying now that you are starting at 95 per cent, and then reducing the nominal download speed until you achieve an unspecified endpoint.
So I typically start with 95%, run RRUL and look at the ping latency increase under load. I try to go as high with the bandwidth as I can and still keep the latency increase close to 10ms (the default fq_codel target of 5ms will allow RTT increases of 5ms in both directions so it adds up to 10). The last time I tried this I ended up at 97% of link rate.
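A quick and dirty way to watch that latency increase without netperf-wrapper is something like the following (a sketch; the netperf server name is a placeholder):

ping -c 30 netperf.example.org | tail -1 # idle RTT baseline
netperf -H netperf.example.org -t TCP_STREAM -l 60 > /dev/null & # saturate upload
netperf -H netperf.example.org -t TCP_MAERTS -l 60 > /dev/null & # saturate download
ping -c 30 netperf.example.org | tail -1 # RTT under load
# raise or lower the shaped rates until the loaded RTT stays within ~10ms of the baseline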
>>
>>>>> Devices express 2 values: the sync rate - or 'maximum rate attainable' - and the dynamic value of 'current rate'.
>>>>>
>>>> The actual data rate is the relevant information for shaping, often DSL modems report the link capacity as "maximum rate attainable" or some such, while the actual bandwidth is limited to a rate below what the line would support by contract (often this bandwidth reduction is performed on the PPPoE link to the BRAS).
>>>>
>>>>
>>>>> As the sync rate is fairly stable for any given installation - ADSL or Fibre - this could be used as a starting value. decremented by the traditional 15 per cent of 'overhead'. and the 85 per cent fudge factor applied to that.
>>>>>
>>>> I would like to propose to use the "current rate" as starting point, as 'maximum rate attainable' >= 'current rate'.
>>> 'current rate' is still a sync rate, and so is conventionally viewed as 15 per cent above the unmeasurable actual rate.
>> No no, the current rate really is the current link capacity between modem and DSLAM (or CPE and CTS), only this rate typically is for the raw ATM stream, so we have to subtract all the additional layers until we reach the IP layer...
>
> You are saying the same thing as I am.
I guess the point I want to make is that we are able to measure the unmeasurable actual rate, that is what the link layer adaptation does for us, if configured properly :)
Best Regards
Sebastian
>>
>>> As you are proposing a new approach, I shall take 90 per cent of 'current rate' as a starting point.
>> I would love to learn how that works out for you. Because for all my theories about why 85% was used, the proof still is in the (plum-) pudding...
>>
>>> No one in the UK uses SRA currently. One small ISP used to.
>> That is sad, because on paper SRA looks like a good feature to have (lower bandwidth sure beats synchronization loss).
>>
>>> The ISP I currently use has Dynamic Line Management, which changes target SNR constantly.
>> Now that is much better, as we should neither notice nor care; I assume that this happens on layers below ATM even.
>
>
>>
>>> The DSLAM is made by Infineon.
>>>
>>>
>>>>> Fibre - FTTC - connections can suffer quite large download speed fluctuations over the 200 - 500 metre link to the MSAN. This phenomenon is not confined to ADSL links.
>>>>>
>>>> On the actual xDSL link? As far as I know no telco actually uses SRA (seamless rate adaptation or so) so the current link speed will only get lower not higher, so I would expect a relative stable current rate (it might take a while, a few days to actually slowly degrade to the highest link speed supported under all conditions, but I hope you still get my point)
>>> I understand the point, but do not think it is the case, from data I have seen, but cannot find now, unfortunately.
>> I see, maybe my assumption here is wrong, I would love to see data though before changing my hypothesis.
>>
>>>>> An alternative speed test is something like this
>>>>>
>>>>>
>>>>> http://download.bethere.co.uk/downloadMeter.html
>>>>>
>>>>>
>>>>> which, as Be has been bought by Sky, may not exist after the end of April 2014.
>>>>>
>>>> But, if we recommend to run speed tests we really need to advise our users to start several concurrent up- and downloads to independent servers to actually measure the bandwidth of our bottleneck link; often a single server connection will not saturate a link (I seem to recall that with TCP it is guaranteed to only reach 75% or so averaged over time, is that correct?).
>>>> But I think this is not the proper way to set the bandwidth for the shaper, because upstream of our link to the ISP we have no guaranteed bandwidth at all and just can hope the ISP is doing the right thing AQM-wise.
>>>>
>>> I quote the Be site as an alternative to a java based approach. I would be very happy to see your suggestion adopted.
>>>>
>>>>
>>>>> • [What is the proper description here?] If you use PPPoE (but not over ADSL/DSL link), PPPoATM, or bridging that isn’t Ethernet, you should choose [what?] and set the Per-packet Overhead to [what?]
>>>>>
>>>>> For a PPPoA service, the PPPoA link is treated as PPPoE on the second device, here running ceroWRT.
>>>>>
>>>> This still means you should specify the PPPoA overhead, not PPPoE.
>>> I shall try the PPPoA overhead.
>> Great, let me know how that works.
>>
>>>>> The packet overhead values are written in the dubious man page for tc_stab.
>>>>>
>>>> The only real flaw in that man page, as far as I know, is the fact that it indicates that the kernel will account for the 18byte ethernet header automatically, while the kernel does no such thing (which I hope to change).
>>> It mentions link layer types as 'atm' ethernet' and 'adsl'. There is no reference anywhere to the last. I do not see its relevance.
>> If you have a look inside the source code for tc and the kernel, you will notice that atm and adsl are aliases for the same thing. I just think that we should keep naming the thing ATM since that is the problematic layer in the stack that causes most of the useable link rate misjudgements; adsl just happens to use ATM exclusively.
>
> I have reviewed the source. I see what you mean.
>>
>>>>> Sebastian has a potential alternative method of formal calculation.
>>>>>
>>>> So, I have no formal calculation method available, but an empirical way of detecting ATM quantization as well as measuring the per packet overhead of an ATM link.
>>>> The idea is to measure the RTT of ICMP packets of increasing length and then display the distribution of RTTs by ICMP packet length; on an ATM carrier we expect to see a step function with steps 48 bytes apart, while for a non-ATM carrier we expect to see a smooth ramp instead. By comparing the residuals of a linear fit of the data with the residuals of the best step-function fit to the data, the fit with the lower residuals "wins". Attached you will find an example of this approach, ping data in red (median of NNN repetitions for each ICMP packet size), linear fit in blue, and best staircase fit in green. You notice that the data starts somewhere in a 48 byte ATM cell. Since the ATM encapsulation overhead is maximally 44 bytes and we know the IP and ICMP overhead of the ping probe, we can calculate the overhead preceding the IP header, which is what needs to be put in the overhead field in the GUI. (Note where the green line intersects the y-axis at 0 bytes packet size? This is where the IP header starts; the "missing" part of this ATM cell is the overhead).
>>>>
>>> You are curve fitting. This is calculation.
>> I see, that is certainly a valid way to look at it, just one that had not occurred to me.
>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Believe it or not, this method works reasonably well (I tested successfully with one Bridged, LLC/SNAP RFC-1483/2684 connection (overhead 32 bytes), and several PPPoE, LLC (overhead 40) connections (from ADSL1 @ 3008/512 to ADSL2+ @ 16402/2558)). But it takes a relatively long time to measure the ping train, especially at the higher rates… and it requires ping time stamps with decent resolution (which rules out windows), and my naive data acquisition script creates really large raw data files. I guess I should post the code somewhere so others can test and improve it.
>>>> Fred I would be delighted to get a data set from your connection, to test a known different encapsulation.
>>>>
>>> I shall try this. If successful, I shall initially pass you the raw data.
>> Great, but be warned this will be hundreds of megabytes. (For production use the measurement script would need to prune the generated log file down to the essential values… and potentially store the data in binary)
>>
>>> I have not used MatLab since the 1980s.
>> Lucky you, I sort of have to use matlab in my day job and hence are most "fluent" in matlabese, but the code should also work with octave (I tested version 3.6.4) so it should be relatively easy to run the analysis yourself. That said, I would love to get a copy of the ping sweep :)
>>
>>>>> TYPICAL OVERHEADS
>>>>> The following values are typical for different adsl scenarios (based on
>>>>> [1] and [2]):
>>>>>
>>>>> LLC based:
>>>>> PPPoA - 14 (PPP - 2, ATM - 12)
>>>>> PPPoE - 40+ (PPPoE - 8, ATM - 18, ethernet 14, possibly FCS - 4+padding)
>>>>> Bridged - 32 (ATM - 18, ethernet 14, possibly FCS - 4+padding)
>>>>> IPoA - 16 (ATM - 16)
>>>>>
>>>>> VC Mux based:
>>>>> PPPoA - 10 (PPP - 2, ATM - 8)
>>>>> PPPoE - 32+ (PPPoE - 8, ATM - 10, ethernet 14, possibly FCS - 4+padding)
>>>>> Bridged - 24+ (ATM - 10, ethernet 14, possibly FCS - 4+padding)
>>>>> IPoA - 8 (ATM - 8)
>>>>>
>>>>>
>>>>> For VC Mux based PPPoA, I am currently using an overhead of 18 for the PPPoE setting in ceroWRT.
>>>>>
>>>> Yeah we could put this list into the wiki, but how shall a typical user figure out which encapsulation is used? And good luck in figuring out whether the frame check sequence (FCS) is included or not…
>>>> BTW 18, I predict that if PPPoE is only used between cerowrt and the "modem' or gateway your effective overhead should be 10 bytes; I would love if you could run the following against your link at night (also attached
>>>>
>>>>
>>>>
>>>> ):
>>>>
>>>> #! /bin/bash
>>>> # TODO use seq or bash to generate a list of the requested sizes (to allow for non-equidistantly spaced sizes)
>>>>
>>>> #.
>>>> TECH=ADSL2 # just to give some meaning to the ping trace file name
>>>> # finding a proper target IP is somewhat of an art: just traceroute a remote site
>>>> # and find the nearest host reliably responding to pings showing the smallest variation of ping times
>>>> TARGET=${1} # the IP against which to run the ICMP pings
>>>> DATESTR=`date +%Y%m%d_%H%M%S` # to allow multiple sequential records
>>>> LOG=ping_sweep_${TECH}_${DATESTR}.txt
>>>>
>>>>
>>>> # by default non-root ping will only send one packet per second, so work around that by calling ping independently for each packet
>>>> # empirically figure out the shortest period still giving the standard ping time (to avoid being slow-pathed by our target)
>>>> PINGPERIOD=0.01 # in seconds
>>>> PINGSPERSIZE=10000
>>>>
>>>> # Start size, needed to find the per packet overhead dependent on the ATM encapsulation
>>>> # to reliably show ATM quantization one would like to see at least two steps, so cover a range > 2 ATM cells (so > 96 bytes)
>>>> SWEEPMINSIZE=16 # 64bit systems seem to require 16 bytes of payload to include a timestamp...
>>>> SWEEPMAXSIZE=116
>>>>
>>>> n_SWEEPS=`expr ${SWEEPMAXSIZE} - ${SWEEPMINSIZE}`
>>>>
>>>> i_sweep=0
>>>> i_size=0
>>>>
>>>> echo "Running ICMP RTT measurement against: ${TARGET}"
>>>> while [ ${i_sweep} -lt ${PINGSPERSIZE} ]
>>>> do
>>>> (( i_sweep++ ))
>>>> echo "Current iteration: ${i_sweep}"
>>>> # now loop from sweepmin to sweepmax
>>>> i_size=${SWEEPMINSIZE}
>>>> while [ ${i_size} -le ${SWEEPMAXSIZE} ]
>>>> do
>>>> echo "${i_sweep}. repetition of ping size ${i_size}"
>>>> ping -c 1 -s ${i_size} ${TARGET} >> ${LOG} &
>>>> (( i_size++ ))
>>>> # we need a sleep binary that allows non-integer times (GNU sleep is fine, as is sleep on macosx 10.8.4)
>>>> sleep ${PINGPERIOD}
>>>> done
>>>> done
>>>> echo "Done... ($0)"
>>>>
>>>>
>>>> This will try to run 10000 repetitions for ICMP packet sizes from 16 to 116 bytes, running (10000 * 101 * 0.01 / 60 =) 168 minutes, but you should be able to stop it with ctrl-c if you are not patient enough; with your link I would estimate that 3000 repetitions should be plenty, but if you could run it over night that would be great, and then ~3 hours should not matter much.
>>>> And then run the following attached code in octave or matlab
>>>>
>>>>
>>>>
>>>> Invoke with "tc_stab_parameter_guide_03('path/to/the/data/file/you/created/name_of_said_file')". The parser will run on the first invocation and is really, really slow, but further invocations should be faster. If issues arise, let me know, I am happy to help.
>>>>
>>>>
>>>>> Were I to use a single directly connected gateway, I would input a suitable value for PPPoA in that openWRT firmware.
>>>>>
>>>> I think you should do that right now.
>>> The firmware has not yet been released.
>>>>> In theory, I might need to use a negative value, but the current kernel does not support that.
>>>>>
>>>> If you use tc_stab, negative overheads are fully supported, only htb_private has overhead defined as unsigned integer and hence does not allow negative values.
>>> Jesper Brouer posted about this. I thought he was referring to tc_stab.
>> I recall having a discussion with Jesper about this topic, where he agreed that tc_stab was not affected, only htb_private.
> Reading what was said on 23rd August, you corrected his error in interpretation.
>
>
>>>>> I have used many different arbitrary values for overhead. All appear to have little effect.
>>>>>
>>>> So the issue here is that only at small packet sizes do the overhead and last-cell padding eat a disproportionate amount of your bandwidth (64 byte packet plus 44 byte overhead plus 47 byte worst-case cell padding: 100*(44+47+64)/64 = 242% effective packet size relative to what the shaper estimated); at typical packet sizes the max error (44 bytes of missing overhead and potentially misjudged cell padding of 47 bytes) adds up to a theoretical 100*(44+47+1500)/1500 = 106% effective packet size relative to what the shaper estimated. It is obvious that at 1500 byte packets the whole ATM issue can be easily dismissed by just reducing the link rate by ~10% for the 48-in-53 framing and an additional ~6% for overhead and cell padding. But once you mix smaller packets into your traffic, for say VoIP, the effective wire-size misjudgment will kill your ability to control the queueing. Note that the common wisdom of shaping down to 85% might stem from the ~15% ATM "tax" on 1500 byte traffic...
>>>>
>>>>
>>>>> As I understand it, the current recommendation is to use tc_stab in preference to htb_private. I do not know the basis for this value judgement.
>>>>>
>>>> In short: tc_stab allows negative overheads, and tc_stab works with HTB, TBF and HFSC, while htb_private only works with HTB. Currently htb_private has two advantages: it will estimate the per-packet overhead correctly if GSO (generic segmentation offload) is enabled, and it will produce exact ATM link layer estimates for all possible packet sizes. In practice almost everyone uses an MTU of 1500 or less for their internet access, making both htb_private advantages effectively moot. (Plus, if no one beats me to it, I intend to address both theoretical shortcomings of tc_stab next year).
>>>>
>>>> Best Regards
>>>> Sebastian
>>>>
>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 28/12/13 10:01, Sebastian Moeller wrote:
>>>>>
>>>>>> Hi Rich,
>>>>>>
>>>>>> great! A few comments:
>>>>>>
>>>>>> Basic Settings:
>>>>>> [Is 95% the right fudge factor?] I think that ideally, if we get can precisely measure the useable link rate even 99% of that should work out well, to keep the queue in our device. I assume that due to the difficulties in measuring and accounting for the link properties as link layer and overhead people typically rely on setting the shaped rate a bit lower than required to stochastically/empirically account for the link properties. I predict that if we get a correct description of the link properties to the shaper we should be fine with 95% shaping. Note though, it is not trivial on an adel link to get the actually useable bit rate from the modem so 95% of what can be deduced from the modem or the ISP's invoice might be a decent proxy…
>>>>>>
>>>>>> [Do we have a recommendation for an easy way to tell if it's working? Perhaps a link to a new Quick Test for Bufferbloat page. ] The linked page looks like a decent probe for buffer bloat.
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Basic Settings - the details...
>>>>>>>
>>>>>>> CeroWrt is designed to manage the queues of packets waiting to be sent across the slowest (bottleneck) link, which is usually your connection to the Internet.
>>>>>>>
>>>>>>>
>>>>>> I think we can only actually control the first link to the ISP, which often happens to be the bottleneck. At a typical DSLAM (xDSL head end station) the cumulative sold bandwidth to the customers is larger than the back bone connection (which is called over-subscription and is almost guaranteed to be the case in every DSLAM) which typically is not a problem, as typically people do not use their internet that much. My point being we can not really control congestion in the DSLAM's uplink (as we have no idea what the reserved rate per customer is in the worst case, if there is any).
>>>>>>
>>>>>>
>>>>>>
>>>>>>> CeroWrt can automatically adapt to network conditions to improve the delay/latency of data without any settings.
>>>>>>>
>>>>>>>
>>>>>> Does this describe the default fq_codels on each interface (except fib?)?
>>>>>>
>>>>>>
>>>>>>
>>>>>>> However, it can do a better job if it knows more about the actual link speeds available. You can adjust this setting by entering link speeds that are a few percent below the actual speeds.
>>>>>>>
>>>>>>> Note: it can be difficult to get an accurate measurement of the link speeds. The speed advertised by your provider is a starting point, but your experience often won't meet their published specs. You can also use a speed test program or web site like
>>>>>>>
>>>>>>> http://speedtest.net
>>>>>>>
>>>>>>> to estimate actual operating speeds.
>>>>>>>
>>>>>>>
>>>>>> While this approach is commonly recommended on the internet, I do not believe that it is that useful. Between a user and the speedtest site there are a number of potential congestion points that can affect (reduce) the throughput, like bad peering. Now, that said, the speedtests will report something <= the actual link speed and hence be conservative (interactivity stays great at 90% of link rate as well as at 80%, so underestimating the bandwidth within reason does not affect the latency gains from traffic shaping, it just sacrifices a bit more bandwidth; and given the difficulty of actually measuring the attainable bandwidth, this might have been effectively a decent recommendation even though the theory behind it seems flawed)
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Be sure to make your measurement when network is quiet, and others in your home aren’t generating traffic.
>>>>>>>
>>>>>>>
>>>>>> This is great advice.
>>>>>>
>>>>>> I would love to comment further, but after reloading
>>>>>>
>>>>>> http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310
>>>>>>
>>>>>> just returns a blank page and I can not get back to the page as of yesterday evening… I will have a look later to see whether the page resurfaces…
>>>>>>
>>>>>> Best
>>>>>> Sebastian
>>>>>>
>>>>>>
>>>>>> On Dec 27, 2013, at 23:09 , Rich Brown
>>>>>>
>>>>>> <richb.hanover@gmail.com>
>>>>>>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>>> You are a very good writer and I am on a tablet.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>>
>>>>>>>> Ill take a pass at the wiki tomorrow.
>>>>>>>>
>>>>>>>> The shaper does up and down was my first thought...
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> Everyone else… Don’t let Dave hog all the fun! Read the tech note and give feedback!
>>>>>>>
>>>>>>> Rich
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On Dec 27, 2013 10:48 AM, "Rich Brown" <richb.hanover@gmail.com>
>>>>>>>>
>>>>>>>> wrote:
>>>>>>>> I updated the page to reflect the 3.10.24-8 build, and its new GUI pages.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> There are still lots of open questions. Comments, please.
>>>>>>>>
>>>>>>>> Rich
>>>>>>>> _______________________________________________
>>>>>>>> Cerowrt-devel mailing list
>>>>>>>>
>>>>>>>>
>>>>>>>> Cerowrt-devel@lists.bufferbloat.net
>>>>>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>>>>>> _______________________________________________
>>>>>>> Cerowrt-devel mailing list
>>>>>>>
>>>>>>>
>>>>>>> Cerowrt-devel@lists.bufferbloat.net
>>>>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>>>>> _______________________________________________
>>>>>> Cerowrt-devel mailing list
>>>>>>
>>>>>>
>>>>>> Cerowrt-devel@lists.bufferbloat.net
>>>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Cerowrt-devel] Update to "Setting up SQM for CeroWrt 3.10" web page. Comments needed.
2013-12-28 20:24 ` Sebastian Moeller
@ 2013-12-28 20:31 ` Fred Stratton
0 siblings, 0 replies; 14+ messages in thread
From: Fred Stratton @ 2013-12-28 20:31 UTC (permalink / raw)
To: Sebastian Moeller, Richard E. Brown, cerowrt-devel
On 28/12/13 20:24, Sebastian Moeller wrote:
> Hi Rich,
>
>
> On Dec 28, 2013, at 15:27 , Rich Brown <richb.hanover@gmail.com> wrote:
>
>> Hi Sebastian,
>>
>>> I would love to comment further, but after reloading http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310 just returns a blank page and I can not get back to the page as of yesterday evening… I will have a look later to see whether the page resurfaces…
>> I’m not sure what happened to this page for you. It’s available now (at least to me) at that URL…
> Well, it is back for me as well,
>
>
>> Rich
> So without much further ado...
>
>> Queueing Discipline - the details...
>>
>> CeroWrt is the proof-of-concept for the CoDel and fq_codel algorithms that prevent large flows of data (downloads, videos, etc.) from affecting applications that use a small number of small packets. The default of fq_codel and the simple.qos script work very well for most people.
>>
>> [What are the major features of the simple.qos, simplest.qos, and drr.qos scripts?]
> simple.qos, has a shaper and three classes with different priorities
> simplest.qos has a shaper and just one class for all traffic
> drr.qos, no idea yet, I have not tested it nor looked at it closely
>
>> Explicit Congestion Notification (ECN) is a mechanism for notifying a sender that its packets are encountering congestion and that the sender should slow its packet delivery rate. We recommend that you turn ECN off for the Upload (outbound, egress) direction, because fq_codel handles and drops packets before the bottleneck, providing the congestion signal to local senders.
> Well, we recommend to disable egress ECN as marked packets still need to go over the slow bottleneck link. Dropping these instead frees up the egress queue and will allow faster reactivity on the slow uplink. With a slow enough uplink, every packet counts...
>
>> For the Download (inbound, ingress) link, we recommend you turn ECN on so that CeroWrt can inform the remote sender that it has detected congestion.
> The same signaling is achieved by dropping the packet and not sending an ACK packet for that data, but this takes a bit longer as it relies on some timer in the sender.
>
>> [Is this still relevant? Arriving packets have already cleared the bottleneck, and hence dropping has no bandwidth advantage anymore. ]
> I think it still is relevant.
>
>> If you make your own queue setup script, you can pass parameters to them using the "Dangerous Configuration" strings. The name forewarns you.
> Well the dangerous string is just appended to the tc command that sets up the queuing disciplines, so you can use this to modify the existing invocation, say by changing values away from implicit defaults. Like in Fred's case where he added "target 25ms" in the egress string to change the target from the 5ms default.
>
>
>> 3. Link Layer Adaptation
>>
>> You must set the Link Layer Adaptation options correctly so that CeroWrt can perform its best with VoIP, gaming, and other protocols that rely on short packets. The general rule for selecting the Link Layer Adaption is:
>>
>> • If you use any kind of DSL/ADSL connection to the Internet (that is, if you get your internet service through the telephone line), you should choose the "ATM" item.
> ADSL is the keyword here, people on VDSL most likely will not need to set ATM, but ethernet.
>
>> Leave the Per-packet Overhead set to zero.
> I know I am quite wobbly on this topic, but we should recommend to use 40 as default here if ATM was selected.
>
>> • [What is the proper description here?] If you use PPPoE (but not over ADSL/DSL link)
> You will have at least 8 bytes of overhead, probably more. Unfortunately I have no idea how to measure the overhead on non-ATM links.
>
>> , PPPoATM, or bridging that isn’t Ethernet, you should choose [what?] and set the Per-packet Overhead to [what?]
> I have to pass, maybe someone with such a link can chime in here? Then again these setups should be rare enough to just punt (we could let the users know they are on their own and ask for the conclusion they reached to incorporate into the wiki).
PPPoATM is a synonym for PPPoA. This is used by the whole of the UK, in
Italy, and in New Zealand IIRC.
>
>> • If you use Ethernet, Cable modem, Fiber, or other kind of connection to the Internet, you should choose “none (default)”.
> The decision tree should be: if you have no ATM carrier and you do not know of any per-packet overhead, you should select none.
>
>> If you cannot tell what kind of link you have, first try the ATM choice and run the Quick Test for Bufferbloat. If the results are good, you’re done.
> This will not really work, on non-ATM links selecting ATM will overestimate the wire size of packets thereby retaining excellent latency even at high nominal shaped ratios (should even work well at 105% of link capacity). To really test for this we need a test that measures the link capacity for different packet sizes, but I digress.
>
>> You can also try the other link layer adaptations to see which performs better.
> So for a real ATM link selecting link layer ATM should allow to specify higher shaping percentage than 90% for large and 85% for small packets.
>
>>
>> Link Layer Adaptation - the details…
>>
>> It is especially important to set this on links that use ATM framing (almost all DSL/ADSL links do), because ATM adds five additional bytes of overhead to a 48-byte frame. Unless you tell CeroWrt to account for the ATM framing bytes, short packets will appear to take longer to send than expected, and CeroWrt will penalize that traffic.
>>
>> CeroWrt can also account for the overhead imposed by PPPoE, PPPoATM and other links when you select that option.
> Besides the nasty 48-in-53 issue, each packet also carries some ATM header overhead; just how much depends on the actual encapsulation used (Fred sent a useful short list of the different encapsulations).
>
>> Ethernet, Cable Modems, Fiber generally do not need any kind of link layer adaptation.
>>
>> The "Advanced Link Layer" choices are relevant if you are sending packets larger than 1500 bytes (this is unusual for most home setups.)
> Actually the defaults will be good up to 2048 byte packets (including overhead) so even baby jumbo frames are covered :)
>
>> [What to say about the options on the maximal size, number of entries, and minimal packet size?]
> I would give a link to the tc stab man page, we just expose the values to be passed to stab here. I really assume there is nothing that needs changing in here unless the user knows exactly why he wants a change.
>
>> [What to say about tc_stab vs htb_private?]
> Basically, unless you know better, stick to tc_stab.
>
> Best Regards
> Sebastian
>
>
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-devel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Cerowrt-devel] Update to "Setting up SQM for CeroWrt 3.10" web page. Comments needed.
2013-12-28 20:29 ` Sebastian Moeller
@ 2013-12-28 20:36 ` Fred Stratton
0 siblings, 0 replies; 14+ messages in thread
From: Fred Stratton @ 2013-12-28 20:36 UTC (permalink / raw)
To: Sebastian Moeller, cerowrt-devel
On 28/12/13 20:29, Sebastian Moeller wrote:
> Hi Fred,
>
> On Dec 28, 2013, at 21:09 , Fred Stratton <fredstratton@imap.cc> wrote:
>
>> On 28/12/13 19:54, Sebastian Moeller wrote:
>>> Hi Fred,
>>>
>>>
>>> On Dec 28, 2013, at 15:27 , Fred Stratton <fredstratton@imap.cc> wrote:
>>>
>>>> On 28/12/13 13:42, Sebastian Moeller wrote:
>>>>> Hi Fred,
>>>>>
>>>>>
>>>>> On Dec 28, 2013, at 12:09 , Fred Stratton
>>>>> <fredstratton@imap.cc>
>>>>> wrote:
>>>>>
>>>>>
>>>>>> The UK consensus fudge factor has always been 85 per cent of the rate achieved, not 95 or 99 per cent.
>>>>>>
>>>>> I know that the recommendations have been lower in the past; I think this is partly because before Jesper Brouer's and Russels Stuart's work to properly account for ATM "quantization" people typically had to deal with a ~10% rate tax for the 5byte per cell overhead (48 byte payload in 53 byte cells 90.57% useable rate) plus an additional 5% to stochastically account for the padding of the last cell and the per packet overhead both of which affect the effective good put way more for small than large packets, so the 85% never worked well for all packet sizes. My hypothesis now is since we can and do properly account for these effects of ATM framing we can afford to start with a fudge factor of 90% or even 95% percent. As far as I know the recommended fudge factors are never ever explained by more than "this works empirically"...
>>>> The fudge factors are totally empirical. If you are proposing a more formal approach, I shall try a 90 per cent fudge factor, although 'current rate' varies here.
>>> My hypothesis is that we can get away with less fudge as we have a better handle on the actual wire size. Personally, I do start at 95% to figure out the trade-off between bandwidth loss and latency increase.
>> You are now saying something slightly different. You are implying now that you are starting at 95 per cent, and then reducing the nominal download speed until you achieve an unspecified endpoint.
> So I typically start with 95%, run RRUL and look at the ping latency increase under load. I try to go as high with the bandwidth as I can and still keep the latency increase close to 10ms (the default fq_codel target of 5ms will allow RTT increases of 5ms in both directions so it adds up to 10). The last time I tried this I ended up at 97% of link rate.
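> A minimal sketch of that check, assuming GNU ping and some nearby, reliably answering host (192.0.2.1 is just a placeholder):
>
> ping -c 60 192.0.2.1 | tail -1   # idle baseline (rtt min/avg/max/mdev)
> # ...now start RRUL or several parallel up- and downloads, then:
> ping -c 60 192.0.2.1 | tail -1   # RTT under load
>
> and then nudge the shaped rate up or down until the loaded average stays within roughly 10 ms of the idle one.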
I see the rationale. I have tried something similar, but found it very
time consuming. I did not arrive at a clear, reproducible end point. I
hope it works for others.
>
>>>>>> Devices express 2 values: the sync rate - or 'maximum rate attainable' - and the dynamic value of 'current rate'.
>>>>>>
>>>>> The actual data rate is the relevant information for shaping; often DSL modems report the link capacity as "maximum rate attainable" or some such, while the actual bandwidth is limited by contract to a rate below what the line would support (often this bandwidth reduction is performed on the PPPoE link to the BRAS).
>>>>>
>>>>>
>>>>>> As the sync rate is fairly stable for any given installation - ADSL or Fibre - this could be used as a starting value, decremented by the traditional 15 per cent of 'overhead', with the 85 per cent fudge factor applied to that.
>>>>>>
>>>>> I would like to propose using the "current rate" as the starting point, as 'maximum rate attainable' >= 'current rate'.
>>>> 'current rate' is still a sync rate, and so is conventionally viewed as 15 per cent above the unmeasurable actual rate.
>>> No no, the current rate really is the current link capacity between modem and DSLAM (or CPE and CMTS), only this rate is typically for the raw ATM stream, so we have to subtract all the additional layers until we reach the IP layer...
>> You are saying the same thing as I am.
> I guess the point I want to make is that we are able to measure the unmeasurable actual rate; that is what the link layer adaptation does for us, if configured properly :)
>
> Best Regards
> Sebastian
>
>>>> As you are proposing a new approach, I shall take 90 per cent of 'current rate' as a starting point.
>>> I would love to learn how that works out for you. Because for all my theories about why 85% was used, the proof still is in the (plum-) pudding...
>>>
>>>> No one in the UK uses SRA currently. One small ISP used to.
>>> That is sad, because on paper SRA looks like a good feature to have (lower bandwidth sure beats synchronization loss).
>>>
>>>> The ISP I currently use has Dynamic Line Management, which changes target SNR constantly.
>>> Now that is much better, as we should neither notice nor care; I assume that this happens on layers below ATM even.
>>
>>>> The DSLAM is made by Infineon.
>>>>
>>>>
>>>>>> Fibre - FTTC - connections can suffer quite large download speed fluctuations over the 200 - 500 metre link to the MSAN. This phenomenon is not confined to ADSL links.
>>>>>>
>>>>> On the actual xDSL link? As far as I know no telco actually uses SRA (seamless rate adaptation or so), so the current link speed will only get lower, not higher; so I would expect a relatively stable current rate (it might take a while, a few days, to slowly degrade to the highest link speed supported under all conditions, but I hope you still get my point).
>>>> I understand the point, but from data I have seen (which I cannot find now, unfortunately) I do not think that is the case.
>>> I see, maybe my assumption here is wrong, I would love to see data though before changing my hypothesis.
>>>
>>>>>> An alternative speed test is something like this
>>>>>>
>>>>>>
>>>>>> http://download.bethere.co.uk/downloadMeter.html
>>>>>>
>>>>>>
>>>>>> which, as Be has been bought by Sky, may not exist after the end of April 2014.
>>>>>>
>>>>> But, if we recommend running speed tests, we really need to advise our users to start several concurrent up- and downloads to independent servers to actually measure the bandwidth of our bottleneck link; often a single server connection will not saturate a link (I seem to recall that with TCP it is guaranteed to only reach 75% or so averaged over time, is that correct?).
>>>>> But I think this is not the proper way to set the bandwidth for the shaper, because upstream of our link to the ISP we have no guaranteed bandwidth at all and can just hope the ISP is doing the right thing AQM-wise.
>>>>>
>>>> I quote the Be site as an alternative to a java based approach. I would be very happy to see your suggestion adopted.
>>>>>
>>>>>> • [What is the proper description here?] If you use PPPoE (but not over ADSL/DSL link), PPPoATM, or bridging that isn’t Ethernet, you should choose [what?] and set the Per-packet Overhead to [what?]
>>>>>>
>>>>>> For a PPPoA service, the PPPoA link is treated as PPPoE on the second device, here running CeroWrt.
>>>>>>
>>>>> This still means you should specify the PPPoA overhead, not PPPoE.
>>>> I shall try the PPPoA overhead.
>>> Great, let me know how that works.
>>>
>>>>>> The packet overhead values are written in the dubious man page for tc_stab.
>>>>>>
>>>>> The only real flaw in that man page, as far as I know, is the fact that it indicates that the kernel will account for the 18-byte Ethernet header automatically, while the kernel does no such thing (which I hope to change).
>>>> It mentions link layer types as 'atm', 'ethernet' and 'adsl'. There is no reference anywhere to the last. I do not see its relevance.
>>> If you have a look inside the source code for tc and the kernel, you will notice that atm and adsl are aliases for the same thing. I just think that we should keep naming the thing ATM, since that is the problematic layer in the stack that causes most of the trouble in judging the usable link rate; adsl just happens to use ATM exclusively.
>> I have reviewed the source. I see what you mean.
>>>>>> Sebastian has a potential alternative method of formal calculation.
>>>>>>
>>>>> So, I have no formal calculation method available, but an empirical way of detecting ATM quantization as well as measuring the per packet overhead of an ATM link.
>>>>> The idea is to measure the RTT of ICMP packets of increasing length and then display the distribution of RTTs by ICMP packet length: on an ATM carrier we expect to see a step function with steps 48 bytes apart, while on a non-ATM carrier we expect to see a smooth ramp. We then compare the residuals of a linear fit of the data with the residuals of the best step-function fit; the fit with the lower residuals "wins". Attached you will find an example of this approach: ping data in red (median of NNN repetitions for each ICMP packet size), linear fit in blue, and best staircase fit in green. You will notice that the data starts somewhere inside a 48-byte ATM cell. Since the ATM encapsulation overhead is at most 44 bytes and we know the IP and ICMP overhead of the ping probe, we can calculate the overhead preceding the IP header, which is what needs to be put in the overhead field in the GUI. (Note where the green line intersects the y-axis at 0 bytes packet size? That is where the IP header starts; the "missing" part of this ATM cell is the overhead.)
>>>>>
>>>> You are curve fitting. This is calculation.
>>> I see, that is certainly a valid way to look at it, just one that had not occurred to me.
>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Believe it or not, this method works reasonably well (I tested successfully with one bridged, LLC/SNAP RFC-1483/2684 connection (overhead 32 bytes), and several PPPoE, LLC connections (overhead 40), from ADSL1 @ 3008/512 to ADSL2+ @ 16402/2558). But it takes a relatively long time to measure the ping train, especially at the higher rates… and it requires ping time stamps with decent resolution (which rules out Windows), and my naive data acquisition script creates really large raw data files. I guess I should post the code somewhere so others can test and improve it.
>>>>> Fred, I would be delighted to get a data set from your connection, to test a known different encapsulation.
>>>>>
>>>> I shall try this. If successful, I shall initially pass you the raw data.
>>> Great, but be warned this will be hundreds of megabytes. (For production use the measurement script would need to prune the generated log file down to the essential values… and potentially store the data in binary)
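>>> As a stop-gap, something like this should shrink the raw log to "payload-size RTT" pairs before the analysis (a sketch, assuming GNU ping's reply format and the log file naming from the script further down):
>>>
>>> awk '/bytes from/ { t=$(NF-1); sub("time=","",t); print $1-8, t }' ping_sweep_ADSL2_*.txt > sweep_reduced.txt
>>>
>>> ($1 is the reply size, i.e. the -s payload plus the 8-byte ICMP header, and the time= field carries the RTT in milliseconds.)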
>>>
>>>> I have not used MatLab since the 1980s.
>>> Lucky you, I sort of have to use matlab in my day job and hence am most "fluent" in matlabese, but the code should also work with octave (I tested version 3.6.4), so it should be relatively easy to run the analysis yourself. That said, I would love to get a copy of the ping sweep :)
>>>
>>>>>> TYPICAL OVERHEADS
>>>>>> The following values are typical for different adsl scenarios (based on
>>>>>> [1] and [2]):
>>>>>>
>>>>>> LLC based:
>>>>>> PPPoA - 14 (PPP - 2, ATM - 12)
>>>>>> PPPoE - 40+ (PPPoE - 8, ATM - 18, ethernet 14, possibly FCS - 4+padding)
>>>>>> Bridged - 32 (ATM - 18, ethernet 14, possibly FCS - 4+padding)
>>>>>> IPoA - 16 (ATM - 16)
>>>>>>
>>>>>> VC Mux based:
>>>>>> PPPoA - 10 (PPP - 2, ATM - 8)
>>>>>> PPPoE - 32+ (PPPoE - 8, ATM - 10, ethernet 14, possibly FCS - 4+padding)
>>>>>> Bridged - 24+ (ATM - 10, ethernet 14, possibly FCS - 4+padding)
>>>>>> IPoA - 8 (ATM - 8)
>>>>>>
>>>>>>
>>>>>> For VC Mux based PPPoA, I am currently using an overhead of 18 for the PPPoE setting in ceroWRT.
>>>>>>
>>>>> Yeah we could put this list into the wiki, but how shall a typical user figure out which encapsulation is used? And good luck in figuring out whether the frame check sequence (FCS) is included or not…
>>>>> BTW, regarding the 18: I predict that if PPPoE is only used between CeroWrt and the "modem" or gateway, your effective overhead should be 10 bytes; I would love it if you could run the following against your link at night (also attached
>>>>>
>>>>>
>>>>>
>>>>> ):
>>>>>
>>>>> #! /bin/bash
>>>>> # TODO: use seq or bash to generate a list of the requested sizes (to allow for non-equidistantly spaced sizes)
>>>>>
>>>>> TECH=ADSL2          # just to give some meaning to the ping trace file name
>>>>> # Finding a proper target IP is somewhat of an art: traceroute a remote site
>>>>> # and find the nearest host reliably responding to pings with the smallest variation of ping times.
>>>>> TARGET=${1}         # the IP against which to run the ICMP pings
>>>>> DATESTR=`date +%Y%m%d_%H%M%S`   # to allow multiple sequential records
>>>>> LOG=ping_sweep_${TECH}_${DATESTR}.txt
>>>>>
>>>>>
>>>>> # By default non-root ping will only send one packet per second, so work around that by calling ping independently for each packet.
>>>>> # Empirically figure out the shortest period still giving the standard ping time (to avoid being slow-pathed by our target).
>>>>> PINGPERIOD=0.01     # in seconds
>>>>> PINGSPERSIZE=10000
>>>>>
>>>>> # Start of the sweep range, needed to find the per-packet overhead dependent on the ATM encapsulation.
>>>>> # To reliably show ATM quantization one would like to see at least two steps, so cover a range > 2 ATM cells (so > 96 bytes).
>>>>> SWEEPMINSIZE=16     # 64bit systems seem to require 16 bytes of payload to include a timestamp...
>>>>> SWEEPMAXSIZE=116
>>>>>
>>>>> n_SWEEPS=`expr ${SWEEPMAXSIZE} - ${SWEEPMINSIZE}`
>>>>>
>>>>> i_sweep=0
>>>>> i_size=0
>>>>>
>>>>> echo "Running ICMP RTT measurement against: ${TARGET}"
>>>>> while [ ${i_sweep} -lt ${PINGSPERSIZE} ]
>>>>> do
>>>>>     (( i_sweep++ ))
>>>>>     echo "Current iteration: ${i_sweep}"
>>>>>     # now loop from sweepmin to sweepmax
>>>>>     i_size=${SWEEPMINSIZE}
>>>>>     while [ ${i_size} -le ${SWEEPMAXSIZE} ]
>>>>>     do
>>>>>         echo "${i_sweep}. repetition of ping size ${i_size}"
>>>>>         ping -c 1 -s ${i_size} ${TARGET} >> ${LOG} &
>>>>>         (( i_size++ ))
>>>>>         # we need a sleep binary that allows non-integer times (GNU sleep is fine, as is sleep on macosx 10.8.4)
>>>>>         sleep ${PINGPERIOD}
>>>>>     done
>>>>> done
>>>>> echo "Done... ($0)"
>>>>>
>>>>>
>>>>> This will try to run 10000 repetitions for ICMP packet sizes from 16 to 116 bytes, taking (10000 * 101 * 0.01 / 60 =) 168 minutes, but you should be able to stop it with ctrl-c if you are not patient enough. With your link I would estimate that 3000 repetitions should be plenty, but if you could run it over night that would be great; then the ~3 hours should not matter much.
>>>>> And then run the following attached code in octave or matlab.
>>>>>
>>>>>
>>>>>
>>>>> Invoke with "tc_stab_parameter_guide_03('path/to/the/data/file/you/created/name_of_said_file')". The parser will run on the first invocation and is really, really slow, but further invocations should be faster. If issues arise, let me know; I am happy to help.
>>>>>
>>>>>
>>>>>> Were I to use a single directly connected gateway, I would input a suitable value for PPPoA in that OpenWrt firmware.
>>>>>>
>>>>> I think you should do that right now.
>>>> The firmware has not yet been released.
>>>>>> In theory, I might need to use a negative value, but the current kernel does not support that.
>>>>>>
>>>>> If you use tc_stab, negative overheads are fully supported; only htb_private has overhead defined as an unsigned integer and hence does not allow negative values.
>>>> Jesper Brouer posted about this. I thought he was referring to tc_stab.
>>> I recall having a discussion with Jesper about this topic, where he agreed that tc_stab was not affected, only htb_private.
>> Reading what was said on 23rd August, you corrected his error in interpretation.
>>
>>
>>>>>> I have used many different arbitrary values for overhead. All appear to have little effect.
>>>>>>
>>>>> So the issue here is that only at small packet sizes do the overhead and last-cell padding eat a disproportionate amount of your bandwidth (64-byte packet plus 44-byte overhead plus 47-byte worst-case cell padding: 100*(44+47+64)/64 = 242% effective packet size compared to what the shaper estimated), while at typical packet sizes the maximum error (44 bytes of missing overhead plus a potentially misjudged cell padding of 47 bytes) adds up to a theoretical 100*(44+47+1500)/1500 = 106% effective packet size compared to what the shaper estimated. It is obvious that at 1500-byte packets the whole ATM issue can be easily dismissed by just reducing the link rate by ~10% for the 48-in-53 framing and an additional ~6% for overhead and cell padding. But once you mix smaller packets into your traffic, say for VoIP, the effective wire-size misjudgment will kill your ability to control the queueing. Note that the common wisdom of "shape down to 85%" might stem from the ~15% ATM "tax" on 1500-byte traffic...
>>>>>
>>>>>
>>>>>> As I understand it, the current recommendation is to use tc_stab in preference to htb_private. I do not know the basis for this value judgement.
>>>>>>
>>>>> In short: tc_stab allows negative overheads, and tc_stab works with HTB, TBF and HFSC, while htb_private only works with HTB. Currently htb_private has two advantages: it will estimate the per-packet overhead correctly if GSO (generic segmentation offload) is enabled, and it will produce exact ATM link layer estimates for all possible packet sizes. In practice almost everyone uses an MTU of 1500 or less for their internet access, making both htb_private advantages effectively moot. (Plus, if no one beats me to it, I intend to address both theoretical shortcomings of tc_stab next year.)
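>>>>> (If one wants to sidestep the GSO caveat with tc_stab altogether, turning GSO off on the shaped interface should do it; a sketch only, with the interface name as a placeholder:
>>>>>
>>>>> ethtool -K ge00 gso off
>>>>>
>>>>> whether that is needed at all depends on the driver; "ethtool -k ge00" shows the current offload settings.)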
>>>>>
>>>>> Best Regards
>>>>> Sebastian
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> On 28/12/13 10:01, Sebastian Moeller wrote:
>>>>>>
>>>>>>> Hi Rich,
>>>>>>>
>>>>>>> great! A few comments:
>>>>>>>
>>>>>>> Basic Settings:
>>>>>>> [Is 95% the right fudge factor?] I think that ideally, if we can precisely measure the usable link rate, even 99% of that should work out well, to keep the queue in our device. I assume that, due to the difficulties in measuring and accounting for the link properties such as link layer and overhead, people typically rely on setting the shaped rate a bit lower than required to stochastically/empirically account for the link properties. I predict that if we get a correct description of the link properties to the shaper we should be fine with 95% shaping. Note though, it is not trivial on an ADSL link to get the actually usable bit rate from the modem, so 95% of what can be deduced from the modem or the ISP's invoice might be a decent proxy…
>>>>>>>
>>>>>>> [Do we have a recommendation for an easy way to tell if it's working? Perhaps a link to a new Quick Test for Bufferbloat page. ] The linked page looks like a decent probe for buffer bloat.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Basic Settings - the details...
>>>>>>>>
>>>>>>>> CeroWrt is designed to manage the queues of packets waiting to be sent across the slowest (bottleneck) link, which is usually your connection to the Internet.
>>>>>>>>
>>>>>>>>
>>>>>>> I think we can only actually control the first link to the ISP, which often happens to be the bottleneck. At a typical DSLAM (xDSL head-end station) the cumulative bandwidth sold to the customers is larger than the backbone connection (this is called over-subscription and is almost guaranteed to be the case in every DSLAM), which typically is not a problem, as people do not use their internet that much. My point being that we cannot really control congestion in the DSLAM's uplink (as we have no idea what the reserved rate per customer is in the worst case, if there is any).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> CeroWrt can automatically adapt to network conditions to improve the delay/latency of data without any settings.
>>>>>>>>
>>>>>>>>
>>>>>>> Does this describe the default fq_codels on each interface (except fib?)?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> However, it can do a better job if it knows more about the actual link speeds available. You can adjust this setting by entering link speeds that are a few percent below the actual speeds.
>>>>>>>>
>>>>>>>> Note: it can be difficult to get an accurate measurement of the link speeds. The speed advertised by your provider is a starting point, but your experience often won't meet their published specs. You can also use a speed test program or web site like
>>>>>>>>
>>>>>>>> http://speedtest.net
>>>>>>>>
>>>>>>>> to estimate actual operating speeds.
>>>>>>>>
>>>>>>>>
>>>>>>> While this approach is commonly recommended on the internet, I do not believe that it is that useful. Between a user and the speedtest site there are a number of potential congestion points that can affect (reduce) the throughput, like bad peering. That said, the speedtest will report something <= the actual link speed and hence be conservative (interactivity stays great at 90% of link rate as well as at 80%, so underestimating the bandwidth within reason does not affect the latency gains from traffic shaping, it just sacrifices a bit more bandwidth; and given the difficulty of actually measuring the attainable bandwidth, this might still have been an effectively decent recommendation, even though the theory behind it seems flawed).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Be sure to make your measurement when the network is quiet, and others in your home aren’t generating traffic.
>>>>>>>>
>>>>>>>>
>>>>>>> This is great advice.
>>>>>>>
>>>>>>> I would love to comment further, but after reloading
>>>>>>>
>>>>>>> http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310
>>>>>>>
>>>>>>> just returns a blank page and I can not get back to the page as of yesterday evening… I will have a look later to see whether the page resurfaces…
>>>>>>>
>>>>>>> Best
>>>>>>> Sebastian
>>>>>>>
>>>>>>>
>>>>>>> On Dec 27, 2013, at 23:09 , Rich Brown
>>>>>>>
>>>>>>> <richb.hanover@gmail.com>
>>>>>>>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>> You are a very good writer and I am on a tablet.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>>
>>>>>>>>> Ill take a pass at the wiki tomorrow.
>>>>>>>>>
>>>>>>>>> The shaper does up and down was my first thought...
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Everyone else… Don’t let Dave hog all the fun! Read the tech note and give feedback!
>>>>>>>>
>>>>>>>> Rich
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Dec 27, 2013 10:48 AM, "Rich Brown" <richb.hanover@gmail.com>
>>>>>>>>>
>>>>>>>>> wrote:
>>>>>>>>> I updated the page to reflect the 3.10.24-8 build, and its new GUI pages.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> There are still lots of open questions. Comments, please.
>>>>>>>>>
>>>>>>>>> Rich
>>>>>>>>> _______________________________________________
>>>>>>>>> Cerowrt-devel mailing list
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Cerowrt-devel@lists.bufferbloat.net
>>>>>>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>>>>>>> _______________________________________________
>>>>>>>> Cerowrt-devel mailing list
>>>>>>>>
>>>>>>>>
>>>>>>>> Cerowrt-devel@lists.bufferbloat.net
>>>>>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>>>>>> _______________________________________________
>>>>>>> Cerowrt-devel mailing list
>>>>>>>
>>>>>>>
>>>>>>> Cerowrt-devel@lists.bufferbloat.net
>>>>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2013-12-28 20:36 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-27 18:48 [Cerowrt-devel] Update to "Setting up SQM for CeroWrt 3.10" web page. Comments needed Rich Brown
2013-12-27 19:53 ` Dave Taht
2013-12-27 22:09 ` Rich Brown
2013-12-28 10:01 ` Sebastian Moeller
2013-12-28 11:09 ` Fred Stratton
2013-12-28 13:42 ` Sebastian Moeller
2013-12-28 14:27 ` Fred Stratton
2013-12-28 19:54 ` Sebastian Moeller
2013-12-28 20:09 ` Fred Stratton
2013-12-28 20:29 ` Sebastian Moeller
2013-12-28 20:36 ` Fred Stratton
2013-12-28 14:27 ` Rich Brown
2013-12-28 20:24 ` Sebastian Moeller
2013-12-28 20:31 ` Fred Stratton
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox