From: Sebastian Moeller
To: Fred Stratton
Cc: cerowrt-devel@lists.bufferbloat.net
Date: Sat, 28 Dec 2013 20:54:23 +0100
Subject: Re: [Cerowrt-devel] Update to "Setting up SQM for CeroWrt 3.10" web page. Comments needed.
In-Reply-To: <52BEDFC8.6000301@imap.cc>
Message-Id: <3E4BB4E0-66CC-4026-AF8C-41D3158BCD3B@gmx.de>
References: <75A7B6AE-8ECA-4FAC-B4D3-08FD14078DA2@gmail.com> <52BEB166.903@imap.cc> <48F50AF1-018A-400F-BBA4-D6F6B95B8AD2@gmx.de> <52BEDFC8.6000301@imap.cc>

Hi Fred,

On Dec 28, 2013, at 15:27 , Fred Stratton wrote:

> 
> On 28/12/13 13:42, Sebastian Moeller wrote:
>> Hi Fred,
>> 
>> On Dec 28, 2013, at 12:09 , Fred Stratton wrote:
>> 
>>> The UK consensus fudge factor has always been 85 per cent of the rate achieved, not 95 or 99 per cent.
>>> 
>> I know that the recommendations have been lower in the past; I think this is partly because, before Jesper Brouer's and Russell Stuart's work to properly account for ATM "quantization", people typically had to deal with a ~10% rate tax for the 5 byte per cell overhead (48 byte payload in 53 byte cells, 90.57% usable rate) plus an additional 5% to stochastically account for the padding of the last cell and the per-packet overhead, both of which affect the effective goodput far more for small than for large packets, so the 85% never worked well for all packet sizes. My hypothesis now is that, since we can and do properly account for these effects of ATM framing, we can afford to start with a fudge factor of 90% or even 95%. As far as I know the recommended fudge factors are never explained by more than "this works empirically"...
> 
> The fudge factors are totally empirical. If you are proposing a more formal approach, I shall try a 90 per cent fudge factor, although 'current rate' varies here.

My hypothesis is that we can get away with less fudge as we have a better handle on the actual wire size. Personally, I do start at 95% to figure out the trade-off between bandwidth loss and latency increase.

>> 
>>> Devices express 2 values: the sync rate - or 'maximum rate attainable' - and the dynamic value of 'current rate'.
>>> 
>> The actual data rate is the relevant information for shaping; often DSL modems report the link capacity as "maximum rate attainable" or some such, while the actual bandwidth is limited to a rate below what the line would support by contract (often this bandwidth reduction is performed on the PPPoE link to the BRAS).
>> 
>>> As the sync rate is fairly stable for any given installation - ADSL or Fibre - this could be used as a starting value, decremented by the traditional 15 per cent of 'overhead', and the 85 per cent fudge factor applied to that.
>>> 
>> I would like to propose to use the "current rate" as starting point, as 'maximum rate attainable' >= 'current rate'.
> 
> 'current rate' is still a sync rate, and so is conventionally viewed as 15 per cent above the unmeasurable actual rate.

No no, the current rate really is the current link capacity between modem and DSLAM (or CPE and CTS), only this rate typically is for the raw ATM stream, so we have to subtract all the additional layers until we reach the IP layer...

> As you are proposing a new approach, I shall take 90 per cent of 'current rate' as a starting point.

I would love to learn how that works out for you. Because for all my theories about why 85% was used, the proof still is in the (plum-) pudding...

> 
> No one in the UK uses SRA currently. One small ISP used to.

That is sad, because on paper SRA looks like a good feature to have (lower bandwidth sure beats synchronization loss).

> The ISP I currently use has Dynamic Line Management, which changes target SNR constantly.

Now that is much better, as we should neither notice nor care; I assume that this happens on layers below ATM even.

> The DSLAM is made by Infineon.
> 
>> 
>>> Fibre - FTTC - connections can suffer quite large download speed fluctuations over the 200 - 500 metre link to the MSAN. This phenomenon is not confined to ADSL links.
>>> 
>> On the actual xDSL link? As far as I know no telco actually uses SRA (seamless rate adaptation or so), so the current link speed will only get lower, not higher; I would therefore expect a relatively stable current rate (it might take a while, a few days, to actually slowly degrade to the highest link speed supported under all conditions, but I hope you still get my point).

> I understand the point, but do not think it is the case, from data I have seen, but cannot find now, unfortunately.

I see, maybe my assumption here is wrong; I would love to see data though before changing my hypothesis.

>> 
>>> An alternative speed test is something like this
>>> 
>>> http://download.bethere.co.uk/downloadMeter.html
>>> 
>>> which, as Be has been bought by Sky, may not exist after the end of April 2014.
>>> 
>> But, if we recommend to run speed tests, we really need to advise our users to start several concurrent up- and downloads to independent servers to actually measure the bandwidth of our bottleneck link; often a single server connection will not saturate a link (I seem to recall that with TCP it is guaranteed to only reach 75% or so averaged over time, is that correct?).
>> But I think this is not the proper way to set the bandwidth for the shaper, because upstream of our link to the ISP we have no guaranteed bandwidth at all and just can hope the ISP is doing the right thing AQM-wise.
>> 
> 
> I quote the Be site as an alternative to a java based approach.
I would be very happy to see your suggestion adopted.

>> 
>>> • [What is the proper description here?] If you use PPPoE (but not over ADSL/DSL link), PPPoATM, or bridging that isn't Ethernet, you should choose [what?] and set the Per-packet Overhead to [what?]
>>> 
>>> For a PPPoA service, the PPPoA link is treated as PPPoE on the second device, here running ceroWRT.
>>> 
>> This still means you should specify the PPPoA overhead, not PPPoE.
> 
> I shall try the PPPoA overhead.

Great, let me know how that works.

>> 
>>> The packet overhead values are written in the dubious man page for tc_stab.
>>> 
>> The only real flaw in that man page, as far as I know, is the fact that it indicates that the kernel will account for the 18 byte ethernet header automatically, while the kernel does no such thing (which I hope to change).

> It mentions link layer types as 'atm', 'ethernet' and 'adsl'. There is no reference anywhere to the last. I do not see its relevance.

If you have a look inside the source code for tc and the kernel, you will notice that atm and adsl are aliases for the same thing. I just think that we should keep naming the thing ATM, since that is the problematic layer in the stack that causes most of the usable link rate misjudgements; adsl just happens to use ATM exclusively.

>> 
>>> Sebastian has a potential alternative method of formal calculation.
>>> 
>> So, I have no formal calculation method available, but an empirical way of detecting ATM quantization as well as measuring the per packet overhead of an ATM link.
>> The idea is to measure the RTT of ICMP packets of increasing length and then display the distribution of RTTs by ICMP packet length; on an ATM carrier we expect to see a step function with steps 48 bytes apart, while on a non-ATM carrier we expect to see a smooth ramp. We then compare the residuals of a linear fit of the data with the residuals of the best step-function fit to the data; the fit with the lower residuals "wins". Attached you will find an example of this approach: ping data in red (median of NNN repetitions for each ICMP packet size), linear fit in blue, and best staircase fit in green. You notice that the data starts somewhere in a 48 byte ATM cell. Since the ATM encapsulation overhead is at most 44 bytes and we know the IP and ICMP overhead of the ping probe, we can calculate the overhead preceding the IP header, which is what needs to be put into the overhead field in the GUI. (Note where the green line intersects the y-axis at 0 bytes packet size? This is where the IP header starts; the "missing" part of this ATM cell is the overhead.)
>> 
> 
> You are curve fitting. This is calculation.

I see, that is certainly a valid way to look at it, just one that had not occurred to me.

>> 
>> Believe it or not, this method works reasonably well (I tested it successfully with one Bridged, LLC/SNAP RFC-1483/2684 connection (overhead 32 bytes), and several PPPoE, LLC (overhead 40) connections (from ADSL1 @ 3008/512 to ADSL2+ @ 16402/2558)). But it takes a relatively long time to measure the ping train, especially at the higher rates… and it requires ping time stamps with decent resolution (which rules out windows), and my naive data acquisition script creates really large raw data files. I guess I should post the code somewhere so others can test and improve it.
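
For reference, the cell arithmetic this staircase exploits is simple; here is a minimal sketch (illustrative only, not the measurement code itself, and it assumes plain AAL5 padding to whole 48 byte cells):

#! /bin/bash
# wire size of one IP packet on an ATM/AAL5 link (illustration only)
# $1: IP packet length in bytes, $2: per packet overhead preceding the IP header
ip_len=${1:-84}     # e.g. a default ping probe: 20 byte IP + 8 byte ICMP + 56 byte payload
overhead=${2:-40}   # e.g. PPPoE, LLC from the overhead table further down
cells=$(( (ip_len + overhead + 47) / 48 ))   # AAL5 pads up to a whole number of 48 byte cells
echo "${ip_len} byte packet + ${overhead} byte overhead -> ${cells} cells = $(( cells * 53 )) bytes on the wire"

Increasing ip_len by one byte leaves the wire size unchanged until a cell boundary is crossed, at which point another 53 byte cell is needed; those boundaries are 48 bytes apart in packet size, which is exactly the step the RTT fit looks for.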
>> Fred, I would be delighted to get a data set from your connection, to test a known different encapsulation.
>> 
> 
> I shall try this. If successful, I shall initially pass you the raw data.

Great, but be warned, this will be hundreds of megabytes. (For production use the measurement script would need to prune the generated log file down to the essential values… and potentially store the data in binary.)

> I have not used MatLab since the 1980s.

Lucky you, I sort of have to use matlab in my day job and hence am most "fluent" in matlabese, but the code should also work with octave (I tested version 3.6.4), so it should be relatively easy to run the analysis yourself. That said, I would love to get a copy of the ping sweep :)

>> 
>>> TYPICAL OVERHEADS
>>> The following values are typical for different adsl scenarios (based on
>>> [1] and [2]):
>>> 
>>> LLC based:
>>>     PPPoA - 14 (PPP - 2, ATM - 12)
>>>     PPPoE - 40+ (PPPoE - 8, ATM - 18, ethernet 14, possibly FCS - 4+padding)
>>>     Bridged - 32 (ATM - 18, ethernet 14, possibly FCS - 4+padding)
>>>     IPoA - 16 (ATM - 16)
>>> 
>>> VC Mux based:
>>>     PPPoA - 10 (PPP - 2, ATM - 8)
>>>     PPPoE - 32+ (PPPoE - 8, ATM - 10, ethernet 14, possibly FCS - 4+padding)
>>>     Bridged - 24+ (ATM - 10, ethernet 14, possibly FCS - 4+padding)
>>>     IPoA - 8 (ATM - 8)
>>> 
>>> For VC Mux based PPPoA, I am currently using an overhead of 18 for the PPPoE setting in ceroWRT.
>>> 
>> Yeah, we could put this list into the wiki, but how shall a typical user figure out which encapsulation is used? And good luck in figuring out whether the frame check sequence (FCS) is included or not…
>> BTW, regarding the 18: I predict that if PPPoE is only used between cerowrt and the "modem" or gateway, your effective overhead should be 10 bytes; I would love it if you could run the following against your link at night (also attached):
>> 
>> #! /bin/bash
>> # TODO use seq or bash to generate a list of the requested sizes (to allow for non-equidistantly spaced sizes)
>> 
>> TECH=ADSL2    # just to give some meaning to the ping trace file name
>> # finding a proper target IP is somewhat of an art, just traceroute a remote site
>> # and find the nearest host reliably responding to pings showing the smallest variation of ping times
>> TARGET=${1}    # the IP against which to run the ICMP pings
>> DATESTR=`date +%Y%m%d_%H%M%S`    # to allow multiple sequential records
>> LOG=ping_sweep_${TECH}_${DATESTR}.txt
>> 
>> # by default non-root ping will only send one packet per second, so work around that by calling ping independently for each packet
>> # empirically figure out the shortest period still giving the standard ping time (to avoid being slow-pathed by our target)
>> PINGPERIOD=0.01    # in seconds
>> PINGSPERSIZE=10000
>> 
>> # Start, needed to find the per packet overhead dependent on the ATM encapsulation
>> # to reliably show ATM quantization one would like to see at least two steps, so cover a range > 2 ATM cells (so > 96 bytes)
>> SWEEPMINSIZE=16    # 64bit systems seem to require 16 bytes of payload to include a timestamp...
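>> # the 116 byte maximum below, together with the 16 byte minimum, spans 100 bytes of payload sizes,
>> # i.e. a bit more than two 48 byte ATM cells, so at least two quantization steps should be visible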
>> SWEEPMAXSIZE=116
>> 
>> n_SWEEPS=`expr ${SWEEPMAXSIZE} - ${SWEEPMINSIZE}`
>> 
>> i_sweep=0
>> i_size=0
>> 
>> echo "Running ICMP RTT measurement against: ${TARGET}"
>> while [ ${i_sweep} -lt ${PINGSPERSIZE} ]
>> do
>>     (( i_sweep++ ))
>>     echo "Current iteration: ${i_sweep}"
>>     # now loop from sweepmin to sweepmax
>>     i_size=${SWEEPMINSIZE}
>>     while [ ${i_size} -le ${SWEEPMAXSIZE} ]
>>     do
>>         echo "${i_sweep}. repetition of ping size ${i_size}"
>>         ping -c 1 -s ${i_size} ${TARGET} >> ${LOG} &
>>         (( i_size++ ))
>>         # we need a sleep binary that allows non-integer times (GNU sleep is fine, as is sleep on macosx 10.8.4)
>>         sleep ${PINGPERIOD}
>>     done
>> done
>> echo "Done... ($0)"
>> 
>> This will try to run 10000 repetitions for ICMP packet sizes from 16 to 116 bytes, running for (10000 * 101 * 0.01 / 60 =) 168 minutes, but you should be able to stop it with ctrl-c if you are not patient enough; with your link I would estimate that 3000 repetitions should be plenty, but if you could run it over night that would be great, and then ~3 hours should not matter much.
>> And then run the following attached code in octave or matlab. Invoke it with "tc_stab_parameter_guide_03('path/to/the/data/file/you/created/name_of_said_file')". The parser will run on the first invocation and is really, really slow, but further invocations should be faster. If issues arise, let me know, I am happy to help.
>> 
>>> Were I to use a single directly connected gateway, I would input a suitable value for PPPoA in that openWRT firmware.
>>> 
>> I think you should do that right now.
> 
> The firmware has not yet been released.

>> 
>>> In theory, I might need to use a negative value, but the current kernel does not support that.
>>> 
>> If you use tc_stab, negative overheads are fully supported; only htb_private has overhead defined as an unsigned integer and hence does not allow negative values.
> 
> Jesper Brouer posted about this. I thought he was referring to tc_stab.

I recall having a discussion with Jesper about this topic, where he agreed that tc_stab was not affected, only htb_private.

Best Regards
	Sebastian

>> 
>>> I have used many different arbitrary values for overhead. All appear to have little effect.
>>> 
>> So the issue here is that only at small packet sizes do the overhead and last cell padding eat a disproportionate amount of your bandwidth (64 byte packet plus 44 byte overhead plus 47 byte worst case cell padding: 100 * (44+47+64)/64 = 242% effective packet size compared to what the shaper estimated); at typical packet sizes the maximum error (44 bytes of missing overhead and a potentially misjudged cell padding of 47 bytes) adds up to a theoretical 100 * (44+47+1500)/1500 = 106% effective packet size compared to what the shaper estimated. It is obvious that at 1500 byte packets the whole ATM issue can be easily dismissed by just reducing the link rate by ~10% for the 48 in 53 framing and an additional ~6% for overhead and cell padding. But once you mix smaller packets into your traffic, for say VoIP, the effective wire size misjudgement will kill your ability to control the queueing. Note that the common wisdom of shaping down to 85% might stem from the ~15% ATM "tax" on 1500 byte traffic...
>> 
>>> As I understand it, the current recommendation is to use tc_stab in preference to htb_private. I do not know the basis for this value judgement.
>>> 
>> In short: tc_stab allows negative overheads, and tc_stab works with HTB, TBF and HFSC, while htb_private only works with HTB. Currently htb_private has two advantages: it will estimate the per packet overhead correctly if GSO (generic segmentation offload) is enabled, and it will produce exact ATM link layer estimates for all possible packet sizes. In practice almost everyone uses an MTU of 1500 or less for their internet access, making both htb_private advantages effectively moot. (Plus, if no one beats me to it, I intend to address both theoretical shortcomings of tc_stab next year.)
>> 
>> Best Regards
>> 	Sebastian
>> 
>>> 
>>> On 28/12/13 10:01, Sebastian Moeller wrote:
>>> 
>>>> Hi Rich,
>>>> 
>>>> great! A few comments:
>>>> 
>>>> Basic Settings:
>>>> [Is 95% the right fudge factor?] I think that ideally, if we can precisely measure the usable link rate, even 99% of that should work out well, to keep the queue in our device. I assume that, due to the difficulties in measuring and accounting for the link properties such as link layer and overhead, people typically rely on setting the shaped rate a bit lower than required, to stochastically/empirically account for the link properties. I predict that if we get a correct description of the link properties to the shaper we should be fine with 95% shaping. Note though, it is not trivial on an ADSL link to get the actually usable bit rate from the modem, so 95% of what can be deduced from the modem or the ISP's invoice might be a decent proxy…
>>>> 
>>>> [Do we have a recommendation for an easy way to tell if it's working? Perhaps a link to a new Quick Test for Bufferbloat page.] The linked page looks like a decent probe for bufferbloat.
>>>> 
>>>>> Basic Settings - the details...
>>>>> 
>>>>> CeroWrt is designed to manage the queues of packets waiting to be sent across the slowest (bottleneck) link, which is usually your connection to the Internet.
>>>>> 
>>>> I think we can only actually control the first link to the ISP, which often happens to be the bottleneck. At a typical DSLAM (xDSL head end station) the cumulative sold bandwidth to the customers is larger than the backbone connection (which is called over-subscription and is almost guaranteed to be the case in every DSLAM); this typically is not a problem, as people usually do not use their internet that much. My point being: we can not really control congestion in the DSLAM's uplink (as we have no idea what the reserved rate per customer is in the worst case, if there is any).
>>>> 
>>>>> CeroWrt can automatically adapt to network conditions to improve the delay/latency of data without any settings.
>>>>> 
>>>> Does this describe the default fq_codels on each interface (except fib?)?
>>>> 
>>>>> However, it can do a better job if it knows more about the actual link speeds available. You can adjust this setting by entering link speeds that are a few percent below the actual speeds.
>>>>> 
>>>>> Note: it can be difficult to get an accurate measurement of the link speeds. The speed advertised by your provider is a starting point, but your experience often won't meet their published specs. You can also use a speed test program or web site like
>>>>> 
>>>>> http://speedtest.net
>>>>> 
>>>>> to estimate actual operating speeds.
>>>>> 
>>>> While this approach is commonly recommended on the internet, I do not believe that it is that useful. Between a user and the speedtest site there are a number of potential congestion points that can affect (reduce) the throughput, like bad peering. Now, that said, the speedtests will report something <= the actual link speed and hence be conservative (interactivity stays great at 90% of link rate as well as at 80%, so underestimating the bandwidth within reason does not affect the latency gains from traffic shaping, it just sacrifices a bit more bandwidth; and given the difficulty of actually measuring the attainable bandwidth, this might effectively have been a decent recommendation even though the theory of it seems flawed).
>>>> 
>>>>> Be sure to make your measurement when the network is quiet, and others in your home aren't generating traffic.
>>>>> 
>>>> This is great advice.
>>>> 
>>>> I would love to comment further, but after reloading,
>>>> 
>>>> http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310
>>>> 
>>>> just returns a blank page, and I can not get back to the page as of yesterday evening… I will have a look later to see whether the page resurfaces…
>>>> 
>>>> Best
>>>> Sebastian
>>>> 
>>>> On Dec 27, 2013, at 23:09 , Rich Brown wrote:
>>>> 
>>>>>> You are a very good writer and I am on a tablet.
>>>>>> 
>>>>> Thanks!
>>>>> 
>>>>>> I'll take a pass at the wiki tomorrow.
>>>>>> 
>>>>>> The shaper does up and down was my first thought...
>>>>>> 
>>>>> Everyone else… Don't let Dave hog all the fun! Read the tech note and give feedback!
>>>>> 
>>>>> Rich
>>>>> 
>>>>>> On Dec 27, 2013 10:48 AM, "Rich Brown" wrote:
>>>>>> I updated the page to reflect the 3.10.24-8 build, and its new GUI pages.
>>>>>> 
>>>>>> http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310
>>>>>> 
>>>>>> There are still lots of open questions. Comments, please.
>>>>>> 
>>>>>> Rich
>>>>>> _______________________________________________
>>>>>> Cerowrt-devel mailing list
>>>>>> Cerowrt-devel@lists.bufferbloat.net
>>>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
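
Appended for reference: a minimal sketch of how the overhead and linklayer values discussed above can be handed to the kernel via tc's stab option, together with an HTB shaper and fq_codel. The interface name and rates are illustrative assumptions only, not values taken from this thread; treat it as a starting point, not a tested configuration.

#! /bin/bash
# illustrative values - replace with your own interface, shaped rate and overhead
DEV=ge00          # WAN interface name (assumption)
RATE=2430         # shaped uplink rate in kbit/s, e.g. ~95% of a 2558 kbit/s sync rate
OVERHEAD=40       # per packet overhead in bytes, e.g. PPPoE, LLC from the table above

# stab makes the kernel size each packet the way the ATM link sees it (48-in-53 cells plus overhead)
tc qdisc add dev ${DEV} root handle 1: stab linklayer atm overhead ${OVERHEAD} htb default 10
tc class add dev ${DEV} parent 1: classid 1:10 htb rate ${RATE}kbit
tc qdisc add dev ${DEV} parent 1:10 fq_codel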