From: Fred Stratton
Date: Fri, 23 Aug 2013 14:02:41 +0100
To: cerowrt-devel@lists.bufferbloat.net
Subject: Re: [Cerowrt-devel] some kernel updates

On 23 Aug 2013, at 13:37, Sebastian Moeller wrote:

> Hi Jesper,
>
> thanks for your time…
>
> On Aug 23, 2013, at 13:16, Jesper Dangaard Brouer wrote:
>
>> On Fri, 23 Aug 2013 12:15:12 +0200
>> Sebastian Moeller wrote:
>>
>>> Hi Jesper,
>>>
>>> On Aug 23, 2013, at 09:27, Jesper Dangaard Brouer wrote:
>>>
>>>> On Thu, 22 Aug 2013 22:13:52 -0700
>>>> Dave Taht wrote:
>>>>
>>>>> On Thu, Aug 22, 2013 at 5:52 PM, Sebastian Moeller wrote:
>>>>>
>>>>>> Hi List, hi Jesper,
>>>>>>
>>>>>> So I tested 3.10.9-1 to assess the status of the HTB ATM link layer
>>>>>> adjustments, to see whether the recent changes resurrected this
>>>>>> feature. Unfortunately the htb_private link layer adjustment is
>>>>>> still broken (RRUL ping RTT against Toke's netperf host in Germany
>>>>>> of ~80ms, same as without link layer adjustments). On the bright
>>>>>> side, the tc_stab method still works as well as before (ping RTT
>>>>>> around 40ms).
>>>>>> I would like to humbly propose using the tc stab method in cerowrt
>>>>>> to perform ATM link layer adjustments by default. To repeat myself,
>>>>>> simply telling the kernel a lie about the packet size seems more
>>>>>> robust than fudging HTB's rate tables.
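
For orientation, the two alternatives under test look roughly as follows.
This is a sketch only, assuming cerowrt's ge00 WAN interface and the 40
byte overhead Sebastian mentions later, not the exact simple.qos
invocation:

    # (a) tc stab: a qdisc size table that "lies" about the packet size
    tc qdisc add dev ge00 root handle 1: stab linklayer atm overhead 40 \
        htb default 12
    # (b) htb_private: HTB's own rate-table linklayer adjustment
    tc class add dev ge00 parent 1: classid 1:12 htb rate 2430kbit \
        linklayer atm overhead 40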
>>>>
>>>> After the (regression) commit 56b765b79 ("htb: improved accuracy at
>>>> high rates"), the kernel no longer uses the rate tables.
>>>
>>> See, I am quite a layman here; spelunking through the tc and kernel
>>> source code made me believe that the rate tables are still used (I
>>> might have looked at too old versions of both repositories though).
>>>
>>>> My commit 8a8e3d84b1719 (net_sched: restore "linklayer atm" handling)
>>>> does the ATM cell overhead calculation directly on the packet length,
>>>> see psched_l2t_ns() doing (DIV_ROUND_UP(len,48)*53).
>>>> Thus, the cell calc should actually be more precise now... but see
>>>> below.
>>>
>>> Is there any way to make HTB report which link layer it assumes?
>>
>> I added some print debug statements in my patch, so you can see this in
>> the kernel log / dmesg. To activate the debugging print statements, run:
>>
>>   mount -t debugfs none /sys/kernel/debug/
>>   echo "func __detect_linklayer +p" \
>>       > /sys/kernel/debug/dynamic_debug/control
>>
>> Run your tc script, then run dmesg or look in the kernel syslog.
>
> Ah, unfortunately I am not set up to build new kernels for the router I
> am testing on, so I would hereby like to beg Dave to include that patch
> in one of the next releases. Would it not be a good idea to teach tc to
> report the link layer for HTB as it does for stab? Having to empirically
> figure out whether it is applied or not is somewhat cumbersome...
>
>>>>>> Especially since the kernel already fudges the packet size to
>>>>>> account for the ethernet header and then some, so this path should
>>>>>> receive more scrutiny by virtue of having more users?
>>>>
>>>> As you mention, the default kernel path (not tc stab) fudges the
>>>> packet size for Ethernet headers, AND I made a mistake (back in
>>>> approx 2006, sorry) such that the "overhead" cannot be a negative
>>>> number.
>>>
>>> Mmh, does this also apply to stab?
>>
>> This seems to be two questions...
>>
>> Yes, the Ethernet header size gets adjusted/added before the "stab"
>> call. For reference, see net/core/dev.c, function __dev_xmit_skb():
>>   it calls qdisc_pkt_len_init(skb);      // adjust Ethernet, account for GSO
>>   then qdisc_calculate_pkt_len(skb, q);  // this is the stab call
>>   (which calls __qdisc_calculate_pkt_len() in net/sched/sch_api.c)
>>
>> The qdisc_pkt_len_init() call was introduced by Eric in
>> v3.9-rc1~139^2~411.
>
> So I look at 3.10 here:
>
> net/core/dev.c, qdisc_pkt_len_init():
>   line 2628: qdisc_skb_cb(skb)->pkt_len = skb->len;
> and
>   line 2650: qdisc_skb_cb(skb)->pkt_len += (gso_segs - 1) * hdr_len;
> so the adjusted size does not seem to end up in skb->len.
>
> And then in net/sched/sch_api.c, __qdisc_calculate_pkt_len():
>   line 440: pkt_len = skb->len + stab->szopts.overhead;
>
> So to my eyes this looks like stab is not honoring the changes made in
> qdisc_pkt_len_init(), no? At least I fail to see where skb->len is
> assigned qdisc_skb_cb(skb)->pkt_len. But I happily admit that I am truly
> a novice in these matters and easily intimidated by C code.
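
To put numbers on that cell calculation, a minimal shell sketch (the 1500
byte packet and 40 byte overhead are example figures, and it assumes the
overhead is added to the length before the cell rounding):

    len=1500; overhead=40                      # packet plus assumed encapsulation
    cells=$(( (len + overhead + 47) / 48 ))    # DIV_ROUND_UP(len + overhead, 48)
    echo $(( cells * 53 ))                     # 1749 bytes on the wire, not 1540

Each 48 byte ATM payload rides in a 53 byte cell, and the last cell is
padded, which is exactly what the link layer adjustments have to account
for.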
>
>> Thus, in kernels >= 3.9 you would need to change/reduce your tc
>> "overhead" parameter by 14 bytes (if you accounted for the encapsulated
>> Ethernet header before).
>
> That is what I thought before, but my kernel spelunking made me
> reconsider and switch to not subtracting the 14 bytes, since as I
> understand it the kernel actively does not do this if stab is used.
>
>> The "overhead" of stab can be negative, so no problem here; it is an
>> "int" for stab.
>>
>>>> Meaning that some ATM encap overheads simply cannot be configured
>>>> correctly (as you need to subtract the ethernet header).
>>>
>>> Yes, I see; luckily PPPoA and IPoA seem quite rare, and setting the
>>> overhead to be larger than it actually is is relatively benign, as it
>>> will overestimate packet size.

As a point of information, the entire UK uses PPPoA rather than PPPoE,
and some hundreds of thousands of users IPoA.

>>>> (And it's quite problematic to change the kABI to allow for a
>>>> negative overhead.)
>>>
>>> Again I have no clue, but overhead seems to be integer, not unsigned,
>>> so why can it not be negative?
>>
>> Nope, for reference see include/uapi/linux/pkt_sched.h.
>>
>> This struct tc_ratespec is used by the normal "HTB/TBF" rate system;
>> notice "unsigned short overhead":
>>
>> struct tc_ratespec {
>>         unsigned char   cell_log;
>>         __u8            linklayer; /* lower 4 bits */
>>         unsigned short  overhead;
>>         short           cell_align;
>>         unsigned short  mpu;
>>         __u32           rate;
>> };
>>
>> This struct tc_sizespec is used by the stab system, where the overhead
>> is an int:
>>
>> struct tc_sizespec {
>>         unsigned char   cell_log;
>>         unsigned char   size_log;
>>         short           cell_align;
>>         int             overhead;
>>         unsigned int    linklayer;
>>         unsigned int    mpu;
>>         unsigned int    mtu;
>>         unsigned int    tsize;
>> };
>
> Ah, good to know.
>
>>>> Perhaps we should change to use "tc stab" for this reason. But I'm
>>>> not sure "stab" does the right thing either, and its accuracy is also
>>>> limited, as it's actually also table based.
>>>
>>> But why should a table be problematic here? As long as we can ensure
>>> the table covers the largest packet, we are golden. So either we do
>>> the manly and stupid thing and go for 9000 byte jumbo packets for the
>>> table size, or we assume that for the most part ATM users will at best
>>> use baby jumbo frames (I think BT does this to allow payload MTU 1500
>>> in spite of PPPoE encapsulation overhead), but then we are quite fine
>>> with the default size table maxMTU of 2048 bytes, no?
>>
>> It is the GSO problem that I'm worried about. The kernel will bunch up
>> packets, and that caused the length calculation issue... just disable
>> GSO:
>>
>>   ethtool -K eth63 gso off gro off tso off
>
> Oh, as always Dave has this tackled in cerowrt already; all offloads are
> off (at 16000/2500 Kbit/s there should be no need for offloads nowadays,
> I guess). But I see the issue now.
>
>>>> We could easily change the kernel to perform the ATM cell overhead
>>>> calc inside "stab", and we should also fix the GSO packet overhead
>>>> problem. (For now, remember to disable GSO packets when shaping.)
>>>
>>> Yeah, I stumbled over the fact that the stab mechanism does not honor
>>> the kernel's earlier adjustments of packet length (but I seem to be
>>> unable to find the actual file and line where this is initially
>>> handled). It would seem relatively easy to make stab take the earlier
>>> adjustment into account. Regarding GSO, I assumed that GSO will not
>>> play nicely with an AQM anyway, as a single large packet will hog too
>>> much transfer time...
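
A quick way to verify that the offloads really are off before trusting a
shaping run — a sketch, again assuming cerowrt's ge00 WAN interface:

    ethtool -k ge00 | egrep 'segmentation|generic-receive'
    # tcp-segmentation-offload, generic-segmentation-offload and
    # generic-receive-offload should all report: off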
>>
>> Yes, just disable GSO ;-)
>
> Done.
>
>>>>> It's my hope that the atm code works but is misconfigured. You can
>>>>> output the tc commands by overriding the TC variable with
>>>>> TC="echo tc" and paste here.
>>>>
>>>> I also hope it is a misconfig. Please show us the config/script.
>>>
>>> Will do this later. I would be delighted if it is just me being
>>> stupid.
>>>
>>>> I would appreciate a link to the scripts you are using... perhaps a
>>>> git tree?
>>>
>>> Unfortunately I have no git tree and no experience with git. I do not
>>> think I will be able to set something up quickly. But I use a modified
>>> version of cerowrt's AQM scripts, which I will post later.
>>
>> Someone just point me to the cerowrt git repo... please, and point me
>> at the simple.qos script.
>>
>> Did you add the "linklayer atm" yourself to Dave's script?
>
> Well, partly; the option for HTB was already in his script but
> under-tested. I changed the script to add stab and to allow easier
> configuration of overhead, mpu, mtu and tsize (just for stab) from the
> GUI, but the code is Dave's. I attached the scripts. functions.sh gets
> the values from the configuration GUI. I extended the way the linklayer
> option strings are created, but basically it is the same method that
> Dave used. And I do see the right overhead values appear in
> "tc -d qdisc", so at least something is reaching HTB. Sorry that I have
> no repository for easier access.
>
>>>>>> Now, I have been testing this using Dave's most recent cerowrt
>>>>>> alpha version with a 3.10.9 kernel on mips hardware. I think this
>>>>>> kernel should contain all htb fixes, including commit 8a8e3d84b17
>>>>>> (net_sched: restore "linklayer atm" handling), but am not fully
>>>>>> sure.
>>>>>
>>>>> It does.
>>>>
>>>> It has not hit the stable tree yet, but DaveM promised he would pass
>>>> it along.
>>>>
>>>> It does seem Dave Taht has my patch applied:
>>>> http://snapon.lab.bufferbloat.net/~cero2/patches/3.10.9-1/685-net_sched-restore-linklayer-atm-handling.patch
>>>
>>> Ah, good, so it should have worked.
>>
>> It should...
>>
>>>>>> While I am not able to build kernels, it seems that I am able to
>>>>>> quickly test whether link layer adjustments work or not. So I am
>>>>>> happy to help where I can :)
>>>>
>>>> So, what is your lab setup that allows you to test this quickly?
>>>
>>> Oh, Dave and Toke are the giants on whose shoulders I stand here
>>> (thanks guys); all I bring to the table basically is the fact that I
>>> have an ATM-carried ADSL2+ connection at home.
>>
>> I will soon have an ADSL lab again, so I will try to reproduce your
>> results. Actually, almost while typing this email, the postman arrived
>> at my house and delivered a new ADSL modem... as a local Danish ISP,
>> www.fullrate.dk, has been so kind to give me a testline for free
>> (thanks fullrate!).
>
> This is great! Even though I am quite sure that no real DSL link is
> actually required to test the effect of the link layer adjustments.
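
One way to do that without a DSL line — my sketch, not a setup anyone in
this thread describes: let a spare ethernet box play the modem, shaping
its egress to ADSL-like rates behind a deep FIFO (the interface name,
rate and buffer size below are all assumptions):

    # eth1 plays the DSL modem: ~2.5 Mbit/s uplink, ATM framing, fat buffer
    tc qdisc add dev eth1 root handle 1: stab linklayer atm overhead 40 \
        htb default 1
    tc class add dev eth1 parent 1: classid 1:1 htb rate 2544kbit
    tc qdisc add dev eth1 parent 1:1 handle 10: bfifo limit 60000

A shaper in front of this emulated link should then show the same
latency-versus-goodput trade-off as against a real modem.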
>
>>> Anyway, my theory is that proper link layer adjustments should only
>>> show up if not performing them would make my traffic exceed my link
>>> speed and hence accumulate in the DSL modem's bloated buffers, leading
>>> to measurable increases in latency. So I try to saturate both up- and
>>> down-link while measuring latency under different conditions. Since
>>> the worst case overhead of the ATM encapsulation approaches 50% (with
>>> best case being around 10%), I test the system while shaping to 95% of
>>> link rates, where I do expect to see an effect of the link layer
>>> adjustments, and while shaping to 50%, where I do not expect to see an
>>> effect. And basically that seems to work.
>>>
>>> Practically, I use Toke's netperf-wrapper project with the RRUL test,
>>> from my cerowrt router behind an ADSL2+ modem to a nearby netperf
>>> server in Germany. The link layer adjustments are configured in my
>>> cerowrt router, using Dave's simple.qos script (3 band HTB shaper with
>>> fq_codel on each leaf, taking my overhead of 40 bytes into account and
>>> optionally the link layer).
>>>
>>> It turns out that this test nicely saturates my link with 4 up and 4
>>> down TCP flows and uses a train of ping probes at a 0.2 second period
>>> to assess the latency induced by saturating the links. Now I shape
>>> down to 95% and 50% of line rates and simply look at the ping RTT plot
>>> for different conditions. In my rig I see around 30ms ping RTT without
>>> load, 80ms with full saturation and no link layer adjustments, and
>>> 40ms with working link layer adjustments (hand in hand with slightly
>>> reduced TCP goodput, just as one would expect). In my testing so far,
>>> activating the HTB link layer adjustments yielded the same 80ms delay
>>> I get without link layer adjustments. If I shape down to 50% of link
>>> rates, HTB, stab and no link layer adjustments all yield a ping RTT of
>>> ~40ms. Still, with proper link layer adjustments the TCP goodput is
>>> reduced even at 50% shaping. As Dave explained, with an unloaded
>>> swallow, ermm, ping RTT of 30ms and fq_codel's target set to 5ms, the
>>> best case would be 30ms + 2*5ms = 40ms, so I am pretty close to ideal
>>> with proper link layer adjustments.
>>>
>>> I guess it should be possible to simply use the reduction in goodput
>>> as an easy indicator of whether the link layer adjustments work or
>>> not. But to do this properly I would need to be able to control the
>>> size of the sent packets, which I am not, at least not with RRUL. But
>>> I am quite sure real computer scientists could easily set something up
>>> to test the goodput through a shaping device with differently sized
>>> packet streams of the same bandwidth, but I digress.
>>>
>>> On the other hand, I do not claim to be an expert in this field in any
>>> way, and my measurement method might be flawed; if you think so,
>>> please do not hesitate to let me know how I could improve it.
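
For the record, such a run looks something like this — a sketch, with a
placeholder hostname, and the flag spellings assumed from
netperf-wrapper's usage at the time:

    # 60s RRUL run: 4 up + 4 down TCP flows, ping probes every 0.2s
    netperf-wrapper -H netperf.example.net -l 60 -s 0.2 rrul
    # plot the ping track from the recorded data file (flags assumed)
    netperf-wrapper -i rrul-*.json.gz -f plot -p ping -o ping_rtt.png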
>>
>> I have a hard time following your description... sorry.
>
> Okay, I see; let me try to present the data in a more ordered fashion:
>
> BW: bandwidth [Kbit/s]
> LLAM: link-layer adjustment method
> LL: link layer
> GP: goodput [Kbit/s]
>
> # shaped downBW  (%)  upBW  (%)  LLAM  LL    ping RTT  downGP  upGP
> 1 no     16309  100   2544  100  none  none  300ms     10000   1600
> 2 yes    14698   90   2430   95  none  none   80ms     13600   1800
> 3 yes    14698   90   2430   95  stab  adsl   40ms     11600   1600
> 4 yes    15494   95   2430   95  stab  adsl   42ms     12400   1600
> 5 yes    14698   90   2430   95  htb   adsl   75ms     13200   1600
>
> 2 yes     7349   45   1215   48  none  none   45ms      6800   1000
> 4 yes     7349   45   1215   48  stab  adsl   42ms      5800    800
> 5 yes     7349   45   1215   48  htb   adsl   45ms      6600   1000
>
> Notes: upGP is way noisier than downGP and therefore harder to estimate.
>
> So conditions 3 and 4 show the best latency at high link saturation,
> where link layer adjustments actually make a difference by controlling
> whether the DSL modem will buffer or not.
> At ~50% link saturation there is not much, if any, effect of the link
> layer adjustments on latency, but it still leaves its hallmark as a
> goodput reduction. (The partial reduction for htb might be caused by the
> specification of 40 bytes of overhead, which seems to have been
> honored.)
> I take the disappearance of the latency effect at 50% as a control data
> point that shows my measurement approach seems sane enough.
> I hope this clears up the information I wanted to give you the first
> time around.
>
>> So, did you get a working low-latency setup by using 95% shaping and
>> "stab" linklayer adjustment?
>
> Yes. (With a 3 leaf HTB as shaper and fq_codel as leaf qdisc.)
>
> Best Regards
>   Sebastian
>
>> --
>> Best regards,
>>   Jesper Dangaard Brouer
>>   MSc.CS, Sr. Network Kernel Developer at Red Hat
>>   Author of http://www.iptv-analyzer.org
>>   LinkedIn: http://www.linkedin.com/in/brouer
>
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-devel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel