From: Sebastian Moeller
Date: Fri, 23 Aug 2013 14:37:47 +0200
To: Jesper Dangaard Brouer
Cc: cerowrt-devel@lists.bufferbloat.net
Subject: Re: [Cerowrt-devel] some kernel updates
In-Reply-To: <20130823131653.6aa3498f@redhat.com>

Hi Jesper,

thanks for your time...

On Aug 23, 2013, at 13:16, Jesper Dangaard Brouer wrote:

> On Fri, 23 Aug 2013 12:15:12 +0200
> Sebastian Moeller wrote:
>
>> Hi Jesper,
>>
>>
>> On Aug 23, 2013, at 09:27, Jesper Dangaard Brouer wrote:
>>
>>> On Thu, 22 Aug 2013 22:13:52 -0700
>>> Dave Taht wrote:
>>>
>>>> On Thu, Aug 22, 2013 at 5:52 PM, Sebastian Moeller wrote:
>>>>
>>>>> Hi List, hi Jesper,
>>>>>
>>>>> So I tested 3.10.9-1 to assess the status of the HTB ATM link layer
>>>>> adjustments, to see whether the recent changes resurrected this feature.
>>>>> Unfortunately the htb_private link layer adjustment is still
>>>>> broken (RRUL ping RTT against Toke's netperf host in Germany of ~80ms,
>>>>> the same as without link layer adjustments). On the bright side, the
>>>>> tc_stab method still works as well as before (ping RTT around 40ms).
>>>>> I would like to humbly propose using the tc stab method in
>>>>> cerowrt to perform ATM link layer adjustments by default. To repeat myself,
>>>>> simply telling the kernel a lie about the packet size seems more robust
>>>>> than fudging HTB's rate tables.
>>>
>>> After the (regression) commit 56b765b79 ("htb: improved accuracy at
>>> high rates"), the kernel no longer uses the rate tables.
>>
>> 	See, I am quite a layman here; spelunking through the tc and
>> kernel source code made me believe that the rate tables are still used
>> (I might have looked at too old versions of both repositories, though).
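
	(To make the two contenders concrete for anyone following along,
here is a rough sketch only; the interface, rate and overhead below are
placeholder values, not my tested configuration:

# htb_private: linklayer/overhead are part of the HTB rate specification
tc qdisc add dev ge00 root handle 1: htb default 12
tc class add dev ge00 parent 1: classid 1:1 htb rate 2430kbit linklayer atm overhead 40

# tc_stab: a size table on the root qdisc instead rewrites the packet
# length that the shaper below it gets to see
tc qdisc add dev ge00 root handle 1: stab linklayer atm overhead 40 htb default 12

The first folds the ATM framing into HTB's own rate calculation; the
second adjusts the packet length seen by HTB and anything below it.)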
>>
>>>
>>> My commit 8a8e3d84b1719 (net_sched: restore "linklayer atm" handling)
>>> does the ATM cell overhead calculation directly on the packet length;
>>> see psched_l2t_ns() doing (DIV_ROUND_UP(len,48)*53).
>>> Thus, the cell calc should actually be more precise now... but see below.
>>
>> 	Is there any way to make HTB report which link layer it assumes?
>
> I added some print debug statements in my patch, so you can see this in
> the kernel log / dmesg. To activate the debug print statements, run:
>
>  mount -t debugfs none /sys/kernel/debug/
>  echo "func __detect_linklayer +p" >> /sys/kernel/debug/dynamic_debug/control
>
> Then run your tc script, and run dmesg, or look in the kernel syslog.

	Ah, unfortunately I am not set up to build new kernels for the
router I am testing on, so I would hereby like to beg Dave to include
that patch in one of the next releases. Would it not be a good idea to
teach tc to report the link layer for HTB as it does for stab? Having to
empirically figure out whether it is applied or not is somewhat
cumbersome...

>
>
>>>
>>>>> Especially since the kernel already fudges
>>>>> the packet size to account for the ethernet header and then some, so this
>>>>> path should receive more scrutiny by virtue of having more users?
>>>
>>> As you mention, the default kernel path (not tc stab) fudges the packet
>>> size for Ethernet headers, AND I made a mistake (back in approx 2006,
>>> sorry) in that the "overhead" cannot be a negative number.
>>
>> 	Mmh, does this also apply to stab?
>
> This seems to be two questions...
>
> Yes, the Ethernet header size gets adjusted/added before the "stab"
> call.
> For reference, see net/core/dev.c, function __dev_xmit_skb(), which
>  calls qdisc_pkt_len_init(skb);         // adjusts for Ethernet and accounts for GSO
>  calls qdisc_calculate_pkt_len(skb, q); // the stab call
>  (the latter calls __qdisc_calculate_pkt_len() in net/sched/sch_api.c)
>
> The qdisc_pkt_len_init() call was introduced by Eric in
> v3.9-rc1~139^2~411.

	So I looked at 3.10 here: net/core/dev.c, qdisc_pkt_len_init():

line 2628:	qdisc_skb_cb(skb)->pkt_len = skb->len;
and
line 2650:	qdisc_skb_cb(skb)->pkt_len += (gso_segs - 1) * hdr_len;

so the adjusted size does not seem to end up in skb->len; and then in
net/sched/sch_api.c, __qdisc_calculate_pkt_len():

line 440:	pkt_len = skb->len + stab->szopts.overhead;

So to my eyes this looks like stab is not honoring the changes made in
qdisc_pkt_len_init(), no? At least I fail to see where skb->len is
assigned qdisc_skb_cb(skb)->pkt_len. But I happily admit that I am truly
a novice in these matters and easily intimidated by C code.

>
> Thus, in kernels >= 3.9, you would need to change/reduce your tc
> "overhead" parameter by 14 bytes (if you accounted for the encapsulated
> Ethernet header before).

	That is what I thought before, but my kernel spelunking made me
reconsider and switch to not subtracting the 14 bytes, since, as I
understand it, the kernel actively does not do this adjustment if stab
is used.

>
> The "overhead" of stab can be negative, so no problem here; it is an
> "int" for stab.
>
>
>>> Meaning that
>>> some ATM encap overheads simply cannot be configured correctly (as you
>>> need to subtract the ethernet header).
>>
>> 	Yes, I see; luckily PPPoA and IPoA seem quite rare, and setting
>> the overhead to be larger than it actually is is relatively benign, as
>> it will overestimate packet size.
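
	(To spell out the cell math of psched_l2t_ns() with example
numbers of my own choosing:

# every started 48-byte ATM cell costs 53 bytes on the wire,
# i.e. DIV_ROUND_UP(len, 48) * 53:
atm_wire_len() { echo $(( ( ($1 + 47) / 48 ) * 53 )); }

A 1500 byte packet plus my 40 bytes of overhead makes 1540 bytes of
payload, which needs 33 cells, i.e. 1749 bytes on the wire; a 49 byte
payload already needs 2 cells, i.e. 106 bytes. That is why small packets
are where ignoring the link layer hurts most, and why overestimating the
overhead is comparatively benign.)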
>>
>>> (And it's quite problematic to
>>> change the kABI to allow for a negative overhead.)
>>
>> 	Again, I have no clue, but overhead seems to be an integer, not
>> unsigned, so why can it not be negative?
>
> Nope; for reference, see include/uapi/linux/pkt_sched.h.
>
> This struct tc_ratespec is used by the normal "HTB/TBF" rate system;
> notice "unsigned short overhead".
>
> struct tc_ratespec {
> 	unsigned char	cell_log;
> 	__u8		linklayer; /* lower 4 bits */
> 	unsigned short	overhead;
> 	short		cell_align;
> 	unsigned short	mpu;
> 	__u32		rate;
> };
>
>
> This struct tc_sizespec is used by the stab system, where the overhead
> is an int.
>
> struct tc_sizespec {
> 	unsigned char	cell_log;
> 	unsigned char	size_log;
> 	short		cell_align;
> 	int		overhead;
> 	unsigned int	linklayer;
> 	unsigned int	mpu;
> 	unsigned int	mtu;
> 	unsigned int	tsize;
> };

	Ah, good to know.

>
>
>>>
>>> Perhaps we should change to use "tc stab" for this reason. But I'm not
>>> sure "stab" does the right thing either, and its accuracy is also
>>> limited, as it is actually also table based.
>>
>> 	But why should a table be problematic here? As long as we can
>> assure the table is equal to or larger than the largest packet, we are
>> golden. So either we do the manly and stupid thing and go for 9000 byte
>> jumbo packets for the table size, or we assume that for the most part
>> ATM users will at best use baby jumbo frames (I think BT does this to
>> allow a payload MTU of 1500 in spite of the PPPoE encapsulation
>> overhead), but then we are quite fine with the default size table
>> maxMTU of 2048 bytes, no?
>
>
> It is the GSO problem that I'm worried about. The kernel will bunch up
> packets, and that caused the length calculation issue... just disable
> GSO.
>
>  ethtool -K eth63 gso off gro off tso off

	Oh, as always Dave has this tackled in cerowrt already; all
offloads are off (at 16000Kbit/s / 2500Kbit/s there should be no need
for offloads nowadays, I guess). But I see the issue now.

>
>
>>> We could easily change the
>>> kernel to perform the ATM cell overhead calc inside "stab", and we
>>> should also fix the GSO packet overhead problem.
>>> (for now, remember to disable GSO packets when shaping)
>>
>> 	Yeah, I stumbled over the fact that the stab mechanism does not
>> honor the kernel's earlier adjustments of packet length (but I seem to
>> be unable to find the actual file and line where this is initially
>> handled). It would seem relatively easy to make stab take the earlier
>> adjustment into account. Regarding GSO, I assumed that GSO would not
>> play nicely with an AQM anyway, as a single large packet will hog too
>> much transfer time...
>>
>
> Yes, just disable GSO ;-)

	Done.

>
>
>>>
>>>> It's my hope that the atm code works but is misconfigured. You can output
>>>> the tc commands by overriding the TC variable with TC="echo tc" and paste
>>>> here.
>>>
>>> I also hope it is a misconfig. Please show us the config/script.
>>
>> 	Will do this later. I would be delighted if it is just me being
>> stupid.
>>
>>>
>>> I would appreciate a link to the scripts you are using... perhaps a git tree?
>>
>> 	Unfortunately I have no git tree and no experience with git. I
>> do not think I will be able to set something up quickly. But I use a
>> modified version of cerowrt's AQM scripts, which I will post later.
>>
>
> Someone just point me to the cerowrt git repo... please, and point me
> at the simple.qos script.
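
	(A hypothetical example, with made-up values, of what the
difference between the two structs above means in practice: stab will
accept a negative overhead, e.g.

tc qdisc add dev ge00 root handle 1: stab linklayer atm overhead -4 htb default 12

which one would want if the kernel's automatic +14 byte Ethernet
adjustment over-counts the real encapsulation; whether that adjustment
even reaches stab is exactly what we are debating above. The equivalent
"overhead -4" in an HTB rate specification cannot be expressed at all,
since struct tc_ratespec stores overhead as an unsigned short.)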
>

[Attachment: simple.qos]

#!/bin/sh

# Cero3 Shaper
# A 3 bin tc_codel and ipv6 enabled shaping script for
# ethernet gateways
# Copyright (C) 2012 Michael D Taht
# GPLv2

# Compared to the complexity that debloat had become,
# this cleanly shows a means of going from diffserv marking
# to prioritization using the current tools (ip(6)tables
# and tc). I note that the complexity of debloat exists for
# a reason, and it is expected that that script is run first
# to set up various other parameters such as BQL and ethtool.
# (And that the debloat script has set up the other interfaces.)

# You need to jiggle these parameters. Note limits are tuned towards
# a <10Mbit uplink, <60Mbit downlink.

. /usr/lib/aqm/functions.sh

ipt_setup() {

ipt -t mangle -N QOS_MARK_${IFACE}

ipt -t mangle -A QOS_MARK_${IFACE} -j MARK --set-mark 0x2
# You can go further with classification but...
ipt -t mangle -A QOS_MARK_${IFACE} -m dscp --dscp-class CS1 -j MARK --set-mark 0x3
ipt -t mangle -A QOS_MARK_${IFACE} -m dscp --dscp-class CS6 -j MARK --set-mark 0x1
ipt -t mangle -A QOS_MARK_${IFACE} -m dscp --dscp-class EF -j MARK --set-mark 0x1
ipt -t mangle -A QOS_MARK_${IFACE} -m dscp --dscp-class AF42 -j MARK --set-mark 0x1
ipt -t mangle -A QOS_MARK_${IFACE} -m tos --tos Minimize-Delay -j MARK --set-mark 0x1
# and it might be a good idea to do it for udp tunnels too

# Turn it on. Preserve classification if already performed.

ipt -t mangle -A POSTROUTING -o $DEV -m mark --mark 0x00 -g QOS_MARK_${IFACE}
ipt -t mangle -A POSTROUTING -o $IFACE -m mark --mark 0x00 -g QOS_MARK_${IFACE}

# The SYN optimization was nice but fq_codel does it for us
# ipt -t mangle -A PREROUTING -i s+ -p tcp -m tcp --tcp-flags SYN,RST,ACK SYN -j MARK --set-mark 0x01

# Not sure if this will work. Encapsulation is a problem, period.

ipt -t mangle -A PREROUTING -i vtun+ -p tcp -j MARK --set-mark 0x2 # tcp tunnels need ordering

# Emanating from router, do a little more optimization
# but don't bother with it too much.
ipt -t mangle -A OUTPUT -p udp -m multiport --ports 123,53 -j DSCP --set-dscp-class AF42

# Not clear if the second line is needed
#ipt -t mangle -A OUTPUT -o $IFACE -g QOS_MARK_${IFACE}

}

# TC rules

egress() {

CEIL=${UPLINK}
PRIO_RATE=`expr $CEIL / 3` # Ceiling for priority
BE_RATE=`expr $CEIL / 6`   # Min for best effort
BK_RATE=`expr $CEIL / 6`   # Min for background
BE_CEIL=`expr $CEIL - 64`  # A little slop at the top

LQ="quantum `get_mtu $IFACE`"

$TC qdisc del dev $IFACE root 2> /dev/null
$TC qdisc add dev $IFACE root handle 1: ${STABSTRING} htb default 12
$TC class add dev $IFACE parent 1: classid 1:1 htb $LQ rate ${CEIL}kbit ceil ${CEIL}kbit $ADSLL
$TC class add dev $IFACE parent 1:1 classid 1:10 htb $LQ rate ${CEIL}kbit ceil ${CEIL}kbit prio 0 $ADSLL
$TC class add dev $IFACE parent 1:1 classid 1:11 htb $LQ rate 128kbit ceil ${PRIO_RATE}kbit prio 1 $ADSLL
$TC class add dev $IFACE parent 1:1 classid 1:12 htb $LQ rate ${BE_RATE}kbit ceil ${BE_CEIL}kbit prio 2 $ADSLL
$TC class add dev $IFACE parent 1:1 classid 1:13 htb $LQ rate ${BK_RATE}kbit ceil ${BE_CEIL}kbit prio 3 $ADSLL

$TC qdisc add dev $IFACE parent 1:11 handle 110: $QDISC limit 600 $NOECN `get_quantum 300` `get_flows ${PRIO_RATE}`
$TC qdisc add dev $IFACE parent 1:12 handle 120: $QDISC limit 600 $NOECN `get_quantum 300` `get_flows ${BE_RATE}`
$TC qdisc add dev $IFACE parent 1:13 handle 130: $QDISC limit 600 $NOECN `get_quantum 300` `get_flows ${BK_RATE}`

# Need a catchall rule

$TC filter add dev $IFACE parent 1:0 protocol all prio 999 u32 \
	match ip protocol 0 0x00 flowid 1:12

# FIXME should probably change the filter here to do pre-nat

$TC filter add dev $IFACE parent 1:0 protocol ip prio 1 handle 1 fw classid 1:11
$TC filter add dev $IFACE parent 1:0 protocol ip prio 2 handle 2 fw classid 1:12
$TC filter add dev $IFACE parent 1:0 protocol ip prio 3 handle 3 fw classid 1:13

# ipv6 support. Note that the handle indicates the fw mark bucket that is looked for.

$TC filter add dev $IFACE parent 1:0 protocol ipv6 prio 4 handle 1 fw classid 1:11
$TC filter add dev $IFACE parent 1:0 protocol ipv6 prio 5 handle 2 fw classid 1:12
$TC filter add dev $IFACE parent 1:0 protocol ipv6 prio 6 handle 3 fw classid 1:13

# Arp traffic

$TC filter add dev $IFACE parent 1:0 protocol arp prio 7 handle 1 fw classid 1:11

}

ingress() {

CEIL=$DOWNLINK
PRIO_RATE=`expr $CEIL / 3` # Ceiling for priority
BE_RATE=`expr $CEIL / 6`   # Min for best effort
BK_RATE=`expr $CEIL / 6`   # Min for background
BE_CEIL=`expr $CEIL - 64`  # A little slop at the top

LQ="quantum `get_mtu $IFACE`"

$TC qdisc del dev $IFACE handle ffff: ingress 2> /dev/null
$TC qdisc add dev $IFACE handle ffff: ingress

$TC qdisc del dev $DEV root 2> /dev/null
$TC qdisc add dev $DEV root handle 1: ${STABSTRING} htb default 12
$TC class add dev $DEV parent 1: classid 1:1 htb $LQ rate ${CEIL}kbit ceil ${CEIL}kbit
$TC class add dev $DEV parent 1:1 classid 1:10 htb $LQ rate ${CEIL}kbit ceil ${CEIL}kbit prio 0
$TC class add dev $DEV parent 1:1 classid 1:11 htb $LQ rate 32kbit ceil ${PRIO_RATE}kbit prio 1
$TC class add dev $DEV parent 1:1 classid 1:12 htb $LQ rate ${BE_RATE}kbit ceil ${BE_CEIL}kbit prio 2
$TC class add dev $DEV parent 1:1 classid 1:13 htb $LQ rate ${BK_RATE}kbit ceil ${BE_CEIL}kbit prio 3

# I'd prefer to use a pre-nat filter but that causes permutation...
$TC qdisc add dev $DEV parent 1:11 handle 110: $QDISC limit 1000 $ECN `get_quantum 500` `get_flows ${PRIO_RATE}`
$TC qdisc add dev $DEV parent 1:12 handle 120: $QDISC limit 1000 $ECN `get_quantum 1500` `get_flows ${BE_RATE}`
$TC qdisc add dev $DEV parent 1:13 handle 130: $QDISC limit 1000 $ECN `get_quantum 1500` `get_flows ${BK_RATE}`

diffserv $DEV

ifconfig $DEV up

# redirect all IP packets arriving in $IFACE to ifb0

$TC filter add dev $IFACE parent ffff: protocol all prio 10 u32 \
	match u32 0 0 flowid 1:1 action mirred egress redirect dev $DEV

}

do_modules
ipt_setup
egress
ingress

# References:
# This alternate shaper attempts to go for 1/u performance in a clever way
# http://git.coverfire.com/?p=linux-qos-scripts.git;a=blob;f=src-3tos.sh;hb=HEAD

# Comments
# This does the right thing with ipv6 traffic.
# It also tries to leverage diffserv to some sane extent. In particular,
# the 'priority' queue is limited to 33% of the total, so EF and IMM traffic
# cannot starve other types. The RFC suggested 30%. 30% is probably
# a lot in today's world.

# Flaws
# Many!

[Attachment: functions.sh]

insmod() {
	lsmod | grep -q ^$1 || $INSMOD $1
}

# apply a rule to iptables and ip6tables alike; if the rule is an
# append (-A), first delete any existing copy so reruns stay idempotent
ipt() {
	d=`echo $* | sed s/-A/-D/g`
	[ "$d" != "$*" ] && {
		iptables $d > /dev/null 2>&1
		ip6tables $d > /dev/null 2>&1
	}
	iptables $* > /dev/null 2>&1
	ip6tables $* > /dev/null 2>&1
}

do_modules() {
	insmod sch_$QDISC
	insmod sch_ingress
	insmod act_mirred
	insmod cls_fw
	insmod sch_htb
}

# You need to jiggle these parameters. Note limits are tuned towards
# a <10Mbit uplink, <60Mbit downlink.

[ -z "$UPLINK" ] && UPLINK=2302
[ -z "$DOWNLINK" ] && DOWNLINK=14698
[ -z "$DEV" ] && DEV=ifb0
[ -z "$QDISC" ] && QDISC=fq_codel
[ -z "$IFACE" ] && IFACE=ge00
[ -z "$LLAM" ] && LLAM="none"
[ -z "$LINKLAYER" ] && LINKLAYER=ethernet
[ -z "$OVERHEAD" ] && OVERHEAD=0
[ -z "$STAB_MTU" ] && STAB_MTU=2047
[ -z "$STAB_MPU" ] && STAB_MPU=0
[ -z "$STAB_TSIZE" ] && STAB_TSIZE=512
[ -z "$AUTOFLOW" ] && AUTOFLOW=0
[ -z "$AUTOECN" ] && AUTOECN=1
[ -z "$TC" ] && TC=`which tc`
[ -z "$INSMOD" ] && INSMOD=`which insmod`

#logger "LLAM: ${LLAM}"
#logger "LINKLAYER: ${LINKLAYER}"

CEIL=$UPLINK

ADSLL=""
if [ "$LLAM" = "htb_private" ]; then
	# HTB defaults to MTU 1600 and an implicit fixed TSIZE of 256
	ADSLL="mpu ${STAB_MPU} linklayer ${LINKLAYER} overhead ${OVERHEAD} mtu ${STAB_MTU}"
	logger "ADSLL: ${ADSLL}"
fi

if [ "${LLAM}" = "tc_stab" ]; then
	STABSTRING="stab mtu ${STAB_MTU} tsize ${STAB_TSIZE} mpu ${STAB_MPU} overhead ${OVERHEAD} linklayer ${LINKLAYER}"
	logger "STAB: ${STABSTRING}"
fi

aqm_stop() {
	$TC qdisc del dev $IFACE ingress
	$TC qdisc del dev $IFACE root
	$TC qdisc del dev $DEV root
}

# Note this has side effects on the prio variable
# and depends on the interface global too

fc() {
	$TC filter add dev $interface protocol ip parent $1 prio $prio u32 match ip tos $2 0xfc classid $3
	prio=$(($prio + 1))
	$TC filter add dev $interface protocol ipv6 parent $1 prio $prio u32 match ip6 priority $2 0xfc classid $3
	prio=$(($prio + 1))
}

# FIXME: actually you need to get the underlying MTU on PPPoE things

get_mtu() {
	F=`cat /sys/class/net/$1/mtu`
	if [ -z "$F" ]
	then
		echo 1500
	else
		echo $F
	fi
}

# FIXME should also calculate the limit
# Frankly I think Xfq_codel can pretty much always run with high numbers of flows
# now that it does fate sharing
# But right now I'm trying to match the ns2 model behavior better
# So SET the autoflow variable to 1 if
# you want the cablelabs behavior

get_flows() {
	if [ "$AUTOFLOW" -eq "1" ]
	then
		FLOWS=8
		[ $1 -gt 999 ] && FLOWS=16
		[ $1 -gt 2999 ] && FLOWS=32
		[ $1 -gt 7999 ] && FLOWS=48
		[ $1 -gt 9999 ] && FLOWS=64
		[ $1 -gt 19999 ] && FLOWS=128
		[ $1 -gt 39999 ] && FLOWS=256
		[ $1 -gt 69999 ] && FLOWS=512
		[ $1 -gt 99999 ] && FLOWS=1024
		case $QDISC in
			codel|ns2_codel|pie) ;;
			fq_codel|*fq_codel|sfq) echo flows $FLOWS ;;
		esac
	fi
}

# set quantum parameter if available for this qdisc

get_quantum() {
	case $QDISC in
		*fq_codel|fq_pie|drr) echo quantum $1 ;;
		*) ;;
	esac
}

# Set some variables to handle different qdiscs

ECN=""
NOECN=""

# ECN is somewhat useful, but it helps to have a way
# to turn it on or off. Note we never do ECN on egress currently.

qdisc_variants() {
	if [ "$AUTOECN" -eq "1" ]
	then
		case $QDISC in
			*codel|*pie) ECN=ecn; NOECN=noecn ;;
			*) ;;
		esac
	fi
}

qdisc_variants

# This could be a complete diffserv implementation

diffserv() {

	interface=$1
	prio=1

	# Catchall

	$TC filter add dev $interface parent 1:0 protocol all prio 999 u32 \
		match ip protocol 0 0x00 flowid 1:12

	# Find the most common matches fast

	fc 1:0 0x00 1:12 # BE
	fc 1:0 0x20 1:13 # CS1
	fc 1:0 0x10 1:11 # IMM
	fc 1:0 0xb8 1:11 # EF
	fc 1:0 0xc0 1:11 # CS3
	fc 1:0 0xe0 1:11 # CS6
	fc 1:0 0x90 1:11 # AF42 (mosh)

	# Arp traffic

	$TC filter add dev $interface parent 1:0 protocol arp prio $prio handle 1 fw classid 1:11

	prio=$(($prio + 1))
}

>
> Did you add the "linklayer atm" yourself to Dave's script?

	Well, partly; the option for HTB was already in his script but
under-tested. I changed the script to add stab and to allow easier
configuration of overhead, mpu, mtu and tsize (just for stab) from the
GUI, but the code is Dave's. I attached the scripts; functions.sh gets
the values from the configuration GUI. I extended the way the link
layer option strings are created, but basically it is the same method
that Dave used. And I do see the right overhead values appear in
"tc -d qdisc", so at least something is reaching HTB. Sorry that I
have no repository for easier access.

>
>
>
>
>>>
>>>>> 	Now, I have been testing this using Dave's most recent cerowrt
>>>>> alpha version with a 3.10.9 kernel on mips hardware. I think this kernel
>>>>> should contain all HTB fixes, including commit 8a8e3d84b17 (net_sched:
>>>>> restore "linklayer atm" handling), but am not fully sure.
>>>>
>>>> It does.
>>>
>>> It has not hit the stable tree yet, but DaveM promised he would pass it along.
>>>
>>> It does seem Dave Taht has my patch applied:
>>> http://snapon.lab.bufferbloat.net/~cero2/patches/3.10.9-1/685-net_sched-restore-linklayer-atm-handling.patch
>>
>> 	Ah, good, so it should have worked.
>
> It should...
>
>
>>>
>>>>> While I am not able to build kernels, it seems that I am able to quickly
>>>>> test whether link layer adjustments work or not. So I am happy to help
>>>>> where I can :)
>>>
>>> So, what is your lab setup that allows you to test this quickly?
>>
>> 	Oh, Dave and Toke are the giants on whose shoulders I stand here
>> (thanks, guys); all I bring to the table basically is the fact that I
>> have an ATM-carried ADSL2+ connection at home.
>
> I will soon have an ADSL lab again, so I will try to reproduce your results.
> Actually, almost while typing this email, the postman arrived at my
> house and delivered a new ADSL modem,
> as a local Danish ISP, www.fullrate.dk, has been so kind as to give me
> a test line for free (thanks, Fullrate!).

	This is great! Even though I am quite sure that no real DSL link
is actually required to test the effect of the link layer adjustments.

>
>
>> 	Anyway, my theory is that proper link layer adjustments should
>> only show up if not performing them would make my traffic exceed my
>> link speed and hence accumulate in the DSL modem's bloated buffers,
>> leading to measurable increases in latency. So I try to saturate both
>> the up- and down-link while measuring latency under different
>> conditions. Since the worst-case overhead of the ATM encapsulation
>> approaches 50% (with the best case being around 10%), I test the
>> system while shaping to 95% of link rates, where I do expect to see an
>> effect of the link layer adjustments, and while shaping to 50%, where
>> I do not expect to see an effect. And basically that seems to work.
>
>> 	Practically, I use Toke's netperf-wrapper project with the RRUL
>> test from my cerowrt router behind an ADSL2+ modem to a close netperf
>> server in Germany. The link layer adjustments are configured in my
>> cerowrt router, using Dave's simple.qos script (a 3-band HTB shaper
>> with fq_codel on each leaf, taking my overhead of 40 bytes into
>> account and optionally the link layer).
>
>> 	It turns out that this test nicely saturates my link with 4 up
>> and 4 down TCP flows, and uses a train of ping probes at a 0.2 second
>> period to assess the latency induced by saturating the links. Now I
>> shape down to 95% and 50% of line rates and simply look at the ping
>> RTT plot for different conditions. In my rig I see around 30ms ping
>> RTT without load, 80ms with full saturation and no link layer
>> adjustments, and 40ms with working link layer adjustments (hand in
>> hand with slightly reduced TCP goodput, just as one would expect). In
>> my testing so far, activating the HTB link layer adjustments yielded
>> the same 80ms delay I get without link layer adjustments. If I shape
>> down to 50% of link rates, HTB, stab and no link layer adjustments all
>> yield a ping RTT of ~40ms. Still, with proper link layer adjustments
>> the TCP goodput is reduced even at 50% shaping. As Dave explained,
>> with an unloaded swallow ermm, ping RTT and fq_codel's target set to
>> 5ms, the best case would be 30ms + 2*5ms or 40ms, so I am pretty close
>> to ideal with proper link layer adjustments.
>>
>
>> 	I guess it should be possible to simply use the reduction in
>> goodput as an easy indicator of whether the link layer adjustments
>> work or not. But to do this properly I would need to be able to
>> control the size of the sent packets, which I am not, at least not
>> with RRUL. But I am quite sure real computer scientists could easily
>> set something up to test the goodput through a shaping device with
>> differently sized packet streams of the same bandwidth, but I digress.
>>
>> 	On the other hand, I do not claim to be an expert in this field
>> in any way, and my measurement method might be flawed; if you think
>> so, please do not hesitate to let me know how I could improve it.
>
> I have a hard time following your description... sorry.
	Okay, I see; let me try to present the data in a more ordered
fashion:

BW: bandwidth [Kbit/s]
LLAM: link layer adjustment method
LL: link layer
GP: goodput [Kbit/s]

#  shaped  downBW   (%)  upBW   (%)  LLAM  LL    ping RTT  downGP  upGP
1  no       16309  100   2544  100   none  none     300ms   10000  1600
2  yes      14698   90   2430   95   none  none      80ms   13600  1800
3  yes      14698   90   2430   95   stab  adsl      40ms   11600  1600
4  yes      15494   95   2430   95   stab  adsl      42ms   12400  1600
5  yes      14698   90   2430   95   htb   adsl      75ms   13200  1600
2  yes       7349   45   1215   48   none  none      45ms    6800  1000
4  yes       7349   45   1215   48   stab  adsl      42ms    5800   800
5  yes       7349   45   1215   48   htb   adsl      45ms    6600  1000

(The last three rows repeat conditions 2, 4 and 5 at roughly half the
link rates.)

Notes:

upGP is way noisier than downGP and therefore harder to estimate.

Conditions 3 and 4 show the best latency at high link saturation, where
the link layer adjustments actually make a difference by controlling
whether the DSL modem will buffer or not.

At ~50% link saturation there is not much, if any, effect of the link
layer adjustments on latency, but they still leave their hallmark as a
goodput reduction. (The partial reduction for htb might be caused by
the specification of 40 bytes of overhead, which seems to have been
honored.)

I take the disappearance of the latency effect at 50% as a control data
point that shows my measurement approach is sane enough.

I hope this clears up the information I wanted to give you the first
time around.

>
> So, did you get a working low-latency setup by using 95% shaping and
> "stab" linklayer adjustment?

	Yes. (With a 3-leaf HTB as shaper and fq_codel as leaf qdisc.)

Best regards
	Sebastian

>
> --
> Best regards,
>  Jesper Dangaard Brouer
>  MSc.CS, Sr. Network Kernel Developer at Red Hat
>  Author of http://www.iptv-analyzer.org
>  LinkedIn: http://www.linkedin.com/in/brouer