From: Sebastian Moeller
Date: Fri, 23 Aug 2013 14:37:47 +0200
To: Jesper Dangaard Brouer
Cc: cerowrt-devel@lists.bufferbloat.net
Subject: Re: [Cerowrt-devel] some kernel updates
In-Reply-To: <20130823131653.6aa3498f@redhat.com>

Hi Jesper,

thanks for your time...

On Aug 23, 2013, at 13:16, Jesper Dangaard Brouer wrote:

> On Fri, 23 Aug 2013 12:15:12 +0200
> Sebastian Moeller wrote:
>
>> Hi Jesper,
>>
>>
>> On Aug 23, 2013, at 09:27, Jesper Dangaard Brouer wrote:
>>
>>> On Thu, 22 Aug 2013 22:13:52 -0700
>>> Dave Taht wrote:
>>>
>>>> On Thu, Aug 22, 2013 at 5:52 PM, Sebastian Moeller wrote:
>>>>
>>>>> Hi List, hi Jesper,
>>>>>
>>>>> So I tested 3.10.9-1 to assess the status of the HTB ATM link layer
>>>>> adjustments, to see whether the recent changes resurrected this feature.
>>>>> Unfortunately the htb_private link layer adjustment is still
>>>>> broken (RRUL ping RTT against Toke's netperf host in Germany of ~80ms,
>>>>> the same as without link layer adjustments). On the bright side, the
>>>>> tc_stab method still works as well as before (ping RTT around 40ms).
>>>>> I would like to humbly propose using the tc stab method in
>>>>> cerowrt to perform ATM link layer adjustments by default. To repeat myself,
>>>>> simply telling the kernel a lie about the packet size seems more robust
>>>>> than fudging HTB's rate tables.
>>>
>>> After the (regression) commit 56b765b79 ("htb: improved accuracy at
>>> high rates"), the kernel no longer uses the rate tables.
>>
>> 	See, I am quite a layman here; spelunking through the tc and
>> kernel source code made me believe that the rate tables are still used
>> (I might have looked at too old versions of both repositories, though).
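
	(To make the two contenders concrete for anyone following along,
here is a rough sketch only; the interface, rate and overhead below are
placeholder values, not my tested configuration:

# htb_private: linklayer/overhead are part of the HTB rate specification
tc qdisc add dev ge00 root handle 1: htb default 12
tc class add dev ge00 parent 1: classid 1:1 htb rate 2430kbit linklayer atm overhead 40

# tc_stab: a size table on the root qdisc instead rewrites the packet
# length that the shaper below it gets to see
tc qdisc add dev ge00 root handle 1: stab linklayer atm overhead 40 htb default 12

The first folds the ATM framing into HTB's own rate calculation; the
second adjusts the packet length seen by HTB and anything below it.)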
>>
>>>
>>> My commit 8a8e3d84b1719 (net_sched: restore "linklayer atm" handling)
>>> does the ATM cell overhead calculation directly on the packet length;
>>> see psched_l2t_ns() doing (DIV_ROUND_UP(len,48)*53).
>>> Thus, the cell calc should actually be more precise now... but see below.
>>
>> 	Is there any way to make HTB report which link layer it assumes?
>
> I added some print debug statements in my patch, so you can see this in
> the kernel log / dmesg. To activate the debug print statements, run:
>
>  mount -t debugfs none /sys/kernel/debug/
>  echo "func __detect_linklayer +p" >> /sys/kernel/debug/dynamic_debug/control
>
> Then run your tc script, and run dmesg, or look in the kernel syslog.

	Ah, unfortunately I am not set up to build new kernels for the
router I am testing on, so I would hereby like to beg Dave to include
that patch in one of the next releases. Would it not be a good idea to
teach tc to report the link layer for HTB as it does for stab? Having to
empirically figure out whether it is applied or not is somewhat
cumbersome...

>
>
>>>
>>>>> Especially since the kernel already fudges
>>>>> the packet size to account for the ethernet header and then some, so this
>>>>> path should receive more scrutiny by virtue of having more users?
>>>
>>> As you mention, the default kernel path (not tc stab) fudges the packet
>>> size for Ethernet headers, AND I made a mistake (back in approx 2006,
>>> sorry) in that the "overhead" cannot be a negative number.
>>
>> 	Mmh, does this also apply to stab?
>
> This seems to be two questions...
>
> Yes, the Ethernet header size gets adjusted/added before the "stab"
> call.
> For reference, see net/core/dev.c, function __dev_xmit_skb(), which
>  calls qdisc_pkt_len_init(skb);         // adjusts for Ethernet and accounts for GSO
>  calls qdisc_calculate_pkt_len(skb, q); // the stab call
>  (the latter calls __qdisc_calculate_pkt_len() in net/sched/sch_api.c)
>
> The qdisc_pkt_len_init() call was introduced by Eric in
> v3.9-rc1~139^2~411.

	So I looked at 3.10 here: net/core/dev.c, qdisc_pkt_len_init():

line 2628:	qdisc_skb_cb(skb)->pkt_len = skb->len;
and
line 2650:	qdisc_skb_cb(skb)->pkt_len += (gso_segs - 1) * hdr_len;

so the adjusted size does not seem to end up in skb->len; and then in
net/sched/sch_api.c, __qdisc_calculate_pkt_len():

line 440:	pkt_len = skb->len + stab->szopts.overhead;

So to my eyes this looks like stab is not honoring the changes made in
qdisc_pkt_len_init(), no? At least I fail to see where skb->len is
assigned qdisc_skb_cb(skb)->pkt_len. But I happily admit that I am truly
a novice in these matters and easily intimidated by C code.

>
> Thus, in kernels >= 3.9, you would need to change/reduce your tc
> "overhead" parameter by 14 bytes (if you accounted for the encapsulated
> Ethernet header before).

	That is what I thought before, but my kernel spelunking made me
reconsider and switch to not subtracting the 14 bytes, since, as I
understand it, the kernel actively does not do this adjustment if stab
is used.

>
> The "overhead" of stab can be negative, so no problem here; it is an
> "int" for stab.
>
>
>>> Meaning that
>>> some ATM encap overheads simply cannot be configured correctly (as you
>>> need to subtract the ethernet header).
>>
>> 	Yes, I see; luckily PPPoA and IPoA seem quite rare, and setting
>> the overhead to be larger than it actually is is relatively benign, as
>> it will overestimate packet size.
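
	(To spell out the cell math of psched_l2t_ns() with example
numbers of my own choosing:

# every started 48-byte ATM cell costs 53 bytes on the wire,
# i.e. DIV_ROUND_UP(len, 48) * 53:
atm_wire_len() { echo $(( ( ($1 + 47) / 48 ) * 53 )); }

A 1500 byte packet plus my 40 bytes of overhead makes 1540 bytes of
payload, which needs 33 cells, i.e. 1749 bytes on the wire; a 49 byte
payload already needs 2 cells, i.e. 106 bytes. That is why small packets
are where ignoring the link layer hurts most, and why overestimating the
overhead is comparatively benign.)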
>>
>>> (And it's quite problematic to
>>> change the kABI to allow for a negative overhead.)
>>
>> 	Again, I have no clue, but overhead seems to be an integer, not
>> unsigned, so why can it not be negative?
>
> Nope; for reference, see include/uapi/linux/pkt_sched.h.
>
> This struct tc_ratespec is used by the normal "HTB/TBF" rate system;
> notice "unsigned short overhead".
>
> struct tc_ratespec {
> 	unsigned char	cell_log;
> 	__u8		linklayer; /* lower 4 bits */
> 	unsigned short	overhead;
> 	short		cell_align;
> 	unsigned short	mpu;
> 	__u32		rate;
> };
>
>
> This struct tc_sizespec is used by the stab system, where the overhead
> is an int.
>
> struct tc_sizespec {
> 	unsigned char	cell_log;
> 	unsigned char	size_log;
> 	short		cell_align;
> 	int		overhead;
> 	unsigned int	linklayer;
> 	unsigned int	mpu;
> 	unsigned int	mtu;
> 	unsigned int	tsize;
> };

	Ah, good to know.

>
>
>>>
>>> Perhaps we should change to use "tc stab" for this reason. But I'm not
>>> sure "stab" does the right thing either, and its accuracy is also
>>> limited, as it is actually also table based.
>>
>> 	But why should a table be problematic here? As long as we can
>> assure the table is equal to or larger than the largest packet, we are
>> golden. So either we do the manly and stupid thing and go for 9000 byte
>> jumbo packets for the table size, or we assume that for the most part
>> ATM users will at best use baby jumbo frames (I think BT does this to
>> allow a payload MTU of 1500 in spite of the PPPoE encapsulation
>> overhead), but then we are quite fine with the default size table
>> maxMTU of 2048 bytes, no?
>
>
> It is the GSO problem that I'm worried about. The kernel will bunch up
> packets, and that caused the length calculation issue... just disable
> GSO.
>
>  ethtool -K eth63 gso off gro off tso off

	Oh, as always Dave has this tackled in cerowrt already; all
offloads are off (at 16000Kbit/s / 2500Kbit/s there should be no need
for offloads nowadays, I guess). But I see the issue now.

>
>
>>> We could easily change the
>>> kernel to perform the ATM cell overhead calc inside "stab", and we
>>> should also fix the GSO packet overhead problem.
>>> (for now, remember to disable GSO packets when shaping)
>>
>> 	Yeah, I stumbled over the fact that the stab mechanism does not
>> honor the kernel's earlier adjustments of packet length (but I seem to
>> be unable to find the actual file and line where this is initially
>> handled). It would seem relatively easy to make stab take the earlier
>> adjustment into account. Regarding GSO, I assumed that GSO would not
>> play nicely with an AQM anyway, as a single large packet will hog too
>> much transfer time...
>>
>
> Yes, just disable GSO ;-)

	Done.

>
>
>>>
>>>> It's my hope that the atm code works but is misconfigured. You can output
>>>> the tc commands by overriding the TC variable with TC="echo tc" and paste
>>>> here.
>>>
>>> I also hope it is a misconfig. Please show us the config/script.
>>
>> 	Will do this later. I would be delighted if it is just me being
>> stupid.
>>
>>>
>>> I would appreciate a link to the scripts you are using... perhaps a git tree?
>>
>> 	Unfortunately I have no git tree and no experience with git. I
>> do not think I will be able to set something up quickly. But I use a
>> modified version of cerowrt's AQM scripts, which I will post later.
>>
>
> Someone just point me to the cerowrt git repo... please, and point me
> at the simple.qos script.
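
	(A hypothetical example, with made-up values, of what the
difference between the two structs above means in practice: stab will
accept a negative overhead, e.g.

tc qdisc add dev ge00 root handle 1: stab linklayer atm overhead -4 htb default 12

which one would want if the kernel's automatic +14 byte Ethernet
adjustment over-counts the real encapsulation; whether that adjustment
even reaches stab is exactly what we are debating above. The equivalent
"overhead -4" in an HTB rate specification cannot be expressed at all,
since struct tc_ratespec stores overhead as an unsigned short.)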
>

[Attachment: simple.qos]

#!/bin/sh

# Cero3 Shaper
# A 3 bin tc_codel and ipv6 enabled shaping script for
# ethernet gateways
# Copyright (C) 2012 Michael D Taht
# GPLv2

# Compared to the complexity that debloat had become,
# this cleanly shows a means of going from diffserv marking
# to prioritization using the current tools (ip(6)tables
# and tc). I note that the complexity of debloat exists for
# a reason, and it is expected that that script is run first
# to set up various other parameters such as BQL and ethtool.
# (And that the debloat script has set up the other interfaces.)

# You need to jiggle these parameters. Note limits are tuned towards
# a <10Mbit uplink, <60Mbit downlink.

. /usr/lib/aqm/functions.sh

ipt_setup() {

ipt -t mangle -N QOS_MARK_${IFACE}

ipt -t mangle -A QOS_MARK_${IFACE} -j MARK --set-mark 0x2
# You can go further with classification but...
ipt -t mangle -A QOS_MARK_${IFACE} -m dscp --dscp-class CS1 -j MARK --set-mark 0x3
ipt -t mangle -A QOS_MARK_${IFACE} -m dscp --dscp-class CS6 -j MARK --set-mark 0x1
ipt -t mangle -A QOS_MARK_${IFACE} -m dscp --dscp-class EF -j MARK --set-mark 0x1
ipt -t mangle -A QOS_MARK_${IFACE} -m dscp --dscp-class AF42 -j MARK --set-mark 0x1
ipt -t mangle -A QOS_MARK_${IFACE} -m tos --tos Minimize-Delay -j MARK --set-mark 0x1
# and it might be a good idea to do it for udp tunnels too

# Turn it on. Preserve classification if already performed.

ipt -t mangle -A POSTROUTING -o $DEV -m mark --mark 0x00 -g QOS_MARK_${IFACE}
ipt -t mangle -A POSTROUTING -o $IFACE -m mark --mark 0x00 -g QOS_MARK_${IFACE}

# The SYN optimization was nice but fq_codel does it for us
# ipt -t mangle -A PREROUTING -i s+ -p tcp -m tcp --tcp-flags SYN,RST,ACK SYN -j MARK --set-mark 0x01

# Not sure if this will work. Encapsulation is a problem, period.

ipt -t mangle -A PREROUTING -i vtun+ -p tcp -j MARK --set-mark 0x2 # tcp tunnels need ordering

# Emanating from router, do a little more optimization
# but don't bother with it too much.
ipt -t mangle -A OUTPUT -p udp -m multiport --ports 123,53 -j DSCP --set-dscp-class AF42

# Not clear if the second line is needed
#ipt -t mangle -A OUTPUT -o $IFACE -g QOS_MARK_${IFACE}

}

# TC rules

egress() {

CEIL=${UPLINK}
PRIO_RATE=`expr $CEIL / 3` # Ceiling for priority
BE_RATE=`expr $CEIL / 6`   # Min for best effort
BK_RATE=`expr $CEIL / 6`   # Min for background
BE_CEIL=`expr $CEIL - 64`  # A little slop at the top

LQ="quantum `get_mtu $IFACE`"

$TC qdisc del dev $IFACE root 2> /dev/null
$TC qdisc add dev $IFACE root handle 1: ${STABSTRING} htb default 12
$TC class add dev $IFACE parent 1: classid 1:1 htb $LQ rate ${CEIL}kbit ceil ${CEIL}kbit $ADSLL
$TC class add dev $IFACE parent 1:1 classid 1:10 htb $LQ rate ${CEIL}kbit ceil ${CEIL}kbit prio 0 $ADSLL
$TC class add dev $IFACE parent 1:1 classid 1:11 htb $LQ rate 128kbit ceil ${PRIO_RATE}kbit prio 1 $ADSLL
$TC class add dev $IFACE parent 1:1 classid 1:12 htb $LQ rate ${BE_RATE}kbit ceil ${BE_CEIL}kbit prio 2 $ADSLL
$TC class add dev $IFACE parent 1:1 classid 1:13 htb $LQ rate ${BK_RATE}kbit ceil ${BE_CEIL}kbit prio 3 $ADSLL

$TC qdisc add dev $IFACE parent 1:11 handle 110: $QDISC limit 600 $NOECN `get_quantum 300` `get_flows ${PRIO_RATE}`
$TC qdisc add dev $IFACE parent 1:12 handle 120: $QDISC limit 600 $NOECN `get_quantum 300` `get_flows ${BE_RATE}`
$TC qdisc add dev $IFACE parent 1:13 handle 130: $QDISC limit 600 $NOECN `get_quantum 300` `get_flows ${BK_RATE}`

# Need a catchall rule

$TC filter add dev $IFACE parent 1:0 protocol all prio 999 u32 \
	match ip protocol 0 0x00 flowid 1:12

# FIXME should probably change the filter here to do pre-nat

$TC filter add dev $IFACE parent 1:0 protocol ip prio 1 handle 1 fw classid 1:11
$TC filter add dev $IFACE parent 1:0 protocol ip prio 2 handle 2 fw classid 1:12
$TC filter add dev $IFACE parent 1:0 protocol ip prio 3 handle 3 fw classid 1:13

# ipv6 support. Note that the handle indicates the fw mark bucket that is looked for.

$TC filter add dev $IFACE parent 1:0 protocol ipv6 prio 4 handle 1 fw classid 1:11
$TC filter add dev $IFACE parent 1:0 protocol ipv6 prio 5 handle 2 fw classid 1:12
$TC filter add dev $IFACE parent 1:0 protocol ipv6 prio 6 handle 3 fw classid 1:13

# Arp traffic

$TC filter add dev $IFACE parent 1:0 protocol arp prio 7 handle 1 fw classid 1:11

}

ingress() {

CEIL=$DOWNLINK
PRIO_RATE=`expr $CEIL / 3` # Ceiling for priority
BE_RATE=`expr $CEIL / 6`   # Min for best effort
BK_RATE=`expr $CEIL / 6`   # Min for background
BE_CEIL=`expr $CEIL - 64`  # A little slop at the top

LQ="quantum `get_mtu $IFACE`"

$TC qdisc del dev $IFACE handle ffff: ingress 2> /dev/null
$TC qdisc add dev $IFACE handle ffff: ingress

$TC qdisc del dev $DEV root 2> /dev/null
$TC qdisc add dev $DEV root handle 1: ${STABSTRING} htb default 12
$TC class add dev $DEV parent 1: classid 1:1 htb $LQ rate ${CEIL}kbit ceil ${CEIL}kbit
$TC class add dev $DEV parent 1:1 classid 1:10 htb $LQ rate ${CEIL}kbit ceil ${CEIL}kbit prio 0
$TC class add dev $DEV parent 1:1 classid 1:11 htb $LQ rate 32kbit ceil ${PRIO_RATE}kbit prio 1
$TC class add dev $DEV parent 1:1 classid 1:12 htb $LQ rate ${BE_RATE}kbit ceil ${BE_CEIL}kbit prio 2
$TC class add dev $DEV parent 1:1 classid 1:13 htb $LQ rate ${BK_RATE}kbit ceil ${BE_CEIL}kbit prio 3

# I'd prefer to use a pre-nat filter but that causes permutation...
$TC qdisc add dev $DEV parent 1:11 handle 110: $QDISC limit 1000 $ECN `get_quantum 500` `get_flows ${PRIO_RATE}`
$TC qdisc add dev $DEV parent 1:12 handle 120: $QDISC limit 1000 $ECN `get_quantum 1500` `get_flows ${BE_RATE}`
$TC qdisc add dev $DEV parent 1:13 handle 130: $QDISC limit 1000 $ECN `get_quantum 1500` `get_flows ${BK_RATE}`

diffserv $DEV

ifconfig $DEV up

# redirect all IP packets arriving in $IFACE to ifb0

$TC filter add dev $IFACE parent ffff: protocol all prio 10 u32 \
	match u32 0 0 flowid 1:1 action mirred egress redirect dev $DEV

}

do_modules
ipt_setup
egress
ingress

# References:
# This alternate shaper attempts to go for 1/u performance in a clever way
# http://git.coverfire.com/?p=linux-qos-scripts.git;a=blob;f=src-3tos.sh;hb=HEAD

# Comments
# This does the right thing with ipv6 traffic.
# It also tries to leverage diffserv to some sane extent. In particular,
# the 'priority' queue is limited to 33% of the total, so EF and IMM traffic
# cannot starve other types. The RFC suggested 30%. 30% is probably
# a lot in today's world.

# Flaws
# Many!

[Attachment: functions.sh]

insmod() {
	lsmod | grep -q ^$1 || $INSMOD $1
}

# apply a rule to iptables and ip6tables alike; if the rule is an
# append (-A), first delete any existing copy so reruns stay idempotent
ipt() {
	d=`echo $* | sed s/-A/-D/g`
	[ "$d" != "$*" ] && {
		iptables $d > /dev/null 2>&1
		ip6tables $d > /dev/null 2>&1
	}
	iptables $* > /dev/null 2>&1
	ip6tables $* > /dev/null 2>&1
}

do_modules() {
	insmod sch_$QDISC
	insmod sch_ingress
	insmod act_mirred
	insmod cls_fw
	insmod sch_htb
}

# You need to jiggle these parameters. Note limits are tuned towards
# a <10Mbit uplink, <60Mbit downlink.

[ -z "$UPLINK" ] && UPLINK=2302
[ -z "$DOWNLINK" ] && DOWNLINK=14698
[ -z "$DEV" ] && DEV=ifb0
[ -z "$QDISC" ] && QDISC=fq_codel
[ -z "$IFACE" ] && IFACE=ge00
[ -z "$LLAM" ] && LLAM="none"
[ -z "$LINKLAYER" ] && LINKLAYER=ethernet
[ -z "$OVERHEAD" ] && OVERHEAD=0
[ -z "$STAB_MTU" ] && STAB_MTU=2047
[ -z "$STAB_MPU" ] && STAB_MPU=0
[ -z "$STAB_TSIZE" ] && STAB_TSIZE=512
[ -z "$AUTOFLOW" ] && AUTOFLOW=0
[ -z "$AUTOECN" ] && AUTOECN=1
[ -z "$TC" ] && TC=`which tc`
[ -z "$INSMOD" ] && INSMOD=`which insmod`

#logger "LLAM: ${LLAM}"
#logger "LINKLAYER: ${LINKLAYER}"

CEIL=$UPLINK

ADSLL=""
if [ "$LLAM" = "htb_private" ]; then
	# HTB defaults to MTU 1600 and an implicit fixed TSIZE of 256
	ADSLL="mpu ${STAB_MPU} linklayer ${LINKLAYER} overhead ${OVERHEAD} mtu ${STAB_MTU}"
	logger "ADSLL: ${ADSLL}"
fi

if [ "${LLAM}" = "tc_stab" ]; then
	STABSTRING="stab mtu ${STAB_MTU} tsize ${STAB_TSIZE} mpu ${STAB_MPU} overhead ${OVERHEAD} linklayer ${LINKLAYER}"
	logger "STAB: ${STABSTRING}"
fi

aqm_stop() {
	$TC qdisc del dev $IFACE ingress
	$TC qdisc del dev $IFACE root
	$TC qdisc del dev $DEV root
}

# Note this has side effects on the prio variable
# and depends on the interface global too

fc() {
	$TC filter add dev $interface protocol ip parent $1 prio $prio u32 match ip tos $2 0xfc classid $3
	prio=$(($prio + 1))
	$TC filter add dev $interface protocol ipv6 parent $1 prio $prio u32 match ip6 priority $2 0xfc classid $3
	prio=$(($prio + 1))
}

# FIXME: actually you need to get the underlying MTU on PPPoE things

get_mtu() {
	F=`cat /sys/class/net/$1/mtu`
	if [ -z "$F" ]
	then
		echo 1500
	else
		echo $F
	fi
}

# FIXME should also calculate the limit
# Frankly I think Xfq_codel can pretty much always run with high numbers of flows
# now that it does fate sharing
# But right now I'm trying to match the ns2 model behavior better
# So SET the autoflow variable to 1 if
# you want the cablelabs behavior

get_flows() {
	if [ "$AUTOFLOW" -eq "1" ]
	then
		FLOWS=8
		[ $1 -gt 999 ] && FLOWS=16
		[ $1 -gt 2999 ] && FLOWS=32
		[ $1 -gt 7999 ] && FLOWS=48
		[ $1 -gt 9999 ] && FLOWS=64
		[ $1 -gt 19999 ] && FLOWS=128
		[ $1 -gt 39999 ] && FLOWS=256
		[ $1 -gt 69999 ] && FLOWS=512
		[ $1 -gt 99999 ] && FLOWS=1024
		case $QDISC in
			codel|ns2_codel|pie) ;;
			fq_codel|*fq_codel|sfq) echo flows $FLOWS ;;
		esac
	fi
}

# set quantum parameter if available for this qdisc

get_quantum() {
	case $QDISC in
		*fq_codel|fq_pie|drr) echo quantum $1 ;;
		*) ;;
	esac
}

# Set some variables to handle different qdiscs

ECN=""
NOECN=""

# ECN is somewhat useful, but it helps to have a way
# to turn it on or off. Note we never do ECN on egress currently.

qdisc_variants() {
	if [ "$AUTOECN" -eq "1" ]
	then
		case $QDISC in
			*codel|*pie) ECN=ecn; NOECN=noecn ;;
			*) ;;
		esac
	fi
}

qdisc_variants

# This could be a complete diffserv implementation

diffserv() {

	interface=$1
	prio=1

	# Catchall

	$TC filter add dev $interface parent 1:0 protocol all prio 999 u32 \
		match ip protocol 0 0x00 flowid 1:12

	# Find the most common matches fast

	fc 1:0 0x00 1:12 # BE
	fc 1:0 0x20 1:13 # CS1
	fc 1:0 0x10 1:11 # IMM
	fc 1:0 0xb8 1:11 # EF
	fc 1:0 0xc0 1:11 # CS3
	fc 1:0 0xe0 1:11 # CS6
	fc 1:0 0x90 1:11 # AF42 (mosh)

	# Arp traffic

	$TC filter add dev $interface parent 1:0 protocol arp prio $prio handle 1 fw classid 1:11

	prio=$(($prio + 1))
}

>
> Did you add the "linklayer atm" yourself to Dave's script?

	Well, partly; the option for HTB was already in his script but
under-tested. I changed the script to add stab and to allow easier
configuration of overhead, mpu, mtu and tsize (just for stab) from the
GUI, but the code is Dave's. I attached the scripts; functions.sh gets
the values from the configuration GUI. I extended the way the link
layer option strings are created, but basically it is the same method
that Dave used. And I do see the right overhead values appear in
"tc -d qdisc", so at least something is reaching HTB. Sorry that I
have no repository for easier access.

>
>
>
>
>>>
>>>>> 	Now, I have been testing this using Dave's most recent cerowrt
>>>>> alpha version with a 3.10.9 kernel on mips hardware. I think this kernel
>>>>> should contain all HTB fixes, including commit 8a8e3d84b17 (net_sched:
>>>>> restore "linklayer atm" handling), but am not fully sure.
>>>>
>>>> It does.
>>>
>>> It has not hit the stable tree yet, but DaveM promised he would pass it along.
>>>
>>> It does seem Dave Taht has my patch applied:
>>> http://snapon.lab.bufferbloat.net/~cero2/patches/3.10.9-1/685-net_sched-restore-linklayer-atm-handling.patch
>>
>> 	Ah, good, so it should have worked.
>
> It should...
>
>
>>>
>>>>> While I am not able to build kernels, it seems that I am able to quickly
>>>>> test whether link layer adjustments work or not. So I am happy to help
>>>>> where I can :)
>>>
>>> So, what is your lab setup that allows you to test this quickly?
>>
>> 	Oh, Dave and Toke are the giants on whose shoulders I stand here
>> (thanks, guys); all I bring to the table basically is the fact that I
>> have an ATM-carried ADSL2+ connection at home.
>
> I will soon have an ADSL lab again, so I will try to reproduce your results.
> Actually, almost while typing this email, the postman arrived at my
> house and delivered a new ADSL modem,
> as a local Danish ISP, www.fullrate.dk, has been so kind as to give me
> a test line for free (thanks, Fullrate!).

	This is great! Even though I am quite sure that no real DSL link
is actually required to test the effect of the link layer adjustments.

>
>
>> 	Anyway, my theory is that proper link layer adjustments should
>> only show up if not performing them would make my traffic exceed my
>> link speed and hence accumulate in the DSL modem's bloated buffers,
>> leading to measurable increases in latency. So I try to saturate both
>> the up- and down-link while measuring latency under different
>> conditions. Since the worst-case overhead of the ATM encapsulation
>> approaches 50% (with the best case being around 10%), I test the
>> system while shaping to 95% of link rates, where I do expect to see an
>> effect of the link layer adjustments, and while shaping to 50%, where
>> I do not expect to see an effect. And basically that seems to work.
>
>> 	Practically, I use Toke's netperf-wrapper project with the RRUL
>> test from my cerowrt router behind an ADSL2+ modem to a close netperf
>> server in Germany. The link layer adjustments are configured in my
>> cerowrt router, using Dave's simple.qos script (a 3-band HTB shaper
>> with fq_codel on each leaf, taking my overhead of 40 bytes into
>> account and optionally the link layer).
>
>> 	It turns out that this test nicely saturates my link with 4 up
>> and 4 down TCP flows, and uses a train of ping probes at a 0.2 second
>> period to assess the latency induced by saturating the links. Now I
>> shape down to 95% and 50% of line rates and simply look at the ping
>> RTT plot for different conditions. In my rig I see around 30ms ping
>> RTT without load, 80ms with full saturation and no link layer
>> adjustments, and 40ms with working link layer adjustments (hand in
>> hand with slightly reduced TCP goodput, just as one would expect). In
>> my testing so far, activating the HTB link layer adjustments yielded
>> the same 80ms delay I get without link layer adjustments. If I shape
>> down to 50% of link rates, HTB, stab and no link layer adjustments all
>> yield a ping RTT of ~40ms. Still, with proper link layer adjustments
>> the TCP goodput is reduced even at 50% shaping. As Dave explained,
>> with an unloaded swallow ermm, ping RTT and fq_codel's target set to
>> 5ms, the best case would be 30ms + 2*5ms or 40ms, so I am pretty close
>> to ideal with proper link layer adjustments.
>>
>
>> 	I guess it should be possible to simply use the reduction in
>> goodput as an easy indicator of whether the link layer adjustments
>> work or not. But to do this properly I would need to be able to
>> control the size of the sent packets, which I am not, at least not
>> with RRUL. But I am quite sure real computer scientists could easily
>> set something up to test the goodput through a shaping device with
>> differently sized packet streams of the same bandwidth, but I digress.
>>
>> 	On the other hand, I do not claim to be an expert in this field
>> in any way, and my measurement method might be flawed; if you think
>> so, please do not hesitate to let me know how I could improve it.
>
> I have a hard time following your description... sorry.
	Okay, I see; let me try to present the data in a more ordered
fashion:

BW: bandwidth [Kbit/s]
LLAM: link layer adjustment method
LL: link layer
GP: goodput [Kbit/s]

#  shaped  downBW   (%)  upBW   (%)  LLAM  LL    ping RTT  downGP  upGP
1  no       16309  100   2544  100   none  none     300ms   10000  1600
2  yes      14698   90   2430   95   none  none      80ms   13600  1800
3  yes      14698   90   2430   95   stab  adsl      40ms   11600  1600
4  yes      15494   95   2430   95   stab  adsl      42ms   12400  1600
5  yes      14698   90   2430   95   htb   adsl      75ms   13200  1600
2  yes       7349   45   1215   48   none  none      45ms    6800  1000
4  yes       7349   45   1215   48   stab  adsl      42ms    5800   800
5  yes       7349   45   1215   48   htb   adsl      45ms    6600  1000

(The last three rows repeat conditions 2, 4 and 5 at roughly half the
link rates.)

Notes:

upGP is way noisier than downGP and therefore harder to estimate.

Conditions 3 and 4 show the best latency at high link saturation, where
the link layer adjustments actually make a difference by controlling
whether the DSL modem will buffer or not.

At ~50% link saturation there is not much, if any, effect of the link
layer adjustments on latency, but they still leave their hallmark as a
goodput reduction. (The partial reduction for htb might be caused by
the specification of 40 bytes of overhead, which seems to have been
honored.)

I take the disappearance of the latency effect at 50% as a control data
point that shows my measurement approach is sane enough.

I hope this clears up the information I wanted to give you the first
time around.

>
> So, did you get a working low-latency setup by using 95% shaping and
> "stab" linklayer adjustment?

	Yes. (With a 3-leaf HTB as shaper and fq_codel as leaf qdisc.)

Best regards
	Sebastian

>
> --
> Best regards,
>  Jesper Dangaard Brouer
>  MSc.CS, Sr. Network Kernel Developer at Red Hat
>  Author of http://www.iptv-analyzer.org
>  LinkedIn: http://www.linkedin.com/in/brouer