Re: [Cerowrt-devel] some kernel updates

Development issues regarding the cerowrt test router project
 help / color / mirror / Atom feed

From: Sebastian Moeller <moeller0@gmx.de>
To: Jesper Dangaard Brouer <jbrouer@redhat.com>
Cc: "cerowrt-devel@lists.bufferbloat.net"
	<cerowrt-devel@lists.bufferbloat.net>
Subject: Re: [Cerowrt-devel] some kernel updates
Date: Fri, 23 Aug 2013 14:37:47 +0200	[thread overview]
Message-ID: <2D3FE3C2-AD17-4CDF-8DC3-D807B361EE66@gmx.de> (raw)
In-Reply-To: <20130823131653.6aa3498f@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 8604 bytes --]

Hi Jesper,

thanks for your time…

On Aug 23, 2013, at 13:16 , Jesper Dangaard Brouer <jbrouer@redhat.com> wrote:

> On Fri, 23 Aug 2013 12:15:12 +0200
> Sebastian Moeller <moeller0@gmx.de> wrote:
> 
>> Hi Jesper,
>> 
>> 
>> On Aug 23, 2013, at 09:27 , Jesper Dangaard Brouer <jbrouer@redhat.com> wrote:
>> 
>>> On Thu, 22 Aug 2013 22:13:52 -0700
>>> Dave Taht <dave.taht@gmail.com> wrote:
>>> 
>>>> On Thu, Aug 22, 2013 at 5:52 PM, Sebastian Moeller <moeller0@gmx.de> wrote:
>>>> 
>>>>> Hi List, hi Jesper,
>>>>> 
>>>>> So I tested 3.10.9-1 to assess the status of the HTB atm link layer
>>>>> adjustments to see whether the recent changes resurrected this feature.
>>>>>       Unfortunately the htb_private link layer adjustments still is
>>>>> broken (RRUL ping RTT against Toke's netperf host in Germany of ~80ms, same
>>>>> as without link layer adjustments). On the bright side the tc_stab method
>>>>> still works as well as before (ping RTT around 40ms).
>>>>>       I would like to humbly propose to use the tc stab method in
>>>>> cerowrt to perform ATM link layer adjustments as default. To repeat myself,
>>>>> simply telling the kernel a lie about the packet size seems more robust
>>>>> than fudging HTB's rate tables.
>>> 
>>> After the (regression) commit 56b765b79 ("htb: improved accuracy at
>>> high rates"), the kernel no-longer uses the rate tables.  
>> 
>> 	See, I am quite a layman here, spelunking through the tc and kernel source code made me believe that the rate tables are still used (I might have looked at too old versions of both repositories though).
>> 
>>> 
>>> My commit 8a8e3d84b1719 (net_sched: restore "linklayer atm" handling),
>>> does the ATM cell overhead calculation directly on the packet length,
>>> see psched_l2t_ns() doing (DIV_ROUND_UP(len,48)*53).
>>> Thus, the cell calc should actually be more precise now.... but see below
>> 
>> 	Is there any way to make HTB report which link layer it assumes?
> 
> I added some print debug statements in my patch, so you can see this in
> the kernel log / dmesg.  Run activate the debugging print statments:
> 
>  mount -t debugfs none /sys/kernel/debug/
>  echo "func __detect_linklayer +p"
>> /sys/kernel/debug/dynamic_debug/control
> 
> Run your tc script, and run dmesg, or look in kernel-syslog.

	Ah, unfortunately I am not setup to build new kernels for the router I am testing on, so I would hereby like to beg Dave to include that patch in one of the next releases. Would it not a good idea to teach tc to report the link layer for HTB as it does for stab? Having to empirically figure out whether it is applied or not is somewhat cumbersome...


> 
> 
>>> 
>>>>> Especially since the kernel already fudges
>>>>> the packet size to account for the ethernet header and then some, so this
>>>>> path should receive more scrutiny by virtue of having more users?
>>> 
>>> As you mention, the default kernel path (not tc stab) fudges the packet
>>> size for Ethernet headers, AND I made a mistake (back in approx 2006,
>>> sorry) that the "overhead" cannot be a negative number.  
>> 
>> 	Mmh, does this also apply to stab?
> 
> This seems to be two question...
> 
> Yes, the Ethernet header size gets adjusted/added before the "stab"
> call.
> For reference
> See: net/core/dev.c function __dev_xmit_skb()
> Call qdisc_pkt_len_init(skb); // adjust Ethernet and account for GSO
> Call qdisc_calculate_pkt_len(skb, q); // is the stab call
>  (ps calls __qdisc_calculate_pkt_len() in net/sched/sch_api.c)
> 
> The qdisc_pkt_len_init() call were introduced by Eric in
> v3.9-rc1~139^2~411.

	So I look at 3.10 here:

net/core/dev.c, qdisc_pkt_len_init
line 2628: 	qdisc_skb_cb(skb)->pkt_len = skb->len;
and in 
line 2650: qdisc_skb_cb(skb)->pkt_len += (gso_segs - 1) * hdr_len;
so the adjusted size does not seem to end in skb->len


and then in 
net/sched/sch_api.c, __qdisc_calculate_pkt_len
line440: pkt_len = skb->len + stab->szopts.overhead;

So to my eyes this looks like stab is not honoring the changes made in qdisc_pkt_len_init, no? At least I fail to see where 
skb->len is assigned qdisc_skb_cb(skb)->pkt_len
But I happily admit that I am truly a novice in these matters and easily intimidated by C code.


> 
> Thus, in kernels >= 3.9, you would need to change/reduce your tc
> "overhead" parameter with -14 bytes (iif you accounted encapsulated
> Ethernet header before)

	That is what I thought before, but my kernel spelunking made me reconsider and switch to not subtract the 14 bytes since as I understand it the kernel actively does not do it if stab is used.

> 
> The "overhead" of stab can be negative, so no problem here, in an "int"
> for stab.
> 
> 
>>> Meaning that
>>> some ATM encap overheads simply cannot be configured correctly (as you
>>> need to subtract the ethernet header).
>> 
>> 	Yes, I see, luckily PPPoA and IPoA seem quite rare, and setting the overhead to be larger than it actually is is relatively benign, as it will overestimate packe size.
>> 
>>> (And its quite problematic to
>>> change the kABI to allow for a negative overhead)
>> 
>> 	Again I have no clue but overhead seems to be integer, not unsigned, so why can it not be negative?
> 
> Nope, for reference in include/uapi/linux/pkt_sched.h
> 
> This struct tc_ratespec is used by the normal "HTB/TBF" rate system,
> notice "unsigned short	overhead".
> 
> struct tc_ratespec {
> 	unsigned char	cell_log;
> 	__u8		linklayer; /* lower 4 bits */
> 	unsigned short	overhead;
> 	short		cell_align;
> 	unsigned short	mpu;
> 	__u32		rate;
> };
> 
> 
> This struct tc_sizespec is used by stab system, where the overhead is
> an int.
> 
> struct tc_sizespec {
> 	unsigned char	cell_log;
> 	unsigned char	size_log;
> 	short		cell_align;
> 	int		overhead;
> 	unsigned int	linklayer;
> 	unsigned int	mpu;
> 	unsigned int	mtu;
> 	unsigned int	tsize;
> };

	Ah, good to know.

> 
> 
> 
> 
>>> 
>>> Perhaps we should change to use "tc stab" for this reason.  But I'm not
>>> sure "stab" does the right thing either, and its accuracy is also
>>> limited as its actually also table based.
>> 
>> 	But why should a table be problematic here? As long as we can assure the table is equal or larger to the largest packet we are golden. So either we do the manly and  stupid thing and go for 9000 byte jumbo packets for the table size. Or we assume that for the most part ATM users will art best use baby jumbo frames (I think BT does this to allow payload MTU 1500 in spite of PPPoE encapsulation overhead) but than we are quite fine with the default size table maxMTU of 2048 bytes, no?
> 
> 
> It is the GSO problem that I'm worried about.  The kernel will bunch up
> packets, and that caused the length calculation issue... just disable
> GSO.
> 
> ethtool -K eth63 gso off gro off tso off

	Oh, as always Dave has this tackled in cerowrt already, all offloads are off (at 16000Kbit/s 2500Kbit/s there should be no need for offloads nowadays I guess). But I see the issue now. 


> 
> 
>>> We could easily change the
>>> kernel to perform the ATM cell overhead calc inside "stab", and we
>>> should also fix the GSO packet overhead problem.
>>> (for now remember to disable GSO packets when shaping)
>> 
>> 	Yeah I stumbled over the fact that the stab mechanism does not honor the kernels earlier adjustments of packet length (but I seem to be unable to find the actual file and line where this initially is handeled). It would seem relatively easy to make stab take the earlier adjustment into account. Regarding GSO, I assumed that GSO will not play nicely with a AQM anyway as a single large packet will hog too much transfer time...
>> 
> 
> Yes, just disable GSO ;-)

	Done.

> 
> 
>>> 
>>>> It's my hope that the atm code works but is misconfigured. You can output
>>>> the tc commands by overriding the TC variable with TC="echo tc" and paste
>>>> here.
>>> 
>>> I also hope is a misconfig.  Please show us the config/script.
>> 
>> 	Will do this later. I would be delighted if it is just me being stupid.
>> 
>>> 
>>> I would appreciate a link to the scripts you are using... perhaps a git tree?
>> 
>> 	Unfortunately I have no git tree and no experience with git. I do not think I will be able to set something up quickly. But I use a modified version of cerowrt's AQM scripts which I will post later.
>> 
> 
> Someone just point me to the cerowrt git repo... please, and point me
> at simple.qos script.
> 

[-- Attachment #2: simple.qos --]
[-- Type: application/octet-stream, Size: 6424 bytes --]

#!/bin/sh
# Cero3 Shaper
# A 3 bin tc_codel and ipv6 enabled shaping script for
# ethernet gateways

# Copyright (C) 2012 Michael D Taht
# GPLv2

# Compared to the complexity that debloat had become
# this cleanly shows a means of going from diffserv marking
# to prioritization using the current tools (ip(6)tables
# and tc. I note that the complexity of debloat exists for
# a reason, and it is expected that script is run first
# to setup various other parameters such as BQL and ethtool.
# (And that the debloat script has setup the other interfaces)

# You need to jiggle these parameters. Note limits are tuned towards a <10Mbit uplink <60Mbup down




. /usr/lib/aqm/functions.sh



ipt_setup() {

ipt -t mangle -N QOS_MARK_${IFACE}

ipt -t mangle -A QOS_MARK_${IFACE} -j MARK --set-mark 0x2
# You can go further with classification but...
ipt -t mangle -A QOS_MARK_${IFACE} -m dscp --dscp-class CS1 -j MARK --set-mark 0x3
ipt -t mangle -A QOS_MARK_${IFACE} -m dscp --dscp-class CS6 -j MARK --set-mark 0x1
ipt -t mangle -A QOS_MARK_${IFACE} -m dscp --dscp-class EF -j MARK --set-mark 0x1
ipt -t mangle -A QOS_MARK_${IFACE} -m dscp --dscp-class AF42 -j MARK --set-mark 0x1
ipt -t mangle -A QOS_MARK_${IFACE} -m tos  --tos Minimize-Delay -j MARK --set-mark 0x1

# and it might be a good idea to do it for udp tunnels too

# Turn it on. Preserve classification if already performed

ipt -t mangle -A POSTROUTING -o $DEV -m mark --mark 0x00 -g QOS_MARK_${IFACE} 
ipt -t mangle -A POSTROUTING -o $IFACE -m mark --mark 0x00 -g QOS_MARK_${IFACE} 

# The Syn optimization was nice but fq_codel does it for us
# ipt -t mangle -A PREROUTING -i s+ -p tcp -m tcp --tcp-flags SYN,RST,ACK SYN -j MARK --set-mark 0x01
# Not sure if this will work. Encapsulation is a problem period

ipt -t mangle -A PREROUTING -i vtun+ -p tcp -j MARK --set-mark 0x2 # tcp tunnels need ordering

# Emanating from router, do a little more optimization
# but don't bother with it too much. 

ipt -t mangle -A OUTPUT -p udp -m multiport --ports 123,53 -j DSCP --set-dscp-class AF42

#Not clear if the second line is needed
#ipt -t mangle -A OUTPUT -o $IFACE -g QOS_MARK_${IFACE}

}


# TC rules

egress() {

CEIL=${UPLINK}
PRIO_RATE=`expr $CEIL / 3` # Ceiling for prioirty
BE_RATE=`expr $CEIL / 6`   # Min for best effort
BK_RATE=`expr $CEIL / 6`   # Min for background
BE_CEIL=`expr $CEIL - 64`  # A little slop at the top

LQ="quantum `get_mtu $IFACE`"

$TC qdisc del dev $IFACE root 2> /dev/null
$TC qdisc add dev $IFACE root handle 1: ${STABSTRING} htb default 12
$TC class add dev $IFACE parent 1: classid 1:1 htb $LQ rate ${CEIL}kbit ceil ${CEIL}kbit $ADSLL
$TC class add dev $IFACE parent 1:1 classid 1:10 htb $LQ rate ${CEIL}kbit ceil ${CEIL}kbit prio 0 $ADSLL
$TC class add dev $IFACE parent 1:1 classid 1:11 htb $LQ rate 128kbit ceil ${PRIO_RATE}kbit prio 1 $ADSLL
$TC class add dev $IFACE parent 1:1 classid 1:12 htb $LQ rate ${BE_RATE}kbit ceil ${BE_CEIL}kbit prio 2 $ADSLL
$TC class add dev $IFACE parent 1:1 classid 1:13 htb $LQ rate ${BK_RATE}kbit ceil ${BE_CEIL}kbit prio 3 $ADSLL

$TC qdisc add dev $IFACE parent 1:11 handle 110: $QDISC limit 600 $NOECN `get_quantum 300` `get_flows ${PRIO_RATE}`
$TC qdisc add dev $IFACE parent 1:12 handle 120: $QDISC limit 600 $NOECN `get_quantum 300` `get_flows ${BE_RATE}`
$TC qdisc add dev $IFACE parent 1:13 handle 130: $QDISC limit 600 $NOECN `get_quantum 300` `get_flows ${BK_RATE}`

# Need a catchall rule

$TC filter add dev $IFACE parent 1:0 protocol all prio 999 u32 \
        match ip protocol 0 0x00 flowid 1:12  

# FIXME should probably change the filter here to do pre-nat
        
$TC filter add dev $IFACE parent 1:0 protocol ip prio 1 handle 1 fw classid 1:11
$TC filter add dev $IFACE parent 1:0 protocol ip prio 2 handle 2 fw classid 1:12
$TC filter add dev $IFACE parent 1:0 protocol ip prio 3 handle 3 fw classid 1:13

# ipv6 support. Note that the handle indicates the fw mark bucket that is looked for

$TC filter add dev $IFACE parent 1:0 protocol ipv6 prio 4 handle 1 fw classid 1:11
$TC filter add dev $IFACE parent 1:0 protocol ipv6 prio 5 handle 2 fw classid 1:12
$TC filter add dev $IFACE parent 1:0 protocol ipv6 prio 6 handle 3 fw classid 1:13

# Arp traffic

$TC filter add dev $IFACE parent 1:0 protocol arp prio 7 handle 1 fw classid 1:11

}

ingress() {

CEIL=$DOWNLINK
PRIO_RATE=`expr $CEIL / 3` # Ceiling for prioirty
BE_RATE=`expr $CEIL / 6`   # Min for best effort
BK_RATE=`expr $CEIL / 6`   # Min for background
BE_CEIL=`expr $CEIL - 64`  # A little slop at the top

LQ="quantum `get_mtu $IFACE`"

$TC qdisc del dev $IFACE handle ffff: ingress 2> /dev/null
$TC qdisc add dev $IFACE handle ffff: ingress
 
$TC qdisc del dev $DEV root  2> /dev/null
$TC qdisc add dev $DEV root handle 1: ${STABSTRING} htb default 12
$TC class add dev $DEV parent 1: classid 1:1 htb $LQ rate ${CEIL}kbit ceil ${CEIL}kbit
$TC class add dev $DEV parent 1:1 classid 1:10 htb $LQ rate ${CEIL}kbit ceil ${CEIL}kbit prio 0
$TC class add dev $DEV parent 1:1 classid 1:11 htb $LQ rate 32kbit ceil ${PRIO_RATE}kbit prio 1
$TC class add dev $DEV parent 1:1 classid 1:12 htb $LQ rate ${BE_RATE}kbit ceil ${BE_CEIL}kbit prio 2
$TC class add dev $DEV parent 1:1 classid 1:13 htb $LQ rate ${BK_RATE}kbit ceil ${BE_CEIL}kbit prio 3

# I'd prefer to use a pre-nat filter but that causes permutation...

$TC qdisc add dev $DEV parent 1:11 handle 110: $QDISC limit 1000 $ECN `get_quantum 500` `get_flows ${PRIO_RATE}`
$TC qdisc add dev $DEV parent 1:12 handle 120: $QDISC limit 1000 $ECN `get_quantum 1500` `get_flows ${BE_RATE}`
$TC qdisc add dev $DEV parent 1:13 handle 130: $QDISC limit 1000 $ECN `get_quantum 1500` `get_flows ${BK_RATE}`

diffserv $DEV

ifconfig $DEV up

# redirect all IP packets arriving in $IFACE to ifb0 

$TC filter add dev $IFACE parent ffff: protocol all prio 10 u32 \
  match u32 0 0 flowid 1:1 action mirred egress redirect dev $DEV

}

do_modules
ipt_setup
egress 
ingress

# References:
# This alternate shaper attempts to go for 1/u performance in a clever way
# http://git.coverfire.com/?p=linux-qos-scripts.git;a=blob;f=src-3tos.sh;hb=HEAD

# Comments
# This does the right thing with ipv6 traffic.
# It also tries to leverage diffserv to some sane extent. In particular,
# the 'priority' queue is limited to 33% of the total, so EF, and IMM traffic
# cannot starve other types. The rfc suggested 30%. 30% is probably
# a lot in today's world.

# Flaws
# Many!

[-- Attachment #3: functions.sh --]
[-- Type: application/octet-stream, Size: 4122 bytes --]

insmod() {
  lsmod | grep -q ^$1 || $INSMOD $1
}

ipt() {
  d=`echo $* | sed s/-A/-D/g`
  [ "$d" != "$*" ] && {
	iptables $d > /dev/null 2>&1
	ip6tables $d > /dev/null 2>&1
  }
  iptables $* > /dev/null 2>&1
  ip6tables $* > /dev/null 2>&1
}

do_modules() {
	insmod sch_$QDISC                                          
	insmod sch_ingress                                      
	insmod act_mirred                                         
	insmod cls_fw                                          
	insmod sch_htb                                              
}
                                                                                                                  
# You need to jiggle these parameters. Note limits are tuned towards a <10Mbit uplink <60Mbup down

[ -z "$UPLINK" ] && UPLINK=2302
[ -z "$DOWNLINK" ] && DOWNLINK=14698
[ -z "$DEV" ] && DEV=ifb0
[ -z "$QDISC" ] && QDISC=fq_codel
[ -z "$IFACE" ] && IFACE=ge00
[ -z "$LLAM" ] && LLAM="none"
[ -z "$LINKLAYER" ] && LINKLAYER=ethernet
[ -z "$OVERHEAD" ] && OVERHEAD=0
[ -z "$STAB_MTU" ] && STAB_MTU=2047
[ -z "$STAB_MPU" ] && STAB_MPU=0
[ -z "$STAB_TSIZE" ] && STAB_TSIZE=512
[ -z "$AUTOFLOW" ] && AUTOFLOW=0
[ -z "$AUTOECN" ] && AUTOECN=1
[ -z "$TC" ] && TC=`which tc`
[ -z "$INSMOD" ] && INSMOD=`which insmod`


#logger "LLAM: ${LLAM}"
#logger "LINKLAYER: ${LINKLAYER}"

CEIL=$UPLINK

ADSLL=""
if [ "$LLAM" = "htb_private" ]; 
then
	# HTB defaults to MTU 1600 and an implicit fixed TSIZE of 256
	ADSLL="mpu ${STAB_MPU} linklayer ${LINKLAYER} overhead ${OVERHEAD} mtu ${STAB_MTU}"
	logger "ADSLL: ${ADSLL}"
fi

if [ "${LLAM}" = "tc_stab" ]; 
then
	STABSTRING="stab mtu ${STAB_MTU} tsize ${STAB_TSIZE} mpu ${STAB_MPU} overhead ${OVERHEAD} linklayer ${LINKLAYER}"
	logger "STAB: ${STABSTRING}"
fi


aqm_stop() {
	$TC qdisc del dev $IFACE ingress
	$TC qdisc del dev $IFACE root
	$TC qdisc del dev $DEV root
}

# Note this has side effects on the prio variable
# and depends on the interface global too

fc() {
$TC filter add dev $interface protocol ip parent $1 prio $prio u32 match ip tos $2 0xfc classid $3
prio=$(($prio + 1))
$TC filter add dev $interface protocol ipv6 parent $1 prio $prio u32 match ip6 priority $2 0xfc classid $3
prio=$(($prio + 1))
}

# FIXME: actually you need to get the underlying MTU on PPOE thing

get_mtu() {
	F=`cat /sys/class/net/$1/mtu`
	if [ -z "$F" ]
	then
	echo 1500
	else
	echo $F
	fi
}

# FIXME should also calculate the limit
# Frankly I think Xfq_codel can pretty much always run with high numbers of flows
# now that it does fate sharing
# But right now I'm trying to match the ns2 model behavior better
# So SET the autoflow variable to 1 if you want the cablelabs behavior

get_flows() {
	if [ "$AUTOFLOW" -eq "1" ] 
	then
	FLOWS=8
	[ $1 -gt 999 ] && FLOWS=16
	[ $1 -gt 2999 ] && FLOWS=32
	[ $1 -gt 7999 ] && FLOWS=48
	[ $1 -gt 9999 ] && FLOWS=64
	[ $1 -gt 19999 ] && FLOWS=128
	[ $1 -gt 39999 ] && FLOWS=256
	[ $1 -gt 69999 ] && FLOWS=512
	[ $1 -gt 99999 ] && FLOWS=1024
	case $QDISC in
		codel|ns2_codel|pie) ;;
		fq_codel|*fq_codel|sfq) echo flows $FLOWS ;;
	esac
	fi
}	

# set quantum parameter if available for this qdisc

get_quantum() {
    case $QDISC in
	*fq_codel|fq_pie|drr) echo quantum $1 ;;
	*) ;;
    esac

}

# Set some variables to handle different qdiscs

ECN=""
NOECN=""

# ECN is somewhat useful but it helps to have a way
# to turn it on or off. Note we never do ECN on egress currently.

qdisc_variants() {
    if [ "$AUTOECN" -eq "1" ]
    then
    case $QDISC in
	*codel|*pie) ECN=ecn; NOECN=noecn ;;
	*) ;;
    esac
    fi
}

qdisc_variants

# This could be a complete diffserv implementation

diffserv() {

interface=$1
prio=1

# Catchall

$TC filter add dev $interface parent 1:0 protocol all prio 999 u32 \
        match ip protocol 0 0x00 flowid 1:12

# Find the most common matches fast

fc 1:0 0x00 1:12 # BE
fc 1:0 0x20 1:13 # CS1
fc 1:0 0x10 1:11 # IMM
fc 1:0 0xb8 1:11 # EF
fc 1:0 0xc0 1:11 # CS3
fc 1:0 0xe0 1:11 # CS6
fc 1:0 0x90 1:11 # AF42 (mosh)

# Arp traffic
$TC filter add dev $interface parent 1:0 protocol arp prio $prio handle 1 fw classid 1:11
prio=$(($prio + 1))
}

[-- Attachment #4: Type: text/plain, Size: 7022 bytes --]

> 
> Did you add the "linklayer atm" yourself to Dave's script?

	Well, partly the option for HTB was already in his script but under tested, I changed the script to add stab and to allow easier configuration of overhead, mow, mtu and tsize (just for stab) from the guy, but the code is Dave's. I attached the scripts. functions.sh gets the values from the configuration GUI. I extended the way the linklayer option strings are created, but basically it is the same method that dave used. And I do see the right overhead values appear in "tc -d qdisc", so at least something is reaching HTB. Sorry, that I have no repository for easier access.

 


> 
> 
> 
> 
>>> 
>>>>>       Now, I have been testing this using Dave's most recent cerowrt
>>>>> alpha version with a 3.10.9 kernel on mips hardware, I think this kernel
>>>>> should contain all htb fixes including commit 8a8e3d84b17 (net_sched:
>>>>> restore "linklayer atm" handling) but am not fully sure.
>>>>> 
>>>> 
>>>> It does.
>>> 
>>> It have not hit the stable tree yet, but DaveM promised he would pass it along.
>>> 
>>> It does seem Dave Taht have my patch applied:
>>> http://snapon.lab.bufferbloat.net/~cero2/patches/3.10.9-1/685-net_sched-restore-linklayer-atm-handling.patch
>> 
>> 	Ah, good so it should have worked.
> 
> It should...
> 
> 
>>> 
>>>>> While I am not able to build kernels, it seems that I am able to quickly
>>>>> test whether link layer adjustments work or not. SO aim happy to help where
>>>>> I can :)
>>> 
>>> So, what is you setup lab, that allow you to test this quickly?
>> 
>> 	Oh, Dave and Toke are the giants on whose shoulders I stand here (thanks guys), all I bring to the table basically is the fact that I have an ATM carried ADSL2+ connection at home.
> 
> I will soon have a ADSL lab again, so I try and reproduce your results.
> Actually, almost while typing this email, the postman arrived at my
> house and delivered a new ADSL modem... as a local Danish ISP
> www.fullrate.dk have been so kind to give me a testline for free
> (thanks fullrate!).

	This is great! Even though I am quite sure that no real DSL link is actually required to test the effect of the link layer adjustments.

> 
> 
>> 	Anyway, my theory is that proper link layer adjustments should only show up if not performing these would make my traffic exceed my link-speed and hence accumulate in the DSL modems bloated buffers leading to measurable increases in latency. So I try to saturate the both up- and down-link while measuring latency und different conditions. SInce the worst case overhead of the ATM encapsulation approaches 50% (with best case being around 10%) I try to test the system while shaping to 95% percent of link rates where do expect to see an effect of the link layer adjustments and while shaping to 50% where do not expect to see an effect. And basically that seems to work.
> 
>> 	Practically, I use Toke's netsurf-wrapper project with the RRUL test from my cerowrt router behind an ADSL2+ modem to a close netperf server in Germany. The link layer adjustments are configured in my cerowrt router, using Dave's simple.qos script (3 band HTB shaper with fq_codel on each leaf, taking my overhead of 40 bytes into account and optionally the link layer).
> 
>> 	It turns out that this test nicely saturates my link with 4 up
>> and 4 down TCP flows ad uses a train ping probes at 0.2 second period
>> to assess the latency induced by saturating the links. Now I shape down
>> to 95% and 50% of line rates and simply look at the ping RTT plot for
>> different conditions. In my rig I see around 30ms ping RTT without
>> load, 80ms with full saturation and no linklayer adjustments, and 40ms
>> with working link layer adjustments (hand in hand with slightly reduced
>> TCP good put just as one would expect). In my testing so far activating
>> the HTB link layer adjustments yielded the same 80ms delay I get
>> without link layer adjustments. If I shape down to 50% of link rates
>> HTB, stab and no link layer adjustments yield a ping RTT of ~40ms.
>> Still with proper link layer adjustments the TCP good-put is reduced
>> even at 50% shaping. As Dave explained with an unloaded swallow ermm,
>> ping RTT and fq_codel's target set to 5ms the best case would be 30ms +
>> 2*5ms or 40ms, so I am pretty close to ideal with proper link layer
>> adjustments.
>> 
> 
>> 	I guess it should be possible to simply use the reduction in good-put as an easy indicator whether the link layer adjustments work or not. But to do this properly I would need to be able to control the size of the sent packets which I am not, at least not with RRUL. But I am quite sure real computer scientists could easily set something up to test the good-put through a shaping device at differently sized packet streams of the same bandwidth, but I digress. 
>> 
>> 	On the other hand I do not claim to be an expert in this field in any way and my measurement method might be flawed, if you think so please do not hesitate to let me know how I could improve it.
> 
> I have a hard time following your description... sorry.

Okay, I see, let me try to present the data in a more ordered fashion:
BW: bandwidth [Kbit/s]
LLAM = link-layer adjustment method
LL: link layer
GP: goodput [Kbit/s]

#	shaped	downBW	(%)		upBW	(%)		LLAM	LL		ping RTT	downGP		upGP
1	no		16309		100		2544	100		none	none	300ms		10000 		1600
2	yes		14698		90		2430	95		none	none	80ms		13600		1800
3	yes		14698		90		2430	95		stab		adsl		40ms		11600		1600
4	yes		15494		95		2430	95		stab		adsl		42ms		12400		1600
5	yes		14698		90		2430	95		htb		adsl		75ms		13200		1600

2	yes		7349		45		1215	48		none	none	45ms		6800		1000
4	yes		7349		45		1215	48		stab		adsl		42ms		5800		800
5	yes		7349		45		1215	48		htb		adsl		45ms		6600		1000

Notes: upGP is way noisier than downGP and therefore harder to estimate

So condition# 3 and 4 show the best latency at high link saturation where link layer adjustments actually make a difference, by controlling whether the DSL modem will buffer or not
At ~50% link saturation there is not much, if any effect of the link layer adjustments on latency, but it still leaves its hallmark on good put reduction. (The partial reduction for htb might be caused by the specification of 40 bytes of overhead which seem to have been honored).
I take the disappearance of the latency effect at 50% as a control data point that shows my measurement approach seems sane enough.
I hope this clears up the information I wanted to give you the first time around.



> 
> So, did you get a working-low-latency setup by using 95% shaping and
> "stab" linklayer adjustment?

	Yes. (With a 3 leaf HTB as shaper and fq_codel as disc)

Best Regards
	Sebastian

> 
> -- 
> Best regards,
>  Jesper Dangaard Brouer
>  MSc.CS, Sr. Network Kernel Developer at Red Hat
>  Author of http://www.iptv-analyzer.org
>  LinkedIn: http://www.linkedin.com/in/brouer

next prev parent reply	other threads:[~2013-08-23 12:37 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-21 18:42 Dave Taht
     [not found] ` <56B261F1-2277-457C-9A38-FAB89818288F@gmx.de>
     [not found]   ` <CAA93jw6ku0OOXzNcAUtK4UL5uc7R2zVAOKo1+Fwzmr7gCH1pzA@mail.gmail.com>
     [not found]     ` <2148E2EF-A119-4499-BAC1-7E647C53F077@gmx.de>
2013-08-23  0:52       ` Sebastian Moeller
2013-08-23  5:13         ` Dave Taht
2013-08-23  7:27           ` Jesper Dangaard Brouer
2013-08-23 10:15             ` Sebastian Moeller
2013-08-23 11:16               ` Jesper Dangaard Brouer
2013-08-23 12:37                 ` Sebastian Moeller [this message]
2013-08-23 13:02                   ` Fred Stratton
2013-08-23 19:49                     ` Sebastian Moeller
2013-08-23 15:05                   ` Jesper Dangaard Brouer
2013-08-23 17:23                   ` Toke Høiland-Jørgensen
2013-08-23 20:09                     ` Sebastian Moeller
2013-08-23 20:46                       ` Toke Høiland-Jørgensen
2013-08-24 20:51                         ` Sebastian Moeller
2013-08-23 19:51             ` Sebastian Moeller
2013-08-23  9:16           ` Sebastian Moeller
2013-08-23 19:38           ` Sebastian Moeller
2013-08-23 19:47             ` Dave Taht
2013-08-23 19:56               ` Sebastian Moeller
2013-08-23 20:29                 ` Dave Taht
2013-08-24 20:51                   ` Sebastian Moeller
2013-08-24 20:51                   ` Sebastian Moeller
2013-08-25  9:21                     ` Fred Stratton
2013-08-25 10:17                       ` Fred Stratton
2013-08-25 13:59                         ` Sebastian Moeller
2013-08-25 14:26                           ` Fred Stratton
2013-08-25 14:31                             ` Fred Stratton
2013-08-25 17:53                             ` Sebastian Moeller
2013-08-25 17:55                               ` Dave Taht
2013-08-25 18:00                                 ` Fred Stratton
2013-08-25 18:30                               ` Fred Stratton
2013-08-25 18:41                                 ` Dave Taht
2013-08-25 19:08                                   ` Fred Stratton
2013-08-25 19:31                                     ` Fred Stratton
2013-08-25 21:54                                       ` Sebastian Moeller
2013-08-25 20:28                                     ` Dave Taht
2013-08-25 21:40                                       ` Sebastian Moeller
2013-08-25 21:50                                 ` Sebastian Moeller
2013-08-27 11:10                   ` Jesper Dangaard Brouer
2013-08-27 10:45                 ` Jesper Dangaard Brouer
2013-08-30 15:46                 ` [Cerowrt-devel] some kernel updates + new userspace patch Jesper Dangaard Brouer
2013-08-27 10:42             ` [Cerowrt-devel] some kernel updates Jesper Dangaard Brouer
2013-08-24 23:08           ` Sebastian Moeller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://lists.bufferbloat.net/postorius/lists/cerowrt-devel.lists.bufferbloat.net/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2D3FE3C2-AD17-4CDF-8DC3-D807B361EE66@gmx.de \
    --to=moeller0@gmx.de \
    --cc=cerowrt-devel@lists.bufferbloat.net \
    --cc=jbrouer@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox