From: Fred Stratton
Date: Fri, 23 Aug 2013 14:02:41 +0100
To: cerowrt-devel@lists.bufferbloat.net
Subject: Re: [Cerowrt-devel] some kernel updates

On 23 Aug 2013, at 13:37, Sebastian Moeller wrote:

> Hi Jesper,
>
> thanks for your time…
>
> On Aug 23, 2013, at 13:16, Jesper Dangaard Brouer wrote:
>
>> On Fri, 23 Aug 2013 12:15:12 +0200
>> Sebastian Moeller wrote:
>>
>>> Hi Jesper,
>>>
>>> On Aug 23, 2013, at 09:27, Jesper Dangaard Brouer wrote:
>>>
>>>> On Thu, 22 Aug 2013 22:13:52 -0700
>>>> Dave Taht wrote:
>>>>
>>>>> On Thu, Aug 22, 2013 at 5:52 PM, Sebastian Moeller wrote:
>>>>>
>>>>>> Hi List, hi Jesper,
>>>>>>
>>>>>> So I tested 3.10.9-1 to assess the status of the HTB ATM link layer
>>>>>> adjustments, to see whether the recent changes resurrected this
>>>>>> feature. Unfortunately the htb_private link layer adjustment is
>>>>>> still broken (RRUL ping RTT against Toke's netperf host in Germany
>>>>>> of ~80ms, same as without link layer adjustments). On the bright
>>>>>> side, the tc_stab method still works as well as before (ping RTT
>>>>>> around 40ms).
>>>>>> I would like to humbly propose using the tc stab method in cerowrt
>>>>>> to perform ATM link layer adjustments by default. To repeat myself,
>>>>>> simply telling the kernel a lie about the packet size seems more
>>>>>> robust than fudging HTB's rate tables.
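
For orientation, the two alternatives under test look roughly as follows.
This is a sketch only, assuming cerowrt's ge00 WAN interface and the 40
byte overhead Sebastian mentions later, not the exact simple.qos
invocation:

    # (a) tc stab: a qdisc size table that "lies" about the packet size
    tc qdisc add dev ge00 root handle 1: stab linklayer atm overhead 40 \
        htb default 12
    # (b) htb_private: HTB's own rate-table linklayer adjustment
    tc class add dev ge00 parent 1: classid 1:12 htb rate 2430kbit \
        linklayer atm overhead 40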
>>>>
>>>> After the (regression) commit 56b765b79 ("htb: improved accuracy at
>>>> high rates"), the kernel no longer uses the rate tables.
>>>
>>> See, I am quite a layman here; spelunking through the tc and kernel
>>> source code made me believe that the rate tables are still used (I
>>> might have looked at too old versions of both repositories though).
>>>
>>>> My commit 8a8e3d84b1719 (net_sched: restore "linklayer atm" handling)
>>>> does the ATM cell overhead calculation directly on the packet length,
>>>> see psched_l2t_ns() doing (DIV_ROUND_UP(len,48)*53).
>>>> Thus, the cell calc should actually be more precise now... but see
>>>> below.
>>>
>>> Is there any way to make HTB report which link layer it assumes?
>>
>> I added some print debug statements in my patch, so you can see this in
>> the kernel log / dmesg. To activate the debugging print statements, run:
>>
>>   mount -t debugfs none /sys/kernel/debug/
>>   echo "func __detect_linklayer +p" \
>>       > /sys/kernel/debug/dynamic_debug/control
>>
>> Run your tc script, then run dmesg or look in the kernel syslog.
>
> Ah, unfortunately I am not set up to build new kernels for the router I
> am testing on, so I would hereby like to beg Dave to include that patch
> in one of the next releases. Would it not be a good idea to teach tc to
> report the link layer for HTB as it does for stab? Having to empirically
> figure out whether it is applied or not is somewhat cumbersome...
>
>>>>>> Especially since the kernel already fudges the packet size to
>>>>>> account for the ethernet header and then some, so this path should
>>>>>> receive more scrutiny by virtue of having more users?
>>>>
>>>> As you mention, the default kernel path (not tc stab) fudges the
>>>> packet size for Ethernet headers, AND I made a mistake (back in
>>>> approx 2006, sorry) such that the "overhead" cannot be a negative
>>>> number.
>>>
>>> Mmh, does this also apply to stab?
>>
>> This seems to be two questions...
>>
>> Yes, the Ethernet header size gets adjusted/added before the "stab"
>> call. For reference, see net/core/dev.c, function __dev_xmit_skb():
>>   it calls qdisc_pkt_len_init(skb);      // adjust Ethernet, account for GSO
>>   then qdisc_calculate_pkt_len(skb, q);  // this is the stab call
>>   (which calls __qdisc_calculate_pkt_len() in net/sched/sch_api.c)
>>
>> The qdisc_pkt_len_init() call was introduced by Eric in
>> v3.9-rc1~139^2~411.
>
> So I look at 3.10 here:
>
> net/core/dev.c, qdisc_pkt_len_init():
>   line 2628: qdisc_skb_cb(skb)->pkt_len = skb->len;
> and
>   line 2650: qdisc_skb_cb(skb)->pkt_len += (gso_segs - 1) * hdr_len;
> so the adjusted size does not seem to end up in skb->len.
>
> And then in net/sched/sch_api.c, __qdisc_calculate_pkt_len():
>   line 440: pkt_len = skb->len + stab->szopts.overhead;
>
> So to my eyes this looks like stab is not honoring the changes made in
> qdisc_pkt_len_init(), no? At least I fail to see where skb->len is
> assigned qdisc_skb_cb(skb)->pkt_len. But I happily admit that I am truly
> a novice in these matters and easily intimidated by C code.
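
To put numbers on that cell calculation, a minimal shell sketch (the 1500
byte packet and 40 byte overhead are example figures, and it assumes the
overhead is added to the length before the cell rounding):

    len=1500; overhead=40                      # packet plus assumed encapsulation
    cells=$(( (len + overhead + 47) / 48 ))    # DIV_ROUND_UP(len + overhead, 48)
    echo $(( cells * 53 ))                     # 1749 bytes on the wire, not 1540

Each 48 byte ATM payload rides in a 53 byte cell, and the last cell is
padded, which is exactly what the link layer adjustments have to account
for.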
>
>> Thus, in kernels >= 3.9 you would need to change/reduce your tc
>> "overhead" parameter by 14 bytes (if you accounted for the encapsulated
>> Ethernet header before).
>
> That is what I thought before, but my kernel spelunking made me
> reconsider and switch to not subtracting the 14 bytes, since as I
> understand it the kernel actively does not do this if stab is used.
>
>> The "overhead" of stab can be negative, so no problem here; it is an
>> "int" for stab.
>>
>>>> Meaning that some ATM encap overheads simply cannot be configured
>>>> correctly (as you need to subtract the ethernet header).
>>>
>>> Yes, I see; luckily PPPoA and IPoA seem quite rare, and setting the
>>> overhead to be larger than it actually is is relatively benign, as it
>>> will overestimate packet size.

As a point of information, the entire UK uses PPPoA rather than PPPoE,
and some hundreds of thousands of users IPoA.

>>>> (And it's quite problematic to change the kABI to allow for a
>>>> negative overhead.)
>>>
>>> Again I have no clue, but overhead seems to be integer, not unsigned,
>>> so why can it not be negative?
>>
>> Nope, for reference see include/uapi/linux/pkt_sched.h.
>>
>> This struct tc_ratespec is used by the normal "HTB/TBF" rate system;
>> notice "unsigned short overhead":
>>
>> struct tc_ratespec {
>>         unsigned char   cell_log;
>>         __u8            linklayer; /* lower 4 bits */
>>         unsigned short  overhead;
>>         short           cell_align;
>>         unsigned short  mpu;
>>         __u32           rate;
>> };
>>
>> This struct tc_sizespec is used by the stab system, where the overhead
>> is an int:
>>
>> struct tc_sizespec {
>>         unsigned char   cell_log;
>>         unsigned char   size_log;
>>         short           cell_align;
>>         int             overhead;
>>         unsigned int    linklayer;
>>         unsigned int    mpu;
>>         unsigned int    mtu;
>>         unsigned int    tsize;
>> };
>
> Ah, good to know.
>
>>>> Perhaps we should change to use "tc stab" for this reason. But I'm
>>>> not sure "stab" does the right thing either, and its accuracy is also
>>>> limited, as it's actually also table based.
>>>
>>> But why should a table be problematic here? As long as we can ensure
>>> the table covers the largest packet, we are golden. So either we do
>>> the manly and stupid thing and go for 9000 byte jumbo packets for the
>>> table size, or we assume that for the most part ATM users will at best
>>> use baby jumbo frames (I think BT does this to allow payload MTU 1500
>>> in spite of PPPoE encapsulation overhead), but then we are quite fine
>>> with the default size table maxMTU of 2048 bytes, no?
>>
>> It is the GSO problem that I'm worried about. The kernel will bunch up
>> packets, and that caused the length calculation issue... just disable
>> GSO:
>>
>>   ethtool -K eth63 gso off gro off tso off
>
> Oh, as always Dave has this tackled in cerowrt already; all offloads are
> off (at 16000/2500 Kbit/s there should be no need for offloads nowadays,
> I guess). But I see the issue now.
>
>>>> We could easily change the kernel to perform the ATM cell overhead
>>>> calc inside "stab", and we should also fix the GSO packet overhead
>>>> problem. (For now, remember to disable GSO packets when shaping.)
>>>
>>> Yeah, I stumbled over the fact that the stab mechanism does not honor
>>> the kernel's earlier adjustments of packet length (but I seem to be
>>> unable to find the actual file and line where this is initially
>>> handled). It would seem relatively easy to make stab take the earlier
>>> adjustment into account. Regarding GSO, I assumed that GSO will not
>>> play nicely with an AQM anyway, as a single large packet will hog too
>>> much transfer time...
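
A quick way to verify that the offloads really are off before trusting a
shaping run — a sketch, again assuming cerowrt's ge00 WAN interface:

    ethtool -k ge00 | egrep 'segmentation|generic-receive'
    # tcp-segmentation-offload, generic-segmentation-offload and
    # generic-receive-offload should all report: off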
>>
>> Yes, just disable GSO ;-)
>
> Done.
>
>>>>> It's my hope that the atm code works but is misconfigured. You can
>>>>> output the tc commands by overriding the TC variable with
>>>>> TC="echo tc" and paste here.
>>>>
>>>> I also hope it is a misconfig. Please show us the config/script.
>>>
>>> Will do this later. I would be delighted if it is just me being
>>> stupid.
>>>
>>>> I would appreciate a link to the scripts you are using... perhaps a
>>>> git tree?
>>>
>>> Unfortunately I have no git tree and no experience with git. I do not
>>> think I will be able to set something up quickly. But I use a modified
>>> version of cerowrt's AQM scripts, which I will post later.
>>
>> Someone just point me to the cerowrt git repo... please, and point me
>> at the simple.qos script.
>>
>> Did you add the "linklayer atm" yourself to Dave's script?
>
> Well, partly; the option for HTB was already in his script but
> under-tested. I changed the script to add stab and to allow easier
> configuration of overhead, mpu, mtu and tsize (just for stab) from the
> GUI, but the code is Dave's. I attached the scripts. functions.sh gets
> the values from the configuration GUI. I extended the way the linklayer
> option strings are created, but basically it is the same method that
> Dave used. And I do see the right overhead values appear in
> "tc -d qdisc", so at least something is reaching HTB. Sorry that I have
> no repository for easier access.
>
>>>>>> Now, I have been testing this using Dave's most recent cerowrt
>>>>>> alpha version with a 3.10.9 kernel on mips hardware. I think this
>>>>>> kernel should contain all htb fixes, including commit 8a8e3d84b17
>>>>>> (net_sched: restore "linklayer atm" handling), but am not fully
>>>>>> sure.
>>>>>
>>>>> It does.
>>>>
>>>> It has not hit the stable tree yet, but DaveM promised he would pass
>>>> it along.
>>>>
>>>> It does seem Dave Taht has my patch applied:
>>>> http://snapon.lab.bufferbloat.net/~cero2/patches/3.10.9-1/685-net_sched-restore-linklayer-atm-handling.patch
>>>
>>> Ah, good, so it should have worked.
>>
>> It should...
>>
>>>>>> While I am not able to build kernels, it seems that I am able to
>>>>>> quickly test whether link layer adjustments work or not. So I am
>>>>>> happy to help where I can :)
>>>>
>>>> So, what is your lab setup that allows you to test this quickly?
>>>
>>> Oh, Dave and Toke are the giants on whose shoulders I stand here
>>> (thanks guys); all I bring to the table basically is the fact that I
>>> have an ATM-carried ADSL2+ connection at home.
>>
>> I will soon have an ADSL lab again, so I will try to reproduce your
>> results. Actually, almost while typing this email, the postman arrived
>> at my house and delivered a new ADSL modem... as a local Danish ISP,
>> www.fullrate.dk, has been so kind to give me a testline for free
>> (thanks fullrate!).
>
> This is great! Even though I am quite sure that no real DSL link is
> actually required to test the effect of the link layer adjustments.
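
One way to do that without a DSL line — my sketch, not a setup anyone in
this thread describes: let a spare ethernet box play the modem, shaping
its egress to ADSL-like rates behind a deep FIFO (the interface name,
rate and buffer size below are all assumptions):

    # eth1 plays the DSL modem: ~2.5 Mbit/s uplink, ATM framing, fat buffer
    tc qdisc add dev eth1 root handle 1: stab linklayer atm overhead 40 \
        htb default 1
    tc class add dev eth1 parent 1: classid 1:1 htb rate 2544kbit
    tc qdisc add dev eth1 parent 1:1 handle 10: bfifo limit 60000

A shaper in front of this emulated link should then show the same
latency-versus-goodput trade-off as against a real modem.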
>
>>> Anyway, my theory is that proper link layer adjustments should only
>>> show up if not performing them would make my traffic exceed my link
>>> speed and hence accumulate in the DSL modem's bloated buffers, leading
>>> to measurable increases in latency. So I try to saturate both up- and
>>> down-link while measuring latency under different conditions. Since
>>> the worst case overhead of the ATM encapsulation approaches 50% (with
>>> best case being around 10%), I test the system while shaping to 95% of
>>> link rates, where I do expect to see an effect of the link layer
>>> adjustments, and while shaping to 50%, where I do not expect to see an
>>> effect. And basically that seems to work.
>>>
>>> Practically, I use Toke's netperf-wrapper project with the RRUL test,
>>> from my cerowrt router behind an ADSL2+ modem to a nearby netperf
>>> server in Germany. The link layer adjustments are configured in my
>>> cerowrt router, using Dave's simple.qos script (3 band HTB shaper with
>>> fq_codel on each leaf, taking my overhead of 40 bytes into account and
>>> optionally the link layer).
>>>
>>> It turns out that this test nicely saturates my link with 4 up and 4
>>> down TCP flows and uses a train of ping probes at a 0.2 second period
>>> to assess the latency induced by saturating the links. Now I shape
>>> down to 95% and 50% of line rates and simply look at the ping RTT plot
>>> for different conditions. In my rig I see around 30ms ping RTT without
>>> load, 80ms with full saturation and no link layer adjustments, and
>>> 40ms with working link layer adjustments (hand in hand with slightly
>>> reduced TCP goodput, just as one would expect). In my testing so far,
>>> activating the HTB link layer adjustments yielded the same 80ms delay
>>> I get without link layer adjustments. If I shape down to 50% of link
>>> rates, HTB, stab and no link layer adjustments all yield a ping RTT of
>>> ~40ms. Still, with proper link layer adjustments the TCP goodput is
>>> reduced even at 50% shaping. As Dave explained, with an unloaded
>>> swallow, ermm, ping RTT of 30ms and fq_codel's target set to 5ms, the
>>> best case would be 30ms + 2*5ms = 40ms, so I am pretty close to ideal
>>> with proper link layer adjustments.
>>>
>>> I guess it should be possible to simply use the reduction in goodput
>>> as an easy indicator of whether the link layer adjustments work or
>>> not. But to do this properly I would need to be able to control the
>>> size of the sent packets, which I am not, at least not with RRUL. But
>>> I am quite sure real computer scientists could easily set something up
>>> to test the goodput through a shaping device with differently sized
>>> packet streams of the same bandwidth, but I digress.
>>>
>>> On the other hand, I do not claim to be an expert in this field in any
>>> way, and my measurement method might be flawed; if you think so,
>>> please do not hesitate to let me know how I could improve it.
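
For the record, such a run looks something like this — a sketch, with a
placeholder hostname, and the flag spellings assumed from
netperf-wrapper's usage at the time:

    # 60s RRUL run: 4 up + 4 down TCP flows, ping probes every 0.2s
    netperf-wrapper -H netperf.example.net -l 60 -s 0.2 rrul
    # plot the ping track from the recorded data file (flags assumed)
    netperf-wrapper -i rrul-*.json.gz -f plot -p ping -o ping_rtt.png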
>>
>> I have a hard time following your description... sorry.
>
> Okay, I see; let me try to present the data in a more ordered fashion:
>
> BW: bandwidth [Kbit/s]
> LLAM: link-layer adjustment method
> LL: link layer
> GP: goodput [Kbit/s]
>
> # shaped downBW  (%)  upBW  (%)  LLAM  LL    ping RTT  downGP  upGP
> 1 no     16309  100   2544  100  none  none  300ms     10000   1600
> 2 yes    14698   90   2430   95  none  none   80ms     13600   1800
> 3 yes    14698   90   2430   95  stab  adsl   40ms     11600   1600
> 4 yes    15494   95   2430   95  stab  adsl   42ms     12400   1600
> 5 yes    14698   90   2430   95  htb   adsl   75ms     13200   1600
>
> 2 yes     7349   45   1215   48  none  none   45ms      6800   1000
> 4 yes     7349   45   1215   48  stab  adsl   42ms      5800    800
> 5 yes     7349   45   1215   48  htb   adsl   45ms      6600   1000
>
> Notes: upGP is way noisier than downGP and therefore harder to estimate.
>
> So conditions 3 and 4 show the best latency at high link saturation,
> where link layer adjustments actually make a difference by controlling
> whether the DSL modem will buffer or not.
> At ~50% link saturation there is not much, if any, effect of the link
> layer adjustments on latency, but it still leaves its hallmark as a
> goodput reduction. (The partial reduction for htb might be caused by the
> specification of 40 bytes of overhead, which seems to have been
> honored.)
> I take the disappearance of the latency effect at 50% as a control data
> point that shows my measurement approach seems sane enough.
> I hope this clears up the information I wanted to give you the first
> time around.
>
>> So, did you get a working low-latency setup by using 95% shaping and
>> "stab" linklayer adjustment?
>
> Yes. (With a 3 leaf HTB as shaper and fq_codel as leaf qdisc.)
>
> Best Regards
>   Sebastian
>
>> --
>> Best regards,
>>   Jesper Dangaard Brouer
>>   MSc.CS, Sr. Network Kernel Developer at Red Hat
>>   Author of http://www.iptv-analyzer.org
>>   LinkedIn: http://www.linkedin.com/in/brouer
>
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-devel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel