From: Sebastian Moeller
Date: Sun, 27 Jul 2014 00:39:23 +0200
To: David Lang
Cc: cerowrt-devel@lists.bufferbloat.net
Subject: Re: [Cerowrt-devel] Ideas on how to simplify and popularize bufferbloat control for consideration.

Hi David,

On Jul 26, 2014, at 23:45, David Lang wrote:

> On Sat, 26 Jul 2014, Sebastian Moeller wrote:
>
>> On Jul 26, 2014, at 22:39, David Lang wrote:
>>
>>> By how much tuning is required, I wasn't meaning how frequently to tune, but how close default settings can come to the performance of an expertly tuned setup.
>>
>> Good question.
>>
>>> Ideally the tuning takes into account the characteristics of the hardware of the link layer. If it's IP encapsulated in something else (ATM, PPPoE, VPN, VLAN tagging, ethernet with jumbo packet support, for example), then you have overhead from the encapsulation that you would ideally take into account when tuning things.
>>>
>>> The question I'm talking about below is: how much do you lose compared to the ideal if you ignore this sort of thing and just assume that the wire is dumb and puts the bits on it as you send them? By dumb I mean don't even allow for inter-packet gaps, don't measure the bandwidth, don't try to pace inbound connections by the timing of your acks, etc. Just run BQL and fq_codel and start the BQL sizes based on the wire speed of your link (Gig-E on the 3800) and shrink them based on long-term passive observation of the sender.
>>
>> As data talks, I just did a quick experiment with my ADSL2+ line at home. The solid lines in the attached plot show the results for proper shaping with SQM (shaping to 95% of the link rates of downstream and upstream while taking the link layer properties, that is ATM encapsulation and per-packet overhead, into account); the broken lines show the same system with the link layer and per-packet overhead adjustments disabled, but still shaping to 95% of link rate (this is roughly equivalent to a 15% underestimation of the packet size).
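For anyone wondering where that ~15% figure comes from, here is a back-of-the-envelope sketch of the ATM cell quantization. It assumes a 44-byte per-packet overhead (AAL5 trailer plus a PPPoE/LLC-style encapsulation); the exact value differs between ISPs, but the cell tax is the same on any ATM-based link:

    # Rough sketch of the ATM "cell tax" that SQM's link layer adjustment
    # accounts for. The 44-byte per-packet overhead is an assumed value
    # (AAL5 trailer plus PPPoE/LLC encapsulation); the exact number depends
    # on how the ISP encapsulates traffic.

    ATM_CELL_SIZE    = 53   # bytes sent on the wire per ATM cell
    ATM_CELL_PAYLOAD = 48   # usable payload bytes per cell

    def wire_bytes(ip_len, overhead=44):
        """Bytes actually transmitted on the ADSL line for one IP packet."""
        payload = ip_len + overhead
        cells = -(-payload // ATM_CELL_PAYLOAD)   # round up to whole cells
        return cells * ATM_CELL_SIZE

    for size in (64, 576, 1500):
        w = wire_bytes(size)
        print(f"{size:4d} byte packet -> {w:4d} bytes on the wire "
              f"(+{100.0 * (w - size) / size:.0f}%)")

    # 1500-byte packets come out roughly 17% larger on the wire, small ACKs
    # far worse, so shaping to 95% of the raw sync rate without the
    # adjustment still overshoots the usable rate considerably.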
>> The actual test is netperf-wrapper's RRUL (4 TCP streams up, 4 TCP streams down, while measuring latency with ping and UDP probes). As you can see from the plot, just getting the link layer encapsulation wrong destroys latency under load badly. The host is ~52ms RTT away, and with fq_codel the ping time per leg is increased by just one codel target of 5ms each, resulting in a modest latency increase of ~10ms with proper shaping, for a total of ~65ms; with improper shaping RTTs increase to ~95ms (they almost double), so RTT increases by ~43ms. Also note how the extremes for the broken lines are much worse than for the solid lines. In short, I would estimate that a slight misjudgment (15%) results in almost an 80% increase of latency under load. In other words, getting the rates right matters a lot. (I should also note that in my setup there is a secondary router that limits RTT to max 300ms, otherwise the broken lines might look even worse...)
>
> what is the latency like without BQL and codel? the pre-bufferbloat version? (without any traffic shaping)

So I just disabled SQM and the plot looks almost exactly like the broken-line plot I sent before (~95ms RTT, up from 55ms unloaded, with single pings delayed for > 1000ms, just as with the broken lines; with proper shaping even extreme pings stay < 100ms). But as I said before, I need to run through my ISP-supplied primary router (not just a dumb modem), which also tries to bound the latencies under load to some degree. Actually, I just repeated the test connected directly to the primary router and get the same ~95ms average ping time with frequent extremes > 1000ms, so it looks like just getting the shaping wrong by 15% eradicates the buffer de-bloating efforts completely...

> I agree that going from 65ms to 95ms seems significant, but if the stock version goes up above 1000ms, then I think we are talking about things that are 'close'

Well, if we include outliers (and we should, as enough outliers will quickly degrade the FPS and VoIP suitability of an otherwise responsive system), stock and improper shaping are in the >1000ms worst-case range, while proper SQM bounds this to 100ms.

> assuming that latency under load without the improvements got >1000ms
>
> fast-slow (in ms)
> ideal = 10
> untuned = 43
> bloated > 1000

The sign seems off, as fast < slow? I like this one best ;)

> fast/slow
> ideal = 1.25
> untuned = 1.83
> bloated > 19

But fast < slow, and hence this ratio should be < 1?

> slow/fast
> ideal = 0.8
> untuned = 0.55
> bloated = 0.05

And this > 1?

> rather than looking at how much worse it is than the ideal, look at how much closer it is to the ideal than to the bloated version.
>
> David Lang
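For what it's worth, here is the same arithmetic spelled out with the measured RTTs, with the ratios labelled explicitly (which is where the fast/slow sign confusion above comes from). The 1000ms figure for the bloated case is only a lower bound, since the outliers go well beyond it:

    # Latency-under-load metrics recomputed from the measured RTTs:
    # ~52 ms unloaded; loaded: ~65 ms with proper SQM, ~95 ms with the
    # link layer adjustment disabled, >1000 ms without any shaping.

    unloaded = 52.0
    loaded = {
        "ideal (proper SQM)":       65.0,
        "untuned (no LL adjust)":   95.0,
        "bloated (no shaping)":   1000.0,   # lower bound only
    }

    for name, rtt in loaded.items():
        print(f"{name:26s} increase: {rtt - unloaded:6.0f} ms   "
              f"loaded/unloaded: {rtt / unloaded:5.2f}   "
              f"unloaded/loaded: {unloaded / rtt:5.2f}")

    # ideal:    +13 ms,  1.25, 0.80
    # untuned:  +43 ms,  1.83, 0.55
    # bloated: +948 ms, 19.23, 0.05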