From: "Bill Ver Steeg (versteb)"
To: David Lang
Cc: "bloat@lists.bufferbloat.net"
Subject: Re: [Bloat] setting queue depth on tail drop configurations of pfifo_fast
Date: Fri, 27 Mar 2015 23:18:07 +0000

Dave Lang-

 

Yup, you got the intent.

 

The ABR video delivery stack is actually one level more complex. The application uses plain old HTTP to receive N==2 second chunks of video, which in turn uses TCP to get the data, which in turn interacts with the various queuing mechanisms, yada, yada, yada. So the application rate adaptation logic uses the HTTP transfer rate to decide, at each chunk boundary, whether to upshift to a higher video rate, downshift to a lower video rate, or stay at the current video rate.
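
To make that concrete, here is a rough sketch of the kind of chunk-boundary decision I mean, in Python. The rate ladder, the 20% headroom factor, and the function names are all invented for illustration; no real player is this simple.

    # Hypothetical per-chunk ABR rate selection, driven only by the measured
    # HTTP transfer rate of recent chunks. Ladder and headroom are invented.
    LADDER_BPS = [500_000, 1_000_000, 2_000_000, 3_000_000]  # encoded video rates
    HEADROOM = 1.2                                           # margin required before upshifting

    def measured_rate_bps(chunk_bytes, transfer_seconds):
        """HTTP goodput observed for the last chunk."""
        return 8.0 * chunk_bytes / transfer_seconds

    def next_rate(current_bps, recent_rates_bps):
        """Pick the encoding rate for the next chunk at the chunk boundary."""
        estimate = min(recent_rates_bps)        # be conservative across recent chunks
        idx = LADDER_BPS.index(current_bps)
        up = min(idx + 1, len(LADDER_BPS) - 1)
        if estimate > HEADROOM * LADDER_BPS[up]:
            return LADDER_BPS[up]               # upshift
        if estimate < current_bps:
            return LADDER_BPS[max(idx - 1, 0)]  # downshift
        return current_bps                      # stay

    # At each boundary:
    #   rate = next_rate(rate, [measured_rate_bps(b, t) for b, t in last_chunks])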

 

There are several application layer algorithms in use (Netflix, MPEG DASH, Apple, Microsoft, etc.), and many of them use more than one TCP/HTTP session to get chunks. Lots of moving parts, and IMHO most of these developers are more concerned with getting the best possible throughput than with being bloat-friendly. Driving the network at the perceived available line rate for hours at a time is simply not network friendly.

 

Clearly, the newer AQM algorithms will handle these types of aggressive ABR algorithms better. There may also be a way to tweak the ABR algorithm to "do the right thing" and make the system work better, both from a "make my video better" standpoint and a "don't impact cross traffic" standpoint. As a start, I am thinking of ways to keep the sending rate between the max video rate and the (perceived) network rate. This does impact how such a flow competes with other flows.
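
To sketch what I mean by keeping the sending rate bounded (purely illustrative; the 25% headroom, the cap, and the names are invented, and a real implementation would have to pace within a transfer rather than just between chunks):

    import time

    MAX_VIDEO_BPS = 3_000_000     # top rung of the encoding ladder (illustrative)

    def paced_fetch(fetch_chunk, perceived_network_bps):
        """Fetch one chunk, then idle so the average request rate stays near a
        cap chosen between the max video rate and the perceived network rate."""
        cap_bps = min(perceived_network_bps, 1.25 * MAX_VIDEO_BPS)
        start = time.monotonic()
        chunk_bytes = fetch_chunk()              # blocking HTTP GET; returns bytes received
        elapsed = time.monotonic() - start
        target = 8.0 * chunk_bytes / cap_bps     # how long the chunk "should" take at the cap
        if elapsed < target:
            time.sleep(target - elapsed)         # idle instead of grabbing the whole line
        return chunk_bytes

    # Each chunk still bursts at line rate while it is on the wire, which is
    # exactly the cross-traffic interaction question raised above.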

 

Regarding peeking into the kernel: the overall design of the existing systems assumes that they need to run on several OSes/platforms, and therefore they (generally) do not peek into the kernel. I have done some work that does look into the kernel to examine TCP receive queue sizes (https://smartech.gatech.edu/bitstream/handle/1853/45059/GT-CS-12-07.pdf) and it worked pretty well. That scheme would be difficult to productize, and I am thinking about server-based methods in addition to client-based methods to keep out of congestion jail. Perhaps we could use HTTP pragmas to have the client signal the desired send rate to the HTTP server.
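
For reference, one user-space way to peek at per-socket TCP state on Linux looks roughly like the sketch below. This is not the mechanism from the paper, just an illustration: the FIONREAD/SIOCINQ ioctl and the TCP_INFO socket option do exist, but the struct tcp_info field offsets are assumed from the linux/tcp.h layout of that era and should be checked against your kernel headers before trusting them.

    import fcntl
    import socket
    import struct
    import termios

    def rcv_queue_bytes(sock):
        """Bytes sitting unread in this TCP socket's receive queue
        (FIONREAD/SIOCINQ ioctl; Linux-specific)."""
        raw = fcntl.ioctl(sock.fileno(), termios.FIONREAD, struct.pack("I", 0))
        return struct.unpack("I", raw)[0]

    def tcp_info_snapshot(sock):
        """First 104 bytes of struct tcp_info: 8 single bytes (7 fields plus one
        pad byte) followed by 24 u32s. Offsets assumed; verify against your headers."""
        tcp_info_opt = getattr(socket, "TCP_INFO", 11)   # option value 11 on Linux
        raw = sock.getsockopt(socket.IPPROTO_TCP, tcp_info_opt, 104)
        vals = struct.unpack("8B24I", raw)
        return {
            "rtt_us": vals[8 + 15],          # tcpi_rtt
            "rcv_space": vals[8 + 22],       # tcpi_rcv_space
            "total_retrans": vals[8 + 23],   # tcpi_total_retrans
        }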

Bill Ver Steeg

 

 

 

-----Original Message-----
From: David Lang [mailto:david@lang.hm]
Sent: Friday, March 27, 2015 6:46 PM
To: Bill Ver Steeg (versteb)
Cc: bloat@lists.bufferbloat.net
Subject: RE: [Bloat] setting queue depth on tail drop configurations of pfifo_fast

 

On Fri, 27 Mar 2015, Bill Ver Steeg (versteb) wrote:

> For this very specific test, I am doing one-way netperf-wrapper packet
> tests that will (almost) always be sending 1500 byte packets. I am
> then running some ABR cross traffic to see how it responds to
> FQ_AQM and AQM (where AQM == Codel and PIE). I am using pfifo_fast
> as a baseline. The Codel, FQ_codel, PIE and FQ_PIE stuff is working
> fine. I need to tweak the pfifo_fast queue length to do some comparisons.
>
> One of the test scenarios is a 3 Mbps ABR video flow on a 4 Mbps link,
> with and without cross traffic. I have already done what you
> suggested, and the ABR traffic drives the pfifo_fast code into severe
> congestion (even with no cross traffic), with a 3 second bloat. This
> is a bit surprising until you think about how the ABR code fills its
> video buffer at startup and then during steady state playout. I will
> send a detailed note once I get a chance to write it up properly.
>
> I would like to reduce the tail drop queue size to 100 packets (down
> from the default of 1000) and see how that impacts the test. 3 seconds
> of bloat is pretty bad, and I would like to compare how ABR works
> at 1 second and at 200-300 ms.

 

I think the real question is what are you trying to find out?

 

No matter how you fiddle with the queue size, we know it's not going to work well. Without BQL, if you have a queue short enough not to cause horrific bloat under load with large packets, it's not going to be long enough to keep the link busy with small packets.
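
For reference, the A/B setups being compared are easy to script. A rough sketch follows, assuming a 4 Mbit/s HTB shaper on an interface called eth0 (the device name and rate are placeholders, and the commands need root). Note that pfifo_fast itself takes its depth from the interface txqueuelen (e.g. "ip link set dev eth0 txqueuelen 100"), so the sketch swaps in a plain pfifo to get an explicit per-packet limit instead:

    import subprocess

    IFACE = "eth0"   # placeholder; use the bottleneck-facing interface
    RATE = "4mbit"   # matches the 4 Mbps link in the test scenario

    def sh(cmd, ignore_errors=False):
        """Run one tc command, echoing it for the test log."""
        print("+", cmd)
        runner = subprocess.call if ignore_errors else subprocess.check_call
        runner(cmd.split())

    def setup(leaf_qdisc):
        """Install a 4 Mbit/s HTB shaper with the given leaf qdisc under it."""
        sh(f"tc qdisc del dev {IFACE} root", ignore_errors=True)  # no-op if nothing is there
        sh(f"tc qdisc add dev {IFACE} root handle 1: htb default 1")
        sh(f"tc class add dev {IFACE} parent 1: classid 1:1 htb rate {RATE}")
        sh(f"tc qdisc add dev {IFACE} parent 1:1 {leaf_qdisc}")

    # Tail-drop baseline with an explicit 100-packet limit, then the AQM variants:
    for leaf in ("pfifo limit 100", "codel", "fq_codel", "pie"):
        setup(leaf)
        # ... run the netperf-wrapper tests and the ABR cross traffic here ...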

 

If you are trying to do A/B comparisons to show that this doesn't work, that's one thing (and it sounds like you have already done so). But if you are trying to make fixed-size buffers work well, we don't think that it can be done (not just because we have better ideas now, but from the 'been there, tried that, nothing worked' side of things).

 

Even with 100-packet queue lengths you can easily get bad latencies under load.

 

 

Re-reading your post for the umpteenth time, here's what I think I may be seeing.

 

You are working on developing video streaming software that can adapt the bit rate of the streaming video to fit within the available bandwidth. You are trying to see how this interacts with the different queuing options.

 

Is this a good summary?

 

 

If so, then you basically want to do the same thing that the TCP stack is doing: when you see a dropped packet or an ECN-tagged packet, slow down the bit rate of the media that you are streaming so that it uses less bandwidth.
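
As a purely illustrative example of one option on Linux, an application that owns the socket can at least watch the retransmission counter move between chunk boundaries (reusing the tcp_info_snapshot sketch further up the thread). Whether a retransmit is a good enough stand-in for "I saw a drop" is exactly the open question, and ECN marks are not visible to the application through that particular interface.

    def congestion_seen(sock, last_retrans):
        """Did tcpi_total_retrans advance since the previous chunk?
        A retransmit is only a proxy for a drop somewhere on the path."""
        info = tcp_info_snapshot(sock)               # sketch defined earlier in the thread
        return info["total_retrans"] > last_retrans, info["total_retrans"]

    # At each chunk boundary, something like:
    #   seen, last_retrans = congestion_seen(video_sock, last_retrans)
    #   if seen:
    #       downshift()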

 

This sounds like an extremely interesting thing to do. It will be interesting to see the responses from folks who know the deeper levels of the OS as to what options you have for learning that such events have taken place.

 

David Lang
