From: "Bill Ver Steeg (versteb)"
To: David Lang
Cc: "bloat@lists.bufferbloat.net"
Subject: Re: [Bloat] setting queue depth on tail drop configurations of pfifo_fast
Date: Fri, 27 Mar 2015 23:18:07 +0000

Dave Lang-

 

Yup, you got the intent.

 

The ABR video delivery stack is actually one level more complex. The application uses plain old HTTP to receive N==2 second chunks of video, which in turn uses TCP to get the data, which in turn interacts with the various queuing mechanisms, yada, yada, yada. So the application rate adaptation logic uses the HTTP transfer rate to decide, at each chunk boundary, whether to upshift to a higher video rate, downshift to a lower video rate, or stay at the current video rate.
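
To make that concrete, here is a rough sketch of the kind of chunk-boundary decision I mean, in Python. The rate ladder, the 20% headroom factor, and the function names are all invented for illustration; no real player is this simple.

    # Hypothetical per-chunk ABR rate selection, driven only by the measured
    # HTTP transfer rate of recent chunks. Ladder and headroom are invented.
    LADDER_BPS = [500_000, 1_000_000, 2_000_000, 3_000_000]  # encoded video rates
    HEADROOM = 1.2                                           # margin required before upshifting

    def measured_rate_bps(chunk_bytes, transfer_seconds):
        """HTTP goodput observed for the last chunk."""
        return 8.0 * chunk_bytes / transfer_seconds

    def next_rate(current_bps, recent_rates_bps):
        """Pick the encoding rate for the next chunk at the chunk boundary."""
        estimate = min(recent_rates_bps)        # be conservative across recent chunks
        idx = LADDER_BPS.index(current_bps)
        up = min(idx + 1, len(LADDER_BPS) - 1)
        if estimate > HEADROOM * LADDER_BPS[up]:
            return LADDER_BPS[up]               # upshift
        if estimate < current_bps:
            return LADDER_BPS[max(idx - 1, 0)]  # downshift
        return current_bps                      # stay

    # At each boundary:
    #   rate = next_rate(rate, [measured_rate_bps(b, t) for b, t in last_chunks])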

 

There are several application layer algorithms in use (Netflix, MPEG DASH, Apple, Microsoft, etc.), and many of them use more than one TCP/HTTP session to get chunks. Lots of moving parts, and IMHO most of these developers are more concerned with getting the best possible throughput than with being bloat-friendly. Driving the network at the perceived available line rate for hours at a time is simply not network friendly.

 

Clearly, the newer AQM algorithms will handle these types of aggressive ABR algorithms better. There may also be a way to tweak the ABR algorithm to "do the right thing" and make the system work better, both from a "make my video better" standpoint and a "don't impact cross traffic" standpoint. As a start, I am thinking of ways to keep the sending rate between the max video rate and the (perceived) network rate. This does impact how such a flow competes with other flows.
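
To sketch what I mean by keeping the sending rate bounded (purely illustrative; the 25% headroom, the cap, and the names are invented, and a real implementation would have to pace within a transfer rather than just between chunks):

    import time

    MAX_VIDEO_BPS = 3_000_000     # top rung of the encoding ladder (illustrative)

    def paced_fetch(fetch_chunk, perceived_network_bps):
        """Fetch one chunk, then idle so the average request rate stays near a
        cap chosen between the max video rate and the perceived network rate."""
        cap_bps = min(perceived_network_bps, 1.25 * MAX_VIDEO_BPS)
        start = time.monotonic()
        chunk_bytes = fetch_chunk()              # blocking HTTP GET; returns bytes received
        elapsed = time.monotonic() - start
        target = 8.0 * chunk_bytes / cap_bps     # how long the chunk "should" take at the cap
        if elapsed < target:
            time.sleep(target - elapsed)         # idle instead of grabbing the whole line
        return chunk_bytes

    # Each chunk still bursts at line rate while it is on the wire, which is
    # exactly the cross-traffic interaction question raised above.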

 

Regarding peeking into the kernel: the overall design of the existing systems assumes that they need to run on several OSes/platforms, and therefore they (generally) do not peek into the kernel. I have done some work that does look into the kernel to examine TCP receive queue sizes (https://smartech.gatech.edu/bitstream/handle/1853/45059/GT-CS-12-07.pdf) and it worked pretty well. That scheme would be difficult to productize, and I am thinking about server-based methods in addition to client-based methods to keep out of congestion jail. Perhaps we could use HTTP pragmas to have the client signal the desired send rate to the HTTP server.
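
For reference, one user-space way to peek at per-socket TCP state on Linux looks roughly like the sketch below. This is not the mechanism from the paper, just an illustration: the FIONREAD/SIOCINQ ioctl and the TCP_INFO socket option do exist, but the struct tcp_info field offsets are assumed from the linux/tcp.h layout of that era and should be checked against your kernel headers before trusting them.

    import fcntl
    import socket
    import struct
    import termios

    def rcv_queue_bytes(sock):
        """Bytes sitting unread in this TCP socket's receive queue
        (FIONREAD/SIOCINQ ioctl; Linux-specific)."""
        raw = fcntl.ioctl(sock.fileno(), termios.FIONREAD, struct.pack("I", 0))
        return struct.unpack("I", raw)[0]

    def tcp_info_snapshot(sock):
        """First 104 bytes of struct tcp_info: 8 single bytes (7 fields plus one
        pad byte) followed by 24 u32s. Offsets assumed; verify against your headers."""
        tcp_info_opt = getattr(socket, "TCP_INFO", 11)   # option value 11 on Linux
        raw = sock.getsockopt(socket.IPPROTO_TCP, tcp_info_opt, 104)
        vals = struct.unpack("8B24I", raw)
        return {
            "rtt_us": vals[8 + 15],          # tcpi_rtt
            "rcv_space": vals[8 + 22],       # tcpi_rcv_space
            "total_retrans": vals[8 + 23],   # tcpi_total_retrans
        }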

Bill Ver Steeg

 

 

 

-----Original Message-----
From: David Lang [mailto:david@lang.hm]
Sent: Friday, March 27, 2015 6:46 PM
To: Bill Ver Steeg (versteb)
Cc: bloat@lists.bufferbloat.net
Subject: RE: [Bloat] setting queue depth on tail drop configurations of pfifo_fast

 

On Fri, 27 Mar 2015, Bill Ver Steeg (versteb) wrote:

> For this very specific test, I am doing one-way netperf-wrapper packet
> tests that will (almost) always be sending 1500 byte packets. I am
> then running some ABR cross traffic to see how it responds to
> FQ_AQM and AQM (where AQM == Codel and PIE). I am using pfifo_fast
> as a baseline. The Codel, FQ_codel, PIE and FQ_PIE stuff is working
> fine. I need to tweak the pfifo_fast queue length to do some comparisons.
>
> One of the test scenarios is a 3 Mbps ABR video flow on a 4 Mbps link,
> with and without cross traffic. I have already done what you
> suggested, and the ABR traffic drives the pfifo_fast code into severe
> congestion (even with no cross traffic), with a 3 second bloat. This
> is a bit surprising until you think about how the ABR code fills its
> video buffer at startup and then during steady state playout. I will
> send a detailed note once I get a chance to write it up properly.
>
> I would like to reduce the tail drop queue size to 100 packets (down
> from the default of 1000) and see how that impacts the test. 3 seconds
> of bloat is pretty bad, and I would like to compare how ABR works
> at 1 second and at 200-300 ms.

 

I think the real question is what are you trying to find out?

 

No matter how you fiddle with the queue size, we know it's not going to work well. Without BQL, if you have a queue short enough not to cause horrific bloat under load with large packets, it's not going to be long enough to keep the link busy with small packets.
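
For reference, the A/B setups being compared are easy to script. A rough sketch follows, assuming a 4 Mbit/s HTB shaper on an interface called eth0 (the device name and rate are placeholders, and the commands need root). Note that pfifo_fast itself takes its depth from the interface txqueuelen (e.g. "ip link set dev eth0 txqueuelen 100"), so the sketch swaps in a plain pfifo to get an explicit per-packet limit instead:

    import subprocess

    IFACE = "eth0"   # placeholder; use the bottleneck-facing interface
    RATE = "4mbit"   # matches the 4 Mbps link in the test scenario

    def sh(cmd, ignore_errors=False):
        """Run one tc command, echoing it for the test log."""
        print("+", cmd)
        runner = subprocess.call if ignore_errors else subprocess.check_call
        runner(cmd.split())

    def setup(leaf_qdisc):
        """Install a 4 Mbit/s HTB shaper with the given leaf qdisc under it."""
        sh(f"tc qdisc del dev {IFACE} root", ignore_errors=True)  # no-op if nothing is there
        sh(f"tc qdisc add dev {IFACE} root handle 1: htb default 1")
        sh(f"tc class add dev {IFACE} parent 1: classid 1:1 htb rate {RATE}")
        sh(f"tc qdisc add dev {IFACE} parent 1:1 {leaf_qdisc}")

    # Tail-drop baseline with an explicit 100-packet limit, then the AQM variants:
    for leaf in ("pfifo limit 100", "codel", "fq_codel", "pie"):
        setup(leaf)
        # ... run the netperf-wrapper tests and the ABR cross traffic here ...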

 

If you are trying to do A/B comparisons to show that this doesn't work, that's one thing (and it sounds like you have already done so). But if you are trying to make fixed-size buffers work well, we don't think that it can be done (not just because we have better ideas now, but from the 'been there, tried that, nothing worked' side of things).

 

Even with 100-packet queue lengths you can easily get bad latencies under load.

 

 

Re-reading your post for the umpteenth time, here's what I think I may be seeing.

 

You are working on developing video streaming software that can adapt the bit rate of the streaming video to fit within the available bandwidth. You are trying to see how this interacts with the different queuing options.

 

Is this a good summary?

 

 

If so, then you basically want to do the same thing that the TCP stack is doing: when you see a dropped packet or an ECN-tagged packet, slow down the bit rate of the media that you are streaming so that it uses less bandwidth.
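
As a purely illustrative example of one option on Linux, an application that owns the socket can at least watch the retransmission counter move between chunk boundaries (reusing the tcp_info_snapshot sketch further up the thread). Whether a retransmit is a good enough stand-in for "I saw a drop" is exactly the open question, and ECN marks are not visible to the application through that particular interface.

    def congestion_seen(sock, last_retrans):
        """Did tcpi_total_retrans advance since the previous chunk?
        A retransmit is only a proxy for a drop somewhere on the path."""
        info = tcp_info_snapshot(sock)               # sketch defined earlier in the thread
        return info["total_retrans"] > last_retrans, info["total_retrans"]

    # At each chunk boundary, something like:
    #   seen, last_retrans = congestion_seen(video_sock, last_retrans)
    #   if seen:
    #       downshift()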

 

This sounds like an extremely interesting thing to do. It will be interesting to see the responses from folks who know the deeper levels of the OS as to what options you have for learning that such events have taken place.

 

David Lang
