[Bloat] setting queue depth on tail drop configurations of pfifo_fast

Bill Ver Steeg (versteb) versteb at cisco.com
Fri Mar 27 19:18:07 EDT 2015


Dave Lang-



Yup, you got the intent.



The ABR video delivery stack is actually one level more complex. The application uses plain old HTTP to receive N-second chunks of video (N == 2 here), which in turn uses TCP to get the data, which in turn interacts with the various queuing mechanisms, yada, yada, yada. So, the application-layer rate adaptation logic uses the HTTP transfer rate to decide, at each chunk boundary, whether to upshift to a higher video rate, downshift to a lower video rate, or stay at the current video rate.
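For concreteness, a minimal sketch of that per-chunk decision (the function name, margins, and structure are mine for illustration, not any particular vendor's algorithm):

def pick_next_rate(available_rates_bps, current_bps, measured_http_bps,
                   up_margin=1.2):
    # Upshift only when the HTTP transfer rate for the last chunk shows
    # clear headroom over a higher encoding rate.
    higher = [r for r in available_rates_bps
              if r > current_bps and r * up_margin <= measured_http_bps]
    if higher:
        return max(higher)
    # Downshift when the transfer rate no longer covers the current rate.
    if measured_http_bps < current_bps:
        lower = [r for r in available_rates_bps if r < current_bps]
        return max(lower) if lower else min(available_rates_bps)
    # Otherwise stay at the current rate for the next chunk.
    return current_bps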



There are several application-layer algorithms in use (Netflix, MPEG DASH, Apple, Microsoft, etc.), and many of them use more than one TCP/HTTP session to get chunks. Lots of moving parts, and IMHO most of these developers are more concerned with getting the best possible throughput than with being bloat-friendly. Driving the network at the perceived available line rate for hours at a time is simply not network friendly.....



Clearly, the newer AQM algorithms will handle these types of aggressive ABR algorithms better. There may also be a way to tweak the ABR algorithm to "do the right thing" and make the system work better - both from a "make my video better" standpoint and from a "don't impact cross traffic" standpoint. As a start, I am thinking of ways to keep the sending rate between the max video rate and the (perceived) network rate. This does impact how such a flow competes with other flows.
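One way to sketch that idea (purely illustrative - fetch_chunk and the headroom factor are assumptions, not part of any shipping player): once the playout buffer is full, pace the chunk requests so the long-run transfer rate sits a little above the top encoding rate rather than at the perceived line rate.

import time

def paced_fetch_loop(fetch_chunk, max_video_bps, headroom=1.25):
    # fetch_chunk() is assumed to do a blocking HTTP GET for one chunk and
    # return the number of bytes received.
    while True:
        start = time.monotonic()
        chunk_bytes = fetch_chunk()
        elapsed = time.monotonic() - start
        # Spread out the next request so the average rate stays near
        # max_video_bps * headroom instead of the full line rate.
        target = (chunk_bytes * 8) / (max_video_bps * headroom)
        if target > elapsed:
            time.sleep(target - elapsed)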



Regarding peeking into the kernel ----- the overall design of the existing systems assumes that they need to run on several OSes/platforms, and therefore they (generally) do not peek into the kernel. I have done some work that does look into the kernel to examine TCP receive queue sizes --- https://smartech.gatech.edu/bitstream/handle/1853/45059/GT-CS-12-07.pdf --- and it worked pretty well. That scheme would be difficult to productize, so I am thinking about server-based methods in addition to client-based methods to keep out of congestion jail. Perhaps using HTTP pragmas to have the client signal the desired send rate to the HTTP server.
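As a rough illustration of the client-side kernel peeking involved (Linux-only, and only a sketch - not necessarily the mechanism used in the paper above), the unread bytes sitting in a connection's TCP receive queue can be read with an ioctl:

import fcntl
import struct
import termios

def tcp_recv_queue_bytes(sock):
    # Bytes buffered in the kernel's receive queue for this TCP socket
    # (SIOCINQ/FIONREAD). Near zero while a chunk downloads means the
    # network is the bottleneck; growing means the application is.
    buf = fcntl.ioctl(sock.fileno(), termios.FIONREAD, struct.pack("i", 0))
    return struct.unpack("i", buf)[0]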

Bill Ver Steeg







-----Original Message-----
From: David Lang [mailto:david at lang.hm]
Sent: Friday, March 27, 2015 6:46 PM
To: Bill Ver Steeg (versteb)
Cc: bloat at lists.bufferbloat.net
Subject: RE: [Bloat] setting queue depth on tail drop configurations of pfifo_fast



On Fri, 27 Mar 2015, Bill Ver Steeg (versteb) wrote:



> For this very specific test, I am doing one-way netperf-wrapper packet
> tests that will (almost) always be sending 1500 byte packets. I am
> then running some ABR cross traffic to see how it responds to
> FQ_AQM and AQM (where AQM == Codel and PIE). I am using the pfifo_fast
> as a baseline. The Codel, FQ_codel, PIE and FQ_PIE stuff is working
> fine. I need to tweak the pfifo_fast queue length to do some comparisons.
>
> One of the test scenarios is a 3 Mbps ABR video flow on a 4 Mbps link,
> with and without cross traffic. I have already done what you
> suggested, and the ABR traffic drives the pfifo_fast code into severe
> congestion (even with no cross traffic), with a 3 second bloat. This
> is a bit surprising until you think about how the ABR code fills its
> video buffer at startup and then during steady-state playout. I will
> send a detailed note once I get a chance to write it up properly.
>
> I would like to reduce the tail drop queue size to 100 packets (down
> from the default of 1000) and see how that impacts the test. 3 seconds
> of bloat is pretty bad, and I would like to compare how ABR works
> at 1 second and at 200-300 ms.



I think the real question is what are you trying to find out?



No matter how you fiddle with the queue size, we know it's not going to work well. Without BQL, a queue short enough not to cause horrific bloat under load with large packets is not going to be long enough to keep the link busy with small packets.
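To put rough numbers on that at the 4 Mbps rate in your test (back-of-the-envelope only):

def fifo_delay_ms(queue_packets, packet_bytes, link_bps):
    # Worst-case standing-queue delay when the FIFO is full.
    return queue_packets * packet_bytes * 8 * 1000.0 / link_bps

print(fifo_delay_ms(1000, 1500, 4000000))  # ~3000 ms: the 3 seconds of bloat you saw
print(fifo_delay_ms(100, 1500, 4000000))   # ~300 ms with a 100-packet queue
print(fifo_delay_ms(100, 64, 4000000))     # ~13 ms of buffering left for small packets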



If you are trying to do A/B comparisons to show that this doesn't work, that's one thing (and it sounds like you have already done so). But if you are trying to make fixed-size buffers work well, we don't think it can be done (not just because we have better ideas now, but because of the 'been there, tried that, nothing worked' experience).



Even with a 100-packet queue you can easily get bad latencies under load.





Re-reading your post for the umpteenth time, here's what I think I may be seeing.



You are working on developing video streaming software that can adapt the bit rate of the video stream to fit within the available bandwidth, and you are trying to see how this interacts with the different queuing options.



Is this a good summary?





If so, then you basically want to do the same thing the TCP stack is doing: when you see a dropped packet or an ECN-tagged packet, slow down the bit rate of the media you are streaming so that it uses less bandwidth.
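A toy sketch of that reaction applied to the media rate (the constants here are illustrative only, not from any real player):

def adjust_media_rate(current_bps, saw_drop_or_ce_mark, min_bps, max_bps,
                      step_bps=100000, backoff=0.5):
    # TCP-like behaviour at the application layer: back off sharply when the
    # network signals congestion, probe upward gently otherwise.
    if saw_drop_or_ce_mark:
        return max(min_bps, int(current_bps * backoff))
    return min(max_bps, current_bps + step_bps)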



This sounds like an extremely interesting thing to do. It will be interesting to see the responses from folks who know the deeper levels of the OS about what options you have for learning that such events have taken place.



David Lang