[Bloat] Best practices for paced TCP on Linux?

Dave Taht dave.taht at gmail.com
Sat Apr 7 15:38:04 EDT 2012


On Sat, Apr 7, 2012 at 12:01 PM, Steinar H. Gunderson
<sgunderson at bigfoot.com> wrote:
> On Sat, Apr 07, 2012 at 08:54:56PM +0200, Steinar H. Gunderson wrote:
>> I did these on one of the mirrors (well, I did 500000 instead of 256000).
>> You can try
>>
>>   http://pannekake.samfundet.no:3013/ (SD)
>>   http://pannekake.samfundet.no:3015/ (HD)
>>
>> I didn't restart VLC; I hope I don't have to. =)
>
> I got reports from people in Norway that this instantly stopped the problems
> on the HD stream, so incredibly enough, it may have worked.
>
> I don't understand these mechanisms. Why would a smaller send window help?
> Less burstiness?

Awesome. I still think it's way too big, but there's some divisor in
here (1/4?) that I don't remember.

As for an explanation...

Welcome to bufferbloat, the global plague that is sweeping the world!

Up until you hit the available bandwidth on a path, life is golden,
and response time to lost packets approximately equals the overall
latency in the path, say, 10ms for around town. Your video player has
at least a few hundred ms of its own buffering, so it doesn't even
notice anything but a truly massive outage. TCP just recovers,
transparently, underneath.

But:

Once you pass that operating point, all the buffers in the path fill,
your delays go way up, and then you finally lose a packet (see RFC
970). TCP cannot recover rapidly enough with all that data in flight
(particularly in the case of streaming video), all the lost data needs
to be resent, and you get a collapse and a TCP reset. And then the
process starts again.

The more bandwidth you are using up, the easier (and faster) it is to trigger.

TCP's response time vs. buffering-induced latency is quadratic: if you
have 10x more buffering in the path than needed, it takes 100x longer
to recover (so your recovery time went from ~10ms to ~1000ms).
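As a back-of-the-envelope sketch (my illustrative numbers, not measurements): the queue an overfull buffer builds inflates the RTT roughly in proportion to the overbuffering, and TCP's control loop runs in units of that inflated RTT, so the two effects multiply:

```python
# Back-of-the-envelope sketch of why recovery time scales
# quadratically with overbuffering. Numbers are illustrative.

def recovery_time_ms(base_rtt_ms, overbuffer_factor):
    """TCP reacts on RTT timescales; an overfull buffer inflates
    the RTT by roughly the overbuffer factor, and draining the
    extra queue takes proportionally more of those inflated RTTs,
    hence the quadratic term."""
    return base_rtt_ms * overbuffer_factor ** 2

base = 10  # ms, a typical around-town path
for factor in (1, 3, 10):
    print(f"{factor:2d}x overbuffered -> ~{recovery_time_ms(base, factor)} ms to recover")
# 10x overbuffered -> ~1000 ms to recover
```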

We're seeing buffering all over the internet in multiple types of
devices well in excess of that.

Your buffers don't always fill, as some people have sufficient
bandwidth and correctly operating gear.

Most importantly, very few people try to send sustained bursts of TCP
data longer than a few seconds, which is why this is normally so hard
to see. You were streaming at 5 Mbit/s, which is far more than just
about anybody....

Second most importantly, it's a problem that is hard to trigger at
sub-millisecond RTTs (like what you would get on local ethernet during
a test), but gets rapidly easier as your RTTs exceed 1ms and the
randomness of the internet kicks in.

For WAY more detail on bufferbloat, the CACM articles are canonical:

http://cacm.acm.org/magazines/2012/1/144810-bufferbloat/fulltext

See chart 4B in particular for a graphical explanation of how window
size and RTTs are interrelated.

Your current setting is still overbuffered, but far less massively so.
With the short RTTs the quadratic-isms kick in, but limiting the send
window size this way is still sub-optimal.
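To make the window-size/RTT relation concrete, here is a rough sketch. The 5 Mbit/s rate is the HD stream's rate from this thread; the RTT values are guesses, not measurements of this path. The send window only needs to cover the bandwidth-delay product; everything beyond that just sits in a queue somewhere:

```python
# Sketch: how much in-flight data a 5 Mbit/s stream actually needs
# at various RTTs (the bandwidth-delay product). RTT values are
# illustrative guesses, not measurements.

def bdp_bytes(bandwidth_bps, rtt_ms):
    """Bandwidth-delay product: the data in flight needed to keep
    the pipe full; a send window much larger than this only adds
    queueing delay."""
    return bandwidth_bps / 8 * (rtt_ms / 1000)

rate = 5_000_000  # 5 Mbit/s, the HD stream's rate
for rtt in (10, 50, 150):
    print(f"rtt {rtt:3d} ms -> BDP ~{bdp_bytes(rate, rtt) / 1000:.0f} kB")
```

Even at a 150ms RTT the BDP comes out under 100 kB, so a 500000-byte send buffer is still several times larger than the path can ever use.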

I'm very sorry your show hit this problem so hard, and that it took so
long to figure out.

It will make a great case study, and I would love it if a few of the
amazing graphics and sound talents you had at "The Gathering" could
lend their vision to explaining what bufferbloat is!

jg's videos are helpful but they don't rock out to techno! nor do they
feature led-lit juggling balls (wow!), nor spectral snakes that
dissolve into visual madness.

I am glad I had a chance to watch the udp stream and sad to know that
so few others were able to enjoy the show.

Perhaps using an RTP-based streaming method, particularly over IPv6,
will give you a better result next year.



-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://www.bufferbloat.net


