[Bloat] What is fairness, anyway? was: Re: finally... winning on wired!

Dave Taht dave.taht at gmail.com
Sat Jan 7 19:40:10 EST 2012


On Thu, Jan 5, 2012 at 6:52 PM, Bob Briscoe <bob.briscoe at bt.com> wrote:
> Jim, Justin,
>
> Jumping back one posting in this thread...
>
>
> At 17:36 04/01/2012, Justin McCann wrote:
>>
>> On Wed, Jan 4, 2012 at 11:16 AM, Dave Taht <dave.taht at gmail.com> wrote:
>> >
>> > On Wed, Jan 4, 2012 at 4:25 PM, Jim Gettys <jg at freedesktop.org> wrote:
>> >
>> > 1: the 'slower flows gain priority' question is my gravest concern
>> > (eg, ledbat, bittorrent). It's fixable with per-host FQ.
>>
>> Meaning that you don't want to hand priority to stuff that is intended
>> to stay in the background?
>
>
> The LEDBAT/BitTorrent issue wouldn't be fixed by per-host FQ.
> LEDBAT/uTP tries to yield to other hosts, not just its own host.
>
> In fact, in the early part of the last decade, the whole issue of
> long-running vs interactive flows showed how broken any form of FQ was. This
> was why ISPs moved away from rate equality (whether per-flow, per-host or
> per-customer site) to various per-customer-volume-based approaches (or a mix
> of both).
>
> There seems to be an unspoken assumption among many on this list that rate
> equality must be integrated into each AQM implementation. That's so 2004. It
> seems all the developments in fairness over the last several years have
> passed completely over the heads of many on this list. This page might fill
> in the gaps for those who missed the last few years:
> <http://trac.tools.ietf.org/group/irtf/trac/wiki/CapacitySharingArch>


> To address buffer bloat, I advise we "do one thing and do it well": bulk
> AQM.

If you have an algorithm to suggest, I'd gladly look at it.

>
> In a nutshell, bit-rate equality, where each of N active users gets 1/N of
> the bit-rate, was found to be extremely _unfair_ when the activity of
> different users is widely different. For example:
> * 5 light users all active 1% of the time get close to 100% of a shared link
> whenever they need it.
> * However, if instead 2 of these users are active 100% of the time, FQ gives
> the other three light users only 33% of the link whenever they are active.
> * That's pretty rubbish for a solution that claims to isolate each user from
> the excesses of others.

Without AQM or FQ, we have a situation where one stream from one user
at a site can eat nearly 100% of the bandwidth.

1/u would be a substantial improvement!
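
Bob's arithmetic above can be restated as a toy calculation (my own
illustration of instantaneous rate equality, not code from any real
scheduler; the user counts come from his scenario):

```python
# Toy restatement of instantaneous rate equality: each of the N
# currently-active users gets 1/N of the link, regardless of how much
# each has used over time.

def equal_share(n_active):
    """Fraction of the link each active user gets under rate equality."""
    return 1.0 / n_active if n_active else 0.0

# 5 light users active ~1% of the time almost never overlap, so a light
# user usually finds itself alone on the link:
light_alone = equal_share(1)          # ~100% of the link

# With 2 heavy users active 100% of the time, a light user that wakes
# up always shares with both of them:
light_with_heavies = equal_share(3)   # ~33% of the link

assert light_alone == 1.0
assert abs(light_with_heavies - 1 / 3) < 1e-9
```

The toy makes his complaint concrete: rate equality is blind to activity over time, which is exactly the per-user state his later paragraphs argue for.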

Secondly, as most devices these days lack both AQM and FQ - despite,
as one of the papers referenced put it, "considering that to be a bug" -
people are doing DoS attacks on themselves whenever they attempt
to do something that requires high bandwidth AND something interactive.

Thirdly, I need to clarify the usage of three terms in this discussion.

To me, a *user* is mom, dad, son, or daughter - and to some extent their
iPods, iPads, and TiVos. The interesting thing about this scenario is that
it is the device you are in front of that you want the best interactive
performance from. A second interesting thing is that the most interactive,
yet bulk, flows - TV streams - have an upper limit on their rate, and also
tend to adjust fairly well to lower rates.

A *site* is a home, small business, cybercafe, office, etc. As the size of a
'site' scales up well beyond what a polyamorous Catholic family could
produce in terms of users, additional techniques do apply.

(While what we're working on should work well on handhelds, desktops,
many kinds of servers, home routers, etc., I am far from convinced it's
applicable on the ISP->customer side, aside from the FQ portion breaking
up streams enough for various algorithms to work better.)

Anyway:

I think sometimes 'users' and 'sites' get conflated.

As you touch upon, the policies that a 'site' implements in order to manage
its internet access do not enter the political arena in any way. Those
policies are in the interest of family harmony only, and giving dad or
daughter knobs to turn that actually work is in everyone's best interest.

The third term, a "flow", has a more flexible definition within a site than
what a provider can see beyond NAT. The device at the customer's
site can regulate flows at levels ranging from mere IP address
(or even MAC address), to ports, to tuples consisting of time
of day and DSCP information...
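
That flexibility can be sketched as classifying packets into queues by a
configurable tuple, roughly the way SFQ hashes flows (my own toy, not
kernel code; the packet fields and queue count are assumptions for
illustration):

```python
import hashlib

NQUEUES = 128  # number of flow queues; an arbitrary choice for this sketch


def flow_queue(pkt, keys=("src_ip", "dst_ip", "sport", "dport")):
    """Map a packet (here just a dict) to a queue index by hashing the
    chosen tuple.

    The key list is the policy knob: a site gateway could hash on just
    src_ip (per-host fairness), add the MAC address or DSCP, etc.
    """
    tag = "|".join(str(pkt.get(k, "")) for k in keys)
    digest = hashlib.sha256(tag.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NQUEUES


pkt_a = {"src_ip": "10.0.0.2", "dst_ip": "8.8.8.8", "sport": 4444, "dport": 80}
pkt_b = dict(pkt_a, sport=5555)  # second flow from the same host

# Same 4-tuple always lands in the same queue:
assert flow_queue(pkt_a) == flow_queue(pkt_a)
# Keying on src_ip alone lumps both of the host's flows together:
assert flow_queue(pkt_a, keys=("src_ip",)) == flow_queue(pkt_b, keys=("src_ip",))
```

The point of the sketch is only that the choice of hash keys *is* the fairness policy, and a site gateway gets to pick keys a provider behind NAT cannot see.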

As a demo to myself I got 1/u to work nearly perfectly a while
back.

And I left in the ability to manage multicast (to limit it
*enough* so it could be *used* on wireless without damaging the
rest of the network at the *site*). I LIKE multicast, but the
development of switches and wireless needs it to be managed
properly.

>
> Since 2004, we now understand that fairness has to involve accounting over
> time. That requires per-user state, which is not something you can do, or
> that you need to do, within every queue. We should leave fairness to
> separate code, probably on machines specialising in this at the edge of a
> provider's network domain - where it knows about users/customers - separate
> from the AQM code of each specific queue.
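
Bob's point - that fairness needs per-user state accounted over time, kept
in separate code rather than in every queue - could be caricatured as a
volume accountant with exponential decay (entirely my own toy; the
half-life and weighting policy are invented for illustration, not taken
from any real edge box):

```python
import math


class VolumeAccountant:
    """Track each user's recent byte volume with exponential decay, so
    long-running heavy users accumulate 'weight debt' that a scheduler
    or policer elsewhere can hold against them."""

    def __init__(self, half_life_s=3600.0):
        self.decay = math.log(2) / half_life_s
        self.volume = {}  # user -> decayed byte count
        self.last = {}    # user -> timestamp of last update

    def add(self, user, nbytes, now):
        dt = now - self.last.get(user, now)
        old = self.volume.get(user, 0.0) * math.exp(-self.decay * dt)
        self.volume[user] = old + nbytes
        self.last[user] = now

    def weight(self, user, now):
        """Smaller scheduling weight for heavier recent users (one
        possible policy; 1 MB of recent volume halves the weight)."""
        dt = now - self.last.get(user, now)
        v = self.volume.get(user, 0.0) * math.exp(-self.decay * dt)
        return 1.0 / (1.0 + v / 1e6)


acct = VolumeAccountant()
acct.add("heavy", 50_000_000, now=0)  # 50 MB just transferred
acct.add("light", 100_000, now=0)     # 100 kB
assert acct.weight("light", now=0) > acct.weight("heavy", now=0)
```

Note that nothing here lives inside a queue: it is exactly the kind of separate, per-user bookkeeping Bob argues belongs at the edge of the provider's domain.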

You are taking the provider-in approach; we're on the device-out approach.
The two approaches meet at the edge gateway of the site.

Firstly, FQ helps on the user's machine, as does AQM. Not so much
at GigE speeds, as the instantaneous queue length is rarely long enough,
but a lot on wireless, where you always have ~4 ms worth of queue.

FQ hurts the user herself on bittorrent-like applications, and
there are multiple ways to manage that. For example, a new way showed
up in the current tree for putting applications in cgroup containers that
can set their network 'priority'. The standard way is for bittorrent to
apply its own rate limit. Another way is for an AQM or app to be
congestion-aware, much like with your proposed ConEx stuff. The way I was
considering was playing with the TCP-LEDBAT code to be more
datacenter-TCP-like...
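
For flavor, the LEDBAT idea mentioned here - back off before the queue
fills by watching queuing delay against a target - can be caricatured in a
few lines (a toy linear controller in the spirit of the LEDBAT draft, not
the TCP-ledbat patch itself; the gain and target values are assumptions):

```python
TARGET_MS = 100.0  # LEDBAT's queuing-delay target
GAIN = 1.0         # controller gain; an arbitrary toy value


def ledbat_cwnd_step(cwnd, base_delay_ms, current_delay_ms):
    """One window update: grow while the queuing delay we are inducing
    is under TARGET_MS, shrink proportionally once we exceed it."""
    queuing_delay = current_delay_ms - base_delay_ms
    off_target = (TARGET_MS - queuing_delay) / TARGET_MS
    return max(1.0, cwnd + GAIN * off_target)


# Under target: the window grows, soaking up spare capacity...
assert ledbat_cwnd_step(10.0, 20.0, 60.0) > 10.0
# ...over target: the window shrinks, yielding to interactive traffic.
assert ledbat_cwnd_step(10.0, 20.0, 220.0) < 10.0
```

This also shows why FQ interacts badly with such traffic: once a scheduler guarantees the flow its own queue and share, the delay signal it yields to largely disappears.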

Secondly, a combination FQ/AQM is the approach we are taking, more or less.

I started with HTB + QFQ + RED/SFB etc. back in August...

At the moment Eric is scaling up SFQ and adding
a couple of variants of RED. The net behavior should be similar to what
you describe. If you care to try it, it's mostly in Linus's kernel at this
point.

It was possibly unclear to others that we have never thought that
FQ alone was the answer. In fact we started off saying that
better AQM was the answer, and then realized that the
problem had multiple aspects that could be addressed
independently, and thus started saying FQ + AQM was
a way forward.

I would LOVE a 'bulk AQM' solution, and I'll be more
than glad to start working on solving 2008's problems...
after we get done with solving 1985's problems...
by applying techniques that were well understood by the late '90s...
and disremembered by an entire generation of device makers,
driver writers, OS and network designers, buyers, and benchmarkers.

But first...

1) We needed *any* form of packet scheduler to
actually work, which we couldn't have until a few months back
due to the overly huge TX rings in all current Ethernet devices.

Fixed by BQL (byte queue limits).
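
BQL's trick - cap the *bytes* (not packets) outstanding in the TX ring, so
the queue builds where a smart qdisc can see it - can be sketched as
follows (my own simplification of the mechanism, not driver code; the
limit value is an arbitrary assumption):

```python
class ByteQueueLimit:
    """Toy byte-based gate in front of a TX ring: admit packets only
    while the bytes in flight stay under a limit, so backlog builds in
    the qdisc layer above, where AQM/FQ can actually act on it."""

    def __init__(self, limit_bytes=30_000):
        self.limit = limit_bytes
        self.inflight = 0

    def try_send(self, pkt_len):
        if self.inflight + pkt_len > self.limit:
            return False  # packet stays queued in the qdisc, not the ring
        self.inflight += pkt_len
        return True

    def tx_complete(self, pkt_len):
        self.inflight -= pkt_len


bql = ByteQueueLimit(limit_bytes=3000)
assert bql.try_send(1500)
assert bql.try_send(1500)
assert not bql.try_send(1500)  # ring "full" by bytes; back-pressure upward
bql.tx_complete(1500)
assert bql.try_send(1500)      # completion frees room again
```

The real BQL also auto-tunes the limit from observed completion rates; the fixed limit here is just enough to show where the back-pressure comes from.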

2) We then needed an FQ mechanism that actually behaves as designed
- which both SFQ and QFQ do now; they didn't, for years.

3) We also needed AQMs that worked - still do. I'm looking forward
to a successor to RED. There were some implementation bugs
in RED that made turning its knobs do nothing, which are fixed now
but need testing, plus a couple of new twists on it in SFQRED...
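
For reference, the knobs in question follow the textbook RED drop curve
(Floyd and Jacobson's drop function; this is a toy restatement of the
curve itself, not the Linux implementation or its bug fixes):

```python
def red_drop_prob(avg_q, min_th, max_th, max_p):
    """Classic RED marking/drop probability: zero below min_th, rising
    linearly to max_p at max_th, then a forced drop above max_th.
    min_th, max_th, and max_p are exactly the knobs whose brokenness
    went unnoticed for years."""
    if avg_q < min_th:
        return 0.0
    if avg_q >= max_th:
        return 1.0
    return max_p * (avg_q - min_th) / (max_th - min_th)


assert red_drop_prob(5, 10, 30, 0.1) == 0.0    # below min_th: never drop
assert red_drop_prob(20, 10, 30, 0.1) == 0.05  # halfway up the ramp
assert red_drop_prob(40, 10, 30, 0.1) == 1.0   # above max_th: always drop
```

If turning min_th/max_th/max_p produces no change in drop behavior, the implementation is broken; that is the class of bug described above.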

4) We needed a way to handle TSO properly, as byte-oriented AQMs
    handled it badly. Sort of fixed now.

Please feel free to evaluate and critique SFQRED.


In my case I plan to continue working with HTB, QFQ
(which, btw, has multiple interesting ways to engage
weighting mechanisms) and various sub-qdiscs... *after SFQ stabilizes*.

I kind of view QFQ as a qdisc construction set.

We're well aware that these may not be the ultimate answer to
all the known networking problems in the universe, but what
we have working in the lab is making a big dent in them.

And now that bugfix #1 above exists...

Let a thousand new AQMs bloom!

If there is anything in what we're up to that will damage the
internet worse than it is already damaged, let us know soonest.


