General list for discussing Bufferbloat
* [Bloat] Notes about hacking on AQMs
@ 2011-06-08 12:12 Dave Taht
  2011-06-08 12:56 ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Taht @ 2011-06-08 12:12 UTC (permalink / raw)
  To: bloat


So in addition to hacking on the switch, I've been poking into the behavior
of multiple AQM systems in the kernel, ranging from the wondershaper,
to the adsl-shaper, to the qos-scripts in openwrt.

I started off with something ambitious, which was to try to build
a complete implementation of diffserv, using guidelines laid out by

not only across the outgoing to-the-internet interface, but across the
internal wired and wireless networks, something that would work with all
protocols.

I rapidly got bogged down. (Or rather, I've been poking at it for months,
nay, years, in part trying to find feedback loops that handled 'tiny
monster' packets like multicast on wireless.)

Some notes:

There are as many philosophies to AQM as there are shapers and classifiers.

None of the Linux shaper scripts in the field handle ipv6 traffic.

HTB is the most commonly used qdisc. It enforces its bandwidth limits by
packet drop and doesn't do ECN. It's usually used in conjunction with other
qdiscs, too.

An explanation of how diffserv (dsmark) and GRED are supposed to play ball
together
(starting here: http://www.opalsoft.net/qos/DS-27.htm) is so amazingly not
opaque.

SFB remains promising, but until I get a ported tc for it,
I can't play with it much.

SFQ is the second most commonly used qdisc, but doesn't balance in ways ESFQ
could.

ESFQ really looked like a winner and I'm sorry it never made the mainline
kernel.

HFSC is mind-bending as to what it tries to do.

Any form of fair queuing is useful for ethernet, but actually knowing the
link rate and port on the switch per dest macaddr would help in load
balancing streams.

Fair queueing is very bad on wireless when packet aggregation is used.

PFIFO_FAST is tied to TOS bits, not diffserv bits.

RED is, well, RED.

GRED is far less opaque than RED, as noted earlier.

MQ and MQPrio are horribly underdocumented. I still don't 'get' how to use
them
properly (I'm more focused on writing a good classifier at the moment)

802.11e does its prioritization at the vlan layer, not at the TOS or
diffserv bits. Getting from tos or diffserv to mq* seems painful but I
haven't looked into it too hard.

iptables seems to think ECN can only be looked at in TCP streams, whereas
(for example) ECN bits can be copied to the outer header of a UDP VPN stream,
and marked when needed.

ip6tables has no support for looking at ecn except through a u32 match.
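
To make the TOS-versus-diffserv and ECN points above concrete, here is a
minimal sketch (function names are illustrative, not from any kernel or
iptables code). The byte that older tools read as TOS is the same byte that
diffserv splits into a 6-bit DSCP and a 2-bit ECN field. In IPv6 that byte
straddles the first two header bytes, which is presumably why a raw u32
match ends up being the fallback there.

```python
# ECN codepoints in the low two bits of the TOS / Traffic Class byte.
ECN_NAMES = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}


def split_tos(tos):
    """Return (dscp, ecn) for an 8-bit IPv4 TOS or IPv6 Traffic Class value."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("TOS must fit in one byte")
    return tos >> 2, tos & 0b11


def ipv6_traffic_class(byte0, byte1):
    """Rebuild the Traffic Class from the first two bytes of an IPv6 header.

    byte0 holds the version (high nibble) and the top four Traffic Class
    bits; byte1 holds the low four Traffic Class bits and the start of the
    flow label.
    """
    return ((byte0 & 0x0F) << 4) | (byte1 >> 4)


def mark_ce(tos):
    """Set Congestion Experienced, but only on ECN-capable packets."""
    dscp, ecn = split_tos(tos)
    if ecn == 0b00:
        # Not-ECT: marking is not allowed; a real AQM would drop instead.
        return tos
    return (dscp << 2) | 0b11
```

For example, a packet marked CS4 with ECT(0) carries 0x82 in that byte, and
split_tos(0x82) gives DSCP 32 and ECN 2.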

You are in a maze of twisty little passages, all not quite going where you
want
to go. The intersection of all these 'solutions' is part of why wireless is
so messed up, as are home routers...
and I haven't even got to trying to figure out the multicast monster problem
yet!

Adding ECN capability to the other qdiscs looks like low hanging fruit...

Anyway, aside from the whining^H^H^H^H^H^H^H descriptions above, here's a
quick and dirty bit of iptables useful for detecting ECN capability:

iptables -t mangle -X Wireless
iptables -t mangle -N Wireless
iptables -t mangle -F Wireless
iptables -t mangle -A Wireless -p tcp -m tcp --tcp-flags ALL SYN,ACK \
  -m ecn --ecn-tcp-ece -m recent --name ecn_enabled --set \
  -m comment --comment 'ECN enabled streams'
iptables -t mangle -A Wireless -p tcp -m tcp --tcp-flags ALL SYN,ACK \
  -m ecn ! --ecn-tcp-ece -m recent --name ecn_disabled --set \
  -m comment --comment 'ECN disabled streams'

iptables -t mangle -F POSTROUTING
iptables -t mangle -A POSTROUTING -j Wireless

You can see which IPs managed to do ECN or not via

cat /proc/net/xt_recent/ecn_*
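
If you want something more digestible than raw cat output, a small sketch
like the following could summarize the two tables. It assumes each line of
an xt_recent table starts with a src=ADDRESS field followed by ttl and
timestamp fields; the function names are made up for illustration, and only
the src= field is used.

```python
def recent_sources(text):
    """Extract the src= address from each line of an xt_recent table dump."""
    sources = []
    for line in text.splitlines():
        for field in line.split():
            if field.startswith("src="):
                sources.append(field[len("src="):])
                break
    return sources


def ecn_summary(enabled_text, disabled_text):
    """Group addresses by which of the two tables they appear in."""
    enabled = set(recent_sources(enabled_text))
    disabled = set(recent_sources(disabled_text))
    return {
        "ecn_only": sorted(enabled - disabled),
        "no_ecn_only": sorted(disabled - enabled),
        "mixed": sorted(enabled & disabled),
    }
```

The two arguments would be the contents of /proc/net/xt_recent/ecn_enabled
and /proc/net/xt_recent/ecn_disabled, read as text.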

But that's just a distraction from trying to converge on a
decent set of solutions for AQM. I AM happy to report that after getting
buffer sizes down (via ethtool, a switch patch, txqueuelen) I am finally
able to reliably see sub 10ms latencies on the wndr3700... but I wake up
these days, feeling doomed.

-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://the-edge.blogspot.com


* Re: [Bloat] Notes about hacking on AQMs
  2011-06-08 12:12 [Bloat] Notes about hacking on AQMs Dave Taht
@ 2011-06-08 12:56 ` Eric Dumazet
  2011-06-08 13:32   ` Dave Taht
  2011-06-09 16:04   ` Jesper Dangaard Brouer
  0 siblings, 2 replies; 20+ messages in thread
From: Eric Dumazet @ 2011-06-08 12:56 UTC (permalink / raw)
  To: Dave Taht; +Cc: bloat

On Wednesday 08 June 2011 at 06:12 -0600, Dave Taht wrote:

> SFQ is the second most commonly used qdisc, but doesn't balance in
> ways ESFQ could.
> 
> ESFQ really looked like a winner and I'm sorry it never made the
> mainline kernel.

Hmm, since 2007 SFQ has provided everything ESFQ did: if you use a flow
classifier, you can match your needs exactly.

[ SFQ uses an internal flow classifier on
src,dst,proto,proto-src,proto-dst ]

Say you want to make something only about dst addresses :

tc filter add ... flow hash \
  	keys dst divisor 1024

With recent SFQ, you can play with a divisor in [256 .. 65536]
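
To illustrate what hashing on "keys dst" changes, here is a small sketch.
It is not the kernel's hash: zlib.crc32 stands in for it, and the packet
dictionaries and the bucket() helper are hypothetical. The point is only
that the chosen keys decide which packets share a bucket.

```python
import zlib


def bucket(packet, keys, divisor=1024, perturb=0):
    """Hash the selected packet fields into one of `divisor` buckets."""
    material = "|".join(str(packet[k]) for k in keys).encode()
    seed = perturb.to_bytes(4, "big")
    return zlib.crc32(seed + material) % divisor


# Two flows to the same host that differ only in their ports.
flow_a = {"src": "198.51.100.1", "dst": "192.0.2.20", "proto": 6,
          "proto-src": 40000, "proto-dst": 6881}
flow_b = {"src": "198.51.100.1", "dst": "192.0.2.20", "proto": 6,
          "proto-src": 40001, "proto-dst": 6882}

# With keys=("dst",) both flows land in the same bucket, so the host as a
# whole gets one share, however many connections it opens.
same_bucket = bucket(flow_a, ("dst",)) == bucket(flow_b, ("dst",))
```

With all five default keys the two flows are hashed independently, so a host
that opens many connections collects many shares.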


Refs : 

http://lwn.net/Articles/236200/

http://www.nuclearcat.com/mediawiki/index.php/Linux_iproute2




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Bloat] Notes about hacking on AQMs
  2011-06-08 12:56 ` Eric Dumazet
@ 2011-06-08 13:32   ` Dave Taht
  2011-06-08 14:04     ` Dave Taht
  2011-06-09 16:04   ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 20+ messages in thread
From: Dave Taht @ 2011-06-08 13:32 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: bloat


On Wed, Jun 8, 2011 at 6:56 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Wednesday 08 June 2011 at 06:12 -0600, Dave Taht wrote:
>
> > SFQ is the second most commonly used qdisc, but doesn't balance in
> > ways ESFQ could.
> >
> > ESFQ really looked like a winner and I'm sorry it never made the
> > mainline kernel.
>
> Hmm, since 2007 SFQ has all ESFQ provided, if you use a flow classifier,
> you can exactly match your needs.
>
> [ SFQ uses an internal flow classifier on
> src,dst,proto,proto-src,proto-dst ]
>
> Say you want to make something only about dst addresses :
>
> tc filter add ... flow hash \
>        keys dst divisor 1024
>
> With recent SFQ, you can play with a divisor in [256 .. 65536]
>
>
>
Didn't know that!! VERY COOL. How history changes.


> Refs :
>
> http://lwn.net/Articles/236200/
>
> http://www.nuclearcat.com/mediawiki/index.php/Linux_iproute2
>
>
>
>


-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://the-edge.blogspot.com


* Re: [Bloat] Notes about hacking on AQMs
  2011-06-08 13:32   ` Dave Taht
@ 2011-06-08 14:04     ` Dave Taht
  2011-06-08 14:57       ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Taht @ 2011-06-08 14:04 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: bloat


It looks like adding ECN to the other qdiscs would be good, and transparent
to the upper layers, but a 10 minute glance at HTB seems to make it a
non-trivial exercise. But that's me. I would certainly like to see ECN
asserted more often than it is. Thoughts?

On the diffserv front, I'd meant to link to this RFC:

http://tools.ietf.org/html/rfc4594

in the first message on this thread.

... While laboring to classify hundreds of packet types into various buckets
using conventional iptables rules (code in progress in my Cruft repo on
github)

I came up with an interesting (and possibly bogus) idea for combining QoS
and firewalling that seems both simple and low overhead (thus suspect)

There are only 64k ports in the world. To do diffserv, you need 6 bits, so
a lookup table of 48k covers every port. Today that means laboriously
matching packets with dozens of rules like this:

$iptables -t mangle -A Wireless -p tcp -m tcp -m multiport --ports $P2PPORTS \
  -j DSCP --set-dscp-class CS4 -m comment --comment 'P2P'

Instead, you could have a one-to-one table mapping ports to DSCP values,
and load that into the kernel at run time. It could be generated from IANA's
list plus some other collection, then modified for decent diffserv classes by
various providers (similar to how Adblock Plus provides varying lists).

That would result in a 48k table lookup, and would mostly stay 'hot' in that
you would typically match against the lower of the port numbers.

The *crazy* part of the idea was that you basically need 2 bits to determine
if you want to allow a packet of a given type or not.

00 = allow
01 = block incoming
10 = block outgoing
11 = block both

So a massive set of iptables and classification rules could be replaced by a
table lookup, and the tables developed and distributed via means similar to
how we do dnsrbls today, or adblock plus.

There are pesky problems like coping with ephemeral ports, etc, and cache
misses, but I would hope that two table lookups would outperform two dozen
iptables rules.

And provide a means for comprehensive classification that has not been done
to date.
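
A rough userspace sketch of the combined table, to show the arithmetic:
65536 ports x 6 bits is 49152 bytes (the 48k above), and adding the 2
firewall bits rounds each entry up to exactly one byte, for 64k total. The
class name, method names and the example ports are all made up for
illustration; nothing like this exists in the kernel as far as this thread
establishes.

```python
ALLOW, BLOCK_IN, BLOCK_OUT, BLOCK_BOTH = 0b00, 0b01, 0b10, 0b11
CS4 = 32  # DSCP value of class selector 4


class PortTable:
    """One byte per port: high 6 bits DSCP, low 2 bits allow/block policy."""

    def __init__(self):
        # Default entry is 0: best-effort DSCP, allow in both directions.
        self._table = bytearray(65536)

    def set(self, port, dscp, policy=ALLOW):
        if not 0 <= port <= 0xFFFF:
            raise ValueError("port out of range")
        if not 0 <= dscp <= 63 or not 0 <= policy <= 3:
            raise ValueError("dscp is 6 bits, policy is 2 bits")
        self._table[port] = (dscp << 2) | policy

    def lookup(self, sport, dport, incoming):
        """Return (dscp, allowed), keyed on the lower of the two ports."""
        entry = self._table[min(sport, dport)]
        dscp, policy = entry >> 2, entry & 0b11
        blocked = policy & (BLOCK_IN if incoming else BLOCK_OUT)
        return dscp, not blocked


table = PortTable()
table.set(6881, CS4)              # illustrative P2P port, classified CS4
table.set(23, 0, BLOCK_IN)        # illustrative: refuse incoming telnet
```

Ephemeral ports are the obvious weak spot: taking the lower port number is
only a heuristic, and a flow between two high ports falls through to
whatever the default entry says.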


-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://the-edge.blogspot.com


* Re: [Bloat] Notes about hacking on AQMs
  2011-06-08 14:04     ` Dave Taht
@ 2011-06-08 14:57       ` Eric Dumazet
  2011-06-08 15:20         ` Dave Taht
  2011-06-08 15:27         ` Jesper Dangaard Brouer
  0 siblings, 2 replies; 20+ messages in thread
From: Eric Dumazet @ 2011-06-08 14:57 UTC (permalink / raw)
  To: Dave Taht; +Cc: bloat

On Wednesday 08 June 2011 at 08:04 -0600, Dave Taht wrote:
> It looks like adding ECN to the other qdiscs would be good, and
> transparent to the upper layers, but a 10 minute glance at HTB seems
> to make it a non-trivial exercise. But that's me. I would certainly
> like to see ECN asserted more often than it is. Thoughts?

Just add some RED qdisc to your HTB? You have a framework to build
whatever is needed. Don't try to use a "single magic thing that will
solve all my problems". This reminds me of the ESFQ attempt: Patrick
preferred to plug an external classifier into SFQ, instead of adding
specialized code in each possible qdisc.


I had the idea to add ECN to SFQ (my favorite qdisc for proxies dealing
only with tcp flows) in the past, with a global config (shared for all
flows : remember SFQ means Fair Queuing ;) )

At queueing time :

- we compute the flow (internal default SFQ classifier, or external user
provided one)
- We queue the packet into its slot X (kind of pfifo)
- If queue limit is reached, take a packet from the biggest slot Y, do a
head drop. Return Congestion Notification to caller if the chosen slot
is the slot X (X == Y)
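
The enqueue steps above can be sketched roughly as follows. This is a toy
model, not the sch_sfq code: the class and its fields are invented, the
classifier is left to the caller (the slot index is passed in), and the
packet is queued first and the limit checked afterwards.

```python
from collections import deque


class SfqSketch:
    """Toy model of SFQ enqueue with head drop from the longest slot."""

    def __init__(self, limit=128):
        self.limit = limit
        self.slots = {}   # slot index -> deque of packets
        self.qlen = 0     # total packets across all slots

    def enqueue(self, slot, packet):
        """Queue `packet` in `slot`.

        Returns (dropped_packet, congestion): dropped_packet is None when
        nothing was dropped; congestion is True only when the drop hit the
        caller's own slot (X == Y).
        """
        self.slots.setdefault(slot, deque()).append(packet)
        self.qlen += 1
        if self.qlen <= self.limit:
            return None, False
        # Over the limit: find the biggest slot and drop from its head.
        victim = max(self.slots, key=lambda s: len(self.slots[s]))
        dropped = self.slots[victim].popleft()
        if not self.slots[victim]:
            del self.slots[victim]
        self.qlen -= 1
        return dropped, victim == slot
```

Dropping from the head of the biggest slot means the flow that is hogging
the queue pays for the overflow, and it hears about it sooner than it would
with a tail drop.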

Adding ECN/RED here could be done with very little added cost:

Add a kind of RED on each slot, instead of a regular pfifo, and
probabilistically mark/drop the packet at enqueue time if:
- Current slot length is above the RED lower threshold
- Or average residency time in slot above a threshold

And doing full drop if :
- Current slot length is above RED upper limit
- Current elapsed time of head packet above upper time limit.

I like the time being the feedback instead of queue length (hard to
tune, especially if bandwidth is unknown)

You would say for example : 
	min_time = 3 ms 
	max_time = 30 ms
	probability = 0.05
	limit_time = 100 ms
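
A sketch of how those four numbers could drive a decision, under loudly
stated assumptions: a single residency-time value is used for both tests
(the text above distinguishes average residency from the head packet's
age), the probability ramps linearly from 0 at min_time to `probability` at
max_time, and it stays at `probability` between max_time and limit_time.
That last region is not specified above, so this is only one possible
reading. Function names are illustrative.

```python
import random


def red_probability(sojourn, min_time=0.003, max_time=0.030,
                    probability=0.05):
    """Mark/drop probability for a given residency time, in seconds."""
    if sojourn < min_time:
        return 0.0
    if sojourn >= max_time:
        return probability
    return probability * (sojourn - min_time) / (max_time - min_time)


def decide(sojourn, ecn_capable, rng=random.random, limit_time=0.100,
           **red_params):
    """Return 'pass', 'mark' or 'drop' for one packet."""
    if sojourn >= limit_time:
        return "drop"                      # hard limit: always drop
    if rng() < red_probability(sojourn, **red_params):
        # Mark ECN-capable traffic; everything else has to be dropped.
        return "mark" if ecn_capable else "drop"
    return "pass"
```

Because the thresholds are times rather than queue lengths, the same
numbers should make sense at any link rate, which is the attraction when
the bandwidth is unknown.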





* Re: [Bloat] Notes about hacking on AQMs
  2011-06-08 14:57       ` Eric Dumazet
@ 2011-06-08 15:20         ` Dave Taht
  2011-06-08 15:21           ` Dave Taht
  2011-06-23 22:38           ` Juliusz Chroboczek
  2011-06-08 15:27         ` Jesper Dangaard Brouer
  1 sibling, 2 replies; 20+ messages in thread
From: Dave Taht @ 2011-06-08 15:20 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: bloat


On Wed, Jun 8, 2011 at 8:57 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Wednesday 08 June 2011 at 08:04 -0600, Dave Taht wrote:
> > It looks like adding ECN to the other qdiscs would be good, and
> > transparent to the upper layers, but a 10 minute glance at HTB seems
> > to make it a non-trivial exercise. But that's me. I would certainly
> > like to see ECN asserted more often than it is. Thoughts?
>
> Just add to your HTB some RED qdisc ? You have a framework to build
> whatever is needed. Dont try to use a "single magic thing that will
> solve all my problems". This reminds me the ESFQ attempt : Patrick
> prefered to plug an external classifier in SFQ, instead of adding
> specialized code in each possible Qdisc.
>

I agree that the new (1997) solution for SFQ, embedding the ESFQ principle,
is better. It's NOT embedded in the shaper scripts I've been playing with, and
it certainly seems saner for flows into the home to be hashed on dest
IPs rather than IPs and port numbers, in the bittorrent age.

I will also attempt to argue persuasively that having ECN packet marking in
HTB and elsewhere - when possible - in addition to packet drop would
probably result in better behavior overall, but to do that well would
require coding it up.

The core argument would be:

By the time a packet gets to a RED sub-qdisc, it's already been through HTB,
and dropped if it is over limit. RED has its own idea as to the 'bandwidth'
available, and does not understand that what it's getting has already been
shaped by HTB.



>
>
> I had the idea to add ECN to SFQ (my favorite qdisc for proxies dealing
> only with tcp flows) in the past, with a global config (shared for all
> flows : remember SFQ means Fair Queuing ;) )
>
> At queueing time :
>
> - we compute the flow (internal default SFQ classifier, or external user
> provided one)
> - We queue the packet into its slot X (kind of pfifo)
> - If queue limit is reached, take a packet from the biggest slot Y, do a
> head drop. Return Congestion Notification to caller if the chosen slot
> is the slot X (X == Y)
>
> Adding ECN/RED here could be done with very litle added cost :
>
> Adding kind of RED on each slot, instead of a regular pfifo, and
> probabilist mark/drop packet at enqueue time if :
> - Current slot length is above the RED lower threshold
> - Or average residency time in slot above a threshold
>
> And doing full drop if :
> - Current slot length is above RED upper limit
> - Current elapsed time of head packet above upper time limit.
>
> I like the time being the feedback instead of queue length (hard to
> tune, especially if bandwidth is unknown)
>
> You would say for example :
>        min_time = 3 ms
>        max_time = 30 ms
>        probability = 0.05
>        limit_time = 100 ms
>
>
This sounds promising also.




-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://the-edge.blogspot.com


* Re: [Bloat] Notes about hacking on AQMs
  2011-06-08 15:20         ` Dave Taht
@ 2011-06-08 15:21           ` Dave Taht
  2011-06-23 22:38           ` Juliusz Chroboczek
  1 sibling, 0 replies; 20+ messages in thread
From: Dave Taht @ 2011-06-08 15:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: bloat


I agree that the new (1997) solution for SFQ, embedding the
>

Sorry, meant 2007.

-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://the-edge.blogspot.com


* Re: [Bloat] Notes about hacking on AQMs
  2011-06-08 14:57       ` Eric Dumazet
  2011-06-08 15:20         ` Dave Taht
@ 2011-06-08 15:27         ` Jesper Dangaard Brouer
  2011-06-08 15:45           ` Eric Dumazet
  2011-06-08 23:06           ` Thomas Graf
  1 sibling, 2 replies; 20+ messages in thread
From: Jesper Dangaard Brouer @ 2011-06-08 15:27 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Thomas Graf, bloat

On Wed, 2011-06-08 at 16:57 +0200, Eric Dumazet wrote:

> Just add to your HTB some RED qdisc ? You have a framework to build
> whatever is needed. Dont try to use a "single magic thing that will
> solve all my problems". This reminds me the ESFQ attempt : Patrick
> prefered to plug an external classifier in SFQ, instead of adding
> specialized code in each possible Qdisc.

While this is a good coding approach, the end result is that nobody is
using this stuff, because "tc" is so difficult to use, and its error
feedback is so lousy that you will never figure out your small syntax
errors.

I wonder if Thomas Graf ever finished/released his alternative to tc?

[...]
> I like the time being the feedback instead of queue length (hard to
> tune, especially if bandwidth is unknown)

I love the idea of using the delay time as parameter/feedback :-)


-- 
Best regards,
  Jesper Dangaard Brouer
  ComX Networks A/S
  Linux Network Kernel Developer
  Cand. Scient Datalog / MSc.CS
  Author of http://adsl-optimizer.dk
  LinkedIn: http://www.linkedin.com/in/brouer



* Re: [Bloat] Notes about hacking on AQMs
  2011-06-08 15:27         ` Jesper Dangaard Brouer
@ 2011-06-08 15:45           ` Eric Dumazet
  2011-06-08 15:51             ` Dave Taht
  2011-06-08 23:06           ` Thomas Graf
  1 sibling, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2011-06-08 15:45 UTC (permalink / raw)
  To: jdb; +Cc: Thomas Graf, bloat

On Wednesday 08 June 2011 at 11:27 -0400, Jesper Dangaard Brouer wrote:

> While this is a good coding approach, the end result is that nobody is
> using this stuff, because "tc" is so difficult to use, and its error
> feedback is so lousy that you will never figure out your small syntax
> errors.
> 

Well, I agree it's really hard to use even 10% of tc's features, but doesn't
the human brain have the same problem? ;)

Most people playing with AQM setups are using scripts, or even script
generators for complex/dynamic cases.





* Re: [Bloat] Notes about hacking on AQMs
  2011-06-08 15:45           ` Eric Dumazet
@ 2011-06-08 15:51             ` Dave Taht
  2011-06-08 16:41               ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Taht @ 2011-06-08 15:51 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Thomas Graf, jdb, bloat

On Wed, Jun 8, 2011 at 9:45 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wednesday 08 June 2011 at 11:27 -0400, Jesper Dangaard Brouer wrote:
>
>> While this is a good coding approach, the end result is that nobody is
>> using this stuff, because "tc" is so difficult to use, and its error
>> feedback is so lousy that you will never figure out your small syntax
>> errors.
>>
>
> Well, I agree its really hard to even use 10% of tc features, but isnt
> human brain has the same problem ? ;)
>
> Most people playing with AQM setups are using scripts, or even script
> generators for complex/dynamic cases.

And they are *all* wrong to varying extents, which is why I like the
'mondo classifier' idea for DSCP+firewalling mentioned earlier on this
thread. Converging on several standards for packet marking vs the
adhoc-ness of thousands of different partial solutions that now exist
really makes sense to me.

>
>
>
>



-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://the-edge.blogspot.com


* Re: [Bloat] Notes about hacking on AQMs
  2011-06-08 15:51             ` Dave Taht
@ 2011-06-08 16:41               ` Eric Dumazet
  2011-06-08 16:52                 ` Dave Taht
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2011-06-08 16:41 UTC (permalink / raw)
  To: Dave Taht; +Cc: Thomas Graf, jdb, bloat

On Wednesday 08 June 2011 at 09:51 -0600, Dave Taht wrote:

> 
> And they are *all* wrong to varying extents, which is why I like the
> 'mondo classifier' idea for DSCP+firewalling mentioned earlier on this
> thread. Converging on several standards for packet marking vs the
> adhoc-ness of thousands of different partial solutions that now exist
> really makes sense to me.

I can tell you there are hundreds of different *valid* setups, especially
in server farms, when you want some control of network traffic, now that
machines have Gb or 10Gb links...

Really, there is no "one big thing that solves all problems,
automatically"

You are doing a great job, but now we need to split all your findings
into small units and eventually fix problems.

Do not expect everything to work, since few people are interested in
making the necessary kernel changes in their free time.

BTW, latest stuff uses DRR & HFSC ;)

Patrick's sample script is here:
http://people.netfilter.org/kaber/shaping




* Re: [Bloat] Notes about hacking on AQMs
  2011-06-08 16:41               ` Eric Dumazet
@ 2011-06-08 16:52                 ` Dave Taht
  2011-06-08 17:50                   ` Stephen Hemminger
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Taht @ 2011-06-08 16:52 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Thomas Graf, jdb, bloat

On Wed, Jun 8, 2011 at 10:41 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wednesday 08 June 2011 at 09:51 -0600, Dave Taht wrote:
>
>>
>> And they are *all* wrong to varying extents, which is why I like the
>> 'mondo classifier' idea for DSCP+firewalling mentioned earlier on this
>> thread. Converging on several standards for packet marking vs the
>> adhoc-ness of thousands of different partial solutions that now exist
>> really makes sense to me.
>
> I can tell you there are hundred of different *valid* setups, especially
> in server farms, when you want some control of network trafic, now
> machines have Gb or 10Gb links...

Well, there are hundreds of thousands of completely ad-hoc solutions
of varying degrees of efficacy.

Getting it down to mere hundreds would be a good start.

> Really, there is no "one big thing that solves all problems,
> automatically".

Oh, I agree.

>
> You are doing a great job, but now we need to split all your findings
> into small units and eventually fix problems.

I appreciate how rapidly problems have been fixed once found.
I mean, I only saw ECN actually start to work as advertised a month or
two ago, and the results were very promising, as shared on the bloat
list....


> Do not expect everything to work, since few people are interested to
> make the necessary kernel changes in their free time.

Well, funding this work would be a great thing for everybody, and
we're working on it, and boy do we appreciate the volunteerism as it
stands.

Everybody, including me, is basically working for free here, (I'm
leveraging the lab at gatech and am (now) making a little money
supporting their bismark effort, but just a little) - but fixing the
internet as a whole is a worthy goal, don't you think? It beats
building web sites....

:)

>
> BTW, latest stuff uses DRR & HFSC ;)

>
> Patrick sample script is here :
> http://people.netfilter.org/kaber/shaping

AWESOME! I was going to get DRR working in my next build and series of
test runs. And look harder at HFSC than I have as yet.

I'm told the latest iproute2 code will be out soon. Once all those
pieces are in place, I can beat them all up, pretty good, in the lab.

>
>
>



-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://the-edge.blogspot.com


* Re: [Bloat] Notes about hacking on AQMs
  2011-06-08 16:52                 ` Dave Taht
@ 2011-06-08 17:50                   ` Stephen Hemminger
  0 siblings, 0 replies; 20+ messages in thread
From: Stephen Hemminger @ 2011-06-08 17:50 UTC (permalink / raw)
  To: Dave Taht; +Cc: Thomas Graf, jdb, bloat

On Wed, 8 Jun 2011 10:52:07 -0600
Dave Taht <dave.taht@gmail.com> wrote:

> On Wed, Jun 8, 2011 at 10:41 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Wednesday 08 June 2011 at 09:51 -0600, Dave Taht wrote:
> >
> >>
> >> And they are *all* wrong to varying extents, which is why I like the
> >> 'mondo classifier' idea for DSCP+firewalling mentioned earlier on this
> >> thread. Converging on several standards for packet marking vs the
> >> adhoc-ness of thousands of different partial solutions that now exist
> >> really makes sense to me.
> >
> > I can tell you there are hundred of different *valid* setups, especially
> > in server farms, when you want some control of network trafic, now
> > machines have Gb or 10Gb links...
> 
> Well, there are hundreds of thousands of completely ad-hoc solutions
> of varying degrees of effacy.
> 
> Getting it down to mere hundreds would be be a good start.
> 
> > Really, there is no "one big thing that solves all problems,
> > automatically".
> 
> Oh, I agree.

It isn't just a Linux problem. Cisco and Juniper have been doing
QoS solutions for years. Like Linux there is the "billions of knobs
version" and the KISS version. The KISS versions are fair queueing
based. The problem is that the more complex QoS variants can't be
done in ASICs and go down the software path.  Linux has the same
problem: the more complex QoS ends up requiring locks that impede
performance.


* Re: [Bloat] Notes about hacking on AQMs
  2011-06-08 15:27         ` Jesper Dangaard Brouer
  2011-06-08 15:45           ` Eric Dumazet
@ 2011-06-08 23:06           ` Thomas Graf
  2011-06-09 17:18             ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 20+ messages in thread
From: Thomas Graf @ 2011-06-08 23:06 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: Thomas Graf, bloat

On Wed, Jun 08, 2011 at 11:27:51AM -0400, Jesper Dangaard Brouer wrote:
> On Wed, 2011-06-08 at 16:57 +0200, Eric Dumazet wrote:
> 
> > Just add to your HTB some RED qdisc ? You have a framework to build
> > whatever is needed. Dont try to use a "single magic thing that will
> > solve all my problems". This reminds me the ESFQ attempt : Patrick
> > prefered to plug an external classifier in SFQ, instead of adding
> > specialized code in each possible Qdisc.
> 
> While this is a good coding approach, the end result is that nobody is
> using this stuff, because "tc" is so difficult to use, and its error
> feedback is so lousy that you will never figure out your small syntax
> errors.
> 
> I wonder if Thomas Graf ever finished/release his alternative to tc?

Wish I could spend more time on it but I'm slowly getting there.
I'll present some of it at netconf.

Takes quite some effort to really handle all misconfiguration
cases and print verbose error messages to help and assist the
user.


* Re: [Bloat] Notes about hacking on AQMs
  2011-06-08 12:56 ` Eric Dumazet
  2011-06-08 13:32   ` Dave Taht
@ 2011-06-09 16:04   ` Jesper Dangaard Brouer
  2011-06-09 16:14     ` Eric Dumazet
  1 sibling, 1 reply; 20+ messages in thread
From: Jesper Dangaard Brouer @ 2011-06-09 16:04 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: bloat


On Wed, 2011-06-08 at 18:41 +0200, Eric Dumazet wrote:

> BTW, latest stuff uses DRR & HFSC ;)
> 
> Patrick sample script is here :
> http://people.netfilter.org/kaber/shaping

I'll add a sample script to the collection:
 http://people.netfilter.org/hawk/shaper-example/qos-DRR-example

I did that script as a consulting task.  It's based on HTB + DRR + SFQ.
The customer was a large apartment building complex, which wanted to
provide fair queue scheduling.  The residents could choose between two
Internet subscriptions: a "small" one, up to 100 Mbit/s shared, and a "big"
one, up to 390 Mbit/s shared.  Within each group they achieve fair sharing
via DRR. Each DRR subqueue is an SFQ queue, to give each person fair sharing
among his "own" traffic (or among several users, if their hashes clash and
they land in the same queue).
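
For readers who have not met DRR, here is a minimal sketch of textbook
deficit round robin, the per-resident sharing step in the setup above. It
does not reproduce the linked script or the kernel's sch_drr: the function,
the queue names and the quantum value are illustrative only, and the HTB
and SFQ layers are left out.

```python
from collections import deque


def drr(queues, quantum=1500):
    """Return the send order [(queue_name, packet_size), ...].

    `queues` maps a queue name to a list of packet sizes in bytes. Each
    round, every backlogged queue earns `quantum` bytes of credit and sends
    packets for as long as the head packet fits within its credit.
    """
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    backlog = {name: deque(pkts) for name, pkts in queues.items() if pkts}
    deficit = {name: 0 for name in backlog}
    order = []
    while backlog:
        for name in list(backlog):
            queue = backlog[name]
            deficit[name] += quantum
            while queue and queue[0] <= deficit[name]:
                size = queue.popleft()
                deficit[name] -= size
                order.append((name, size))
            if not queue:
                # An emptied queue leaves the round and forfeits its credit.
                del backlog[name]
                deficit[name] = 0
    return order
```

With a quantum of 1000 bytes, a resident sending 1500-byte packets and one
sending 400-byte packets each get roughly the same number of bytes per
round, regardless of packet size.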


On Wed, 2011-06-08 at 14:56 +0200, Eric Dumazet wrote:

> Hmm, since 2007 SFQ has all ESFQ provided, if you use a flow classifier,
> you can exactly match your needs.
> 
> [ SFQ uses an internal flow classifer on
> src,dst,proto,proto-src,proto-dst ]
> 
> Say you want to make something only about dst addresses :
> 
> tc filter add ... flow hash \
>   	keys dst divisor 1024
> 
> With recent SFQ, you can play with a divisor in [256 .. 65536]
> 
> 
> Refs : 
> 
> http://lwn.net/Articles/236200/
> 
> http://www.nuclearcat.com/mediawiki/index.php/Linux_iproute2






* Re: [Bloat] Notes about hacking on AQMs
  2011-06-09 16:04   ` Jesper Dangaard Brouer
@ 2011-06-09 16:14     ` Eric Dumazet
  2011-06-09 17:20       ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2011-06-09 16:14 UTC (permalink / raw)
  To: jdb; +Cc: bloat

On Thursday 09 June 2011 at 12:04 -0400, Jesper Dangaard Brouer wrote:
> On Wed, 2011-06-08 at 18:41 +0200, Eric Dumazet wrote:
> 
> > BTW, latest stuff uses DRR & HFSC ;)
> > 
> > Patrick sample script is here :
> > http://people.netfilter.org/kaber/shaping
> 
> I'll add a sample script to the collection:
>  http://people.netfilter.org/hawk/shaper-example/qos-DRR-example
> 
> I did that script as a consultant task.  Its based on HTB + DRR + SFQ.
> The customer was a large apartment building complex, which wanted to
> provide fair queue scheduling.  The residents could choose between two
> Internet subscriptions a "small" upto 100Mbit/s shared, and a "big" upto
> 390 Mbit/s shared.  Within each group they achieve fair sharing via DRR.
> And each DRR subqueue is a SFQ queue to give the person fair sharing
> between his "own" traffic (or if the hash clash and several users get in
> the same queue).

Hi !

I can see some strange limits in your sfq :

fun_tc qdisc add dev ${DEV} parent 1:50 handle 4250: \
    sfq perturb 10 limit 256

AFAIK SFQ max limit is 128. Are you using a custom SFQ ?





* Re: [Bloat] Notes about hacking on AQMs
  2011-06-08 23:06           ` Thomas Graf
@ 2011-06-09 17:18             ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 20+ messages in thread
From: Jesper Dangaard Brouer @ 2011-06-09 17:18 UTC (permalink / raw)
  To: Thomas Graf; +Cc: Thomas Graf, bloat

On Wed, 2011-06-08 at 19:06 -0400, Thomas Graf wrote:
> On Wed, Jun 08, 2011 at 11:27:51AM -0400, Jesper Dangaard Brouer wrote:
> > On Wed, 2011-06-08 at 16:57 +0200, Eric Dumazet wrote:
> > 
> > > Just add a RED qdisc to your HTB? You have a framework to build
> > > whatever is needed. Don't try to use a "single magic thing that will
> > > solve all my problems". This reminds me of the ESFQ attempt: Patrick
> > > preferred to plug an external classifier into SFQ, instead of adding
> > > specialized code in each possible qdisc.
> > 
> > While this is a good coding approach, the end result is that nobody is
> > using this stuff, because "tc" is so difficult to use, and its error
> > feedback is so lousy that you will never figure out your small syntax
> > errors.
> > 
> > I wonder if Thomas Graf ever finished/released his alternative to tc?
> 
> Wish I could spend more time on it but I'm slowly getting there.
> I'll present some of it at netconf.
> 
> Takes quite some effort to really handle all misconfiguration
> cases and print verbose error messages to help and assist the
> user.

Good to hear that you are still working on the project :-)

Thomas Graf's old slides are available here:
  http://vger.kernel.org/netconf2010_slides/tgraf_netconf10.odp





* Re: [Bloat] Notes about hacking on AQMs
  2011-06-09 16:14     ` Eric Dumazet
@ 2011-06-09 17:20       ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 20+ messages in thread
From: Jesper Dangaard Brouer @ 2011-06-09 17:20 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: bloat

On Thu, 2011-06-09 at 18:14 +0200, Eric Dumazet wrote:
> On Thursday, 09 June 2011 at 12:04 -0400, Jesper Dangaard Brouer wrote:
> > On Wed, 2011-06-08 at 18:41 +0200, Eric Dumazet wrote:
> > 
> > > BTW, latest stuff uses DRR & HFSC ;)
> > > 
> > > Patrick's sample script is here:
> > > http://people.netfilter.org/kaber/shaping
> > 
> > I'll add a sample script to the collection:
> >  http://people.netfilter.org/hawk/shaper-example/qos-DRR-example
> > 
> > I wrote that script as a consulting job.  It's based on HTB + DRR + SFQ.
> > The customer was a large apartment building complex, which wanted to
> > provide fair queue scheduling.  The residents could choose between two
> > Internet subscriptions: a "small" one, up to 100 Mbit/s shared, and a
> > "big" one, up to 390 Mbit/s shared.  Within each group they achieve fair
> > sharing via DRR.  Each DRR subqueue is an SFQ queue, to give the person
> > fair sharing between his "own" traffic (or between several users, if the
> > hash clashes and they land in the same queue).
> 
> Hi !
> 
> I can see a strange limit in your SFQ setup:
> 
> fun_tc qdisc add dev ${DEV} parent 1:50 handle 4250: \
>     sfq perturb 10 limit 256
> 
> AFAIK the SFQ max limit is 128. Are you using a custom SFQ?

Nope, no custom SFQ, I guess I just got the parameters wrong... I thought
I could increase the queue size this way, but I guess I was wrong.
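
For readers following along, here is a rough sketch of the HTB + DRR + SFQ
layering described above, with the SFQ limit kept at the 128 that Eric
mentions. It is not the script linked in this thread: the device name, the
handles, the two resident addresses, and the tc_cmd wrapper are all made up
for illustration, and only the "small" 100 Mbit/s group is shown. By default
it only prints the tc commands.

```shell
#!/bin/sh
# Sketch only: device, rates, handles and resident addresses are assumptions.
DEV="${DEV:-eth0}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; 0 = also run tc (needs root)

# Hypothetical wrapper (not the fun_tc from the script discussed above):
# print the tc command, and execute it only when DRY_RUN=0.
tc_cmd() {
    echo "tc $*"
    if [ "$DRY_RUN" = "0" ]; then
        tc "$@"
    fi
}

build_small_group() {
    # HTB caps the whole "small" subscription group at 100 Mbit/s.
    tc_cmd qdisc add dev "$DEV" root handle 1: htb default 10
    tc_cmd class add dev "$DEV" parent 1: classid 1:10 htb rate 100mbit ceil 100mbit

    # DRR shares that rate between the residents of the group.
    tc_cmd qdisc add dev "$DEV" parent 1:10 handle 10: drr

    i=1
    for ip in 192.0.2.1 192.0.2.2; do
        # One DRR class per resident, selected here by destination address.
        tc_cmd class add dev "$DEV" parent 10: classid "10:$i" drr quantum 1500
        tc_cmd filter add dev "$DEV" parent 10: protocol ip prio 1 \
            u32 match ip dst "$ip/32" flowid "10:$i"
        # SFQ inside each DRR class; limit kept at 128 rather than 256.
        tc_cmd qdisc add dev "$DEV" parent "10:$i" handle "10$i:" \
            sfq perturb 10 limit 128
        i=$((i + 1))
    done
}

build_small_group
```

Note that, as far as I know, DRR drops packets that no filter assigns to one
of its classes, so a real setup needs a filter (or a catch-all class) for
every resident. Running with DRY_RUN=0 applies the commands, after which
"tc -s qdisc show dev $DEV" shows what the kernel actually accepted.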




* Re: [Bloat] Notes about hacking on AQMs
  2011-06-08 15:20         ` Dave Taht
  2011-06-08 15:21           ` Dave Taht
@ 2011-06-23 22:38           ` Juliusz Chroboczek
  2011-06-23 22:47             ` Dave Taht
  1 sibling, 1 reply; 20+ messages in thread
From: Juliusz Chroboczek @ 2011-06-23 22:38 UTC (permalink / raw)
  To: Dave Taht; +Cc: bloat

> I will also attempt to argue persuasively that having ECN packet marking in HTB

I'm not following you.  You can only perform ECN marking when you detect
congestion; and you can only detect congestion if you're queueing.
I may be wrong, but I believe that HTB doesn't do any queueing itself,
it delegates queuing to its child qdiscs; hence, I don't see how you can
perform ECN marking in HTB itself; you should be doing it in HTB's child
qdisc.
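
To make that concrete, here is a minimal sketch of what "doing it in the
child qdisc" could look like: HTB only shapes, and a RED leaf with the ecn
flag holds the queue and does the marking. The device, rate, handles, and
the tc_cmd wrapper are assumptions for illustration, and the RED numbers are
borrowed from the example in the tc-red man page rather than tuned for any
real link. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: device, rate and handles are assumptions, RED values untuned.
DEV="${DEV:-eth0}"
RATE="${RATE:-10mbit}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; 0 = also run tc (needs root)

# Hypothetical wrapper: print the tc command, execute it only when DRY_RUN=0.
tc_cmd() {
    echo "tc $*"
    if [ "$DRY_RUN" = "0" ]; then
        tc "$@"
    fi
}

htb_with_red_ecn() {
    # HTB class: enforces the rate, but the packets wait in its leaf qdisc.
    tc_cmd qdisc add dev "$DEV" root handle 1: htb default 10
    tc_cmd class add dev "$DEV" parent 1: classid 1:10 htb rate "$RATE" ceil "$RATE"

    # RED leaf: this is where the queue lives, so this is where ECN marking
    # can happen. "bandwidth" is set to the HTB class rate, not the NIC speed.
    tc_cmd qdisc add dev "$DEV" parent 1:10 handle 10: red \
        limit 400000 min 30000 max 90000 avpkt 1000 burst 55 \
        ecn bandwidth "$RATE" probability 0.02
}

htb_with_red_ecn
```

RED uses its bandwidth parameter only to age the average queue size after
idle periods; it does not shape. Passing it the HTB class rate is one way to
keep the two in agreement, though whether that is enough in practice I have
not verified.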

> RED has its own idea as to the 'bandwidth' available, and does not
> understand that what it's getting has already been shaped by HTB.

I'm not sure I understand that.

-- Juliusz


* Re: [Bloat] Notes about hacking on AQMs
  2011-06-23 22:38           ` Juliusz Chroboczek
@ 2011-06-23 22:47             ` Dave Taht
  0 siblings, 0 replies; 20+ messages in thread
From: Dave Taht @ 2011-06-23 22:47 UTC (permalink / raw)
  To: Juliusz Chroboczek; +Cc: bloat

On Thu, Jun 23, 2011 at 4:38 PM, Juliusz Chroboczek <jch@pps.jussieu.fr> wrote:
>> I will also attempt to argue persuasively that having ECN packet marking in HTB
>
> I'm not following you.  You can only perform ECN marking when you detect
> congestion; and you can only detect congestion if you're queueing.
> I may be wrong, but I believe that HTB doesn't do any queueing itself,
> it delegates queuing to its child qdiscs; hence, I don't see how you can
> perform ECN marking in HTB itself; you should be doing it in HTB's child
> qdisc.

You are right. I had spent a lot of time in HTB puzzling over how it
actually worked, observing lots of packet drops at its level of stats
and nothing at the RED level in the existing qos-scripts, and no
seeming correlation to the SFQ or HFSC either. It just didn't add up.

I'm still puzzling over it. But I grew to understand after posting
that mail that my issues were happening at the child qdiscs and I will
continue rewriting the rfc until it is both right AND clear.

Thx for catching up on the backlog.


>> RED has its own idea as to the 'bandwidth' available, and does not
>> understand that what it's getting has already been shaped by HTB.


>
> I'm not sure I understand that.

Neither do I and I'm going to go hide under a rock until I do.

> -- Juliusz
>



-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://the-edge.blogspot.com
