* Re: [Codel] FQ_Codel lwn draft article review
[not found] <CAA93jw5yFvrOyXu2s2DY3oK_0v3OaNfnL+1zTteJodfxtAAzcQ@mail.gmail.com>
@ 2012-11-23 8:57 ` Dave Taht
2012-11-23 22:18 ` Paul E. McKenney
0 siblings, 1 reply; 56+ messages in thread
From: Dave Taht @ 2012-11-23 8:57 UTC (permalink / raw)
To: paulmck, bloat, cerowrt-devel, codel
Cc: Eric Raymond, Paolo Valente, Toke Høiland-Jørgensen,
John Crispin
David Woodhouse and I fiddled a lot with adsl and openwrt and a
variety of drivers and network layers in a typical bonded adsl stack
yesterday. The complexity of it all makes my head hurt. I'm happy that
a newly BQL'd ethernet driver (for the geos and qemu) emerged from it,
which he submitted to netdev...
I made a recording of us last night discussing the layers, which I
will produce and distribute later...
Anyway, along the way, we fiddled a lot with trying to analyze where
the 350ms or so of added latency was coming from in the traverse geo's
adsl implementation and overlying stack....
Plots: http://david.woodhou.se/dwmw2-netperf-plots.tar.gz
Note 1:
The netperf sample interval on the rrul test needs to be longer than
100ms in order to get a decent result at sub 10Mbit speeds.
Note 2:
The two nicest graphs here are nofq.svg vs fq.svg, which were taken on
a gigE link from a Mac running Linux to another gigE link. (in other
words, NOT on the friggin adsl link) (firefox can display svg, I don't
know what else) I find the T+10 delay before stream start in the
fq.svg graph suspicious and think the "throw out the outlier" code in
the netperf-wrapper code is at fault. Prior to that, codel is merely
buffering up things madly, which can also be seen in the pfifo_fast
behavior with its default of 1000 packets.
(Arguably, the default queue length in codel can be reduced from 10k
packets to something more reasonable at GigE speeds)
(the indicator that it's the graph, not the reality, is that the
fq.svg pings and udp start at T+5 and grow minimally, as is usual with
fq_codel.)
As for the *.ps graphs, well, they would take david's network topology
to explain, and were conducted over a variety of circumstances,
including wifi, with more variables in play than I care to think
about.
We didn't really get anywhere on digging deeper. As we got to purer
tests - with a minimal number of boxes, running pure ethernet,
switched over a couple of switches, even in the simplest two box case,
my HTB based "ceroshaper" implementation had multiple problems in
cutting median latencies below 100ms, on this very slow ADSL link.
David suspects problems on the path along the carrier backbone as a
potential issue, and the only way to measure that is with two one-way
trip-time measurements (rather than rtt), time synced via ntp... I
keep hoping to find an rtp test, but I'm open to just about any option
at this point. Anyone?
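For concreteness, here is the sort of measurement I mean - a minimal
one-way-delay probe sketched in python. It is not an existing tool; the
port number is arbitrary, and the printed delays are only meaningful to
the extent the two clocks really are ntp-synced.

#!/usr/bin/env python
# Hypothetical one-way-delay probe: the sender stamps each UDP packet with
# its wall-clock send time, the receiver subtracts its own clock on arrival.
import socket, struct, sys, time

PORT = 12345  # arbitrary port chosen for this sketch

def sender(dest):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for seq in range(1000):
        s.sendto(struct.pack("!Id", seq, time.time()), (dest, PORT))
        time.sleep(0.02)        # ~50 packets/s, roughly voip-like spacing

def receiver():
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("", PORT))
    while True:
        data, _ = s.recvfrom(64)
        seq, sent = struct.unpack("!Id", data)
        print("seq %d one-way delay %.1f ms" % (seq, (time.time() - sent) * 1000.0))

if __name__ == "__main__":
    sender(sys.argv[2]) if sys.argv[1] == "send" else receiver()

Run the receiver on one end and "send <host>" on the other; any residual
clock skew adds directly to the reported delay, which is why the ntp
sync matters.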
We also found a probable bug in mtr in that multiple mtrs on the same
box don't co-exist.
Moving back to more scientific clarity and simpler tests...
The two graphs, taken a few weeks back, on pages 5 and 6 of this:
http://www.teklibre.com/~d/bloat/Not_every_packet_is_sacred-Battling_Bufferbloat_on_wifi.pdf
appear to show the advantage of fq_codel (fq + codel + head drop) over
tail drop during the slow start period on a 10Mbit link - (see how
squiggly slow start is on pfifo_fast?) as well as the marvelous
interstream latency that can be achieved with BQL=3000 (on a 10 mbit
link.) Even that latency can be halved by reducing BQL to 1500, which
is just fine on a 10mbit. Below those rates I'd like to be rid of BQL
entirely, and just have a single packet outstanding... in everything
from adsl to cable...
That said, I'd welcome other explanations of the squiggly slowstart
pfifo_fast behavior before I put that explanation on the slide.... ECN
was in play here, too. I can redo this test easily, it's basically
running a netperf TCP_RR for 70 seconds, and starting up a TCP_MAERTS
and TCP_STREAM for 60 seconds at T+5, after hammering down on BQL's
limit and the link speeds on two sides of a directly connected laptop
connection.
ethtool -s eth0 advertise 0x002 # 10 Mbit
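(For reference, "hammering down on BQL's limit" is just a sysfs write; a
sketch in python, where the tx-0 queue name is an assumption that holds
for a single-queue, BQL-capable driver:

def set_bql_limit(iface="eth0", queue="tx-0", limit=1500):
    # Cap BQL's limit_max for one TX queue; 1500 bytes is the value
    # discussed above for 10Mbit, not a general recommendation.
    path = "/sys/class/net/%s/queues/%s/byte_queue_limits/limit_max" % (iface, queue)
    with open(path, "w") as f:
        f.write(str(limit))

set_bql_limit("eth0", "tx-0", 1500)

Needs root, and the file only exists if the driver actually implements BQL.)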
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] FQ_Codel lwn draft article review
2012-11-23 8:57 ` [Codel] FQ_Codel lwn draft article review Dave Taht
@ 2012-11-23 22:18 ` Paul E. McKenney
2012-11-24 0:07 ` Toke Høiland-Jørgensen
2012-11-27 22:03 ` [Codel] [Cerowrt-devel] " Jim Gettys
0 siblings, 2 replies; 56+ messages in thread
From: Paul E. McKenney @ 2012-11-23 22:18 UTC (permalink / raw)
To: Dave Taht
Cc: Paolo Valente, Toke Høiland-Jørgensen, Eric Raymond,
codel, cerowrt-devel, bloat, John Crispin
[-- Attachment #1: Type: text/plain, Size: 4875 bytes --]
On Fri, Nov 23, 2012 at 09:57:34AM +0100, Dave Taht wrote:
> David Woodhouse and I fiddled a lot with adsl and openwrt and a
> variety of drivers and network layers in a typical bonded adsl stack
> yesterday. The complexity of it all makes my head hurt. I'm happy that
> a newly BQL'd ethernet driver (for the geos and qemu) emerged from it,
> which he submitted to netdev...
Cool!!! ;-)
> I made a recording of us last night discussing the layers, which I
> will produce and distribute later...
>
> Anyway, along the way, we fiddled a lot with trying to analyze where
> the 350ms or so of added latency was coming from in the traverse geo's
> adsl implementation and overlying stack....
>
> Plots: http://david.woodhou.se/dwmw2-netperf-plots.tar.gz
>
> Note: 1:
>
> The netperf sample rate on the rrul test needs to be higher than
> 100ms in order to get a decent result at sub 10Mbit speeds.
>
> Note 2:
>
> The two nicest graphs here are nofq.svg vs fq.svg, which were taken on
> a gigE link from a Mac running Linux to another gigE link. (in other
> words, NOT on the friggin adsl link) (firefox can display svg, I don't
> know what else) I find the T+10 delay before stream start in the
> fq.svg graph suspicious and think the "throw out the outlier" code in
> the netperf-wrapper code is at fault. Prior to that, codel is merely
> buffering up things madly, which can also be seen in the pfifo_fast
> behavior, with 1000pkts it's default.
I am using these two in a new "Effectiveness of FQ-CoDel" section.
Chrome can display .svg, and if it becomes a problem, I am sure that
they can be converted. Please let me know if some other data would
make the point better.
I am assuming that the colored throughput spikes are due to occasional
packet losses. Please let me know if this interpretation is overly naive.
Also, I know what ICMP is, but the UDP variants are new to me. Could
you please expand the "EF", "BK", "BE", and "CSS" acronyms?
> (Arguably, the default queue length in codel can be reduced from 10k
> packets to something more reasonable at GigE speeds)
>
> (the indicator that it's the graph, not the reality, is that the
> fq.svg pings and udp start at T+5 and grow minimally, as is usual with
> fq_codel.)
All sessions were started at T+5, then?
> As for the *.ps graphs, well, they would take david's network topology
> to explain, and were conducted over a variety of circumstances,
> including wifi, with more variables in play than I care to think
> about.
>
> We didn't really get anywhere on digging deeper. As we got to purer
> tests - with a minimal number of boxes, running pure ethernet,
> switched over a couple of switches, even in the simplest two box case,
> my HTB based "ceroshaper" implementation had multiple problems in
> cutting median latencies below 100ms, on this very slow ADSL link.
> David suspects problems on the path along the carrier backbone as a
> potential issue, and the only way to measure that is with two one way
> trip time measurements (rather than rtt), time synced via ntp... I
> keep hoping to find a rtp test, but I'm open to just about any option
> at this point. anyone?
>
> We also found a probable bug in mtr in that multiple mtrs on the same
> box don't co-exist.
I must confess that I am not seeing all that clear a difference between
the behaviors of ceroshaper and FQ-CoDel. Maybe somewhat better latencies
for FQ-CoDel, but not unambiguously so.
> Moving back to more scientific clarity and simpler tests...
>
> The two graphs, taken a few weeks back, on pages 5 and 6 of this:
>
> http://www.teklibre.com/~d/bloat/Not_every_packet_is_sacred-Battling_Bufferbloat_on_wifi.pdf
>
> appear to show the advantage of fq_codel fq + codel + head drop over
> tail drop during the slow start period on a 10Mbit link - (see how
> squiggly slow start is on pfifo fast?) as well as the marvelous
> interstream latency that can be achieved with BQL=3000 (on a 10 mbit
> link.) Even that latency can be halved by reducing BQL to 1500, which
> is just fine on a 10mbit. Below those rates I'd like to be rid of BQL
> entirely, and just have a single packet outstanding... in everything
> from adsl to cable...
>
> That said, I'd welcome other explanations of the squiggly slowstart
> pfifo_fast behavior before I put that explanation on the slide.... ECN
> was in play here, too. I can redo this test easily, it's basically
> running a netperf TCP_RR for 70 seconds, and starting up a TCP_MAERTS
> and TCP_STREAM for 60 seconds a T+5, after hammering down on BQL's
> limit and the link speeds on two sides of a directly connected laptop
> connection.
I must defer to others on this one. I do note the much lower latencies
on slide 6 compared to slide 5, though.
Please see attached for update including .git directory.
Thanx, Paul
> ethtool -s eth0 advertise 0x002 # 10 Mbit
>
[-- Attachment #2: SFQ2012.11.23a.tgz --]
[-- Type: application/x-gtar-compressed, Size: 893092 bytes --]
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] FQ_Codel lwn draft article review
2012-11-23 22:18 ` Paul E. McKenney
@ 2012-11-24 0:07 ` Toke Høiland-Jørgensen
2012-11-24 16:19 ` Dave Taht
` (2 more replies)
2012-11-27 22:03 ` [Codel] [Cerowrt-devel] " Jim Gettys
1 sibling, 3 replies; 56+ messages in thread
From: Toke Høiland-Jørgensen @ 2012-11-24 0:07 UTC (permalink / raw)
To: paulmck
Cc: Paolo Valente, Eric Raymond, codel, cerowrt-devel, bloat, John Crispin
[-- Attachment #1.1: Type: text/plain, Size: 2413 bytes --]
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:
> I am using these two in a new "Effectiveness of FQ-CoDel" section.
> Chrome can display .svg, and if it becomes a problem, I am sure that
> they can be converted. Please let me know if some other data would
> make the point better.
If you are just trying to show the "ideal" effectiveness of fq_codel,
two attached graphs are from some old tests we did at the UDS showing a
simple ethernet link between two laptops with a single stream going in
each direction. This is of course by no means a real-world test, but on
the other hand they show a very visible factor ~4 improvement in
latency.
These are the same graphs Dave used in his slides, but also in a 100mbit
version.
> Also, I know what ICMP is, but the UDP variants are new to me. Could
> you please expand the "EF", "BK", "BE", and "CSS" acronyms?
The UDP ping times are simply roundtrips/second (as measured by netperf)
converted to ping times. The acronyms are diffserv markings, i.e.
EF=expedited forwarding, BK=bulk (CS1 marking), BE=best effort (no
marking). The UDP ping tests tend to not work so well on a loaded link,
however, since netperf stops sending packets after detecting
(excessive(?)) loss. Which is why you only see the UDP ping times on
the first part of the graph.
The markings are also used on the TCP flows, as seen in the legend for
the up/downloads.
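In case the mapping is opaque, this is roughly all there is to it - a
sketch in python rather than netperf-wrapper's actual code, with the TOS
byte values being the standard DSCP code points shifted left by two:

import socket

# Diffserv markings expressed as IP TOS byte values (DSCP << 2).
TOS = {"BE": 0x00,   # best effort, i.e. no marking
       "BK": 0x20,   # CS1, commonly used for bulk/background
       "EF": 0xb8}   # expedited forwarding (DSCP 46)

def marked_udp_socket(marking="EF"):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS[marking])
    return s

def rr_to_ping_ms(transactions_per_sec):
    # netperf's *_RR tests report round trips per second; one round trip
    # is one "ping", so the latency is just the reciprocal.
    return 1000.0 / transactions_per_sec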
> All sessions were started at T+5, then?
The pings start right away, the transfers start at T+5 seconds. Looks
like the first ~five seconds of transfer is being cut off on those
graphs. I think what happens is that one of the streams (the turquoise
one) starts up faster than the other ones, consuming all the bandwidth
for the first couple of seconds until they adjust to the same level.
These initial values are then scaled off the graph as outlier values. If
you zoom in on the beginning of the graph you can see the turquoise line
coming down from far off the scale in one direction, while the rest come
from off the bottom.
> Please see attached for update including .git directory.
I got a little lost in all the lists of SFQ, but other than that I found
it quite readable. The diagrams of the queuing algorithms are a tad big,
though, I think. :)
When is the article going to be published?
-Toke
--
Toke Høiland-Jørgensen
toke@toke.dk
[-- Attachment #1.2: 10mbit fq_codel --]
[-- Type: image/png, Size: 79395 bytes --]
[-- Attachment #1.3: 10mbit pfifo_fast --]
[-- Type: image/png, Size: 82895 bytes --]
[-- Attachment #1.4: 100mbit fq_codel --]
[-- Type: image/png, Size: 63717 bytes --]
[-- Attachment #1.5: 100mbit pfifo_fast --]
[-- Type: image/png, Size: 68232 bytes --]
[-- Attachment #2: Type: application/pgp-signature, Size: 489 bytes --]
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] FQ_Codel lwn draft article review
2012-11-24 0:07 ` Toke Høiland-Jørgensen
@ 2012-11-24 16:19 ` Dave Taht
2012-11-24 16:36 ` [Codel] [Cerowrt-devel] " dpreed
` (3 more replies)
2012-11-26 17:20 ` [Codel] " Paul E. McKenney
2012-11-26 21:05 ` Rick Jones
2 siblings, 4 replies; 56+ messages in thread
From: Dave Taht @ 2012-11-24 16:19 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Paolo Valente, Eric Raymond, codel, cerowrt-devel, bloat,
paulmck, John Crispin
On Sat, Nov 24, 2012 at 1:07 AM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:
>
>> I am using these two in a new "Effectiveness of FQ-CoDel" section.
>> Chrome can display .svg, and if it becomes a problem, I am sure that
>> they can be converted. Please let me know if some other data would
>> make the point better.
My primary focus has been on making the kind of internet over a
billion people have function better - that is, the one with <10Mbit uplinks. While
it's nice to show an improvement on 100Mbit, gigE and higher, I'd
rather talk to the 10Mbit and below cases whenever possible.
>
> If you are just trying to show the "ideal" effectiveness of fq_codel,
> two attached graphs are from some old tests we did at the UDS showing a
> simple ethernet link between two laptops with a single stream going in
> each direction. This is of course by no means a real-world test, but on
> the other hand they show a very visible factor ~4 improvement in
> latency.
>
> These are the same graphs Dave used in his slides, but also in a 100mbit
> version.
As noted above, 10Mbit is better to show. Secondly, in looking over
the 10Mbit graph, I realized that we could also keep injecting new
tcps every 5 seconds, for shorter periods, to observe
what happens.
And more importantly, I'd like to avoid falling into the trap that so
much network research falls into, which is blithely benchmarking lots
of long duration TCP traffic,
rather than the kinds of network traffic we actually see in the real
world. A real world web page might have a hundred or more dns lookups
and a hundred tcp streams, the vast majority of which are so short as
to not get out of slow start.
Now - seeing/measuring/graphing that - is *hard* - which is why it is
so rarely done. The fact that it's hard, yet accurately measures the real
world, is exactly why it should be done.
However, I can see leveraging the clean 10Mbit trace or (better) an
asymmetric 24/5.5 case: while pounding it with the existing, simple
load of 1 full rate up, 1 full rate down, and a CIR stream for voice,
impact that plot with the chrome web page benchmark or something
similar.
Indirectly observing the web load effects on that graph, while timing
web page completion, would be good, when comparing pfifo_fast and
various aqm variants.
>> Also, I know what ICMP is, but the UDP variants are new to me. Could
>> you please expand the "EF", "BK", "BE", and "CSS" acronyms?
>
> The UDP ping times are simply roundtrips/second (as measured by netperf)
> converted to ping times. The acronyms are diffserv markings, i.e.
> EF=expedited forwarding, BK=bulk (CS1 marking), BE=best effort (no
> marking).
The classification tests are in there for a number of reasons.
0) I needed multiple streams in the test anyway.
1) Many people keep insisting that classification can work. It
doesn't. It never has. Not over the wild and wooly internet. It only
rarely does any good at all even on internal networks. It sometimes
works on some kinds of udp streams, but that's it. The bulk of the
problem is the massive packet streams modern offloads generate; the fix
is breaking those up, everywhere possible, any time possible.
I had put up a graph last week, that showed each classification bucket
for a tcp stream being totally ignored...
2) Theoretically wireless 802.11e SHOULD respect classification. In
fact, it does, on the ath9k, to a large extent. However, on the iwl I
have, BE and BK traffic get completely starved by VO and VI traffic,
which is something of a bug. I'm certain that, due to inadequate
testing, 802.11e classification is largely broken in the field, and
I'd hoped this test would bring that out to more people.
3) I don't mind an effort to make classification work, particularly
for traffic clearly marked background, such as bittorrent often is.
Perhaps this is an opportunity to get IPv6 done up right, as it seems
the diffserv bits are much more rarely fiddled with in transit.
> The UDP ping tests tend to not work so well on a loaded link,
> however, since netperf stops sending packets after detecting
> (excessive(?)) loss. Which is why you see only see the UDP ping times on
> the first part of the graph.
Netperf stops UDP_STREAM exchanges after the first lost udp packet.
This is not helpful.
I keep noting that the next phase of the rrul development is to find a
good pair of CIR one way measurements that look a bit like voip.
Either that test can get added to netperf or we use another tool, or
we create one, and I keep hoping for recommendations from various
people on this list. Come on, something like this
exists? Anybody?
Another reason for a UDP based voip-like ping test is that icmp is
frequently handled differently than other sorts of streams.
A TCP based ping test used to be in there (and should go back) as it
shows the impact of packet loss on TCP behavior. (that said, the
TCP_RR test is roughly equivalent)
After staring at the tons of data collected over the past year, on
wifi, I'm willing to strongly suggest we just drop TCP packets after
500ms in the wifi stack, period, as that exceeds the round trip
timeout...
> The markings are also used on the TCP flows, as seen in the legend for
> the up/downloads.
>
>> All sessions were started at T+5, then?
>
> The pings start right away, the transfers start at T+5 seconds. Looks
> like the first ~five seconds of transfer is being cut off on those
> graphs.
Ramping up to 10K packets is silly at gigE, and looks like an outlier.
> I think what happens is that one of the streams (the turquoise
> one) starts up faster than the other ones, consuming all the bandwidth
> for the first couple of seconds until they adjust to the same level.
I'm not willing to draw this conclusion from this graph, and need
to (or would like someone else to) set up a test in a controlled
environment. The wrapper scripts
can dump the raw data and I can manually plot using gnuplot or a
spreadsheet, but it's tedious...
> These initial values are then scaled off the graph as outlier values.
Huge need for cdf plots and to present the outliers. In fact I'd like
graphs that just presented the outliers. Another way to approach it
would be, instead of creating static graphs, to use something like
d3.js and incorporate the ability to zoom
in, around, and so on, on multiple data sets. Or leverage mlab's tools.
I am no better at javascript than python.
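For the static case, the sort of cdf plot I mean is only a few lines of
matplotlib - a sketch, with the sample series assumed to be whatever
latency data you have on hand:

import numpy as np
import matplotlib.pyplot as plt

def plot_latency_cdf(samples_ms, label="fq_codel"):
    # Empirical CDF: sort the samples and plot the cumulative fraction,
    # which keeps the outliers visible in the tail instead of clipping them.
    xs = np.sort(np.asarray(samples_ms, dtype=float))
    ys = np.arange(1, len(xs) + 1) / float(len(xs))
    plt.plot(xs, ys, label=label)
    plt.xlabel("latency (ms)")
    plt.ylabel("cumulative fraction of samples")
    plt.legend()

# e.g. plot_latency_cdf(fq_codel_pings); plot_latency_cdf(pfifo_pings, "pfifo_fast"); plt.show()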
> If
> you zoom in on the beginning of the graph you can see the turquoise line
> coming down from far off the scale in one direction, while the rest come
> from off the bottom.
Not willing to draw any conclusions. I am.
>> Please see attached for update including .git directory.
>
> I got a little lost in all the lists of SFQ, but other than that I found
> it quite readable. The diagrams of the queuing algorithms are a tad big,
> though, I think. :)
I would like to take some serious time to make them better. I'm
graphically hopeless, however I know what I like, and a picture does
tell a thousand words.
>
> When is the article going to be published?
Well, jon strongly indicated he'd take an article, and I told him that
once I found a theme, co-authors, and time, I'd talk to him again. We
seem to be making rapid progress due to paul stepping up and your
graphing tools.
So as for publication: when it's done, would be my guess! I would like
this to be the best presentation possible, and also address some FUD
spread by the recent Cisco PIE presentation.
That said, I do feel the need for formal publication in a dead-tree
journal somewhere, which could talk to some of the interesting stuff
like beating tcp global synchronization (finally), and the RTT info,
and maybe also explore the few known flaws of fq_codel...
--
Dave Täht
Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-11-24 16:19 ` Dave Taht
@ 2012-11-24 16:36 ` dpreed
2012-11-24 19:57 ` [Codel] " Andrew McGregor
` (2 subsequent siblings)
3 siblings, 0 replies; 56+ messages in thread
From: dpreed @ 2012-11-24 16:36 UTC (permalink / raw)
To: Dave Taht
Cc: Paolo Valente, Toke Høiland-Jørgensen, Eric Raymond,
codel, cerowrt-devel, bloat, paulmck, John Crispin
[-- Attachment #1: Type: text/plain, Size: 10628 bytes --]
All the points below make sense. Ideally you want to measure the TCP/FQ_Codel interaction in the "real world". Throughput benchmarks are irrelevant - the equivalent of Hot Rod amateur dragstrip competitions among cars that cannot even turn corners.
Beyond being hard, there is no "agreed upon" standard for testing "real world" performance - which is why academics who care little about anything other than publishing go for the "Hot Rod" stuff.
In your lwn posting, I think it is worth pointing out that "wrongheaded benchmarks" were exactly what drove the folks who created the bufferbloat problem in the first place. And those people are still alive and kicking (in the wrong direction). But that's how you get tenure.
The other issue is "KISS". I would *seriously* suggest that the idea of "classification" not get too entangled with the problem at this point.
Classification has many downsides, most of which will just confuse the inventors, adding what is probably an unnecessarily complex space of design alternatives. If you must discuss classification (which is another academic wet dream), discuss it as "future research".
Two classes (latency critical, and latency as short as possible) should be enough in a network that for "control loop" reasons wants to have minimal control latencies *all of the time*. I'm not sure that two is the desired state - I tend to think 1 class is better on an end-to-end basis.
If you want to stabilize things with faster control loops, just order all queues by "packet entry" timestamps, and move ECN-style marking towards "head-marking" - that is signaling congestion in packets that are being transmitted if any packets are queued behind them.
That creates the most responsive control loops possible on an end-to-end basis for TCP and other congestion-managing protocols.
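To be concrete about what I mean by head-marking, here is a toy sketch -
not an existing qdisc, just the mechanism: keep the queue in
packet-entry-timestamp order and set the CE mark on a packet being
transmitted whenever anything is queued behind it. (The pkt objects are
assumed to have a settable ecn_ce attribute.)

import collections, time

class HeadMarkingQueue(object):
    """Toy model only: FIFO ordered by packet entry time, with ECN
    congestion marking applied at the head whenever a backlog exists."""
    def __init__(self):
        self.q = collections.deque()

    def enqueue(self, pkt):
        self.q.append((time.time(), pkt))     # packet-entry timestamp order

    def dequeue(self):
        if not self.q:
            return None
        _, pkt = self.q.popleft()
        if self.q:                # packets are waiting behind this one,
            pkt.ecn_ce = True     # so signal congestion on the packet leaving now
        return pkt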
[-- Attachment #2: Type: text/html, Size: 12450 bytes --]
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] FQ_Codel lwn draft article review
2012-11-24 16:19 ` Dave Taht
2012-11-24 16:36 ` [Codel] [Cerowrt-devel] " dpreed
@ 2012-11-24 19:57 ` Andrew McGregor
2012-11-26 21:13 ` Rick Jones
2012-11-26 22:16 ` Toke Høiland-Jørgensen
3 siblings, 0 replies; 56+ messages in thread
From: Andrew McGregor @ 2012-11-24 19:57 UTC (permalink / raw)
To: Dave Taht
Cc: Paolo Valente, Toke Høiland-Jørgensen, Eric Raymond,
codel, cerowrt-devel, bloat, paulmck, John Crispin
On 25/11/2012, at 5:19 AM, Dave Taht <dave.taht@gmail.com> wrote:
> On Sat, Nov 24, 2012 at 1:07 AM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:
>>
>
> Indirectly observing the web load effects on that graph, while timing
> web page completion, would be good, when comparing pfifo_fast and
> various aqm variants.
Indeed
>>> Also, I know what ICMP is, but the UDP variants are new to me. Could
>>> you please expand the "EF", "BK", "BE", and "CSS" acronyms?
>>
>> The UDP ping times are simply roundtrips/second (as measured by netperf)
>> converted to ping times. The acronyms are diffserv markings, i.e.
>> EF=expedited forwarding, BK=bulk (CS1 marking), BE=best effort (no
>> marking).
>
> The classification tests are in there for a number of reasons.
>
> 0) I needed multiple streams in the test anyway.
>
> 1) Many people keep insisting that classification can work. It
> doesn't. It never has. Not over the wild and wooly internet. It only
> rarely does any good at all even on internal networks. It sometimes
> works on some kinds of udp streams, but that's it. The bulk of the
> problem is the massive packet streams modern offloads generate, and
> breaking those up, everywhere possible, any time possible.
>
> I had put up a graph last week, that showed each classification bucket
> for a tcp stream being totally ignored...
>
> 2) Theoretically wireless 802.11e SHOULD respect classification. In
> fact, it does, on the ath9k, to a large extent. However, on the iwl I
> have, BE, BK traffic get completely starved by VO, and VI traffic,
> which is something of a bug. I'm certain that due to inadaquate
> testing, 802.11e classification is largely broken in the field, and
> I'd hoped this test would bring that out to more people.
802.11e doesn't prevent a station from starving itself, nor does it help the AP at all when there is contending traffic to deliver to the same station... all it does for you is prevent a station with high priority traffic to send or receive from being completely starved by another station with low priority traffic. It's not at all a complete solution, and we need something like the mythical mfq_codel to sort out the rest.
> 3) I don't mind at an effort to make classification work, particularly
> for traffic clearly marked background, such as bittorrent often is.
> Perhaps this is an opportunity to get IPv6 done up right, as it seems
> the diffserv bits are much more rarely fiddled with in transit
Doesn't look that hard, to be honest.
>> The UDP ping tests tend to not work so well on a loaded link,
>> however, since netperf stops sending packets after detecting
>> (excessive(?)) loss. Which is why you see only see the UDP ping times on
>> the first part of the graph.
>
> Netperf stops UDP_STREAM exchanges after the first lost udp packet.
> This is not helpful.
>
> I keep noting that the next phase of the rrul development is to find a
> good pair of CIR one way measurements that look a bit like voip.
> Either that test can get added to netperf or we use another tool, or
> we create one, and I keep hoping for recommendations from various
> people on this list. Come on, something like this
> exists? Anybody?
nmap -PU?
>> I think what happens is that one of the streams (the turquoise
>> one) starts up faster than the other ones, consuming all the bandwidth
>> for the first couple of seconds until they adjust to the same level.
>
> I'm not willing to draw this conclusion from this graph, and need
> to/would like someone else to/ setup a test in a controlled
> environment. the wrapper scripts
> can dump the raw data and I can manually plot using gnuplot or a
> spreadsheet, but it's tedious...
I may have some code that will help here, including CDFs and a rarely seen in the wild exponential weighted moving variance.
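(The moving variance is nothing exotic - roughly this, in python, with
alpha being an assumed smoothing constant rather than anything from my
actual code:)

class EWMV(object):
    """Exponentially weighted moving mean and variance, updated incrementally."""
    def __init__(self, alpha=0.05):
        self.alpha = alpha
        self.mean = None
        self.var = 0.0

    def update(self, x):
        if self.mean is None:
            self.mean = float(x)
            return
        diff = x - self.mean
        incr = self.alpha * diff
        self.mean += incr
        # Standard recurrence: the old variance decays by (1 - alpha) and
        # picks up the squared deviation weighted by alpha.
        self.var = (1.0 - self.alpha) * (self.var + diff * incr)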
>> These initial values are then scaled off the graph as outlier values.
>
> Huge need for cdf plots and to present the outliers. In fact I'd like
> graphs that just presented the outliers. Another way to approach it
> would be, instead of creating static graphs, to use something like the
> ds3.js and incorporate the ability to zoom
> in, around, and so on, on multiple data sets. Or leverage mlab's tools.
>
> I am no better at javascript than python.
Run interactively, python matplotlib stuff lets you zoom. I don't know if that can be made into a zoomable web page though.
Andrew
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] FQ_Codel lwn draft article review
2012-11-24 0:07 ` Toke Høiland-Jørgensen
2012-11-24 16:19 ` Dave Taht
@ 2012-11-26 17:20 ` Paul E. McKenney
2012-11-26 21:05 ` Rick Jones
2 siblings, 0 replies; 56+ messages in thread
From: Paul E. McKenney @ 2012-11-26 17:20 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Paolo Valente, Eric Raymond, codel, cerowrt-devel, bloat, John Crispin
On Sat, Nov 24, 2012 at 01:07:04AM +0100, Toke Høiland-Jørgensen wrote:
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:
>
> > I am using these two in a new "Effectiveness of FQ-CoDel" section.
> > Chrome can display .svg, and if it becomes a problem, I am sure that
> > they can be converted. Please let me know if some other data would
> > make the point better.
>
> If you are just trying to show the "ideal" effectiveness of fq_codel,
> two attached graphs are from some old tests we did at the UDS showing a
> simple ethernet link between two laptops with a single stream going in
> each direction. This is of course by no means a real-world test, but on
> the other hand they show a very visible factor ~4 improvement in
> latency.
>
> These are the same graphs Dave used in his slides, but also in a 100mbit
> version.
>
> > Also, I know what ICMP is, but the UDP variants are new to me. Could
> > you please expand the "EF", "BK", "BE", and "CSS" acronyms?
>
> The UDP ping times are simply roundtrips/second (as measured by netperf)
> converted to ping times. The acronyms are diffserv markings, i.e.
> EF=expedited forwarding, BK=bulk (CS1 marking), BE=best effort (no
> marking). The UDP ping tests tend to not work so well on a loaded link,
> however, since netperf stops sending packets after detecting
> (excessive(?)) loss. Which is why you see only see the UDP ping times on
> the first part of the graph.
>
> The markings are also used on the TCP flows, as seen in the legend for
> the up/downloads.
>
> > All sessions were started at T+5, then?
>
> The pings start right away, the transfers start at T+5 seconds. Looks
> like the first ~five seconds of transfer is being cut off on those
> graphs. I think what happens is that one of the streams (the turquoise
> one) starts up faster than the other ones, consuming all the bandwidth
> for the first couple of seconds until they adjust to the same level.
> These initial values are then scaled off the graph as outlier values. If
> you zoom in on the beginning of the graph you can see the turquoise line
> coming down from far off the scale in one direction, while the rest come
> from off the bottom.
>
> > Please see attached for update including .git directory.
>
> I got a little lost in all the lists of SFQ, but other than that I found
> it quite readable. The diagrams of the queuing algorithms are a tad big,
> though, I think. :)
Thank you, I have shrunk the figures and added the acronym expansions.
Thanx, Paul
> When is the article going to be published?
>
> -Toke
>
> --
> Toke Høiland-Jørgensen
> toke@toke.dk
>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] FQ_Codel lwn draft article review
2012-11-24 0:07 ` Toke Høiland-Jørgensen
2012-11-24 16:19 ` Dave Taht
2012-11-26 17:20 ` [Codel] " Paul E. McKenney
@ 2012-11-26 21:05 ` Rick Jones
2012-11-26 23:18 ` [Codel] [Bloat] " Rick Jones
2 siblings, 1 reply; 56+ messages in thread
From: Rick Jones @ 2012-11-26 21:05 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Paolo Valente, Eric Raymond, codel, cerowrt-devel, bloat,
paulmck, John Crispin
On 11/23/2012 04:07 PM, Toke Høiland-Jørgensen wrote:
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:
>> Also, I know what ICMP is, but the UDP variants are new to me. Could
>> you please expand the "EF", "BK", "BE", and "CSS" acronyms?
>
> The UDP ping times are simply roundtrips/second (as measured by netperf)
> converted to ping times. The acronyms are diffserv markings, i.e.
> EF=expedited forwarding, BK=bulk (CS1 marking), BE=best effort (no
> marking). The UDP ping tests tend to not work so well on a loaded link,
> however, since netperf stops sending packets after detecting
> (excessive(?)) loss. Which is why you see only see the UDP ping times on
> the first part of the graph.
In a "classic" netperf UDP_RR test, where there is only one
request/response (transaction) in flight at one time, the test will come
to a halt on the first packet loss - of either a request or a response.
Netperf has no retransmission mechanism for UDP.
If one is using "burst mode" then the test will continue so long as
there is at least one "transaction" outstanding. However, one cannot
then simply invert transactions per second to get seconds per transaction.
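A quick illustration, with made-up numbers, of why the inversion fails
once several transactions are in flight:

# Hypothetical burst-mode run: 2000 transactions/s with 8 outstanding.
rate_tps, outstanding = 2000.0, 8
print("naive inversion: %.2f ms" % (1000.0 / rate_tps))                 # 0.50 ms
print("actual latency:  %.2f ms" % (1000.0 * outstanding / rate_tps))   # 4.00 ms, by Little's law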
That is why I tend to use TCP_RR - the retransmission mechanism of TCP
will kick-in to keep the test going.
In theory, netperf could be tweaked to set SO_RCVTIMEO at some
high-but-not-too-high level (from the command line?). It could then
keep the test limping along I suppose (with gaps), but I don't want
anything terribly complicated going on in netperf - otherwise one might
as well use TCP_RR anyway.
rick jones
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] FQ_Codel lwn draft article review
2012-11-24 16:19 ` Dave Taht
2012-11-24 16:36 ` [Codel] [Cerowrt-devel] " dpreed
2012-11-24 19:57 ` [Codel] " Andrew McGregor
@ 2012-11-26 21:13 ` Rick Jones
2012-11-26 21:19 ` Dave Taht
2012-11-26 22:16 ` Toke Høiland-Jørgensen
3 siblings, 1 reply; 56+ messages in thread
From: Rick Jones @ 2012-11-26 21:13 UTC (permalink / raw)
To: Dave Taht
Cc: Paolo Valente, Toke Høiland-Jørgensen, Eric Raymond,
codel, cerowrt-devel, bloat, paulmck, John Crispin
On 11/24/2012 08:19 AM, Dave Taht wrote:
> On Sat, Nov 24, 2012 at 1:07 AM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:
>> The UDP ping tests tend to not work so well on a loaded link,
>> however, since netperf stops sending packets after detecting
>> (excessive(?)) loss. Which is why you see only see the UDP ping times on
>> the first part of the graph.
>
> Netperf stops UDP_STREAM exchanges after the first lost udp packet.
The UDP_STREAM test will keep blasting along until the end-of-test timer
fires. It is the non-burst-mode UDP_RR test which comes to a halt on
the first lost datagram.
> After staring at the tons of data collected over the past year, on
> wifi, I'm willing to strongly suggest we just drop TCP packets after
> 500ms in the wifi stack, period, as that exceeds the round trip
> timeout...
How does WiFi "know" what the TCP RTO for a given flow happens to be?
There is no 500 millisecond ceiling on the TCP RTO.
rick jones
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] FQ_Codel lwn draft article review
2012-11-26 21:13 ` Rick Jones
@ 2012-11-26 21:19 ` Dave Taht
0 siblings, 0 replies; 56+ messages in thread
From: Dave Taht @ 2012-11-26 21:19 UTC (permalink / raw)
To: Rick Jones
Cc: Paolo Valente, Toke Høiland-Jørgensen, Eric Raymond,
codel, cerowrt-devel, bloat, paulmck, John Crispin
On Mon, Nov 26, 2012 at 10:13 PM, Rick Jones <rick.jones2@hp.com> wrote:
> On 11/24/2012 08:19 AM, Dave Taht wrote:
>>
>> On Sat, Nov 24, 2012 at 1:07 AM, Toke Høiland-Jørgensen <toke@toke.dk>
>> wrote:
>>>
>>> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:
>>> The UDP ping tests tend to not work so well on a loaded link,
>>> however, since netperf stops sending packets after detecting
>>> (excessive(?)) loss. Which is why you see only see the UDP ping times on
>>> the first part of the graph.
>>
>>
>> Netperf stops UDP_STREAM exchanges after the first lost udp packet.
>
>
> The UDP_STREAM test will keep blasting along until the end-of-test timer
> fires. It is the non-burst-mode UDP_RR test which comes to a halt on the
> first lost datagram.
>
>
>> After staring at the tons of data collected over the past year, on
>> wifi, I'm willing to strongly suggest we just drop TCP packets after
>> 500ms in the wifi stack, period, as that exceeds the round trip
>> timeout...
>
>
> How does WiFi "know" what the TCP RTO for a given flow happens to be? There
> is no 500 millisecond ceiling on the TCP RTO.
The lightspeed equivalent of one and a half times around the planet is
enough time to spend inside of one computer.
As for the RTO, you're right... sorta.
http://tools.ietf.org/html/rfc6298
But in the general case I cannot see any harm in the wifi stack simply
dropping packets more than 500ms old, and I can see a lot of potential good.
>
> rick jones
--
Dave Täht
Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] FQ_Codel lwn draft article review
2012-11-24 16:19 ` Dave Taht
` (2 preceding siblings ...)
2012-11-26 21:13 ` Rick Jones
@ 2012-11-26 22:16 ` Toke Høiland-Jørgensen
2012-11-26 23:21 ` Toke Høiland-Jørgensen
3 siblings, 1 reply; 56+ messages in thread
From: Toke Høiland-Jørgensen @ 2012-11-26 22:16 UTC (permalink / raw)
To: Dave Taht
Cc: Paolo Valente, Eric Raymond, codel, cerowrt-devel, bloat,
paulmck, John Crispin
[-- Attachment #1: Type: text/plain, Size: 1600 bytes --]
Dave Taht <dave.taht@gmail.com> writes:
> I keep noting that the next phase of the rrul development is to find a
> good pair of CIR one way measurements that look a bit like voip.
> Either that test can get added to netperf or we use another tool, or
> we create one, and I keep hoping for recommendations from various
> people on this list. Come on, something like this exists? Anybody?
I came across this in the iperf documentation:
"Jitter calculations are continuously computed by the server, as
specified by RTP in RFC 1889. The client records a 64 bit
second/microsecond timestamp in the packet. The server computes the
relative transit time as (server's receive time - client's send time).
The client's and server's clocks do not need to be synchronized; any
difference is subtracted out in the jitter calculation. Jitter is the
smoothed mean of differences between consecutive transit times."
http://iperf.fr/#tuningudp
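The estimator itself is tiny; roughly this in python (not iperf's actual
code - the transit times are receive time minus send time for
consecutive packets, in whatever unit you like):

def rtp_jitter(transit_times):
    """Interarrival jitter per RFC 1889/3550: J += (|D| - J) / 16, where D
    is the difference between consecutive one-way transit times."""
    jitter = 0.0
    prev = None
    for t in transit_times:
        if prev is not None:
            jitter += (abs(t - prev) - jitter) / 16.0
        prev = t
    return jitter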
Iperf seems to output jitter measurements on the *server* side when
doing UDP transfers. So incorporating this into netperf-wrapper would
require either some way to notify a server to start up an iperf instance
sending to the client (and finding some way to persuade firewalls/NATs
on the way to let the packets through), or creating a server-side wrapper
that monitors the server-side output and sends it to the client on
request via some sort of rpc.
The latter should be pretty straightforward, I suppose. And if I recall
correctly, you did want to measure the upstream jitter?
-Toke
--
Toke Høiland-Jørgensen
toke@toke.dk
[-- Attachment #2: Type: application/pgp-signature, Size: 489 bytes --]
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Bloat] FQ_Codel lwn draft article review
2012-11-26 21:05 ` Rick Jones
@ 2012-11-26 23:18 ` Rick Jones
0 siblings, 0 replies; 56+ messages in thread
From: Rick Jones @ 2012-11-26 23:18 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Paolo Valente, codel, cerowrt-devel, bloat, paulmck, John Crispin
On 11/26/2012 01:05 PM, Rick Jones wrote:
> In theory, netperf could be tweaked to set SO_RCVTIMEO at some
> high-but-not-too-high level (from the command line?). It could then
> keep the test limping along I suppose (with gaps), but I don't want
> anything terribly complicated going-on in netperf - otherwise one might
> as well use TCP_RR anyway.
Without committing to keeping it in there, I have made a first pass at a
quick and dirty SO_RCVTIMEO-based mechanism to keep a UDP_RR test from
stopping entirely in the face of UDP datagram loss. The result is
checked-in to the top-of-trunk of the netperf subversion repository at
http://www.netperf.org/svn/netperf2/trunk .
I'm not at all sure at present the "right" things happen for interim
results or the RTT statistics.
To enable the functionality, one adds a test-specific -e option with a
timeout specified in seconds. I would suggest it be quite large so that
one is very much statistically certain that the request/response was
indeed lost and not simply delayed or it will definitely throw the
timings off...
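The mechanism, in rough python rather than netperf's actual C, is simply
a receive timeout on the socket, so that a lost request or response
costs one timeout instead of stalling the test forever:

import socket

def udp_rr_once(sock, dest, payload=b"x", timeout_s=30.0):
    # One request/response exchange that survives datagram loss by timing
    # out (the idea behind the test-specific -e option described above).
    sock.settimeout(timeout_s)
    sock.sendto(payload, dest)
    try:
        sock.recvfrom(2048)
        return True            # transaction completed
    except socket.timeout:
        return False           # request or response lost; move on to the next one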
happy benchmarking,
rick jones
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] FQ_Codel lwn draft article review
2012-11-26 22:16 ` Toke Høiland-Jørgensen
@ 2012-11-26 23:21 ` Toke Høiland-Jørgensen
2012-11-26 23:39 ` [Codel] [Cerowrt-devel] " dpreed
0 siblings, 1 reply; 56+ messages in thread
From: Toke Høiland-Jørgensen @ 2012-11-26 23:21 UTC (permalink / raw)
To: Dave Taht
Cc: Paolo Valente, Eric Raymond, codel, cerowrt-devel, bloat,
paulmck, John Crispin
[-- Attachment #1: Type: text/plain, Size: 813 bytes --]
Toke Høiland-Jørgensen <toke@toke.dk> writes:
> The latter should be pretty straight forward, I suppose. And if I recall
> correctly, you did want to measure the upstream jitter?
Following up on this, I've created a proof of concept python script that
starts an iperf server in the background, parses the output, and
presents a command line interface that dumps the parsed data in json
format when asked for a transfer ID (source port number).
The script is available here:
https://github.com/tohojo/netperf-wrapper/blob/master/misc/iperf-server.py
It should be pretty easy to make it listen to a socket instead and allow
clients to request 'their' data. If anyone thinks this will be useful,
I'll be happy to poke some more at it. :)
-Toke
--
Toke Høiland-Jørgensen
toke@toke.dk
[-- Attachment #2: Type: application/pgp-signature, Size: 489 bytes --]
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-11-26 23:21 ` Toke Høiland-Jørgensen
@ 2012-11-26 23:39 ` dpreed
2012-11-26 23:58 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 56+ messages in thread
From: dpreed @ 2012-11-26 23:39 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Paolo Valente, Eric Raymond, codel, cerowrt-devel, bloat,
paulmck, John Crispin
[-- Attachment #1: Type: text/plain, Size: 2493 bytes --]
I'm not sure why people are focused on iperf as a test of fqcodel.
iperf is a "hot rod" test. The UDP versions ignores congestion signals entirely, and thus is completely irrelevant to bufferbloat.
The TCP tests are focused on throughput only, in an extreme case.
While it might be a nice footnote in a discussion of bufferbloat mitigation to say that "iperf is not too badly affected", the purpose of iperf as a measurement tool has literally NOTHING to do with bufferbloat management.
In fact, the focus on optimizing iperf by a half a percent or so in laboratory conditions is *literally* how we ended up with bufferbloat in the first place.
You don't design a highly maneuverable jet fighter by designing a rocket that goes from point A to point B the fastest.
The Internet was NEVER supposed to support circuit switchable traffic models.
Someone needs to make a tool that measures the right thing - and using iperf is the opposite of the right thing.
[-- Attachment #2: Type: text/html, Size: 3477 bytes --]
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-11-26 23:39 ` [Codel] [Cerowrt-devel] " dpreed
@ 2012-11-26 23:58 ` Toke Høiland-Jørgensen
0 siblings, 0 replies; 56+ messages in thread
From: Toke Høiland-Jørgensen @ 2012-11-26 23:58 UTC (permalink / raw)
To: dpreed
Cc: Paolo Valente, Eric Raymond, codel, cerowrt-devel, bloat,
paulmck, John Crispin
[-- Attachment #1: Type: text/plain, Size: 589 bytes --]
dpreed@reed.com writes:
> iperf is a "hot rod" test. The UDP versions ignores congestion signals
> entirely, and thus is completely irrelevant to bufferbloat.
Well I wasn't going to run it at full speed (whatever that might mean),
but limit it to a relatively low speed, to get the jitter measurements
for UDP in the hope they'd be a decent indication of how a voip call
might fare on the same connection. If you have ideas for a better tool,
I'm all ears. :)
The other tests are done with netperf and good old ping.
-Toke
--
Toke Høiland-Jørgensen
toke@toke.dk
[-- Attachment #2: Type: application/pgp-signature, Size: 489 bytes --]
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-11-23 22:18 ` Paul E. McKenney
2012-11-24 0:07 ` Toke Høiland-Jørgensen
@ 2012-11-27 22:03 ` Jim Gettys
2012-11-27 22:31 ` [Codel] [Bloat] " David Lang
` (3 more replies)
1 sibling, 4 replies; 56+ messages in thread
From: Jim Gettys @ 2012-11-27 22:03 UTC (permalink / raw)
To: Paul McKenney
Cc: Paolo Valente, Toke Høiland-Jørgensen, Eric Raymond,
codel, cerowrt-devel, bloat, John Crispin
[-- Attachment #1: Type: text/plain, Size: 8138 bytes --]
Some points worth making:
1) It is important to point out that (and how) fq_codel avoids starvation:
unpleasant as elephant flows are, it would be very unfriendly to never
service them at all until they time out.
2) "fairness" is not necessarily what we ultimately want at all; you'd
really like to penalize those who induce congestion the most. But we don't
currently have a solution (though Bob Briscoe at BT thinks he does, and is
seeing if he can get it out from under a BT patent), so the current
fq_codel round robins ultimately until/unless we can do something like
Bob's idea. This is a local information only subset of the ideas he's been
working on in the congestion exposure (conex) group at the IETF.
3) "fairness" is always in the eyes of the beholder (and should be left to
the beholder to determine). "fairness" depends on where in the network you
are. While being "fair" among TCP flows is sensible default policy for a
host, else where in the network it may not be/usually isn't.
Two examples:
o at a home router, you probably want to be "fair" according to transmit
opportunities. We really don't want a single system remote from the router
to be able to starve the network so that devices near the router get much
less bandwidth than you might hope/expect.
What is more, you probably want to account for a single host using many
flows, and regulate it so that it cannot "hog" bandwidth in the home
environment, but only uses its "fair" share.
o at an ISP, you must be "fair" between customers; it is best to leave
the judgement of "fairness" at finer granularity (e.g. host and TCP flows)
to the points closer to the customer's systems, so that they can enforce
whatever definition of "fair" they need themselves.
Algorithms like fq_codel can be/should be adjusted to the circumstances.
And therefore exactly what you choose to hash against to form the buckets
will vary depending on where you are (a deliberately simplified sketch of
the hash-and-round-robin mechanics follows below). Having at least one step
of this (at the user's device) be TCP-flow "fair" does have the great
advantage of helping with the RTT unfairness problem that violates the
principle of "least surprise", such as that routinely seen in places like
New Zealand.
This is why I have so many problems using the word "fair" near this
algorithm. "fair" is impossible to define, overloaded in people's minds
with TCP fair queuing, not even desirable much of the time, and by
definition and design even today's fq_codel isn't fair to lots of things;
the same basic algorithm can/should be tweaked in lots of directions
depending on what we need to do. Calling this "smart" queuing or some such
would be better.
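Here is the deliberately simplified Python sketch mentioned above, showing
hashing into buckets plus a quantum-based round-robin pass. It is not the
Linux fq_codel code (the CoDel part and the new/old-flow lists are omitted),
and the packet representation and quantum are made-up illustrative values;
the point is that every busy bucket keeps getting service, so elephant flows
are throttled but never starved, and flow_key() is the policy knob that
changes depending on where in the network you sit.

# Simplified illustration only -- not the Linux fq_codel implementation.
from collections import deque, defaultdict
import zlib

QUANTUM = 1514  # bytes each bucket may send per round (roughly one MTU)

def flow_key(pkt):
    # The hash key is the policy knob: 5-tuple on a host, per-device on a
    # home router, per-customer at an ISP.  Here: a plain 5-tuple.
    return (pkt["src"], pkt["sport"], pkt["dst"], pkt["dport"], pkt["proto"])

class Buckets:
    def __init__(self, n=1024):
        self.n = n
        self.queues = defaultdict(deque)   # bucket index -> queued packets
        self.deficit = {}

    def enqueue(self, pkt):
        idx = zlib.crc32(repr(flow_key(pkt)).encode()) % self.n
        self.queues[idx].append(pkt)
        self.deficit.setdefault(idx, 0)

    def dequeue_round(self):
        # One round-robin pass: every busy bucket gets one quantum, so even
        # "elephant" buckets keep getting service and nothing is starved.
        sent = []
        for idx, q in list(self.queues.items()):
            self.deficit[idx] += QUANTUM
            while q and q[0]["len"] <= self.deficit[idx]:
                pkt = q.popleft()
                self.deficit[idx] -= pkt["len"]
                sent.append(pkt)
            if not q:
                del self.queues[idx]
                del self.deficit[idx]
        return sent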
When you've done another round on the document, I'll do a more detailed
read.
- Jim
On Fri, Nov 23, 2012 at 5:18 PM, Paul E. McKenney <
paulmck@linux.vnet.ibm.com> wrote:
> On Fri, Nov 23, 2012 at 09:57:34AM +0100, Dave Taht wrote:
> > David Woodhouse and I fiddled a lot with adsl and openwrt and a
> > variety of drivers and network layers in a typical bonded adsl stack
> > yesterday. The complexity of it all makes my head hurt. I'm happy that
> > a newly BQL'd ethernet driver (for the geos and qemu) emerged from it,
> > which he submitted to netdev...
>
> Cool!!! ;-)
>
> > I made a recording of us last night discussing the layers, which I
> > will produce and distribute later...
> >
> > Anyway, along the way, we fiddled a lot with trying to analyze where
> > the 350ms or so of added latency was coming from in the traverse geo's
> > adsl implementation and overlying stack....
> >
> > Plots: http://david.woodhou.se/dwmw2-netperf-plots.tar.gz
> >
> > Note: 1:
> >
> > The netperf sample rate on the rrul test needs to be higher than
> > 100ms in order to get a decent result at sub 10Mbit speeds.
> >
> > Note 2:
> >
> > The two nicest graphs here are nofq.svg vs fq.svg, which were taken on
> > a gigE link from a Mac running Linux to another gigE link. (in other
> > words, NOT on the friggin adsl link) (firefox can display svg, I don't
> > know what else) I find the T+10 delay before stream start in the
> > fq.svg graph suspicious and think the "throw out the outlier" code in
> > the netperf-wrapper code is at fault. Prior to that, codel is merely
> > buffering up things madly, which can also be seen in the pfifo_fast
> > behavior, with 1000pkts it's default.
>
> I am using these two in a new "Effectiveness of FQ-CoDel" section.
> Chrome can display .svg, and if it becomes a problem, I am sure that
> they can be converted. Please let me know if some other data would
> make the point better.
>
> I am assuming that the colored throughput spikes are due to occasional
> packet losses. Please let me know if this interpretation is overly naive.
>
> Also, I know what ICMP is, but the UDP variants are new to me. Could
> you please expand the "EF", "BK", "BE", and "CSS" acronyms?
>
> > (Arguably, the default queue length in codel can be reduced from 10k
> > packets to something more reasonable at GigE speeds)
> >
> > (the indicator that it's the graph, not the reality, is that the
> > fq.svg pings and udp start at T+5 and grow minimally, as is usual with
> > fq_codel.)
>
> All sessions were started at T+5, then?
>
> > As for the *.ps graphs, well, they would take david's network topology
> > to explain, and were conducted over a variety of circumstances,
> > including wifi, with more variables in play than I care to think
> > about.
> >
> > We didn't really get anywhere on digging deeper. As we got to purer
> > tests - with a minimal number of boxes, running pure ethernet,
> > switched over a couple of switches, even in the simplest two box case,
> > my HTB based "ceroshaper" implementation had multiple problems in
> > cutting median latencies below 100ms, on this very slow ADSL link.
> > David suspects problems on the path along the carrier backbone as a
> > potential issue, and the only way to measure that is with two one way
> > trip time measurements (rather than rtt), time synced via ntp... I
> > keep hoping to find a rtp test, but I'm open to just about any option
> > at this point. anyone?
> >
> > We also found a probable bug in mtr in that multiple mtrs on the same
> > box don't co-exist.
>
> I must confess that I am not seeing all that clear a difference between
> the behaviors of ceroshaper and FQ-CoDel. Maybe somewhat better latencies
> for FQ-CoDel, but not unambiguously so.
>
> > Moving back to more scientific clarity and simpler tests...
> >
> > The two graphs, taken a few weeks back, on pages 5 and 6 of this:
> >
> >
> http://www.teklibre.com/~d/bloat/Not_every_packet_is_sacred-Battling_Bufferbloat_on_wifi.pdf
> >
> > appear to show the advantage of fq_codel fq + codel + head drop over
> > tail drop during the slow start period on a 10Mbit link - (see how
> > squiggly slow start is on pfifo fast?) as well as the marvelous
> > interstream latency that can be achieved with BQL=3000 (on a 10 mbit
> > link.) Even that latency can be halved by reducing BQL to 1500, which
> > is just fine on a 10mbit. Below those rates I'd like to be rid of BQL
> > entirely, and just have a single packet outstanding... in everything
> > from adsl to cable...
> >
> > That said, I'd welcome other explanations of the squiggly slowstart
> > pfifo_fast behavior before I put that explanation on the slide.... ECN
> > was in play here, too. I can redo this test easily, it's basically
> > running a netperf TCP_RR for 70 seconds, and starting up a TCP_MAERTS
> > and TCP_STREAM for 60 seconds a T+5, after hammering down on BQL's
> > limit and the link speeds on two sides of a directly connected laptop
> > connection.
>
> I must defer to others on this one. I do note the much lower latencies
> on slide 6 compared to slide 5, though.
>
> Please see attached for update including .git directory.
>
> Thanx, Paul
>
> > ethtool -s eth0 advertise 0x002 # 10 Mbit
> >
>
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-devel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>
>
[-- Attachment #2: Type: text/html, Size: 10160 bytes --]
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Bloat] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-11-27 22:03 ` [Codel] [Cerowrt-devel] " Jim Gettys
@ 2012-11-27 22:31 ` David Lang
2012-11-27 22:54 ` Paul E. McKenney
2012-11-28 14:06 ` [Codel] [Cerowrt-devel] [Bloat] " Michael Richardson
2012-11-27 22:49 ` [Codel] [Cerowrt-devel] " Paul E. McKenney
` (2 subsequent siblings)
3 siblings, 2 replies; 56+ messages in thread
From: David Lang @ 2012-11-27 22:31 UTC (permalink / raw)
To: Jim Gettys
Cc: Paolo Valente, Toke Høiland-Jørgensen, codel,
cerowrt-devel, bloat, Paul McKenney, John Crispin
[-- Attachment #1: Type: TEXT/Plain, Size: 1630 bytes --]
On Tue, 27 Nov 2012, Jim Gettys wrote:
> 2) "fairness" is not necessarily what we ultimately want at all; you'd
> really like to penalize those who induce congestion the most. But we don't
> currently have a solution (though Bob Briscoe at BT thinks he does, and is
> seeing if he can get it out from under a BT patent), so the current
> fq_codel round robins ultimately until/unless we can do something like
> Bob's idea. This is a local information only subset of the ideas he's been
> working on in the congestion exposure (conex) group at the IETF.
Even more than this, we _know_ that we don't want to be fair in terms of the raw
packet priority.
For example, we know that we want to prioritize DNS traffic over TCP streams
(due to the fact that the TCP traffic usually can't even start until DNS
resolution finishes)
We strongly suspect that we want to prioritize short-lived connections over
long-lived connections. We don't know a good way to do this, but one good
starting point would be to prioritize SYN packets so that the
initialization of the connection happens as fast as possible.
Ideally we'd probably like to prioritize the first couple of packets of a
connection so that very short-lived connections finish quickly.
It may also make sense to prioritize FIN packets so that connection teardown
(and the resulting release of resources and connection tracking) happens as
fast as possible.
All of these are horribly unfair when you are looking at the raw packet
flow, but they significantly help the user's perceived response time without
making much difference in the large-download cases.
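None of the above exists in fq_codel as explicit policy; purely as an
illustration of the kind of heuristics being described, a toy classifier
(Python, with arbitrary band numbers, port checks and thresholds) might
look like this:

# Toy illustration of the prioritisation heuristics described above -- not
# an existing qdisc.  pkt is assumed to be a dict; lower band = served sooner.
def priority_band(pkt):
    if pkt.get("dport") == 53 or pkt.get("sport") == 53:
        return 0   # DNS: everything else is usually waiting on it
    flags = pkt.get("tcp_flags", set())
    if "SYN" in flags or "FIN" in flags:
        return 1   # speed up connection setup and teardown
    if pkt.get("packets_seen_on_flow", 0) < 4:
        return 1   # first few packets of a flow: short flows finish quickly
    return 2       # bulk of a long-lived download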
David Lang
[-- Attachment #2: Type: TEXT/PLAIN, Size: 140 bytes --]
_______________________________________________
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-11-27 22:03 ` [Codel] [Cerowrt-devel] " Jim Gettys
2012-11-27 22:31 ` [Codel] [Bloat] " David Lang
@ 2012-11-27 22:49 ` Paul E. McKenney
2012-11-27 23:53 ` Greg White
2012-11-28 17:20 ` [Codel] " Paul E. McKenney
2012-11-30 1:09 ` Dan Siemon
3 siblings, 1 reply; 56+ messages in thread
From: Paul E. McKenney @ 2012-11-27 22:49 UTC (permalink / raw)
To: Jim Gettys
Cc: Paolo Valente, Toke Høiland-Jørgensen, Eric Raymond,
codel, cerowrt-devel, bloat, John Crispin
Thank you for the review and comments, Jim! I will apply them when
I get the pen back from Dave. And yes, that is the thing about
"fairness" -- there are a great many definitions, many of the most
useful of which appear to many to be patently unfair. ;-)
As you suggest, it might well be best to drop the discussion of fairness,
or at the very least to supply the corresponding definition.
Thanx, Paul
On Tue, Nov 27, 2012 at 05:03:02PM -0500, Jim Gettys wrote:
> Some points worth making:
>
> 1) It is important to point out that (and how) fq_codel avoids starvation:
> unpleasant as elephant flows are, it would be very unfriendly to never
> service them at all until they time out.
>
> 2) "fairness" is not necessarily what we ultimately want at all; you'd
> really like to penalize those who induce congestion the most. But we don't
> currently have a solution (though Bob Briscoe at BT thinks he does, and is
> seeing if he can get it out from under a BT patent), so the current
> fq_codel round robins ultimately until/unless we can do something like
> Bob's idea. This is a local information only subset of the ideas he's been
> working on in the congestion exposure (conex) group at the IETF.
>
> 3) "fairness" is always in the eyes of the beholder (and should be left to
> the beholder to determine). "fairness" depends on where in the network you
> are. While being "fair" among TCP flows is sensible default policy for a
> host, else where in the network it may not be/usually isn't.
>
> Two examples:
> o at a home router, you probably want to be "fair" according to transmit
> opportunities. We really don't want a single system remote from the router
> to be able to starve the network so that devices near the router get much
> less bandwidth than you might hope/expect.
>
> What is more, you probably want to account for a single host using many
> flows, and regulate that they not be able to "hog" bandwidth in the home
> environment, but only use their "fair" share.
>
> o at an ISP, you must to be "fair" between customers; it is best to leave
> the judgement of "fairness" at finer granularity (e.g. host and TCP flows)
> to the points closer to the customer's systems, so that they can enforce
> whatever definition of "fair" they need to themselves.
>
>
> Algorithms like fq_codel can be/should be adjusted to the circumstances.
>
> And therefore exactly what you choose to hash against to form the buckets
> will vary depending on where you are. That at least one step (at the
> user's device) of this be TCP flow "fair" does have the great advantage of
> helping the RTT unfairness problem that violates the principle of "least
> surprise", such as that routinely seen in places like New Zealand.
>
> This is why I have so many problems using the word "fair" near this
> algorithm. "fair" is impossible to define, overloaded in people's mind
> with TCP fair queuing, not even desirable much of the time, and by
> definition and design, even today's fq_codel isn't fair to lots of things,
> and the same basic algorithm can/should be tweaked in lots of directions
> depending on what we need to do. Calling this "smart" queuing or some such
> would be better.
>
> When you've done another round on the document, I'll do a more detailed
> read.
> - Jim
>
>
>
>
> On Fri, Nov 23, 2012 at 5:18 PM, Paul E. McKenney <
> paulmck@linux.vnet.ibm.com> wrote:
>
> > On Fri, Nov 23, 2012 at 09:57:34AM +0100, Dave Taht wrote:
> > > David Woodhouse and I fiddled a lot with adsl and openwrt and a
> > > variety of drivers and network layers in a typical bonded adsl stack
> > > yesterday. The complexity of it all makes my head hurt. I'm happy that
> > > a newly BQL'd ethernet driver (for the geos and qemu) emerged from it,
> > > which he submitted to netdev...
> >
> > Cool!!! ;-)
> >
> > > I made a recording of us last night discussing the layers, which I
> > > will produce and distribute later...
> > >
> > > Anyway, along the way, we fiddled a lot with trying to analyze where
> > > the 350ms or so of added latency was coming from in the traverse geo's
> > > adsl implementation and overlying stack....
> > >
> > > Plots: http://david.woodhou.se/dwmw2-netperf-plots.tar.gz
> > >
> > > Note: 1:
> > >
> > > The netperf sample rate on the rrul test needs to be higher than
> > > 100ms in order to get a decent result at sub 10Mbit speeds.
> > >
> > > Note 2:
> > >
> > > The two nicest graphs here are nofq.svg vs fq.svg, which were taken on
> > > a gigE link from a Mac running Linux to another gigE link. (in other
> > > words, NOT on the friggin adsl link) (firefox can display svg, I don't
> > > know what else) I find the T+10 delay before stream start in the
> > > fq.svg graph suspicious and think the "throw out the outlier" code in
> > > the netperf-wrapper code is at fault. Prior to that, codel is merely
> > > buffering up things madly, which can also be seen in the pfifo_fast
> > > behavior, with 1000pkts it's default.
> >
> > I am using these two in a new "Effectiveness of FQ-CoDel" section.
> > Chrome can display .svg, and if it becomes a problem, I am sure that
> > they can be converted. Please let me know if some other data would
> > make the point better.
> >
> > I am assuming that the colored throughput spikes are due to occasional
> > packet losses. Please let me know if this interpretation is overly naive.
> >
> > Also, I know what ICMP is, but the UDP variants are new to me. Could
> > you please expand the "EF", "BK", "BE", and "CSS" acronyms?
> >
> > > (Arguably, the default queue length in codel can be reduced from 10k
> > > packets to something more reasonable at GigE speeds)
> > >
> > > (the indicator that it's the graph, not the reality, is that the
> > > fq.svg pings and udp start at T+5 and grow minimally, as is usual with
> > > fq_codel.)
> >
> > All sessions were started at T+5, then?
> >
> > > As for the *.ps graphs, well, they would take david's network topology
> > > to explain, and were conducted over a variety of circumstances,
> > > including wifi, with more variables in play than I care to think
> > > about.
> > >
> > > We didn't really get anywhere on digging deeper. As we got to purer
> > > tests - with a minimal number of boxes, running pure ethernet,
> > > switched over a couple of switches, even in the simplest two box case,
> > > my HTB based "ceroshaper" implementation had multiple problems in
> > > cutting median latencies below 100ms, on this very slow ADSL link.
> > > David suspects problems on the path along the carrier backbone as a
> > > potential issue, and the only way to measure that is with two one way
> > > trip time measurements (rather than rtt), time synced via ntp... I
> > > keep hoping to find a rtp test, but I'm open to just about any option
> > > at this point. anyone?
> > >
> > > We also found a probable bug in mtr in that multiple mtrs on the same
> > > box don't co-exist.
> >
> > I must confess that I am not seeing all that clear a difference between
> > the behaviors of ceroshaper and FQ-CoDel. Maybe somewhat better latencies
> > for FQ-CoDel, but not unambiguously so.
> >
> > > Moving back to more scientific clarity and simpler tests...
> > >
> > > The two graphs, taken a few weeks back, on pages 5 and 6 of this:
> > >
> > >
> > http://www.teklibre.com/~d/bloat/Not_every_packet_is_sacred-Battling_Bufferbloat_on_wifi.pdf
> > >
> > > appear to show the advantage of fq_codel fq + codel + head drop over
> > > tail drop during the slow start period on a 10Mbit link - (see how
> > > squiggly slow start is on pfifo fast?) as well as the marvelous
> > > interstream latency that can be achieved with BQL=3000 (on a 10 mbit
> > > link.) Even that latency can be halved by reducing BQL to 1500, which
> > > is just fine on a 10mbit. Below those rates I'd like to be rid of BQL
> > > entirely, and just have a single packet outstanding... in everything
> > > from adsl to cable...
> > >
> > > That said, I'd welcome other explanations of the squiggly slowstart
> > > pfifo_fast behavior before I put that explanation on the slide.... ECN
> > > was in play here, too. I can redo this test easily, it's basically
> > > running a netperf TCP_RR for 70 seconds, and starting up a TCP_MAERTS
> > > and TCP_STREAM for 60 seconds a T+5, after hammering down on BQL's
> > > limit and the link speeds on two sides of a directly connected laptop
> > > connection.
> >
> > I must defer to others on this one. I do note the much lower latencies
> > on slide 6 compared to slide 5, though.
> >
> > Please see attached for update including .git directory.
> >
> > Thanx, Paul
> >
> > > ethtool -s eth0 advertise 0x002 # 10 Mbit
> > >
> >
> > _______________________________________________
> > Cerowrt-devel mailing list
> > Cerowrt-devel@lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/cerowrt-devel
> >
> >
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Bloat] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-11-27 22:31 ` [Codel] [Bloat] " David Lang
@ 2012-11-27 22:54 ` Paul E. McKenney
2012-11-27 23:15 ` Andrew McGregor
2012-11-28 14:06 ` [Codel] [Cerowrt-devel] [Bloat] " Michael Richardson
1 sibling, 1 reply; 56+ messages in thread
From: Paul E. McKenney @ 2012-11-27 22:54 UTC (permalink / raw)
To: David Lang
Cc: Paolo Valente, Toke Høiland-Jørgensen, codel,
cerowrt-devel, bloat, John Crispin
On Tue, Nov 27, 2012 at 02:31:53PM -0800, David Lang wrote:
> On Tue, 27 Nov 2012, Jim Gettys wrote:
>
> >2) "fairness" is not necessarily what we ultimately want at all; you'd
> >really like to penalize those who induce congestion the most. But we don't
> >currently have a solution (though Bob Briscoe at BT thinks he does, and is
> >seeing if he can get it out from under a BT patent), so the current
> >fq_codel round robins ultimately until/unless we can do something like
> >Bob's idea. This is a local information only subset of the ideas he's been
> >working on in the congestion exposure (conex) group at the IETF.
>
> Even more than this, we _know_ that we don't want to be fair in
> terms of the raw packet priority.
>
> For example, we know that we want to prioritize DNS traffic over TCP
> streams (due to the fact that the TCP traffic usually can't even
> start until DNS resolution finishes)
>
> We strongly suspect that we want to prioritize short-lived
> connections over long lived connections. We don't know a good way to
> do this, but one good starting point would be to prioritize syn
> packets so that the initialization of the connection happens as fast
> as possible.
>
> Ideally we'd probably like to prioritize the first couple of packets
> of a connection so that very short lived connections finish quickly
>
> it may make sense to prioritize fin packets so that connection
> teardown (and the resulting release of resources and connection
> tracking) happens as fast as possible
>
> all of these are horribly unfair when you are looking at the raw
> packet flow, but they significantly help the user's percieved
> response time without making much difference on the large download
> cases.
In all cases, to Jim's point, as long as we avoid starvation. And there
will likely be more corner cases that show up under extreme overload.
Thanx, Paul
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Bloat] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-11-27 22:54 ` Paul E. McKenney
@ 2012-11-27 23:15 ` Andrew McGregor
2012-11-28 0:51 ` Paul E. McKenney
2012-11-28 17:36 ` Paul E. McKenney
0 siblings, 2 replies; 56+ messages in thread
From: Andrew McGregor @ 2012-11-27 23:15 UTC (permalink / raw)
To: paulmck
Cc: David Lang, Paolo Valente, Toke Høiland-Jørgensen,
codel, cerowrt-devel, bloat, John Crispin
On 28/11/2012, at 11:54 AM, "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> On Tue, Nov 27, 2012 at 02:31:53PM -0800, David Lang wrote:
>> On Tue, 27 Nov 2012, Jim Gettys wrote:
>>
>>> 2) "fairness" is not necessarily what we ultimately want at all; you'd
>>> really like to penalize those who induce congestion the most. But we don't
>>> currently have a solution (though Bob Briscoe at BT thinks he does, and is
>>> seeing if he can get it out from under a BT patent), so the current
>>> fq_codel round robins ultimately until/unless we can do something like
>>> Bob's idea. This is a local information only subset of the ideas he's been
>>> working on in the congestion exposure (conex) group at the IETF.
>>
>> Even more than this, we _know_ that we don't want to be fair in
>> terms of the raw packet priority.
>>
>> For example, we know that we want to prioritize DNS traffic over TCP
>> streams (due to the fact that the TCP traffic usually can't even
>> start until DNS resolution finishes)
>>
>> We strongly suspect that we want to prioritize short-lived
>> connections over long lived connections. We don't know a good way to
>> do this, but one good starting point would be to prioritize syn
>> packets so that the initialization of the connection happens as fast
>> as possible.
>>
>> Ideally we'd probably like to prioritize the first couple of packets
>> of a connection so that very short lived connections finish quickly
fq_codel does all of this, although it isn't explicit about it, so it is hard to see how it happens.
>> it may make sense to prioritize fin packets so that connection
>> teardown (and the resulting release of resources and connection
>> tracking) happens as fast as possible
>>
>> all of these are horribly unfair when you are looking at the raw
>> packet flow, but they significantly help the user's percieved
>> response time without making much difference on the large download
>> cases.
>
> In all cases, to Jim's point, as long as we avoid starvation. And there
> will likely be more corner cases that show up under extreme overload.
>
> Thanx, Paul
>
So, fq_codel exhibits a new kind of fairness: it is jitter-fair; in other words, each flow gets the same bound on how much jitter it can induce in the whole ensemble of flows. Exceed that bound, and flows get deprioritised. This achieves thin-flow and DNS prioritisation, while allowing TCP flows to build more buffer if required. The sub-flow CoDel queues then allow short flows to use a reasonably large buffer, while draining standing buffers for long TCP flows.
The really interesting part of the jitter-fair behaviour is that jitter-sensitive traffic is protected as much as it can be, provided its own sending rate control does something sensible. Good news for interactive video, in other words.
The actual jitter bound is the transmission time of max(mtu, quantum) * n_thin_flows bytes, where a thin flow is one that has not exceeded its own jitter allowance since the last time its queue drained. While it is possible that there might instantaneously be a fairly large number of thin flows, in practice on a home network link there are normally only a very few of these at any one moment, and so the jitter experienced is pretty good.
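Back-of-the-envelope numbers for that bound, taking the formula as stated
(the link speeds and flow count below are just example values):

# Evaluate the bound described above:
# transmission time of max(mtu, quantum) * n_thin_flows bytes.
def jitter_bound_ms(link_mbit, n_thin_flows, mtu=1514, quantum=1514):
    bytes_ahead = max(mtu, quantum) * n_thin_flows
    return bytes_ahead * 8 / (link_mbit * 1e6) * 1000.0

# e.g. with 5 thin flows: roughly 6 ms at 10 Mbit/s, 0.06 ms at GigE.
for speed in (10, 100, 1000):
    print("%4d Mbit/s -> %.3f ms" % (speed, jitter_bound_ms(speed, 5)))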
Andrew
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-11-27 22:49 ` [Codel] [Cerowrt-devel] " Paul E. McKenney
@ 2012-11-27 23:53 ` Greg White
2012-11-28 0:27 ` Paul E. McKenney
0 siblings, 1 reply; 56+ messages in thread
From: Greg White @ 2012-11-27 23:53 UTC (permalink / raw)
To: paulmck, Jim Gettys
Cc: Paolo Valente, Toke Høiland-Jørgensen, Eric Raymond,
codel, cerowrt-devel, bloat, John Crispin
BTW, I've heard some use the term "stochastic flow queueing" as a
replacement to avoid the term "fair". Seems like a more apt term anyway.
-Greg
On 11/27/12 3:49 PM, "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
>Thank you for the review and comments, Jim! I will apply them when
>I get the pen back from Dave. And yes, that is the thing about
>"fairness" -- there are a great many definitions, many of the most
>useful of which appear to many to be patently unfair. ;-)
>
>As you suggest, it might well be best to drop discussion of fairness,
>or to at the least supply the corresponding definition.
>
> Thanx, Paul
>
>On Tue, Nov 27, 2012 at 05:03:02PM -0500, Jim Gettys wrote:
>> Some points worth making:
>>
>> 1) It is important to point out that (and how) fq_codel avoids
>>starvation:
>> unpleasant as elephant flows are, it would be very unfriendly to never
>> service them at all until they time out.
>>
>> 2) "fairness" is not necessarily what we ultimately want at all; you'd
>> really like to penalize those who induce congestion the most. But we
>>don't
>> currently have a solution (though Bob Briscoe at BT thinks he does, and
>>is
>> seeing if he can get it out from under a BT patent), so the current
>> fq_codel round robins ultimately until/unless we can do something like
>> Bob's idea. This is a local information only subset of the ideas he's
>>been
>> working on in the congestion exposure (conex) group at the IETF.
>>
>> 3) "fairness" is always in the eyes of the beholder (and should be left
>>to
>> the beholder to determine). "fairness" depends on where in the network
>>you
>> are. While being "fair" among TCP flows is sensible default policy for
>>a
>> host, else where in the network it may not be/usually isn't.
>>
>> Two examples:
>> o at a home router, you probably want to be "fair" according to transmit
>> opportunities. We really don't want a single system remote from the
>>router
>> to be able to starve the network so that devices near the router get
>>much
>> less bandwidth than you might hope/expect.
>>
>> What is more, you probably want to account for a single host using many
>> flows, and regulate that they not be able to "hog" bandwidth in the home
>> environment, but only use their "fair" share.
>>
>> o at an ISP, you must to be "fair" between customers; it is best to
>>leave
>> the judgement of "fairness" at finer granularity (e.g. host and TCP
>>flows)
>> to the points closer to the customer's systems, so that they can enforce
>> whatever definition of "fair" they need to themselves.
>>
>>
>> Algorithms like fq_codel can be/should be adjusted to the circumstances.
>>
>> And therefore exactly what you choose to hash against to form the
>>buckets
>> will vary depending on where you are. That at least one step (at the
>> user's device) of this be TCP flow "fair" does have the great advantage
>>of
>> helping the RTT unfairness problem that violates the principle of "least
>> surprise", such as that routinely seen in places like New Zealand.
>>
>> This is why I have so many problems using the word "fair" near this
>> algorithm. "fair" is impossible to define, overloaded in people's mind
>> with TCP fair queuing, not even desirable much of the time, and by
>> definition and design, even today's fq_codel isn't fair to lots of
>>things,
>> and the same basic algorithm can/should be tweaked in lots of directions
>> depending on what we need to do. Calling this "smart" queuing or some
>>such
>> would be better.
>>
>> When you've done another round on the document, I'll do a more detailed
>> read.
>> - Jim
>>
>>
>>
>>
>> On Fri, Nov 23, 2012 at 5:18 PM, Paul E. McKenney <
>> paulmck@linux.vnet.ibm.com> wrote:
>>
>> > On Fri, Nov 23, 2012 at 09:57:34AM +0100, Dave Taht wrote:
>> > > David Woodhouse and I fiddled a lot with adsl and openwrt and a
>> > > variety of drivers and network layers in a typical bonded adsl stack
>> > > yesterday. The complexity of it all makes my head hurt. I'm happy
>>that
>> > > a newly BQL'd ethernet driver (for the geos and qemu) emerged from
>>it,
>> > > which he submitted to netdev...
>> >
>> > Cool!!! ;-)
>> >
>> > > I made a recording of us last night discussing the layers, which I
>> > > will produce and distribute later...
>> > >
>> > > Anyway, along the way, we fiddled a lot with trying to analyze where
>> > > the 350ms or so of added latency was coming from in the traverse
>>geo's
>> > > adsl implementation and overlying stack....
>> > >
>> > > Plots: http://david.woodhou.se/dwmw2-netperf-plots.tar.gz
>> > >
>> > > Note: 1:
>> > >
>> > > The netperf sample rate on the rrul test needs to be higher than
>> > > 100ms in order to get a decent result at sub 10Mbit speeds.
>> > >
>> > > Note 2:
>> > >
>> > > The two nicest graphs here are nofq.svg vs fq.svg, which were taken
>>on
>> > > a gigE link from a Mac running Linux to another gigE link. (in other
>> > > words, NOT on the friggin adsl link) (firefox can display svg, I
>>don't
>> > > know what else) I find the T+10 delay before stream start in the
>> > > fq.svg graph suspicious and think the "throw out the outlier" code
>>in
>> > > the netperf-wrapper code is at fault. Prior to that, codel is merely
>> > > buffering up things madly, which can also be seen in the pfifo_fast
>> > > behavior, with 1000pkts it's default.
>> >
>> > I am using these two in a new "Effectiveness of FQ-CoDel" section.
>> > Chrome can display .svg, and if it becomes a problem, I am sure that
>> > they can be converted. Please let me know if some other data would
>> > make the point better.
>> >
>> > I am assuming that the colored throughput spikes are due to occasional
>> > packet losses. Please let me know if this interpretation is overly
>>naive.
>> >
>> > Also, I know what ICMP is, but the UDP variants are new to me. Could
>> > you please expand the "EF", "BK", "BE", and "CSS" acronyms?
>> >
>> > > (Arguably, the default queue length in codel can be reduced from 10k
>> > > packets to something more reasonable at GigE speeds)
>> > >
>> > > (the indicator that it's the graph, not the reality, is that the
>> > > fq.svg pings and udp start at T+5 and grow minimally, as is usual
>>with
>> > > fq_codel.)
>> >
>> > All sessions were started at T+5, then?
>> >
>> > > As for the *.ps graphs, well, they would take david's network
>>topology
>> > > to explain, and were conducted over a variety of circumstances,
>> > > including wifi, with more variables in play than I care to think
>> > > about.
>> > >
>> > > We didn't really get anywhere on digging deeper. As we got to purer
>> > > tests - with a minimal number of boxes, running pure ethernet,
>> > > switched over a couple of switches, even in the simplest two box
>>case,
>> > > my HTB based "ceroshaper" implementation had multiple problems in
>> > > cutting median latencies below 100ms, on this very slow ADSL link.
>> > > David suspects problems on the path along the carrier backbone as a
>> > > potential issue, and the only way to measure that is with two one
>>way
>> > > trip time measurements (rather than rtt), time synced via ntp... I
>> > > keep hoping to find a rtp test, but I'm open to just about any
>>option
>> > > at this point. anyone?
>> > >
>> > > We also found a probable bug in mtr in that multiple mtrs on the
>>same
>> > > box don't co-exist.
>> >
>> > I must confess that I am not seeing all that clear a difference
>>between
>> > the behaviors of ceroshaper and FQ-CoDel. Maybe somewhat better
>>latencies
>> > for FQ-CoDel, but not unambiguously so.
>> >
>> > > Moving back to more scientific clarity and simpler tests...
>> > >
>> > > The two graphs, taken a few weeks back, on pages 5 and 6 of this:
>> > >
>> > >
>> >
>>http://www.teklibre.com/~d/bloat/Not_every_packet_is_sacred-Battling_Buff
>>erbloat_on_wifi.pdf
>> > >
>> > > appear to show the advantage of fq_codel fq + codel + head drop over
>> > > tail drop during the slow start period on a 10Mbit link - (see how
>> > > squiggly slow start is on pfifo fast?) as well as the marvelous
>> > > interstream latency that can be achieved with BQL=3000 (on a 10 mbit
>> > > link.) Even that latency can be halved by reducing BQL to 1500,
>>which
>> > > is just fine on a 10mbit. Below those rates I'd like to be rid of
>>BQL
>> > > entirely, and just have a single packet outstanding... in everything
>> > > from adsl to cable...
>> > >
>> > > That said, I'd welcome other explanations of the squiggly slowstart
>> > > pfifo_fast behavior before I put that explanation on the slide....
>>ECN
>> > > was in play here, too. I can redo this test easily, it's basically
>> > > running a netperf TCP_RR for 70 seconds, and starting up a
>>TCP_MAERTS
>> > > and TCP_STREAM for 60 seconds a T+5, after hammering down on BQL's
>> > > limit and the link speeds on two sides of a directly connected
>>laptop
>> > > connection.
>> >
>> > I must defer to others on this one. I do note the much lower
>>latencies
>> > on slide 6 compared to slide 5, though.
>> >
>> > Please see attached for update including .git directory.
>> >
>> > Thanx, Paul
>> >
>> > > ethtool -s eth0 advertise 0x002 # 10 Mbit
>> > >
>> >
>> > _______________________________________________
>> > Cerowrt-devel mailing list
>> > Cerowrt-devel@lists.bufferbloat.net
>> > https://lists.bufferbloat.net/listinfo/cerowrt-devel
>> >
>> >
>
>_______________________________________________
>Codel mailing list
>Codel@lists.bufferbloat.net
>https://lists.bufferbloat.net/listinfo/codel
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-11-27 23:53 ` Greg White
@ 2012-11-28 0:27 ` Paul E. McKenney
2012-11-28 3:43 ` Kathleen Nichols
0 siblings, 1 reply; 56+ messages in thread
From: Paul E. McKenney @ 2012-11-28 0:27 UTC (permalink / raw)
To: Greg White
Cc: Paolo Valente, Toke Høiland-Jørgensen, Eric Raymond,
codel, cerowrt-devel, bloat, John Crispin
On Tue, Nov 27, 2012 at 04:53:34PM -0700, Greg White wrote:
> BTW, I've heard some use the term "stochastic flow queueing" as a
> replacement to avoid the term "fair". Seems like a more apt term anyway.
Would that mean that FQ-CoDel is Flow Queue Controlled Delay? ;-)
Thanx, Paul
> -Greg
>
>
> On 11/27/12 3:49 PM, "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
>
> >Thank you for the review and comments, Jim! I will apply them when
> >I get the pen back from Dave. And yes, that is the thing about
> >"fairness" -- there are a great many definitions, many of the most
> >useful of which appear to many to be patently unfair. ;-)
> >
> >As you suggest, it might well be best to drop discussion of fairness,
> >or to at the least supply the corresponding definition.
> >
> > Thanx, Paul
> >
> >On Tue, Nov 27, 2012 at 05:03:02PM -0500, Jim Gettys wrote:
> >> Some points worth making:
> >>
> >> 1) It is important to point out that (and how) fq_codel avoids
> >>starvation:
> >> unpleasant as elephant flows are, it would be very unfriendly to never
> >> service them at all until they time out.
> >>
> >> 2) "fairness" is not necessarily what we ultimately want at all; you'd
> >> really like to penalize those who induce congestion the most. But we
> >>don't
> >> currently have a solution (though Bob Briscoe at BT thinks he does, and
> >>is
> >> seeing if he can get it out from under a BT patent), so the current
> >> fq_codel round robins ultimately until/unless we can do something like
> >> Bob's idea. This is a local information only subset of the ideas he's
> >>been
> >> working on in the congestion exposure (conex) group at the IETF.
> >>
> >> 3) "fairness" is always in the eyes of the beholder (and should be left
> >>to
> >> the beholder to determine). "fairness" depends on where in the network
> >>you
> >> are. While being "fair" among TCP flows is sensible default policy for
> >>a
> >> host, else where in the network it may not be/usually isn't.
> >>
> >> Two examples:
> >> o at a home router, you probably want to be "fair" according to transmit
> >> opportunities. We really don't want a single system remote from the
> >>router
> >> to be able to starve the network so that devices near the router get
> >>much
> >> less bandwidth than you might hope/expect.
> >>
> >> What is more, you probably want to account for a single host using many
> >> flows, and regulate that they not be able to "hog" bandwidth in the home
> >> environment, but only use their "fair" share.
> >>
> >> o at an ISP, you must to be "fair" between customers; it is best to
> >>leave
> >> the judgement of "fairness" at finer granularity (e.g. host and TCP
> >>flows)
> >> to the points closer to the customer's systems, so that they can enforce
> >> whatever definition of "fair" they need to themselves.
> >>
> >>
> >> Algorithms like fq_codel can be/should be adjusted to the circumstances.
> >>
> >> And therefore exactly what you choose to hash against to form the
> >>buckets
> >> will vary depending on where you are. That at least one step (at the
> >> user's device) of this be TCP flow "fair" does have the great advantage
> >>of
> >> helping the RTT unfairness problem that violates the principle of "least
> >> surprise", such as that routinely seen in places like New Zealand.
> >>
> >> This is why I have so many problems using the word "fair" near this
> >> algorithm. "fair" is impossible to define, overloaded in people's mind
> >> with TCP fair queuing, not even desirable much of the time, and by
> >> definition and design, even today's fq_codel isn't fair to lots of
> >>things,
> >> and the same basic algorithm can/should be tweaked in lots of directions
> >> depending on what we need to do. Calling this "smart" queuing or some
> >>such
> >> would be better.
> >>
> >> When you've done another round on the document, I'll do a more detailed
> >> read.
> >> - Jim
> >>
> >>
> >>
> >>
> >> On Fri, Nov 23, 2012 at 5:18 PM, Paul E. McKenney <
> >> paulmck@linux.vnet.ibm.com> wrote:
> >>
> >> > On Fri, Nov 23, 2012 at 09:57:34AM +0100, Dave Taht wrote:
> >> > > David Woodhouse and I fiddled a lot with adsl and openwrt and a
> >> > > variety of drivers and network layers in a typical bonded adsl stack
> >> > > yesterday. The complexity of it all makes my head hurt. I'm happy
> >>that
> >> > > a newly BQL'd ethernet driver (for the geos and qemu) emerged from
> >>it,
> >> > > which he submitted to netdev...
> >> >
> >> > Cool!!! ;-)
> >> >
> >> > > I made a recording of us last night discussing the layers, which I
> >> > > will produce and distribute later...
> >> > >
> >> > > Anyway, along the way, we fiddled a lot with trying to analyze where
> >> > > the 350ms or so of added latency was coming from in the traverse
> >>geo's
> >> > > adsl implementation and overlying stack....
> >> > >
> >> > > Plots: http://david.woodhou.se/dwmw2-netperf-plots.tar.gz
> >> > >
> >> > > Note: 1:
> >> > >
> >> > > The netperf sample rate on the rrul test needs to be higher than
> >> > > 100ms in order to get a decent result at sub 10Mbit speeds.
> >> > >
> >> > > Note 2:
> >> > >
> >> > > The two nicest graphs here are nofq.svg vs fq.svg, which were taken
> >>on
> >> > > a gigE link from a Mac running Linux to another gigE link. (in other
> >> > > words, NOT on the friggin adsl link) (firefox can display svg, I
> >>don't
> >> > > know what else) I find the T+10 delay before stream start in the
> >> > > fq.svg graph suspicious and think the "throw out the outlier" code
> >>in
> >> > > the netperf-wrapper code is at fault. Prior to that, codel is merely
> >> > > buffering up things madly, which can also be seen in the pfifo_fast
> >> > > behavior, with 1000pkts it's default.
> >> >
> >> > I am using these two in a new "Effectiveness of FQ-CoDel" section.
> >> > Chrome can display .svg, and if it becomes a problem, I am sure that
> >> > they can be converted. Please let me know if some other data would
> >> > make the point better.
> >> >
> >> > I am assuming that the colored throughput spikes are due to occasional
> >> > packet losses. Please let me know if this interpretation is overly
> >>naive.
> >> >
> >> > Also, I know what ICMP is, but the UDP variants are new to me. Could
> >> > you please expand the "EF", "BK", "BE", and "CSS" acronyms?
> >> >
> >> > > (Arguably, the default queue length in codel can be reduced from 10k
> >> > > packets to something more reasonable at GigE speeds)
> >> > >
> >> > > (the indicator that it's the graph, not the reality, is that the
> >> > > fq.svg pings and udp start at T+5 and grow minimally, as is usual
> >>with
> >> > > fq_codel.)
> >> >
> >> > All sessions were started at T+5, then?
> >> >
> >> > > As for the *.ps graphs, well, they would take david's network
> >>topology
> >> > > to explain, and were conducted over a variety of circumstances,
> >> > > including wifi, with more variables in play than I care to think
> >> > > about.
> >> > >
> >> > > We didn't really get anywhere on digging deeper. As we got to purer
> >> > > tests - with a minimal number of boxes, running pure ethernet,
> >> > > switched over a couple of switches, even in the simplest two box
> >>case,
> >> > > my HTB based "ceroshaper" implementation had multiple problems in
> >> > > cutting median latencies below 100ms, on this very slow ADSL link.
> >> > > David suspects problems on the path along the carrier backbone as a
> >> > > potential issue, and the only way to measure that is with two one
> >>way
> >> > > trip time measurements (rather than rtt), time synced via ntp... I
> >> > > keep hoping to find a rtp test, but I'm open to just about any
> >>option
> >> > > at this point. anyone?
> >> > >
> >> > > We also found a probable bug in mtr in that multiple mtrs on the
> >>same
> >> > > box don't co-exist.
> >> >
> >> > I must confess that I am not seeing all that clear a difference
> >>between
> >> > the behaviors of ceroshaper and FQ-CoDel. Maybe somewhat better
> >>latencies
> >> > for FQ-CoDel, but not unambiguously so.
> >> >
> >> > > Moving back to more scientific clarity and simpler tests...
> >> > >
> >> > > The two graphs, taken a few weeks back, on pages 5 and 6 of this:
> >> > >
> >> > >
> >> >
> >>http://www.teklibre.com/~d/bloat/Not_every_packet_is_sacred-Battling_Buff
> >>erbloat_on_wifi.pdf
> >> > >
> >> > > appear to show the advantage of fq_codel fq + codel + head drop over
> >> > > tail drop during the slow start period on a 10Mbit link - (see how
> >> > > squiggly slow start is on pfifo fast?) as well as the marvelous
> >> > > interstream latency that can be achieved with BQL=3000 (on a 10 mbit
> >> > > link.) Even that latency can be halved by reducing BQL to 1500,
> >>which
> >> > > is just fine on a 10mbit. Below those rates I'd like to be rid of
> >>BQL
> >> > > entirely, and just have a single packet outstanding... in everything
> >> > > from adsl to cable...
> >> > >
> >> > > That said, I'd welcome other explanations of the squiggly slowstart
> >> > > pfifo_fast behavior before I put that explanation on the slide....
> >>ECN
> >> > > was in play here, too. I can redo this test easily, it's basically
> >> > > running a netperf TCP_RR for 70 seconds, and starting up a
> >>TCP_MAERTS
> >> > > and TCP_STREAM for 60 seconds a T+5, after hammering down on BQL's
> >> > > limit and the link speeds on two sides of a directly connected
> >>laptop
> >> > > connection.
> >> >
> >> > I must defer to others on this one. I do note the much lower
> >>latencies
> >> > on slide 6 compared to slide 5, though.
> >> >
> >> > Please see attached for update including .git directory.
> >> >
> >> > Thanx, Paul
> >> >
> >> > > ethtool -s eth0 advertise 0x002 # 10 Mbit
> >> > >
> >> >
> >> > _______________________________________________
> >> > Cerowrt-devel mailing list
> >> > Cerowrt-devel@lists.bufferbloat.net
> >> > https://lists.bufferbloat.net/listinfo/cerowrt-devel
> >> >
> >> >
> >
> >_______________________________________________
> >Codel mailing list
> >Codel@lists.bufferbloat.net
> >https://lists.bufferbloat.net/listinfo/codel
>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Bloat] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-11-27 23:15 ` Andrew McGregor
@ 2012-11-28 0:51 ` Paul E. McKenney
2012-11-28 17:36 ` Paul E. McKenney
1 sibling, 0 replies; 56+ messages in thread
From: Paul E. McKenney @ 2012-11-28 0:51 UTC (permalink / raw)
To: Andrew McGregor
Cc: David Lang, Paolo Valente, Toke Høiland-Jørgensen,
codel, cerowrt-devel, bloat, John Crispin
On Wed, Nov 28, 2012 at 12:15:35PM +1300, Andrew McGregor wrote:
>
> On 28/11/2012, at 11:54 AM, "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
>
> > On Tue, Nov 27, 2012 at 02:31:53PM -0800, David Lang wrote:
> >> On Tue, 27 Nov 2012, Jim Gettys wrote:
> >>
> >>> 2) "fairness" is not necessarily what we ultimately want at all; you'd
> >>> really like to penalize those who induce congestion the most. But we don't
> >>> currently have a solution (though Bob Briscoe at BT thinks he does, and is
> >>> seeing if he can get it out from under a BT patent), so the current
> >>> fq_codel round robins ultimately until/unless we can do something like
> >>> Bob's idea. This is a local information only subset of the ideas he's been
> >>> working on in the congestion exposure (conex) group at the IETF.
> >>
> >> Even more than this, we _know_ that we don't want to be fair in
> >> terms of the raw packet priority.
> >>
> >> For example, we know that we want to prioritize DNS traffic over TCP
> >> streams (due to the fact that the TCP traffic usually can't even
> >> start until DNS resolution finishes)
> >>
> >> We strongly suspect that we want to prioritize short-lived
> >> connections over long lived connections. We don't know a good way to
> >> do this, but one good starting point would be to prioritize syn
> >> packets so that the initialization of the connection happens as fast
> >> as possible.
> >>
> >> Ideally we'd probably like to prioritize the first couple of packets
> >> of a connection so that very short lived connections finish quickly
>
> fq_codel does all of this, although it isn't explicit about it so it is hard to see how it happens.
>
> >> it may make sense to prioritize fin packets so that connection
> >> teardown (and the resulting release of resources and connection
> >> tracking) happens as fast as possible
> >>
> >> all of these are horribly unfair when you are looking at the raw
> >> packet flow, but they significantly help the user's percieved
> >> response time without making much difference on the large download
> >> cases.
> >
> > In all cases, to Jim's point, as long as we avoid starvation. And there
> > will likely be more corner cases that show up under extreme overload.
> >
> > Thanx, Paul
> >
>
> So, fq_codel exhibits a new kind of fairness: it is jitter fair, or in other words, each flow gets the same bound on how much jitter it can induce in the whole ensemble of flows. Exceed that bound, and flows get deprioritised. This achieves thin-flow and DNS prioritisation, while allowing TCP flows to build more buffer if required. The sub-flow CoDel queues then allow short flows to use a reasonably large buffer, while draining standing buffers for long TCP flows.
>
> The really interesting part of the jitter-fair behaviour is that jitter-sensitive traffic is protected as much as it can be, provided its own sending rate control does something sensible. Good news for interactive video, in other words.
>
> The actual jitter bound is the transmission time of max(mtu, quantum) * n_thin_flows bytes, where a thin flow is one that has not exceeded its own jitter allowance since the last time its queue drained. While it is possible that there might instantaneously be a fairly large number of thin flows, in practice on a home network link there are normally only a very few of these at any one moment, and so the jitter experienced is pretty good.
I will have to think about this, but at first glance I kinda like the
idea of describing FQ-CoDel as jitter fair. ;-)
Thanx, Paul
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-11-28 0:27 ` Paul E. McKenney
@ 2012-11-28 3:43 ` Kathleen Nichols
2012-11-28 4:38 ` Paul E. McKenney
0 siblings, 1 reply; 56+ messages in thread
From: Kathleen Nichols @ 2012-11-28 3:43 UTC (permalink / raw)
To: paulmck
Cc: Paolo Valente, Toke Høiland-Jørgensen, Eric Raymond,
codel, cerowrt-devel, bloat, John Crispin
It would be me that tries to say "stochastic flow queuing with CoDel"
as I like to be accurate. But I think FQ-Codel is Flow queuing with CoDel.
JimG suggests "smart flow queuing" because he is ever mindful of the
big audience.
On 11/27/12 4:27 PM, Paul E. McKenney wrote:
> On Tue, Nov 27, 2012 at 04:53:34PM -0700, Greg White wrote:
>> BTW, I've heard some use the term "stochastic flow queueing" as a
>> replacement to avoid the term "fair". Seems like a more apt term anyway.
>
> Would that mean that FQ-CoDel is Flow Queue Controlled Delay? ;-)
>
> Thanx, Paul
>
>> -Greg
>>
>>
>> On 11/27/12 3:49 PM, "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
>>
>>> Thank you for the review and comments, Jim! I will apply them when
>>> I get the pen back from Dave. And yes, that is the thing about
>>> "fairness" -- there are a great many definitions, many of the most
>>> useful of which appear to many to be patently unfair. ;-)
>>>
>>> As you suggest, it might well be best to drop discussion of fairness,
>>> or to at the least supply the corresponding definition.
>>>
>>> Thanx, Paul
>>>
>>> On Tue, Nov 27, 2012 at 05:03:02PM -0500, Jim Gettys wrote:
>>>> Some points worth making:
>>>>
>>>> 1) It is important to point out that (and how) fq_codel avoids
>>>> starvation:
>>>> unpleasant as elephant flows are, it would be very unfriendly to never
>>>> service them at all until they time out.
>>>>
>>>> 2) "fairness" is not necessarily what we ultimately want at all; you'd
>>>> really like to penalize those who induce congestion the most. But we
>>>> don't
>>>> currently have a solution (though Bob Briscoe at BT thinks he does, and
>>>> is
>>>> seeing if he can get it out from under a BT patent), so the current
>>>> fq_codel round robins ultimately until/unless we can do something like
>>>> Bob's idea. This is a local information only subset of the ideas he's
>>>> been
>>>> working on in the congestion exposure (conex) group at the IETF.
>>>>
>>>> 3) "fairness" is always in the eyes of the beholder (and should be left
>>>> to
>>>> the beholder to determine). "fairness" depends on where in the network
>>>> you
>>>> are. While being "fair" among TCP flows is sensible default policy for
>>>> a
>>>> host, else where in the network it may not be/usually isn't.
>>>>
>>>> Two examples:
>>>> o at a home router, you probably want to be "fair" according to transmit
>>>> opportunities. We really don't want a single system remote from the
>>>> router
>>>> to be able to starve the network so that devices near the router get
>>>> much
>>>> less bandwidth than you might hope/expect.
>>>>
>>>> What is more, you probably want to account for a single host using many
>>>> flows, and regulate that they not be able to "hog" bandwidth in the home
>>>> environment, but only use their "fair" share.
>>>>
>>>> o at an ISP, you must to be "fair" between customers; it is best to
>>>> leave
>>>> the judgement of "fairness" at finer granularity (e.g. host and TCP
>>>> flows)
>>>> to the points closer to the customer's systems, so that they can enforce
>>>> whatever definition of "fair" they need to themselves.
>>>>
>>>>
>>>> Algorithms like fq_codel can be/should be adjusted to the circumstances.
>>>>
>>>> And therefore exactly what you choose to hash against to form the
>>>> buckets
>>>> will vary depending on where you are. That at least one step (at the
>>>> user's device) of this be TCP flow "fair" does have the great advantage
>>>> of
>>>> helping the RTT unfairness problem that violates the principle of "least
>>>> surprise", such as that routinely seen in places like New Zealand.
>>>>
>>>> This is why I have so many problems using the word "fair" near this
>>>> algorithm. "fair" is impossible to define, overloaded in people's mind
>>>> with TCP fair queuing, not even desirable much of the time, and by
>>>> definition and design, even today's fq_codel isn't fair to lots of
>>>> things,
>>>> and the same basic algorithm can/should be tweaked in lots of directions
>>>> depending on what we need to do. Calling this "smart" queuing or some
>>>> such
>>>> would be better.
>>>>
>>>> When you've done another round on the document, I'll do a more detailed
>>>> read.
>>>> - Jim
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Nov 23, 2012 at 5:18 PM, Paul E. McKenney <
>>>> paulmck@linux.vnet.ibm.com> wrote:
>>>>
>>>>> On Fri, Nov 23, 2012 at 09:57:34AM +0100, Dave Taht wrote:
>>>>>> David Woodhouse and I fiddled a lot with adsl and openwrt and a
>>>>>> variety of drivers and network layers in a typical bonded adsl stack
>>>>>> yesterday. The complexity of it all makes my head hurt. I'm happy
>>>> that
>>>>>> a newly BQL'd ethernet driver (for the geos and qemu) emerged from
>>>> it,
>>>>>> which he submitted to netdev...
>>>>>
>>>>> Cool!!! ;-)
>>>>>
>>>>>> I made a recording of us last night discussing the layers, which I
>>>>>> will produce and distribute later...
>>>>>>
>>>>>> Anyway, along the way, we fiddled a lot with trying to analyze where
>>>>>> the 350ms or so of added latency was coming from in the traverse
>>>> geo's
>>>>>> adsl implementation and overlying stack....
>>>>>>
>>>>>> Plots: http://david.woodhou.se/dwmw2-netperf-plots.tar.gz
>>>>>>
>>>>>> Note: 1:
>>>>>>
>>>>>> The netperf sample rate on the rrul test needs to be higher than
>>>>>> 100ms in order to get a decent result at sub 10Mbit speeds.
>>>>>>
>>>>>> Note 2:
>>>>>>
>>>>>> The two nicest graphs here are nofq.svg vs fq.svg, which were taken
>>>> on
>>>>>> a gigE link from a Mac running Linux to another gigE link. (in other
>>>>>> words, NOT on the friggin adsl link) (firefox can display svg, I
>>>> don't
>>>>>> know what else) I find the T+10 delay before stream start in the
>>>>>> fq.svg graph suspicious and think the "throw out the outlier" code
>>>> in
>>>>>> the netperf-wrapper code is at fault. Prior to that, codel is merely
>>>>>> buffering up things madly, which can also be seen in the pfifo_fast
>>>>>> behavior, with 1000pkts it's default.
>>>>>
>>>>> I am using these two in a new "Effectiveness of FQ-CoDel" section.
>>>>> Chrome can display .svg, and if it becomes a problem, I am sure that
>>>>> they can be converted. Please let me know if some other data would
>>>>> make the point better.
>>>>>
>>>>> I am assuming that the colored throughput spikes are due to occasional
>>>>> packet losses. Please let me know if this interpretation is overly
>>>> naive.
>>>>>
>>>>> Also, I know what ICMP is, but the UDP variants are new to me. Could
>>>>> you please expand the "EF", "BK", "BE", and "CSS" acronyms?
>>>>>
>>>>>> (Arguably, the default queue length in codel can be reduced from 10k
>>>>>> packets to something more reasonable at GigE speeds)
>>>>>>
>>>>>> (the indicator that it's the graph, not the reality, is that the
>>>>>> fq.svg pings and udp start at T+5 and grow minimally, as is usual
>>>> with
>>>>>> fq_codel.)
>>>>>
>>>>> All sessions were started at T+5, then?
>>>>>
>>>>>> As for the *.ps graphs, well, they would take david's network
>>>> topology
>>>>>> to explain, and were conducted over a variety of circumstances,
>>>>>> including wifi, with more variables in play than I care to think
>>>>>> about.
>>>>>>
>>>>>> We didn't really get anywhere on digging deeper. As we got to purer
>>>>>> tests - with a minimal number of boxes, running pure ethernet,
>>>>>> switched over a couple of switches, even in the simplest two box
>>>> case,
>>>>>> my HTB based "ceroshaper" implementation had multiple problems in
>>>>>> cutting median latencies below 100ms, on this very slow ADSL link.
>>>>>> David suspects problems on the path along the carrier backbone as a
>>>>>> potential issue, and the only way to measure that is with two one
>>>> way
>>>>>> trip time measurements (rather than rtt), time synced via ntp... I
>>>>>> keep hoping to find a rtp test, but I'm open to just about any
>>>> option
>>>>>> at this point. anyone?
>>>>>>
>>>>>> We also found a probable bug in mtr in that multiple mtrs on the
>>>> same
>>>>>> box don't co-exist.
>>>>>
>>>>> I must confess that I am not seeing all that clear a difference
>>>> between
>>>>> the behaviors of ceroshaper and FQ-CoDel. Maybe somewhat better
>>>> latencies
>>>>> for FQ-CoDel, but not unambiguously so.
>>>>>
>>>>>> Moving back to more scientific clarity and simpler tests...
>>>>>>
>>>>>> The two graphs, taken a few weeks back, on pages 5 and 6 of this:
>>>>>>
>>>>>>
>>>>>
>>>> http://www.teklibre.com/~d/bloat/Not_every_packet_is_sacred-Battling_Buff
>>>> erbloat_on_wifi.pdf
>>>>>>
>>>>>> appear to show the advantage of fq_codel fq + codel + head drop over
>>>>>> tail drop during the slow start period on a 10Mbit link - (see how
>>>>>> squiggly slow start is on pfifo fast?) as well as the marvelous
>>>>>> interstream latency that can be achieved with BQL=3000 (on a 10 mbit
>>>>>> link.) Even that latency can be halved by reducing BQL to 1500,
>>>> which
>>>>>> is just fine on a 10mbit. Below those rates I'd like to be rid of
>>>> BQL
>>>>>> entirely, and just have a single packet outstanding... in everything
>>>>>> from adsl to cable...
>>>>>>
>>>>>> That said, I'd welcome other explanations of the squiggly slowstart
>>>>>> pfifo_fast behavior before I put that explanation on the slide....
>>>> ECN
>>>>>> was in play here, too. I can redo this test easily, it's basically
>>>>>> running a netperf TCP_RR for 70 seconds, and starting up a
>>>> TCP_MAERTS
>>>>>> and TCP_STREAM for 60 seconds a T+5, after hammering down on BQL's
>>>>>> limit and the link speeds on two sides of a directly connected
>>>> laptop
>>>>>> connection.
>>>>>
>>>>> I must defer to others on this one. I do note the much lower
>>>> latencies
>>>>> on slide 6 compared to slide 5, though.
>>>>>
>>>>> Please see attached for update including .git directory.
>>>>>
>>>>> Thanx, Paul
>>>>>
>>>>>> ethtool -s eth0 advertise 0x002 # 10 Mbit
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Cerowrt-devel mailing list
>>>>> Cerowrt-devel@lists.bufferbloat.net
>>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>>>>
>>>>>
>>>
>>> _______________________________________________
>>> Codel mailing list
>>> Codel@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/codel
>>
>
> _______________________________________________
> Codel mailing list
> Codel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/codel
>
* Re: [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-11-28 3:43 ` Kathleen Nichols
@ 2012-11-28 4:38 ` Paul E. McKenney
2012-11-28 16:01 ` Paul E. McKenney
0 siblings, 1 reply; 56+ messages in thread
From: Paul E. McKenney @ 2012-11-28 4:38 UTC (permalink / raw)
To: Kathleen Nichols
Cc: Paolo Valente, Toke Høiland-Jørgensen, Eric Raymond,
codel, cerowrt-devel, bloat, John Crispin
I guess I just have to be grateful that people mostly agree on the acronym,
regardless of the expansion.
Thanx, Paul
On Tue, Nov 27, 2012 at 07:43:56PM -0800, Kathleen Nichols wrote:
>
> It would be me that tries to say "stochastic flow queuing with CoDel"
> as I like to be accurate. But I think FQ-Codel is Flow queuing with CoDel.
> JimG suggests "smart flow queuing" because he is ever mindful of the
> big audience.
>
> On 11/27/12 4:27 PM, Paul E. McKenney wrote:
> > On Tue, Nov 27, 2012 at 04:53:34PM -0700, Greg White wrote:
> >> BTW, I've heard some use the term "stochastic flow queueing" as a
> >> replacement to avoid the term "fair". Seems like a more apt term anyway.
> >
> > Would that mean that FQ-CoDel is Flow Queue Controlled Delay? ;-)
> >
> > Thanx, Paul
> >
> >> -Greg
> >>
> >>
> >> On 11/27/12 3:49 PM, "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> >>
> >>> Thank you for the review and comments, Jim! I will apply them when
> >>> I get the pen back from Dave. And yes, that is the thing about
> >>> "fairness" -- there are a great many definitions, many of the most
> >>> useful of which appear to many to be patently unfair. ;-)
> >>>
> >>> As you suggest, it might well be best to drop discussion of fairness,
> >>> or to at the least supply the corresponding definition.
> >>>
> >>> Thanx, Paul
> >>>
> >>> On Tue, Nov 27, 2012 at 05:03:02PM -0500, Jim Gettys wrote:
> >>>> Some points worth making:
> >>>>
> >>>> 1) It is important to point out that (and how) fq_codel avoids
> >>>> starvation:
> >>>> unpleasant as elephant flows are, it would be very unfriendly to never
> >>>> service them at all until they time out.
> >>>>
> >>>> 2) "fairness" is not necessarily what we ultimately want at all; you'd
> >>>> really like to penalize those who induce congestion the most. But we
> >>>> don't
> >>>> currently have a solution (though Bob Briscoe at BT thinks he does, and
> >>>> is
> >>>> seeing if he can get it out from under a BT patent), so the current
> >>>> fq_codel round robins ultimately until/unless we can do something like
> >>>> Bob's idea. This is a local information only subset of the ideas he's
> >>>> been
> >>>> working on in the congestion exposure (conex) group at the IETF.
> >>>>
> >>>> 3) "fairness" is always in the eyes of the beholder (and should be left
> >>>> to
> >>>> the beholder to determine). "fairness" depends on where in the network
> >>>> you
> >>>> are. While being "fair" among TCP flows is sensible default policy for
> >>>> a
> >>>> host, else where in the network it may not be/usually isn't.
> >>>>
> >>>> Two examples:
> >>>> o at a home router, you probably want to be "fair" according to transmit
> >>>> opportunities. We really don't want a single system remote from the
> >>>> router
> >>>> to be able to starve the network so that devices near the router get
> >>>> much
> >>>> less bandwidth than you might hope/expect.
> >>>>
> >>>> What is more, you probably want to account for a single host using many
> >>>> flows, and regulate that they not be able to "hog" bandwidth in the home
> >>>> environment, but only use their "fair" share.
> >>>>
> >>>> o at an ISP, you must to be "fair" between customers; it is best to
> >>>> leave
> >>>> the judgement of "fairness" at finer granularity (e.g. host and TCP
> >>>> flows)
> >>>> to the points closer to the customer's systems, so that they can enforce
> >>>> whatever definition of "fair" they need to themselves.
> >>>>
> >>>>
> >>>> Algorithms like fq_codel can be/should be adjusted to the circumstances.
> >>>>
> >>>> And therefore exactly what you choose to hash against to form the
> >>>> buckets
> >>>> will vary depending on where you are. That at least one step (at the
> >>>> user's device) of this be TCP flow "fair" does have the great advantage
> >>>> of
> >>>> helping the RTT unfairness problem that violates the principle of "least
> >>>> surprise", such as that routinely seen in places like New Zealand.
> >>>>
> >>>> This is why I have so many problems using the word "fair" near this
> >>>> algorithm. "fair" is impossible to define, overloaded in people's mind
> >>>> with TCP fair queuing, not even desirable much of the time, and by
> >>>> definition and design, even today's fq_codel isn't fair to lots of
> >>>> things,
> >>>> and the same basic algorithm can/should be tweaked in lots of directions
> >>>> depending on what we need to do. Calling this "smart" queuing or some
> >>>> such
> >>>> would be better.
> >>>>
> >>>> When you've done another round on the document, I'll do a more detailed
> >>>> read.
> >>>> - Jim
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Fri, Nov 23, 2012 at 5:18 PM, Paul E. McKenney <
> >>>> paulmck@linux.vnet.ibm.com> wrote:
> >>>>
> >>>>> On Fri, Nov 23, 2012 at 09:57:34AM +0100, Dave Taht wrote:
> >>>>>> David Woodhouse and I fiddled a lot with adsl and openwrt and a
> >>>>>> variety of drivers and network layers in a typical bonded adsl stack
> >>>>>> yesterday. The complexity of it all makes my head hurt. I'm happy
> >>>> that
> >>>>>> a newly BQL'd ethernet driver (for the geos and qemu) emerged from
> >>>> it,
> >>>>>> which he submitted to netdev...
> >>>>>
> >>>>> Cool!!! ;-)
> >>>>>
> >>>>>> I made a recording of us last night discussing the layers, which I
> >>>>>> will produce and distribute later...
> >>>>>>
> >>>>>> Anyway, along the way, we fiddled a lot with trying to analyze where
> >>>>>> the 350ms or so of added latency was coming from in the traverse
> >>>> geo's
> >>>>>> adsl implementation and overlying stack....
> >>>>>>
> >>>>>> Plots: http://david.woodhou.se/dwmw2-netperf-plots.tar.gz
> >>>>>>
> >>>>>> Note: 1:
> >>>>>>
> >>>>>> The netperf sample rate on the rrul test needs to be higher than
> >>>>>> 100ms in order to get a decent result at sub 10Mbit speeds.
> >>>>>>
> >>>>>> Note 2:
> >>>>>>
> >>>>>> The two nicest graphs here are nofq.svg vs fq.svg, which were taken
> >>>> on
> >>>>>> a gigE link from a Mac running Linux to another gigE link. (in other
> >>>>>> words, NOT on the friggin adsl link) (firefox can display svg, I
> >>>> don't
> >>>>>> know what else) I find the T+10 delay before stream start in the
> >>>>>> fq.svg graph suspicious and think the "throw out the outlier" code
> >>>> in
> >>>>>> the netperf-wrapper code is at fault. Prior to that, codel is merely
> >>>>>> buffering up things madly, which can also be seen in the pfifo_fast
> >>>>>> behavior, with 1000pkts it's default.
> >>>>>
> >>>>> I am using these two in a new "Effectiveness of FQ-CoDel" section.
> >>>>> Chrome can display .svg, and if it becomes a problem, I am sure that
> >>>>> they can be converted. Please let me know if some other data would
> >>>>> make the point better.
> >>>>>
> >>>>> I am assuming that the colored throughput spikes are due to occasional
> >>>>> packet losses. Please let me know if this interpretation is overly
> >>>> naive.
> >>>>>
> >>>>> Also, I know what ICMP is, but the UDP variants are new to me. Could
> >>>>> you please expand the "EF", "BK", "BE", and "CSS" acronyms?
> >>>>>
> >>>>>> (Arguably, the default queue length in codel can be reduced from 10k
> >>>>>> packets to something more reasonable at GigE speeds)
> >>>>>>
> >>>>>> (the indicator that it's the graph, not the reality, is that the
> >>>>>> fq.svg pings and udp start at T+5 and grow minimally, as is usual
> >>>> with
> >>>>>> fq_codel.)
> >>>>>
> >>>>> All sessions were started at T+5, then?
> >>>>>
> >>>>>> As for the *.ps graphs, well, they would take david's network
> >>>> topology
> >>>>>> to explain, and were conducted over a variety of circumstances,
> >>>>>> including wifi, with more variables in play than I care to think
> >>>>>> about.
> >>>>>>
> >>>>>> We didn't really get anywhere on digging deeper. As we got to purer
> >>>>>> tests - with a minimal number of boxes, running pure ethernet,
> >>>>>> switched over a couple of switches, even in the simplest two box
> >>>> case,
> >>>>>> my HTB based "ceroshaper" implementation had multiple problems in
> >>>>>> cutting median latencies below 100ms, on this very slow ADSL link.
> >>>>>> David suspects problems on the path along the carrier backbone as a
> >>>>>> potential issue, and the only way to measure that is with two one
> >>>> way
> >>>>>> trip time measurements (rather than rtt), time synced via ntp... I
> >>>>>> keep hoping to find a rtp test, but I'm open to just about any
> >>>> option
> >>>>>> at this point. anyone?
> >>>>>>
> >>>>>> We also found a probable bug in mtr in that multiple mtrs on the
> >>>> same
> >>>>>> box don't co-exist.
> >>>>>
> >>>>> I must confess that I am not seeing all that clear a difference
> >>>> between
> >>>>> the behaviors of ceroshaper and FQ-CoDel. Maybe somewhat better
> >>>> latencies
> >>>>> for FQ-CoDel, but not unambiguously so.
> >>>>>
> >>>>>> Moving back to more scientific clarity and simpler tests...
> >>>>>>
> >>>>>> The two graphs, taken a few weeks back, on pages 5 and 6 of this:
> >>>>>>
> >>>>>>
> >>>>>
> >>>> http://www.teklibre.com/~d/bloat/Not_every_packet_is_sacred-Battling_Buff
> >>>> erbloat_on_wifi.pdf
> >>>>>>
> >>>>>> appear to show the advantage of fq_codel fq + codel + head drop over
> >>>>>> tail drop during the slow start period on a 10Mbit link - (see how
> >>>>>> squiggly slow start is on pfifo fast?) as well as the marvelous
> >>>>>> interstream latency that can be achieved with BQL=3000 (on a 10 mbit
> >>>>>> link.) Even that latency can be halved by reducing BQL to 1500,
> >>>> which
> >>>>>> is just fine on a 10mbit. Below those rates I'd like to be rid of
> >>>> BQL
> >>>>>> entirely, and just have a single packet outstanding... in everything
> >>>>>> from adsl to cable...
> >>>>>>
> >>>>>> That said, I'd welcome other explanations of the squiggly slowstart
> >>>>>> pfifo_fast behavior before I put that explanation on the slide....
> >>>> ECN
> >>>>>> was in play here, too. I can redo this test easily, it's basically
> >>>>>> running a netperf TCP_RR for 70 seconds, and starting up a
> >>>> TCP_MAERTS
> >>>>>> and TCP_STREAM for 60 seconds a T+5, after hammering down on BQL's
> >>>>>> limit and the link speeds on two sides of a directly connected
> >>>> laptop
> >>>>>> connection.
> >>>>>
> >>>>> I must defer to others on this one. I do note the much lower
> >>>> latencies
> >>>>> on slide 6 compared to slide 5, though.
> >>>>>
> >>>>> Please see attached for update including .git directory.
> >>>>>
> >>>>> Thanx, Paul
> >>>>>
> >>>>>> ethtool -s eth0 advertise 0x002 # 10 Mbit
> >>>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> Cerowrt-devel mailing list
> >>>>> Cerowrt-devel@lists.bufferbloat.net
> >>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
> >>>>>
> >>>>>
> >>>
> >>> _______________________________________________
> >>> Codel mailing list
> >>> Codel@lists.bufferbloat.net
> >>> https://lists.bufferbloat.net/listinfo/codel
> >>
> >
> > _______________________________________________
> > Codel mailing list
> > Codel@lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/codel
> >
>
* Re: [Codel] [Cerowrt-devel] [Bloat] FQ_Codel lwn draft article review
2012-11-27 22:31 ` [Codel] [Bloat] " David Lang
2012-11-27 22:54 ` Paul E. McKenney
@ 2012-11-28 14:06 ` Michael Richardson
1 sibling, 0 replies; 56+ messages in thread
From: Michael Richardson @ 2012-11-28 14:06 UTC (permalink / raw)
To: codel, cerowrt-devel, bloat, Paul McKenney, David Woodhouse,
John Crispin, David Lang
>>>>> "David" == David Lang <david@lang.hm> writes:
David> We strongly suspect that we want to prioritize short-lived connections over
David> long lived connections. We don't know a good way to do this, but one good
David> starting point would be to prioritize syn packets so that the initialization
David> of the connection happens as fast as possible.
David> Ideally we'd probably like to prioritize the first couple of packets of a
David> connection so that very short lived connections finish quickly
It's not short-lived connections that we care about (think HTTP/1.1
connections). It's connections which are newly active. These are, I think
(but I'm not certain), the ones which have not yet opened their window:
slow start is still active. That also means that the connection cannot yet
dominate.
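A crude, local-information-only way to approximate "newly active" (the
threshold and names below are illustrative assumptions, not taken from any
existing implementation) would be to count bytes seen from each flow:

#include <stdint.h>

#define NEWLY_ACTIVE_BYTES (10 * 1460)  /* ~10 MSS: rough slow-start horizon */

struct flow_stats {
        uint64_t bytes_sent;            /* bytes seen from this flow so far */
};

/* A flow that has not yet sent ~10 MSS has probably not opened its
 * congestion window, so it cannot dominate the link and is comparatively
 * safe to prioritize. */
static int flow_is_newly_active(const struct flow_stats *f)
{
        return f->bytes_sent < NEWLY_ACTIVE_BYTES;
}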
--
] He who is tired of Weird Al is tired of life! | firewalls [
] Michael Richardson, Sandelman Software Works, Ottawa, ON |net architect[
] mcr@sandelman.ottawa.on.ca http://www.sandelman.ottawa.on.ca/ |device driver[
Kyoto Plus: watch the video <http://www.youtube.com/watch?v=kzx1ycLXQSE>
then sign the petition.
* Re: [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-11-28 4:38 ` Paul E. McKenney
@ 2012-11-28 16:01 ` Paul E. McKenney
2012-11-28 16:16 ` Jonathan Morton
0 siblings, 1 reply; 56+ messages in thread
From: Paul E. McKenney @ 2012-11-28 16:01 UTC (permalink / raw)
To: Kathleen Nichols
Cc: Paolo Valente, Toke Høiland-Jørgensen, Eric Raymond,
codel, cerowrt-devel, bloat, John Crispin
Dave gave me back the pen, so I looked to see what I had expanded
FQ-CoDel to. The answer was... Nothing. Nothing at all.
So I added a Quick Quiz as follows:
Quick Quiz 2: What does the FQ-CoDel acronym expand to?
Answer: There are some differences of opinion on this. The
comment header in net/sched/sch_fq_codel.c says
“Fair Queue CoDel” (presumably by analogy to SFQ's
expansion of “Stochastic Fairness Queueing”), and
“CoDel” is generally agreed to expand to “controlled
delay”. However, some prefer “Flow Queue Controlled
Delay” and still others prefer to prepend a silent and
invisible "S", expanding to “Stochastic Flow Queue
Controlled Delay” or “Smart Flow Queue Controlled
Delay”. No doubt additional expansions will appear in
the fullness of time.
In the meantime, this article focuses on the concepts,
implementation, and performance, leaving naming debates
to others.
This level of snarkiness would go over reasonably well in an LWN article;
I would -not- suggest this approach in an academic paper, just in case
you were wondering. But if there is too much discomfort with snarking,
I just might be convinced to take another approach.
Thanx, Paul
On Tue, Nov 27, 2012 at 08:38:38PM -0800, Paul E. McKenney wrote:
> I guess I just have to be grateful that people mostly agree on the acronym,
> regardless of the expansion.
>
> Thanx, Paul
>
> On Tue, Nov 27, 2012 at 07:43:56PM -0800, Kathleen Nichols wrote:
> >
> > It would be me that tries to say "stochastic flow queuing with CoDel"
> > as I like to be accurate. But I think FQ-Codel is Flow queuing with CoDel.
> > JimG suggests "smart flow queuing" because he is ever mindful of the
> > big audience.
> >
> > On 11/27/12 4:27 PM, Paul E. McKenney wrote:
> > > On Tue, Nov 27, 2012 at 04:53:34PM -0700, Greg White wrote:
> > >> BTW, I've heard some use the term "stochastic flow queueing" as a
> > >> replacement to avoid the term "fair". Seems like a more apt term anyway.
> > >
> > > Would that mean that FQ-CoDel is Flow Queue Controlled Delay? ;-)
> > >
> > > Thanx, Paul
> > >
> > >> -Greg
> > >>
> > >>
> > >> On 11/27/12 3:49 PM, "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > >>
> > >>> Thank you for the review and comments, Jim! I will apply them when
> > >>> I get the pen back from Dave. And yes, that is the thing about
> > >>> "fairness" -- there are a great many definitions, many of the most
> > >>> useful of which appear to many to be patently unfair. ;-)
> > >>>
> > >>> As you suggest, it might well be best to drop discussion of fairness,
> > >>> or to at the least supply the corresponding definition.
> > >>>
> > >>> Thanx, Paul
> > >>>
> > >>> On Tue, Nov 27, 2012 at 05:03:02PM -0500, Jim Gettys wrote:
> > >>>> Some points worth making:
> > >>>>
> > >>>> 1) It is important to point out that (and how) fq_codel avoids
> > >>>> starvation:
> > >>>> unpleasant as elephant flows are, it would be very unfriendly to never
> > >>>> service them at all until they time out.
> > >>>>
> > >>>> 2) "fairness" is not necessarily what we ultimately want at all; you'd
> > >>>> really like to penalize those who induce congestion the most. But we
> > >>>> don't
> > >>>> currently have a solution (though Bob Briscoe at BT thinks he does, and
> > >>>> is
> > >>>> seeing if he can get it out from under a BT patent), so the current
> > >>>> fq_codel round robins ultimately until/unless we can do something like
> > >>>> Bob's idea. This is a local information only subset of the ideas he's
> > >>>> been
> > >>>> working on in the congestion exposure (conex) group at the IETF.
> > >>>>
> > >>>> 3) "fairness" is always in the eyes of the beholder (and should be left
> > >>>> to
> > >>>> the beholder to determine). "fairness" depends on where in the network
> > >>>> you
> > >>>> are. While being "fair" among TCP flows is sensible default policy for
> > >>>> a
> > >>>> host, else where in the network it may not be/usually isn't.
> > >>>>
> > >>>> Two examples:
> > >>>> o at a home router, you probably want to be "fair" according to transmit
> > >>>> opportunities. We really don't want a single system remote from the
> > >>>> router
> > >>>> to be able to starve the network so that devices near the router get
> > >>>> much
> > >>>> less bandwidth than you might hope/expect.
> > >>>>
> > >>>> What is more, you probably want to account for a single host using many
> > >>>> flows, and regulate that they not be able to "hog" bandwidth in the home
> > >>>> environment, but only use their "fair" share.
> > >>>>
> > >>>> o at an ISP, you must to be "fair" between customers; it is best to
> > >>>> leave
> > >>>> the judgement of "fairness" at finer granularity (e.g. host and TCP
> > >>>> flows)
> > >>>> to the points closer to the customer's systems, so that they can enforce
> > >>>> whatever definition of "fair" they need to themselves.
> > >>>>
> > >>>>
> > >>>> Algorithms like fq_codel can be/should be adjusted to the circumstances.
> > >>>>
> > >>>> And therefore exactly what you choose to hash against to form the
> > >>>> buckets
> > >>>> will vary depending on where you are. That at least one step (at the
> > >>>> user's device) of this be TCP flow "fair" does have the great advantage
> > >>>> of
> > >>>> helping the RTT unfairness problem that violates the principle of "least
> > >>>> surprise", such as that routinely seen in places like New Zealand.
> > >>>>
> > >>>> This is why I have so many problems using the word "fair" near this
> > >>>> algorithm. "fair" is impossible to define, overloaded in people's mind
> > >>>> with TCP fair queuing, not even desirable much of the time, and by
> > >>>> definition and design, even today's fq_codel isn't fair to lots of
> > >>>> things,
> > >>>> and the same basic algorithm can/should be tweaked in lots of directions
> > >>>> depending on what we need to do. Calling this "smart" queuing or some
> > >>>> such
> > >>>> would be better.
> > >>>>
> > >>>> When you've done another round on the document, I'll do a more detailed
> > >>>> read.
> > >>>> - Jim
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Fri, Nov 23, 2012 at 5:18 PM, Paul E. McKenney <
> > >>>> paulmck@linux.vnet.ibm.com> wrote:
> > >>>>
> > >>>>> On Fri, Nov 23, 2012 at 09:57:34AM +0100, Dave Taht wrote:
> > >>>>>> David Woodhouse and I fiddled a lot with adsl and openwrt and a
> > >>>>>> variety of drivers and network layers in a typical bonded adsl stack
> > >>>>>> yesterday. The complexity of it all makes my head hurt. I'm happy
> > >>>> that
> > >>>>>> a newly BQL'd ethernet driver (for the geos and qemu) emerged from
> > >>>> it,
> > >>>>>> which he submitted to netdev...
> > >>>>>
> > >>>>> Cool!!! ;-)
> > >>>>>
> > >>>>>> I made a recording of us last night discussing the layers, which I
> > >>>>>> will produce and distribute later...
> > >>>>>>
> > >>>>>> Anyway, along the way, we fiddled a lot with trying to analyze where
> > >>>>>> the 350ms or so of added latency was coming from in the traverse
> > >>>> geo's
> > >>>>>> adsl implementation and overlying stack....
> > >>>>>>
> > >>>>>> Plots: http://david.woodhou.se/dwmw2-netperf-plots.tar.gz
> > >>>>>>
> > >>>>>> Note: 1:
> > >>>>>>
> > >>>>>> The netperf sample rate on the rrul test needs to be higher than
> > >>>>>> 100ms in order to get a decent result at sub 10Mbit speeds.
> > >>>>>>
> > >>>>>> Note 2:
> > >>>>>>
> > >>>>>> The two nicest graphs here are nofq.svg vs fq.svg, which were taken
> > >>>> on
> > >>>>>> a gigE link from a Mac running Linux to another gigE link. (in other
> > >>>>>> words, NOT on the friggin adsl link) (firefox can display svg, I
> > >>>> don't
> > >>>>>> know what else) I find the T+10 delay before stream start in the
> > >>>>>> fq.svg graph suspicious and think the "throw out the outlier" code
> > >>>> in
> > >>>>>> the netperf-wrapper code is at fault. Prior to that, codel is merely
> > >>>>>> buffering up things madly, which can also be seen in the pfifo_fast
> > >>>>>> behavior, with 1000pkts it's default.
> > >>>>>
> > >>>>> I am using these two in a new "Effectiveness of FQ-CoDel" section.
> > >>>>> Chrome can display .svg, and if it becomes a problem, I am sure that
> > >>>>> they can be converted. Please let me know if some other data would
> > >>>>> make the point better.
> > >>>>>
> > >>>>> I am assuming that the colored throughput spikes are due to occasional
> > >>>>> packet losses. Please let me know if this interpretation is overly
> > >>>> naive.
> > >>>>>
> > >>>>> Also, I know what ICMP is, but the UDP variants are new to me. Could
> > >>>>> you please expand the "EF", "BK", "BE", and "CSS" acronyms?
> > >>>>>
> > >>>>>> (Arguably, the default queue length in codel can be reduced from 10k
> > >>>>>> packets to something more reasonable at GigE speeds)
> > >>>>>>
> > >>>>>> (the indicator that it's the graph, not the reality, is that the
> > >>>>>> fq.svg pings and udp start at T+5 and grow minimally, as is usual
> > >>>> with
> > >>>>>> fq_codel.)
> > >>>>>
> > >>>>> All sessions were started at T+5, then?
> > >>>>>
> > >>>>>> As for the *.ps graphs, well, they would take david's network
> > >>>> topology
> > >>>>>> to explain, and were conducted over a variety of circumstances,
> > >>>>>> including wifi, with more variables in play than I care to think
> > >>>>>> about.
> > >>>>>>
> > >>>>>> We didn't really get anywhere on digging deeper. As we got to purer
> > >>>>>> tests - with a minimal number of boxes, running pure ethernet,
> > >>>>>> switched over a couple of switches, even in the simplest two box
> > >>>> case,
> > >>>>>> my HTB based "ceroshaper" implementation had multiple problems in
> > >>>>>> cutting median latencies below 100ms, on this very slow ADSL link.
> > >>>>>> David suspects problems on the path along the carrier backbone as a
> > >>>>>> potential issue, and the only way to measure that is with two one
> > >>>> way
> > >>>>>> trip time measurements (rather than rtt), time synced via ntp... I
> > >>>>>> keep hoping to find a rtp test, but I'm open to just about any
> > >>>> option
> > >>>>>> at this point. anyone?
> > >>>>>>
> > >>>>>> We also found a probable bug in mtr in that multiple mtrs on the
> > >>>> same
> > >>>>>> box don't co-exist.
> > >>>>>
> > >>>>> I must confess that I am not seeing all that clear a difference
> > >>>> between
> > >>>>> the behaviors of ceroshaper and FQ-CoDel. Maybe somewhat better
> > >>>> latencies
> > >>>>> for FQ-CoDel, but not unambiguously so.
> > >>>>>
> > >>>>>> Moving back to more scientific clarity and simpler tests...
> > >>>>>>
> > >>>>>> The two graphs, taken a few weeks back, on pages 5 and 6 of this:
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>> http://www.teklibre.com/~d/bloat/Not_every_packet_is_sacred-Battling_Buff
> > >>>> erbloat_on_wifi.pdf
> > >>>>>>
> > >>>>>> appear to show the advantage of fq_codel fq + codel + head drop over
> > >>>>>> tail drop during the slow start period on a 10Mbit link - (see how
> > >>>>>> squiggly slow start is on pfifo fast?) as well as the marvelous
> > >>>>>> interstream latency that can be achieved with BQL=3000 (on a 10 mbit
> > >>>>>> link.) Even that latency can be halved by reducing BQL to 1500,
> > >>>> which
> > >>>>>> is just fine on a 10mbit. Below those rates I'd like to be rid of
> > >>>> BQL
> > >>>>>> entirely, and just have a single packet outstanding... in everything
> > >>>>>> from adsl to cable...
> > >>>>>>
> > >>>>>> That said, I'd welcome other explanations of the squiggly slowstart
> > >>>>>> pfifo_fast behavior before I put that explanation on the slide....
> > >>>> ECN
> > >>>>>> was in play here, too. I can redo this test easily, it's basically
> > >>>>>> running a netperf TCP_RR for 70 seconds, and starting up a
> > >>>> TCP_MAERTS
> > >>>>>> and TCP_STREAM for 60 seconds a T+5, after hammering down on BQL's
> > >>>>>> limit and the link speeds on two sides of a directly connected
> > >>>> laptop
> > >>>>>> connection.
> > >>>>>
> > >>>>> I must defer to others on this one. I do note the much lower
> > >>>> latencies
> > >>>>> on slide 6 compared to slide 5, though.
> > >>>>>
> > >>>>> Please see attached for update including .git directory.
> > >>>>>
> > >>>>> Thanx, Paul
> > >>>>>
> > >>>>>> ethtool -s eth0 advertise 0x002 # 10 Mbit
> > >>>>>>
> > >>>>>
> > >>>>> _______________________________________________
> > >>>>> Cerowrt-devel mailing list
> > >>>>> Cerowrt-devel@lists.bufferbloat.net
> > >>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
> > >>>>>
> > >>>>>
> > >>>
> > >>> _______________________________________________
> > >>> Codel mailing list
> > >>> Codel@lists.bufferbloat.net
> > >>> https://lists.bufferbloat.net/listinfo/codel
> > >>
> > >
> > > _______________________________________________
> > > Codel mailing list
> > > Codel@lists.bufferbloat.net
> > > https://lists.bufferbloat.net/listinfo/codel
> > >
> >
* Re: [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-11-28 16:01 ` Paul E. McKenney
@ 2012-11-28 16:16 ` Jonathan Morton
2012-11-28 17:44 ` Paul E. McKenney
0 siblings, 1 reply; 56+ messages in thread
From: Jonathan Morton @ 2012-11-28 16:16 UTC (permalink / raw)
To: paulmck
Cc: Paolo Valente, Toke Høiland-Jørgensen, Eric Raymond,
codel, cerowrt-devel, bloat, John Crispin
It may be worth noting that fq-codel is not stochastic in its fairness
mechanism. SFQ suffers from the birthday effect because it hashes packets
into buffers, which is what makes it stochastic.
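As a rough illustration of that birthday effect: assuming a uniform hash
over b buckets and n concurrent flows, the probability that at least two
flows land in the same bucket is approximately 1 - exp(-n(n-1)/(2b)).
A small sketch of that estimate (the example numbers are illustrative):

#include <math.h>

/* Rough birthday-effect estimate: probability that at least two of n
 * flows hash into the same one of b buckets, assuming a uniform hash. */
static double bucket_collision_probability(unsigned int n, unsigned int b)
{
        return 1.0 - exp(-(double)n * (n - 1) / (2.0 * b));
}

/* With 1024 buckets and 100 concurrent flows this comes to roughly 0.99,
 * so a collision is all but certain; with 10 flows it is only about 4%. */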
- Jonathan Morton
On Nov 28, 2012 6:02 PM, "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
wrote:
> Dave gave me back the pen, so I looked to see what I had expanded
> FQ-CoDel to. The answer was... Nothing. Nothing at all.
>
> So I added a Quick Quiz as follows:
>
> Quick Quiz 2: What does the FQ-CoDel acronym expand to?
>
> Answer: There are some differences of opinion on this. The
> comment header in net/sched/sch_fq_codel.c says
> “Fair Queue CoDel” (presumably by analogy to SFQ's
> expansion of “Stochastic Fairness Queueing”), and
> “CoDel” is generally agreed to expand to “controlled
> delay”. However, some prefer “Flow Queue Controlled
> Delay” and still others prefer to prepend a silent and
> invisible "S", expanding to “Stochastic Flow Queue
> Controlled Delay” or “Smart Flow Queue Controlled
> Delay”. No doubt additional expansions will appear in
> the fullness of time.
>
> In the meantime, this article focuses on the concepts,
> implementation, and performance, leaving naming debates
> to others.
>
> This level snarkiness would go over reasonably well in an LWN article,
> I would -not- suggest this approach in an academic paper, just in case
> you were wondering. But if there is too much discomfort with snarking,
> I just might be convinced to take another approach.
>
> Thanx, Paul
>
> On Tue, Nov 27, 2012 at 08:38:38PM -0800, Paul E. McKenney wrote:
> > I guess I just have to be grateful that people mostly agree on the
> acronym,
> > regardless of the expansion.
> >
> > Thanx, Paul
> >
> > On Tue, Nov 27, 2012 at 07:43:56PM -0800, Kathleen Nichols wrote:
> > >
> > > It would be me that tries to say "stochastic flow queuing with CoDel"
> > > as I like to be accurate. But I think FQ-Codel is Flow queuing with
> CoDel.
> > > JimG suggests "smart flow queuing" because he is ever mindful of the
> > > big audience.
> > >
> > > On 11/27/12 4:27 PM, Paul E. McKenney wrote:
> > > > On Tue, Nov 27, 2012 at 04:53:34PM -0700, Greg White wrote:
> > > >> BTW, I've heard some use the term "stochastic flow queueing" as a
> > > >> replacement to avoid the term "fair". Seems like a more apt term
> anyway.
> > > >
> > > > Would that mean that FQ-CoDel is Flow Queue Controlled Delay? ;-)
> > > >
> > > > Thanx, Paul
> > > >
> > > >> -Greg
> > > >>
> > > >>
> > > >> On 11/27/12 3:49 PM, "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> wrote:
> > > >>
> > > >>> Thank you for the review and comments, Jim! I will apply them when
> > > >>> I get the pen back from Dave. And yes, that is the thing about
> > > >>> "fairness" -- there are a great many definitions, many of the most
> > > >>> useful of which appear to many to be patently unfair. ;-)
> > > >>>
> > > >>> As you suggest, it might well be best to drop discussion of
> fairness,
> > > >>> or to at the least supply the corresponding definition.
> > > >>>
> > > >>> Thanx, Paul
> > > >>>
> > > >>> On Tue, Nov 27, 2012 at 05:03:02PM -0500, Jim Gettys wrote:
> > > >>>> Some points worth making:
> > > >>>>
> > > >>>> 1) It is important to point out that (and how) fq_codel avoids
> > > >>>> starvation:
> > > >>>> unpleasant as elephant flows are, it would be very unfriendly to
> never
> > > >>>> service them at all until they time out.
> > > >>>>
> > > >>>> 2) "fairness" is not necessarily what we ultimately want at all;
> you'd
> > > >>>> really like to penalize those who induce congestion the most.
> But we
> > > >>>> don't
> > > >>>> currently have a solution (though Bob Briscoe at BT thinks he
> does, and
> > > >>>> is
> > > >>>> seeing if he can get it out from under a BT patent), so the
> current
> > > >>>> fq_codel round robins ultimately until/unless we can do something
> like
> > > >>>> Bob's idea. This is a local information only subset of the ideas
> he's
> > > >>>> been
> > > >>>> working on in the congestion exposure (conex) group at the IETF.
> > > >>>>
> > > >>>> 3) "fairness" is always in the eyes of the beholder (and should
> be left
> > > >>>> to
> > > >>>> the beholder to determine). "fairness" depends on where in the
> network
> > > >>>> you
> > > >>>> are. While being "fair" among TCP flows is sensible default
> policy for
> > > >>>> a
> > > >>>> host, else where in the network it may not be/usually isn't.
> > > >>>>
> > > >>>> Two examples:
> > > >>>> o at a home router, you probably want to be "fair" according to
> transmit
> > > >>>> opportunities. We really don't want a single system remote from
> the
> > > >>>> router
> > > >>>> to be able to starve the network so that devices near the router
> get
> > > >>>> much
> > > >>>> less bandwidth than you might hope/expect.
> > > >>>>
> > > >>>> What is more, you probably want to account for a single host
> using many
> > > >>>> flows, and regulate that they not be able to "hog" bandwidth in
> the home
> > > >>>> environment, but only use their "fair" share.
> > > >>>>
> > > >>>> o at an ISP, you must to be "fair" between customers; it is best
> to
> > > >>>> leave
> > > >>>> the judgement of "fairness" at finer granularity (e.g. host and
> TCP
> > > >>>> flows)
> > > >>>> to the points closer to the customer's systems, so that they can
> enforce
> > > >>>> whatever definition of "fair" they need to themselves.
> > > >>>>
> > > >>>>
> > > >>>> Algorithms like fq_codel can be/should be adjusted to the
> circumstances.
> > > >>>>
> > > >>>> And therefore exactly what you choose to hash against to form the
> > > >>>> buckets
> > > >>>> will vary depending on where you are. That at least one step (at
> the
> > > >>>> user's device) of this be TCP flow "fair" does have the great
> advantage
> > > >>>> of
> > > >>>> helping the RTT unfairness problem that violates the principle of
> "least
> > > >>>> surprise", such as that routinely seen in places like New Zealand.
> > > >>>>
> > > >>>> This is why I have so many problems using the word "fair" near
> this
> > > >>>> algorithm. "fair" is impossible to define, overloaded in
> people's mind
> > > >>>> with TCP fair queuing, not even desirable much of the time, and by
> > > >>>> definition and design, even today's fq_codel isn't fair to lots of
> > > >>>> things,
> > > >>>> and the same basic algorithm can/should be tweaked in lots of
> directions
> > > >>>> depending on what we need to do. Calling this "smart" queuing or
> some
> > > >>>> such
> > > >>>> would be better.
> > > >>>>
> > > >>>> When you've done another round on the document, I'll do a more
> detailed
> > > >>>> read.
> > > >>>> - Jim
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> On Fri, Nov 23, 2012 at 5:18 PM, Paul E. McKenney <
> > > >>>> paulmck@linux.vnet.ibm.com> wrote:
> > > >>>>
> > > >>>>> On Fri, Nov 23, 2012 at 09:57:34AM +0100, Dave Taht wrote:
> > > >>>>>> David Woodhouse and I fiddled a lot with adsl and openwrt and a
> > > >>>>>> variety of drivers and network layers in a typical bonded adsl
> stack
> > > >>>>>> yesterday. The complexity of it all makes my head hurt. I'm
> happy
> > > >>>> that
> > > >>>>>> a newly BQL'd ethernet driver (for the geos and qemu) emerged
> from
> > > >>>> it,
> > > >>>>>> which he submitted to netdev...
> > > >>>>>
> > > >>>>> Cool!!! ;-)
> > > >>>>>
> > > >>>>>> I made a recording of us last night discussing the layers,
> which I
> > > >>>>>> will produce and distribute later...
> > > >>>>>>
> > > >>>>>> Anyway, along the way, we fiddled a lot with trying to analyze
> where
> > > >>>>>> the 350ms or so of added latency was coming from in the traverse
> > > >>>> geo's
> > > >>>>>> adsl implementation and overlying stack....
> > > >>>>>>
> > > >>>>>> Plots: http://david.woodhou.se/dwmw2-netperf-plots.tar.gz
> > > >>>>>>
> > > >>>>>> Note: 1:
> > > >>>>>>
> > > >>>>>> The netperf sample rate on the rrul test needs to be higher
> than
> > > >>>>>> 100ms in order to get a decent result at sub 10Mbit speeds.
> > > >>>>>>
> > > >>>>>> Note 2:
> > > >>>>>>
> > > >>>>>> The two nicest graphs here are nofq.svg vs fq.svg, which were
> taken
> > > >>>> on
> > > >>>>>> a gigE link from a Mac running Linux to another gigE link. (in
> other
> > > >>>>>> words, NOT on the friggin adsl link) (firefox can display svg, I
> > > >>>> don't
> > > >>>>>> know what else) I find the T+10 delay before stream start in the
> > > >>>>>> fq.svg graph suspicious and think the "throw out the outlier"
> code
> > > >>>> in
> > > >>>>>> the netperf-wrapper code is at fault. Prior to that, codel is
> merely
> > > >>>>>> buffering up things madly, which can also be seen in the
> pfifo_fast
> > > >>>>>> behavior, with 1000pkts it's default.
> > > >>>>>
> > > >>>>> I am using these two in a new "Effectiveness of FQ-CoDel"
> section.
> > > >>>>> Chrome can display .svg, and if it becomes a problem, I am sure
> that
> > > >>>>> they can be converted. Please let me know if some other data
> would
> > > >>>>> make the point better.
> > > >>>>>
> > > >>>>> I am assuming that the colored throughput spikes are due to
> occasional
> > > >>>>> packet losses. Please let me know if this interpretation is
> overly
> > > >>>> naive.
> > > >>>>>
> > > >>>>> Also, I know what ICMP is, but the UDP variants are new to me.
> Could
> > > >>>>> you please expand the "EF", "BK", "BE", and "CSS" acronyms?
> > > >>>>>
> > > >>>>>> (Arguably, the default queue length in codel can be reduced
> from 10k
> > > >>>>>> packets to something more reasonable at GigE speeds)
> > > >>>>>>
> > > >>>>>> (the indicator that it's the graph, not the reality, is that the
> > > >>>>>> fq.svg pings and udp start at T+5 and grow minimally, as is
> usual
> > > >>>> with
> > > >>>>>> fq_codel.)
> > > >>>>>
> > > >>>>> All sessions were started at T+5, then?
> > > >>>>>
> > > >>>>>> As for the *.ps graphs, well, they would take david's network
> > > >>>> topology
> > > >>>>>> to explain, and were conducted over a variety of circumstances,
> > > >>>>>> including wifi, with more variables in play than I care to think
> > > >>>>>> about.
> > > >>>>>>
> > > >>>>>> We didn't really get anywhere on digging deeper. As we got to
> purer
> > > >>>>>> tests - with a minimal number of boxes, running pure ethernet,
> > > >>>>>> switched over a couple of switches, even in the simplest two box
> > > >>>> case,
> > > >>>>>> my HTB based "ceroshaper" implementation had multiple problems
> in
> > > >>>>>> cutting median latencies below 100ms, on this very slow ADSL
> link.
> > > >>>>>> David suspects problems on the path along the carrier backbone
> as a
> > > >>>>>> potential issue, and the only way to measure that is with two
> one
> > > >>>> way
> > > >>>>>> trip time measurements (rather than rtt), time synced via
> ntp... I
> > > >>>>>> keep hoping to find a rtp test, but I'm open to just about any
> > > >>>> option
> > > >>>>>> at this point. anyone?
> > > >>>>>>
> > > >>>>>> We also found a probable bug in mtr in that multiple mtrs on the
> > > >>>> same
> > > >>>>>> box don't co-exist.
> > > >>>>>
> > > >>>>> I must confess that I am not seeing all that clear a difference
> > > >>>> between
> > > >>>>> the behaviors of ceroshaper and FQ-CoDel. Maybe somewhat better
> > > >>>> latencies
> > > >>>>> for FQ-CoDel, but not unambiguously so.
> > > >>>>>
> > > >>>>>> Moving back to more scientific clarity and simpler tests...
> > > >>>>>>
> > > >>>>>> The two graphs, taken a few weeks back, on pages 5 and 6 of
> this:
> > > >>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> http://www.teklibre.com/~d/bloat/Not_every_packet_is_sacred-Battling_Buff
> > > >>>> erbloat_on_wifi.pdf
> > > >>>>>>
> > > >>>>>> appear to show the advantage of fq_codel fq + codel + head drop
> over
> > > >>>>>> tail drop during the slow start period on a 10Mbit link - (see
> how
> > > >>>>>> squiggly slow start is on pfifo fast?) as well as the marvelous
> > > >>>>>> interstream latency that can be achieved with BQL=3000 (on a 10
> mbit
> > > >>>>>> link.) Even that latency can be halved by reducing BQL to 1500,
> > > >>>> which
> > > >>>>>> is just fine on a 10mbit. Below those rates I'd like to be rid
> of
> > > >>>> BQL
> > > >>>>>> entirely, and just have a single packet outstanding... in
> everything
> > > >>>>>> from adsl to cable...
> > > >>>>>>
> > > >>>>>> That said, I'd welcome other explanations of the squiggly
> slowstart
> > > >>>>>> pfifo_fast behavior before I put that explanation on the
> slide....
> > > >>>> ECN
> > > >>>>>> was in play here, too. I can redo this test easily, it's
> basically
> > > >>>>>> running a netperf TCP_RR for 70 seconds, and starting up a
> > > >>>> TCP_MAERTS
> > > >>>>>> and TCP_STREAM for 60 seconds a T+5, after hammering down on
> BQL's
> > > >>>>>> limit and the link speeds on two sides of a directly connected
> > > >>>> laptop
> > > >>>>>> connection.
> > > >>>>>
> > > >>>>> I must defer to others on this one. I do note the much lower
> > > >>>> latencies
> > > >>>>> on slide 6 compared to slide 5, though.
> > > >>>>>
> > > >>>>> Please see attached for update including .git directory.
> > > >>>>>
> > > >>>>> Thanx,
> Paul
> > > >>>>>
> > > >>>>>> ethtool -s eth0 advertise 0x002 # 10 Mbit
> > > >>>>>>
> > > >>>>>
> > > >>>>> _______________________________________________
> > > >>>>> Cerowrt-devel mailing list
> > > >>>>> Cerowrt-devel@lists.bufferbloat.net
> > > >>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
> > > >>>>>
> > > >>>>>
> > > >>>
> > > >>> _______________________________________________
> > > >>> Codel mailing list
> > > >>> Codel@lists.bufferbloat.net
> > > >>> https://lists.bufferbloat.net/listinfo/codel
> > > >>
> > > >
> > > > _______________________________________________
> > > > Codel mailing list
> > > > Codel@lists.bufferbloat.net
> > > > https://lists.bufferbloat.net/listinfo/codel
> > > >
> > >
>
> _______________________________________________
> Codel mailing list
> Codel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/codel
>
* Re: [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-11-27 22:03 ` [Codel] [Cerowrt-devel] " Jim Gettys
2012-11-27 22:31 ` [Codel] [Bloat] " David Lang
2012-11-27 22:49 ` [Codel] [Cerowrt-devel] " Paul E. McKenney
@ 2012-11-28 17:20 ` Paul E. McKenney
2012-12-02 23:06 ` Paul E. McKenney
2012-11-30 1:09 ` Dan Siemon
3 siblings, 1 reply; 56+ messages in thread
From: Paul E. McKenney @ 2012-11-28 17:20 UTC (permalink / raw)
To: Jim Gettys
Cc: Paolo Valente, Toke Høiland-Jørgensen, Eric Raymond,
codel, cerowrt-devel, bloat, John Crispin
On Tue, Nov 27, 2012 at 05:03:02PM -0500, Jim Gettys wrote:
> Some points worth making:
OK, I have the pen again. (Just when you all thought it was safe!)
> 1) It is important to point out that (and how) fq_codel avoids starvation:
> unpleasant as elephant flows are, it would be very unfriendly to never
> service them at all until they time out.
Fair point -- I had alluded to this, but not really explained it.
Interestingly enough, sch_fq_codel.c uses a cute trick to make this
happen. When a low-bandwidth flow empties, it is added to the end
of the elephant list. This guarantees that the list of low-bandwidth
flows eventually empties, which forces at least a partial scan of the
elephant list.
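A toy, user-space rendering of that trick (emphatically not the kernel
code -- the list handling, field names, and the quantum below are
simplifications for illustration only) might look like this:

#include <stddef.h>

struct toy_flow {
        struct toy_flow *next;          /* FIFO chain within one of the two lists */
        int deficit;                    /* DRR byte credit */
        int backlog;                    /* bytes currently queued for this flow */
};

struct toy_list {
        struct toy_flow *head, *tail;
};

struct toy_sched {
        struct toy_list new_list;       /* low-bandwidth / newly active flows */
        struct toy_list old_list;       /* the elephants */
        int quantum;                    /* credit per rotation, e.g. one MTU */
};

static void push_tail(struct toy_list *l, struct toy_flow *f)
{
        f->next = NULL;
        if (l->tail)
                l->tail->next = f;
        else
                l->head = f;
        l->tail = f;
}

static void pop_head(struct toy_list *l)
{
        struct toy_flow *f = l->head;

        l->head = f->next;
        if (!l->head)
                l->tail = NULL;
        f->next = NULL;
}

/* Dequeue one packet of pkt_len bytes, returning the flow it came from. */
static struct toy_flow *toy_dequeue(struct toy_sched *q, int pkt_len)
{
        for (;;) {
                int from_new = (q->new_list.head != NULL);
                struct toy_list *l = from_new ? &q->new_list : &q->old_list;
                struct toy_flow *f = l->head;

                if (!f)
                        return NULL;    /* nothing queued at all */

                if (f->deficit <= 0) {
                        /* Out of credit: recharge and rotate to the elephants. */
                        pop_head(l);
                        f->deficit += q->quantum;
                        push_tail(&q->old_list, f);
                        continue;
                }

                if (f->backlog == 0) {
                        /* Drained flow.  A drained low-bandwidth flow is parked
                         * at the tail of the elephant list rather than being
                         * forgotten, so the low-bandwidth list must eventually
                         * empty and the elephants are guaranteed a visit. */
                        pop_head(l);
                        if (from_new)
                                push_tail(&q->old_list, f);
                        continue;
                }

                /* Serve the head flow; it keeps the head slot for as long as
                 * it has both credit and backlog. */
                f->deficit -= pkt_len;
                f->backlog -= pkt_len;
                return f;
        }
}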
Of course, this also means that Andrew's bounds on jitter are a bit
optimistic. It would be possible to make sch_fq_codel.c work the
way that Andrew said it does, for example, by keeping a pointer in
fq_codel_sched_data that references the next flow to access. From what
I can see, the price is a slight increase in overhead of the common
dequeue case.
Of course, if there are only a small number of elephant flows, the
increased jitter will not be that large.
In the meantime, the draft LWN article now explains the approach used
in the Linux kernel and states its importance in avoiding starvation.
> 2) "fairness" is not necessarily what we ultimately want at all; you'd
> really like to penalize those who induce congestion the most. But we don't
> currently have a solution (though Bob Briscoe at BT thinks he does, and is
> seeing if he can get it out from under a BT patent), so the current
> fq_codel round robins ultimately until/unless we can do something like
> Bob's idea. This is a local information only subset of the ideas he's been
> working on in the congestion exposure (conex) group at the IETF.
This of course depends on the definition of fairness. For example,
completely random dropping of packets can be considered to be fair on a
per-packet basis, but since the heavy flows that are inducing congestion
the most have the most packets, they will tend to be the most heavily
penalized. Leaky-bucket schemes can also be considered to be fair,
but they also preferentially penalize heavy flows that exhaust their
supply of tokens.
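For illustration, a minimal per-flow token bucket (field names, refill
rate, and burst size are assumptions for the sketch, not anything taken
from fq_codel) might be checked as follows:

#include <stdbool.h>

struct token_bucket {
        double tokens;                  /* bytes of credit currently available */
        double rate;                    /* refill rate, bytes per second */
        double burst;                   /* maximum credit, bytes */
        double last;                    /* time of the previous update, seconds */
};

/* Returns true if the packet may be sent, false if it should be dropped
 * (or marked).  A heavy flow keeps draining its tokens and starts seeing
 * drops; a light flow rarely runs dry. */
static bool token_bucket_admit(struct token_bucket *b, double now, int pkt_len)
{
        b->tokens += (now - b->last) * b->rate;
        if (b->tokens > b->burst)
                b->tokens = b->burst;
        b->last = now;
        if (b->tokens < pkt_len)
                return false;
        b->tokens -= pkt_len;
        return true;
}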
And FQ-CoDel's definition of fairness can be thought of as a tradeoff
between latency and reliability on the one hand and bandwidth on the
other. Low-bandwidth flows will tend to have low latency and low drop
rates, while elephant flows will suffer worse latency and drop rate,
but in exchange will enjoy higher bandwidths.
> 3) "fairness" is always in the eyes of the beholder (and should be left to
> the beholder to determine). "fairness" depends on where in the network you
> are. While being "fair" among TCP flows is sensible default policy for a
> host, else where in the network it may not be/usually isn't.
>
> Two examples:
> o at a home router, you probably want to be "fair" according to transmit
> opportunities. We really don't want a single system remote from the router
> to be able to starve the network so that devices near the router get much
> less bandwidth than you might hope/expect.
>
> What is more, you probably want to account for a single host using many
> flows, and regulate that they not be able to "hog" bandwidth in the home
> environment, but only use their "fair" share.
>
> o at an ISP, you must to be "fair" between customers; it is best to leave
> the judgement of "fairness" at finer granularity (e.g. host and TCP flows)
> to the points closer to the customer's systems, so that they can enforce
> whatever definition of "fair" they need to themselves.
Right now, the draft LWN article doesn't mention "fair" in connection with
FQ-CoDel. Instead, I am adding a paragraph calling out the tradeoff
between high bandwidth on the one hand (for the elephants) and low latency
and low packet-loss rate on the other (for the light flows).
Fair enough? ;-)
> Algorithms like fq_codel can be/should be adjusted to the circumstances.
For example, increasing the quantum for connections with less than
4Mbit/s bandwidth?
> And therefore exactly what you choose to hash against to form the buckets
> will vary depending on where you are. That at least one step (at the
> user's device) of this be TCP flow "fair" does have the great advantage of
> helping the RTT unfairness problem that violates the principle of "least
> surprise", such as that routinely seen in places like New Zealand.
>
> This is why I have so many problems using the word "fair" near this
> algorithm. "fair" is impossible to define, overloaded in people's mind
> with TCP fair queuing, not even desirable much of the time, and by
> definition and design, even today's fq_codel isn't fair to lots of things,
> and the same basic algorithm can/should be tweaked in lots of directions
> depending on what we need to do. Calling this "smart" queuing or some such
> would be better.
I really don't think we need to be quite -that- worried about the word
"fair". Most other performance-related words have similar problems.
For example:
o "bandwidth": Is that 10Mbit/second continuous? Or average over
each second? Minute? Hour? Day? Or is it the maximum
bandwidth you can hope for when travelling downhill with the
wind at your back?
o "efficiency": Efficiency of exactly what desired output vs.
exactly what inputs? And is that average efficiency or (again)
the best efficiency you can hope for when travelling downhill
with the wind at your back? Or measured efficiency under
conditions of typical use?
o "cost effectiveness": Who is paying? What are the alternatives
and what are their costs? How is cost measured, in money, time
consumed, lives lost, or something else? How are the costs and
benefits spread over time? Am I being asked to make a large and
definite payment now for some hoped-for long-term benefit, or am
I accepting an immediate benefit in return for some future cost
that might accumulate to an arbitrarily large value over time?
Or are the costs and benefits realized at roughly the same time?
Besides, if we carefully avoid all mention of the word "fair", the
question "but is it fair" will with very high probability come up in
the comments to the article. Furthermore, if we avoid any word that
some marketing department somewhere has successfully abused, we won't
have any words left to use. ;-)
> When you've done another round on the document, I'll do a more detailed
> read.
Sounds good! I expect to have another version by the end of the weekend.
Thanx, Paul
> - Jim
>
>
>
>
> On Fri, Nov 23, 2012 at 5:18 PM, Paul E. McKenney <
> paulmck@linux.vnet.ibm.com> wrote:
>
> > On Fri, Nov 23, 2012 at 09:57:34AM +0100, Dave Taht wrote:
> > > David Woodhouse and I fiddled a lot with adsl and openwrt and a
> > > variety of drivers and network layers in a typical bonded adsl stack
> > > yesterday. The complexity of it all makes my head hurt. I'm happy that
> > > a newly BQL'd ethernet driver (for the geos and qemu) emerged from it,
> > > which he submitted to netdev...
> >
> > Cool!!! ;-)
> >
> > > I made a recording of us last night discussing the layers, which I
> > > will produce and distribute later...
> > >
> > > Anyway, along the way, we fiddled a lot with trying to analyze where
> > > the 350ms or so of added latency was coming from in the traverse geo's
> > > adsl implementation and overlying stack....
> > >
> > > Plots: http://david.woodhou.se/dwmw2-netperf-plots.tar.gz
> > >
> > > Note: 1:
> > >
> > > The netperf sample rate on the rrul test needs to be higher than
> > > 100ms in order to get a decent result at sub 10Mbit speeds.
> > >
> > > Note 2:
> > >
> > > The two nicest graphs here are nofq.svg vs fq.svg, which were taken on
> > > a gigE link from a Mac running Linux to another gigE link. (in other
> > > words, NOT on the friggin adsl link) (firefox can display svg, I don't
> > > know what else) I find the T+10 delay before stream start in the
> > > fq.svg graph suspicious and think the "throw out the outlier" code in
> > > the netperf-wrapper code is at fault. Prior to that, codel is merely
> > > buffering up things madly, which can also be seen in the pfifo_fast
> > > behavior, with 1000pkts it's default.
> >
> > I am using these two in a new "Effectiveness of FQ-CoDel" section.
> > Chrome can display .svg, and if it becomes a problem, I am sure that
> > they can be converted. Please let me know if some other data would
> > make the point better.
> >
> > I am assuming that the colored throughput spikes are due to occasional
> > packet losses. Please let me know if this interpretation is overly naive.
> >
> > Also, I know what ICMP is, but the UDP variants are new to me. Could
> > you please expand the "EF", "BK", "BE", and "CSS" acronyms?
> >
> > > (Arguably, the default queue length in codel can be reduced from 10k
> > > packets to something more reasonable at GigE speeds)
> > >
> > > (the indicator that it's the graph, not the reality, is that the
> > > fq.svg pings and udp start at T+5 and grow minimally, as is usual with
> > > fq_codel.)
> >
> > All sessions were started at T+5, then?
> >
> > > As for the *.ps graphs, well, they would take david's network topology
> > > to explain, and were conducted over a variety of circumstances,
> > > including wifi, with more variables in play than I care to think
> > > about.
> > >
> > > We didn't really get anywhere on digging deeper. As we got to purer
> > > tests - with a minimal number of boxes, running pure ethernet,
> > > switched over a couple of switches, even in the simplest two box case,
> > > my HTB based "ceroshaper" implementation had multiple problems in
> > > cutting median latencies below 100ms, on this very slow ADSL link.
> > > David suspects problems on the path along the carrier backbone as a
> > > potential issue, and the only way to measure that is with two one way
> > > trip time measurements (rather than rtt), time synced via ntp... I
> > > keep hoping to find a rtp test, but I'm open to just about any option
> > > at this point. anyone?
> > >
> > > We also found a probable bug in mtr in that multiple mtrs on the same
> > > box don't co-exist.
> >
> > I must confess that I am not seeing all that clear a difference between
> > the behaviors of ceroshaper and FQ-CoDel. Maybe somewhat better latencies
> > for FQ-CoDel, but not unambiguously so.
> >
> > > Moving back to more scientific clarity and simpler tests...
> > >
> > > The two graphs, taken a few weeks back, on pages 5 and 6 of this:
> > >
> > >
> > http://www.teklibre.com/~d/bloat/Not_every_packet_is_sacred-Battling_Bufferbloat_on_wifi.pdf
> > >
> > > appear to show the advantage of fq_codel fq + codel + head drop over
> > > tail drop during the slow start period on a 10Mbit link - (see how
> > > squiggly slow start is on pfifo fast?) as well as the marvelous
> > > interstream latency that can be achieved with BQL=3000 (on a 10 mbit
> > > link.) Even that latency can be halved by reducing BQL to 1500, which
> > > is just fine on a 10mbit. Below those rates I'd like to be rid of BQL
> > > entirely, and just have a single packet outstanding... in everything
> > > from adsl to cable...
> > >
> > > That said, I'd welcome other explanations of the squiggly slowstart
> > > pfifo_fast behavior before I put that explanation on the slide.... ECN
> > > was in play here, too. I can redo this test easily, it's basically
> > > running a netperf TCP_RR for 70 seconds, and starting up a TCP_MAERTS
> > > and TCP_STREAM for 60 seconds a T+5, after hammering down on BQL's
> > > limit and the link speeds on two sides of a directly connected laptop
> > > connection.
> >
> > I must defer to others on this one. I do note the much lower latencies
> > on slide 6 compared to slide 5, though.
> >
> > Please see attached for update including .git directory.
> >
> > Thanx, Paul
> >
> > > ethtool -s eth0 advertise 0x002 # 10 Mbit
> > >
> >
> > _______________________________________________
> > Cerowrt-devel mailing list
> > Cerowrt-devel@lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/cerowrt-devel
> >
> >
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Bloat] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-11-27 23:15 ` Andrew McGregor
2012-11-28 0:51 ` Paul E. McKenney
@ 2012-11-28 17:36 ` Paul E. McKenney
1 sibling, 0 replies; 56+ messages in thread
From: Paul E. McKenney @ 2012-11-28 17:36 UTC (permalink / raw)
To: Andrew McGregor
Cc: David Lang, Paolo Valente, Toke Høiland-Jørgensen,
codel, cerowrt-devel, bloat, John Crispin
On Wed, Nov 28, 2012 at 12:15:35PM +1300, Andrew McGregor wrote:
>
> On 28/11/2012, at 11:54 AM, "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
>
> > On Tue, Nov 27, 2012 at 02:31:53PM -0800, David Lang wrote:
> >> On Tue, 27 Nov 2012, Jim Gettys wrote:
> >>
> >>> 2) "fairness" is not necessarily what we ultimately want at all; you'd
> >>> really like to penalize those who induce congestion the most. But we don't
> >>> currently have a solution (though Bob Briscoe at BT thinks he does, and is
> >>> seeing if he can get it out from under a BT patent), so the current
> >>> fq_codel round robins ultimately until/unless we can do something like
> >>> Bob's idea. This is a local information only subset of the ideas he's been
> >>> working on in the congestion exposure (conex) group at the IETF.
> >>
> >> Even more than this, we _know_ that we don't want to be fair in
> >> terms of the raw packet priority.
> >>
> >> For example, we know that we want to prioritize DNS traffic over TCP
> >> streams (due to the fact that the TCP traffic usually can't even
> >> start until DNS resolution finishes)
> >>
> >> We strongly suspect that we want to prioritize short-lived
> >> connections over long lived connections. We don't know a good way to
> >> do this, but one good starting point would be to prioritize syn
> >> packets so that the initialization of the connection happens as fast
> >> as possible.
> >>
> >> Ideally we'd probably like to prioritize the first couple of packets
> >> of a connection so that very short lived connections finish quickly
>
> fq_codel does all of this, although it isn't explicit about it so it is hard to see how it happens.
>
> >> it may make sense to prioritize fin packets so that connection
> >> teardown (and the resulting release of resources and connection
> >> tracking) happens as fast as possible
> >>
> >> all of these are horribly unfair when you are looking at the raw
> >> packet flow, but they significantly help the user's perceived
> >> response time without making much difference on the large download
> >> cases.
> >
> > In all cases, to Jim's point, as long as we avoid starvation. And there
> > will likely be more corner cases that show up under extreme overload.
> >
> > Thanx, Paul
> >
>
> So, fq_codel exhibits a new kind of fairness: it is jitter fair, or in other words, each flow gets the same bound on how much jitter it can induce in the whole ensemble of flows. Exceed that bound, and flows get deprioritised. This achieves thin-flow and DNS prioritisation, while allowing TCP flows to build more buffer if required. The sub-flow CoDel queues then allow short flows to use a reasonably large buffer, while draining standing buffers for long TCP flows.
>
> The really interesting part of the jitter-fair behaviour is that jitter-sensitive traffic is protected as much as it can be, provided its own sending rate control does something sensible. Good news for interactive video, in other words.
>
> The actual jitter bound is the transmission time of max(mtu, quantum) * n_thin_flows bytes, where a thin flow is one that has not exceeded its own jitter allowance since the last time its queue drained. While it is possible that there might instantaneously be a fairly large number of thin flows, in practice on a home network link there are normally only a very few of these at any one moment, and so the jitter experienced is pretty good.
OK, let me see if I can restate this in terms of the code.
Each flow gets to induce one quantum q of jitter. If there are n thin
flows and m thick flows, then each thin flow will see at most (n+m)*q
jitter, which is the case when a new packet arrives just after the last
packet was transmitted, so that the flow has been placed at the end
of the thick-flows list. Thick flows are allowed q+interval before
dropping (where "q" is the "target" parameter in the code), so see at
most (n*q+m*(q+interval)) -- any attempt to exceed this will result in
packets being dropped.
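To put rough numbers on it (my own back-of-the-envelope figures, not from
the code): with a 1514-byte quantum on a 10 Mbit/s link, one quantum takes
about 1.2 ms to transmit, so with n=4 thin flows and m=2 thick flows a thin
flow would see at most roughly (4+2)*1.2 ms = 7 ms of induced jitter, while
a thick flow could additionally accumulate up to the default 100 ms interval
of standing queue before drops begin.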
Seem reasonable or am I confused?
Thanx, Paul
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-11-28 16:16 ` Jonathan Morton
@ 2012-11-28 17:44 ` Paul E. McKenney
2012-11-28 18:37 ` [Codel] [Bloat] " Michael Richardson
2012-11-28 19:00 ` Eric Dumazet
0 siblings, 2 replies; 56+ messages in thread
From: Paul E. McKenney @ 2012-11-28 17:44 UTC (permalink / raw)
To: Jonathan Morton
Cc: Paolo Valente, Toke Høiland-Jørgensen, Eric Raymond,
codel, cerowrt-devel, bloat, John Crispin
You lost me on this one. It looks to me like net/sched/sch_fq_codel.c
in fact does hash packets into flows, so FQ-CoDel is stochastic in the
same sense that SFQ is. In particular, FQ-CoDel can hash a thin
session into the same flow as a thick session, which really is the
birthday effect.
Now FQ-CoDel uses a 1024-bucket hash table compared to SFQ's default
of 128 buckets, so FQ-CoDel will have smaller collision probabilities
than will SFQ on a given set of flows. In addition, FQ-CoDel seems
to be able to tolerate a limited number of collisions among thin flows,
while SFQ doesn't distinguish thin from thick.
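(As an aside, if hash collisions ever did become a concern for some
workload, the number of buckets appears to be tunable when the qdisc is
installed. An untested sketch, assuming the stock tc syntax for fq_codel
and a placeholder device name:

    tc qdisc add dev eth0 root fq_codel flows 4096

More buckets should mean fewer birthday-style collisions, at the cost of a
little more memory.)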
But the possibility of stochastic collision behavior really is there
with FQ-CoDel. I hasten to add that in practice, I do not expect this
possibility of stochastic behavior to be a problem in the common case.
Or am I missing your point? Or perhaps your definition of either
fairness or stochastic?
Thanx, Paul
On Wed, Nov 28, 2012 at 06:16:08PM +0200, Jonathan Morton wrote:
> It may be worth noting that fq-codel is not stochastic in its fairness
> mechanism. SFQ suffers from the birthday effect because it hashes packets
> into buffers, which is what makes it stochastic.
>
> - Jonathan Morton
> On Nov 28, 2012 6:02 PM, "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> wrote:
>
> > Dave gave me back the pen, so I looked to see what I had expanded
> > FQ-CoDel to. The answer was... Nothing. Nothing at all.
> >
> > So I added a Quick Quiz as follows:
> >
> > Quick Quiz 2: What does the FQ-CoDel acronym expand to?
> >
> > Answer: There are some differences of opinion on this. The
> > comment header in net/sched/sch_fq_codel.c says
> > “Fair Queue CoDel” (presumably by analogy to SFQ's
> > expansion of “Stochastic Fairness Queueing”), and
> > “CoDel” is generally agreed to expand to “controlled
> > delay”. However, some prefer “Flow Queue Controlled
> > Delay” and still others prefer to prepend a silent and
> > invisible "S", expanding to “Stochastic Flow Queue
> > Controlled Delay” or “Smart Flow Queue Controlled
> > Delay”. No doubt additional expansions will appear in
> > the fullness of time.
> >
> > In the meantime, this article focuses on the concepts,
> > implementation, and performance, leaving naming debates
> > to others.
> >
> > This level of snarkiness would go over reasonably well in an LWN article;
> > I would -not- suggest this approach in an academic paper, just in case
> > you were wondering. But if there is too much discomfort with snarking,
> > I just might be convinced to take another approach.
> >
> > Thanx, Paul
> >
> > On Tue, Nov 27, 2012 at 08:38:38PM -0800, Paul E. McKenney wrote:
> > > I guess I just have to be grateful that people mostly agree on the
> > acronym,
> > > regardless of the expansion.
> > >
> > > Thanx, Paul
> > >
> > > On Tue, Nov 27, 2012 at 07:43:56PM -0800, Kathleen Nichols wrote:
> > > >
> > > > It would be me that tries to say "stochastic flow queuing with CoDel"
> > > > as I like to be accurate. But I think FQ-Codel is Flow queuing with
> > CoDel.
> > > > JimG suggests "smart flow queuing" because he is ever mindful of the
> > > > big audience.
> > > >
> > > > On 11/27/12 4:27 PM, Paul E. McKenney wrote:
> > > > > On Tue, Nov 27, 2012 at 04:53:34PM -0700, Greg White wrote:
> > > > >> BTW, I've heard some use the term "stochastic flow queueing" as a
> > > > >> replacement to avoid the term "fair". Seems like a more apt term
> > anyway.
> > > > >
> > > > > Would that mean that FQ-CoDel is Flow Queue Controlled Delay? ;-)
> > > > >
> > > > > Thanx, Paul
> > > > >
> > > > >> -Greg
> > > > >>
> > > > >>
> > > > >> On 11/27/12 3:49 PM, "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > wrote:
> > > > >>
> > > > >>> Thank you for the review and comments, Jim! I will apply them when
> > > > >>> I get the pen back from Dave. And yes, that is the thing about
> > > > >>> "fairness" -- there are a great many definitions, many of the most
> > > > >>> useful of which appear to many to be patently unfair. ;-)
> > > > >>>
> > > > >>> As you suggest, it might well be best to drop discussion of
> > fairness,
> > > > >>> or to at the least supply the corresponding definition.
> > > > >>>
> > > > >>> Thanx, Paul
> > > > >>>
> > > > >>> On Tue, Nov 27, 2012 at 05:03:02PM -0500, Jim Gettys wrote:
> > > > >>>> Some points worth making:
> > > > >>>>
> > > > >>>> 1) It is important to point out that (and how) fq_codel avoids
> > > > >>>> starvation:
> > > > >>>> unpleasant as elephant flows are, it would be very unfriendly to
> > never
> > > > >>>> service them at all until they time out.
> > > > >>>>
> > > > >>>> 2) "fairness" is not necessarily what we ultimately want at all;
> > you'd
> > > > >>>> really like to penalize those who induce congestion the most.
> > But we
> > > > >>>> don't
> > > > >>>> currently have a solution (though Bob Briscoe at BT thinks he
> > does, and
> > > > >>>> is
> > > > >>>> seeing if he can get it out from under a BT patent), so the
> > current
> > > > >>>> fq_codel round robins ultimately until/unless we can do something
> > like
> > > > >>>> Bob's idea. This is a local information only subset of the ideas
> > he's
> > > > >>>> been
> > > > >>>> working on in the congestion exposure (conex) group at the IETF.
> > > > >>>>
> > > > >>>> 3) "fairness" is always in the eyes of the beholder (and should
> > be left
> > > > >>>> to
> > > > >>>> the beholder to determine). "fairness" depends on where in the
> > network
> > > > >>>> you
> > > > >>>> are. While being "fair" among TCP flows is sensible default
> > policy for
> > > > >>>> a
> > > > >>>> host, elsewhere in the network it may not be/usually isn't.
> > > > >>>>
> > > > >>>> Two examples:
> > > > >>>> o at a home router, you probably want to be "fair" according to
> > transmit
> > > > >>>> opportunities. We really don't want a single system remote from
> > the
> > > > >>>> router
> > > > >>>> to be able to starve the network so that devices near the router
> > get
> > > > >>>> much
> > > > >>>> less bandwidth than you might hope/expect.
> > > > >>>>
> > > > >>>> What is more, you probably want to account for a single host
> > using many
> > > > >>>> flows, and regulate that they not be able to "hog" bandwidth in
> > the home
> > > > >>>> environment, but only use their "fair" share.
> > > > >>>>
> > > > >>>> o at an ISP, you must be "fair" between customers; it is best
> > to
> > > > >>>> leave
> > > > >>>> the judgement of "fairness" at finer granularity (e.g. host and
> > TCP
> > > > >>>> flows)
> > > > >>>> to the points closer to the customer's systems, so that they can
> > enforce
> > > > >>>> whatever definition of "fair" they need to themselves.
> > > > >>>>
> > > > >>>>
> > > > >>>> Algorithms like fq_codel can be/should be adjusted to the
> > circumstances.
> > > > >>>>
> > > > >>>> And therefore exactly what you choose to hash against to form the
> > > > >>>> buckets
> > > > >>>> will vary depending on where you are. That at least one step (at
> > the
> > > > >>>> user's device) of this be TCP flow "fair" does have the great
> > advantage
> > > > >>>> of
> > > > >>>> helping the RTT unfairness problem that violates the principle of
> > "least
> > > > >>>> surprise", such as that routinely seen in places like New Zealand.
> > > > >>>>
> > > > >>>> This is why I have so many problems using the word "fair" near
> > this
> > > > >>>> algorithm. "fair" is impossible to define, overloaded in
> > people's mind
> > > > >>>> with TCP fair queuing, not even desirable much of the time, and by
> > > > >>>> definition and design, even today's fq_codel isn't fair to lots of
> > > > >>>> things,
> > > > >>>> and the same basic algorithm can/should be tweaked in lots of
> > directions
> > > > >>>> depending on what we need to do. Calling this "smart" queuing or
> > some
> > > > >>>> such
> > > > >>>> would be better.
> > > > >>>>
> > > > >>>> When you've done another round on the document, I'll do a more
> > detailed
> > > > >>>> read.
> > > > >>>> - Jim
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> On Fri, Nov 23, 2012 at 5:18 PM, Paul E. McKenney <
> > > > >>>> paulmck@linux.vnet.ibm.com> wrote:
> > > > >>>>
> > > > >>>>> On Fri, Nov 23, 2012 at 09:57:34AM +0100, Dave Taht wrote:
> > > > >>>>>> David Woodhouse and I fiddled a lot with adsl and openwrt and a
> > > > >>>>>> variety of drivers and network layers in a typical bonded adsl
> > stack
> > > > >>>>>> yesterday. The complexity of it all makes my head hurt. I'm
> > happy
> > > > >>>> that
> > > > >>>>>> a newly BQL'd ethernet driver (for the geos and qemu) emerged
> > from
> > > > >>>> it,
> > > > >>>>>> which he submitted to netdev...
> > > > >>>>>
> > > > >>>>> Cool!!! ;-)
> > > > >>>>>
> > > > >>>>>> I made a recording of us last night discussing the layers,
> > which I
> > > > >>>>>> will produce and distribute later...
> > > > >>>>>>
> > > > >>>>>> Anyway, along the way, we fiddled a lot with trying to analyze
> > where
> > > > >>>>>> the 350ms or so of added latency was coming from in the traverse
> > > > >>>> geo's
> > > > >>>>>> adsl implementation and overlying stack....
> > > > >>>>>>
> > > > >>>>>> Plots: http://david.woodhou.se/dwmw2-netperf-plots.tar.gz
> > > > >>>>>>
> > > > >>>>>> Note: 1:
> > > > >>>>>>
> > > > >>>>>> The netperf sample rate on the rrul test needs to be higher
> > than
> > > > >>>>>> 100ms in order to get a decent result at sub 10Mbit speeds.
> > > > >>>>>>
> > > > >>>>>> Note 2:
> > > > >>>>>>
> > > > >>>>>> The two nicest graphs here are nofq.svg vs fq.svg, which were
> > taken
> > > > >>>> on
> > > > >>>>>> a gigE link from a Mac running Linux to another gigE link. (in
> > other
> > > > >>>>>> words, NOT on the friggin adsl link) (firefox can display svg, I
> > > > >>>> don't
> > > > >>>>>> know what else) I find the T+10 delay before stream start in the
> > > > >>>>>> fq.svg graph suspicious and think the "throw out the outlier"
> > code
> > > > >>>> in
> > > > >>>>>> the netperf-wrapper code is at fault. Prior to that, codel is
> > merely
> > > > >>>>>> buffering up things madly, which can also be seen in the
> > pfifo_fast
> > > > >>>>>> behavior, with 1000pkts it's default.
> > > > >>>>>
> > > > >>>>> I am using these two in a new "Effectiveness of FQ-CoDel"
> > section.
> > > > >>>>> Chrome can display .svg, and if it becomes a problem, I am sure
> > that
> > > > >>>>> they can be converted. Please let me know if some other data
> > would
> > > > >>>>> make the point better.
> > > > >>>>>
> > > > >>>>> I am assuming that the colored throughput spikes are due to
> > occasional
> > > > >>>>> packet losses. Please let me know if this interpretation is
> > overly
> > > > >>>> naive.
> > > > >>>>>
> > > > >>>>> Also, I know what ICMP is, but the UDP variants are new to me.
> > Could
> > > > >>>>> you please expand the "EF", "BK", "BE", and "CSS" acronyms?
> > > > >>>>>
> > > > >>>>>> (Arguably, the default queue length in codel can be reduced
> > from 10k
> > > > >>>>>> packets to something more reasonable at GigE speeds)
> > > > >>>>>>
> > > > >>>>>> (the indicator that it's the graph, not the reality, is that the
> > > > >>>>>> fq.svg pings and udp start at T+5 and grow minimally, as is
> > usual
> > > > >>>> with
> > > > >>>>>> fq_codel.)
> > > > >>>>>
> > > > >>>>> All sessions were started at T+5, then?
> > > > >>>>>
> > > > >>>>>> As for the *.ps graphs, well, they would take david's network
> > > > >>>> topology
> > > > >>>>>> to explain, and were conducted over a variety of circumstances,
> > > > >>>>>> including wifi, with more variables in play than I care to think
> > > > >>>>>> about.
> > > > >>>>>>
> > > > >>>>>> We didn't really get anywhere on digging deeper. As we got to
> > purer
> > > > >>>>>> tests - with a minimal number of boxes, running pure ethernet,
> > > > >>>>>> switched over a couple of switches, even in the simplest two box
> > > > >>>> case,
> > > > >>>>>> my HTB based "ceroshaper" implementation had multiple problems
> > in
> > > > >>>>>> cutting median latencies below 100ms, on this very slow ADSL
> > link.
> > > > >>>>>> David suspects problems on the path along the carrier backbone
> > as a
> > > > >>>>>> potential issue, and the only way to measure that is with two
> > one
> > > > >>>> way
> > > > >>>>>> trip time measurements (rather than rtt), time synced via
> > ntp... I
> > > > >>>>>> keep hoping to find a rtp test, but I'm open to just about any
> > > > >>>> option
> > > > >>>>>> at this point. anyone?
> > > > >>>>>>
> > > > >>>>>> We also found a probable bug in mtr in that multiple mtrs on the
> > > > >>>> same
> > > > >>>>>> box don't co-exist.
> > > > >>>>>
> > > > >>>>> I must confess that I am not seeing all that clear a difference
> > > > >>>> between
> > > > >>>>> the behaviors of ceroshaper and FQ-CoDel. Maybe somewhat better
> > > > >>>> latencies
> > > > >>>>> for FQ-CoDel, but not unambiguously so.
> > > > >>>>>
> > > > >>>>>> Moving back to more scientific clarity and simpler tests...
> > > > >>>>>>
> > > > >>>>>> The two graphs, taken a few weeks back, on pages 5 and 6 of
> > this:
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>
> > http://www.teklibre.com/~d/bloat/Not_every_packet_is_sacred-Battling_Buff
> > > > >>>> erbloat_on_wifi.pdf
> > > > >>>>>>
> > > > >>>>>> appear to show the advantage of fq_codel fq + codel + head drop
> > over
> > > > >>>>>> tail drop during the slow start period on a 10Mbit link - (see
> > how
> > > > >>>>>> squiggly slow start is on pfifo fast?) as well as the marvelous
> > > > >>>>>> interstream latency that can be achieved with BQL=3000 (on a 10
> > mbit
> > > > >>>>>> link.) Even that latency can be halved by reducing BQL to 1500,
> > > > >>>> which
> > > > >>>>>> is just fine on a 10mbit. Below those rates I'd like to be rid
> > of
> > > > >>>> BQL
> > > > >>>>>> entirely, and just have a single packet outstanding... in
> > everything
> > > > >>>>>> from adsl to cable...
> > > > >>>>>>
> > > > >>>>>> That said, I'd welcome other explanations of the squiggly
> > slowstart
> > > > >>>>>> pfifo_fast behavior before I put that explanation on the
> > slide....
> > > > >>>> ECN
> > > > >>>>>> was in play here, too. I can redo this test easily, it's
> > basically
> > > > >>>>>> running a netperf TCP_RR for 70 seconds, and starting up a
> > > > >>>> TCP_MAERTS
> > > > >>>>>> and TCP_STREAM for 60 seconds a T+5, after hammering down on
> > BQL's
> > > > >>>>>> limit and the link speeds on two sides of a directly connected
> > > > >>>> laptop
> > > > >>>>>> connection.
> > > > >>>>>
> > > > >>>>> I must defer to others on this one. I do note the much lower
> > > > >>>> latencies
> > > > >>>>> on slide 6 compared to slide 5, though.
> > > > >>>>>
> > > > >>>>> Please see attached for update including .git directory.
> > > > >>>>>
> > > > >>>>> Thanx,
> > Paul
> > > > >>>>>
> > > > >>>>>> ethtool -s eth0 advertise 0x002 # 10 Mbit
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>> _______________________________________________
> > > > >>>>> Cerowrt-devel mailing list
> > > > >>>>> Cerowrt-devel@lists.bufferbloat.net
> > > > >>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
> > > > >>>>>
> > > > >>>>>
> > > > >>>
> > > > >>> _______________________________________________
> > > > >>> Codel mailing list
> > > > >>> Codel@lists.bufferbloat.net
> > > > >>> https://lists.bufferbloat.net/listinfo/codel
> > > > >>
> > > > >
> > > > > _______________________________________________
> > > > > Codel mailing list
> > > > > Codel@lists.bufferbloat.net
> > > > > https://lists.bufferbloat.net/listinfo/codel
> > > > >
> > > >
> >
> > _______________________________________________
> > Codel mailing list
> > Codel@lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/codel
> >
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Bloat] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-11-28 17:44 ` Paul E. McKenney
@ 2012-11-28 18:37 ` Michael Richardson
2012-11-28 18:51 ` Eric Dumazet
2012-11-28 19:00 ` Eric Dumazet
1 sibling, 1 reply; 56+ messages in thread
From: Michael Richardson @ 2012-11-28 18:37 UTC (permalink / raw)
To: codel, cerowrt-devel, bloat, John Crispin
>>>>> "Paul" == Paul E McKenney <paulmck@linux.vnet.ibm.com> writes:
Paul> You lost me on this one. It looks to me like
Paul> net/sched/sch_fq_codel.c
Paul> in fact does hash packets into flows, so FQ-CoDel is
Paul> stochastic in the
Paul> the same sense that SFQ is. In particular, FQ-CoDel can hash a thin
Paul> session into the same flow as a thick session, which really is the
Paul> birthday effect.
Silly question from someone who should read more code, but...
if one is hashing the packet to pick a flow bucket, shouldn't this hash
occur before any application of address/port translation? (NAT)
(Maybe I've just found out that IPv6 + CoDel will be a killer
combination)
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Bloat] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-11-28 18:37 ` [Codel] [Bloat] " Michael Richardson
@ 2012-11-28 18:51 ` Eric Dumazet
2012-11-28 21:44 ` Michael Richardson
0 siblings, 1 reply; 56+ messages in thread
From: Eric Dumazet @ 2012-11-28 18:51 UTC (permalink / raw)
To: Michael Richardson; +Cc: John Crispin, codel, cerowrt-devel, bloat
On Wed, 2012-11-28 at 13:37 -0500, Michael Richardson wrote:
> >>>>> "Paul" == Paul E McKenney <paulmck@linux.vnet.ibm.com> writes:
> Paul> You lost me on this one. It looks to me like
> Paul> net/sched/sch_fq_codel.c
> Paul> in fact does hash packets into flows, so FQ-CoDel is
> Paul> stochastic in the
> Paul> the same sense that SFQ is. In particular, FQ-CoDel can hash a thin
> Paul> session into the same flow as a thick session, which really is the
> Paul> birthday effect.
>
> Silly question from someone who should read more code, but...
> if one is hashing the packet to pick a flow bucket, shouldn't this hash
> occur before any application of address/port translation. ? (NAT)
>
> (Maybe I've just found out that IPv6 + CoDel will be a killer
> combination)
>
Why would it be a killer ?
A NATed flow becomes another flow, with its own hash
A tunnel might have a problem, _if_ the hash were computed on
elements found in the outer header.
But luckily, we do a complete flow dissection.
(Assuming the default fq_codel classification)
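For anyone wanting a non-default classification (for example hashing on the
conntrack, i.e. pre-NAT, source address so per-host affinity survives NAT),
an untested sketch of attaching a flow filter to fq_codel might look like
the following (device name, handle and priority are placeholders):

    tc qdisc add dev eth0 root handle 1: fq_codel
    tc filter add dev eth0 parent 1: protocol ip prio 1 handle 1 flow \
        hash keys nfct-src divisor 1024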
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Bloat] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-11-28 17:44 ` Paul E. McKenney
2012-11-28 18:37 ` [Codel] [Bloat] " Michael Richardson
@ 2012-11-28 19:00 ` Eric Dumazet
2012-12-02 21:37 ` Toke Høiland-Jørgensen
1 sibling, 1 reply; 56+ messages in thread
From: Eric Dumazet @ 2012-11-28 19:00 UTC (permalink / raw)
To: paulmck
Cc: Paolo Valente, Toke Høiland-Jørgensen, codel,
cerowrt-devel, bloat, John Crispin
On Wed, 2012-11-28 at 09:44 -0800, Paul E. McKenney wrote:
> You lost me on this one. It looks to me like net/sched/sch_fq_codel.c
> in fact does hash packets into flows, so FQ-CoDel is stochastic in the
> the same sense that SFQ is. In particular, FQ-CoDel can hash a thin
> session into the same flow as a thick session, which really is the
> birthday effect.
>
> Now FQ-CoDel uses a 1024-bucket hash table compared to SFQ's default
> of 128 buckets, so FQ-CoDel will have smaller collision probabilities
> than will SFQ on a given set of flows. In addition, FQ-CoDel seems
> to be able to tolerate a limited number of collisions among thin flows,
> while SFQ doesn't distinguish thin from thick.
>
> But the possibility of stochastic collision behavior really is there
> with FQ-CoDel. I hasten to add that in practice, I do not expect this
> possibility of stochastic behavior to be a problem in the common case.
>
> Or am I missing your point? Or perhaps your definition of either
> fairness or stochastic?
That's absolutely correct: fq_codel uses stochastic hashing exactly
like SFQ.
In fact, I wrote fq_codel using the same base, after patching SFQ and
hitting some limits last year.
Note that I played a bit with a version using a non-stochastic hash;
we tested it here in our labs.
This can help if you really want to avoid a thick flow sharing a thin
flow bucket, but given that all packets are going eventually into the
Internet (or equivalent crowded network), its not really a clear win.
Its more an academic issue...
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Bloat] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-11-28 18:51 ` Eric Dumazet
@ 2012-11-28 21:44 ` Michael Richardson
0 siblings, 0 replies; 56+ messages in thread
From: Michael Richardson @ 2012-11-28 21:44 UTC (permalink / raw)
To: Eric Dumazet; +Cc: John Crispin, codel, cerowrt-devel, bloat
>>>>> "Eric" == Eric Dumazet <eric.dumazet@gmail.com> writes:
>> >>>>> "Paul" == Paul E McKenney <paulmck@linux.vnet.ibm.com> writes:
Paul> You lost me on this one. It looks to me like
Paul> net/sched/sch_fq_codel.c
Paul> in fact does hash packets into flows, so FQ-CoDel is
Paul> stochastic in the
Paul> the same sense that SFQ is. In particular, FQ-CoDel can hash a thin
Paul> session into the same flow as a thick session, which really is the
Paul> birthday effect.
>>
>> Silly question from someone who should read more code, but...
>> if one is hashing the packet to pick a flow bucket, shouldn't this hash
>> occur before any application of address/port translation. ? (NAT)
>>
>> (Maybe I've just found out that IPv6 + CoDel will be a killer
>> combination)
Eric> Why would it be a killer ?
I mean, it's a killer app :-)
Eric> A NATed flow becomes another flow, with its own hash
yes, but one loses the affinity to which internal host the flow belongs
to, so it's harder to be "fair" to each host.
going deeper into the packet is both expensive and difficult to
impossible given tunnels and NAT. The lack of NAT on IPv6 would make
deeper classification easier for layer-two devices (DSL modems, etc),
and the IPv6 flow label would also contribute significantly to making
this easier.
Consider a future scenario where there are the following streams easily
identified:
1) IPv4 packets less than 128 bytes.
2) IPv6 packets > 128 bytes.
*) IPv6 flows identified by flow label
In such a situation, IPv6 becomes the choice for interactive video.
--
] He who is tired of Weird Al is tired of life! | firewalls [
] Michael Richardson, Sandelman Software Works, Ottawa, ON |net architect[
] mcr@sandelman.ottawa.on.ca http://www.sandelman.ottawa.on.ca/ |device driver[
Kyoto Plus: watch the video <http://www.youtube.com/watch?v=kzx1ycLXQSE>
then sign the petition.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Bloat] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-11-27 22:03 ` [Codel] [Cerowrt-devel] " Jim Gettys
` (2 preceding siblings ...)
2012-11-28 17:20 ` [Codel] " Paul E. McKenney
@ 2012-11-30 1:09 ` Dan Siemon
3 siblings, 0 replies; 56+ messages in thread
From: Dan Siemon @ 2012-11-30 1:09 UTC (permalink / raw)
To: Jim Gettys, bloat
Cc: Paolo Valente, Toke Høiland-Jørgensen, codel,
cerowrt-devel, Paul McKenney, John Crispin
[-- Attachment #1: Type: text/plain, Size: 915 bytes --]
On Tue, 2012-11-27 at 17:03 -0500, Jim Gettys wrote:
> Two examples:
> o at a home router, you probably want to be "fair" according to
> transmit opportunities. We really don't want a single system remote
> from the router to be able to starve the network so that devices near
> the router get much less bandwidth than you might hope/expect.
>
>
> What is more, you probably want to account for a single host using
> many flows, and regulate that they not be able to "hog" bandwidth in
> the home environment, but only use their "fair" share.
People interested in per-host 'fairness' (with fq_codel per host) may
want to experiment with a tc script I've been working on:
http://git.coverfire.com/?p=linux-qos-scripts.git;a=blob;f=src-3tos.sh;hb=HEAD
There are comments in the script which explain the approach. I've
collected some results but haven't gotten around to writing them up yet.
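For those who just want the flavour of the approach without reading the
script, a heavily simplified sketch (made-up device, addresses and rates;
this is not the actual script) is one class per host, each with its own
fq_codel instance, and filters steering traffic by source address:

    tc qdisc add dev eth0 root handle 1: htb default 30
    tc class add dev eth0 parent 1: classid 1:10 htb rate 3mbit ceil 9mbit
    tc class add dev eth0 parent 1: classid 1:20 htb rate 3mbit ceil 9mbit
    tc class add dev eth0 parent 1: classid 1:30 htb rate 3mbit ceil 9mbit
    tc qdisc add dev eth0 parent 1:10 fq_codel
    tc qdisc add dev eth0 parent 1:20 fq_codel
    tc qdisc add dev eth0 parent 1:30 fq_codel
    tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
        match ip src 192.168.1.10/32 flowid 1:10
    tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
        match ip src 192.168.1.11/32 flowid 1:20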
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Bloat] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-11-28 19:00 ` Eric Dumazet
@ 2012-12-02 21:37 ` Toke Høiland-Jørgensen
2012-12-02 21:47 ` Andrew McGregor
2012-12-02 22:07 ` Eric Dumazet
0 siblings, 2 replies; 56+ messages in thread
From: Toke Høiland-Jørgensen @ 2012-12-02 21:37 UTC (permalink / raw)
To: Eric Dumazet
Cc: Paolo Valente, codel, cerowrt-devel, bloat, paulmck, John Crispin
[-- Attachment #1: Type: text/plain, Size: 1293 bytes --]
Eric Dumazet <eric.dumazet@gmail.com> writes:
> This can help if you really want to avoid a thick flow sharing a thin
> flow bucket, but given that all packets are going eventually into the
> Internet (or equivalent crowded network), its not really a clear win.
I've been trying to grok the fq_codel code by reading through it while
following the discussion in the article, and I'm having a bit of trouble
squaring the thin/thick (or "hog"/"non-hog") flow designation of the
article with the code. As far as I can tell from the code, there are two
lists, called new_flows and old_flows; and a flow starts out as 'new'
and stays that way until it has sent a quantum of bytes or codel fails
to dequeue a packet from it, whereupon it is moved to the end of the
old_flows list. It then stays in the old_flows list for the rest of its
"life".
Now, talking about thin flows being distinguished from thick ones, it
seems to me that if a flow sends packets at a low enough rate it can in
principle stay 'thin' indefinitely. So I'm assuming I've missed
something in the code that allows a flow to stay in the new_flows list
if it is sufficiently thin. Could someone please point out to me what
I'm missing? :)
Thanks,
-Toke
--
Toke Høiland-Jørgensen
toke@toke.dk
[-- Attachment #2: Type: application/pgp-signature, Size: 489 bytes --]
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Bloat] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-12-02 21:37 ` Toke Høiland-Jørgensen
@ 2012-12-02 21:47 ` Andrew McGregor
2012-12-03 8:04 ` Dave Taht
2012-12-02 22:07 ` Eric Dumazet
1 sibling, 1 reply; 56+ messages in thread
From: Andrew McGregor @ 2012-12-02 21:47 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Paolo Valente, codel, cerowrt-devel, bloat, paulmck, John Crispin
On 3/12/2012, at 10:37 AM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
> Eric Dumazet <eric.dumazet@gmail.com> writes:
>
>> This can help if you really want to avoid a thick flow sharing a thin
>> flow bucket, but given that all packets are going eventually into the
>> Internet (or equivalent crowded network), its not really a clear win.
>
> I've been trying to grok the fq_codel code by reading through it while
> following the discussion in the article, and I'm having a bit of trouble
> squaring the thin/thick (or "hog"/"non-hog") flow designation of the
> article with the code. As far as I can tell from the code, there are two
> lists, called new_flows and old_flows; and a flow starts out as 'new'
> and stays that way until it has sent a quantum of bytes or codel fails
> to dequeue a packet from it, whereupon it is moved to the end of the
> old_flows list. It then stays in the old_flows list for the rest of its
> "life".
'new' is what I was calling 'thin', and 'old' is the 'thick' list.
When a flow drains, it disappears from both lists (but is not garbage collected), and will come back on the 'new' or 'thin' list. The code for this is in enqueue, where if the flow is not on either list it gets put on the tail of the 'new' list.
>
> Now, talking about thin flows being distinguished from thick ones, it
> seems to me that if a flow sends packets at a low enough rate it can in
> principle stay 'thin' indefinitely. So I'm assuming I've missed
> something in the code that allows a flow to stay in the new_flows list
> if it is sufficiently thin. Could someone please point out to me what
> I'm missing? :)
It's the implicit return to the 'new' list.
>
> Thanks,
>
> -Toke
>
> --
> Toke Høiland-Jørgensen
> toke@toke.dk
> _______________________________________________
> Codel mailing list
> Codel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/codel
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Bloat] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-12-02 21:37 ` Toke Høiland-Jørgensen
2012-12-02 21:47 ` Andrew McGregor
@ 2012-12-02 22:07 ` Eric Dumazet
2012-12-02 22:15 ` Toke Høiland-Jørgensen
1 sibling, 1 reply; 56+ messages in thread
From: Eric Dumazet @ 2012-12-02 22:07 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Paolo Valente, codel, cerowrt-devel, bloat, paulmck, John Crispin
On Sun, 2012-12-02 at 22:37 +0100, Toke Høiland-Jørgensen wrote:
> Eric Dumazet <eric.dumazet@gmail.com> writes:
>
> > This can help if you really want to avoid a thick flow sharing a thin
> > flow bucket, but given that all packets are going eventually into the
> > Internet (or equivalent crowded network), its not really a clear win.
>
> I've been trying to grok the fq_codel code by reading through it while
> following the discussion in the article, and I'm having a bit of trouble
> squaring the thin/thick (or "hog"/"non-hog") flow designation of the
> article with the code. As far as I can tell from the code, there are two
> lists, called new_flows and old_flows; and a flow starts out as 'new'
> and stays that way until it has sent a quantum of bytes or codel fails
> to dequeue a packet from it, whereupon it is moved to the end of the
> old_flows list. It then stays in the old_flows list for the rest of its
> "life".
Not at all.
If a cell has :
- a positive credit/deficit
- no more packets to send
Then
if it is part of the old_flows list, we remove it
else we move it from new_flows to old_flows
The algo has a special shortcut to avoid a pass to old_flows if
old_flows is empty, but that's basically:
/* force a pass through old_flows to prevent starvation */
if (head == &q->new_flows)
        list_move_tail(&flow->flowchain, &q->old_flows);
else
        list_del_init(&flow->flowchain);
Next time a packet re-activates the flow (or more exactly the bucket in
hash table), we move it to 'new_flows' only for one quantum.
>
> Now, talking about thin flows being distinguished from thick ones, it
> seems to me that if a flow sends packets at a low enough rate it can in
> principle stay 'thin' indefinitely. So I'm assuming I've missed
> something in the code that allows a flow to stay in the new_flows list
> if it is sufficiently thin. Could someone please point out to me what
> I'm missing? :)
>
I don't know; this new_flows/old_flows logic seems too hard to understand for
most readers of the code.
I saw many people confused by this algo.
Of course, a thin flow stays thin if the bucket can cycle through the
following states forever:
- empty
- packet arrives -> bucket queued to end of new_flows
- packet is dequeued.
- as bucket is empty, move bucket to end of old_flows
- next time we hit this bucket (because we processed all packets from
old_flows), we remove it from the list and its state becomes 'empty'
If the next packet arrives while the bucket is still in old_flows,
we won't put the bucket in new_flows; the bucket has to wait its turn in
the RR list.
Think of it this way: if a bucket lost its turn in the RR mechanism
and became idle, it has its deficit refilled to 'q->quantum', and it has
the right to be elected as a new_flow next time a packet wants to use
this bucket.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Bloat] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-12-02 22:07 ` Eric Dumazet
@ 2012-12-02 22:15 ` Toke Høiland-Jørgensen
2012-12-02 22:30 ` Eric Dumazet
0 siblings, 1 reply; 56+ messages in thread
From: Toke Høiland-Jørgensen @ 2012-12-02 22:15 UTC (permalink / raw)
To: Eric Dumazet
Cc: Paolo Valente, codel, cerowrt-devel, bloat, paulmck, John Crispin
[-- Attachment #1: Type: text/plain, Size: 878 bytes --]
Eric Dumazet <eric.dumazet@gmail.com> writes:
> If the next packet arrives while the bucket is still in old_flows,
> we wont put the bucket in new_flow, its bucket have to wait its turn in
> the RR list.
Right. So in order to stay in the new_flows list, the flow needs to be
slow enough that no new data is added to it in the time it takes
subsequent dequeue invocations to get back to it in the old_flows list?
> Think of it this way : If a bucket lost its turn in the RRR mechanism
> and became idle, it has its deficit refilled to 'q->quantum', and it has
> the right to be elected as a new_flow next time a packet wants to use
> this bucket.
Yes, I think I missed this, and was somehow assuming that the flow stays
"alive" (i.e. in the old_flows list) for as long as the /connection/ is
up... :)
-Toke
--
Toke Høiland-Jørgensen
toke@toke.dk
[-- Attachment #2: Type: application/pgp-signature, Size: 489 bytes --]
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Bloat] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-12-02 22:15 ` Toke Høiland-Jørgensen
@ 2012-12-02 22:30 ` Eric Dumazet
2012-12-02 22:51 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 56+ messages in thread
From: Eric Dumazet @ 2012-12-02 22:30 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Paolo Valente, codel, cerowrt-devel, bloat, paulmck, John Crispin
On Sun, 2012-12-02 at 23:15 +0100, Toke Høiland-Jørgensen wrote:
> Eric Dumazet <eric.dumazet@gmail.com> writes:
>
> > If the next packet arrives while the bucket is still in old_flows,
> > we wont put the bucket in new_flow, its bucket have to wait its turn in
> > the RR list.
>
> Right. So in order to stay in the new_flows list, the flow needs to be
> slow enough that no new data is added to it in the time it takes
> subsequent dequeue invocations to get back to it in the old_flows list?
I guess so. It must travel through the old_flows list from the tail to
the head (so each flow in old_flows must be serviced by one quantum),
to get the right to be declared as 'empty' and gain the honor to get
promoted to 'new_flows' next time a packet is enqueued.
If all flows are slow, but the queue can still build up, because there
are too many flows for the available bandwidth, then all flows are
actually considered as thick, as if we had a single RR queue. (the
old_flows).
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Bloat] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-12-02 22:30 ` Eric Dumazet
@ 2012-12-02 22:51 ` Toke Høiland-Jørgensen
0 siblings, 0 replies; 56+ messages in thread
From: Toke Høiland-Jørgensen @ 2012-12-02 22:51 UTC (permalink / raw)
To: Eric Dumazet
Cc: Paolo Valente, codel, cerowrt-devel, bloat, paulmck, John Crispin
[-- Attachment #1: Type: text/plain, Size: 704 bytes --]
Eric Dumazet <eric.dumazet@gmail.com> writes:
> I guess so. It must travel through the old_flows list from the tail to
> the head (so each flows in old_flows must be serviced by one quantum),
> to get the right to be declared as 'empty' and gain the honor to get
> promoted to 'new_flows' next time a packet is enqueued.
>
> If all flows are slow, but the queue can still build up, because there
> are too many flows for the possible bandwidth, than all flows are
> actually considered as thick, as if we had a single RR queue. (the
> old_flows).
Right, that makes sense. Everything is relative. Thanks for clearing
things up! :)
-Toke
--
Toke Høiland-Jørgensen
toke@toke.dk
[-- Attachment #2: Type: application/pgp-signature, Size: 489 bytes --]
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-11-28 17:20 ` [Codel] " Paul E. McKenney
@ 2012-12-02 23:06 ` Paul E. McKenney
2012-12-03 11:24 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 56+ messages in thread
From: Paul E. McKenney @ 2012-12-02 23:06 UTC (permalink / raw)
To: Jim Gettys
Cc: Paolo Valente, Toke Høiland-Jørgensen, Eric Raymond,
codel, cerowrt-devel, bloat, John Crispin
[-- Attachment #1: Type: text/plain, Size: 300 bytes --]
On Wed, Nov 28, 2012 at 09:20:58AM -0800, Paul E. McKenney wrote:
[ . . . ]
> > When you've done another round on the document, I'll do a more detailed
> > read.
>
> Sounds good! I expect to have another version by the end of the weekend.
And please see attached. Thoughts?
Thanx, Paul
[-- Attachment #2: SFQ2012.12.02a.tgz --]
[-- Type: application/x-gtar-compressed, Size: 399993 bytes --]
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Bloat] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-12-02 21:47 ` Andrew McGregor
@ 2012-12-03 8:04 ` Dave Taht
0 siblings, 0 replies; 56+ messages in thread
From: Dave Taht @ 2012-12-03 8:04 UTC (permalink / raw)
To: Andrew McGregor
Cc: Paolo Valente, Toke Høiland-Jørgensen, codel,
cerowrt-devel, bloat, paulmck, John Crispin
On Sun, Dec 2, 2012 at 10:47 PM, Andrew McGregor <andrewmcgr@gmail.com> wrote:
>
> On 3/12/2012, at 10:37 AM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>
>> Eric Dumazet <eric.dumazet@gmail.com> writes:
>>
>>> This can help if you really want to avoid a thick flow sharing a thin
>>> flow bucket, but given that all packets are going eventually into the
>>> Internet (or equivalent crowded network), its not really a clear win.
>>
>> I've been trying to grok the fq_codel code by reading through it while
>> following the discussion in the article, and I'm having a bit of trouble
>> squaring the thin/thick (or "hog"/"non-hog") flow designation of the
>> article with the code. As far as I can tell from the code, there are two
>> lists, called new_flows and old_flows; and a flow starts out as 'new'
>> and stays that way until it has sent a quantum of bytes or codel fails
>> to dequeue a packet from it, whereupon it is moved to the end of the
>> old_flows list. It then stays in the old_flows list for the rest of its
>> "life".
>
> 'new' is what I was calling 'thin', and 'old' is the 'thick' list.
The phrasing that I use for this stuff is "sparse" rather than "thin",
if that helps any.
TCP mice are VERY sparse as they are a function of RTT, and thus most
web flows on a home router end up staying in the new flow queue.
--
Dave Täht
Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-12-02 23:06 ` Paul E. McKenney
@ 2012-12-03 11:24 ` Toke Høiland-Jørgensen
2012-12-03 11:31 ` Dave Taht
0 siblings, 1 reply; 56+ messages in thread
From: Toke Høiland-Jørgensen @ 2012-12-03 11:24 UTC (permalink / raw)
To: paulmck
Cc: Paolo Valente, Eric Raymond, codel, cerowrt-devel, bloat, John Crispin
[-- Attachment #1: Type: text/plain, Size: 305 bytes --]
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:
> And please see attached. Thoughts?
"In addition, you network driver must be instrumented to support packet
scheduling..."
So what happens if you run fq_codel on a non-BQL driver?
-Toke
--
Toke Høiland-Jørgensen
toke@toke.dk
[-- Attachment #2: Type: application/pgp-signature, Size: 489 bytes --]
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-12-03 11:24 ` Toke Høiland-Jørgensen
@ 2012-12-03 11:31 ` Dave Taht
2012-12-03 12:54 ` Toke Høiland-Jørgensen
` (2 more replies)
0 siblings, 3 replies; 56+ messages in thread
From: Dave Taht @ 2012-12-03 11:31 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Paolo Valente, Eric Raymond, codel, cerowrt-devel, bloat,
paulmck, John Crispin
On Mon, Dec 3, 2012 at 12:24 PM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:
>
>> And please see attached. Thoughts?
>
>"In addition, you network driver must be instrumented to support packet
^^^^ your ethernet network driver
different technologies have different answers. BQL is an answer to
ethernet. ADSL benefits from keeping buffering tied down to nearly
zero and from signalling the actual packet delivery from the hardware.
Wifi, well, I don't want to talk about wifi...
> scheduling..."
>
> So what happens if you run fq_codel on a non-BQL driver?
you have no control. The tx queue rings are flooded before control is
handed back to the fq_codel scheduler. You can get some control back
on a non-BQL driver by reducing the number of tx descriptors
dramatically, but that leads to issues with small vs big packets....
Be worthwhile to plot BQL vs non-BQL on the same driver/device....
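A rough sketch of the knobs involved in such a comparison (paths and
numbers are illustrative; the byte_queue_limits directory only exists on
BQL-enabled drivers):

    ethtool -g eth0        # show the current tx descriptor ring size
    ethtool -G eth0 tx 64  # shrink the tx ring (the non-BQL workaround)
    cat /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit
    echo 3000 > /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit_max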
>
> -Toke
>
> --
> Toke Høiland-Jørgensen
> toke@toke.dk
--
Dave Täht
Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-12-03 11:31 ` Dave Taht
@ 2012-12-03 12:54 ` Toke Høiland-Jørgensen
2012-12-03 14:58 ` Paul E. McKenney
2012-12-03 15:03 ` Paul E. McKenney
2012-12-03 15:58 ` David Woodhouse
2 siblings, 1 reply; 56+ messages in thread
From: Toke Høiland-Jørgensen @ 2012-12-03 12:54 UTC (permalink / raw)
To: Dave Taht
Cc: Paolo Valente, Eric Raymond, codel, cerowrt-devel, bloat,
paulmck, John Crispin
[-- Attachment #1: Type: text/plain, Size: 771 bytes --]
Dave Taht <dave.taht@gmail.com> writes:
> you have no control. The tx queue rings are flooded before control is
> handed back to the fq_codel scheduler. You can get some control back
> on a non-BQL driver by reducing the number of tx descriptors
> dramatically, but that leads to issues with small vs big packets....
Right, so BQL is pretty much a requirement to get anything worthwhile
out of fq_codel? Or is that too strongly put?
Without BQL, exactly how much does the driver queue up? I've heard
someone say 1000 packets for ethernet, but is that in a different
context? E.g. pfifo_fast uses /sys/class/net/*/tx_queue_len for queue
length (I think?), but if the qdisc doesn't, does the driver?
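For reference, these are the places I know to look, from memory, so
possibly incomplete:

    ip link show dev eth0     # txqueuelen, as used by pfifo_fast
    ethtool -g eth0           # tx descriptor ring held by the driver
    ls /sys/class/net/eth0/queues/tx-0/byte_queue_limits/  # only with BQL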
-Toke
--
Toke Høiland-Jørgensen
toke@toke.dk
[-- Attachment #2: Type: application/pgp-signature, Size: 489 bytes --]
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-12-03 12:54 ` Toke Høiland-Jørgensen
@ 2012-12-03 14:58 ` Paul E. McKenney
2012-12-03 15:19 ` Toke Høiland-Jørgensen
2012-12-03 15:49 ` Eric Dumazet
0 siblings, 2 replies; 56+ messages in thread
From: Paul E. McKenney @ 2012-12-03 14:58 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Paolo Valente, Eric Raymond, codel, cerowrt-devel, bloat, John Crispin
On Mon, Dec 03, 2012 at 01:54:35PM +0100, Toke Høiland-Jørgensen wrote:
> Dave Taht <dave.taht@gmail.com> writes:
>
> > you have no control. The tx queue rings are flooded before control is
> > handed back to the fq_codel scheduler. You can get some control back
> > on a non-BQL driver by reducing the number of tx descriptors
> > dramatically, but that leads to issues with small vs big packets....
>
> Right, so BQL is pretty much a requirement to get anything worthwhile
> out of fq_codel? Or is that too strongly put?
If I understand the code correctly, without BQL, FQ-CoDel does not get
invoked at all.
Thanx, Paul
> Without BQL, exactly how much does the driver queue up? I've heard
> someone say 1000 packets for ethernet, but is that in a different
> context? E.g. pfifo_fast uses /sys/class/net/*/tx_queue_len for queue
> length (I think?), but if the qdisc doesn't, does the driver?
>
> -Toke
>
> --
> Toke Høiland-Jørgensen
> toke@toke.dk
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-12-03 11:31 ` Dave Taht
2012-12-03 12:54 ` Toke Høiland-Jørgensen
@ 2012-12-03 15:03 ` Paul E. McKenney
2012-12-03 15:58 ` David Woodhouse
2 siblings, 0 replies; 56+ messages in thread
From: Paul E. McKenney @ 2012-12-03 15:03 UTC (permalink / raw)
To: Dave Taht
Cc: Paolo Valente, Toke Høiland-Jørgensen, Eric Raymond,
codel, cerowrt-devel, bloat, John Crispin
On Mon, Dec 03, 2012 at 12:31:30PM +0100, Dave Taht wrote:
> On Mon, Dec 3, 2012 at 12:24 PM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:
> >
> >> And please see attached. Thoughts?
> >
> >"In addition, you network driver must be instrumented to support packet
> ^^^^ your ethernet network driver
Good catch, fixed!
Thanx, Paul
> different technologies have different answers. BQL is an answer to
> ethernet. ADSL benefits from closely tieing buffering to being nearly
> zero and to signalling the actual packet delivery from the hardware.
>
> Wifi, well, I don't want to talk about wifi...
>
> > scheduling..."
> >
> > So what happens if you run fq_codel on a non-BQL driver?
>
> you have no control. The tx queue rings are flooded before control is
> handed back to the fq_codel scheduler. You can get some control back
> on a non-BQL driver by reducing the number of tx descriptors
> dramatically, but that leads to issues with small vs big packets....
>
> Be worthwhile to plot BQL vs non-BQL on the same driver/device....
>
> >
> > -Toke
> >
> > --
> > Toke Høiland-Jørgensen
> > toke@toke.dk
>
>
>
> --
> Dave Täht
>
> Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-12-03 14:58 ` Paul E. McKenney
@ 2012-12-03 15:19 ` Toke Høiland-Jørgensen
2012-12-03 15:49 ` Eric Dumazet
1 sibling, 0 replies; 56+ messages in thread
From: Toke Høiland-Jørgensen @ 2012-12-03 15:19 UTC (permalink / raw)
To: paulmck
Cc: Paolo Valente, Eric Raymond, codel, cerowrt-devel, bloat, John Crispin
[-- Attachment #1: Type: text/plain, Size: 865 bytes --]
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:
> If I understand the code correctly, without BQL, FQ-CoDel does not get
> invoked at all.
But doesn't fq_codel live at the qdisc layer? How can BQL determine
whether or not fq_codel gets invoked? Or does "invoked" mean something
different in this context?
I can understand how this happens *effectively* by fq_codel just
dequeuing packets at once into the device buffers, thus rendering the
whole exercise moot.
Either way, perhaps spelling this out in the article would be a good
idea so people don't start experimenting with fq_codel and conclude it
doesn't work, when the culprit is really missing BQL support in their
driver? I know that it technically says that now, but perhaps a more
prominent mention might be prudent? :)
-Toke
--
Toke Høiland-Jørgensen
toke@toke.dk
[-- Attachment #2: Type: application/pgp-signature, Size: 489 bytes --]
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-12-03 14:58 ` Paul E. McKenney
2012-12-03 15:19 ` Toke Høiland-Jørgensen
@ 2012-12-03 15:49 ` Eric Dumazet
1 sibling, 0 replies; 56+ messages in thread
From: Eric Dumazet @ 2012-12-03 15:49 UTC (permalink / raw)
To: paulmck
Cc: Paolo Valente, Toke Høiland-Jørgensen, Eric Raymond,
codel, cerowrt-devel, bloat, John Crispin
On Mon, 2012-12-03 at 06:58 -0800, Paul E. McKenney wrote:
> On Mon, Dec 03, 2012 at 01:54:35PM +0100, Toke Høiland-Jørgensen wrote:
> > Dave Taht <dave.taht@gmail.com> writes:
> >
> > > you have no control. The tx queue rings are flooded before control is
> > > handed back to the fq_codel scheduler. You can get some control back
> > > on a non-BQL driver by reducing the number of tx descriptors
> > > dramatically, but that leads to issues with small vs big packets....
> >
> > Right, so BQL is pretty much a requirement to get anything worthwhile
> > out of fq_codel? Or is that too strongly put?
>
> If I understand the code correctly, without BQL, FQ-CoDel does not get
> invoked at all.
>
>
It depends on many factors.
You can set up a qdisc hierarchy of one HTB/TBF and one fq_codel.
Egress rate limiting is a good way to let the queue build up in the qdisc
(in fq_codel).
BQL by itself avoids an overly large queue in the device driver; that's only
a part of the problem.
If your setup is a computer with a 1Gbps ethernet link, and a home router
with a 1Mbps upstream link that you can't access (i.e. cannot install
fq_codel on it), BQL on the 1Gbps link won't help by itself.
It will still allow your computer to send 1Gbps worth of data that will
build a huge queue on the home router.
One way to handle that is to use rate limiting so that a queue does not
build up on the router.
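A minimal sketch of that idea (device and rate made up; the point is to
shape to just below the bottleneck so the queue forms locally, in fq_codel,
instead of in the router):

    tc qdisc add dev eth0 root handle 1: htb default 10
    tc class add dev eth0 parent 1: classid 1:10 htb rate 950kbit
    tc qdisc add dev eth0 parent 1:10 fq_codel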
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-12-03 11:31 ` Dave Taht
2012-12-03 12:54 ` Toke Høiland-Jørgensen
2012-12-03 15:03 ` Paul E. McKenney
@ 2012-12-03 15:58 ` David Woodhouse
2012-12-04 3:13 ` Dan Siemon
2 siblings, 1 reply; 56+ messages in thread
From: David Woodhouse @ 2012-12-03 15:58 UTC (permalink / raw)
To: Dave Taht
Cc: Paolo Valente, Toke Høiland-Jørgensen, Eric Raymond,
codel, cerowrt-devel, bloat, paulmck, John Crispin
[-- Attachment #1: Type: text/plain, Size: 2463 bytes --]
On Mon, 2012-12-03 at 12:31 +0100, Dave Taht wrote:
> ADSL benefits from closely tieing buffering to being nearly
> zero and to signalling the actual packet delivery from the hardware.
Er, does it?
ADSL is basically just ATM with a strange PHY. You have a bunch of
options for how you use this ATM link. Mostly it's RFC2364 PPP-over-ATM
or it's PPPoE on top of RFC2684 Ethernet-over-ATM.
In about the 3.4 kernel I fixed PPPoATM to limit its queue depth to 2
packets, so the PPP netdev queue/qdisc is *fairly* much in control. And
thus I think fq_codel should work, even if we don't have BQL on PPP?
In net-next a couple of days ago, I fixed the BR2684 driver to limit
*its* queue depth to 2 packets too, so there, there's "only" the
txqueuelen on the 'nas0' virtual Ether-on-ATM device, which defaults to
100 packets, below the PPP netdev. If you cut that down to single
figures, perhaps fq_codel stands a chance on PPPoEoA too?
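Something along those lines, untested and with a made-up queue length,
might be:

    ip link set dev nas0 txqueuelen 4
    tc qdisc add dev ppp0 root fq_codel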
I'd actually like to track those packets all the way down, and make BQL
work on PPP devices generically — by installing a skb destructor so that
we get notified when the packet *actually* leaves the box, however many
layers of tunnelling it goes through. What we have so far is helpful,
but surely BQL will still make things better?
There's also straight IP on that virtual Ethernet interface too, which
is rarer. And then the nas0 interface would be the one with your default
route, and you'd be using fq_codel directly on that. So it should be OK.
I have posted an RFC patch to the netdev list which adds BQL to BR2684
virtual Ethernet devices, but it's "interesting" because the ATM driver
normally prepends a header before handing it to the ATM device... which
means that the packet we account as *completed* is larger than the
packet we account as *sending*... and the BQL code panics the box when
that happens. I have a workaround, which seems mostly OK.
However, none of this helps much in the case where the PPPoE part and
the Ethernet-over-ATM part are done on separate boxes; a separate ADSL
"modem" and then your own box doing PPPoE to it. In that case, you just
hope that the "modem" does some kind of flow control... is that even
possible for PPPoE or would it have to use pause frames at the Ethernet
level?
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6171 bytes --]
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-12-03 15:58 ` David Woodhouse
@ 2012-12-04 3:13 ` Dan Siemon
2012-12-05 0:01 ` Sebastian Moeller
[not found] ` <1354613026.72238.YahooMailNeo@web126202.mail.ne1.yahoo.com>
0 siblings, 2 replies; 56+ messages in thread
From: Dan Siemon @ 2012-12-04 3:13 UTC (permalink / raw)
To: bloat, codel, cerowrt-devel
On Mon, 2012-12-03 at 15:58 +0000, David Woodhouse wrote:
> ADSL is basically just ATM with a strange PHY. You have a bunch of
> options for how you use this ATM link. Mostly it's RFC2364 PPP-over-ATM
> or it's PPPoE on top of RFC2684 Ethernet-over-ATM.
Speaking of xDSL, does anyone on the list happen to have a good
understanding of how much per-packet overhead there is on VDSL2? I've
been tweaking the buffering and shaping on my upstream link and noticed
unexpected behavior with small packets.
The link below (use wayback machine version) has a good description of
per-packet overhead for various forms of ADSL but I haven't found
something similar for more modern DSL variants.
http://www.adsl-optimizer.dk/thesis/
http://web.archive.org/web/20090422131547/http://www.adsl-optimizer.dk/thesis/
I started a discussion on DSLReports
http://www.dslreports.com/forum/r27565251-Internet-Per-packet-overhead-on-Bell-s-VDSL-ATM-based-
but experimentally the overhead discussed there doesn't appear to be
correct
http://www.coverfire.com/archives/2012/11/29/per-packet-overhead-on-vdsl2/
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review
2012-12-04 3:13 ` Dan Siemon
@ 2012-12-05 0:01 ` Sebastian Moeller
[not found] ` <1354613026.72238.YahooMailNeo@web126202.mail.ne1.yahoo.com>
1 sibling, 0 replies; 56+ messages in thread
From: Sebastian Moeller @ 2012-12-05 0:01 UTC (permalink / raw)
To: Dan Siemon; +Cc: codel, cerowrt-devel, bloat
Hi Dan,
Silly question: are you sure your ISP actually delivers PTM-TC instead of ATM? (Should be easy to check; an ATM carrier will show a "quantized" increase in ping time, stepping up only once per additional 48 bytes of payload, but I guess you know more about these things than I do.) Also, what about simply increasing the overhead in your shaper empirically until the anomaly goes away, to figure out whether this is just a misjudged per-packet overhead? The exact overhead value that makes it disappear might then give a clue about what is happening there…
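A quick sweep along these lines (just a sketch, with a placeholder gateway
address) should show the 48-byte staircase if the carrier really is ATM:
# average RTT over a range of payload sizes; on an ATM carrier the average
# steps up once per extra 48 bytes instead of rising smoothly
for size in $(seq 16 4 208); do
    avg=$(ping -c 10 -s $size 192.0.2.1 | tail -1 | cut -d/ -f5)
    echo "$size $avg"
done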
best
Sebastian
On Dec 3, 2012, at 19:13 , Dan Siemon wrote:
> On Mon, 2012-12-03 at 15:58 +0000, David Woodhouse wrote:
>> ADSL is basically just ATM with a strange PHY. You have a bunch of
>> options for how you use this ATM link. Mostly it's RFC2364 PPP-over-ATM
>> or it's PPPoE on top of RFC2684 Ethernet-over-ATM.
>
> Speaking of xDSL, does anyone on the list happen to have a good
> understanding of how much per-packet overhead there is on VDSL2? I've
> been tweaking the buffering and shaping on my upstream link and noticed
> unexpected behavior with small packets.
>
> The link below (use wayback machine version) has a good description of
> per-packet overhead for various forms of ADSL but I haven't found
> something similar for more modern DSL variants.
> http://www.adsl-optimizer.dk/thesis/
> http://web.archive.org/web/20090422131547/http://www.adsl-optimizer.dk/thesis/
>
> I started a discussion on DSLReports
> http://www.dslreports.com/forum/r27565251-Internet-Per-packet-overhead-on-Bell-s-VDSL-ATM-based-
> but experimentally the overhead discussed there doesn't appear to be
> correct
> http://www.coverfire.com/archives/2012/11/29/per-packet-overhead-on-vdsl2/
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Bloat] [Cerowrt-devel] FQ_Codel lwn draft article review
[not found] ` <1354613026.72238.YahooMailNeo@web126202.mail.ne1.yahoo.com>
@ 2012-12-05 3:41 ` Dan Siemon
[not found] ` <1354739624.4431.YahooMailNeo@web126205.mail.ne1.yahoo.com>
0 siblings, 1 reply; 56+ messages in thread
From: Dan Siemon @ 2012-12-05 3:41 UTC (permalink / raw)
To: Alex Burr; +Cc: codel, cerowrt-devel, bloat
On Tue, 2012-12-04 at 01:23 -0800, Alex Burr wrote:
> [Oops, intended to CC the list.]
>
> An extra 15-20ms of latency, which is what you're seeing there,
> shouldn't be caused by packet overhead in the actual VDSL modem part of
> the device. Worst case the implementation of the PTM-TC layer (the
> equivalent of ATM) might round the packet up to a multiple of 64 bytes
> (although it's not supposed to), but unless your line rate is 25kbps,
> that should not cause 20ms of latency.
>
> VDSL2 operates at a 4 kHz or
> 8 kHz symbol rate, and while it contains an interleaving or
> retransmission layer which can add up to 64ms in its most high-latency
> configuration, those layers do not operate at the level of packets and
> I'm pretty sure that latency cannot be affected by packet size in a
> conformant implementation.
> This latency is being caused by something else.
>
> The
> best way to figure out the per-packet overhead in your VDSL2 modem is
> probably to count the number of packets that get through, not measure
> latency.
Thanks for the info. I guess I'll have to keep digging to figure out
where the latency comes from.
I did a couple more experiments which appear to confirm the large amount
of per-packet overhead:
http://www.coverfire.com/archives/2012/12/04/per-packet-overhead-on-vdsl2-2/
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Codel] [Bloat] [Cerowrt-devel] FQ_Codel lwn draft article review
[not found] ` <1354739624.4431.YahooMailNeo@web126205.mail.ne1.yahoo.com>
@ 2012-12-06 4:12 ` Dan Siemon
0 siblings, 0 replies; 56+ messages in thread
From: Dan Siemon @ 2012-12-06 4:12 UTC (permalink / raw)
To: Alex Burr; +Cc: codel, cerowrt-devel, bloat
On Wed, 2012-12-05 at 12:33 -0800, Alex Burr wrote:
> > From: Dan Siemon <dan@coverfire.com>
>
>
> > Thanks for the info. I guess I'll have to keep digging to figure out
> > where the latency comes from.
> >
> > I did a couple more experiments which appear to confirm the large amount
> > of per-packet overhead:
> > http://www.coverfire.com/archives/2012/12/04/per-packet-overhead-on-vdsl2-2/
> >
>
> It almost looks like you're being limited to 5k packets/sec. Now, I
> know that some devices will only support a certain packet rate, and
> it's not as stupid as it sounds because it's fairly rare to want to
> send at max rate with solely minimum-size packets. But I wouldn't expect it
> here, because I would expect a VDSL2 device to work with 100Mbps down,
> 50Mbps up, even if your contract and line only support much less. And
> the device would need to support more than 5k packets/sec to get
> 50Mbps, even at the biggest packet size. I guess you might be able to
> tell by plotting packet rate against packet size with more values of
> packet size: if it's a rate limit, there will be a sharp corner,
> whereas if it's overhead, there should be more of a curve. The
> figure for the 75-byte payload suggests a curve.
I ran the same test with more small packet sizes:
http://www.coverfire.com/archives/2012/12/05/per-packet-overhead-on-vdsl-3/
This looks like it supports the PPS limit theory.
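For what it's worth, that kind of data can be gathered with a netperf
UDP_STREAM sweep over several message sizes (SERVER is a placeholder for a
host on the far side of the DSL link; dividing the reported throughput by
the message size gives packets/sec):
# a roughly constant packet rate across sizes points to a pps limit,
# while throughput that scales with size/(size+k) points to a fixed
# per-packet overhead
for size in 64 128 256 512 1024 1472; do
    netperf -H $SERVER -t UDP_STREAM -l 10 -- -m $size
done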
> PTM-TC has a minimum overhead of 2 bytes/packet, IIRC. (I'm not
> counting the 65/64 cell overhead, because that is not (in a good
> implementation) aligned to packet boundaries and is therefore better
> thought of as a 1/65 tax on your line rate). I'm not optimistic about
> tuning shapers based on these details, however, as unlike ATM/AAL5,
> implementations are allowed to insert any amount of padding between
> packets, so the per packet overhead is not predictable from the
> standard. There are also nonstandard framing implementations.
That's disappointing news.
My VDSL2 modem (Alcatel Cellpipe 7130) shows:
Upstream line rate: 7344 kbps
Bearer Upstream payload rate: 6560 kbps
Can you shed some light on what causes the difference? Is it the 65/64
encoding and error correction overhead? I assume this does not take into
account things like the 802.3 header.
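(Back-of-the-envelope only: the 65/64 framing by itself would account for
about 7344/65, i.e. roughly 113 kbps, while the gap above is 7344 - 6560 =
784 kbps, so taken at face value most of the difference would have to come
from FEC and other framing overhead rather than the 65/64 tax alone.)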
Thanks
^ permalink raw reply [flat|nested] 56+ messages in thread
end of thread, other threads:[~2012-12-06 4:12 UTC | newest]
Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <CAA93jw5yFvrOyXu2s2DY3oK_0v3OaNfnL+1zTteJodfxtAAzcQ@mail.gmail.com>
2012-11-23 8:57 ` [Codel] FQ_Codel lwn draft article review Dave Taht
2012-11-23 22:18 ` Paul E. McKenney
2012-11-24 0:07 ` Toke Høiland-Jørgensen
2012-11-24 16:19 ` Dave Taht
2012-11-24 16:36 ` [Codel] [Cerowrt-devel] " dpreed
2012-11-24 19:57 ` [Codel] " Andrew McGregor
2012-11-26 21:13 ` Rick Jones
2012-11-26 21:19 ` Dave Taht
2012-11-26 22:16 ` Toke Høiland-Jørgensen
2012-11-26 23:21 ` Toke Høiland-Jørgensen
2012-11-26 23:39 ` [Codel] [Cerowrt-devel] " dpreed
2012-11-26 23:58 ` Toke Høiland-Jørgensen
2012-11-26 17:20 ` [Codel] " Paul E. McKenney
2012-11-26 21:05 ` Rick Jones
2012-11-26 23:18 ` [Codel] [Bloat] " Rick Jones
2012-11-27 22:03 ` [Codel] [Cerowrt-devel] " Jim Gettys
2012-11-27 22:31 ` [Codel] [Bloat] " David Lang
2012-11-27 22:54 ` Paul E. McKenney
2012-11-27 23:15 ` Andrew McGregor
2012-11-28 0:51 ` Paul E. McKenney
2012-11-28 17:36 ` Paul E. McKenney
2012-11-28 14:06 ` [Codel] [Cerowrt-devel] [Bloat] " Michael Richardson
2012-11-27 22:49 ` [Codel] [Cerowrt-devel] " Paul E. McKenney
2012-11-27 23:53 ` Greg White
2012-11-28 0:27 ` Paul E. McKenney
2012-11-28 3:43 ` Kathleen Nichols
2012-11-28 4:38 ` Paul E. McKenney
2012-11-28 16:01 ` Paul E. McKenney
2012-11-28 16:16 ` Jonathan Morton
2012-11-28 17:44 ` Paul E. McKenney
2012-11-28 18:37 ` [Codel] [Bloat] " Michael Richardson
2012-11-28 18:51 ` Eric Dumazet
2012-11-28 21:44 ` Michael Richardson
2012-11-28 19:00 ` Eric Dumazet
2012-12-02 21:37 ` Toke Høiland-Jørgensen
2012-12-02 21:47 ` Andrew McGregor
2012-12-03 8:04 ` Dave Taht
2012-12-02 22:07 ` Eric Dumazet
2012-12-02 22:15 ` Toke Høiland-Jørgensen
2012-12-02 22:30 ` Eric Dumazet
2012-12-02 22:51 ` Toke Høiland-Jørgensen
2012-11-28 17:20 ` [Codel] " Paul E. McKenney
2012-12-02 23:06 ` Paul E. McKenney
2012-12-03 11:24 ` Toke Høiland-Jørgensen
2012-12-03 11:31 ` Dave Taht
2012-12-03 12:54 ` Toke Høiland-Jørgensen
2012-12-03 14:58 ` Paul E. McKenney
2012-12-03 15:19 ` Toke Høiland-Jørgensen
2012-12-03 15:49 ` Eric Dumazet
2012-12-03 15:03 ` Paul E. McKenney
2012-12-03 15:58 ` David Woodhouse
2012-12-04 3:13 ` Dan Siemon
2012-12-05 0:01 ` Sebastian Moeller
[not found] ` <1354613026.72238.YahooMailNeo@web126202.mail.ne1.yahoo.com>
2012-12-05 3:41 ` [Codel] [Bloat] " Dan Siemon
[not found] ` <1354739624.4431.YahooMailNeo@web126205.mail.ne1.yahoo.com>
2012-12-06 4:12 ` Dan Siemon
2012-11-30 1:09 ` Dan Siemon