* Please enter issues into the issue tracker - Issue system organisation needed.
From: Jim Gettys @ 2011-02-24 14:19 UTC (permalink / raw)
To: bloat-devel, bloat
We have lots of different issues to track. We are uncovering more and
more with time, and the responsibility for the issues is all over the
Internet ecology.
These issues include drivers in multiple operating systems, queue
disciplines, OS distribution problems, broken networks, broadband gear,
ISPs with broken configurations, routers with broken configurations,
etc, etc, etc. Many of the responsible organizations are completely
unaware they have issues at the moment, and when they do wake up, they
need to have a work list. Serious as bufferbloat is, and generating
tremendous support costs as it does, it is hidden in most
organisations' issue tracking as obscure, hard-to-explain problems that
have heretofore defied analysis.
I think that for the sanity of the upstream open source projects, the
companies that depend on them, the commercial software and hardware
vendors, and ourselves, it's time to start keeping track of these problems.
A simple example is in the following mail, where Juliusz identified a
bunch of Linux drivers with problems communicating back-pressure.
https://lists.bufferbloat.net/pipermail/bloat/2011-February/000036.html
These driver bugs, of course, can and will be worked upstream in the
project and/or responsible organisation; but from a practical point of
view, these issues aren't really going to be fixed until people can
actually take action on their own (by upgrading affected OSes, routers,
broadband gear, etc. as appropriate).
So I think we need to track bufferbloat issues in a somewhat different
way (and maybe with a somewhat different work flow) than a usual
tracking system does.
First
=====
I think we need to capture what we know. I encourage people to start
entering issues in the bloat tracker found at:
http://www.bufferbloat.net/projects/bloat/issues/new
Note that Redmine lets us move issues from one (sub)project to another,
so we're best off capturing what we know immediately; we can sort and
redeal later.
Note: "We're all bozos on this glass bus, no stones allowed". We know
there are problems all over; issue descriptions should always be polite
and constructive, please!
Noting these issues will help people already involved (the mailing list
had over 120 people from a large number of organisations the last time
I looked) take concrete action. Issues buried in mail threads are too
easy to lose.
Second
======
As this effort grows, we'll need to organise the result and delegate it
appropriately.
Today, we're probably best off with a single project, but we certainly
expect that won't remain reasonable for long, possibly almost immediately.
We installed Redmine in particular because it has a competent issue
tracking system as well as good (sub)project management, which can
easily be delegated to others (one of the huge problems with Bugzilla
or Trac is the lack of project management).
If anyone is looking for a way to help bufferbloat and has experience
with tracking systems on large, complex projects, I'd love to see
someone organise this effort, and put some thought and structure into
the categories, (sub)projects and work flow of issue states. I know from
my OLPC experience just how important this can be, though this is a
somewhat different situation.
Best regards,
- Jim
* Re: [Bloat] Please enter issues into the issue tracker - Issue system organisation needed.
From: Fred Baker @ 2011-02-24 15:00 UTC (permalink / raw)
To: Jim Gettys; +Cc: bloat-devel, bloat
Thanks, Jim.
One thing that would help me: I have been a fan of RFC 2309 and RFC 3168 for some time. I suspect that between them any given queue should be manageable to a set depth; tests I have run suggest that with RED settings, average queue depth under load approximates min-threshold pretty closely, and ECN has the advantage that it manages to do so without dropping traffic. I suspect that this community's efforts will support that. Some thoughts:
First, if the premise is wrong or there is a materially better solution, I'm all ears.
Second, if the premise is correct, I'd like data that I can put in front of people to get them to configure it.
Third, there is a long-standing debate between Van and Sally on what units to use with min-threshold. Sally argues, or argued, in favor of byte count, as that correlates with time and biases mark/drop toward large datagrams, which is to say datagrams carrying data - which happen to be the datagrams that act as signals to Reno et al. Van argues, or argued, in favor of buffers, as what is being managed is the router's buffer resource. In our implementations, we provide for both options, and personally, Sally's model makes more mathematical sense to me. Is there a "best practice" we can document, particularly regarding min-threshold and max-threshold settings?
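For concreteness, here is roughly what byte-based thresholds look like
with Linux's tc red qdisc (a sketch only; the interface, bandwidth and
byte values are illustrative assumptions, not a recommended practice):

  # RED with byte-based min/max thresholds, ECN marking enabled
  tc qdisc add dev eth0 root red limit 400000 min 30000 max 90000 \
      avpkt 1000 burst 50 ecn bandwidth 10mbit probability 0.02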
In private email, I shared an approach that might make tuning a little more reliable and not require a max-threshold. If there is material being developed - an updated version of RED-Lite, or experience with other approaches - anything that would allow us to make the AQM algorithm self-tuning would be of great interest. Any such self-tuning algorithm should be usable with dropping or marking, should keep the line operating at full utilization as long as there is traffic to send (i.e., not depend on the line occasionally going idle), maintain the queue at a "reasonably low delay" level under normal circumstances, not force a given session to shut down entirely, and not cause multiple drops on the same session within the same RTT in the normal case.
There is one special case that I have wondered about from time to time: the impact of loss of SYNs or SYN-ACKs. The network I started thinking about this in was an African network that was seriously underprovisioned (they needed to, and eventually did, spend more money) on a satcom link. In essence, I wondered if there was a way to permit the first or second retransmission of a SYN, as opposed to the initial one, to get through in times of heavy load. The effect might be to let an existing session quiesce. That falls under "research" :-)
We have issues with at least some of our hardware in this; on the GSR, for example, queues are on the output card but IP processing is on the input card, meaning that we have lost all IP-related information by the time one would like to set ECN CE or inspect the DSCP value, and on the input card we have no real-time (microsecond-scale) way to inspect queue depth or integrated rate of a queue on the output card. The GSR is a mite elderly, but still widely used, and no, folks aren't going to replace cards at this stage in its life. So, ideas people have on working around such issues would be of interest.
* Re: [Bloat] Please enter issues into the issue tracker - Issue system organisation needed.
From: Jim Gettys @ 2011-02-24 16:32 UTC (permalink / raw)
To: Fred Baker; +Cc: Van Jacobson, Kathleen Nichols, bloat-devel, bloat
On 02/24/2011 10:00 AM, Fred Baker wrote:
> Thanks, Jim.
>
> One thing that would help me: I have been a fan of RFC 2309 and RFC 3168 for some time. I suspect that between them any given queue should be manageable to a set depth; tests I have run suggest that with RED settings, average queue depth under load approximates min-threshold pretty closely, and ECN has the advantage that it manages to do so without dropping traffic. I suspect that this community's efforts will support that. Some thoughts:
>
> First, if the premise is wrong or there is a materially better solution, I'm all ears.
I certainly agree!
The conversations I've had with Van (as I wrote up in my blog at
http://gettys.wordpress.com/2010/12/17/red-in-a-different-light/),
however, are that classic RED 93 has no chance of solving the problems
we face in home routers and broadband, both because of its tuning
problems and because of the high dynamic range of goodput and the
greatly variable kinds of traffic (unlike the aggregated traffic in
core routers, where classic RED has been effective).
Certainly those network operators who can enable AQM but have not
should do so: the lack is hurting corporate networks, some ISPs and the
broadband deployment, as shown by "Characterizing Residential Broadband
Networks" (http://broadband.mpi-sws.org/residential) by Dischinger et
al. As for data, the lack of tools hurts. Smokeping inside ALU made me
suspect we were running without AQM internally (from spiky latency); I
confirmed this by talking with our IT department (we do sophisticated
classification for VoIP, etc.). As Windows XP retires, this will become
much more of an issue, as single machines will be able to saturate
pretty much any path using a single TCP connection.
But it's clear that RED 93 isn't enough for everything, and its
shortcomings are in large part why many network operators have not
enabled it.
So we need better AQM algorithms and extensive testing: as you may have
seen, SFB just went into the Linux mainline this morning.
I'd like it if Kathie and Van would get their "RED in a different light"
paper done ASAP, with its nRED algorithm, so we can try it out as well.
I've seen a somewhat later version than the one that escaped onto the
network, but it's not quite ready for public consumption.
As far as ECN goes, Steve Bauer and Robert Beverly have been studying
ECN deployment
(http://gettys.wordpress.com/2011/02/22/caida-workshop-aims-2011-bauer-and-beverly-ecn-results/);
I'm encouraged that it seems to be finally deploying, but we need better
tools to debug the remaining broken hardware and networks. I'm hoping
we can start using ECN in some environments (e.g. handsets) immediately,
while taking a more guarded view about being able to use it everywhere.
We'll know more as they get further into that study.
So I think there is a BCP that needs to exist (and needs frequent
updating as better queue management algorithms deploy) to help people
understand what they can and should do immediately, and what to do in
different circumstances as better AQM algorithms are implemented and
deployed and ECN deploys.
I've toyed with the idea of a bufferbloat BOF at the Prague IETF. Your
opinion would be valued here.
>
> Second, if the premise is correct, I'd like data that I can put in front of people to get them to configure it.
For a lower bound on the mess we are in, the Netalyzr data is the best
I've seen for end users.
For broadband head ends, the Dischinger et al. paper is the best I've seen.
Smokeping is really wonderful for monitoring and detecting potential
bufferbloat, but most don't even know of its existence. (Anyone want to
help set Smokeping up on bufferbloat.net? The Smokeping instance at
DSLReports is now quite dated.)
The best path diagnostic tool I'm aware of right now is PingPlotter; we
need more (and freely available) tools for diagnosing bufferbloat in
both the public Internet and private networks. mtr on Linux is a step
or two above old-fashioned traceroute and ping. Steve Bauer knows how
to modify traceroute/mtr to give us better ECN diagnosis. I'd love it
if someone would take mtr under their wing and push it forward for both
bufferbloat detection and ECN testing.
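In the meantime, even stock mtr can show where the queue is building if
you run it while loading the link; a sketch (host and file names are
placeholders):

  # terminal 1: saturate the uplink
  scp big-file.bin user@some-server:/tmp/
  # terminal 2: per-hop latency report; the bloated hop's numbers balloon
  mtr --report --report-cycles 100 some-server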
Dave Clark arranged for the FCC tests (SamKnows) to contain a "latency
under load" test; we've also talked to the Ookla (Speedtest.net) folks,
who are interested in adding a test once they've finished their current
infrastructure rollout.
But that still leaves us with poor tools; detecting that you have a
problem while you are provoking it is pretty easy, but right now problem
reports can't easily be traced to the bottleneck hop, given the poor
tools in inexpert hands; and we have problems everywhere from base host
OSes, to home routers, to broadband gear, to some ISPs, to 3G wireless.
And some of these networks are complex aggregates: in 3G, for example,
I know that not only can RNCs be issues, but so can handsets (due to
host bufferbloat) and the backhaul networks, if they are not running AQM.
Thankfully some of these problems are really easy for anyone to
demonstrate, as I did early in my blog sequence (e.g. host Ethernet and
wireless), and the engineers can see the results directly, to their
personal benefit. I hope this will go a long way toward making
believers of people.
But if we don't get better diagnostic tools quickly, the lack will
generate significantly higher support costs for ISPs; this seems bad to
me, when much of the problem is inside homes and outside ISPs' control.
>
> Third, there is a long-standing debate between Van and Sally on what units to use with min-threshold. Sally argues, or argued, in favor of byte count, as that correlates with time and biases mark/drop toward large datagrams, which is to say datagrams carrying data - which happen to be the datagrams that act as signals to Reno et al. Van argues, or argued, in favor of buffers, as what is being managed is the router's buffer resource. In our implementations, we provide for both options, and personally, Sally's model makes more mathematical sense to me. Is there a "best practice" we can document, particularly regarding min-threshold and max-threshold settings?
I'm really not the right person to ask; queue management has never been
my area. I've been a (network-based) UI guy who stumbled into the
problem, realising that the issues I was seeing were the kiss of death
for a large class of applications (particularly the class of apps I get
paid to worry about...). I knew just enough to know that what I saw was
broken, and knew the right people to go ask about it.
>
> In private email, I shared an approach that might make tuning a little more reliable and not require a max-threshold. If there is material being developed - an updated version of RED-Lite, or experience with other approaches - anything that would allow us to make the AQM algorithm self-tuning would be of great interest. Any such self-tuning algorithm should be usable with dropping or marking, should keep the line operating at full utilization as long as there is traffic to send (i.e., not depend on the line occasionally going idle), maintain the queue at a "reasonably low delay" level under normal circumstances, not force a given session to shut down entirely, and not cause multiple drops on the same session within the same RTT in the normal case.
>
We know of the following possibilities:
o the SFB stuff
o the work Van pointed us at for 802.11, which people are implementing in
Linux as we speak (see http://www.hamilton.ie/tianji_li/buffersizing.html)
o your suggestion, if you share it publicly
o the nRED stuff of Kathie and Van, once they can get us a consistent
document to work from
o other possibilities we haven't heard of yet
o evil TCP tricks, to control window sizes (a crude sketch of this last
item follows below)
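As one crude example of the last item, Linux can clamp the TCP window
advertised over a given route, which bounds the data in flight no
matter how big the buffers along the path are. A sketch only; the
gateway address and window value are illustrative assumptions:

  # advertise at most ~64KB of window to hosts reached via this route
  ip route change default via 192.168.1.1 dev eth0 window 65535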
I think we need to play with all of these to sort through what really
works. The testing-at-scale part is going to be the most difficult
problem (having dealt with this at OLPC, and essentially failed to
appreciate the difficulty of diagnosing systems when testing at scale,
the scars are still fresh on my back).
Kathie's warning I posted at
http://gettys.wordpress.com/2011/02/10/goings-on-at-bufferbloat-net/
needs to be taken to heart by all. Some of this is easy (the gross
bufferectomies), but some is very subtle stuff indeed.
We also believe/know:
o classic RED 93 won't work in many of the environments suffering
really badly from bloat today.
o RED 93 and the like *should* be configured everywhere we can
productively do so.
And we don't yet know if ECN can be turned on, though we may know soon.
> There is one special case that I have wondered about from time to time: the impact of loss of SYNs or SYN-ACKs. The network I started thinking about this in was an African network that was seriously underprovisioned (they needed to, and eventually did, spend more money) on a satcom link. In essence, I wondered if there was a way to permit the first or second retransmission of a SYN, as opposed to the initial one, to get through in times of heavy load. The effect might be to let an existing session quiesce. That falls under "research" :-)
Yup. There is lots of research to do; we should start tracking these
items in the tracker as well, to help focus the effort and help funding
and organisation.
>
> We have issues with at least some of our hardware in this; on the GSR, for example, queues are on the output card but IP processing is on the input card, meaning that we have lost all IP-related information by the time one would like to set ECN CE or inspect the DSCP value, and on the input card we have no real-time (microsecond-scale) way to inspect queue depth or integrated rate of a queue on the output card. The GSR is a mite elderly, but still widely used, and no, folks aren't going to replace cards at this stage in its life. So, ideas people have on working around such issues would be of interest.
>
>
You, along with most or all of the industry. We're all in a very big
bus together...
Sometimes the hardware will be impossible to fully fix; often firmware
upgrades can help, but just as often the firmware is so old that no one
can fix it any more. Sometimes one can mitigate the headaches, as we're
doing on home broadband by shaping traffic to make sure the buffers
don't fill (or with similar network configuration tricks), to ensure
the bottleneck is not in the offending, unfixable equipment until it
can be fixed.
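(For the home broadband case, this is the classic wondershaper-style
recipe: rate-limit egress to just below the uplink speed so the modem's
buffer never fills. A sketch; eth0 and the 900 kbit figure stand in for
your interface and "slightly below your measured uplink rate":

  tc qdisc add dev eth0 root handle 1: htb default 10
  tc class add dev eth0 parent 1: classid 1:10 htb rate 900kbit ceil 900kbit
  tc qdisc add dev eth0 parent 1:10 handle 10: sfq perturb 10
)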
- Jim
* Re: [Bloat] Please enter issues into the issue tracker - Issue system organisation needed.
From: Eric Dumazet @ 2011-02-24 17:08 UTC (permalink / raw)
To: Jim Gettys; +Cc: Van Jacobson, bloat-devel, bloat
On Thursday, 24 February 2011 at 11:32 -0500, Jim Gettys wrote:
> So we need better AQM algorithms and extensive testing: as you may have
> seen, SFB just went into the Linux mainline this morning.
>
Indeed ;)
> mtr on Linux is a step or
> two above old fashioned traceroute and ping. Steve Bauer knows how to
> modify traceroute/mtr to give us better ECN diagnosis. I'd love it if
> someone would take mtr under their wing and push it forward for both
> bufferbloat detection and ECN testing.
Not sure I understand here.
mtr could send ECN-enabled frames to detect ECN blackholes, but since
ICMP replies won't carry any ECN information, I can't see the added value?
* Re: [Bloat] Please enter issues into the issue tracker - Issue system organisation needed.
From: Dave Täht @ 2011-02-24 18:31 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Van Jacobson, bloat-devel, bloat
Eric Dumazet <eric.dumazet@gmail.com> writes:
> On Thursday, 24 February 2011 at 11:32 -0500, Jim Gettys wrote:
>
>> So we need better AQM algorithms and extensive testing: as you may have
>> seen, SFB just went into the Linux mainline this morning.
>>
>
> Indeed ;)
I would really like a sane implementation of some basic tc scripts for
SFB and CHOKe to be out there.
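Something like the following would do as a starting point (untested,
and the CHOKe parameter names and values are my guesses; both qdiscs
are new enough that the exact knobs may change):

  # SFB with its defaults as the root qdisc
  tc qdisc add dev eth0 root sfb
  # or CHOKe, which wants RED-style parameters
  tc qdisc add dev eth0 root choke limit 1000 bandwidth 10mbit \
      min 30 max 90 burst 40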
Dan Siemon's recent set of bufferbloat-related tests
http://www.coverfire.com/archives/2011/02/21/network-latency-experiments/
appeared to invoke issues between multiple queuing disciplines when
using SFB.
Now that SFB is in the kernel, and CHOKe as well, I've merged
support for CHOKe and SFB into this git repo for iproute2 and the tc
traffic control tool.
See:
https://github.com/dtaht/iproute2bufferbloat
There is a debloat-testing Linux kernel repo up at:
http://git.infradead.org/debloat-testing.git
It builds, but I haven't booted into it yet. The patch set is pretty
minor; if you already have a Linux 2.6 tree, it pays to use:
git clone --reference your_existing_tree git://git.infradead.org/debloat-testing.git
(Doesn't have SFB yet, either. John?)
--
Dave Taht
http://nex-6.taht.net
* Re: [Bloat] Please enter issues into the issue tracker - Issue system organisation needed.
From: Eric Dumazet @ 2011-02-24 19:29 UTC (permalink / raw)
To: Dave Täht; +Cc: Van Jacobson, bloat-devel, bloat
On Thursday, 24 February 2011 at 11:31 -0700, Dave Täht wrote:
> I would really like a sane implementation of some basic tc scripts for
> SFB and CHOKe to be out there.
>
> Dan Siemon's recent set of bufferbloat-related tests
>
> http://www.coverfire.com/archives/2011/02/21/network-latency-experiments/
>
> appeared to invoke issues between multiple queuing disciplines when
> using SFB.
>
Hmm, hard to say, because I personally use SFQ a lot, on ingress and
egress. It's really good for many uses, but you also need a shaper
above SFQ, of course!
Notes:
- We had some bugs in SFQ that I fixed two months ago; take care!
- It's important to set TSO off (ethtool -K eth0 tso off), or else we
send big packets (up to 64 Kbytes), and this used to break SFQ fairness.
This can really hurt the latencies of interactive flows.
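Something like this is what I mean (eth0 and the 10 Mbit rate are
placeholders; TBF takes SFQ as its child qdisc):

  # let SFQ see MTU-sized packets
  ethtool -K eth0 tso off
  # a shaper above SFQ: TBF at the link rate, SFQ underneath it
  tc qdisc add dev eth0 root handle 1: tbf rate 10mbit burst 10k latency 50ms
  tc qdisc add dev eth0 parent 1:1 handle 10: sfq perturb 10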
Extract of the SFQ changelog:
commit 18c8d82ae5b802c5d82e0dfbcc08b1b568955f46
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri Dec 31 12:48:55 2010 -0800
sfq: fix slot_dequeue_head()
slot_dequeue_head() should make sure slot skb chain is correct in both
ways, or we can crash if all possible flows are in use.
Jarek pointed out slot_queue_init() can now be done in sfq_init() once,
instead each time a flow is setup.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit eeaeb068f1393b4db4861481bf594bcd1c3eda7a
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue Dec 28 21:53:33 2010 +0000
sch_sfq: allow big packets and be fair
SFQ is currently 'limited' to small packets, because it uses a 15bit
allotment number per flow. Introduce a scale by 8, so that we can handle
full size TSO/GRO packets.
Use appropriate handling to make sure allot is positive before a new
packet is dequeued, so that fairness is respected.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Jarek Poplawski <jarkao2@gmail.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit ee09b3c1cff0335137dc1b146488e4352f640f13
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed Dec 22 11:39:59 2010 -0800
sfq: fix sfq class stats handling
sfq_walk() runs without qdisc lock. By the time it selects a non empty
hash slot and sfq_dump_class_stats() is run (with lock held), slot might
have been freed : We then access q->slots[SFQ_EMPTY_SLOT], out of
bounds, and crash in slot_queue_walk()
On previous kernels, bug is here but out of bounds qs[SFQ_DEPTH] and
allot[SFQ_DEPTH] are located in struct sfq_sched_data, so no illegal
memory access happens, only possibly wrong data reported to user.
Also, slot_dequeue_tail() should make sure slot skb chain is correctly
terminated, or sfq_dump_class_stats() can access freed skbs.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit eda83e3b63e88351310c13c99178eb4634f137b2
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon Dec 20 12:54:58 2010 +0000
net_sched: sch_sfq: better struct layouts
Here is a respin of patch.
I'll send a short patch to make SFQ more fair in presence of large
packets as well.
Thanks
[PATCH v3 net-next-2.6] net_sched: sch_sfq: better struct layouts
This patch shrinks sizeof(struct sfq_sched_data)
from 0x14f8 (or more if spinlocks are bigger) to 0x1180 bytes, and
reduce text size as well.
   text    data    bss     dec     hex  filename
   4821     152      0    4973    136d  old/net/sched/sch_sfq.o
   4627     136      0    4763    129b  new/net/sched/sch_sfq.o
All data for a slot/flow is now grouped in a compact and cache friendly
structure, instead of being spreaded in many different points.
commit aa3e219997e4b949be4199660936099ded0b401f
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon Dec 20 13:18:16 2010 -0800
net_sched: sch_sfq: fix allot handling
When deploying SFQ/IFB here at work, I found the allot management was
pretty wrong in sfq, even changing allot from short to int...
We should init allot for each new flow, not using a previous value found
in slot.
Before patch, I saw bursts of several packets per flow, apparently
denying the default "quantum 1514" limit I had on my SFQ class.
* debloat-testing: Kitten not eaten - was Re: [Bloat] Please enter issues into the issue tracker - Issue system organisation needed.
From: Jim Gettys @ 2011-02-24 23:15 UTC (permalink / raw)
To: Dave Täht; +Cc: linville, bloat-devel
On 02/24/2011 01:31 PM, Dave Täht wrote:
> There is a debloat-testing Linux kernel repo up at:
>
> http://git.infradead.org/debloat-testing.git
>
> It builds, I haven't booted into it yet. The patch set is pretty minor,
> if you already have a Linux-2.6 tree it pays to use:
>
> git clone --reference your_existing_tree git://git.infradead.org/debloat-testing.git
>
> (Doesn't have SFB yet, either. John?)
I built it; it booted, worked on my eDP display, and did my usual ping +
scp test (to a local system, so that the 802.11 would be saturated).
OOTB, it controls the latency quite decently (no more than a few
milliseconds), but the drop rate on the ICMP running at the same time is
quite high (~25%). Pretty uncontrolled experiment; my son is home ;-).
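(The test itself is nothing fancy; the host name below is a placeholder:

  # one window: latency probe
  ping local-server
  # another window: saturate the 802.11 link
  scp some-big-file local-server:/tmp/
)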
- Jim
* Re: debloat-testing: Kitten not eaten - was Re: [Bloat] Please enter issues into the issue tracker - Issue system organisation needed.
From: Dave Täht @ 2011-02-24 23:18 UTC (permalink / raw)
To: Jim Gettys; +Cc: linville, bloat-devel
Jim Gettys <jg@freedesktop.org> writes:
> I built it; it booted, worked on my eDP display, and did my usual ping
> + scp test (to a local system, so that the 802.11 would be saturated).
>
> OOTB, it controls the latency quite decently (no more than a few
> milliseconds), but the drop rate on the ICMP running at the same time
This brings the iwl driver down from 130+ms to a few ms!?
How many more orders of magnitude do you need for *awesome*, rather than decent?
> is quite high (~25%). Pretty uncontrolled experiment; my son is home
> ;-).
That IS high. Perhaps the iwl driver and Linville's algorithm are
competing overmuch. Does your card do aggregation?
> - Jim
>
--
Dave Taht
http://nex-6.taht.net
* Re: debloat-testing: Kitten not eaten - was Re: [Bloat] Please enter issues into the issue tracker - Issue system organisation needed.
From: Jim Gettys @ 2011-02-24 23:31 UTC (permalink / raw)
To: Dave Täht; +Cc: linville, bloat-devel
On 02/24/2011 06:18 PM, Dave Täht wrote:
> Jim Gettys <jg@freedesktop.org> writes:
>> I built it; it booted, worked on my eDP display, and did my usual ping
>> + scp test (to a local system, so that the 802.11 would be saturated).
>>
>> OOTB, it controls the latency quite decently (no more than a few
>> milliseconds), but the drop rate on the ICMP running at the same time
>
> This brings the iwl driver down from 130+ms to a few ms!?
rtt min/avg/max/mdev = 0.646/3.631/32.951/5.487 ms
>
> How many more orders of magnitude do you need for *awesome*, rather than decent?
Well, we have to do some computation from first principles here. I
happened to be running 802.11g, so there is about 20 Mbps of goodput
available.
If we have one packet queued to keep the interface busy all the time,
that should insert something like half a millisecond of latency into
another flow (a 1500-byte packet is 12,000 bits; 12,000 bits / 20 Mbps
= 0.6 ms).
So being in the 3-4 ms range is still off by a bit. But we're now much
closer to where we should be than where we started, so consider this
major progress.
>
>> is quite high (~25%). Pretty uncontrolled experiment; my son is home
>> ;-).
>
> That IS high. Perhaps the iwl and the linvilles alg are competing
> overmuch. Does your card do aggregation?
>
No clue; it's an iwl6200agn, which is pretty recent, but I was running g.
Running n, I'm still getting high ICMP (~25%) loss rates; the goodput
is staying about the same. Then again, it's only running at 54 Mbps,
so that makes sense.
rtt min/avg/max/mdev = 0.671/2.439/6.018/1.411 ms
- Jim
* GSO (was: Please enter issues into the issue tracker - Issue system organisation needed)
From: Jesper Dangaard Brouer @ 2011-02-25 11:21 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Van Jacobson, bloat-devel, herbert, bloat
On Thu, 2011-02-24 at 20:29 +0100, Eric Dumazet wrote:
> - It's important to set TSO off (ethtool -K eth0 tso off), or else we
> send big packets (up to 64 Kbytes), and this used to break SFQ fairness.
> This can really hurt the latencies of interactive flows.
Don't you mean "GSO" Generic-Segmentation-Offload (ethtool -K eth0 gso
off) as this happens in the stack. While TSO Tcp-Segmentation-Offload
happens in hardware, and you will not see it in the SFQ qdisc?
I recommend that both is turned off, on small bandwidth links where
latency matters.
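For reference (the interface name is a placeholder):

  # show the current offload settings
  ethtool -k eth0
  # disable both segmentation offloads
  ethtool -K eth0 gso off
  ethtool -K eth0 tso off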
I'm wondering if LRO (Large Receive Offload) affects you when you are
using SFQ on ingress?
Recently had some "funny" issues with GRO, where a 100 Mbit/s customer
could "only" get approx 90 Mbit/s throughput to our speed test server
(other customers, in another appartment building could get approx 96
Mbit/s). The issue was resolved by disabling GSO on the speed test
server. The theory is that some switch on the path cannot handle the
bursts generated by GSO, which is max 64K (I think, correct me if I'm
wrong).
When adjusting buffer sizes, it's important to take this bursty TCP
behavior, created by both GSO and TSO, into account. I'm not saying
that the queue size needs to be above 64K. For smaller links, it might
make sense to set it significantly below 64K, to keep a GSO-enabled
Linux machine from ramping up its window size, which makes it capable
of bursting.
--
Best regards
Jesper Brouer
ComX Networks A/S
Linux Network Kernel Developer
Cand. Scient Datalog / MSc.CS
Author of http://adsl-optimizer.dk
LinkedIn: http://www.linkedin.com/in/brouer
* Re: GSO (was: Please enter issues into the issue tracker - Issue system organisation needed)
From: Eric Dumazet @ 2011-02-25 11:54 UTC (permalink / raw)
To: Jesper Dangaard Brouer; +Cc: Van Jacobson, bloat-devel, herbert, bloat
On Friday, 25 February 2011 at 12:21 +0100, Jesper Dangaard Brouer wrote:
> On Thu, 2011-02-24 at 20:29 +0100, Eric Dumazet wrote:
> > - It's important to set TSO off (ethtool -K eth0 tso off), or else we
> > send big packets (up to 64 Kbytes), and this used to break SFQ fairness.
> > This can really hurt the latencies of interactive flows.
>
> Don't you mean "GSO" Generic-Segmentation-Offload (ethtool -K eth0 gso
> off) as this happens in the stack. While TSO Tcp-Segmentation-Offload
> happens in hardware, and you will not see it in the SFQ qdisc?
>
I definitely see big packets if TSO is enabled, for locally generated
traffic. (You are probably concerned with routers, where all traffic is
forwarded, so TSO is not used even if enabled.)
> I recommend that both be turned off on small-bandwidth links where
> latency matters.
>
Sure.
> I'm wondering if LRO (Large Receive Offload) affects you when you are
> using SFQ on ingress?
>
>
GRO/LRO can have an impact, for sure. But most 'current' kernels don't
have GRO/LRO on by default. I mean the kernels in use by 2-3 year old
distros.
> Recently had some "funny" issues with GRO, where a 100 Mbit/s customer
> could "only" get approx 90 Mbit/s throughput to our speed test server
> (other customers, in another appartment building could get approx 96
> Mbit/s). The issue was resolved by disabling GSO on the speed test
> server. The theory is that some switch on the path cannot handle the
> bursts generated by GSO, which is max 64K (I think, correct me if I'm
> wrong).
>
That's right. One 64K packet with a standard MTU means a spike on the
wire, but if your switches can't absorb this... Is TCP SACK active on
the customer side (and the speed test server)?
> When adjusting buffer sizes, it's important to take this bursty TCP
> behavior, created by both GSO and TSO, into account. I'm not saying
> that the queue size needs to be above 64K. For smaller links, it might
> make sense to set it significantly below 64K, to keep a GSO-enabled
> Linux machine from ramping up its window size, which makes it capable
> of bursting.
>
TSO basically hurts SFQ or other AQM, unless you use big/fast pipes.
For a router workload anyway, I would say it's better not to try to
coalesce frames at the software level; just handle them one by one.
* Re: [Bloat] Please enter issues into the issue tracker - Issue system organisation needed.
From: John W. Linville @ 2011-02-25 15:40 UTC (permalink / raw)
To: Dave Täht; +Cc: Van Jacobson, bloat-devel, Eric Dumazet, bloat
On Thu, Feb 24, 2011 at 11:31:40AM -0700, Dave Täht wrote:
> There is a debloat-testing Linux kernel repo up at:
>
> http://git.infradead.org/debloat-testing.git
>
> It builds, I haven't booted into it yet. The patch set is pretty minor,
> if you already have a Linux-2.6 tree it pays to use:
>
> git clone --reference your_existing_tree git://git.infradead.org/debloat-testing.git
>
> (Doesn't have SFB yet, either. John?)
There now, along with a revision of my eBDP patch. The SFQ
patches Eric identified later in this thread were already there
from 2.6.38-rc6.
John
--
John W. Linville Someday the world will need a hero, and you
linville@tuxdriver.com might be all we have. Be ready.
* Re: GSO (was: Please enter issues into the issue tracker - Issue system organisation needed)
From: Jesper Dangaard Brouer @ 2011-02-25 15:48 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Van Jacobson, bloat-devel, herbert, bloat
On Fri, 2011-02-25 at 12:54 +0100, Eric Dumazet wrote:
> On Friday, 25 February 2011 at 12:21 +0100, Jesper Dangaard Brouer wrote:
> > On Thu, 2011-02-24 at 20:29 +0100, Eric Dumazet wrote:
> > > - It's important to set TSO off (ethtool -K eth0 tso off), or else we
> > > send big packets (up to 64 Kbytes), and this used to break SFQ fairness.
> > > This can really hurt the latencies of interactive flows.
> >
> > Don't you mean "GSO" Generic-Segmentation-Offload (ethtool -K eth0 gso
> > off) as this happens in the stack. While TSO Tcp-Segmentation-Offload
> > happens in hardware, and you will not see it in the SFQ qdisc?
> >
>
> I definitely see big packets if TSO is enabled, for locally generated
> traffic. (You are probably concerned with routers, where all traffic is
> forwarded, so TSO is not used even if enabled.)
Yes, as you know I'm very concerned about the router case. Guess that
explains my experience with TSO.
> > I recommend that both be turned off on small-bandwidth links where
> > latency matters.
> >
>
> Sure.
>
> > I'm wondering if LRO (Large Receive Offload) affects you when you are
> > using SFQ on ingress?
> >
> >
>
> GRO/LRO can have an impact, for sure. But most 'current' kernels don't
> have GRO/LRO on by default. I mean the kernels in use by 2-3 year old
> distros.
Hmm, are you sure?
The speed test server runs Debian Lenny and kernel 2.6.26-2-686, and had
GSO enabled...
> > Recently had some "funny" issues with GRO, where a 100 Mbit/s customer
> > could "only" get approx 90 Mbit/s throughput to our speed test server
> > (other customers, in another appartment building could get approx 96
> > Mbit/s). The issue was resolved by disabling GSO on the speed test
> > server. The theory is that some switch on the path cannot handle the
> > bursts generated by GSO, which is max 64K (I think, correct me if I'm
> > wrong).
Just looked at the case: the average throughput was only 83 Mbit/s,
with spikes. See:
http://people.netfilter.org/hawk/dropbox/bloat_vs_GSO/speed-to-grantoften-1.png
>
> That's right. One 64K packet with a standard MTU means a spike on the
> wire, but if your switches can't absorb this... Is TCP SACK active on
> the customer side (and the speed test server)?
Yes, both servers (/proc/sys/net/ipv4/tcp_sack = 1).
I think that the bufferbloat theory is that SACKs will not work, due to
the long delays introduced by buffers (bloat). In this case, you can
see on the graph a max RTT around 150 ms and an average of 20 ms.
On another, better-behaved path in the network to the speed server, I
would only see a max RTT around 25 ms and an average of 15 ms; see:
http://people.netfilter.org/hawk/dropbox/bloat_vs_GSO/speed-to-pc314a-1.png
You can also see this path had an average of 90 Mbit/s, but with
significant throughput drops (the 92 Mbit/s line is an artificial line
on the graph). This behavior is probably caused by the GSO effect.
Disabling GSO on speed server fixed the problem as can be seen on graph:
http://people.netfilter.org/hawk/dropbox/bloat_vs_GSO/speed-to-grantoften-solved.png
The really strange part when troubleshooting this issue was that the
throughput was fine between the two customer end-boxes ("grantoften"
and "pc314a"), as can be seen here:
http://people.netfilter.org/hawk/dropbox/bloat_vs_GSO/pc314a-to-grantoften-1.png
> > When adjusting buffer sizes, it's important to take this bursty TCP
> > behavior, created by both GSO and TSO, into account. I'm not saying
> > that the queue size needs to be above 64K. For smaller links, it might
> > make sense to set it significantly below 64K, to keep a GSO-enabled
> > Linux machine from ramping up its window size, which makes it capable
> > of bursting.
> >
>
> TSO basically hurts SFQ or other AQM, unless you use big/fast pipes.
>
> For a router workload anyway, I would say it's better not to try to
> coalesce frames at the software level; just handle them one by one.
Yes, but we still want (at least RX) NAPI/polling-mode, where we process
all the packets in the NIC hardware queue at once.
See you around,
--
Best regards
Jesper Brouer
ComX Networks A/S
Linux Network Kernel Developer
Cand. Scient Datalog / MSc.CS
Author of http://adsl-optimizer.dk
LinkedIn: http://www.linkedin.com/in/brouer
* Re: GSO (was: Please enter issues into the issue tracker - Issue system organisation needed)
From: Eric Dumazet @ 2011-02-25 16:19 UTC (permalink / raw)
To: Jesper Dangaard Brouer; +Cc: Van Jacobson, bloat-devel, herbert, bloat
On Friday, 25 February 2011 at 16:48 +0100, Jesper Dangaard Brouer wrote:
> > > I'm wondering if LRO (Large Receive Offload) affects you when you are
> > > using SFQ on ingress?
> > >
> > >
> >
> > GRO/LRO can have an impact, for sure. But most 'current' kernels don't
> > have GRO/LRO on by default. I mean the kernels in use by 2-3 year old
> > distros.
>
> Hmm, are you sure?
> The speed test server runs Debian Lenny and kernel 2.6.26-2-686, and had
> GSO enabled...
I am sure about the GRO (receive) side. It was mostly disabled on old
kernels; only very recently did we switch it on.
* Re: GSO (was: Please enter issues into the issue tracker - Issue system organisation needed)
From: Eric Dumazet @ 2011-02-25 16:33 UTC (permalink / raw)
To: Jesper Dangaard Brouer; +Cc: Van Jacobson, bloat-devel, herbert, bloat
On Friday, 25 February 2011 at 16:48 +0100, Jesper Dangaard Brouer wrote:
> Yes, both servers (/proc/sys/net/ipv4/tcp_sack = 1).
>
> I think that the bufferbloat theory is that SACKs will not work, due to
> the long delays introduced by buffers(bloat). In this case, you can
> see on the graph, a max RTT around 150 ms and an average of 20 ms.
>
> While another, more well behaved path in the network to the speed
> server, I would only see a max RTT around 25 ms and an average of 15 ms,
> see:
> http://people.netfilter.org/hawk/dropbox/bloat_vs_GSO/speed-to-pc314a-1.png
>
> > You can also see this path had an average of 90 Mbit/s, but with
> > significant throughput drops (the 92 Mbit/s line is an artificial line
> > on the graph). This behavior is probably caused by the GSO effect.
>
> Disabling GSO on speed server fixed the problem as can be seen on graph:
> http://people.netfilter.org/hawk/dropbox/bloat_vs_GSO/speed-to-grantoften-solved.png
>
> > The really strange part when troubleshooting this issue was that the
> > throughput was fine between the two customer end-boxes ("grantoften"
> > and "pc314a"), as can be seen here:
> http://people.netfilter.org/hawk/dropbox/bloat_vs_GSO/pc314a-to-grantoften-1.png
>
>
It's a bit hard to interpret these graphs; I am a bit lost...
What exactly is sampled? Is it from tcpdump analysis, or output from
HTB/SFQ stats?
For sure, one TSO drop really drops a full range of [XX] TCP segments,
while with TSO off, a drop is one segment's drop.
This certainly can explain bad artifacts, as the receiver interprets
this as a huge congestion indication.
TSO/GRO are good only in the datacenter domain, where we want 10 Gb
flows between two nodes with low CPU impact.
* Re: GSO (was: Please enter issues into the issue tracker - Issue system organisation needed)
From: Jesper Dangaard Brouer @ 2011-02-25 17:15 UTC (permalink / raw)
To: Eric Dumazet; +Cc: bloat-devel, herbert, Van Jacobson, shalunov, bloat
On Fri, 2011-02-25 at 17:33 +0100, Eric Dumazet wrote:
> On Friday, 25 February 2011 at 16:48 +0100, Jesper Dangaard Brouer wrote:
>
> > Yes, both servers (/proc/sys/net/ipv4/tcp_sack = 1).
> >
> > I think that the bufferbloat theory is that SACKs will not work, due to
> > the long delays introduced by buffers(bloat). In this case, you can
> > see on the graph, a max RTT around 150 ms and an average of 20 ms.
> >
> > While another, more well behaved path in the network to the speed
> > server, I would only see a max RTT around 25 ms and an average of 15 ms,
> > see:
> > http://people.netfilter.org/hawk/dropbox/bloat_vs_GSO/speed-to-pc314a-1.png
> >
> > > You can also see this path had an average of 90 Mbit/s, but with
> > > significant throughput drops (the 92 Mbit/s line is an artificial line
> > > on the graph). This behavior is probably caused by the GSO effect.
> >
> > Disabling GSO on speed server fixed the problem as can be seen on graph:
> > http://people.netfilter.org/hawk/dropbox/bloat_vs_GSO/speed-to-grantoften-solved.png
> >
> > > The really strange part when troubleshooting this issue was that the
> > > throughput was fine between the two customer end-boxes ("grantoften"
> > > and "pc314a"), as can be seen here:
> > http://people.netfilter.org/hawk/dropbox/bloat_vs_GSO/pc314a-to-grantoften-1.png
> >
> >
>
> It's a bit hard to interpret these graphs; I am a bit lost...
> What exactly is sampled? Is it from tcpdump analysis, or output from
> HTB/SFQ stats?
The graphs are generated (with gnuplot) from data produced by the
throughput-latency tool called "thrulay". It's created by Stanislav
Shalunov, and its homepage is here: http://shlang.com/thrulay/
I really love this "thrulay" tool, as it measures both the throughput
and the TCP session's experienced delay, and the output can be used
directly by gnuplot. Nice! :-)
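A typical run looks something like this (the flag spellings are from
memory, so check thrulay -h; the host name and gnuplot columns are
placeholders for whatever your thrulay version emits):

  # on the measurement server
  thrulayd
  # on the client: 60-second test, 1-second reporting interval
  thrulay -t 60 -i 1 speed-server > run.dat
  gnuplot -e "plot 'run.dat' using 1:2 with lines"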
--
Best regards
Jesper Brouer
ComX Networks A/S
Linux Network Kernel Developer
Cand. Scient Datalog / MSc.CS
Author of http://adsl-optimizer.dk
LinkedIn: http://www.linkedin.com/in/brouer
* Re: GSO
From: Dave Täht @ 2011-02-26 2:41 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: bloat-devel, herbert, Eric Dumazet, Van Jacobson, shalunov, bloat
Jesper Dangaard Brouer <hawk@comx.dk> writes:
> On Fri, 2011-02-25 at 17:33 +0100, Eric Dumazet wrote:
>> On Friday, 25 February 2011 at 16:48 +0100, Jesper Dangaard Brouer wrote:
>>
>> > Disabling GSO on speed server fixed the problem as can be seen on graph:
>> > http://people.netfilter.org/hawk/dropbox/bloat_vs_GSO/speed-to-grantoften-solved.png
>> >
>> > The really strange part when troubleshooting this issue was that the
>> > throughput was fine between the two customer end-boxes ("grantoften"
>> > and "pc314a"), as can be seen here:
>> > http://people.netfilter.org/hawk/dropbox/bloat_vs_GSO/pc314a-to-grantoften-1.png
>> >
>> >
>>
>> It's a bit hard to interpret these graphs; I am a bit lost...
>> What exactly is sampled? Is it from tcpdump analysis, or output from
>> HTB/SFQ stats?
>
> The graphs are generated (with gnuplot) from data produced by the
> throughput-latency tool called "thrulay". It's created by Stanislav
> Shalunov, and its homepage is here: http://shlang.com/thrulay/
>
> I really love this "thrulay" tool, as it measures both the throughput
> and the TCP session's experienced delay, and the output can be used
> directly by gnuplot. Nice! :-)
I find the 10ms granularity on both graphs rather interesting. One of my
issues with HTB (when last I checked) is that it does odd things across
the clock interval.
My assumption is that both systems are running stock (100HZ) kernels?
What would a 1ms clock do on these plots? And/or the Linux-rt patch?
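
(For reference, a quick way to check both on a distro kernel, assuming
the kernel config is shipped in /boot, which not every distro does:)

    # Tick rate (HZ) of the running kernel:
    grep 'CONFIG_HZ=' /boot/config-$(uname -r)

    # Whether high-resolution timer support is compiled in:
    grep 'CONFIG_HIGH_RES_TIMERS' /boot/config-$(uname -r)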
--
Dave Taht
http://nex-6.taht.net
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: GSO
2011-02-26 2:41 ` GSO Dave Täht
@ 2011-03-02 8:30 ` Jesper Dangaard Brouer
2011-03-04 2:23 ` GSO Dave Täht
0 siblings, 1 reply; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2011-03-02 8:30 UTC (permalink / raw)
To: Dave Täht
Cc: Jesper Dangaard Brouer, bloat-devel, herbert, Eric Dumazet,
Van Jacobson, shalunov, bloat
On Sat, 2011-02-26 at 03:41 +0100, Dave Täht wrote:
> Jesper Dangaard Brouer <hawk@comx.dk> writes:
> > On Fri, 2011-02-25 at 17:33 +0100, Eric Dumazet wrote:
> >> On Friday 25 February 2011 at 16:48 +0100, Jesper Dangaard Brouer
> >> wrote:
> >>
> >> > Disabling GSO on the speed server fixed the problem, as can be seen
> >> > on this graph:
> >> > http://people.netfilter.org/hawk/dropbox/bloat_vs_GSO/speed-to-grantoften-solved.png
> >> >
> >> > The really strange part when troubleshooting this issue was that the
> >> > throughput was fine between the two customer end-boxes ("grantoften"
> >> > and "pc314a"), as can be seen here:
> >> > http://people.netfilter.org/hawk/dropbox/bloat_vs_GSO/pc314a-to-grantoften-1.png
> >> >
...
> > The graphs are generated (with GNUplot) from the output of the
> > throughput-latency tool called "thrulay". It was created by Stanislav
> > Shalunov, and its homepage is here: http://shlang.com/thrulay/
> >
> > I really love this "thrulay" tool, as it measures both the throughput
> > and the delay experienced by the TCP session. And the output can be fed
> > directly to GNUplot. Nice! :-)
>
> I find the 10ms granularity on both graphs rather interesting. One of my
> issues with HTB (when last I checked) is that it does odd things across
> the clock interval.
These graphs have nothing to do with the HTB qdisc. The traffic is not
affected by the HTB shaper (on the path), as the customer actually has
a 110 Mbit/s bandwidth limit (we always give customers 10% extra to
avoid any complaints about overhead).

If I change the customer's bandwidth to 90 Mbit/s or 93 Mbit/s, which
makes the HTB shaper (plus the SFQ scheduler) take effect, then the
customer experience is perfect, as I have solved the bufferbloat issue.
The problem is of course that marketing wants to sell 100 Mbit/s, not
90 Mbit/s or 93 Mbit/s. Thus, I cannot really deploy the fix :-(.
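
(For the curious, the shaping setup is along these lines; a simplified
sketch with "eth0" and the class numbers as placeholders, not our
production config:)

    # HTB shaper limiting the customer to 93 Mbit/s:
    tc qdisc add dev eth0 root handle 1: htb default 10
    tc class add dev eth0 parent 1: classid 1:10 htb rate 93mbit ceil 93mbit

    # SFQ underneath, so competing flows share the queue fairly:
    tc qdisc add dev eth0 parent 1:10 handle 10: sfq perturb 10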
But your memory is not totally at fault regarding HTB ;-)
HTB used to be affected by the HZ clock interval, but I think Stephen
Hemminger fixed that by using the high-resolution timer API. And I fixed
the "no_hyst" case, where HTB could introduce spikes of three times the
expected delay.
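
(If memory serves, the hysteresis behaviour is switchable at runtime via
a module parameter on recent kernels; verify the path on your kernel
before relying on it:)

    # Disable HTB hysteresis, trading a bit of CPU for lower delay spikes:
    echo 0 > /sys/module/sch_htb/parameters/htb_hysteresis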
--
Med venlig hilsen / Best regards
Jesper Brouer
ComX Networks A/S
Linux Network Kernel Developer
Cand. Scient Datalog / MSc.CS
Author of http://adsl-optimizer.dk
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: GSO
2011-03-02 8:30 ` GSO Jesper Dangaard Brouer
@ 2011-03-04 2:23 ` Dave Täht
0 siblings, 0 replies; 19+ messages in thread
From: Dave Täht @ 2011-03-04 2:23 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Jesper Dangaard Brouer, bloat-devel, herbert, Eric Dumazet,
Van Jacobson, shalunov, bloat
Jesper Dangaard Brouer <jdb@comx.dk> writes:
> On Sat, 2011-02-26 at 03:41 +0100, Dave Täht wrote:
>> Jesper Dangaard Brouer <hawk@comx.dk> writes:
>> > On Fri, 2011-02-25 at 17:33 +0100, Eric Dumazet wrote:
>> >> On Friday 25 February 2011 at 16:48 +0100, Jesper Dangaard Brouer
>> >> wrote:
>> >>
>> >> > Disabling GSO on the speed server fixed the problem, as can be seen
>> >> > on this graph:
>> >> > http://people.netfilter.org/hawk/dropbox/bloat_vs_GSO/speed-to-grantoften-solved.png
>> >> >
>> >> > The really strange part when troubleshooting this issue was that the
>> >> > throughput was fine between the two customer end-boxes ("grantoften"
>> >> > and "pc314a"), as can be seen here:
>> >> > http://people.netfilter.org/hawk/dropbox/bloat_vs_GSO/pc314a-to-grantoften-1.png
>> >> >
> ...
>> > The graphs are generated (with GNUplot) from the output of the
>> > throughput-latency tool called "thrulay". It was created by Stanislav
>> > Shalunov, and its homepage is here: http://shlang.com/thrulay/
>> >
>> > I really love this "thrulay" tool, as it measures both the throughput
>> > and the delay experienced by the TCP session. And the output can be fed
>> > directly to GNUplot. Nice! :-)
>>
>> I find the 10ms granularity on both graphs rather interesting. One of my
>> issues with HTB (when last I checked) is that it does odd things across
>> the clock interval.
>
> These graphs have nothing to do with the HTB qdisc. The traffic is not
> affected by the HTB shaper (on the path), as the customer actually has
> a 110 Mbit/s bandwidth limit (we always give customers 10% extra to
> avoid any complaints about overhead).
>
> If I change the customer's bandwidth to 90 Mbit/s or 93 Mbit/s, which
> makes the HTB shaper (plus the SFQ scheduler) take effect, then the
> customer experience is perfect, as I have solved the bufferbloat issue.
> The problem is of course that marketing wants to sell 100 Mbit/s, not
> 90 Mbit/s or 93 Mbit/s. Thus, I cannot really deploy the fix :-(.
>
> But your memory is not totally at fault regarding HTB ;-)
> HTB used to be affected by the HZ clock interval, but I think Stephen
> Hemminger fixed that by using the high-resolution timer API. And I fixed
> the "no_hyst" case, where HTB could introduce spikes of three times the
> expected delay.
So, thank you both for the HTB fixes (belatedly). There is at least one
academic paper I've read fairly recently that is now thoroughly
obsoleted by events.
That said, I conflated two things in my question. The first was the old
HTB problem.
The second was that your data has strong signals at exactly 10 and
20 ms, which implies your tool or your kernel (or something else) is
not using high-res timers...?
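
(One way to check whether high-res timers are actually in effect on a
given box, using standard procfs/sysfs paths:)

    # Current clocksource ("jiffies" would mean low-res timekeeping):
    cat /sys/devices/system/clocksource/clocksource0/current_clocksource

    # Per-base timer resolution; "1 nsecs" means hrtimers are active:
    grep resolution /proc/timer_list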
--
Dave Taht
http://nex-6.taht.net
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2011-03-04 2:24 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-02-24 14:19 Please enter issues into the issue tracker - Issue system organisation needed Jim Gettys
2011-02-24 15:00 ` [Bloat] " Fred Baker
2011-02-24 16:32 ` Jim Gettys
2011-02-24 17:08 ` Eric Dumazet
2011-02-24 18:31 ` Dave Täht
2011-02-24 19:29 ` Eric Dumazet
2011-02-25 11:21 ` GSO (was: Please enter issues into the issue tracker - Issue system organisation needed) Jesper Dangaard Brouer
2011-02-25 11:54 ` Eric Dumazet
2011-02-25 15:48 ` Jesper Dangaard Brouer
2011-02-25 16:19 ` Eric Dumazet
2011-02-25 16:33 ` Eric Dumazet
2011-02-25 17:15 ` Jesper Dangaard Brouer
2011-02-26 2:41 ` GSO Dave Täht
2011-03-02 8:30 ` GSO Jesper Dangaard Brouer
2011-03-04 2:23 ` GSO Dave Täht
2011-02-24 23:15 ` debloat-testing: Kitten not eaten - was Re: [Bloat] Please enter issues into the issue tracker - Issue system organisation needed Jim Gettys
2011-02-24 23:18 ` Dave Täht
2011-02-24 23:31 ` Jim Gettys
2011-02-25 15:40 ` John W. Linville