* [Bloat] Network computing article on bloat @ 2011-04-26 17:05 Dave Taht 2011-04-26 18:13 ` Dave Hart 0 siblings, 1 reply; 66+ messages in thread From: Dave Taht @ 2011-04-26 17:05 UTC (permalink / raw) To: bloat Not bad, although I can live without the title. Coins a new-ish phrase "insertion latency" http://www.networkcomputing.com/end-to-end-apm/bufferbloat-and-the-collapse-of-the-internet.php -- Dave Täht SKYPE: davetaht US Tel: 1-239-829-5608 http://the-edge.blogspot.com ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Network computing article on bloat 2011-04-26 17:05 [Bloat] Network computing article on bloat Dave Taht @ 2011-04-26 18:13 ` Dave Hart 2011-04-26 18:17 ` Dave Taht 0 siblings, 1 reply; 66+ messages in thread From: Dave Hart @ 2011-04-26 18:13 UTC (permalink / raw) To: Dave Taht; +Cc: bloat On Tue, Apr 26, 2011 at 17:05 UTC, Dave Taht <dave.taht@gmail.com> wrote: > Not bad, although I can live without the title. Coins a new-ish phrase > "insertion latency" > > http://www.networkcomputing.com/end-to-end-apm/bufferbloat-and-the-collapse-of-the-internet.php The piece ends with a paragraph claiming preventing packet loss is addressing a more fundamental problem which contributes to bufferbloat. As long as the writer and readers believe packet loss is an unmitigated evil, the battle is lost. More encouraging would have been a statement that packet loss is preferable to excessive queueing and a required TCP feedback signal when ECN isn't in play. Cheers, Dave Hart ^ permalink raw reply [flat|nested] 66+ messages in thread
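Dave Hart's point that loss (or an ECN mark) is the required feedback signal can be made concrete with a toy AIMD loop. This is purely a sketch: the path numbers and the "signal every N RTTs" schedule below are invented, and it is not any particular stack's algorithm. The only event that ever shrinks the window is a drop or a mark, and whatever the sender keeps in flight beyond the path's bandwidth-delay product is exactly the standing queue that bufferbloat is about.

    MSS = 1500                       # bytes per segment (assumed)
    RTT = 0.050                      # 50 ms path (assumed)
    RATE = 10e6                      # 10 Mbit/s bottleneck (assumed)
    BDP = RATE * RTT / 8             # bytes the path itself can hold in flight

    def final_state(rtts, signal_every=None):
        """Run a toy AIMD sender; return (cwnd, standing queue) in bytes."""
        cwnd = 2 * MSS
        for i in range(1, rtts + 1):
            if signal_every and i % signal_every == 0:
                cwnd = max(2 * MSS, cwnd / 2)   # halve on a drop or ECN mark
            else:
                cwnd += MSS                     # additive increase per RTT
        return cwnd, max(0.0, cwnd - BDP)       # the excess sits in the buffer

    print("congestion signal every 20 RTTs:", final_state(200, 20))
    print("no congestion signal at all:    ", final_state(200))

With no signal at all the window only grows, and everything above the BDP ends up queued at the bottleneck.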
* Re: [Bloat] Network computing article on bloat 2011-04-26 18:13 ` Dave Hart @ 2011-04-26 18:17 ` Dave Taht 2011-04-26 18:28 ` dave greenfield 2011-04-26 18:32 ` Wesley Eddy 0 siblings, 2 replies; 66+ messages in thread From: Dave Taht @ 2011-04-26 18:17 UTC (permalink / raw) To: bloat; +Cc: dave greenfield "Big Buffers Bad. Small Buffers Good." "*Some* packet loss is essential for the correct operation of the Internet" are two of the memes I try to propagate, in their simplicity. Even then there are so many qualifiers to both of those that the core message gets lost. On Tue, Apr 26, 2011 at 12:13 PM, Dave Hart <davehart@gmail.com> wrote: > On Tue, Apr 26, 2011 at 17:05 UTC, Dave Taht <dave.taht@gmail.com> wrote: >> Not bad, although I can live without the title. Coins a new-ish phrase >> "insertion latency" >> >> http://www.networkcomputing.com/end-to-end-apm/bufferbloat-and-the-collapse-of-the-internet.php > > The piece ends with a paragraph claiming preventing packet loss is > addressing a more fundamental problem which contributes to > bufferbloat. As long as the writer and readers believe packet loss is > an unmitigated evil, the battle is lost. More encouraging would have > been a statement that packet loss is preferable to excessive queueing > and a required TCP feedback signal when ECN isn't in play. > > Cheers, > Dave Hart > -- Dave Täht SKYPE: davetaht US Tel: 1-239-829-5608 http://the-edge.blogspot.com ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Network computing article on bloat 2011-04-26 18:17 ` Dave Taht @ 2011-04-26 18:28 ` dave greenfield 2011-04-26 18:32 ` Wesley Eddy 1 sibling, 0 replies; 66+ messages in thread From: dave greenfield @ 2011-04-26 18:28 UTC (permalink / raw) To: Dave Taht; +Cc: bloat [-- Attachment #1: Type: text/plain, Size: 2274 bytes --] Thanks, Dave. I'm actually NOT under the impression that packet loss is the dark lord incarnate. Yes, I too would have preferred a different title, but editors have the last say sometimes. Oh, and insertion latency or insertion loss isn't all that new. I've seen it used in switch and device design for several years. Call it what you will, but it's important that IT understands the amount of latency introduced by a given device into the data path. This isn't always widely discussed in WAN opt circles..... Dave PS Can we please have someone else jump in here who's name is NOT Dave! On Tue, Apr 26, 2011 at 9:17 PM, Dave Taht <dave.taht@gmail.com> wrote: > "Big Buffers Bad. Small Buffers Good." > > "*Some* packet loss is essential for the correct operation of the Internet" > > are two of the memes I try to propagate, in their simplicity. Even > then there are so many qualifiers to both of those that the core > message gets lost. > > > > On Tue, Apr 26, 2011 at 12:13 PM, Dave Hart <davehart@gmail.com> wrote: > > On Tue, Apr 26, 2011 at 17:05 UTC, Dave Taht <dave.taht@gmail.com> > wrote: > >> Not bad, although I can live without the title. Coins a new-ish phrase > >> "insertion latency" > >> > >> > http://www.networkcomputing.com/end-to-end-apm/bufferbloat-and-the-collapse-of-the-internet.php > > > > The piece ends with a paragraph claiming preventing packet loss is > > addressing a more fundamental problem which contributes to > > bufferbloat. As long as the writer and readers believe packet loss is > > an unmitigated evil, the battle is lost. More encouraging would have > > been a statement that packet loss is preferable to excessive queueing > > and a required TCP feedback signal when ECN isn't in play. > > > > Cheers, > > Dave Hart > > > > > > -- > Dave Täht > SKYPE: davetaht > US Tel: 1-239-829-5608 > http://the-edge.blogspot.com > -- --- Dave Greenfield Principal Strategic Technology Analytics Research. Analysis. Insight <dave@stanalytics.com>dave@stanalytics.com | 1-908-206-4114 Netmagdave | @Netmagdave Blogs: ZDNet <http://www.blogs.zdnet.com/greenfield> | Information Week<http://www.networkcomputing.com/author_profile.php?name=dgreenfield&page_no=1> [-- Attachment #2: Type: text/html, Size: 3879 bytes --] ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Network computing article on bloat 2011-04-26 18:17 ` Dave Taht 2011-04-26 18:28 ` dave greenfield @ 2011-04-26 18:32 ` Wesley Eddy 2011-04-26 19:37 ` Dave Taht ` (3 more replies) 1 sibling, 4 replies; 66+ messages in thread From: Wesley Eddy @ 2011-04-26 18:32 UTC (permalink / raw) To: bloat On 4/26/2011 2:17 PM, Dave Taht wrote: > "Big Buffers Bad. Small Buffers Good." > > "*Some* packet loss is essential for the correct operation of the Internet" > > are two of the memes I try to propagate, in their simplicity. Even > then there are so many qualifiers to both of those that the core > message gets lost. The second one is actually backwards; it should be "the Internet can operate correctly with some packet loss". -- Wes Eddy MTI Systems ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Network computing article on bloat 2011-04-26 18:32 ` Wesley Eddy @ 2011-04-26 19:37 ` Dave Taht 2011-04-26 20:21 ` Wesley Eddy 2011-04-27 7:43 ` Jonathan Morton ` (2 subsequent siblings) 3 siblings, 1 reply; 66+ messages in thread From: Dave Taht @ 2011-04-26 19:37 UTC (permalink / raw) To: Wesley Eddy; +Cc: bloat On Tue, Apr 26, 2011 at 12:32 PM, Wesley Eddy <wes@mti-systems.com> wrote: > On 4/26/2011 2:17 PM, Dave Taht wrote: >> >> "Big Buffers Bad. Small Buffers Good." >> >> "*Some* packet loss is essential for the correct operation of the >> Internet" >> >> are two of the memes I try to propagate, in their simplicity. Even >> then there are so many qualifiers to both of those that the core >> message gets lost. > > > The second one is actually backwards; it should be "the Internet can > operate correctly with some packet loss". > INCORRECT. See? We can't win, even amongst ourselves. The Internet *cannot operate correctly without packet loss*. RFC970, http://www.faqs.org/rfcs/rfc970.html > -- > Wes Eddy > MTI Systems > _______________________________________________ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat > -- Dave Täht SKYPE: davetaht US Tel: 1-239-829-5608 http://the-edge.blogspot.com ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Network computing article on bloat 2011-04-26 19:37 ` Dave Taht @ 2011-04-26 20:21 ` Wesley Eddy 2011-04-26 20:30 ` Constantine Dovrolis 2011-04-27 17:10 ` Bill Sommerfeld 0 siblings, 2 replies; 66+ messages in thread From: Wesley Eddy @ 2011-04-26 20:21 UTC (permalink / raw) To: Dave Taht; +Cc: bloat On 4/26/2011 3:37 PM, Dave Taht wrote: > On Tue, Apr 26, 2011 at 12:32 PM, Wesley Eddy<wes@mti-systems.com> wrote: >> On 4/26/2011 2:17 PM, Dave Taht wrote: >>> >>> "Big Buffers Bad. Small Buffers Good." >>> >>> "*Some* packet loss is essential for the correct operation of the >>> Internet" >>> >>> are two of the memes I try to propagate, in their simplicity. Even >>> then there are so many qualifiers to both of those that the core >>> message gets lost. >> >> >> The second one is actually backwards; it should be "the Internet can >> operate correctly with some packet loss". >> > INCORRECT. > > See? We can't win, even amongst ourselves. > > The Internet *cannot operate correctly without packet loss*. > > RFC970, http://www.faqs.org/rfcs/rfc970.html > Operating with infinite storage and operating without packet loss are two different things. Ideally, you may have a path with ample bandwidth such that packet losses don't occur and all connections are either application limited or receive window limited and congestion control never kicks in. In this case, there's no loss and the Internet clearly works. -- Wes Eddy MTI Systems ^ permalink raw reply [flat|nested] 66+ messages in thread
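A quick back-of-the-envelope sketch of Wes's receive-window-limited case (the window, RTT and link speed below are made-up numbers): if rwnd/RTT is below the bottleneck rate, the flow can never fill the link, the queue stays empty, and no loss is needed for things to work.

    def window_limited_rate(rwnd_bytes, rtt_s):
        return rwnd_bytes * 8 / rtt_s            # throughput ceiling, bits/s

    rwnd = 64 * 1024        # an unscaled 64 KB receive window
    rtt = 0.050             # 50 ms path
    bottleneck = 100e6      # 100 Mbit/s link

    ceiling = window_limited_rate(rwnd, rtt)
    print("window-limited ceiling: %.1f Mbit/s" % (ceiling / 1e6))  # ~10.5 Mbit/s
    print("able to congest the link?", ceiling > bottleneck)        # False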
* Re: [Bloat] Network computing article on bloat 2011-04-26 20:21 ` Wesley Eddy @ 2011-04-26 20:30 ` Constantine Dovrolis 2011-04-26 21:16 ` Dave Taht 2011-04-27 17:10 ` Bill Sommerfeld 1 sibling, 1 reply; 66+ messages in thread From: Constantine Dovrolis @ 2011-04-26 20:30 UTC (permalink / raw) To: bloat Thanks Wes - I was hoping that someone will make this point. btw, another common reason for lossless operation is the size of the flows. basically flows often finish before their window increases so much that they overflow their bottleneck's buffer. Plz spend some time to read the following paper: http://www.cc.gatech.edu/fac/Constantinos.Dovrolis/Papers/buffers-ton.pdf It is very relevant to the bufferbloat initiative and it shows clearly, I think, that statements like "Big Buffers Bad. Small Buffers Good." are crude oversimplifications that will cause even more confusion. regards Constantine On 4/26/2011 4:21 PM, Wesley Eddy wrote: > > > Operating with infinite storage and operating without packet loss are > two different things. > > Ideally, you may have a path with ample bandwidth such that packet > losses don't occur and all connections are either application limited or > receive window limitedand congestion control never kicks in. In this > case, there's no loss and the Internet clearly works. > -------------------------------------------------------------- Constantine Dovrolis, Associate Professor College of Computing, Georgia Institute of Technology 3346 KACB, 404-385-4205, dovrolis@cc.gatech.edu http://www.cc.gatech.edu/~dovrolis/ ^ permalink raw reply [flat|nested] 66+ messages in thread
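Constantine's flow-size point can be made concrete with a rough slow-start calculation (all parameters are assumptions, and real stacks differ in detail): the window doubles each RTT, so a flow has to carry at least this many bytes before its window can exceed BDP plus buffer and force a drop; shorter flows complete without ever overflowing the bottleneck.

    def bytes_before_first_drop(bdp_bytes, buffer_bytes, iw_bytes=3 * 1500):
        """Bytes a slow-starting flow delivers before its window can overflow."""
        limit = bdp_bytes + buffer_bytes
        cwnd, sent = iw_bytes, 0
        while cwnd <= limit:
            sent += cwnd
            cwnd *= 2               # slow-start doubling per RTT
        return sent

    bdp = int(10e6 * 0.050 / 8)     # 10 Mbit/s, 50 ms path -> 62500 bytes
    for buf in (16000, 64000, 256000):
        kb = bytes_before_first_drop(bdp, buf) // 1000
        print("buffer %6d B: flow must exceed ~%d kB to see a drop" % (buf, kb))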
* Re: [Bloat] Network computing article on bloat 2011-04-26 20:30 ` Constantine Dovrolis @ 2011-04-26 21:16 ` Dave Taht 0 siblings, 0 replies; 66+ messages in thread From: Dave Taht @ 2011-04-26 21:16 UTC (permalink / raw) To: Constantine Dovrolis; +Cc: bloat On Tue, Apr 26, 2011 at 2:30 PM, Constantine Dovrolis <dovrolis@cc.gatech.edu> wrote: > Thanks Wes - I was hoping that someone will make this point. > > btw, another common reason for lossless operation is the > size of the flows. basically flows often finish before their > window increases so much that they overflow their bottleneck's > buffer. We do tend to overuse TCP for short flows, like those of the core http protocol without 1.1 pipelining. However more uptake of 1.1's pipelining would lead to more correct and timely behavior in the presence of congestion, and longer flows in the general case. That said in an age of netflix and facetime, we have problems with big flows again. > > Plz spend some time to read the following paper: > http://www.cc.gatech.edu/fac/Constantinos.Dovrolis/Papers/buffers-ton.pdf The paper above appears to be testing against networks in the USA, and at speeds higher than 1Mbit. Did you try working internationally, at speeds closer to 128Kbit? > It is very relevant to the bufferbloat initiative and it shows clearly, > I think, that statements like "Big Buffers Bad. Small Buffers Good." > are crude oversimplifications that will cause even more confusion. I think they are crude simplifications, but they lead to slightly more correct conclusions in the general case than the alternatives. I would love to have a short elevator pitch that nailed the problem adequately. > > regards > > Constantine > > On 4/26/2011 4:21 PM, Wesley Eddy wrote: >> >> >> Operating with infinite storage and operating without packet loss are >> two different things. >> >> Ideally, you may have a path with ample bandwidth such that packet >> losses don't occur and all connections are either application limited or >> receive window limitedand congestion control never kicks in. In this >> case, there's no loss and the Internet clearly works. >> > > -------------------------------------------------------------- > Constantine Dovrolis, Associate Professor > College of Computing, Georgia Institute of Technology > 3346 KACB, 404-385-4205, dovrolis@cc.gatech.edu > http://www.cc.gatech.edu/~dovrolis/ > > _______________________________________________ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat > -- Dave Täht SKYPE: davetaht US Tel: 1-239-829-5608 http://the-edge.blogspot.com ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Network computing article on bloat 2011-04-26 20:21 ` Wesley Eddy 2011-04-26 20:30 ` Constantine Dovrolis @ 2011-04-27 17:10 ` Bill Sommerfeld 2011-04-27 17:40 ` Wesley Eddy 1 sibling, 1 reply; 66+ messages in thread From: Bill Sommerfeld @ 2011-04-27 17:10 UTC (permalink / raw) To: Wesley Eddy; +Cc: bloat On Tue, Apr 26, 2011 at 13:21, Wesley Eddy <wes@mti-systems.com> wrote: > Ideally, you may have a path with ample bandwidth such that packet > losses don't occur and all connections are either application limited or > receive window limited and congestion control never kicks in. In this > case, there's no loss and the Internet clearly works. This situation is not really "ideal" because it indicates an unbalanced system -- you've probably spent too much on link bandwidth and not enough on end system performance. ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Network computing article on bloat 2011-04-27 17:10 ` Bill Sommerfeld @ 2011-04-27 17:40 ` Wesley Eddy 0 siblings, 0 replies; 66+ messages in thread From: Wesley Eddy @ 2011-04-27 17:40 UTC (permalink / raw) To: Bill Sommerfeld; +Cc: bloat On 4/27/2011 1:10 PM, Bill Sommerfeld wrote: > On Tue, Apr 26, 2011 at 13:21, Wesley Eddy<wes@mti-systems.com> wrote: >> Ideally, you may have a path with ample bandwidth such that packet >> losses don't occur and all connections are either application limited or >> receive window limited and congestion control never kicks in. In this >> case, there's no loss and the Internet clearly works. > > This situation is not really "ideal" because it indicates an > unbalanced system -- you've probably spent too much on link bandwidth > and not enough on end system performance. > > Many applications are inherently limited in max rate; e.g. VoIP and video streams with fixed-rate codecs, telemetry, etc. The elevator pitch should be that optimizing for low loss is harmful and needs to be balanced with optimizing latency. It should not be saying that loss is required. -- Wes Eddy MTI Systems ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Network computing article on bloat 2011-04-26 18:32 ` Wesley Eddy 2011-04-26 19:37 ` Dave Taht @ 2011-04-27 7:43 ` Jonathan Morton 2011-04-30 15:56 ` Henrique de Moraes Holschuh 2011-04-30 19:18 ` [Bloat] Goodput fraction w/ AQM vs bufferbloat Richard Scheffenegger 3 siblings, 0 replies; 66+ messages in thread From: Jonathan Morton @ 2011-04-27 7:43 UTC (permalink / raw) To: Wesley Eddy; +Cc: bloat On 26 Apr, 2011, at 9:32 pm, Wesley Eddy wrote: > On 4/26/2011 2:17 PM, Dave Taht wrote: >> "Big Buffers Bad. Small Buffers Good." >> >> "*Some* packet loss is essential for the correct operation of the Internet" >> >> are two of the memes I try to propagate, in their simplicity. Even >> then there are so many qualifiers to both of those that the core >> message gets lost. > > The second one is actually backwards; it should be "the Internet can > operate correctly with some packet loss". I would say, more accurately, that the *potential* for packet loss is necessary for correct Internet operation. This is the same as saying that the potential for bringing individual trains to an unscheduled halt is necessary to allow railways to operate safely. If one train is delayed, another train has to wait for it to clear the junction to avoid a collision. If the brakes fail, they are designed to bring the train to an immediate halt rather than face the possibility of not coming to a halt when later required to. If the signals fail, they automatically show Danger. When congestion control fails, packet loss is inevitable. Bigger buffers - the traditional "solution" to packet loss - only delay that fact slightly, and not even for very long. - Jonathan Morton ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Network computing article on bloat 2011-04-26 18:32 ` Wesley Eddy 2011-04-26 19:37 ` Dave Taht 2011-04-27 7:43 ` Jonathan Morton @ 2011-04-30 15:56 ` Henrique de Moraes Holschuh 2011-04-30 19:18 ` [Bloat] Goodput fraction w/ AQM vs bufferbloat Richard Scheffenegger 3 siblings, 0 replies; 66+ messages in thread From: Henrique de Moraes Holschuh @ 2011-04-30 15:56 UTC (permalink / raw) To: Wesley Eddy; +Cc: bloat On Tue, 26 Apr 2011, Wesley Eddy wrote: > On 4/26/2011 2:17 PM, Dave Taht wrote: > >"Big Buffers Bad. Small Buffers Good." > > > >"*Some* packet loss is essential for the correct operation of the Internet" > > > >are two of the memes I try to propagate, in their simplicity. Even > >then there are so many qualifiers to both of those that the core > >message gets lost. > > > The second one is actually backwards; it should be "the Internet can > operate correctly with some packet loss". Right now in the real world, it CANNOT operate correctly WITHOUT the use of aggressive packet loss to throttle back flows, or the queues just fill up to the brink, and then you start dropping all packets anyway. IMO, Dave's wording gets that point across a lot better. -- "One disk to rule them all, One disk to find them. One disk to bring them all and in the darkness grind them. In the Land of Redmond where the shadows lie." -- The Silicon Valley Tarot Henrique Holschuh ^ permalink raw reply [flat|nested] 66+ messages in thread
* [Bloat] Goodput fraction w/ AQM vs bufferbloat 2011-04-26 18:32 ` Wesley Eddy ` (2 preceding siblings ...) 2011-04-30 15:56 ` Henrique de Moraes Holschuh @ 2011-04-30 19:18 ` Richard Scheffenegger 2011-05-05 16:01 ` Jim Gettys 3 siblings, 1 reply; 66+ messages in thread From: Richard Scheffenegger @ 2011-04-30 19:18 UTC (permalink / raw) To: bloat I'm curious, has anyone done some simulations to check whether the following qualitative statement holds true, and if so, what the quantitative effect is: With bufferbloat, the TCP congestion control reaction is unduly delayed. When it finally happens, the TCP stream is likely facing a "burst loss" event - multiple consecutive packets get dropped. Worse yet, the sender with the lowest RTT across the bottleneck will likely start to retransmit while the (tail-drop) queue is still overflowing. And a lost retransmission means a major setback in bandwidth (except for Linux with bulk transfers and SACK enabled), as the standard (RFC documented) behaviour asks for an RTO (1 sec nominally, 200-500 ms typically) to recover such a lost retransmission... The second part (more important as an incentive to the ISPs, actually): how does the fraction of goodput vs. throughput change when AQM schemes are deployed and TCP CC reacts in a timely manner? Small ISPs have to pay for their upstream volume, regardless of whether that is "real" work (goodput) or unnecessary retransmissions. When I was at a small cable ISP in Switzerland last week, sure enough bufferbloat was readily observable (17ms -> 220ms after 30 sec of a bulk transfer), but at first they had the "not our problem" view, until I started discussing burst loss / retransmissions / goodput vs throughput - with the last point being a real commercial incentive to them. (They promised to check if AQM would be available in the CPE / CMTS, and put latency bounds in their tenders going forward). Best regards, Richard ^ permalink raw reply [flat|nested] 66+ messages in thread
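One way to put numbers on the goodput-vs-throughput half of Richard's question is to measure the retransmitted-byte fraction straight from a capture. The sketch below assumes scapy is installed and uses a deliberately crude heuristic (any data segment whose starting sequence number was already seen for that flow counts as a retransmission); the capture filename is a placeholder.

    from collections import defaultdict
    from scapy.all import rdpcap, IP, TCP

    def goodput_fraction(pcap_file):
        seen = defaultdict(set)      # flow 4-tuple -> starting seqs already seen
        total = retrans = 0
        for pkt in rdpcap(pcap_file):
            if not (IP in pkt and TCP in pkt):
                continue
            payload = len(pkt[TCP].payload)
            if payload == 0:
                continue             # ignore pure ACKs
            flow = (pkt[IP].src, pkt[TCP].sport, pkt[IP].dst, pkt[TCP].dport)
            total += payload
            if pkt[TCP].seq in seen[flow]:
                retrans += payload   # same starting seq seen before: a retransmission
            seen[flow].add(pkt[TCP].seq)
        return 1.0 - float(retrans) / total if total else float('nan')

    print("goodput fraction: %.3f" % goodput_fraction("capture.pcap"))

The gap between that fraction and 1.0 is the volume an ISP hauls twice; comparing the figure with and without AQM at the bottleneck would speak directly to the commercial argument Richard is making.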
* Re: [Bloat] Goodput fraction w/ AQM vs bufferbloat 2011-04-30 19:18 ` [Bloat] Goodput fraction w/ AQM vs bufferbloat Richard Scheffenegger @ 2011-05-05 16:01 ` Jim Gettys 2011-05-05 16:10 ` Stephen Hemminger 2011-05-06 4:18 ` [Bloat] Goodput fraction w/ AQM vs bufferbloat Fred Baker 0 siblings, 2 replies; 66+ messages in thread From: Jim Gettys @ 2011-05-05 16:01 UTC (permalink / raw) To: bloat On 04/30/2011 03:18 PM, Richard Scheffenegger wrote: > I'm curious, has anyone done some simulations to check if the > following qualitative statement holds true, and if, what the > quantitative effect is: > > With bufferbloat, the TCP congestion control reaction is unduely > delayed. When it finally happens, the tcp stream is likely facing a > "burst loss" event - multiple consecutive packets get dropped. Worse > yet, the sender with the lowest RTT across the bottleneck will likely > start to retransmit while the (tail-drop) queue is still overflowing. > > And a lost retransmission means a major setback in bandwidth (except > for Linux with bulk transfers and SACK enabled), as the standard (RFC > documented) behaviour asks for a RTO (1sec nominally, 200-500 ms > typically) to recover such a lost retransmission... > > The second part (more important as an incentive to the ISPs actually), > how does the fraction of goodput vs. throughput change, when AQM > schemes are deployed, and TCP CC reacts in a timely manner? Small ISPs > have to pay for their upstream volume, regardless if that is "real" > work (goodput) or unneccessary retransmissions. > > When I was at a small cable ISP in switzerland last week, surely > enough bufferbloat was readily observable (17ms -> 220ms after 30 sec > of a bulk transfer), but at first they had the "not our problem" view, > until I started discussing burst loss / retransmissions / goodput vs > throughput - with the latest point being a real commercial incentive > to them. (They promised to check if AQM would be available in the CPE > / CMTS, and put latency bounds in their tenders going forward). > I wish I had a good answer to your very good questions. Simulation would be interesting though real daa is more convincing. I haven't looked in detail at all that many traces to try to get a feel for how much bandwidth waste there actually is, and more formal studies like Netalyzr, SamKnows, or the Bismark project would be needed to quantify the loss on the network as a whole. I did spend some time last fall with the traces I've taken. In those, I've typically been seeing 1-3% packet loss in the main TCP transfers. On the wireless trace I took, I saw 9% loss, but whether that is bufferbloat induced loss or not, I don't know (the data is out there for those who might want to dig). And as you note, the losses are concentrated in bursts (probably due to the details of Cubic, so I'm told). I've had anecdotal reports (and some first hand experience) with much higher loss rates, for example from Nick Weaver at ICSI; but I believe in playing things conservatively with any numbers I quote and I've not gotten consistent results when I've tried, so I just report what's in the packet captures I did take. 
A phenomenon that could be occurring is that during congestion avoidance (until TCP loses its cookies entirely and probes for a higher operating point) TCP is carefully timing its packets to keep the buffers almost exactly full, so that competing flows (in my case, simple pings) are likely to arrive just when there is no buffer space to accept them, and therefore you see higher losses on them than you would on the single flow I've been tracing and getting loss statistics from. People who want to look into this further would be a great help. - Jim ^ permalink raw reply [flat|nested] 66+ messages in thread
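Jim's hypothesis is easy to poke at with a toy tail-drop simulation (every parameter below is invented, and this is a caricature rather than a model of Cubic): the bulk flow is ack-clocked, so its packets tend to arrive just after a departure has opened a slot, while unsynchronised probes arrive at arbitrary instants and mostly find the buffer still full.

    import random
    random.seed(1)

    B = 100                                    # buffer size in packets
    q = B                                      # bulk flow has already filled the buffer
    pending = 0                                # ack-clocked packets running a slot late
    stats = {"bulk": [0, 0], "probe": [0, 0]}  # class -> [sent, dropped]

    def arrive(cls):
        global q
        stats[cls][0] += 1
        if q < B:
            q += 1
        else:
            stats[cls][1] += 1                 # tail drop

    for t in range(200000):
        if q > 0:
            q -= 1                             # bottleneck drains one packet per slot
        while pending:                         # late bulk packets arrive first
            pending -= 1
            arrive("bulk")
        new = 1 + (1 if t % 50 == 0 else 0)    # ack-clocked refill + occasional window growth
        if random.random() < 0.9:              # usually the refill arrives right away...
            for _ in range(new):
                arrive("bulk")
        else:
            pending += new                     # ...sometimes it is a slot late
        if random.random() < 0.01:             # sparse, unsynchronised probe
            arrive("probe")

    for cls, (sent, dropped) in stats.items():
        print("%-5s loss: %5.1f%% of %d packets" % (cls, 100.0 * dropped / sent, sent))

In this toy the probes see a much higher drop rate than the bulk flow even though both face the same queue, which is the asymmetry Jim describes.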
* Re: [Bloat] Goodput fraction w/ AQM vs bufferbloat 2011-05-05 16:01 ` Jim Gettys @ 2011-05-05 16:10 ` Stephen Hemminger 2011-05-05 16:30 ` Jim Gettys 2011-05-05 16:49 ` [Bloat] Burst Loss Neil Davies 2011-05-06 4:18 ` [Bloat] Goodput fraction w/ AQM vs bufferbloat Fred Baker 1 sibling, 2 replies; 66+ messages in thread From: Stephen Hemminger @ 2011-05-05 16:10 UTC (permalink / raw) To: Jim Gettys; +Cc: bloat On Thu, 05 May 2011 12:01:22 -0400 Jim Gettys <jg@freedesktop.org> wrote: > On 04/30/2011 03:18 PM, Richard Scheffenegger wrote: > > I'm curious, has anyone done some simulations to check if the > > following qualitative statement holds true, and if, what the > > quantitative effect is: > > > > With bufferbloat, the TCP congestion control reaction is unduely > > delayed. When it finally happens, the tcp stream is likely facing a > > "burst loss" event - multiple consecutive packets get dropped. Worse > > yet, the sender with the lowest RTT across the bottleneck will likely > > start to retransmit while the (tail-drop) queue is still overflowing. > > > > And a lost retransmission means a major setback in bandwidth (except > > for Linux with bulk transfers and SACK enabled), as the standard (RFC > > documented) behaviour asks for a RTO (1sec nominally, 200-500 ms > > typically) to recover such a lost retransmission... > > > > The second part (more important as an incentive to the ISPs actually), > > how does the fraction of goodput vs. throughput change, when AQM > > schemes are deployed, and TCP CC reacts in a timely manner? Small ISPs > > have to pay for their upstream volume, regardless if that is "real" > > work (goodput) or unneccessary retransmissions. > > > > When I was at a small cable ISP in switzerland last week, surely > > enough bufferbloat was readily observable (17ms -> 220ms after 30 sec > > of a bulk transfer), but at first they had the "not our problem" view, > > until I started discussing burst loss / retransmissions / goodput vs > > throughput - with the latest point being a real commercial incentive > > to them. (They promised to check if AQM would be available in the CPE > > / CMTS, and put latency bounds in their tenders going forward). > > > I wish I had a good answer to your very good questions. Simulation > would be interesting though real daa is more convincing. > > I haven't looked in detail at all that many traces to try to get a feel > for how much bandwidth waste there actually is, and more formal studies > like Netalyzr, SamKnows, or the Bismark project would be needed to > quantify the loss on the network as a whole. > > I did spend some time last fall with the traces I've taken. In those, > I've typically been seeing 1-3% packet loss in the main TCP transfers. > On the wireless trace I took, I saw 9% loss, but whether that is > bufferbloat induced loss or not, I don't know (the data is out there for > those who might want to dig). And as you note, the losses are > concentrated in bursts (probably due to the details of Cubic, so I'm told). > > I've had anecdotal reports (and some first hand experience) with much > higher loss rates, for example from Nick Weaver at ICSI; but I believe > in playing things conservatively with any numbers I quote and I've not > gotten consistent results when I've tried, so I just report what's in > the packet captures I did take. 
> > A phenomena that could be occurring is that during congestion avoidance > (until TCP loses its cookies entirely and probes for a higher operating > point) that TCP is carefully timing it's packets to keep the buffers > almost exactly full, so that competing flows (in my case, simple pings) > are likely to arrive just when there is no buffer space to accept them > and therefore you see higher losses on them than you would on the single > flow I've been tracing and getting loss statistics from. > > People who want to look into this further would be a great help. > - Jim I would not put a lot of trust in measuring loss with pings. I heard that some ISP's do different processing on ICMP's used for ping packets. They either prioritize them high to provide artificially good response (better marketing numbers); or prioritize them low since they aren't useful traffic. There are also filters that only allow N ICMP requests per second which means repeated probes will be dropped. -- ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Goodput fraction w/ AQM vs bufferbloat 2011-05-05 16:10 ` Stephen Hemminger @ 2011-05-05 16:30 ` Jim Gettys 2011-05-05 16:49 ` [Bloat] Burst Loss Neil Davies 1 sibling, 0 replies; 66+ messages in thread From: Jim Gettys @ 2011-05-05 16:30 UTC (permalink / raw) To: bloat On 05/05/2011 12:10 PM, Stephen Hemminger wrote: > On Thu, 05 May 2011 12:01:22 -0400 > Jim Gettys<jg@freedesktop.org> wrote: > >> On 04/30/2011 03:18 PM, Richard Scheffenegger wrote: >>> I'm curious, has anyone done some simulations to check if the >>> following qualitative statement holds true, and if, what the >>> quantitative effect is: >>> >>> With bufferbloat, the TCP congestion control reaction is unduely >>> delayed. When it finally happens, the tcp stream is likely facing a >>> "burst loss" event - multiple consecutive packets get dropped. Worse >>> yet, the sender with the lowest RTT across the bottleneck will likely >>> start to retransmit while the (tail-drop) queue is still overflowing. >>> >>> And a lost retransmission means a major setback in bandwidth (except >>> for Linux with bulk transfers and SACK enabled), as the standard (RFC >>> documented) behaviour asks for a RTO (1sec nominally, 200-500 ms >>> typically) to recover such a lost retransmission... >>> >>> The second part (more important as an incentive to the ISPs actually), >>> how does the fraction of goodput vs. throughput change, when AQM >>> schemes are deployed, and TCP CC reacts in a timely manner? Small ISPs >>> have to pay for their upstream volume, regardless if that is "real" >>> work (goodput) or unneccessary retransmissions. >>> >>> When I was at a small cable ISP in switzerland last week, surely >>> enough bufferbloat was readily observable (17ms -> 220ms after 30 sec >>> of a bulk transfer), but at first they had the "not our problem" view, >>> until I started discussing burst loss / retransmissions / goodput vs >>> throughput - with the latest point being a real commercial incentive >>> to them. (They promised to check if AQM would be available in the CPE >>> / CMTS, and put latency bounds in their tenders going forward). >>> >> I wish I had a good answer to your very good questions. Simulation >> would be interesting though real daa is more convincing. >> >> I haven't looked in detail at all that many traces to try to get a feel >> for how much bandwidth waste there actually is, and more formal studies >> like Netalyzr, SamKnows, or the Bismark project would be needed to >> quantify the loss on the network as a whole. >> >> I did spend some time last fall with the traces I've taken. In those, >> I've typically been seeing 1-3% packet loss in the main TCP transfers. >> On the wireless trace I took, I saw 9% loss, but whether that is >> bufferbloat induced loss or not, I don't know (the data is out there for >> those who might want to dig). And as you note, the losses are >> concentrated in bursts (probably due to the details of Cubic, so I'm told). >> >> I've had anecdotal reports (and some first hand experience) with much >> higher loss rates, for example from Nick Weaver at ICSI; but I believe >> in playing things conservatively with any numbers I quote and I've not >> gotten consistent results when I've tried, so I just report what's in >> the packet captures I did take. 
>> >> A phenomena that could be occurring is that during congestion avoidance >> (until TCP loses its cookies entirely and probes for a higher operating >> point) that TCP is carefully timing it's packets to keep the buffers >> almost exactly full, so that competing flows (in my case, simple pings) >> are likely to arrive just when there is no buffer space to accept them >> and therefore you see higher losses on them than you would on the single >> flow I've been tracing and getting loss statistics from. >> >> People who want to look into this further would be a great help. >> - Jim > I would not put a lot of trust in measuring loss with pings. > I heard that some ISP's do different processing on ICMP's used > for ping packets. They either prioritize them high to provide > artificially good response (better marketing numbers); or > prioritize them low since they aren't useful traffic. > There are also filters that only allow N ICMP requests per second > which means repeated probes will be dropped. I didn't use ping for my loss measurements above, but derived them from the traces themselves (using tstat: see: http://tstat.tlc.polito.it/index.shtml). Your explanation is part of why I don't use what I've seen when using ping for loss rates (though I have yet to actually see the behaviour of messing with priorities or preferentially dropping that many have claimed. Ping does often get processed on network gear slow paths, and it is believable that on loaded routers or broad band head end under load the pings might get dropped, classified or otherwise messed with. So I made sure to avoid that in the loss numbers I quote on the traces I looked at. It's also why I worked with Folkert Van Heusden last summer and fall to ensure that there was a TCP based ping program available (you can use options to httping http://www.vanheusden.com/httping/ to get an HTTP based ping using HTTP persistent connections), with one packet out and exactly one back, so it should be prioritised exactly as web traffic. So far, it and conventional ICMP ping have always returned effectively identical tests in the paths I've probed. How much of the anecdotal information of ISP's doing this or that with ICMP I'd believe is therefore not clear. But at least with httpping we can figure out what extent it may be true, and certainly care is in order on any measurements. Best regards, - jim - Jim ^ permalink raw reply [flat|nested] 66+ messages in thread
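For anyone who wants to reproduce the TCP-based probing Jim describes without installing anything, here is a minimal sketch in the same spirit (it times the three-way handshake rather than a full HTTP exchange, so it is not exactly what httping measures; the hostname is a placeholder):

    import socket, time

    def tcp_ping(host, port=80, count=5, timeout=2.0):
        """Return per-probe handshake RTTs in milliseconds (None for a failed probe)."""
        rtts = []
        for _ in range(count):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.settimeout(timeout)
            t0 = time.time()
            try:
                s.connect((host, port))    # returns once the handshake completes
                rtts.append((time.time() - t0) * 1000.0)
            except socket.error:
                rtts.append(None)
            finally:
                s.close()
            time.sleep(1.0)
        return rtts

    for rtt in tcp_ping("www.example.com"):
        print("timeout" if rtt is None else "%.1f ms" % rtt)

Because the probe is an ordinary TCP SYN to port 80, it should be queued and forwarded like web traffic rather than like ICMP.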
* [Bloat] Burst Loss 2011-05-05 16:10 ` Stephen Hemminger 2011-05-05 16:30 ` Jim Gettys @ 2011-05-05 16:49 ` Neil Davies 2011-05-05 18:34 ` Jim Gettys ` (2 more replies) 1 sibling, 3 replies; 66+ messages in thread From: Neil Davies @ 2011-05-05 16:49 UTC (permalink / raw) To: Stephen Hemminger; +Cc: bloat On the issue of loss - we did a study of the UK's ADSL access network back in 2006 over several weeks, looking at the loss and delay that was introduced into the bi-directional traffic. We found that the delay variability (that bit left over after you've taken the effects of geography and line sync rates) was broadly the same over the half dozen locations we studied - it was there all the time to the same level of variance and that what did vary by time of day was the loss rate. We also found out, at the time much to our surprise - but we understand why now, that loss was broadly independent of the offered load - we used a constant data rate (with either fixed or variable packet sizes) . We found that loss rates were in the range 1% to 3% (which is what would be expected from a large number of TCP streams contending for a limiting resource). As for burst loss, yes it does occur - but it could be argued that this more the fault of the sending TCP stack than the network. This phenomenon was well covered in the academic literature in the '90s (if I remember correctly folks at INRIA lead the way) - it is all down to the nature of random processes and how you observe them. Back to back packets see higher loss rates than packets more spread out in time. Consider a pair of packets, back to back, arriving over a 1Gbit/sec link into a queue being serviced at 34Mbit/sec, the first packet being 'lost' is equivalent to saying that the first packet 'observed' the queue full - the system's state is no longer a random variable - it is known to be full. The second packet (lets assume it is also a full one) 'makes an observation' of the state of that queue about 12us later - but that is only 3% of the time that it takes to service such large packets at 34 Mbit/sec. The system has not had any time to 'relax' anywhere near to back its steady state, it is highly likely that it is still full. Fixing this makes a phenomenal difference on the goodput (with the usual delay effects that implies), we've even built and deployed systems with this sort of engineering embedded (deployed as a network 'wrap') that mean that end users can sustainably (days on end) achieve effective throughput that is better than 98% of (the transmission media imposed) maximum. What we had done is make the network behave closer to the underlying statistical assumptions made in TCP's design. Neil On 5 May 2011, at 17:10, Stephen Hemminger wrote: > On Thu, 05 May 2011 12:01:22 -0400 > Jim Gettys <jg@freedesktop.org> wrote: > >> On 04/30/2011 03:18 PM, Richard Scheffenegger wrote: >>> I'm curious, has anyone done some simulations to check if the >>> following qualitative statement holds true, and if, what the >>> quantitative effect is: >>> >>> With bufferbloat, the TCP congestion control reaction is unduely >>> delayed. When it finally happens, the tcp stream is likely facing a >>> "burst loss" event - multiple consecutive packets get dropped. Worse >>> yet, the sender with the lowest RTT across the bottleneck will likely >>> start to retransmit while the (tail-drop) queue is still overflowing. 
>>> >>> And a lost retransmission means a major setback in bandwidth (except >>> for Linux with bulk transfers and SACK enabled), as the standard (RFC >>> documented) behaviour asks for a RTO (1sec nominally, 200-500 ms >>> typically) to recover such a lost retransmission... >>> >>> The second part (more important as an incentive to the ISPs actually), >>> how does the fraction of goodput vs. throughput change, when AQM >>> schemes are deployed, and TCP CC reacts in a timely manner? Small ISPs >>> have to pay for their upstream volume, regardless if that is "real" >>> work (goodput) or unneccessary retransmissions. >>> >>> When I was at a small cable ISP in switzerland last week, surely >>> enough bufferbloat was readily observable (17ms -> 220ms after 30 sec >>> of a bulk transfer), but at first they had the "not our problem" view, >>> until I started discussing burst loss / retransmissions / goodput vs >>> throughput - with the latest point being a real commercial incentive >>> to them. (They promised to check if AQM would be available in the CPE >>> / CMTS, and put latency bounds in their tenders going forward). >>> >> I wish I had a good answer to your very good questions. Simulation >> would be interesting though real daa is more convincing. >> >> I haven't looked in detail at all that many traces to try to get a feel >> for how much bandwidth waste there actually is, and more formal studies >> like Netalyzr, SamKnows, or the Bismark project would be needed to >> quantify the loss on the network as a whole. >> >> I did spend some time last fall with the traces I've taken. In those, >> I've typically been seeing 1-3% packet loss in the main TCP transfers. >> On the wireless trace I took, I saw 9% loss, but whether that is >> bufferbloat induced loss or not, I don't know (the data is out there for >> those who might want to dig). And as you note, the losses are >> concentrated in bursts (probably due to the details of Cubic, so I'm told). >> >> I've had anecdotal reports (and some first hand experience) with much >> higher loss rates, for example from Nick Weaver at ICSI; but I believe >> in playing things conservatively with any numbers I quote and I've not >> gotten consistent results when I've tried, so I just report what's in >> the packet captures I did take. >> >> A phenomena that could be occurring is that during congestion avoidance >> (until TCP loses its cookies entirely and probes for a higher operating >> point) that TCP is carefully timing it's packets to keep the buffers >> almost exactly full, so that competing flows (in my case, simple pings) >> are likely to arrive just when there is no buffer space to accept them >> and therefore you see higher losses on them than you would on the single >> flow I've been tracing and getting loss statistics from. >> >> People who want to look into this further would be a great help. >> - Jim > > I would not put a lot of trust in measuring loss with pings. > I heard that some ISP's do different processing on ICMP's used > for ping packets. They either prioritize them high to provide > artificially good response (better marketing numbers); or > prioritize them low since they aren't useful traffic. > There are also filters that only allow N ICMP requests per second > which means repeated probes will be dropped. > > > > -- > _______________________________________________ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat ^ permalink raw reply [flat|nested] 66+ messages in thread
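The arithmetic behind Neil's back-to-back example, spelled out (assuming full-size 1500-byte packets):

    PKT = 1500 * 8                 # bits in a full-size packet
    fast, slow = 1e9, 34e6         # ingress and egress rates, bits/s

    gap = PKT / fast               # spacing of two back-to-back arrivals
    service = PKT / slow           # time the slow side needs to drain one packet

    print("arrival gap : %6.1f us" % (gap * 1e6))       # ~12 us
    print("service time: %6.1f us" % (service * 1e6))   # ~353 us
    print("the gap is %.1f%% of one service time" % (100 * gap / service))

The second packet samples the queue long before even one packet's worth of draining has happened, which is why its fate is so strongly correlated with the first packet's, and why spreading packets out in time breaks that correlation.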
* Re: [Bloat] Burst Loss 2011-05-05 16:49 ` [Bloat] Burst Loss Neil Davies @ 2011-05-05 18:34 ` Jim Gettys 2011-05-06 11:40 ` Sam Stickland 2011-05-08 12:42 ` Richard Scheffenegger 2 siblings, 0 replies; 66+ messages in thread From: Jim Gettys @ 2011-05-05 18:34 UTC (permalink / raw) To: bloat On 05/05/2011 12:49 PM, Neil Davies wrote: > On the issue of loss - we did a study of the UK's ADSL access network back in 2006 over several weeks, looking at the loss and delay that was introduced into the bi-directional traffic. > > We found that the delay variability (that bit left over after you've taken the effects of geography and line sync rates) was broadly > the same over the half dozen locations we studied - it was there all the time to the same level of variance and that what did vary by time of day was the loss rate. > > We also found out, at the time much to our surprise - but we understand why now, that loss was broadly independent of the offered load - we used a constant data rate (with either fixed or variable packet sizes) . > > We found that loss rates were in the range 1% to 3% (which is what would be expected from a large number of TCP streams contending for a limiting resource). > > As for burst loss, yes it does occur - but it could be argued that this more the fault of the sending TCP stack than the network. > > This phenomenon was well covered in the academic literature in the '90s (if I remember correctly folks at INRIA lead the way) - it is all down to the nature of random processes and how you observe them. > > Back to back packets see higher loss rates than packets more spread out in time. Consider a pair of packets, back to back, arriving over a 1Gbit/sec link into a queue being serviced at 34Mbit/sec, the first packet being 'lost' is equivalent to saying that the first packet 'observed' the queue full - the system's state is no longer a random variable - it is known to be full. The second packet (lets assume it is also a full one) 'makes an observation' of the state of that queue about 12us later - but that is only 3% of the time that it takes to service such large packets at 34 Mbit/sec. The system has not had any time to 'relax' anywhere near to back its steady state, it is highly likely that it is still full. > > Fixing this makes a phenomenal difference on the goodput (with the usual delay effects that implies), we've even built and deployed systems with this sort of engineering embedded (deployed as a network 'wrap') that mean that end users can sustainably (days on end) achieve effective throughput that is better than 98% of (the transmission media imposed) maximum. What we had done is make the network behave closer to the underlying statistical assumptions made in TCP's design. > > Neil Good point: in phone conversations with Van Jacobson, he made the point that we'd really like the hardware to allow scheduling of packet transmission to allow proper paceing of packets, to avoid clumping and smooth flow. - Jim > > > > On 5 May 2011, at 17:10, Stephen Hemminger wrote: > >> On Thu, 05 May 2011 12:01:22 -0400 >> Jim Gettys<jg@freedesktop.org> wrote: >> >>> On 04/30/2011 03:18 PM, Richard Scheffenegger wrote: >>>> I'm curious, has anyone done some simulations to check if the >>>> following qualitative statement holds true, and if, what the >>>> quantitative effect is: >>>> >>>> With bufferbloat, the TCP congestion control reaction is unduely >>>> delayed. 
When it finally happens, the tcp stream is likely facing a >>>> "burst loss" event - multiple consecutive packets get dropped. Worse >>>> yet, the sender with the lowest RTT across the bottleneck will likely >>>> start to retransmit while the (tail-drop) queue is still overflowing. >>>> >>>> And a lost retransmission means a major setback in bandwidth (except >>>> for Linux with bulk transfers and SACK enabled), as the standard (RFC >>>> documented) behaviour asks for a RTO (1sec nominally, 200-500 ms >>>> typically) to recover such a lost retransmission... >>>> >>>> The second part (more important as an incentive to the ISPs actually), >>>> how does the fraction of goodput vs. throughput change, when AQM >>>> schemes are deployed, and TCP CC reacts in a timely manner? Small ISPs >>>> have to pay for their upstream volume, regardless if that is "real" >>>> work (goodput) or unneccessary retransmissions. >>>> >>>> When I was at a small cable ISP in switzerland last week, surely >>>> enough bufferbloat was readily observable (17ms -> 220ms after 30 sec >>>> of a bulk transfer), but at first they had the "not our problem" view, >>>> until I started discussing burst loss / retransmissions / goodput vs >>>> throughput - with the latest point being a real commercial incentive >>>> to them. (They promised to check if AQM would be available in the CPE >>>> / CMTS, and put latency bounds in their tenders going forward). >>>> >>> I wish I had a good answer to your very good questions. Simulation >>> would be interesting though real daa is more convincing. >>> >>> I haven't looked in detail at all that many traces to try to get a feel >>> for how much bandwidth waste there actually is, and more formal studies >>> like Netalyzr, SamKnows, or the Bismark project would be needed to >>> quantify the loss on the network as a whole. >>> >>> I did spend some time last fall with the traces I've taken. In those, >>> I've typically been seeing 1-3% packet loss in the main TCP transfers. >>> On the wireless trace I took, I saw 9% loss, but whether that is >>> bufferbloat induced loss or not, I don't know (the data is out there for >>> those who might want to dig). And as you note, the losses are >>> concentrated in bursts (probably due to the details of Cubic, so I'm told). >>> >>> I've had anecdotal reports (and some first hand experience) with much >>> higher loss rates, for example from Nick Weaver at ICSI; but I believe >>> in playing things conservatively with any numbers I quote and I've not >>> gotten consistent results when I've tried, so I just report what's in >>> the packet captures I did take. >>> >>> A phenomena that could be occurring is that during congestion avoidance >>> (until TCP loses its cookies entirely and probes for a higher operating >>> point) that TCP is carefully timing it's packets to keep the buffers >>> almost exactly full, so that competing flows (in my case, simple pings) >>> are likely to arrive just when there is no buffer space to accept them >>> and therefore you see higher losses on them than you would on the single >>> flow I've been tracing and getting loss statistics from. >>> >>> People who want to look into this further would be a great help. >>> - Jim >> I would not put a lot of trust in measuring loss with pings. >> I heard that some ISP's do different processing on ICMP's used >> for ping packets. They either prioritize them high to provide >> artificially good response (better marketing numbers); or >> prioritize them low since they aren't useful traffic. 
>> There are also filters that only allow N ICMP requests per second >> which means repeated probes will be dropped. >> >> >> >> -- >> _______________________________________________ >> Bloat mailing list >> Bloat@lists.bufferbloat.net >> https://lists.bufferbloat.net/listinfo/bloat > _______________________________________________ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat ^ permalink raw reply [flat|nested] 66+ messages in thread
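A rough software sketch of the pacing idea Jim attributes to Van Jacobson (toy code: send_segment() is a placeholder, and a real implementation would live in the stack or, ideally, the NIC): instead of handing a whole window to the interface at once, space the segments so they leave at roughly cwnd per RTT.

    import time

    def send_segment(seg):
        pass                        # stand-in for the real transmit path

    def paced_send(segments, cwnd_bytes, rtt_s, mss=1500):
        interval = rtt_s / max(1, cwnd_bytes // mss)   # per-segment spacing
        for seg in segments:
            t0 = time.time()
            send_segment(seg)
            leftover = interval - (time.time() - t0)
            if leftover > 0:
                time.sleep(leftover)    # coarse software timers are exactly why
                                        # hardware help for this would be welcome

    paced_send([b"x" * 1460] * 10, cwnd_bytes=10 * 1460, rtt_s=0.050)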
* Re: [Bloat] Burst Loss 2011-05-05 16:49 ` [Bloat] Burst Loss Neil Davies 2011-05-05 18:34 ` Jim Gettys @ 2011-05-06 11:40 ` Sam Stickland 2011-05-06 11:53 ` Neil Davies 2011-05-08 12:42 ` Richard Scheffenegger 2 siblings, 1 reply; 66+ messages in thread From: Sam Stickland @ 2011-05-06 11:40 UTC (permalink / raw) To: Neil Davies; +Cc: Stephen Hemminger, bloat [-- Attachment #1: Type: text/plain, Size: 2604 bytes --] On 5 May 2011, at 17:49, Neil Davies <Neil.Davies@pnsol.com> wrote: > On the issue of loss - we did a study of the UK's ADSL access network back in 2006 over several weeks, looking at the loss and delay that was introduced into the bi-directional traffic. > > We found that the delay variability (that bit left over after you've taken the effects of geography and line sync rates) was broadly > the same over the half dozen locations we studied - it was there all the time to the same level of variance and that what did vary by time of day was the loss rate. > > We also found out, at the time much to our surprise - but we understand why now, that loss was broadly independent of the offered load - we used a constant data rate (with either fixed or variable packet sizes) . > > We found that loss rates were in the range 1% to 3% (which is what would be expected from a large number of TCP streams contending for a limiting resource). > > As for burst loss, yes it does occur - but it could be argued that this more the fault of the sending TCP stack than the network. > > This phenomenon was well covered in the academic literature in the '90s (if I remember correctly folks at INRIA lead the way) - it is all down to the nature of random processes and how you observe them. > > Back to back packets see higher loss rates than packets more spread out in time. Consider a pair of packets, back to back, arriving over a 1Gbit/sec link into a queue being serviced at 34Mbit/sec, the first packet being 'lost' is equivalent to saying that the first packet 'observed' the queue full - the system's state is no longer a random variable - it is known to be full. The second packet (lets assume it is also a full one) 'makes an observation' of the state of that queue about 12us later - but that is only 3% of the time that it takes to service such large packets at 34 Mbit/sec. The system has not had any time to 'relax' anywhere near to back its steady state, it is highly likely that it is still full. > > Fixing this makes a phenomenal difference on the goodput (with the usual delay effects that implies), we've even built and deployed systems with this sort of engineering embedded (deployed as a network 'wrap') that mean that end users can sustainably (days on end) achieve effective throughput that is better than 98% of (the transmission media imposed) maximum. What we had done is make the network behave closer to the underlying statistical assumptions made in TCP's design. How did you fix this? What alters the packet spacing? The network or the host? Sam [-- Attachment #2: Type: text/html, Size: 3594 bytes --] ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Burst Loss 2011-05-06 11:40 ` Sam Stickland @ 2011-05-06 11:53 ` Neil Davies 0 siblings, 0 replies; 66+ messages in thread From: Neil Davies @ 2011-05-06 11:53 UTC (permalink / raw) To: Sam Stickland; +Cc: Stephen Hemminger, bloat [-- Attachment #1: Type: text/plain, Size: 2862 bytes --] On 6 May 2011, at 12:40, Sam Stickland wrote: > > > On 5 May 2011, at 17:49, Neil Davies <Neil.Davies@pnsol.com> wrote: > >> On the issue of loss - we did a study of the UK's ADSL access network back in 2006 over several weeks, looking at the loss and delay that was introduced into the bi-directional traffic. >> >> We found that the delay variability (that bit left over after you've taken the effects of geography and line sync rates) was broadly >> the same over the half dozen locations we studied - it was there all the time to the same level of variance and that what did vary by time of day was the loss rate. >> >> We also found out, at the time much to our surprise - but we understand why now, that loss was broadly independent of the offered load - we used a constant data rate (with either fixed or variable packet sizes) . >> >> We found that loss rates were in the range 1% to 3% (which is what would be expected from a large number of TCP streams contending for a limiting resource). >> >> As for burst loss, yes it does occur - but it could be argued that this more the fault of the sending TCP stack than the network. >> >> This phenomenon was well covered in the academic literature in the '90s (if I remember correctly folks at INRIA lead the way) - it is all down to the nature of random processes and how you observe them. >> >> Back to back packets see higher loss rates than packets more spread out in time. Consider a pair of packets, back to back, arriving over a 1Gbit/sec link into a queue being serviced at 34Mbit/sec, the first packet being 'lost' is equivalent to saying that the first packet 'observed' the queue full - the system's state is no longer a random variable - it is known to be full. The second packet (lets assume it is also a full one) 'makes an observation' of the state of that queue about 12us later - but that is only 3% of the time that it takes to service such large packets at 34 Mbit/sec. The system has not had any time to 'relax' anywhere near to back its steady state, it is highly likely that it is still full. >> >> Fixing this makes a phenomenal difference on the goodput (with the usual delay effects that implies), we've even built and deployed systems with this sort of engineering embedded (deployed as a network 'wrap') that mean that end users can sustainably (days on end) achieve effective throughput that is better than 98% of (the transmission media imposed) maximum. What we had done is make the network behave closer to the underlying statistical assumptions made in TCP's design. > > How did you fix this? What alters the packet spacing? The network or the host? It is a device in the network, it sits at the 'edge' of the access network (at the ISP / Network Wholesaler boundary) - that resolves the downstream issue. Neil > > Sam [-- Attachment #2: Type: text/html, Size: 4145 bytes --] ^ permalink raw reply [flat|nested] 66+ messages in thread
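Neil doesn't describe the mechanism, but one simple way to picture a downstream respacer at that boundary is a per-line shaper that refuses to forward faster than the access line can drain; the sketch below uses invented numbers and is not a description of his device.

    def respace(arrival_times, line_rate_bps, pkt_bits=12000):
        """Release packets downstream no faster than the access line rate."""
        service = pkt_bits / float(line_rate_bps)
        departures, line_free_at = [], 0.0
        for t in arrival_times:
            start = max(t, line_free_at)     # wait until the line is free again
            line_free_at = start + service
            departures.append(line_free_at)
        return departures

    burst = [0.0] * 10                       # ten packets arriving back to back
    for i, d in enumerate(respace(burst, 8e6)):
        print("packet %d handed to the line at %.1f ms" % (i, d * 1000))

A burst that would have landed on the downstream buffer all at once is instead spread over time, which is the sort of respacing the preceding discussion is about.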
* Re: [Bloat] Burst Loss 2011-05-05 16:49 ` [Bloat] Burst Loss Neil Davies 2011-05-05 18:34 ` Jim Gettys 2011-05-06 11:40 ` Sam Stickland @ 2011-05-08 12:42 ` Richard Scheffenegger 2011-05-09 18:06 ` Rick Jones 2 siblings, 1 reply; 66+ messages in thread From: Richard Scheffenegger @ 2011-05-08 12:42 UTC (permalink / raw) To: Neil Davies, Stephen Hemminger; +Cc: bloat I'm not an expert in TSO / GSO and NIC driver design, but what I gathered is that with these schemes, and modern NICs that do scatter/gather DMA of dozens of "independent" header/data chunks directly from memory, the NIC will typically send out non-interleaved trains of segments all belonging to single TCP sessions. The implicit assumption is that these bursts of up to 180 segments (Intel supports 256kB of data per chain) can be absorbed by the buffer at the bottleneck and spread out in time there... From my perspective, having GSO / TSO "cycle" through all the different chains belonging to different sessions (so as not to introduce reordering at the sender) should already help pace the segments per session somewhat; a slightly more sophisticated DMA engine could check each of the chains for how much data is to be sent by them, and then clock an appropriate number of interleaved segments out... I do understand that this is "work" for a HW DMA engine and slows down GSO software implementations, but it may severely reduce the instantaneous rate of a single session, and thereby the impact of burst loss due to momentary buffer overload... (Let me know if I should draw a picture of the way I understand TSO / HW DMA is currently working, and where it could be improved upon.) Best regards, Richard ----- Original Message ----- > Back to back packets see higher loss rates than packets more spread out in > time. Consider a pair of packets, back to back, arriving over a 1Gbit/sec > link into a queue being serviced at 34Mbit/sec, the first packet being > 'lost' is equivalent to saying that the first packet 'observed' the queue > full - the system's state is no longer a random variable - it is known to > be full. The second packet (lets assume it is also a full one) 'makes an > observation' of the state of that queue about 12us later - but that is > only 3% of the time that it takes to service such large packets at 34 > Mbit/sec. The system has not had any time to 'relax' anywhere near to back > its steady state, it is highly likely that it is still full. > > Fixing this makes a phenomenal difference on the goodput (with the usual > delay effects that implies), we've even built and deployed systems with > this sort of engineering embedded (deployed as a network 'wrap') that mean > that end users can sustainably (days on end) achieve effective throughput > that is better than 98% of (the transmission media imposed) maximum. What > we had done is make the network behave closer to the underlying > statistical assumptions made in TCP's design. > > Neil > > > > > On 5 May 2011, at 17:10, Stephen Hemminger wrote: > >> On Thu, 05 May 2011 12:01:22 -0400 >> Jim Gettys <jg@freedesktop.org> wrote: >> >>> On 04/30/2011 03:18 PM, Richard Scheffenegger wrote: >>>> I'm curious, has anyone done some simulations to check if the >>>> following qualitative statement holds true, and if, what the >>>> quantitative effect is: >>>> >>>> With bufferbloat, the TCP congestion control reaction is unduely >>>> delayed. When it finally happens, the tcp stream is likely facing a >>>> "burst loss" event - multiple consecutive packets get dropped. 
Worse >>>> yet, the sender with the lowest RTT across the bottleneck will likely >>>> start to retransmit while the (tail-drop) queue is still overflowing. >>>> >>>> And a lost retransmission means a major setback in bandwidth (except >>>> for Linux with bulk transfers and SACK enabled), as the standard (RFC >>>> documented) behaviour asks for a RTO (1sec nominally, 200-500 ms >>>> typically) to recover such a lost retransmission... >>>> >>>> The second part (more important as an incentive to the ISPs actually), >>>> how does the fraction of goodput vs. throughput change, when AQM >>>> schemes are deployed, and TCP CC reacts in a timely manner? Small ISPs >>>> have to pay for their upstream volume, regardless if that is "real" >>>> work (goodput) or unneccessary retransmissions. >>>> >>>> When I was at a small cable ISP in switzerland last week, surely >>>> enough bufferbloat was readily observable (17ms -> 220ms after 30 sec >>>> of a bulk transfer), but at first they had the "not our problem" view, >>>> until I started discussing burst loss / retransmissions / goodput vs >>>> throughput - with the latest point being a real commercial incentive >>>> to them. (They promised to check if AQM would be available in the CPE >>>> / CMTS, and put latency bounds in their tenders going forward). >>>> >>> I wish I had a good answer to your very good questions. Simulation >>> would be interesting though real daa is more convincing. >>> >>> I haven't looked in detail at all that many traces to try to get a feel >>> for how much bandwidth waste there actually is, and more formal studies >>> like Netalyzr, SamKnows, or the Bismark project would be needed to >>> quantify the loss on the network as a whole. >>> >>> I did spend some time last fall with the traces I've taken. In those, >>> I've typically been seeing 1-3% packet loss in the main TCP transfers. >>> On the wireless trace I took, I saw 9% loss, but whether that is >>> bufferbloat induced loss or not, I don't know (the data is out there for >>> those who might want to dig). And as you note, the losses are >>> concentrated in bursts (probably due to the details of Cubic, so I'm >>> told). >>> >>> I've had anecdotal reports (and some first hand experience) with much >>> higher loss rates, for example from Nick Weaver at ICSI; but I believe >>> in playing things conservatively with any numbers I quote and I've not >>> gotten consistent results when I've tried, so I just report what's in >>> the packet captures I did take. >>> >>> A phenomena that could be occurring is that during congestion avoidance >>> (until TCP loses its cookies entirely and probes for a higher operating >>> point) that TCP is carefully timing it's packets to keep the buffers >>> almost exactly full, so that competing flows (in my case, simple pings) >>> are likely to arrive just when there is no buffer space to accept them >>> and therefore you see higher losses on them than you would on the single >>> flow I've been tracing and getting loss statistics from. >>> >>> People who want to look into this further would be a great help. >>> - Jim >> >> I would not put a lot of trust in measuring loss with pings. >> I heard that some ISP's do different processing on ICMP's used >> for ping packets. They either prioritize them high to provide >> artificially good response (better marketing numbers); or >> prioritize them low since they aren't useful traffic. >> There are also filters that only allow N ICMP requests per second >> which means repeated probes will be dropped. 
>> >> >> >> -- >> _______________________________________________ >> Bloat mailing list >> Bloat@lists.bufferbloat.net >> https://lists.bufferbloat.net/listinfo/bloat > > _______________________________________________ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat ^ permalink raw reply [flat|nested] 66+ messages in thread
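A minimal sketch of the interleaving Richard describes above - round-robin across the per-session segment chains so that per-flow ordering is preserved but no single flow emits a long back-to-back train. The function and toy data structures are mine, purely illustrative:

    from collections import deque

    def interleave_chains(chains):
        """Emit one segment at a time from each per-flow TSO/GSO chain in turn.
        Order within each flow is preserved; long single-flow bursts are broken up."""
        queues = [deque(chain) for chain in chains]
        wire = []
        while any(queues):
            for q in queues:
                if q:
                    wire.append(q.popleft())
        return wire

    # Example: three TCP sessions, each handed to the NIC as one segment chain.
    a = [("A", i) for i in range(4)]
    b = [("B", i) for i in range(4)]
    c = [("C", i) for i in range(2)]
    print(interleave_chains([a, b, c]))
    # [('A',0), ('B',0), ('C',0), ('A',1), ('B',1), ('C',1), ('A',2), ('B',2), ...]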
* Re: [Bloat] Burst Loss 2011-05-08 12:42 ` Richard Scheffenegger @ 2011-05-09 18:06 ` Rick Jones 2011-05-11 8:53 ` Richard Scheffenegger 2011-05-12 16:31 ` [Bloat] Burst Loss Fred Baker 0 siblings, 2 replies; 66+ messages in thread From: Rick Jones @ 2011-05-09 18:06 UTC (permalink / raw) To: Richard Scheffenegger; +Cc: Stephen Hemminger, bloat On Sun, 2011-05-08 at 14:42 +0200, Richard Scheffenegger wrote: > I'm not an expert in TSO / GSO, and NIC driver design, but what I gathered > is, that with these schemes, and mordern NICs that do scatter/gather DMA of > dotzends of "independent" header/data chuncks directly from memory, the NIC > will typically send out non-interleaved trains of segments all belonging to > single TCP sessions. With the implicit assumption, that these burst of up to > 180 segments (Intel supports 256kB data per chain) can be absorped by the > buffer at the bottleneck and spread out in time there... > > From my perspective, having such GSO / TSO to "cycle" through all the > different chains belonging to different sessions (to not introduce > reordering at the sender even), should already help pace the segments per > session somewhat; a slightly more sophisticated DMA engine could check each > of the chains for how much data is to be sent by those, and then clock an > appropriate number of interleaved segmets out... I do understand that this > is "work" for a HW DMA engine and slows down GSO software implementations, > but may severly reduce the instantaneous rate of a single session, and > thereby the impact of burst loss to to momenary buffer overload... > > (Let me know if I should draw a picture of the way I understand TSO / HW DMA > is currently working, and where it could be improved upon): GSO/TSO can be thought of as a symptom of standards bodies (eg the IEEE) refusing to standardize an increase in frame sizes. Put another way, they are a "poor man's jumbo frames." Within the context of a given "priority" at least, NICs are setup/designed to do things in order. I too cannot claim to be a NIC designer, but suspect it would be a non-trivial, if straight-forward exercise to get a NIC to cycle through multiple GSO/TSO sends. Yes, they could probably (ab)use any prioritization support they have. NICs and drivers are accustomed to "in order" processing - grab packet, send packet, update status, lather, rinse, repeat (modulo some pre-fetching). Those rings aren't really amenable to "out of order" completion notifications, so the NIC would have to still do "in order" retirement of packets or the driver model will loose simplicity. As for the issue below, even if the NIC(s) upstream did interleave between two GSO'd sends, you are simply trading back-to-back frames of a single flow for back-to-back frames of different flows. And if there is only the one flow upstream of this bottleneck, whether GSO is on or not probably won't make a huge difference in the timing - only how much CPU is burned on the source host. > Best regards, > Richard > > > ----- Original Message ----- > > Back to back packets see higher loss rates than packets more spread out in > > time. Consider a pair of packets, back to back, arriving over a 1Gbit/sec > > link into a queue being serviced at 34Mbit/sec, the first packet being > > 'lost' is equivalent to saying that the first packet 'observed' the queue > > full - the system's state is no longer a random variable - it is known to > > be full. 
The second packet (lets assume it is also a full one) 'makes an > > observation' of the state of that queue about 12us later - but that is > > only 3% of the time that it takes to service such large packets at 34 > > Mbit/sec. The system has not had any time to 'relax' anywhere near to back > > its steady state, it is highly likely that it is still full. > > > > Fixing this makes a phenomenal difference on the goodput (with the usual > > delay effects that implies), we've even built and deployed systems with > > this sort of engineering embedded (deployed as a network 'wrap') that mean > > that end users can sustainably (days on end) achieve effective throughput > > that is better than 98% of (the transmission media imposed) maximum. What > > we had done is make the network behave closer to the underlying > > statistical assumptions made in TCP's design. > > > > Neil > > > > > > > > > > On 5 May 2011, at 17:10, Stephen Hemminger wrote: > > > >> On Thu, 05 May 2011 12:01:22 -0400 > >> Jim Gettys <jg@freedesktop.org> wrote: > >> > >>> On 04/30/2011 03:18 PM, Richard Scheffenegger wrote: > >>>> I'm curious, has anyone done some simulations to check if the > >>>> following qualitative statement holds true, and if, what the > >>>> quantitative effect is: > >>>> > >>>> With bufferbloat, the TCP congestion control reaction is unduely > >>>> delayed. When it finally happens, the tcp stream is likely facing a > >>>> "burst loss" event - multiple consecutive packets get dropped. Worse > >>>> yet, the sender with the lowest RTT across the bottleneck will likely > >>>> start to retransmit while the (tail-drop) queue is still overflowing. > >>>> > >>>> And a lost retransmission means a major setback in bandwidth (except > >>>> for Linux with bulk transfers and SACK enabled), as the standard (RFC > >>>> documented) behaviour asks for a RTO (1sec nominally, 200-500 ms > >>>> typically) to recover such a lost retransmission... > >>>> > >>>> The second part (more important as an incentive to the ISPs actually), > >>>> how does the fraction of goodput vs. throughput change, when AQM > >>>> schemes are deployed, and TCP CC reacts in a timely manner? Small ISPs > >>>> have to pay for their upstream volume, regardless if that is "real" > >>>> work (goodput) or unneccessary retransmissions. > >>>> > >>>> When I was at a small cable ISP in switzerland last week, surely > >>>> enough bufferbloat was readily observable (17ms -> 220ms after 30 sec > >>>> of a bulk transfer), but at first they had the "not our problem" view, > >>>> until I started discussing burst loss / retransmissions / goodput vs > >>>> throughput - with the latest point being a real commercial incentive > >>>> to them. (They promised to check if AQM would be available in the CPE > >>>> / CMTS, and put latency bounds in their tenders going forward). > >>>> > >>> I wish I had a good answer to your very good questions. Simulation > >>> would be interesting though real daa is more convincing. > >>> > >>> I haven't looked in detail at all that many traces to try to get a feel > >>> for how much bandwidth waste there actually is, and more formal studies > >>> like Netalyzr, SamKnows, or the Bismark project would be needed to > >>> quantify the loss on the network as a whole. > >>> > >>> I did spend some time last fall with the traces I've taken. In those, > >>> I've typically been seeing 1-3% packet loss in the main TCP transfers. 
> >>> On the wireless trace I took, I saw 9% loss, but whether that is > >>> bufferbloat induced loss or not, I don't know (the data is out there for > >>> those who might want to dig). And as you note, the losses are > >>> concentrated in bursts (probably due to the details of Cubic, so I'm > >>> told). > >>> > >>> I've had anecdotal reports (and some first hand experience) with much > >>> higher loss rates, for example from Nick Weaver at ICSI; but I believe > >>> in playing things conservatively with any numbers I quote and I've not > >>> gotten consistent results when I've tried, so I just report what's in > >>> the packet captures I did take. > >>> > >>> A phenomena that could be occurring is that during congestion avoidance > >>> (until TCP loses its cookies entirely and probes for a higher operating > >>> point) that TCP is carefully timing it's packets to keep the buffers > >>> almost exactly full, so that competing flows (in my case, simple pings) > >>> are likely to arrive just when there is no buffer space to accept them > >>> and therefore you see higher losses on them than you would on the single > >>> flow I've been tracing and getting loss statistics from. > >>> > >>> People who want to look into this further would be a great help. > >>> - Jim > >> > >> I would not put a lot of trust in measuring loss with pings. > >> I heard that some ISP's do different processing on ICMP's used > >> for ping packets. They either prioritize them high to provide > >> artificially good response (better marketing numbers); or > >> prioritize them low since they aren't useful traffic. > >> There are also filters that only allow N ICMP requests per second > >> which means repeated probes will be dropped. > >> > >> > >> > >> -- > >> _______________________________________________ > >> Bloat mailing list > >> Bloat@lists.bufferbloat.net > >> https://lists.bufferbloat.net/listinfo/bloat > > > > _______________________________________________ > > Bloat mailing list > > Bloat@lists.bufferbloat.net > > https://lists.bufferbloat.net/listinfo/bloat > > _______________________________________________ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Burst Loss 2011-05-09 18:06 ` Rick Jones @ 2011-05-11 8:53 ` Richard Scheffenegger 2011-05-11 9:53 ` Eric Dumazet 2011-05-12 16:31 ` [Bloat] Burst Loss Fred Baker 1 sibling, 1 reply; 66+ messages in thread From: Richard Scheffenegger @ 2011-05-11 8:53 UTC (permalink / raw) To: rick.jones2; +Cc: Stephen Hemminger, bloat > Within the context of a given "priority" at least, NICs are > setup/designed to do things in order. I too cannot claim to be a NIC > designer, but suspect it would be a non-trivial, if straight-forward > exercise to get a NIC to cycle through multiple GSO/TSO sends. Yes, > they could probably (ab)use any prioritization support they have. > > NICs and drivers are accustomed to "in order" processing - grab packet, > send packet, update status, lather, rinse, repeat (modulo some > pre-fetching). Those rings aren't really amenable to "out of order" > completion notifications, so the NIC would have to still do "in order" > retirement of packets or the driver model will loose simplicity. > > As for the issue below, even if the NIC(s) upstream did interleave > between two GSO'd sends, you are simply trading back-to-back frames of a > single flow for back-to-back frames of different flows. And if there is > only the one flow upstream of this bottleneck, whether GSO is on or not > probably won't make a huge difference in the timing - only how much CPU > is burned on the source host. Well, the transmit descriptors (header + pointer to the data to be segmented) is in the hand of the hw driver... The hw driver could at least check if the current list of transmit descriptors is for different tcp sessions (or interspaced non-tcp traffic), and could interleave these descriptors (reorder them, before they are processed by hardware - while obviously maintaining relative ordering between the descriptors belonging to the same flow. Also, I think this feature could be utilized for pacing to some extent - interspace the (valid) traffic descriptors with descriptors that will cause "invalid" packets to be sent (ie. dst mac == src max; should be dropped by the first switch). It's been well known that properly paced traffic is much more resilient than traffic being sent in short bursts of wirespeed trains of packets. (TSO defeats the self-clocking of TCP with ACKs). Just a thought... Richard ----- Original Message ----- From: "Rick Jones" <rick.jones2@hp.com> To: "Richard Scheffenegger" <rscheff@gmx.at> Cc: "Neil Davies" <Neil.Davies@pnsol.com>; "Stephen Hemminger" <shemminger@vyatta.com>; <bloat@lists.bufferbloat.net> Sent: Monday, May 09, 2011 8:06 PM Subject: Re: [Bloat] Burst Loss > On Sun, 2011-05-08 at 14:42 +0200, Richard Scheffenegger wrote: >> I'm not an expert in TSO / GSO, and NIC driver design, but what I >> gathered >> is, that with these schemes, and mordern NICs that do scatter/gather DMA >> of >> dotzends of "independent" header/data chuncks directly from memory, the >> NIC >> will typically send out non-interleaved trains of segments all belonging >> to >> single TCP sessions. With the implicit assumption, that these burst of up >> to >> 180 segments (Intel supports 256kB data per chain) can be absorped by the >> buffer at the bottleneck and spread out in time there... 
>> >> From my perspective, having such GSO / TSO to "cycle" through all the >> different chains belonging to different sessions (to not introduce >> reordering at the sender even), should already help pace the segments per >> session somewhat; a slightly more sophisticated DMA engine could check >> each >> of the chains for how much data is to be sent by those, and then clock an >> appropriate number of interleaved segmets out... I do understand that >> this >> is "work" for a HW DMA engine and slows down GSO software >> implementations, >> but may severly reduce the instantaneous rate of a single session, and >> thereby the impact of burst loss to to momenary buffer overload... >> >> (Let me know if I should draw a picture of the way I understand TSO / HW >> DMA >> is currently working, and where it could be improved upon): > > GSO/TSO can be thought of as a symptom of standards bodies (eg the IEEE) > refusing to standardize an increase in frame sizes. Put another way, > they are a "poor man's jumbo frames." > > Within the context of a given "priority" at least, NICs are > setup/designed to do things in order. I too cannot claim to be a NIC > designer, but suspect it would be a non-trivial, if straight-forward > exercise to get a NIC to cycle through multiple GSO/TSO sends. Yes, > they could probably (ab)use any prioritization support they have. > > NICs and drivers are accustomed to "in order" processing - grab packet, > send packet, update status, lather, rinse, repeat (modulo some > pre-fetching). Those rings aren't really amenable to "out of order" > completion notifications, so the NIC would have to still do "in order" > retirement of packets or the driver model will loose simplicity. > > As for the issue below, even if the NIC(s) upstream did interleave > between two GSO'd sends, you are simply trading back-to-back frames of a > single flow for back-to-back frames of different flows. And if there is > only the one flow upstream of this bottleneck, whether GSO is on or not > probably won't make a huge difference in the timing - only how much CPU > is burned on the source host. > >> Best regards, >> Richard >> >> >> ----- Original Message ----- >> > Back to back packets see higher loss rates than packets more spread out >> > in >> > time. Consider a pair of packets, back to back, arriving over a >> > 1Gbit/sec >> > link into a queue being serviced at 34Mbit/sec, the first packet being >> > 'lost' is equivalent to saying that the first packet 'observed' the >> > queue >> > full - the system's state is no longer a random variable - it is known >> > to >> > be full. The second packet (lets assume it is also a full one) 'makes >> > an >> > observation' of the state of that queue about 12us later - but that is >> > only 3% of the time that it takes to service such large packets at 34 >> > Mbit/sec. The system has not had any time to 'relax' anywhere near to >> > back >> > its steady state, it is highly likely that it is still full. >> > >> > Fixing this makes a phenomenal difference on the goodput (with the >> > usual >> > delay effects that implies), we've even built and deployed systems with >> > this sort of engineering embedded (deployed as a network 'wrap') that >> > mean >> > that end users can sustainably (days on end) achieve effective >> > throughput >> > that is better than 98% of (the transmission media imposed) maximum. >> > What >> > we had done is make the network behave closer to the underlying >> > statistical assumptions made in TCP's design. 
>> > >> > Neil >> > >> > >> > >> > >> > On 5 May 2011, at 17:10, Stephen Hemminger wrote: >> > >> >> On Thu, 05 May 2011 12:01:22 -0400 >> >> Jim Gettys <jg@freedesktop.org> wrote: >> >> >> >>> On 04/30/2011 03:18 PM, Richard Scheffenegger wrote: >> >>>> I'm curious, has anyone done some simulations to check if the >> >>>> following qualitative statement holds true, and if, what the >> >>>> quantitative effect is: >> >>>> >> >>>> With bufferbloat, the TCP congestion control reaction is unduely >> >>>> delayed. When it finally happens, the tcp stream is likely facing a >> >>>> "burst loss" event - multiple consecutive packets get dropped. Worse >> >>>> yet, the sender with the lowest RTT across the bottleneck will >> >>>> likely >> >>>> start to retransmit while the (tail-drop) queue is still >> >>>> overflowing. >> >>>> >> >>>> And a lost retransmission means a major setback in bandwidth (except >> >>>> for Linux with bulk transfers and SACK enabled), as the standard >> >>>> (RFC >> >>>> documented) behaviour asks for a RTO (1sec nominally, 200-500 ms >> >>>> typically) to recover such a lost retransmission... >> >>>> >> >>>> The second part (more important as an incentive to the ISPs >> >>>> actually), >> >>>> how does the fraction of goodput vs. throughput change, when AQM >> >>>> schemes are deployed, and TCP CC reacts in a timely manner? Small >> >>>> ISPs >> >>>> have to pay for their upstream volume, regardless if that is "real" >> >>>> work (goodput) or unneccessary retransmissions. >> >>>> >> >>>> When I was at a small cable ISP in switzerland last week, surely >> >>>> enough bufferbloat was readily observable (17ms -> 220ms after 30 >> >>>> sec >> >>>> of a bulk transfer), but at first they had the "not our problem" >> >>>> view, >> >>>> until I started discussing burst loss / retransmissions / goodput vs >> >>>> throughput - with the latest point being a real commercial incentive >> >>>> to them. (They promised to check if AQM would be available in the >> >>>> CPE >> >>>> / CMTS, and put latency bounds in their tenders going forward). >> >>>> >> >>> I wish I had a good answer to your very good questions. Simulation >> >>> would be interesting though real daa is more convincing. >> >>> >> >>> I haven't looked in detail at all that many traces to try to get a >> >>> feel >> >>> for how much bandwidth waste there actually is, and more formal >> >>> studies >> >>> like Netalyzr, SamKnows, or the Bismark project would be needed to >> >>> quantify the loss on the network as a whole. >> >>> >> >>> I did spend some time last fall with the traces I've taken. In >> >>> those, >> >>> I've typically been seeing 1-3% packet loss in the main TCP >> >>> transfers. >> >>> On the wireless trace I took, I saw 9% loss, but whether that is >> >>> bufferbloat induced loss or not, I don't know (the data is out there >> >>> for >> >>> those who might want to dig). And as you note, the losses are >> >>> concentrated in bursts (probably due to the details of Cubic, so I'm >> >>> told). >> >>> >> >>> I've had anecdotal reports (and some first hand experience) with much >> >>> higher loss rates, for example from Nick Weaver at ICSI; but I >> >>> believe >> >>> in playing things conservatively with any numbers I quote and I've >> >>> not >> >>> gotten consistent results when I've tried, so I just report what's in >> >>> the packet captures I did take. 
>> >>> >> >>> A phenomena that could be occurring is that during congestion >> >>> avoidance >> >>> (until TCP loses its cookies entirely and probes for a higher >> >>> operating >> >>> point) that TCP is carefully timing it's packets to keep the buffers >> >>> almost exactly full, so that competing flows (in my case, simple >> >>> pings) >> >>> are likely to arrive just when there is no buffer space to accept >> >>> them >> >>> and therefore you see higher losses on them than you would on the >> >>> single >> >>> flow I've been tracing and getting loss statistics from. >> >>> >> >>> People who want to look into this further would be a great help. >> >>> - Jim >> >> >> >> I would not put a lot of trust in measuring loss with pings. >> >> I heard that some ISP's do different processing on ICMP's used >> >> for ping packets. They either prioritize them high to provide >> >> artificially good response (better marketing numbers); or >> >> prioritize them low since they aren't useful traffic. >> >> There are also filters that only allow N ICMP requests per second >> >> which means repeated probes will be dropped. >> >> >> >> >> >> >> >> -- >> >> _______________________________________________ >> >> Bloat mailing list >> >> Bloat@lists.bufferbloat.net >> >> https://lists.bufferbloat.net/listinfo/bloat >> > >> > _______________________________________________ >> > Bloat mailing list >> > Bloat@lists.bufferbloat.net >> > https://lists.bufferbloat.net/listinfo/bloat >> >> _______________________________________________ >> Bloat mailing list >> Bloat@lists.bufferbloat.net >> https://lists.bufferbloat.net/listinfo/bloat > > ^ permalink raw reply [flat|nested] 66+ messages in thread
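For what it's worth, the arithmetic implied by the "pad with throwaway frames" idea in Richard's message above works out as follows. This is a sketch under my own assumptions: pad frames the same size as the data frames, and a single flow being paced:

    # How many throwaway frames (dst MAC == src MAC, dropped at the first switch)
    # would need to be interleaved per data frame to pace one flow down from
    # line rate to a target rate, assuming pad and data frames are equal size.
    def pad_frames_per_data_frame(line_rate_bps, target_rate_bps):
        return line_rate_bps / target_rate_bps - 1

    print(pad_frames_per_data_frame(1e9, 100e6))   # 9.0   pad frames per data frame
    print(pad_frames_per_data_frame(1e9, 34e6))    # ~28.4 pad frames per data frame
    # Obvious cost: the pad frames still burn wire time and switch capacity on the
    # first hop, which is why pacing in the qdisc or with hardware timers is the
    # more usual answer.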
* Re: [Bloat] Burst Loss 2011-05-11 8:53 ` Richard Scheffenegger @ 2011-05-11 9:53 ` Eric Dumazet 2011-05-12 14:16 ` [Bloat] Publications Richard Scheffenegger 0 siblings, 1 reply; 66+ messages in thread From: Eric Dumazet @ 2011-05-11 9:53 UTC (permalink / raw) To: Richard Scheffenegger; +Cc: Stephen Hemminger, bloat Le mercredi 11 mai 2011 à 10:53 +0200, Richard Scheffenegger a écrit : > Well, the transmit descriptors (header + pointer to the data to be > segmented) is in the hand of the hw driver... > The hw driver could at least check if the current list of transmit > descriptors is for different tcp sessions > (or interspaced non-tcp traffic), and could interleave these descriptors > (reorder them, before they are processed > by hardware - while obviously maintaining relative ordering between the > descriptors belonging to the same flow. > > Also, I think this feature could be utilized for pacing to some extent - > interspace the (valid) traffic descriptors > with descriptors that will cause "invalid" packets to be sent (ie. dst mac > == src max; should be dropped by the first switch). It's been well known > that properly paced traffic is much more resilient than traffic being sent > in short bursts of wirespeed trains of packets. (TSO defeats the > self-clocking of TCP with ACKs). In French, we would say "Avoir le beurre et l'argent du beurre" ;) GSO is for high performance data xmits, usually in LAN. Dont expect NICS perform the hard/smart work for you. Of course hardware vendors claim they can do this, but this is mostly done with vendor specific methods, and you might spend a lot of time tuning hardware. If you want AQM, better use a well chosen qdisc setup (depending on the workload), and disable TSO/GSO. This will work well with all hardware, and presumably last for longer times (including hardware changes) ^ permalink raw reply [flat|nested] 66+ messages in thread
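A rough sketch of the kind of setup Eric suggests: turn off TSO/GSO on the interface and install a queueing discipline. The interface name and the choice of SFQ here are illustrative assumptions on my part, not a recommendation from this thread; pick a qdisc to match the workload:

    import subprocess

    def disable_offloads_and_add_qdisc(dev="eth0"):
        # Disable segmentation offloads so the stack emits normally sized frames.
        subprocess.run(["ethtool", "-K", dev, "tso", "off", "gso", "off"], check=True)
        # Install a (workload-dependent) qdisc; SFQ is just an example here.
        subprocess.run(["tc", "qdisc", "replace", "dev", dev, "root",
                        "sfq", "perturb", "10"], check=True)

    if __name__ == "__main__":
        disable_offloads_and_add_qdisc("eth0")   # needs root privileges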
* [Bloat] Publications 2011-05-11 9:53 ` Eric Dumazet @ 2011-05-12 14:16 ` Richard Scheffenegger 0 siblings, 0 replies; 66+ messages in thread From: Richard Scheffenegger @ 2011-05-12 14:16 UTC (permalink / raw) To: bloat Multimedia-unfriendly TCP Congestion Control and Home Gateway Queue Management http://caia.swin.edu.au/~gja/papers/mmsys2011-lstewart-p35.pdf Two-way TCP Connections: Old Problem, New Insight http://ccr.sigcomm.org/online/files/p6-v41n2b2-heussePS.pdf (actually, they look at two antiparallel TCP connections, not individual two-way TCP connections :). Both papers have something to say about buffer sizing in home CPE gear... Regards, Richard ^ permalink raw reply [flat|nested] 66+ messages in thread

* Re: [Bloat] Burst Loss 2011-05-09 18:06 ` Rick Jones 2011-05-11 8:53 ` Richard Scheffenegger @ 2011-05-12 16:31 ` Fred Baker 2011-05-12 16:41 ` Rick Jones 2011-05-13 5:00 ` Kevin Gross 1 sibling, 2 replies; 66+ messages in thread From: Fred Baker @ 2011-05-12 16:31 UTC (permalink / raw) To: rick.jones2; +Cc: Stephen Hemminger, bloat On May 9, 2011, at 11:06 AM, Rick Jones wrote: > GSO/TSO can be thought of as a symptom of standards bodies (eg the IEEE) > refusing to standardize an increase in frame sizes. Put another way, > they are a "poor man's jumbo frames." I'll agree, but only half; once the packets are transferred on the local wire, any jumbo-ness is lost. GSO/TSO mostly squeezes interframe gaps out of the wire and perhaps limits the amount of work the driver has to do. The real value of an end to end (IP) jumbo frame is that the receiving system experiences less interrupt load - a 9K frame replaces half a dozen 1500 byte frames, and as a result the receiver experiences 1/5 or 1/6 of the interrupts. Given that it has to save state, activate the kernel thread, and at least enqueue and perhaps acknowledge the received message, reducing interrupt load on the receiver makes it far more effective. This has the greatest effect on multi-gigabit file transfers. ^ permalink raw reply [flat|nested] 66+ messages in thread
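Fred's "1/5 or 1/6 of the interrupts" point is easy to check with a quick calculation. This sketch deliberately ignores interrupt coalescing (which Rick raises next) and counts one receiver notification per frame:

    # Frames per second needed to sustain a given rate with 1500-byte vs
    # 9000-byte frames (headers ignored for simplicity).
    def frames_per_second(rate_bps, frame_bytes):
        return rate_bps / (frame_bytes * 8)

    for rate in (1e9, 10e9):
        small = frames_per_second(rate, 1500)
        jumbo = frames_per_second(rate, 9000)
        print(f"{rate/1e9:.0f} Gbit/s: {small:,.0f} frames/s at 1500B, "
              f"{jumbo:,.0f} at 9000B (ratio {small/jumbo:.0f}x)")
    # 1 Gbit/s:  83,333 frames/s at 1500B,  13,889 at 9000B (ratio 6x)
    # 10 Gbit/s: 833,333 frames/s at 1500B, 138,889 at 9000B (ratio 6x)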
* Re: [Bloat] Burst Loss 2011-05-12 16:31 ` [Bloat] Burst Loss Fred Baker @ 2011-05-12 16:41 ` Rick Jones 2011-05-12 17:11 ` Fred Baker 2011-05-13 5:00 ` Kevin Gross 1 sibling, 1 reply; 66+ messages in thread From: Rick Jones @ 2011-05-12 16:41 UTC (permalink / raw) To: Fred Baker; +Cc: Stephen Hemminger, bloat On Thu, 2011-05-12 at 09:31 -0700, Fred Baker wrote: > On May 9, 2011, at 11:06 AM, Rick Jones wrote: > > > GSO/TSO can be thought of as a symptom of standards bodies (eg the IEEE) > > refusing to standardize an increase in frame sizes. Put another way, > > they are a "poor man's jumbo frames." > > I'll agree, but only half; once the packets are transferred on the > local wire, any jumbo-ness is lost. That is why I called them "poor man's" - he can't have everything :) > GSO/TSO mostly squeezes interframe gaps out of the wire and perhaps > limits the amount of work the driver has to do. The real value of an > end to end (IP) jumbo frame is that the receiving system experiences > less interrupt load - a 9K frame replaces half a dozen 1500 byte > frames, and as a result the receiver experiences 1/5 or 1/6 of the > interrupts. Given that it has to save state, activate the kernel > thread, and at least enqueue and perhaps acknowledge the received > message, reducing interrupt load on the receiver makes it far more > effective. This has the greatest effect on multi-gigabit file > transfers. Perhaps I'm trying to argue about the number of angels which can dance on the head of a pin, but isn't mitigating interrupt rates something that NICs and their drivers (and NAPI in the context of Linux) been doing for years? Or are you using "interrupt" to refer to the entire trip up the protocol stack and not just "interupts?" And then there is GRO/LRO. Of course as all the world is not bulk flows, one still has to write a nice, tight, stack and driver :) rick jones ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Burst Loss 2011-05-12 16:41 ` Rick Jones @ 2011-05-12 17:11 ` Fred Baker 0 siblings, 0 replies; 66+ messages in thread From: Fred Baker @ 2011-05-12 17:11 UTC (permalink / raw) To: rick.jones2; +Cc: Stephen Hemminger, bloat On May 12, 2011, at 9:41 AM, Rick Jones wrote: > Perhaps I'm trying to argue about the number of angels which can dance > on the head of a pin, but isn't mitigating interrupt rates something > that NICs and their drivers (and NAPI in the context of Linux) been > doing for years? > > Or are you using "interrupt" to refer to the entire trip up the protocol > stack and not just "interupts?" It's the stack up to the API to the application, which of course receives the data when it reads the socket. ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Burst Loss 2011-05-12 16:31 ` [Bloat] Burst Loss Fred Baker 2011-05-12 16:41 ` Rick Jones @ 2011-05-13 5:00 ` Kevin Gross 2011-05-13 14:35 ` Rick Jones 1 sibling, 1 reply; 66+ messages in thread From: Kevin Gross @ 2011-05-13 5:00 UTC (permalink / raw) To: bloat [-- Attachment #1: Type: text/plain, Size: 1571 bytes --] One of the principal reasons jumbo frames have not been standardized is due to latency concerns. I assume this group can appreciate the IEEE holding ground on this. For a short time, servers with gigabit NICs suffered but smarter NICs were developed (TSO, LRO, other TLAs) and OSs upgraded to support them and I believe it is no longer a significant issue. Kevin Gross On Thu, May 12, 2011 at 10:31 AM, Fred Baker <fred@cisco.com> wrote: > > On May 9, 2011, at 11:06 AM, Rick Jones wrote: > > > GSO/TSO can be thought of as a symptom of standards bodies (eg the IEEE) > > refusing to standardize an increase in frame sizes. Put another way, > > they are a "poor man's jumbo frames." > > I'll agree, but only half; once the packets are transferred on the local > wire, any jumbo-ness is lost. GSO/TSO mostly squeezes interframe gaps out of > the wire and perhaps limits the amount of work the driver has to do. The > real value of an end to end (IP) jumbo frame is that the receiving system > experiences less interrupt load - a 9K frame replaces half a dozen 1500 byte > frames, and as a result the receiver experiences 1/5 or 1/6 of the > interrupts. Given that it has to save state, activate the kernel thread, and > at least enqueue and perhaps acknowledge the received message, reducing > interrupt load on the receiver makes it far more effective. This has the > greatest effect on multi-gigabit file transfers. > _______________________________________________ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat > [-- Attachment #2: Type: text/html, Size: 1997 bytes --] ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Burst Loss 2011-05-13 5:00 ` Kevin Gross @ 2011-05-13 14:35 ` Rick Jones 2011-05-13 14:54 ` Dave Taht 2011-05-13 19:32 ` Denton Gentry 0 siblings, 2 replies; 66+ messages in thread From: Rick Jones @ 2011-05-13 14:35 UTC (permalink / raw) To: Kevin Gross; +Cc: bloat On Thu, 2011-05-12 at 23:00 -0600, Kevin Gross wrote: > One of the principal reasons jumbo frames have not been standardized > is due to latency concerns. I assume this group can appreciate the > IEEE holding ground on this. Thusfar at least, bloaters are fighting to eliminate 10s of milliseconds of queuing delay. I don't think this list is worrying about the tens of microseconds difference between the transmission time of a 9000 byte frame at 1 GbE vs a 1500 byte frame, or the single digit microseconds difference at 10 GbE. The "lets try to get onto the Top 500 list" crowd might, but official sanction for a 9000 byte MTU (or larger) doesn't mean it *must* be used. > For a short time, servers with gigabit NICs suffered but smarter NICs > were developed (TSO, LRO, other TLAs) and OSs upgraded to support them > and I believe it is no longer a significant issue. Are TSO and LRO going to be sufficient at 40 and 100 GbE? Cores aren't getting any faster. Only more plentiful. And while it isn't the strongest point in the world, one might even argue that the need to use TSO/LRO to achieve performance hinders new transport protocol adoption - the presence of NIC offloads for only TCP (or UDP) leaves a new transport protocol (perhaps SCTP) at a disadvantage. rick jones > Kevin Gross > > On Thu, May 12, 2011 at 10:31 AM, Fred Baker <fred@cisco.com> wrote: > > On May 9, 2011, at 11:06 AM, Rick Jones wrote: > > > GSO/TSO can be thought of as a symptom of standards bodies > (eg the IEEE) > > refusing to standardize an increase in frame sizes. Put > another way, > > they are a "poor man's jumbo frames." > > I'll agree, but only half; once the packets are transferred on > the local wire, any jumbo-ness is lost. GSO/TSO mostly > squeezes interframe gaps out of the wire and perhaps limits > the amount of work the driver has to do. The real value of an > end to end (IP) jumbo frame is that the receiving system > experiences less interrupt load - a 9K frame replaces half a > dozen 1500 byte frames, and as a result the receiver > experiences 1/5 or 1/6 of the interrupts. Given that it has to > save state, activate the kernel thread, and at least enqueue > and perhaps acknowledge the received message, reducing > interrupt load on the receiver makes it far more effective. > This has the greatest effect on multi-gigabit file transfers. > _______________________________________________ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat > > > _______________________________________________ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat ^ permalink raw reply [flat|nested] 66+ messages in thread
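The serialization numbers Rick refers to, as a quick sketch:

    # Wire (serialization) time of a 1500-byte vs a 9000-byte frame.
    def wire_time_us(frame_bytes, rate_bps):
        return frame_bytes * 8 / rate_bps * 1e6

    for rate, name in ((1e9, "1 GbE"), (10e9, "10 GbE")):
        t1500 = wire_time_us(1500, rate)
        t9000 = wire_time_us(9000, rate)
        print(f"{name}: 1500B = {t1500:.1f} us, 9000B = {t9000:.1f} us, "
              f"difference = {t9000 - t1500:.1f} us")
    # 1 GbE:  1500B = 12.0 us, 9000B = 72.0 us, difference = 60.0 us
    # 10 GbE: 1500B = 1.2 us,  9000B = 7.2 us,  difference = 6.0 us
    # Each store-and-forward switch hop adds one such serialization time.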
* Re: [Bloat] Burst Loss 2011-05-13 14:35 ` Rick Jones @ 2011-05-13 14:54 ` Dave Taht 2011-05-13 20:03 ` [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss) Kevin Gross ` (2 more replies) 2011-05-13 19:32 ` Denton Gentry 1 sibling, 3 replies; 66+ messages in thread From: Dave Taht @ 2011-05-13 14:54 UTC (permalink / raw) To: rick.jones2; +Cc: bloat [-- Attachment #1: Type: text/plain, Size: 1280 bytes --] On Fri, May 13, 2011 at 8:35 AM, Rick Jones <rick.jones2@hp.com> wrote: > On Thu, 2011-05-12 at 23:00 -0600, Kevin Gross wrote: > > One of the principal reasons jumbo frames have not been standardized > > is due to latency concerns. I assume this group can appreciate the > > IEEE holding ground on this. > > Thusfar at least, bloaters are fighting to eliminate 10s of milliseconds > of queuing delay. I don't think this list is worrying about the tens of > microseconds difference between the transmission time of a 9000 byte > frame at 1 GbE vs a 1500 byte frame, or the single digit microseconds > difference at 10 GbE. > Heh. With the first iteration of the bismark project I'm trying to get to where I have less than 30ms latency under load and have far larger problems to worry about than jumbo frames. I'll be lucky to manage 1/10th that (300ms) at this point. Not, incidentally that I mind the idea of jumbo frames. It seems silly to be saddled with default frame sizes that made sense in the 70s, and in an age where we will be seeing ever more packet encapsulation, reducing the header size as a ratio to data size strikes me as a very worthy goal. -- Dave Täht SKYPE: davetaht US Tel: 1-239-829-5608 http://the-edge.blogspot.com [-- Attachment #2: Type: text/html, Size: 1699 bytes --] ^ permalink raw reply [flat|nested] 66+ messages in thread
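Dave's point about the header-to-data ratio under encapsulation can be put in numbers. A sketch, assuming IPv4 + TCP without options (40 bytes), Ethernet header/FCS/preamble/IPG counted as 38 bytes on the wire, and a hypothetical 50-byte tunnel header as the extra encapsulation:

    ETH_WIRE = 38   # Ethernet header + FCS + preamble + inter-packet gap
    IP_TCP = 40     # IPv4 + TCP, no options

    def efficiency(mtu, encap=0):
        payload = mtu - IP_TCP - encap
        return payload / (mtu + ETH_WIRE)

    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {efficiency(mtu):.1%} plain, "
              f"{efficiency(mtu, encap=50):.1%} with 50B of encapsulation")
    # MTU 1500: 94.9% plain, 91.7% with 50B of encapsulation
    # MTU 9000: 99.1% plain, 98.6% with 50B of encapsulation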
* [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss) 2011-05-13 14:54 ` Dave Taht @ 2011-05-13 20:03 ` Kevin Gross 2011-05-14 20:48 ` Fred Baker [not found] ` <-4629065256951087821@unknownmsgid> 2011-05-13 22:08 ` [Bloat] Burst Loss david 2 siblings, 1 reply; 66+ messages in thread From: Kevin Gross @ 2011-05-13 20:03 UTC (permalink / raw) To: bloat [-- Attachment #1: Type: text/plain, Size: 2529 bytes --] Do we think that bufferbloat is just a WAN problem? I work on live media applications for LANs and campus networks. I'm seeing what I think could be characterized as bufferbloat in LAN equipment. The timescales on 1 Gb Ethernet are orders of magnitude shorter and the performance problems caused are in many cases a bit different but root cause and potential solutions are, I'm hoping, very similar. Keeping the frame byte size small while the frame time has shrunk maintains the overhead at the same level. Again, this has been a conscious decision not a stubborn relic. Ethernet improvements have increased bandwidth by orders of magnitude. Do we really need to increase it by a couple percentage points more by reducing overhead for large payloads? The cost of that improved marginal bandwidth efficiency is a 6x increase in latency. Many applications would not notice an increase from 12 us to 72 us for a Gigabit switch hop. But on a large network it adds up, some applications are absolutely that sensitive (transaction processing, cluster computing, SANs) and (I thought I'd be preaching to the choir here) there's no way to ever recover the lost performance. Kevin Gross From: Dave Taht [mailto:dave.taht@gmail.com] Sent: Friday, May 13, 2011 8:54 AM To: rick.jones2@hp.com Cc: Kevin Gross; bloat@lists.bufferbloat.net Subject: Re: [Bloat] Burst Loss On Fri, May 13, 2011 at 8:35 AM, Rick Jones <rick.jones2@hp.com> wrote: On Thu, 2011-05-12 at 23:00 -0600, Kevin Gross wrote: > One of the principal reasons jumbo frames have not been standardized > is due to latency concerns. I assume this group can appreciate the > IEEE holding ground on this. Thusfar at least, bloaters are fighting to eliminate 10s of milliseconds of queuing delay. I don't think this list is worrying about the tens of microseconds difference between the transmission time of a 9000 byte frame at 1 GbE vs a 1500 byte frame, or the single digit microseconds difference at 10 GbE. Heh. With the first iteration of the bismark project I'm trying to get to where I have less than 30ms latency under load and have far larger problems to worry about than jumbo frames. I'll be lucky to manage 1/10th that (300ms) at this point. Not, incidentally that I mind the idea of jumbo frames. It seems silly to be saddled with default frame sizes that made sense in the 70s, and in an age where we will be seeing ever more packet encapsulation, reducing the header size as a ratio to data size strikes me as a very worthy goal. [-- Attachment #2: Type: text/html, Size: 8491 bytes --] ^ permalink raw reply [flat|nested] 66+ messages in thread
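To put Kevin's "on a large network it adds up" in numbers - the cumulative extra store-and-forward latency of jumbo frames across several Gigabit hops (a sketch; the hop counts are arbitrary examples and queueing is ignored):

    # Extra store-and-forward latency of a 9000-byte frame vs a 1500-byte frame,
    # accumulated over N Gigabit switch hops (serialization only).
    def extra_us(hops, rate_bps=1e9, big=9000, small=1500):
        return hops * (big - small) * 8 / rate_bps * 1e6

    for hops in (1, 3, 5):
        print(f"{hops} hop(s): +{extra_us(hops):.0f} us")
    # 1 hop(s): +60 us,  3 hop(s): +180 us,  5 hop(s): +300 us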
* Re: [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss) 2011-05-13 20:03 ` [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss) Kevin Gross @ 2011-05-14 20:48 ` Fred Baker 2011-05-15 18:28 ` Jonathan Morton 2011-05-17 7:49 ` BeckW 0 siblings, 2 replies; 66+ messages in thread From: Fred Baker @ 2011-05-14 20:48 UTC (permalink / raw) To: Kevin Gross; +Cc: bloat [-- Attachment #1: Type: text/plain, Size: 5084 bytes --] On May 13, 2011, at 1:03 PM, Kevin Gross wrote: > Do we think that bufferbloat is just a WAN problem? I work on live media applications for LANs and campus networks. I'm seeing what I think could be characterized as bufferbloat in LAN equipment. The timescales on 1 Gb Ethernet are orders of magnitude shorter and the performance problems caused are in many cases a bit different but root cause and potential solutions are, I'm hoping, very similar. Bufferbloat is most noticeable on WANs, because they have longer delays, but yes, LAN equipment does the same thing. It shows up as extended delay or as an increase in loss rates. A lot of LAN equipment has very shallow buffers due to cost (LAN markets are very cost-sensitive). One myth with bufferbloat is that a reasonable solution is to make the buffer shallow; no, because when the queue fills you now have an increased loss rate, which shows up in timeout-driven retransmissions - you really want a deep buffer (for bursts and temporary surges) that you keep shallow using AQM techniques. > Keeping the frame byte size small while the frame time has shrunk maintains the overhead at the same level. Again, this has been a conscious decision not a stubborn relic. Ethernet improvements have increased bandwidth by orders of magnitude. Do we really need to increase it by a couple percentage points more by reducing overhead for large payloads? You might talk with the folks who do the LAN speed records. They generally view end-to-end jumbo frames as material to the achievement. It's not about changing the serialization delay; it's about changing the amount of processing at the endpoints. > The cost of that improved marginal bandwidth efficiency is a 6x increase in latency. Many applications would not notice an increase from 12 us to 72 us for a Gigabit switch hop. But on a large network it adds up, some applications are absolutely that sensitive (transaction processing, cluster computing, SANs) and (I thought I'd be preaching to the choir here) there's no way to ever recover the lost performance. Well, the extra delay is solvable in the transport. The question isn't really what the impact on the network is; it's what the requirements of the application are. For voice, if a voice sample is delayed 50 ms, the jitter buffer in the codec resolves that - microseconds are irrelevant. Video codecs generally keep at least three video frames in their jitter buffer; at 30 fps, that's 100 milliseconds of acceptable variation in delay. Where it gets dicey is in elastic applications (applications using transports with the characteristics of TCP) that are retransmitting or otherwise reacting in timeframes comparable to the RTT when the RTT is small, or in elastic applications in which the timeout-retransmission interval is on the order of hundreds of milliseconds to seconds (true of most TCPs) but the RTT is on the order of microseconds to milliseconds. In the former, a deep queue build-up can trigger a transmission that further builds the queue; in the latter, a hiccup can have dramatic side effects.
There is ongoing research on how best to do such things in data centers. My suspicion is that the right approach is something akin to 802.2 at the link layer, but with NACK retransmission - system A enumerates the data it sends to system B, and if system B sees a number skip it asks A to retransmit the indicated datagram. You might take a look at RFC 5401/5740/5776 for implementation suggestions. > Kevin Gross > > From: Dave Taht [mailto:dave.taht@gmail.com] > Sent: Friday, May 13, 2011 8:54 AM > To: rick.jones2@hp.com > Cc: Kevin Gross; bloat@lists.bufferbloat.net > Subject: Re: [Bloat] Burst Loss > > > > On Fri, May 13, 2011 at 8:35 AM, Rick Jones <rick.jones2@hp.com> wrote: > On Thu, 2011-05-12 at 23:00 -0600, Kevin Gross wrote: > > One of the principal reasons jumbo frames have not been standardized > > is due to latency concerns. I assume this group can appreciate the > > IEEE holding ground on this. > > Thusfar at least, bloaters are fighting to eliminate 10s of milliseconds > of queuing delay. I don't think this list is worrying about the tens of > microseconds difference between the transmission time of a 9000 byte > frame at 1 GbE vs a 1500 byte frame, or the single digit microseconds > difference at 10 GbE. > > Heh. With the first iteration of the bismark project I'm trying to get to where I have less than 30ms latency under load and have far larger problems to worry about than jumbo frames. I'll be lucky to manage 1/10th that (300ms) at this point. > > Not, incidentally that I mind the idea of jumbo frames. It seems silly to be saddled with default frame sizes that made sense in the 70s, and in an age where we will be seeing ever more packet encapsulation, reducing the header size as a ratio to data size strikes me as a very worthy goal. > > _______________________________________________ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat [-- Attachment #2: Type: text/html, Size: 15595 bytes --] ^ permalink raw reply [flat|nested] 66+ messages in thread
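A toy sketch of the NACK-with-bounded-retention scheme Fred outlines above - loosely NORM-like; the class and its parameters are illustrative assumptions, not an implementation of RFC 5740:

    import time

    class NackSender:
        """Keep sent frames for a fixed retention interval; retransmit a frame only
        if the receiver NACKs its sequence number within that window."""
        def __init__(self, retention_s=0.004):      # ~4 ms, per Fred's example above
            self.retention_s = retention_s
            self.sent = {}                           # seq -> (timestamp, frame)
            self.next_seq = 0

        def send(self, frame, tx):
            seq = self.next_seq
            self.next_seq += 1
            self.sent[seq] = (time.monotonic(), frame)
            tx(seq, frame)
            self._expire()
            return seq

        def on_nack(self, seq, tx):
            entry = self.sent.get(seq)
            if entry:                                # still retained: link-level repair
                tx(seq, entry[1])
            # else: too old - leave recovery to the end-to-end (TCP) retransmission
            self._expire()

        def _expire(self):
            now = time.monotonic()
            for s in [s for s, (t, _) in self.sent.items() if now - t > self.retention_s]:
                del self.sent[s]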
* Re: [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss) 2011-05-14 20:48 ` Fred Baker @ 2011-05-15 18:28 ` Jonathan Morton 2011-05-15 20:49 ` Fred Baker 2011-05-17 7:49 ` BeckW 1 sibling, 1 reply; 66+ messages in thread From: Jonathan Morton @ 2011-05-15 18:28 UTC (permalink / raw) To: Fred Baker; +Cc: bloat On 14 May, 2011, at 11:48 pm, Fred Baker wrote: > My suspicion is that the right approach is something akin to 802.2 at the link layer, but with NACK retransmission - system A enumerates the data it sends to system B, and if system B sees a number skip it asks A to retransmit the indicated datagram. You might take a look at RFC 5401/5740/5776 for implementation suggestions. This sounds like "reliable datagram" semantics to me. It also sounds a lot like ARQ as used in amateur packet radio. I believe similar mechanisms are built into 802.11. The fundamental thing is that the sender must be able to know when sent frames can be flushed from the buffer because they don't need to be retransmitted. So if there's a NACK, there must also be an ACK - at which point the ACK serves the purpose of the NACK, as it does in TCP. The only alternative is a wall-time TTL, which is doable on single hops but requires careful design. Let's face it. UDP is unreliable by design - applications using it *must* anticipate and cope with dropped and delayed packets, either by exponential RTO or ARQ or NACK or FEC, all at the application layer. And, in a congested network, some UDP packets *will* be lost. TCP is reliable but needs to maintain appropriate window sizes - which it doesn't at present because a lossless network without ECN provides insufficient feedback (and AQM, which is required for good ECN signals, is usually absent), and in the quest for performance, the trend has been inexorably towards more aggressive window sizing (of which TCP-Fit is the latest example). At the receiver end, it is possible to restrain this trend by reducing the receive window. Unfortunately, it's useless to expect Ethernet switches to turn on ECN. They operate at a lower stack level than IP, so they will not modify the IP TOS headers. However, recent versions of Ethernet *do* support a throttling feedback mechanism, and this can and should be exploited to tell the edge host or router that ECN *might* be needed. Also, with throttling feedback throughout the LAN, the Ethernet can for practical purposes be treated as almost-reliable. This is *better* in terms of packet loss than ARQ or NACK, although if the Ethernet's buffers are large, it will still increase delay. (With small buffers, it will just decrease throughput to the capacity, which is fine.) - Jonathan ^ permalink raw reply [flat|nested] 66+ messages in thread
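Clamping the receive window from the receiving end, as Jonathan mentions, can be done per socket. A minimal sketch; the 64 KB figure is an arbitrary example, and the kernel may round or double the value requested:

    import socket

    def make_clamped_listener(port, rcvbuf=64 * 1024):
        """Listening socket whose accepted connections advertise a bounded
        receive window, limiting how much the sender can keep in flight."""
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Set before listen/accept so window scaling is negotiated against
        # the smaller buffer.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("", port))
        s.listen(5)
        return s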
* Re: [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss) 2011-05-15 18:28 ` Jonathan Morton @ 2011-05-15 20:49 ` Fred Baker 2011-05-16 0:31 ` Jonathan Morton 0 siblings, 1 reply; 66+ messages in thread From: Fred Baker @ 2011-05-15 20:49 UTC (permalink / raw) To: Jonathan Morton; +Cc: bloat On May 15, 2011, at 11:28 AM, Jonathan Morton wrote: > The fundamental thing is that the sender must be able to know when sent frames can be flushed from the buffer because they don't need to be retransmitted. So if there's a NACK, there must also be an ACK - at which point the ACK serves the purpose of the NACK, as it does in TCP. The only alternative is a wall-time TTL, which is doable on single hops but requires careful design. To a point. NORM holds a frame for possible retransmission for a stated period of time, and if retransmission isn't requested in that interval forgets it. So the ack isn't actually necessary; what is necessary is that the retention interval be long enough that a nack has a high probability of succeeding in getting the message through. A 100 Gbit interface can handle roughly 97,656 128-byte frames per millisecond (100G/(8*128*1000)). We're looking at something on the order of 18 bits (4 ms to retransmit without falling back to TCP) for a rational sequence number at 100 Gbps; 16 bits would be enough at 10 Gbps, and 12 bits would be enough at 1 Gbps. > ...recent versions of Ethernet *do* support a throttling feedback mechanism, and this can and should be exploited to tell the edge host or router that ECN *might* be needed. Also, with throttling feedback throughout the LAN, the Ethernet can for practical purposes be treated as almost-reliable. This is *better* in terms of packet loss than ARQ or NACK, although if the Ethernet's buffers are large, it will still increase delay. (With small buffers, it will just decrease throughput to the capacity, which is fine.) It increases the delay anyway. It just pushes the retention buffer to another place. What do you think the packet is doing during the "don't transmit" interval? Throughput never exceeds capacity. If I have a 10 Gbps link, I will never get more than 10 Gbps through it. Buffer fill rate is statistically predictable. With small buffers, the fill rate reaches the top sooner. They increase the probability that the buffers are full, which is to say the drop probability. Which puts us back to end-to-end retransmission, which is the worst case of what you were worried about. I'm not going to argue against letting retransmission go end to end; it's an endless debate. I'll simply note that several link layers, including but not limited to those you mention, find that applications using them work better if there is a high probability of retransmission in an interval on the order of the link RTT as opposed to the end-to-end RTT. You brought up data centers (aka variable delays in LAN networks); those have been heavily the province of Fibre Channel, which is a link-layer protocol with retransmission. Think about it. ^ permalink raw reply [flat|nested] 66+ messages in thread
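Fred's sequence-number sizing, worked through (assuming the same 128-byte frame size and 4 ms repair window he uses above):

    import math

    def seq_bits(rate_bps, window_s=0.004, frame_bytes=128):
        frames = rate_bps / (frame_bytes * 8) * window_s   # frames per repair window
        return frames, math.ceil(math.log2(frames))

    for rate, name in ((1e9, "1 Gbps"), (10e9, "10 Gbps"), (100e9, "100 Gbps")):
        frames, bits = seq_bits(rate)
        print(f"{name}: ~{frames:,.0f} frames per 4 ms -> {bits} bits")
    # 1 Gbps:   ~3,906 frames per 4 ms   -> 12 bits
    # 10 Gbps:  ~39,063 frames per 4 ms  -> 16 bits
    # 100 Gbps: ~390,625 frames per 4 ms -> 19 bits (i.e. on the order of 18)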
* Re: [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss) 2011-05-15 20:49 ` Fred Baker @ 2011-05-16 0:31 ` Jonathan Morton 2011-05-16 7:51 ` Richard Scheffenegger 0 siblings, 1 reply; 66+ messages in thread From: Jonathan Morton @ 2011-05-16 0:31 UTC (permalink / raw) To: Fred Baker; +Cc: bloat On 15 May, 2011, at 11:49 pm, Fred Baker wrote: > > On May 15, 2011, at 11:28 AM, Jonathan Morton wrote: >> The fundamental thing is that the sender must be able to know when sent frames can be flushed from the buffer because they don't need to be retransmitted. So if there's a NACK, there must also be an ACK - at which point the ACK serves the purpose of the NACK, as it does in TCP. The only alternative is a wall-time TTL, which is doable on single hops but requires careful design. > > To a point. NORM holds a frame for possible retransmission for a stated period of time, and if retransmission isn't requested in that interval forgets it. So the ack isn't actually necessary; what is necessary is that the retention interval be long enough that a nack has a high probability of succeeding in getting the message through. Okay, so because it can fall back to TCP's retransmit, the retention requirements can be relaxed. >> ...recent versions of Ethernet *do* support a throttling feedback mechanism, and this can and should be exploited to tell the edge host or router that ECN *might* be needed. Also, with throttling feedback throughout the LAN, the Ethernet can for practical purposes be treated as almost-reliable. This is *better* in terms of packet loss than ARQ or NACK, although if the Ethernet's buffers are large, it will still increase delay. (With small buffers, it will just decrease throughput to the capacity, which is fine.) > > It increases the delay anyway. It just pushes the retention buffer to another place. What do you think the packet is doing during the "don't transmit" interval? Most packets delayed by Ethernet throttling would, with small buffers, end up waiting in the sending host (or router). They thus spend more time in a potentially active queue instead of in a dumb one. But even if the host queue is dumb, the overall delay is no worse than with the larger Ethernet buffers. > Throughput never exceeds capacity. If I have a 10 GBPS link, I will never get more than 10 GBPS through it. Buffer fill rate is statistically predictable. With small buffers, the fill rate acheives the top sooner. They increase the probability that the buffers are full, which is to say the drop probability. Which puts us to an end to end retransmission, which is the worst case of what you were worried about. Let's suppose someone has generously provisioned an office with GigE throughout, using a two-level hierarchy of switches. Some dumb schmuck then schedules every single computer to run it's backups (to a single fileserver) at the same time. That's say 100 computers all competing for one GigE link to the fileserver. If the switches are fair, each computer should get 10Mbps - that's the capacity. With throttling, each computer sees the link closed 99% of the time. It can send at link rate for the remaining 1% of the time. On medium timescales, that looks like a 10Mbps bottleneck at the first link. So the throughput on that link equals the capacity, and hopefully the goodput is also thus. The only queue that is likely to overflow is the one on the sending computer, and one would hope there is enough feedback in a host's own TCP/IP stack to prevent that. 
Without throttling but with ARQ, NACK or whatever you want to call it, the host has no signal to tell it to slow down - so the throughput on the edge link is more than 10Mbps (but the goodput will be less). The buffer in the outer switch fills up - no matter how big or small it is - and starts dropping packets. The switch then won't ask for retransmission of packets it's just dropped, because it has nowhere to put them. The same process then repeats at the inner switch. Finally, the server sees the missing packets, and asks for the retransmission - but these requests have to be switched all the way back to the clients, because the missing packets aren't in the switches' buffers. It's therefore no better than a TCP SACK retransmission. So there you have a classic congested network scenario in which throttling solves the problem, but link-level retransmission can't. Where ARQ and/or NACK come in handy is where the link itself is unreliable, such as on WLANs (hence the use in amateur radio) and last-mile links. In that case, the reason for the packet loss is not a full receive buffer, so asking for a retransmission is not inherently self-defeating. > I'm not going to argue against letting retransmission go end to end; it's an endless debate. I'll simply note that several link layers, including but not limited to those you mention, find that applications using them work better if there is a high high probability of retransmission in an interval on the order of the link RTT as opposed to the end to end RTT. You brought up data centers (aka variable delays in LAN networks); those have been heavily the province of fiberchannel, which is a link layer protocol with retransmission. Think about it. What I'd like to see is a complete absence of need for retransmission on a properly built wired network. Obviously the capability still needs to be there to cope with the parts that aren't properly built or aren't wired, but TCP can do that. Throttling (in the form of Ethernet PAUSE) is simply the third possible method of signalling congestion in the network, alongside delay and loss - and it happens to be quite widely deployed already. - Jonathan ^ permalink raw reply [flat|nested] 66+ messages in thread
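The arithmetic behind Jonathan's backup scenario, for what it's worth (a sketch; ideal fair sharing and ideal PAUSE behaviour are assumed):

    # 100 clients on 1 GbE edge links all writing to one file server behind a
    # single 1 GbE uplink.  With fair sharing, each client's long-term rate is
    # the uplink capacity divided by the client count; with PAUSE-style
    # throttling each edge link is simply open for that fraction of the time.
    uplink_bps = 1e9
    clients = 100
    edge_bps = 1e9

    per_client = uplink_bps / clients
    duty_cycle = per_client / edge_bps

    print(f"per-client rate: {per_client/1e6:.0f} Mbit/s")   # 10 Mbit/s
    print(f"edge link open {duty_cycle:.0%} of the time")    # 1%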
* Re: [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss) 2011-05-16 0:31 ` Jonathan Morton @ 2011-05-16 7:51 ` Richard Scheffenegger 2011-05-16 9:49 ` Fred Baker 0 siblings, 1 reply; 66+ messages in thread From: Richard Scheffenegger @ 2011-05-16 7:51 UTC (permalink / raw) To: Jonathan Morton, Fred Baker; +Cc: bloat Jonathan, > What I'd like to see is a complete absence of need for retransmission on a > properly > built wired network. Obviously the capability still needs to be there to > cope with > the parts that aren't properly built or aren't wired, but TCP can do that. > Throttling > (in the form of Ethernet PAUSE) is simply the third possible method of > signalling > congestion in the network, alongside delay and loss - and it happens to be > quite > widely deployed already. Two comments: First, TCP can currently NOT deal properly with non-congestion loss (in other words, any loss will lead to a congestion control reaction - a reduction of the sending rate). TCP can only (mostly) deal with the recovery part in a hopefully timely fashion. In this area you'll find a high number of possible approaches, none of which is quite backwards-compatible with "standard" TCP. Second, you wouldn't want to deploy basic 802.3x to any network consisting of more than a single switch. If you do, you can run into an effect called congestion tree formation, where (simplified) the slowest receiver determines the global speed of your Ethernet network. 802.1Qbb is also prone to congestion trees, even though the probability is somewhat reduced provided all priority classes are being used. Unfortunately, most traffic is in the same 802.1p class... Adequate solutions (more complex than the FCP buffer-credit based congestion avoidance) like 802.1Qau / QCN are not available commercially afaik. (They need new NICs + new switches for the HW support.) But I agree, an L3 device should be able to distribute L2 congestion information into the L3 header (even though today, cheap generic Broadcom and perhaps even Realtek chipsets support ECN marking even when they are running as an L2 switch; special firmware is required, though - see the DCTCP papers). Best regards, Richard ^ permalink raw reply [flat|nested] 66+ messages in thread
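For reference, the switch-side marking trick Richard alludes to is very simple in the DCTCP scheme; a minimal sketch follows (the threshold and gain below are illustrative values, not the ones from the DCTCP papers):

  # DCTCP-style ECN marking at the switch, and the matching sender reaction.
  K_PACKETS = 20                            # marking threshold (illustrative)

  def switch_enqueue(queue_depth_pkts, packet):
      # Switch: set ECN CE on the packet if the instantaneous queue exceeds K,
      # instead of dropping it or asserting PAUSE on the ingress port.
      if queue_depth_pkts > K_PACKETS:
          packet["ecn_ce"] = True
      return packet

  def sender_cwnd_update(cwnd, alpha, frac_marked_this_rtt, g=1.0 / 16):
      # Sender: track the fraction of CE-marked ACKs with a moving average
      # and cut cwnd in proportion to it, once per RTT.
      alpha = (1 - g) * alpha + g * frac_marked_this_rtt
      return cwnd * (1 - alpha / 2), alpha

The congestion signal here stays per-flow and end-to-end (ECN in the L3/L4 headers), rather than per-port and hop-by-hop as with 802.3x, which is why it does not build congestion trees.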
* Re: [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss) 2011-05-16 7:51 ` Richard Scheffenegger @ 2011-05-16 9:49 ` Fred Baker 2011-05-16 11:23 ` [Bloat] Jumbo frames and LAN buffers Jim Gettys 2011-05-16 18:11 ` [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss) Richard Scheffenegger 0 siblings, 2 replies; 66+ messages in thread From: Fred Baker @ 2011-05-16 9:49 UTC (permalink / raw) To: Richard Scheffenegger; +Cc: bloat On May 16, 2011, at 9:51 AM, Richard Scheffenegger wrote: > Second, you wouldn't want to deploy basic 802.3x to any network consisting of more than a single switch. actually, it's pretty common practice. Three layers, even. People build backbones, and then ring them with workgroup switches, and then put small switches on their desks. ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Jumbo frames and LAN buffers 2011-05-16 9:49 ` Fred Baker @ 2011-05-16 11:23 ` Jim Gettys 2011-05-16 13:15 ` Kevin Gross 2011-05-16 18:11 ` [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss) Richard Scheffenegger 1 sibling, 1 reply; 66+ messages in thread From: Jim Gettys @ 2011-05-16 11:23 UTC (permalink / raw) To: bloat On 05/16/2011 05:49 AM, Fred Baker wrote: > On May 16, 2011, at 9:51 AM, Richard Scheffenegger wrote: > >> Second, you wouldn't want to deploy basic 802.3x to any network consisting of more than a single switch. > actually, it's pretty common practice. Three layers, even. People build backbones, and then ring them with workgroup switches, and then put small switches on their desks. > Not necessarily out of knowledge or desire (since it isn't usually controllable in the small switches you buy for home). It can cause trouble even in small environments as your house. http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html I know I'm at least three consumer switches deep, and it's not by choice. - Jim ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Jumbo frames and LAN buffers 2011-05-16 11:23 ` [Bloat] Jumbo frames and LAN buffers Jim Gettys @ 2011-05-16 13:15 ` Kevin Gross 2011-05-16 13:22 ` Jim Gettys 2011-05-16 18:36 ` Richard Scheffenegger 0 siblings, 2 replies; 66+ messages in thread From: Kevin Gross @ 2011-05-16 13:15 UTC (permalink / raw) To: bloat All the stand-alone switches I've looked at recently either do not support 802.3x or support it in the (desireable) manner described in the last paragraph of the linked blog post. I don't believe Ethernet flow control is a factor in current LANs. I'd be interested to know the specifics if anyone sees it differently. My understanding is that 802.1au, "lossless Ethernet", was designed primarily to allow Fibre Channel to be carried over 10 GbE so that SAN and LAN can share a common infrastructure in datacenters. I don't believe anyone intends for it to be enabled for traffic classes carrying TCP. Kevin Gross -----Original Message----- From: bloat-bounces@lists.bufferbloat.net [mailto:bloat-bounces@lists.bufferbloat.net] On Behalf Of Jim Gettys Sent: Monday, May 16, 2011 5:24 AM To: bloat@lists.bufferbloat.net Subject: Re: [Bloat] Jumbo frames and LAN buffers Not necessarily out of knowledge or desire (since it isn't usually controllable in the small switches you buy for home). It can cause trouble even in small environments as your house. http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html I know I'm at least three consumer switches deep, and it's not by choice. - Jim ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Jumbo frames and LAN buffers 2011-05-16 13:15 ` Kevin Gross @ 2011-05-16 13:22 ` Jim Gettys 2011-05-16 13:42 ` Kevin Gross [not found] ` <-854731558634984958@unknownmsgid> 2011-05-16 18:36 ` Richard Scheffenegger 1 sibling, 2 replies; 66+ messages in thread From: Jim Gettys @ 2011-05-16 13:22 UTC (permalink / raw) To: bloat On 05/16/2011 09:15 AM, Kevin Gross wrote: > All the stand-alone switches I've looked at recently either do not support > 802.3x or support it in the (desireable) manner described in the last > paragraph of the linked blog post. I don't believe Ethernet flow control is > a factor in current LANs. I'd be interested to know the specifics if anyone > sees it differently. Heh. Plug wireshark into current off the shelf cheap consumer switches intended for the home. You won't like what you see. And you have no way to manage them. I was quite surprised last fall when doing my home experiments to see 802.3 frames; I had been blissfully unaware of its existence, and had to go read up on it as a result. I don't think any of the enterprise switches are so brain damaged. So i suspect it's mostly lurking to cause trouble in home and small office environments, exactly where no-one will know what's going on. - Jim > My understanding is that 802.1au, "lossless Ethernet", was designed > primarily to allow Fibre Channel to be carried over 10 GbE so that SAN and > LAN can share a common infrastructure in datacenters. I don't believe anyone > intends for it to be enabled for traffic classes carrying TCP. > > Kevin Gross > > -----Original Message----- > From: bloat-bounces@lists.bufferbloat.net > [mailto:bloat-bounces@lists.bufferbloat.net] On Behalf Of Jim Gettys > Sent: Monday, May 16, 2011 5:24 AM > To: bloat@lists.bufferbloat.net > Subject: Re: [Bloat] Jumbo frames and LAN buffers > > Not necessarily out of knowledge or desire (since it isn't usually > controllable in the small switches you buy for home). It can cause > trouble even in small environments as your house. > > http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html > > I know I'm at least three consumer switches deep, and it's not by choice. > - Jim > > > _______________________________________________ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Jumbo frames and LAN buffers 2011-05-16 13:22 ` Jim Gettys @ 2011-05-16 13:42 ` Kevin Gross 2011-05-16 15:23 ` Jim Gettys [not found] ` <-854731558634984958@unknownmsgid> 1 sibling, 1 reply; 66+ messages in thread From: Kevin Gross @ 2011-05-16 13:42 UTC (permalink / raw) To: bloat I would like to try this. Can you suggest specific equipment to look at. Due to integration and low port count, most of the cheap consumer stuff has surprisingly good layer-2 performance. I've tested a bunch of Linksys and other small/medium business 5 to 24 port gigabit switches. Since I measure latency, I expect I would have noticed if flow control were kicking in. Kevin Gross -----Original Message----- From: bloat-bounces@lists.bufferbloat.net [mailto:bloat-bounces@lists.bufferbloat.net] On Behalf Of Jim Gettys Sent: Monday, May 16, 2011 7:23 AM To: bloat@lists.bufferbloat.net Subject: Re: [Bloat] Jumbo frames and LAN buffers On 05/16/2011 09:15 AM, Kevin Gross wrote: > All the stand-alone switches I've looked at recently either do not support > 802.3x or support it in the (desireable) manner described in the last > paragraph of the linked blog post. I don't believe Ethernet flow control is > a factor in current LANs. I'd be interested to know the specifics if anyone > sees it differently. Heh. Plug wireshark into current off the shelf cheap consumer switches intended for the home. You won't like what you see. And you have no way to manage them. I was quite surprised last fall when doing my home experiments to see 802.3 frames; I had been blissfully unaware of its existence, and had to go read up on it as a result. I don't think any of the enterprise switches are so brain damaged. So i suspect it's mostly lurking to cause trouble in home and small office environments, exactly where no-one will know what's going on. - Jim ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Jumbo frames and LAN buffers 2011-05-16 13:42 ` Kevin Gross @ 2011-05-16 15:23 ` Jim Gettys 0 siblings, 0 replies; 66+ messages in thread From: Jim Gettys @ 2011-05-16 15:23 UTC (permalink / raw) To: bloat On 05/16/2011 09:42 AM, Kevin Gross wrote: > I would like to try this. Can you suggest specific equipment to look at. Due > to integration and low port count, most of the cheap consumer stuff has > surprisingly good layer-2 performance. I've tested a bunch of Linksys and > other small/medium business 5 to 24 port gigabit switches. Since I measure > latency, I expect I would have noticed if flow control were kicking in. I think I was using a D-Link DGS2208. (8 port consumer switch). I then went and looked at the spec sheets of some of the other consumer kit out there and found they all had the "feature" of 802.3 flow control. I may have been using iperf to tickle it, rather than ssh. I was also playing around with an old 100Mbps switch, as documented in my blog; I don't remember if I saw it there. - Jim > Kevin Gross > > -----Original Message----- > From: bloat-bounces@lists.bufferbloat.net > [mailto:bloat-bounces@lists.bufferbloat.net] On Behalf Of Jim Gettys > Sent: Monday, May 16, 2011 7:23 AM > To: bloat@lists.bufferbloat.net > Subject: Re: [Bloat] Jumbo frames and LAN buffers > > On 05/16/2011 09:15 AM, Kevin Gross wrote: >> All the stand-alone switches I've looked at recently either do not support >> 802.3x or support it in the (desireable) manner described in the last >> paragraph of the linked blog post. I don't believe Ethernet flow control > is >> a factor in current LANs. I'd be interested to know the specifics if > anyone >> sees it differently. > Heh. Plug wireshark into current off the shelf cheap consumer switches > intended for the home. You won't like what you see. And you have no > way to manage them. I was quite surprised last fall when doing my home > experiments to see 802.3 frames; I had been blissfully unaware of its > existence, and had to go read up on it as a result. > > I don't think any of the enterprise switches are so brain damaged. So i > suspect it's mostly lurking to cause trouble in home and small office > environments, exactly where no-one will know what's going on. > - Jim > > > > _______________________________________________ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat ^ permalink raw reply [flat|nested] 66+ messages in thread
[parent not found: <-854731558634984958@unknownmsgid>]
* Re: [Bloat] Jumbo frames and LAN buffers [not found] ` <-854731558634984958@unknownmsgid> @ 2011-05-16 13:45 ` Dave Taht 0 siblings, 0 replies; 66+ messages in thread From: Dave Taht @ 2011-05-16 13:45 UTC (permalink / raw) To: Kevin Gross; +Cc: bloat [-- Attachment #1: Type: text/plain, Size: 2196 bytes --] On Mon, May 16, 2011 at 7:42 AM, Kevin Gross <kevin.gross@avanw.com> wrote: > I would like to try this. Can you suggest specific equipment to look at. > Due > to integration and low port count, most of the cheap consumer stuff has > surprisingly good layer-2 performance. I've tested a bunch of Linksys and > other small/medium business 5 to 24 port gigabit switches. Since I measure > latency, I expect I would have noticed if flow control were kicking in. > I would certainly appreciate more people looking at the switch in the wndr3700v2 we're using on the bismark project. I'm seeing some pretty deep buffering on it > > Kevin Gross > > -----Original Message----- > From: bloat-bounces@lists.bufferbloat.net > [mailto:bloat-bounces@lists.bufferbloat.net] On Behalf Of Jim Gettys > Sent: Monday, May 16, 2011 7:23 AM > To: bloat@lists.bufferbloat.net > Subject: Re: [Bloat] Jumbo frames and LAN buffers > > On 05/16/2011 09:15 AM, Kevin Gross wrote: > > All the stand-alone switches I've looked at recently either do not > support > > 802.3x or support it in the (desireable) manner described in the last > > paragraph of the linked blog post. I don't believe Ethernet flow control > is > > a factor in current LANs. I'd be interested to know the specifics if > anyone > > sees it differently. > > Heh. Plug wireshark into current off the shelf cheap consumer switches > intended for the home. You won't like what you see. And you have no > way to manage them. I was quite surprised last fall when doing my home > experiments to see 802.3 frames; I had been blissfully unaware of its > existence, and had to go read up on it as a result. > > I don't think any of the enterprise switches are so brain damaged. So i > suspect it's mostly lurking to cause trouble in home and small office > environments, exactly where no-one will know what's going on. > - Jim > > > > _______________________________________________ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat > -- Dave Täht SKYPE: davetaht US Tel: 1-239-829-5608 http://the-edge.blogspot.com [-- Attachment #2: Type: text/html, Size: 3233 bytes --] ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Jumbo frames and LAN buffers 2011-05-16 13:15 ` Kevin Gross 2011-05-16 13:22 ` Jim Gettys @ 2011-05-16 18:36 ` Richard Scheffenegger 1 sibling, 0 replies; 66+ messages in thread From: Richard Scheffenegger @ 2011-05-16 18:36 UTC (permalink / raw) To: Kevin Gross, bloat Kevin, > My understanding is that 802.1au, "lossless Ethernet", was designed > primarily to allow Fibre Channel to be carried over 10 GbE so that SAN and > LAN can share a common infrastructure in datacenters. I don't believe > anyone > intends for it to be enabled for traffic classes carrying TCP. Well, QCN requires a L2 MAC sender, network and receiver cooperation (thus you need fancy "CNA" converged network adapters, to start using it - these would be reaction/reflection points; plus the congestion points - switches - would need HW support too; nothing one can buy today; higher-grade (carrier?) switches may have the reaction/reflection points built into them, and could use legacy 802.3x signalling outside the 802.1Qau cloud). The following may be too simplistic Once the hardware has a reaction point support, it classifies traffic, and calculates the per flow congestion of the path (with flow really being the classification rules by the sender), the intermediates / receiver sample the flow and return the congestion back to the sender - and within the sender, a token bucket-like rate limiter will adjust the sending rate of the appropriate flow(s) to adjust to the observed network conditions. http://www.stanford.edu/~balaji/presentations/au-prabhakar-qcn-description.pdf http://www.ieee802.org/1/files/public/docs2007/au-pan-qcn-details-053007.pdf The congestion control loop has a lot of similarities to TCP CC as you will note... Also, I haven't found out how fine-grained the classification is supposed to be (per L2 address pair? Group of flows? Which hashing then to use for mapping L2 flows into those groups between reaction/congestion/reflection points...). Anyway, for the here and now, this is pretty much esoteric stuff not relevant in this context :) Best regards, Richard ----- Original Message ----- From: "Kevin Gross" <kevin.gross@avanw.com> To: <bloat@lists.bufferbloat.net> Sent: Monday, May 16, 2011 3:15 PM Subject: Re: [Bloat] Jumbo frames and LAN buffers > All the stand-alone switches I've looked at recently either do not support > 802.3x or support it in the (desireable) manner described in the last > paragraph of the linked blog post. I don't believe Ethernet flow control > is > a factor in current LANs. I'd be interested to know the specifics if > anyone > sees it differently. > > My understanding is that 802.1au, "lossless Ethernet", was designed > primarily to allow Fibre Channel to be carried over 10 GbE so that SAN and > LAN can share a common infrastructure in datacenters. I don't believe > anyone > intends for it to be enabled for traffic classes carrying TCP. > > Kevin Gross > > -----Original Message----- > From: bloat-bounces@lists.bufferbloat.net > [mailto:bloat-bounces@lists.bufferbloat.net] On Behalf Of Jim Gettys > Sent: Monday, May 16, 2011 5:24 AM > To: bloat@lists.bufferbloat.net > Subject: Re: [Bloat] Jumbo frames and LAN buffers > > Not necessarily out of knowledge or desire (since it isn't usually > controllable in the small switches you buy for home). It can cause > trouble even in small environments as your house. > > http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html > > I know I'm at least three consumer switches deep, and it's not by choice. 
> - Jim > > > _______________________________________________ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat > ^ permalink raw reply [flat|nested] 66+ messages in thread
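Since QCN comes up repeatedly in this subthread, here is a much-simplified sketch of its control loop as described in the slide decks linked above; all constants are illustrative placeholders, not values from 802.1Qau:

  # Congestion point (switch): sample the queue, compute feedback Fb.
  Q_EQ = 33          # desired queue operating point, frames (illustrative)
  W = 2.0            # weight on queue growth rate (illustrative)
  GD = 1.0 / 128     # gain for multiplicative rate decrease (illustrative)
  R_AI = 5e6         # additive increase step, bit/s (illustrative)

  def congestion_point_feedback(q_now, q_old):
      q_off = q_now - Q_EQ           # how far above the operating point we are
      q_delta = q_now - q_old        # how fast the queue is growing
      return -(q_off + W * q_delta)  # negative = congested, reflected to sender

  # Reaction point (sending NIC's rate limiter): rate cut multiplicatively on
  # congestion feedback, recovered gradually toward the old target otherwise.
  def reaction_point_update(rate, target_rate, fb):
      if fb < 0:
          target_rate = rate
          rate = rate * (1 - GD * min(abs(fb), 64))   # bounded decrease (<= 50%)
      else:
          target_rate = target_rate + R_AI            # probe upward
          rate = (rate + target_rate) / 2             # recover toward the target
      return rate, target_rate

As Richard notes, the loop is recognisably TCP-like, but it runs per L2 flow inside the NICs and switches, which is why it needs new hardware at both the reaction and congestion points.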
* Re: [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss) 2011-05-16 9:49 ` Fred Baker 2011-05-16 11:23 ` [Bloat] Jumbo frames and LAN buffers Jim Gettys @ 2011-05-16 18:11 ` Richard Scheffenegger 1 sibling, 0 replies; 66+ messages in thread From: Richard Scheffenegger @ 2011-05-16 18:11 UTC (permalink / raw) To: Fred Baker; +Cc: bloat Hi Fred, Yes, that's the common topology; however, 802.3x is often used only unidirectionally and with very limited effect, not bidirectionally. At least those are the default settings... (I wonder: if both ends of a link are RX-only, would flow control ever get triggered?) I know a number of deployments where globally enabling full flow control (as opposed to RX / TX only) led to fewer packet drops, but also sometimes massively reduced network bandwidth. This is what I meant when I said you don't want to deploy flow control in a multi-tier network topology, because of the congestion trees that form. Best regards, Richard ----- Original Message ----- From: "Fred Baker" <fred@cisco.com> To: "Richard Scheffenegger" <rscheff@gmx.at> Cc: "Jonathan Morton" <chromatix99@gmail.com>; <bloat@lists.bufferbloat.net> Sent: Monday, May 16, 2011 11:49 AM Subject: Re: [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss) On May 16, 2011, at 9:51 AM, Richard Scheffenegger wrote: > Second, you wouldn't want to deploy basic 802.3x to any network consisting > of more than a single switch. actually, it's pretty common practice. Three layers, even. People build backbones, and then ring them with workgroup switches, and then put small switches on their desks. ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss) 2011-05-14 20:48 ` Fred Baker 2011-05-15 18:28 ` Jonathan Morton @ 2011-05-17 7:49 ` BeckW 2011-05-17 14:16 ` Dave Taht 1 sibling, 1 reply; 66+ messages in thread From: BeckW @ 2011-05-17 7:49 UTC (permalink / raw) To: bloat (I think) Fred wrote: > Well, the extra delay is solvable in the transport. The question isn't really what the impact on the > network is; it's what the requirements of the application are. For voice, if a voice sample is > delayed 50 ms the jitter buffer in the codec resolves that - microseconds are irrelevant. If you meant 50 microseconds, ignore the rest of this post. 50 milliseconds is a *long* time in VoIP. The total mouth-to-ear delay budget is only 150 ms. Adaptive jitter buffer algorithms choose a buffer size that is bigger than the observed delay variation. So the additional delay will be even higher than 50 ms. Big frames are a problem on slower upstream links, even if you strictly prioritize VoIP and don't use jumbo frames. Some DSL providers resort to using two ATM VCs, just to prevent TCP packets from delaying VoIP. Wolfgang Beck -- Deutsche Telekom Netzproduktion GmbH Zentrum Technik Einführung Heinrich-Hertz-Straße 3-7, 64295 Darmstadt +49 61516282832 (Tel.) http://www.telekom.com Deutsche Telekom Netzproduktion GmbH Aufsichtsrat: Timotheus Höttges (Vorsitzender) Geschäftsführung: Bruno Jacobfeuerborn (Vorsitzender), Albert Matheis, Klaus Peren Handelsregister: Amtsgericht Bonn HRB 14190 Sitz der Gesellschaft: Bonn USt-IdNr.: DE 814645262 Erleben, was verbindet. ^ permalink raw reply [flat|nested] 66+ messages in thread
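To put numbers on "big frames are a problem on slower upstream links", here is the serialization delay a single maximum-size frame adds ahead of a queued VoIP packet, for a few assumed upstream rates (the rates are illustrative, not taken from the post):

  # Serialization delay of one full-size frame at various link rates.
  for rate_bps in (512e3, 1e6, 1e9):
      for frame_bytes in (1500, 9000):
          delay_ms = frame_bytes * 8 / rate_bps * 1e3
          print("%10.0f bit/s, %5d B frame: %10.3f ms" % (rate_bps, frame_bytes, delay_ms))

  # 512 kbit/s: 1500 B -> ~23 ms, 9000 B -> ~141 ms (most of the 150 ms budget)
  # 1 Mbit/s:   1500 B -> 12 ms,  9000 B -> 72 ms
  # 1 Gbit/s:   1500 B -> 12 us,  9000 B -> 72 us

At DSL upstream rates even a single standard frame eats a noticeable slice of the mouth-to-ear budget, which is why the post mentions ISPs resorting to a separate ATM VC even with strict prioritization of VoIP.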
* Re: [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss) 2011-05-17 7:49 ` BeckW @ 2011-05-17 14:16 ` Dave Taht 0 siblings, 0 replies; 66+ messages in thread From: Dave Taht @ 2011-05-17 14:16 UTC (permalink / raw) To: BeckW; +Cc: bloat [-- Attachment #1: Type: text/plain, Size: 879 bytes --] On Tue, May 17, 2011 at 1:49 AM, <BeckW@telekom.de> wrote: > (I think) Fred wrote: > > Well, the extra delay is solvable in the transport. The question isn't > really what the impact on the > network is; it's what the requirements of > the application are. For voice, if a voice sample is > > delayed 50 ms the jitter buffer in the codec resolves that - microseconds > are irrelevant. > > If you meant 50 microseconds, ignore the rest of this post. > > 50 milliseconds is a *long* time in VoIP. The total mouth-to-ear delay > budget is only 150 ms. Adaptive jitter buffer algorithms choose a buffer > size that is bigger than the observed delay variation. So the additional > delay will be even higher than 50 ms. > > *10* ms in terms of jitter is a *long* time in voip. -- Dave Täht SKYPE: davetaht US Tel: 1-239-829-5608 http://the-edge.blogspot.com [-- Attachment #2: Type: text/html, Size: 1262 bytes --] ^ permalink raw reply [flat|nested] 66+ messages in thread
[parent not found: <-4629065256951087821@unknownmsgid>]
* Re: [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss) [not found] ` <-4629065256951087821@unknownmsgid> @ 2011-05-13 20:21 ` Dave Taht 2011-05-13 22:36 ` Kevin Gross 0 siblings, 1 reply; 66+ messages in thread From: Dave Taht @ 2011-05-13 20:21 UTC (permalink / raw) To: Kevin Gross; +Cc: bloat [-- Attachment #1: Type: text/plain, Size: 4323 bytes --] On Fri, May 13, 2011 at 2:03 PM, Kevin Gross <kevin.gross@avanw.com> wrote: > Do we think that bufferbloat is just a WAN problem? I work on live media > applications for LANs and campus networks. I'm seeing what I think could be > characterized as bufferbloat in LAN equipment. The timescales on 1 Gb > Ethernet are orders of magnitude shorter and the performance problems caused > are in many cases a bit different but root cause and potential solutions > are, I'm hoping, very similar. > > > > Keeping the frame byte size small while the frame time has shrunk maintains > the overhead at the same level. Again, this has been a conscious decision > not a stubborn relic. Ethernet improvements have increased bandwidth by > orders of magnitude. Do we really need to increase it by a couple percentage > points more by reducing overhead for large payloads? > > > > The cost of that improved marginal bandwidth efficiency is a 6x increase in > latency. Many applications would not notice an increase from 12 us to 72 us > for a Gigabit switch hop. But on a large network it adds up, some > applications are absolutely that sensitive (transaction processing, cluster > computing, SANs) and (I thought I'd be preaching to the choir here) there's > no way to ever recover the lost performance. > > > You are preaching to the choir here, but I note several things: Large frame sizes on 10GigE networks to other 10GigE networks is less of a problem than 10GigE to 10Mbit networks. I would hope/expect that frame would fragment in that case. Getting to where latencies are less than 10ms in the general case makes voip feasible again. I'm still at well over 300ms on bismark. Enabling higher speed stock market trades and live music exchange over a lan would be next on my list after getting below 10ms on the local switch/wireless interface! A lot of research points to widely enabling some form of fair queuing at the servers and switches to distribute the load at sane levels. (nagle, 89) I think few gig+e vendors are doing that in hardware, and it would be good to know who is and who isn't. For example, the switch I'm using on bismark has all sorts of wonderful QoS features such as fair queuing, but as best as I can tell they are not enabled, and I'm seeing buffering in the switch at well above 20ms.... It is astonishing that a switch chip this capable has reached the consumer marketplace... http://realtek.info/pdf/rtl8366s_8366sr_datasheet_vpre-1.4_20071022.pdf And depressing that so few of it's capabilities have software to configure them. > Kevin Gross > > > > *From:* Dave Taht [mailto:dave.taht@gmail.com] > *Sent:* Friday, May 13, 2011 8:54 AM > *To:* rick.jones2@hp.com > *Cc:* Kevin Gross; bloat@lists.bufferbloat.net > *Subject:* Re: [Bloat] Burst Loss > > > > > > On Fri, May 13, 2011 at 8:35 AM, Rick Jones <rick.jones2@hp.com> wrote: > > On Thu, 2011-05-12 at 23:00 -0600, Kevin Gross wrote: > > One of the principal reasons jumbo frames have not been standardized > > is due to latency concerns. I assume this group can appreciate the > > IEEE holding ground on this. > > Thusfar at least, bloaters are fighting to eliminate 10s of milliseconds > of queuing delay. 
I don't think this list is worrying about the tens of > microseconds difference between the transmission time of a 9000 byte > frame at 1 GbE vs a 1500 byte frame, or the single digit microseconds > difference at 10 GbE. > > > Heh. With the first iteration of the bismark project I'm trying to get to > where I have less than 30ms latency under load and have far larger problems > to worry about than jumbo frames. I'll be lucky to manage 1/10th that > (300ms) at this point. > > Not, incidentally that I mind the idea of jumbo frames. It seems silly to > be saddled with default frame sizes that made sense in the 70s, and in an > age where we will be seeing ever more packet encapsulation, reducing the > header size as a ratio to data size strikes me as a very worthy goal. > > _______________________________________________ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat > > -- Dave Täht SKYPE: davetaht US Tel: 1-239-829-5608 http://the-edge.blogspot.com [-- Attachment #2: Type: text/html, Size: 6678 bytes --] ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss) 2011-05-13 20:21 ` Dave Taht @ 2011-05-13 22:36 ` Kevin Gross 0 siblings, 0 replies; 66+ messages in thread From: Kevin Gross @ 2011-05-13 22:36 UTC (permalink / raw) To: bloat [-- Attachment #1: Type: text/plain, Size: 5732 bytes --] Even though jumbo frames are not standardized, most new network equipment supports them (though generally support is disabled by default). If you IPv4 route jumbo packets to a network that doesn't support them, the router will fragment for you. Under IPv6, it is the sender's responsibility to choose an MTU that is supported by all networks between source and destination. IPv6 routers do no fragmentation. Although consumer products are often dumbed down, it is not difficult to find switches with comprehensive QoS configurability. Weighted fair queuing is a popular scheme. Strict priority is a bit dangerous but useful for latency-critical applications. The IEEE has just ratified a credit-based algorithm called 802.1Qav. What I find is missing from all but the high-end equipment is configurability of buffering capacity and behavior. Bad buffering can burn an otherwise competent QoS implementation. In his talks, Jim Gettys claims that these QoS features do not fix bufferbloat - they just move the problem elsewhere. I generally agree with this, though I find that moving the problem elsewhere is sometimes a perfectly acceptable solution. Kevin Gross On Fri, May 13, 2011 at 2:21 PM, Dave Taht <dave.taht@gmail.com> wrote: > > On Fri, May 13, 2011 at 2:03 PM, Kevin Gross <kevin.gross@avanw.com> wrote: > >> Do we think that bufferbloat is just a WAN problem? I work on live media >> applications for LANs and campus networks. I'm seeing what I think could be >> characterized as bufferbloat in LAN equipment. The timescales on 1 Gb >> Ethernet are orders of magnitude shorter and the performance problems caused >> are in many cases a bit different but root cause and potential solutions >> are, I'm hoping, very similar. >> >> >> >> Keeping the frame byte size small while the frame time has shrunk >> maintains the overhead at the same level. Again, this has been a conscious >> decision not a stubborn relic. Ethernet improvements have increased >> bandwidth by orders of magnitude. Do we really need to increase it by a >> couple percentage points more by reducing overhead for large payloads? >> >> >> >> The cost of that improved marginal bandwidth efficiency is a 6x increase >> in latency. Many applications would not notice an increase from 12 us to 72 >> us for a Gigabit switch hop. But on a large network it adds up, some >> applications are absolutely that sensitive (transaction processing, cluster >> computing, SANs) and (I thought I'd be preaching to the choir here) there's >> no way to ever recover the lost performance. >> >> >> > > You are preaching to the choir here, but I note several things: > > Large frame sizes on 10GigE networks to other 10GigE networks is less of a > problem than 10GigE to 10Mbit networks. I would hope/expect that frame would > fragment in that case. > > Getting to where latencies are less than 10ms in the general case makes > voip feasible again. I'm still at well over 300ms on bismark. > > Enabling higher speed stock market trades and live music exchange over a > lan would be next on my list after getting below 10ms on the local > switch/wireless interface!
> > A lot of research points to widely enabling some form of fair queuing at > the servers and switches to distribute the load at sane levels. (nagle, 89) > I think few gig+e vendors are doing that in hardware, and it would be good > to know who is and who isn't. > > For example, the switch I'm using on bismark has all sorts of wonderful QoS > features such as fair queuing, but as best as I can tell they are not > enabled, and I'm seeing buffering in the switch at well above 20ms.... > > It is astonishing that a switch chip this capable has reached the consumer > marketplace... > > http://realtek.info/pdf/rtl8366s_8366sr_datasheet_vpre-1.4_20071022.pdf > > And depressing that so few of it's capabilities have software to configure > them. > >> Kevin Gross >> >> >> >> *From:* Dave Taht [mailto:dave.taht@gmail.com] >> *Sent:* Friday, May 13, 2011 8:54 AM >> *To:* rick.jones2@hp.com >> *Cc:* Kevin Gross; bloat@lists.bufferbloat.net >> *Subject:* Re: [Bloat] Burst Loss >> >> >> >> >> >> On Fri, May 13, 2011 at 8:35 AM, Rick Jones <rick.jones2@hp.com> wrote: >> >> On Thu, 2011-05-12 at 23:00 -0600, Kevin Gross wrote: >> > One of the principal reasons jumbo frames have not been standardized >> > is due to latency concerns. I assume this group can appreciate the >> > IEEE holding ground on this. >> >> Thusfar at least, bloaters are fighting to eliminate 10s of milliseconds >> of queuing delay. I don't think this list is worrying about the tens of >> microseconds difference between the transmission time of a 9000 byte >> frame at 1 GbE vs a 1500 byte frame, or the single digit microseconds >> difference at 10 GbE. >> >> >> Heh. With the first iteration of the bismark project I'm trying to get to >> where I have less than 30ms latency under load and have far larger problems >> to worry about than jumbo frames. I'll be lucky to manage 1/10th that >> (300ms) at this point. >> >> Not, incidentally that I mind the idea of jumbo frames. It seems silly to >> be saddled with default frame sizes that made sense in the 70s, and in an >> age where we will be seeing ever more packet encapsulation, reducing the >> header size as a ratio to data size strikes me as a very worthy goal. >> >> _______________________________________________ >> Bloat mailing list >> Bloat@lists.bufferbloat.net >> https://lists.bufferbloat.net/listinfo/bloat >> >> > > > -- > Dave Täht > SKYPE: davetaht > US Tel: 1-239-829-5608 > http://the-edge.blogspot.com > [-- Attachment #2: Type: text/html, Size: 8367 bytes --] ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Burst Loss 2011-05-13 14:54 ` Dave Taht 2011-05-13 20:03 ` [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss) Kevin Gross [not found] ` <-4629065256951087821@unknownmsgid> @ 2011-05-13 22:08 ` david 2 siblings, 0 replies; 66+ messages in thread From: david @ 2011-05-13 22:08 UTC (permalink / raw) To: Dave Taht; +Cc: bloat [-- Attachment #1: Type: TEXT/Plain, Size: 926 bytes --] On Fri, 13 May 2011, Dave Taht wrote: > Not, incidentally that I mind the idea of jumbo frames. It seems silly to be > saddled with default frame sizes that made sense in the 70s, and in an age > where we will be seeing ever more packet encapsulation, reducing the header > size as a ratio to data size strikes me as a very worthy goal. The header-to-data size ratio is a small factor (but with a header of ~50 bytes, you don't save _that_ much); I thought the huge advantage of jumbo frames was eliminating the gap between packets. Back in the 1 Mb network days, this gap size was not significant (a few bits' worth), but as networks have gotten faster, the gap has not gotten smaller by the same ratio. You guys are probably closer to the raw numbers than I am, but what is the total throughput of a network (counting header data as throughput) for various packet sizes (64 byte, 1500 byte, 9000 byte)? David Lang [-- Attachment #2: Type: TEXT/PLAIN, Size: 140 bytes --] _______________________________________________ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat ^ permalink raw reply [flat|nested] 66+ messages in thread
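A rough answer to the question above, counting the fixed per-frame wire overhead that never shrinks with link speed (8 bytes of preamble/SFD plus the 12-byte minimum inter-frame gap; the frame sizes below are full Ethernet frames including the 14-byte header and 4-byte FCS):

  # Fraction of raw line rate delivered for various Ethernet frame sizes.
  PER_FRAME_OVERHEAD = 8 + 12                # preamble/SFD + inter-frame gap, bytes
  for frame in (64, 1518, 9018):             # min frame, 1500-byte MTU, 9000-byte jumbo
      on_wire = frame + PER_FRAME_OVERHEAD
      payload = frame - 18                   # strip Ethernet header + FCS
      print("%5d B frame: %5.1f%% of line rate is frame bytes, %5.1f%% is L3 payload"
            % (frame, 100.0 * frame / on_wire, 100.0 * payload / on_wire))

  #   64 B: ~76% frame bytes, ~55% L3 payload
  # 1518 B: ~98.7% frame bytes, ~97.5% L3 payload
  # 9018 B: ~99.8% frame bytes, ~99.6% L3 payload

So jumbo frames buy roughly one percentage point of wire efficiency over the standard MTU for bulk traffic; the big losses are at minimum-size packets, where gap and preamble cost about a quarter of the line rate regardless of MTU.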
* Re: [Bloat] Burst Loss 2011-05-13 14:35 ` Rick Jones 2011-05-13 14:54 ` Dave Taht @ 2011-05-13 19:32 ` Denton Gentry 2011-05-13 20:47 ` Rick Jones 1 sibling, 1 reply; 66+ messages in thread From: Denton Gentry @ 2011-05-13 19:32 UTC (permalink / raw) To: rick.jones2, Kevin Gross; +Cc: bloat [-- Attachment #1: Type: text/plain, Size: 865 bytes --] On Fri, May 13, 2011 at 7:35 AM, Rick Jones <rick.jones2@hp.com> wrote: > > For a short time, servers with gigabit NICs suffered but smarter NICs > > were developed (TSO, LRO, other TLAs) and OSs upgraded to support them > > and I believe it is no longer a significant issue. > > Are TSO and LRO going to be sufficient at 40 and 100 GbE? Cores aren't > getting any faster. Only more plentiful. NICs seem to be responding by hashing incoming 5-tuples to distribute flows across cores. > And while it isn't the > strongest point in the world, one might even argue that the need to use > TSO/LRO to achieve performance hinders new transport protocol adoption - > the presence of NIC offloads for only TCP (or UDP) leaves a new > transport protocol (perhaps SCTP) at a disadvantage. True, and even UDP seems to be often blocked for anything other than DNS. [-- Attachment #2: Type: text/html, Size: 1340 bytes --] ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Burst Loss 2011-05-13 19:32 ` Denton Gentry @ 2011-05-13 20:47 ` Rick Jones 0 siblings, 0 replies; 66+ messages in thread From: Rick Jones @ 2011-05-13 20:47 UTC (permalink / raw) To: Denton Gentry; +Cc: bloat On Fri, 2011-05-13 at 12:32 -0700, Denton Gentry wrote: > NICs seem to be responding by hashing incoming 5-tuples to > distribute flows across cores. When I first kicked netperf out onto the Internet, when 10 Megabits/second was really fast, people started asking me "Why can't I get link-rate on a single-stream netperf test?" The answer was "Because you don't have enough CPU horsepower, but perhaps the next processor will." Then when 100BT happened, people asked me "Why can't I get link-rate on a single-stream netperf test?" And the answer was the same. Then when 1 GbE happened, people asked me "Why can't I get link-rate on a single-stream netperf test?" And the answer was the same, tweaked slightly to suggest they get a NIC with CKO. Then when 10 GbE happened people asked me "Why can't I get link-rate on a single-stream netperf test?" And the answer was "Because you don't have enough CPU, try a NIC with TSO and LRO." Based on the past 20 years I am quite confident that when 40 and 100 GbE NICs appear for end systems, I will again be asked "Why can't I get link-rate on a single-stream netperf test?" While indeed, the world is not just unidirectional bulk flows (if it were netperf and its request-response tests would never have come into being to replace ttcp), even after decades it is still something people seem to expect. There must be some value to high performance unidirectional transfer. Only now the cores aren't going to have gotten any faster, and spreading incoming 5-tuples across cores isn't going to help a single stream. So, the "answer" will likely end-up being to add still more complexity - either in the applications to use multiple streams, or to push the full stack into the NIC. Adde parvum parvo manus acervus erit. But, by Metcalf, we will have preserved the sacrosanct Ethernet maximum frame size. Crossing emails a bit, Kevin wrote about the 6X increase in latency. It is a 6X increase in *potential* latency *if* someone actually enables the larger MTU. And yes, the "We want to be on the Top 500 list" types do worry about latency and some perhaps even many of them use Ethernet instead of Infiniband (which does, BTW offer at least the illusion of a quite large MTU to IP), but a sanctioned way to run a larger MTU over Ethernet does not *force* them to use it if they want to make the explicit latency vs overhead trade-off. As it stands, those who do not worry about micro or nanoseconds are forced off the standard in the name of preserving something for those who do. (And with 100 GbE it would be nanosecond differences we would talking about - the 12 and 72 usec of 1 GbE become 120 and 720 nanoseconds at 100 GbE - the realm of a processor cache miss because memory latency hasn't and won't likely get much better either) And, are transaction or SAN latencies actually measured in microseconds or nanoseconds? If "transactions" are OLTP, those things are measured in milliseconds and even whole seconds (TPC), and spinning rust (yes, but not SSDs) still has latencies measured in milliseconds. 
rick jones > > And while it isn't the > strongest point in the world, one might even argue that the > need to use > TSO/LRO to achieve performance hinders new transport protocol > adoption - > the presence of NIC offloads for only TCP (or UDP) leaves a > new > transport protocol (perhaps SCTP) at a disadvantage. > > > True, and even UDP seems to be often blocked for anything other than > DNS. ^ permalink raw reply [flat|nested] 66+ messages in thread
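For reference, the transmission times quoted in this exchange work out as follows (simple serialization time only, ignoring preamble and inter-frame gap):

  # Serialization time of a 1500 B vs 9000 B frame at various Ethernet rates.
  for gbps in (1, 10, 40, 100):
      t1500_us = 1500 * 8 / (gbps * 1e9) * 1e6
      t9000_us = 9000 * 8 / (gbps * 1e9) * 1e6
      print("%3d GbE: 1500 B = %6.2f us, 9000 B = %6.2f us" % (gbps, t1500_us, t9000_us))

  #   1 GbE: 12 us / 72 us
  # 100 GbE: 0.12 us / 0.72 us  (the 120 ns / 720 ns figures mentioned above)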
* Re: [Bloat] Goodput fraction w/ AQM vs bufferbloat 2011-05-05 16:01 ` Jim Gettys 2011-05-05 16:10 ` Stephen Hemminger @ 2011-05-06 4:18 ` Fred Baker 2011-05-06 15:14 ` richard 2011-05-08 12:34 ` Richard Scheffenegger 1 sibling, 2 replies; 66+ messages in thread From: Fred Baker @ 2011-05-06 4:18 UTC (permalink / raw) To: Jim Gettys; +Cc: bloat There are a couple of ways to approach this, and they depend on your network model. In general, if you assume that there is one bottleneck, losses occur in the queue at the bottleneck, and are each retransmitted exactly once (not necessary, but helps), goodput should approximate 100% regardless of the queue depth. Why? Because every packet transits the bottleneck once - if it is dropped at the bottleneck, the retransmission transits the bottleneck. So you are using exactly the capacity of the bottleneck. the value of a shallow queue is to reduce RTT, not to increase or decrease goodput. cwnd can become too small, however; if it is possible to set cwnd to N without increasing queuing delay, and cwnd is less than N, you're not maximizing throughput. When cwnd grows above N, it merely increases queuing delay, and therefore bufferbloat. If there are two bottlenecks in series, you have some probability that a packet transits one bottleneck and doesn't transit the other. In that case, there is probably an analytical way to describe the behavior, but it depends on a lot of factors including distributions of competing traffic. There are a number of other possibilities; imagine that you drop a packet, there is a sack, you retransmit it, the ack is lost, and meanwhile there is another loss. You could easily retransmit the retransmission unnecessarily, which reduces goodput. The list of silly possibilities goes on for a while, and we have to assume that each has some probability of happening in the wild. On May 5, 2011, at 9:01 AM, Jim Gettys wrote: > On 04/30/2011 03:18 PM, Richard Scheffenegger wrote: >> I'm curious, has anyone done some simulations to check if the following qualitative statement holds true, and if, what the quantitative effect is: >> >> With bufferbloat, the TCP congestion control reaction is unduely delayed. When it finally happens, the tcp stream is likely facing a "burst loss" event - multiple consecutive packets get dropped. Worse yet, the sender with the lowest RTT across the bottleneck will likely start to retransmit while the (tail-drop) queue is still overflowing. >> >> And a lost retransmission means a major setback in bandwidth (except for Linux with bulk transfers and SACK enabled), as the standard (RFC documented) behaviour asks for a RTO (1sec nominally, 200-500 ms typically) to recover such a lost retransmission... >> >> The second part (more important as an incentive to the ISPs actually), how does the fraction of goodput vs. throughput change, when AQM schemes are deployed, and TCP CC reacts in a timely manner? Small ISPs have to pay for their upstream volume, regardless if that is "real" work (goodput) or unneccessary retransmissions. >> >> When I was at a small cable ISP in switzerland last week, surely enough bufferbloat was readily observable (17ms -> 220ms after 30 sec of a bulk transfer), but at first they had the "not our problem" view, until I started discussing burst loss / retransmissions / goodput vs throughput - with the latest point being a real commercial incentive to them. (They promised to check if AQM would be available in the CPE / CMTS, and put latency bounds in their tenders going forward). 
>> > I wish I had a good answer to your very good questions. Simulation would be interesting though real data is more convincing. > > I haven't looked in detail at all that many traces to try to get a feel for how much bandwidth waste there actually is, and more formal studies like Netalyzr, SamKnows, or the Bismark project would be needed to quantify the loss on the network as a whole. > > I did spend some time last fall with the traces I've taken. In those, I've typically been seeing 1-3% packet loss in the main TCP transfers. On the wireless trace I took, I saw 9% loss, but whether that is bufferbloat-induced loss or not, I don't know (the data is out there for those who might want to dig). And as you note, the losses are concentrated in bursts (probably due to the details of Cubic, so I'm told). > > I've had anecdotal reports (and some first hand experience) with much higher loss rates, for example from Nick Weaver at ICSI; but I believe in playing things conservatively with any numbers I quote and I've not gotten consistent results when I've tried, so I just report what's in the packet captures I did take. > > A phenomenon that could be occurring is that during congestion avoidance (until TCP loses its cookies entirely and probes for a higher operating point) TCP is carefully timing its packets to keep the buffers almost exactly full, so that competing flows (in my case, simple pings) are likely to arrive just when there is no buffer space to accept them and therefore you see higher losses on them than you would on the single flow I've been tracing and getting loss statistics from. > > People who want to look into this further would be a great help. > - Jim > > > _______________________________________________ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Goodput fraction w/ AQM vs bufferbloat 2011-05-06 4:18 ` [Bloat] Goodput fraction w/ AQM vs bufferbloat Fred Baker @ 2011-05-06 15:14 ` richard 2011-05-06 21:56 ` Fred Baker 2011-05-08 12:53 ` Richard Scheffenegger 0 siblings, 2 replies; 66+ messages in thread From: richard @ 2011-05-06 15:14 UTC (permalink / raw) To: Fred Baker; +Cc: bloat I'm wondering if we should look at the ratio of throughput to goodput instead of the absolute numbers. Yes, the goodput will be 100% but at what cost in actual throughput? And at what cost in total bandwidth? If every packet takes two attempts then the ratio will be 1/2 - 1 unit of goodput for two units of throughput (at least up to the choke-point). This is worst-case, so the ratio is likely to be something better than that: 3/4, 5/6, 99/100? Hmmm... maybe inverting the ratio and calling it something flashy (the bloaty rating???) might give us a lever in the media and with ISPs that is easier for the math challenged to understand. Higher is worse. Putting a number to this will also help those of us trying to get ISPs to understand that their Usage Based Bilking (UBB) won't address the real problem which is hidden in this ratio. The fact is, the choke point for much of this is the home router/firewall - and so that 1/2 ratio tells me the consumer is getting hosed for a technical problem. richard On Thu, 2011-05-05 at 21:18 -0700, Fred Baker wrote: > There are a couple of ways to approach this, and they depend on your network model. > > In general, if you assume that there is one bottleneck, losses occur in the queue at the bottleneck, > and are each retransmitted exactly once (not necessary, but helps), goodput should approximate 100% > regardless of the queue depth. Why? Because every packet transits the bottleneck once - if it is > dropped at the bottleneck, the retransmission transits the bottleneck. So you are using exactly > the capacity of the bottleneck. > > the value of a shallow queue is to reduce RTT, not to increase or decrease goodput. cwnd can become > too small, however; if it is possible to set cwnd to N without increasing queuing delay, and cwnd is > less than N, you're not maximizing throughput. When cwnd grows above N, it merely increases queuing > delay, and therefore bufferbloat. > > If there are two bottlenecks in series, you have some probability that a packet transits one > bottleneck and doesn't transit the other. In that case, there is probably an analytical way > to describe the behavior, but it depends on a lot of factors including distributions of competing > traffic. There are a number of other possibilities; imagine that you drop a packet, there is a > sack, you retransmit it, the ack is lost, and meanwhile there is another loss. You could easily > retransmit the retransmission unnecessarily, which reduces goodput. The list of silly possibilities > goes on for a while, and we have to assume that each has some probability of happening in the wild. > snip... richard -- Richard C. Pitt Pacific Data Capture rcpitt@pacdat.net 604-644-9265 http://digital-rag.com www.pacdat.net PGP Fingerprint: FCEF 167D 151B 64C4 3333 57F0 4F18 AF98 9F59 DD73 ^ permalink raw reply [flat|nested] 66+ messages in thread
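The proposed ratio is easy to make concrete: if each delivered packet has to be transmitted (1 + x) times on average, then goodput/throughput = 1/(1 + x), and the inverted "bloaty rating" is simply 1 + x (illustrative arithmetic only, not measurements):

  # Goodput-to-throughput ratio as a function of average retransmissions per packet.
  def goodput_fraction(x):
      # x = average number of extra transmissions per delivered packet
      return 1.0 / (1.0 + x)

  for x in (1.0, 0.10, 0.01, 0.001):
      print("x = %-6g goodput/throughput = %.3f  'bloaty rating' = %.3f"
            % (x, goodput_fraction(x), 1.0 + x))

  # x = 1 (every packet sent twice) gives the worst-case 1/2 above;
  # loss rates around 0.1-1% keep the ratio at 0.99 or better.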
* Re: [Bloat] Goodput fraction w/ AQM vs bufferbloat 2011-05-06 15:14 ` richard @ 2011-05-06 21:56 ` Fred Baker 2011-05-06 22:10 ` Stephen Hemminger 2011-05-08 13:00 ` Richard Scheffenegger 1 sibling, 2 replies; 66+ messages in thread From: Fred Baker @ 2011-05-06 21:56 UTC (permalink / raw) To: richard; +Cc: bloat On May 6, 2011, at 8:14 AM, richard wrote: > If every packet takes two attempts then the ratio will be 1/2 - 1 unit > of goodput for two units of throughput (at least up to the choke-point). > This is worst-case, so the ratio is likely to be something better than > that: 3/4, 5/6, 99/100? I have a suggestion. Turn on tcpdump on your laptop. Download a web page with lots of images, such as a Google Images web page, and then download a humongous file. Scan through the output file for SACK messages; that will give you the places where the receiver (you) saw losses and tried to recover from them. > Putting a number to this will also help those of us trying to get ISPs > to understand that their Usage Based Bilking (UBB) won't address the > real problem which is hidden in this ratio. The fact is, the choke point > for much of this is the home router/firewall - and so that 1/2 ratio > tells me the consumer is getting hosed for a technical problem. I think you need to do some research there. A TCP session with 1% loss (your ratio being 1/100) has difficulty maintaining throughput; usual TCP loss rates are on the order of tenths to hundredths of a percent. ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Goodput fraction w/ AQM vs bufferbloat 2011-05-06 21:56 ` Fred Baker @ 2011-05-06 22:10 ` Stephen Hemminger 2011-05-07 16:39 ` Jonathan Morton 2011-05-08 13:00 ` Richard Scheffenegger 1 sibling, 1 reply; 66+ messages in thread From: Stephen Hemminger @ 2011-05-06 22:10 UTC (permalink / raw) To: Fred Baker; +Cc: bloat On Fri, 6 May 2011 14:56:01 -0700 Fred Baker <fred@cisco.com> wrote: > > On May 6, 2011, at 8:14 AM, richard wrote: > > If every packet takes two attempts then the ratio will be 1/2 - 1 unit > > of goodput for two units of throughput (at least up to the choke-point). > > This is worst-case, so the ratio is likely to be something better than > > that: 3/4, 5/6, 99/100? > > I have a suggestion. Turn on tcpdump on your laptop. Download a web page with lots of images, such as a Google Images web page, and then download a humongous file. Scan through the output file for SACK messages; that will give you the places where the receiver (you) saw losses and tried to recover from them. > > > Putting a number to this will also help those of us trying to get ISPs > > to understand that their Usage Based Bilking (UBB) won't address the > > real problem which is hidden in this ratio. The fact is, the choke point > > for much of this is the home router/firewall - and so that 1/2 ratio > > tells me the consumer is getting hosed for a technical problem. > > I think you need to do some research there. A TCP session with 1% loss (your ratio being 1/100) has difficulty maintaining throughput; usual TCP loss rates are on the order of tenths to hundredths of a percent. There is some good theoretical work which shows the relationship between throughput and loss. http://www.slac.stanford.edu/comp/net/wan-mon/thru-vs-loss.html Rate <= (MSS/RTT)*(1 / sqrt{p}) where: Rate: is the TCP transfer rate or throughput MSS: is the maximum segment size (fixed for each Internet path, typically 1460 bytes) RTT: is the round trip time (as measured by TCP) p: is the packet loss rate. It is interesting that a longer RTT, which can be an artifact of bloat in the queues, will hurt throughput in this case. ^ permalink raw reply [flat|nested] 66+ messages in thread
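Plugging illustrative numbers into that bound shows how much of the damage comes from the inflated RTT rather than from the loss itself (the MSS and loss rate below are assumptions; the 17 ms and 220 ms RTTs are the ones from the Swiss cable ISP example earlier in the thread):

  from math import sqrt

  def mathis_rate_bps(mss_bytes, rtt_s, loss_rate):
      # Upper bound on TCP throughput: Rate <= (MSS/RTT) * (1/sqrt(p))
      return (mss_bytes * 8 / rtt_s) * (1.0 / sqrt(loss_rate))

  mss, p = 1460, 0.001                      # 0.1% loss, assumed
  for rtt_ms in (17, 220):
      print("RTT %3d ms: <= %5.1f Mbit/s" % (rtt_ms, mathis_rate_bps(mss, rtt_ms / 1e3, p) / 1e6))

  # RTT  17 ms: <= ~21.7 Mbit/s
  # RTT 220 ms: <= ~1.7 Mbit/s  -- the bloated queue alone costs more than 10x in the bound.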
* Re: [Bloat] Goodput fraction w/ AQM vs bufferbloat 2011-05-06 22:10 ` Stephen Hemminger @ 2011-05-07 16:39 ` Jonathan Morton 2011-05-08 0:15 ` Stephen Hemminger 0 siblings, 1 reply; 66+ messages in thread From: Jonathan Morton @ 2011-05-07 16:39 UTC (permalink / raw) To: Stephen Hemminger; +Cc: bloat On 7 May, 2011, at 1:10 am, Stephen Hemminger wrote: > Rate <= (MSS/RTT)*(1 / sqrt{p}) > > where: > Rate: is the TCP transfer rate or throughputd > MSS: is the maximum segment size (fixed for each Internet path, typically 1460 bytes) > RTT: is the round trip time (as measured by TCP) > p: is the packet loss rate. So if the loss rate is 1.0 (100%), the throughput is MSS/RTT. If the loss rate is 0, the throughput goes to infinity. That doesn't seem right to me. - Jonathan ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Goodput fraction w/ AQM vs bufferbloat 2011-05-07 16:39 ` Jonathan Morton @ 2011-05-08 0:15 ` Stephen Hemminger 2011-05-08 3:04 ` Constantine Dovrolis 0 siblings, 1 reply; 66+ messages in thread From: Stephen Hemminger @ 2011-05-08 0:15 UTC (permalink / raw) To: Jonathan Morton; +Cc: bloat On Sat, 7 May 2011 19:39:22 +0300 Jonathan Morton <chromatix99@gmail.com> wrote: > > On 7 May, 2011, at 1:10 am, Stephen Hemminger wrote: > > > Rate <= (MSS/RTT)*(1 / sqrt{p}) > > > > where: > > Rate: is the TCP transfer rate or throughputd > > MSS: is the maximum segment size (fixed for each Internet path, typically 1460 bytes) > > RTT: is the round trip time (as measured by TCP) > > p: is the packet loss rate. > > So if the loss rate is 1.0 (100%), the throughput is MSS/RTT. If the loss rate is 0, the throughput goes to infinity. That doesn't seem right to me. If loss rate is 0 there is no upper bound on TCP due to loss. There are other limits on TCP throughput like window size but not limits because of loss. ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Goodput fraction w/ AQM vs bufferbloat 2011-05-08 0:15 ` Stephen Hemminger @ 2011-05-08 3:04 ` Constantine Dovrolis 0 siblings, 0 replies; 66+ messages in thread From: Constantine Dovrolis @ 2011-05-08 3:04 UTC (permalink / raw) To: Stephen Hemminger; +Cc: bloat Hi, I suggest you look at the following paper for a more general version of this formula (equation 3), which includes the effect of limited capacity and/or limited receive-window: http://www.cc.gatech.edu/fac/Constantinos.Dovrolis/Papers/f235-he.pdf The paper also discusses common mistakes when this formula is used to predict the throughput of a TCP connection - the basic idea is that we cannot use the loss rate *before* the start of a TCP connection to predict what its throughput will be. A large TCP connection that is not limited by its receive-window can of course cause an increase in the loss rate of the path that it traverses (see sections 3.2 - 3.4) regards Constantine On 5/7/2011 8:15 PM, Stephen Hemminger wrote: > On Sat, 7 May 2011 19:39:22 +0300 > Jonathan Morton<chromatix99@gmail.com> wrote: > >> >> On 7 May, 2011, at 1:10 am, Stephen Hemminger wrote: >> >>> Rate<= (MSS/RTT)*(1 / sqrt{p}) >>> >>> where: >>> Rate: is the TCP transfer rate or throughputd >>> MSS: is the maximum segment size (fixed for each Internet path, typically 1460 bytes) >>> RTT: is the round trip time (as measured by TCP) >>> p: is the packet loss rate. >> >> So if the loss rate is 1.0 (100%), the throughput is MSS/RTT. If the loss rate is 0, the throughput goes to infinity. That doesn't seem right to me. > > If loss rate is 0 there is no upper bound on TCP due to loss. > There are other limits on TCP throughput like window size but not limits > because of loss. > _______________________________________________ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat -- Constantine -------------------------------------------------------------- Constantine Dovrolis, Associate Professor College of Computing, Georgia Institute of Technology 3346 KACB, 404-385-4205, dovrolis@cc.gatech.edu http://www.cc.gatech.edu/~dovrolis/ ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Goodput fraction w/ AQM vs bufferbloat 2011-05-06 21:56 ` Fred Baker 2011-05-06 22:10 ` Stephen Hemminger @ 2011-05-08 13:00 ` Richard Scheffenegger 1 sibling, 0 replies; 66+ messages in thread From: Richard Scheffenegger @ 2011-05-08 13:00 UTC (permalink / raw) To: Fred Baker, richard; +Cc: bloat Note that this will only give you a lower bound; the true losses that were addressed by the sender (ie. RTO retransmissions that got lost again) cannot, in principle, be discovered by a receiver-side trace; only a (reliable) sender-side trace will allow that. To the second point: only for simple Reno/NewReno does there exist a closed formula for estimating throughput based on random, non-Markov distributed losses; and more modern congestion control / loss recovery schemes will permit (more or less slightly) higher throughput, thus the formulas (ie. RFC 3448 states the one for Reno) will only serve as a (good) lower bound estimate. Again, increasing throughput at the cost of goodput is a bad proposition if you get charged by traffic volume (because what you really want is data delivered to the receiver, not dumped into the network for no good reason). Regards, Richard ----- Original Message ----- From: "Fred Baker" <fred@cisco.com> To: "richard" <richard@pacdat.net> Cc: <bloat@lists.bufferbloat.net> Sent: Friday, May 06, 2011 11:56 PM Subject: Re: [Bloat] Goodput fraction w/ AQM vs bufferbloat > > On May 6, 2011, at 8:14 AM, richard wrote: >> If every packet takes two attempts then the ratio will be 1/2 - 1 unit >> of goodput for two units of throughput (at least up to the choke-point). >> This is worst-case, so the ratio is likely to be something better than >> that 3/4, 5/6, 99/100 ??? > > I have a suggestion. Turn on tcpdump on your laptop. Download a web page > with lots of images, such as a google images web page, and then download > a humongous file. Scan through the output file for SACK messages; that > will give you the places where the receiver (you) saw losses and tried to > recover from them. > >> Putting a number to this will also help those of us trying to get ISPs >> to understand that their Usage Based Bilking (UBB) won't address the >> real problem which is hidden in this ratio. The fact is, the choke point >> for much of this is the home router/firewall - and so that 1/2 ratio >> tells me the consumer is getting hosed for a technical problem. > > I think you need to do some research there. A TCP session with 1% loss > (your ratio being 1/100) has difficulty maintaining throughput; usual TCP > loss rates are on the order of tenths to hundredths of a percent. > _______________________________________________ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat ^ permalink raw reply [flat|nested] 66+ messages in thread
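To make Fred's trace-scanning suggestion and Richard's sender-side caveat concrete, here is a rough sketch that estimates the goodput/throughput ratio from a sender-side capture. It uses scapy (assumed to be installed) and a hypothetical capture.pcap; any data segment whose starting sequence number was already covered is counted as a retransmission. Sequence wrap, reordering, and lost retransmissions of retransmissions are ignored, so the result is only an estimate, and a receiver-side capture would, as Richard notes, only give a lower bound on the losses.

    from scapy.all import rdpcap, IP, TCP

    highest = {}        # flow -> highest sequence number seen so far
    total = good = 0    # bytes put on the wire vs. bytes of new data

    for pkt in rdpcap("capture.pcap"):        # hypothetical sender-side capture
        if IP not in pkt or TCP not in pkt:
            continue
        payload = len(bytes(pkt[TCP].payload))
        if payload == 0:
            continue                          # skip pure ACKs
        flow = (pkt[IP].src, pkt[TCP].sport, pkt[IP].dst, pkt[TCP].dport)
        total += payload
        if pkt[TCP].seq >= highest.get(flow, 0):
            good += payload                   # new data
            highest[flow] = pkt[TCP].seq + payload
        # else: sequence range already sent once -> counted only as throughput

    print("goodput/throughput ~ %.3f" % (good / float(total)))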
* Re: [Bloat] Goodput fraction w/ AQM vs bufferbloat 2011-05-06 15:14 ` richard 2011-05-06 21:56 ` Fred Baker @ 2011-05-08 12:53 ` Richard Scheffenegger 1 sibling, 0 replies; 66+ messages in thread From: Richard Scheffenegger @ 2011-05-08 12:53 UTC (permalink / raw) To: richard, Fred Baker; +Cc: bloat I think a definition of terms would be in order. For me: goodput: number of bytes delivered at the receiver to the next upper layer application, per unit of time; throughput: number of bytes sent by the sender, into the network, per unit of time. Thus goodput can be a ratio (delivered bytes on the receiving application vs. data bytes sent by the sender's TCP), but by definition, only a completely loss-less, in-order stream of segments can ever hope of achieving that; any instance of fast recovery, retransmission timeout etc., and the goodput fraction will always be (much) less than 100%. (However, fringe effects like ssthresh reset for idle connections won't influence that fraction at all, but may lower the absolute values). Charging for volume without considering the goodput fraction is like overpaying - if the plumbing worked properly, you (end customer, small/medium ISP) would get charged for the real work you demanded of the network (data bytes delivered to a receiving application). Since the plumbing is broken, you get charged for the brokenness also (because only absolute data volume is counted), giving less than zero incentive to those who could fix the plumbing to do it. Exposing this brokenness is one of the nice properties of CONEX - upstream ISPs can be graded by the congestion they cause (or are willing to tolerate), and customers are empowered to make a conscious choice to use an ISP which may charge more (say 2%) per volume of data, but where the goodput fraction is at least a similar number of percentage points better... I.e. by properly tuning their AQM schemes. Best regards, Richard ----- Original Message ----- From: "richard" <richard@pacdat.net> To: "Fred Baker" <fredbakersba@gmail.com> Cc: <bloat@lists.bufferbloat.net> Sent: Friday, May 06, 2011 5:14 PM Subject: Re: [Bloat] Goodput fraction w/ AQM vs bufferbloat > I'm wondering if we should look at the ratio of throughput to goodput > instead of the absolute numbers. > > Yes, the goodput will be 100% but at what cost in actual throughput? And > at what cost in total bandwidth? > > If every packet takes two attempts then the ratio will be 1/2 - 1 unit > of goodput for two units of throughput (at least up to the choke-point). > This is worst-case, so the ratio is likely to be something better than > that 3/4, 5/6, 99/100 ??? > > Hmmm... maybe inverting the ratio and calling it something flashy (the > bloaty rating???) might give us a lever in the media and with ISPs that > is easier for the math challenged to understand. Higher is worse. > > Putting a number to this will also help those of us trying to get ISPs > to understand that their Usage Based Bilking (UBB) won't address the > real problem which is hidden in this ratio. The fact is, the choke point > for much of this is the home router/firewall - and so that 1/2 ratio > tells me the consumer is getting hosed for a technical problem. > > richard > > On Thu, 2011-05-05 at 21:18 -0700, Fred Baker wrote: >> There are a couple of ways to approach this, and they depend on your >> network model.
>> >> In general, if you assume that there is one bottleneck, losses occur in >> the queue at the bottleneck, >> and are each retransmitted exactly once (not necessary, but helps), >> goodput should approximate 100% >> regardless of the queue depth. Why? Because every packet transits the >> bottleneck once - if it is >> dropped at the bottleneck, the retransmission transits the bottleneck. So >> you are using exactly >> the capacity of the bottleneck. >> >> the value of a shallow queue is to reduce RTT, not to increase or >> decrease goodput. cwnd can become >> too small, however; if it is possible to set cwnd to N without increasing >> queuing delay, and cwnd is >> less than N, you're not maximizing throughput. When cwnd grows above N, >> it merely increases queuing >> delay, and therefore bufferbloat. >> >> If there are two bottlenecks in series, you have some probability that a >> packet transits one >> bottleneck and doesn't transit the other. In that case, there is probably >> an analytical way >> to describe the behavior, but it depends on a lot of factors including >> distributions of competing >> traffic. There are a number of other possibilities; imagine that you >> drop a packet, there is a >> sack, you retransmit it, the ack is lost, and meanwhile there is another >> loss. You could easily >> retransmit the retransmission unnecessarily, which reduces goodput. The >> list of silly possibilities >> goes on for a while, and we have to assume that each has some >> probability of happening in the wild. >> > snip... > > richard > > -- > Richard C. Pitt Pacific Data Capture > rcpitt@pacdat.net 604-644-9265 > http://digital-rag.com www.pacdat.net > PGP Fingerprint: FCEF 167D 151B 64C4 3333 57F0 4F18 AF98 9F59 DD73 > > _______________________________________________ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat ^ permalink raw reply [flat|nested] 66+ messages in thread
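A tiny worked illustration of the two viewpoints in this subthread, with made-up loss figures: counted at the sender, retransmitted bytes inflate throughput relative to goodput, while counted at the bottleneck egress every surviving byte is useful data, so the fraction stays near 1 as long as each drop is retransmitted exactly once, which is Fred's single-bottleneck argument.

    # Assumes a fraction `loss` of packets needs exactly one retransmission.
    for loss in (0.001, 0.01, 0.1, 0.5, 1.0):
        sender_fraction = 1.0 / (1.0 + loss)      # each lost packet is sent twice
        print("loss %5.1f%%  sender-side goodput/throughput ~ %.3f  "
              "bottleneck-egress fraction ~ 1.0" % (loss * 100, sender_fraction))
    # loss = 100% (richard's worst case) gives the 1/2 ratio he describes.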
* Re: [Bloat] Goodput fraction w/ AQM vs bufferbloat 2011-05-06 4:18 ` [Bloat] Goodput fraction w/ AQM vs bufferbloat Fred Baker 2011-05-06 15:14 ` richard @ 2011-05-08 12:34 ` Richard Scheffenegger 2011-05-09 3:07 ` Fred Baker 1 sibling, 1 reply; 66+ messages in thread From: Richard Scheffenegger @ 2011-05-08 12:34 UTC (permalink / raw) To: Fred Baker, Jim Gettys; +Cc: bloat Hi Fred, Goodput can really only be measured at the sender; by definition, any retransmitted packet will reduce goodput vs throughput; in your example, where each segment is retransmitted once, goodput would be - at most - 0.5, not 1.0... IMHO defining the data volume after the bottleneck by itself as goodput is also a bit short-sighted, because a good fraction of that data may still be discarded by TCP for numerous reasons, ultimately (ie, legacy go-back-n RTO recovery by the sender)... Measuring at the receiver (or in-path network) side, on a SACK enabled session, will miss all the instances where the last (or a number of segments running up to and including the last) segment was lost, or where a retransmitted segment was lost twice. The former can be approximated by checking the RTOs (which would already require some heuristic to come up with a good approximation of what the sender's RTO timeout is likely to be - the IETF RFC prescribed 1 sec minRTO is virtually never used). The latter, where retransmitted segments are also lost, you can only infer indirectly about the sender's behavior from a receiver-side (or in-path) trace, again because lost retransmission detection is done by one stack (Linux), but not by the others, and RTOs can again not be evaded under all circumstances. But back to my original question: when looking at modern TCP stacks, with TSO, if the bufferbloat allows the sender's cwnd to grow beyond thresholds which allow the aggressive use of TSO (64kB or even 256kB of data allowed in the sender's cwnd), the effective sending rate of such a burst will be wirespeed (no interleaving segments of other sessions). As pointed out in other mails to this thread, if the bottleneck then has 1/10th the capacity of the sender's wire (and is potentially shared among multiple senders), at least 90% of all the sent data of such a TSO segment train will be dropped in a single burst of loss... With proper AQM, and some (single segment) loss earlier, cwnd may never grow to trigger TSO in that way, and the goodput (1 segment out of 64kB data, vs. 58kB out of 64kB data) is obviously shifted extremely to the scenario with AQM... So, qualitatively, an ISP with proper AQM should be able to have better goodput (downloads from upstream or uploads to upstream ISP); however, pricing is typically done on data volume exchanged - if goodput is lower, a correspondingly higher volume is necessary to achieve the same "real" data exchange. However, the next question becomes how to quantify this at large scale - if the monetary difference is, say, in the vicinity of 2-3% saved (average internet loss ratio), that accumulates to huge sums for small / medium ISPs (which get charged more per volume than large ISPs). If the quantitative difference is only 0.02-0.05%, say, then the incentive of enabling AQMs in small ISPs is not really there in monetary terms (and these ISPs would have to be motivated by other, typically much less strong incentives).
Best regards, Richard ----- Original Message ----- From: "Fred Baker" <fredbakersba@gmail.com> To: "Jim Gettys" <jg@freedesktop.org> Cc: <bloat@lists.bufferbloat.net> Sent: Friday, May 06, 2011 6:18 AM Subject: Re: [Bloat] Goodput fraction w/ AQM vs bufferbloat > There are a couple of ways to approach this, and they depend on your > network model. > > In general, if you assume that there is one bottleneck, losses occur in > the queue at the bottleneck, and are each retransmitted exactly once (not > necessary, but helps), goodput should approximate 100% regardless of the > queue depth. Why? Because every packet transits the bottleneck once - if > it is dropped at the bottleneck, the retransmission transits the > bottleneck. So you are using exactly the capacity of the bottleneck. > > the value of a shallow queue is to reduce RTT, not to increase or decrease > goodput. cwnd can become too small, however; if it is possible to set cwnd > to N without increasing queuing delay, and cwnd is less than N, you're not > maximizing throughput. When cwnd grows above N, it merely increases > queuing delay, and therefore bufferbloat. > > If there are two bottlenecks in series, you have some probability that a > packet transits one bottleneck and doesn't transit the other. In that > case, there is probably an analytical way to describe the behavior, but it > depends on a lot of factors including distributions of competing traffic. > There are a number of other possibilities; imagine that you drop a packet, > there is a sack, you retransmit it, the ack is lost, and meanwhile there > is another loss. You could easily retransmit the retransmission > unnecessarily, which reduces goodput. The list of silly possibilities goes > on for a while, and we have to assume that each has some probability of > happening in the wild. > > > > On May 5, 2011, at 9:01 AM, Jim Gettys wrote: > >> On 04/30/2011 03:18 PM, Richard Scheffenegger wrote: >>> I'm curious, has anyone done some simulations to check if the following >>> qualitative statement holds true, and if, what the quantitative effect >>> is: >>> >>> With bufferbloat, the TCP congestion control reaction is unduely >>> delayed. When it finally happens, the tcp stream is likely facing a >>> "burst loss" event - multiple consecutive packets get dropped. Worse >>> yet, the sender with the lowest RTT across the bottleneck will likely >>> start to retransmit while the (tail-drop) queue is still overflowing. >>> >>> And a lost retransmission means a major setback in bandwidth (except for >>> Linux with bulk transfers and SACK enabled), as the standard (RFC >>> documented) behaviour asks for a RTO (1sec nominally, 200-500 ms >>> typically) to recover such a lost retransmission... >>> >>> The second part (more important as an incentive to the ISPs actually), >>> how does the fraction of goodput vs. throughput change, when AQM schemes >>> are deployed, and TCP CC reacts in a timely manner? Small ISPs have to >>> pay for their upstream volume, regardless if that is "real" work >>> (goodput) or unneccessary retransmissions. >>> >>> When I was at a small cable ISP in switzerland last week, surely enough >>> bufferbloat was readily observable (17ms -> 220ms after 30 sec of a bulk >>> transfer), but at first they had the "not our problem" view, until I >>> started discussing burst loss / retransmissions / goodput vs >>> throughput - with the latest point being a real commercial incentive to >>> them. 
(They promised to check if AQM would be available in the CPE / >>> CMTS, and put latency bounds in their tenders going forward). >>> >> I wish I had a good answer to your very good questions. Simulation would >> be interesting though real daa is more convincing. >> >> I haven't looked in detail at all that many traces to try to get a feel >> for how much bandwidth waste there actually is, and more formal studies >> like Netalyzr, SamKnows, or the Bismark project would be needed to >> quantify the loss on the network as a whole. >> >> I did spend some time last fall with the traces I've taken. In those, >> I've typically been seeing 1-3% packet loss in the main TCP transfers. >> On the wireless trace I took, I saw 9% loss, but whether that is >> bufferbloat induced loss or not, I don't know (the data is out there for >> those who might want to dig). And as you note, the losses are >> concentrated in bursts (probably due to the details of Cubic, so I'm >> told). >> >> I've had anecdotal reports (and some first hand experience) with much >> higher loss rates, for example from Nick Weaver at ICSI; but I believe in >> playing things conservatively with any numbers I quote and I've not >> gotten consistent results when I've tried, so I just report what's in the >> packet captures I did take. >> >> A phenomena that could be occurring is that during congestion avoidance >> (until TCP loses its cookies entirely and probes for a higher operating >> point) that TCP is carefully timing it's packets to keep the buffers >> almost exactly full, so that competing flows (in my case, simple pings) >> are likely to arrive just when there is no buffer space to accept them >> and therefore you see higher losses on them than you would on the single >> flow I've been tracing and getting loss statistics from. >> >> People who want to look into this further would be a great help. >> - Jim >> >> >> _______________________________________________ >> Bloat mailing list >> Bloat@lists.bufferbloat.net >> https://lists.bufferbloat.net/listinfo/bloat > > _______________________________________________ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat ^ permalink raw reply [flat|nested] 66+ messages in thread
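A back-of-envelope version of the TSO burst scenario Richard describes, as a sketch with assumed numbers (the burst sizes, the 1:10 drain ratio, and the free queue space are all illustrative): the burst arrives at the bottleneck at line rate, the queue drains at a tenth of that rate, and whatever does not fit into the remaining space is tail-dropped in one go.

    def burst_drop_fraction(burst_bytes, free_queue_bytes, drain_ratio=0.1):
        """Fraction of a line-rate burst tail-dropped at a bottleneck that
        drains at drain_ratio times the arrival rate (fluid approximation)."""
        drained_during_burst = burst_bytes * drain_ratio
        accepted = min(burst_bytes, free_queue_bytes + drained_during_burst)
        return 1.0 - accepted / float(burst_bytes)

    # Bloated, nearly-full queue: only a few packets' worth of space left.
    print(burst_drop_fraction(64 * 1024, free_queue_bytes=3 * 1514))    # ~0.83
    # AQM marking early: cwnd never grows enough for a large TSO burst,
    # and the standing queue stays short.
    print(burst_drop_fraction(16 * 1024, free_queue_bytes=32 * 1024))   # 0.0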
* Re: [Bloat] Goodput fraction w/ AQM vs bufferbloat 2011-05-08 12:34 ` Richard Scheffenegger @ 2011-05-09 3:07 ` Fred Baker 0 siblings, 0 replies; 66+ messages in thread From: Fred Baker @ 2011-05-09 3:07 UTC (permalink / raw) To: Richard Scheffenegger; +Cc: bloat On May 8, 2011, at 5:34 AM, Richard Scheffenegger wrote: > Goodput can really only be measured at the sender; by definition, any retransmitted packet will reduce goodput vs throughput; In your example, where each segment is retransmitted once, goodput would be - at most - 0.5, not 1.0... IMHO defining the data volume after the bottleneck by itself as goodput is also a bit short-sighted, because a good fraction of that data may still be discarded by TCP for numerous reasons, ultimately (ie, legacy go-back-n RTO recovery by the sender)... Actually, I didn't say that every packet was retransmitted once. I said that every dropped packet was retransmitted once. And Goodput will never exceed the bit rate of the bottleneck in the path, apart from compression (which in effect applies a multiplier to the bottleneck bandwidth). > But back to my original question: When looking at modern TCP stacks, with TSO, if the bufferbloat allows the senders cwnd to grow beyond thresholds which allow the aggressive use of TSO (64kB or even 256kB of data allowed in the senders cwnd), the effective sending rate of such a burst will be wirespeed (no interleaving segments of other sessions). As pointed out in other mails to this thread, if the bottleneck has then 1/10th the capacity of the senders wire (and is potentially shared among multiple senders), at least 90% of all the sent data of such a TSO segment train will be dropped in a single burst of loss... With proper AQM, and some (single segment) loss earlier, cwnd may never grow to trigger TSO in that way, and the goodput (1 segment out of 64kB data, vs. 58kB out of 64kB data) is obviously shifted extremely to the scenario with AQM... Again, possibly, but not necessarily. If we have a constrained queue and are using tail drop, it is possible for a single burst sent to a full queue to be entirely lost. The question is, in the course of a file transfer, how many packets are lost. Before you make sweeping statements, I would strongly suggest that you mock up the situation and take a tcpdump. ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [Bloat] Jumbo frames and LAN buffers
@ 2011-05-16 18:40 Richard Scheffenegger
0 siblings, 0 replies; 66+ messages in thread
From: Richard Scheffenegger @ 2011-05-16 18:40 UTC (permalink / raw)
To: Kevin Gross, bloat
Also found this:
http://www.stanford.edu/~balaji/papers/QCN.pdf
Jim, you may notice that the congestion feedback probability function looks
just like the basic RED marking function :)
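For readers who have not seen it, the basic RED marking curve being alluded to looks roughly like this (a sketch of the classic form, driven by the averaged queue length; the count-based spreading of marks that full RED adds is omitted):

    def red_mark_probability(avg_q, min_th, max_th, max_p=0.1):
        """Probability of marking/dropping an arriving packet under basic RED."""
        if avg_q < min_th:
            return 0.0
        if avg_q >= max_th:
            return 1.0        # classic (non-gentle) RED marks/drops everything here
        return max_p * (avg_q - min_th) / float(max_th - min_th)

    for q in (2, 5, 10, 15, 20):
        print(q, red_mark_probability(q, min_th=5, max_th=15))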
Regards,
Richard
----- Original Message -----
From: "Richard Scheffenegger" <rscheff@gmx.at>
To: "Kevin Gross" <kevin.gross@avanw.com>; <bloat@lists.bufferbloat.net>
Sent: Monday, May 16, 2011 8:36 PM
Subject: Re: [Bloat] Jumbo frames and LAN buffers
> Kevin,
>
>> My understanding is that 802.1au, "lossless Ethernet", was designed
>> primarily to allow Fibre Channel to be carried over 10 GbE so that SAN
>> and
>> LAN can share a common infrastructure in datacenters. I don't believe
>> anyone
>> intends for it to be enabled for traffic classes carrying TCP.
>
> Well, QCN requires an L2 MAC sender, network and receiver cooperation (thus
> you need fancy "CNA" converged network adapters, to start using it - these
> would be reaction/reflection points; plus the congestion points -
> switches - would need HW support too; nothing one can buy today;
> higher-grade (carrier?) switches may have the reaction/reflection points
> built into them, and could use legacy 802.3x signalling outside the
> 802.1Qau cloud).
>
> The following may be too simplistic
>
> Once the hardware has reaction point support, it classifies traffic, and
> calculates the per flow congestion of the path (with flow really being the
> classification rules by the sender), the intermediates / receiver sample
> the flow and return the congestion back to the sender - and within the
> sender, a token bucket-like rate limiter will adjust the sending rate of
> the appropriate flow(s) to adjust to the observed network conditions.
>
> http://www.stanford.edu/~balaji/presentations/au-prabhakar-qcn-description.pdf
> http://www.ieee802.org/1/files/public/docs2007/au-pan-qcn-details-053007.pdf
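As an aside on the token-bucket style limiter described above, here is a minimal sketch of the general idea; QCN's actual rate controller is considerably more involved (byte counters, separate fast-recovery and active-increase stages), so this is only an illustration of the shape of the mechanism, with an assumed feedback interface.

    import time

    class TokenBucket(object):
        """Minimal token-bucket rate limiter (illustrative only)."""

        def __init__(self, rate_bps, burst_bytes):
            self.rate = rate_bps / 8.0          # refill rate, bytes per second
            self.depth = float(burst_bytes)     # bucket depth
            self.tokens = float(burst_bytes)
            self.last = time.monotonic()

        def allow(self, nbytes):
            now = time.monotonic()
            self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if nbytes <= self.tokens:
                self.tokens -= nbytes
                return True                     # frame may be sent now
            return False                        # hold / queue the frame

        def on_congestion_feedback(self, severity):
            # QCN-like reaction: cut the rate multiplicatively when the
            # congestion point reports back (severity in [0, 1], assumed).
            self.rate *= (1.0 - 0.5 * severity)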
>
> The congestion control loop has a lot of similarities to TCP CC as you
> will note...
>
> Also, I haven't found out how fine-grained the classification is supposed
> to be (per L2 address pair? Group of flows? Which hashing then to use for
> mapping L2 flows into those groups between reaction/congestion/reflection
> points...).
>
>
> Anyway, for the here and now, this is pretty much esoteric stuff not
> relevant in this context :)
>
> Best regards,
> Richard
>
> ----- Original Message -----
> From: "Kevin Gross" <kevin.gross@avanw.com>
> To: <bloat@lists.bufferbloat.net>
> Sent: Monday, May 16, 2011 3:15 PM
> Subject: Re: [Bloat] Jumbo frames and LAN buffers
>
>
>> All the stand-alone switches I've looked at recently either do not
>> support
>> 802.3x or support it in the (desirable) manner described in the last
>> paragraph of the linked blog post. I don't believe Ethernet flow control
>> is
>> a factor in current LANs. I'd be interested to know the specifics if
>> anyone
>> sees it differently.
>>
>> My understanding is that 802.1au, "lossless Ethernet", was designed
>> primarily to allow Fibre Channel to be carried over 10 GbE so that SAN
>> and
>> LAN can share a common infrastructure in datacenters. I don't believe
>> anyone
>> intends for it to be enabled for traffic classes carrying TCP.
>>
>> Kevin Gross
>>
>> -----Original Message-----
>> From: bloat-bounces@lists.bufferbloat.net
>> [mailto:bloat-bounces@lists.bufferbloat.net] On Behalf Of Jim Gettys
>> Sent: Monday, May 16, 2011 5:24 AM
>> To: bloat@lists.bufferbloat.net
>> Subject: Re: [Bloat] Jumbo frames and LAN buffers
>>
>> Not necessarily out of knowledge or desire (since it isn't usually
>> controllable in the small switches you buy for home). It can cause
>> trouble even in small environments as your house.
>>
>> http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html
>>
>> I know I'm at least three consumer switches deep, and it's not by choice.
>> - Jim
>>
>>
>> _______________________________________________
>> Bloat mailing list
>> Bloat@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/bloat
>>
>
^ permalink raw reply [flat|nested] 66+ messages in thread