* Re: [Bloat] Jumbo frames and LAN buffers
@ 2011-05-16 18:40 Richard Scheffenegger
0 siblings, 0 replies; 8+ messages in thread
From: Richard Scheffenegger @ 2011-05-16 18:40 UTC (permalink / raw)
To: Kevin Gross, bloat
Also found this:
http://www.stanford.edu/~balaji/papers/QCN.pdf
Jim, you may notice that the congestion feedback probability function looks
just like the basic RED marking function :)
Regards,
Richard
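
For context, the "basic RED marking function" being compared against here is just a
linear ramp of marking probability between two average-queue thresholds. A minimal
sketch (illustrative parameter names, not taken from the QCN paper):

    def red_mark_probability(avg_q, min_th, max_th, max_p):
        """Basic RED: marking probability grows linearly with average queue depth."""
        if avg_q < min_th:
            return 0.0          # below the low threshold nothing is marked
        if avg_q >= max_th:
            return 1.0          # above the high threshold everything is marked/dropped
        return max_p * (avg_q - min_th) / (max_th - min_th)
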
----- Original Message -----
From: "Richard Scheffenegger" <rscheff@gmx.at>
To: "Kevin Gross" <kevin.gross@avanw.com>; <bloat@lists.bufferbloat.net>
Sent: Monday, May 16, 2011 8:36 PM
Subject: Re: [Bloat] Jumbo frames and LAN buffers
> Kevin,
>
>> My understanding is that 802.1Qau, "lossless Ethernet", was designed
>> primarily to allow Fibre Channel to be carried over 10 GbE so that SAN
>> and
>> LAN can share a common infrastructure in datacenters. I don't believe
>> anyone
>> intends for it to be enabled for traffic classes carrying TCP.
>
> Well, QCN requires cooperation between the L2 MAC sender, the network and
> the receiver (thus you need fancy "CNA" converged network adapters to
> start using it - these would be the reaction/reflection points; the
> congestion points - switches - would need HW support too; nothing one can
> buy today; higher-grade (carrier?) switches may have the
> reaction/reflection points built into them, and could use legacy 802.3x
> signalling outside the 802.1Qau cloud).
>
> The following may be too simplistic:
>
> Once the hardware has reaction point support, it classifies traffic and
> calculates the per-flow congestion of the path (with a "flow" really being
> whatever the sender's classification rules define). The intermediates /
> receiver sample the flow and return the congestion feedback to the sender,
> and within the sender a token-bucket-like rate limiter adjusts the sending
> rate of the appropriate flow(s) to match the observed network conditions.
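
A rough sketch of such a reaction-point rate limiter (a simplified model with assumed
constants, not the actual 802.1Qau state machine):

    class QcnRateLimiterSketch:
        """Simplified sketch of an 802.1Qau-style reaction point (illustrative only)."""
        GD = 1.0 / 128      # assumed feedback gain: a maximal feedback value roughly halves the rate
        R_AI = 5e6          # assumed self-increase step, bits/s

        def __init__(self, link_rate_bps):
            self.current_rate = link_rate_bps   # rate the token bucket is clocked at
            self.target_rate = link_rate_bps

        def on_congestion_notification(self, fb):
            """Feedback frame from a congestion point: multiplicative rate decrease."""
            self.target_rate = self.current_rate
            self.current_rate *= 1.0 - self.GD * abs(fb)

        def on_recovery_timer(self):
            """No recent feedback: probe back towards (and eventually past) the old rate."""
            self.target_rate += self.R_AI
            self.current_rate = (self.current_rate + self.target_rate) / 2.0
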
>
> http://www.stanford.edu/~balaji/presentations/au-prabhakar-qcn-description.pdf
> http://www.ieee802.org/1/files/public/docs2007/au-pan-qcn-details-053007.pdf
>
> The congestion control loop has a lot of similarities to TCP CC as you
> will note...
>
> Also, I haven't found out how fine-grained the classification is supposed
> to be (per L2 address pair? Group of flows? Which hashing then to use for
> mapping L2 flows into those groups between reaction/congestion/reflection
> points...).
>
>
> Anyway, for the here and now, this is pretty much esoteric stuff not
> relevant in this context :)
>
> Best regards,
> Richard
>
> ----- Original Message -----
> From: "Kevin Gross" <kevin.gross@avanw.com>
> To: <bloat@lists.bufferbloat.net>
> Sent: Monday, May 16, 2011 3:15 PM
> Subject: Re: [Bloat] Jumbo frames and LAN buffers
>
>
>> All the stand-alone switches I've looked at recently either do not
>> support
>> 802.3x or support it in the (desirable) manner described in the last
>> paragraph of the linked blog post. I don't believe Ethernet flow control
>> is
>> a factor in current LANs. I'd be interested to know the specifics if
>> anyone
>> sees it differently.
>>
>> My understanding is that 802.1Qau, "lossless Ethernet", was designed
>> primarily to allow Fibre Channel to be carried over 10 GbE so that SAN
>> and
>> LAN can share a common infrastructure in datacenters. I don't believe
>> anyone
>> intends for it to be enabled for traffic classes carrying TCP.
>>
>> Kevin Gross
>>
>> -----Original Message-----
>> From: bloat-bounces@lists.bufferbloat.net
>> [mailto:bloat-bounces@lists.bufferbloat.net] On Behalf Of Jim Gettys
>> Sent: Monday, May 16, 2011 5:24 AM
>> To: bloat@lists.bufferbloat.net
>> Subject: Re: [Bloat] Jumbo frames and LAN buffers
>>
>> Not necessarily out of knowledge or desire (since it isn't usually
>> controllable in the small switches you buy for home). It can cause
>> trouble even in small environments such as your house.
>>
>> http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html
>>
>> I know I'm at least three consumer switches deep, and it's not by choice.
>> - Jim
>>
>>
>> _______________________________________________
>> Bloat mailing list
>> Bloat@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/bloat
>>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bloat] Network computing article on bloat
@ 2011-04-26 17:05 Dave Taht
  2011-04-26 18:13 ` Dave Hart
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Taht @ 2011-04-26 17:05 UTC (permalink / raw)
  To: bloat

Not bad, although I can live without the title. Coins a new-ish phrase
"insertion latency"

http://www.networkcomputing.com/end-to-end-apm/bufferbloat-and-the-collapse-of-the-internet.php

--
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://the-edge.blogspot.com

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Bloat] Network computing article on bloat
  2011-04-26 17:05 [Bloat] Network computing article on bloat Dave Taht
@ 2011-04-26 18:13 ` Dave Hart
  2011-04-26 18:17   ` Dave Taht
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Hart @ 2011-04-26 18:13 UTC (permalink / raw)
  To: Dave Taht; +Cc: bloat

On Tue, Apr 26, 2011 at 17:05 UTC, Dave Taht <dave.taht@gmail.com> wrote:
> Not bad, although I can live without the title. Coins a new-ish phrase
> "insertion latency"
>
> http://www.networkcomputing.com/end-to-end-apm/bufferbloat-and-the-collapse-of-the-internet.php

The piece ends with a paragraph claiming preventing packet loss is
addressing a more fundamental problem which contributes to bufferbloat.
As long as the writer and readers believe packet loss is an unmitigated
evil, the battle is lost. More encouraging would have been a statement
that packet loss is preferable to excessive queueing and a required TCP
feedback signal when ECN isn't in play.

Cheers,
Dave Hart

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Bloat] Network computing article on bloat
  2011-04-26 18:13 ` Dave Hart
@ 2011-04-26 18:17   ` Dave Taht
  2011-04-26 18:32     ` Wesley Eddy
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Taht @ 2011-04-26 18:17 UTC (permalink / raw)
  To: bloat; +Cc: dave greenfield

"Big Buffers Bad. Small Buffers Good."

"*Some* packet loss is essential for the correct operation of the Internet"

are two of the memes I try to propagate, in their simplicity. Even then
there are so many qualifiers to both of those that the core message gets
lost.

On Tue, Apr 26, 2011 at 12:13 PM, Dave Hart <davehart@gmail.com> wrote:
> On Tue, Apr 26, 2011 at 17:05 UTC, Dave Taht <dave.taht@gmail.com> wrote:
>> Not bad, although I can live without the title. Coins a new-ish phrase
>> "insertion latency"
>>
>> http://www.networkcomputing.com/end-to-end-apm/bufferbloat-and-the-collapse-of-the-internet.php
>
> The piece ends with a paragraph claiming preventing packet loss is
> addressing a more fundamental problem which contributes to bufferbloat.
> As long as the writer and readers believe packet loss is an unmitigated
> evil, the battle is lost. More encouraging would have been a statement
> that packet loss is preferable to excessive queueing and a required TCP
> feedback signal when ECN isn't in play.
>
> Cheers,
> Dave Hart

--
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://the-edge.blogspot.com

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Bloat] Network computing article on bloat
  2011-04-26 18:17   ` Dave Taht
@ 2011-04-26 18:32     ` Wesley Eddy
  2011-04-30 19:18       ` [Bloat] Goodput fraction w/ AQM vs bufferbloat Richard Scheffenegger
  0 siblings, 1 reply; 8+ messages in thread
From: Wesley Eddy @ 2011-04-26 18:32 UTC (permalink / raw)
  To: bloat

On 4/26/2011 2:17 PM, Dave Taht wrote:
> "Big Buffers Bad. Small Buffers Good."
>
> "*Some* packet loss is essential for the correct operation of the Internet"
>
> are two of the memes I try to propagate, in their simplicity. Even
> then there are so many qualifiers to both of those that the core
> message gets lost.

The second one is actually backwards; it should be "the Internet can
operate correctly with some packet loss".

--
Wes Eddy
MTI Systems

^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bloat] Goodput fraction w/ AQM vs bufferbloat
  2011-04-26 18:32     ` Wesley Eddy
@ 2011-04-30 19:18       ` [Bloat] Goodput fraction w/ AQM vs bufferbloat Richard Scheffenegger
  2011-05-05 16:01         ` Jim Gettys
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Scheffenegger @ 2011-04-30 19:18 UTC (permalink / raw)
  To: bloat

I'm curious, has anyone done some simulations to check if the following
qualitative statement holds true, and if, what the quantitative effect is:

With bufferbloat, the TCP congestion control reaction is unduely delayed.
When it finally happens, the tcp stream is likely facing a "burst loss"
event - multiple consecutive packets get dropped. Worse yet, the sender
with the lowest RTT across the bottleneck will likely start to retransmit
while the (tail-drop) queue is still overflowing.

And a lost retransmission means a major setback in bandwidth (except for
Linux with bulk transfers and SACK enabled), as the standard (RFC
documented) behaviour asks for a RTO (1sec nominally, 200-500 ms typically)
to recover such a lost retransmission...

The second part (more important as an incentive to the ISPs actually), how
does the fraction of goodput vs. throughput change, when AQM schemes are
deployed, and TCP CC reacts in a timely manner? Small ISPs have to pay for
their upstream volume, regardless if that is "real" work (goodput) or
unneccessary retransmissions.

When I was at a small cable ISP in switzerland last week, surely enough
bufferbloat was readily observable (17ms -> 220ms after 30 sec of a bulk
transfer), but at first they had the "not our problem" view, until I
started discussing burst loss / retransmissions / goodput vs throughput -
with the latest point being a real commercial incentive to them. (They
promised to check if AQM would be available in the CPE / CMTS, and put
latency bounds in their tenders going forward).

Best regards,
Richard

^ permalink raw reply [flat|nested] 8+ messages in thread
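
As a back-of-the-envelope illustration of the goodput-vs-throughput question above
(the monthly volume figure is invented; only the ratios matter):

    monthly_upstream_tb = 500.0                 # hypothetical monthly upstream volume for a small ISP
    for retransmit_rate in (0.01, 0.03, 0.09):  # loss rates of the order reported later in this thread
        wasted_tb = monthly_upstream_tb * retransmit_rate
        print(f"{retransmit_rate:.0%} retransmissions -> about {wasted_tb:.0f} TB of "
              f"paid-for upstream volume that carries no new data")
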
* Re: [Bloat] Goodput fraction w/ AQM vs bufferbloat
  2011-04-30 19:18 ` [Bloat] Goodput fraction w/ AQM vs bufferbloat Richard Scheffenegger
@ 2011-05-05 16:01   ` Jim Gettys
  2011-05-05 16:10     ` Stephen Hemminger
  0 siblings, 1 reply; 8+ messages in thread
From: Jim Gettys @ 2011-05-05 16:01 UTC (permalink / raw)
  To: bloat

On 04/30/2011 03:18 PM, Richard Scheffenegger wrote:
> I'm curious, has anyone done some simulations to check if the
> following qualitative statement holds true, and if, what the
> quantitative effect is:
>
> With bufferbloat, the TCP congestion control reaction is unduely
> delayed. When it finally happens, the tcp stream is likely facing a
> "burst loss" event - multiple consecutive packets get dropped. Worse
> yet, the sender with the lowest RTT across the bottleneck will likely
> start to retransmit while the (tail-drop) queue is still overflowing.
>
> And a lost retransmission means a major setback in bandwidth (except
> for Linux with bulk transfers and SACK enabled), as the standard (RFC
> documented) behaviour asks for a RTO (1sec nominally, 200-500 ms
> typically) to recover such a lost retransmission...
>
> The second part (more important as an incentive to the ISPs actually),
> how does the fraction of goodput vs. throughput change, when AQM
> schemes are deployed, and TCP CC reacts in a timely manner? Small ISPs
> have to pay for their upstream volume, regardless if that is "real"
> work (goodput) or unneccessary retransmissions.
>
> When I was at a small cable ISP in switzerland last week, surely
> enough bufferbloat was readily observable (17ms -> 220ms after 30 sec
> of a bulk transfer), but at first they had the "not our problem" view,
> until I started discussing burst loss / retransmissions / goodput vs
> throughput - with the latest point being a real commercial incentive
> to them. (They promised to check if AQM would be available in the CPE
> / CMTS, and put latency bounds in their tenders going forward).
>

I wish I had a good answer to your very good questions. Simulation would
be interesting though real daa is more convincing.

I haven't looked in detail at all that many traces to try to get a feel
for how much bandwidth waste there actually is, and more formal studies
like Netalyzr, SamKnows, or the Bismark project would be needed to
quantify the loss on the network as a whole.

I did spend some time last fall with the traces I've taken. In those,
I've typically been seeing 1-3% packet loss in the main TCP transfers.
On the wireless trace I took, I saw 9% loss, but whether that is
bufferbloat induced loss or not, I don't know (the data is out there for
those who might want to dig). And as you note, the losses are
concentrated in bursts (probably due to the details of Cubic, so I'm told).

I've had anecdotal reports (and some first hand experience) with much
higher loss rates, for example from Nick Weaver at ICSI; but I believe
in playing things conservatively with any numbers I quote and I've not
gotten consistent results when I've tried, so I just report what's in
the packet captures I did take.

A phenomena that could be occurring is that during congestion avoidance
(until TCP loses its cookies entirely and probes for a higher operating
point) that TCP is carefully timing it's packets to keep the buffers
almost exactly full, so that competing flows (in my case, simple pings)
are likely to arrive just when there is no buffer space to accept them
and therefore you see higher losses on them than you would on the single
flow I've been tracing and getting loss statistics from.

People who want to look into this further would be a great help.
- Jim

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Bloat] Goodput fraction w/ AQM vs bufferbloat
  2011-05-05 16:01   ` Jim Gettys
@ 2011-05-05 16:10     ` Stephen Hemminger
  2011-05-05 16:49       ` [Bloat] Burst Loss Neil Davies
  0 siblings, 1 reply; 8+ messages in thread
From: Stephen Hemminger @ 2011-05-05 16:10 UTC (permalink / raw)
  To: Jim Gettys; +Cc: bloat

On Thu, 05 May 2011 12:01:22 -0400
Jim Gettys <jg@freedesktop.org> wrote:

> On 04/30/2011 03:18 PM, Richard Scheffenegger wrote:
> > I'm curious, has anyone done some simulations to check if the
> > following qualitative statement holds true, and if, what the
> > quantitative effect is:
> >
> > With bufferbloat, the TCP congestion control reaction is unduely
> > delayed. When it finally happens, the tcp stream is likely facing a
> > "burst loss" event - multiple consecutive packets get dropped. Worse
> > yet, the sender with the lowest RTT across the bottleneck will likely
> > start to retransmit while the (tail-drop) queue is still overflowing.
> >
> > And a lost retransmission means a major setback in bandwidth (except
> > for Linux with bulk transfers and SACK enabled), as the standard (RFC
> > documented) behaviour asks for a RTO (1sec nominally, 200-500 ms
> > typically) to recover such a lost retransmission...
> >
> > The second part (more important as an incentive to the ISPs actually),
> > how does the fraction of goodput vs. throughput change, when AQM
> > schemes are deployed, and TCP CC reacts in a timely manner? Small ISPs
> > have to pay for their upstream volume, regardless if that is "real"
> > work (goodput) or unneccessary retransmissions.
> >
> > When I was at a small cable ISP in switzerland last week, surely
> > enough bufferbloat was readily observable (17ms -> 220ms after 30 sec
> > of a bulk transfer), but at first they had the "not our problem" view,
> > until I started discussing burst loss / retransmissions / goodput vs
> > throughput - with the latest point being a real commercial incentive
> > to them. (They promised to check if AQM would be available in the CPE
> > / CMTS, and put latency bounds in their tenders going forward).
> >
> I wish I had a good answer to your very good questions. Simulation
> would be interesting though real daa is more convincing.
>
> I haven't looked in detail at all that many traces to try to get a feel
> for how much bandwidth waste there actually is, and more formal studies
> like Netalyzr, SamKnows, or the Bismark project would be needed to
> quantify the loss on the network as a whole.
>
> I did spend some time last fall with the traces I've taken. In those,
> I've typically been seeing 1-3% packet loss in the main TCP transfers.
> On the wireless trace I took, I saw 9% loss, but whether that is
> bufferbloat induced loss or not, I don't know (the data is out there for
> those who might want to dig). And as you note, the losses are
> concentrated in bursts (probably due to the details of Cubic, so I'm told).
>
> I've had anecdotal reports (and some first hand experience) with much
> higher loss rates, for example from Nick Weaver at ICSI; but I believe
> in playing things conservatively with any numbers I quote and I've not
> gotten consistent results when I've tried, so I just report what's in
> the packet captures I did take.
>
> A phenomena that could be occurring is that during congestion avoidance
> (until TCP loses its cookies entirely and probes for a higher operating
> point) that TCP is carefully timing it's packets to keep the buffers
> almost exactly full, so that competing flows (in my case, simple pings)
> are likely to arrive just when there is no buffer space to accept them
> and therefore you see higher losses on them than you would on the single
> flow I've been tracing and getting loss statistics from.
>
> People who want to look into this further would be a great help.
> - Jim

I would not put a lot of trust in measuring loss with pings.
I heard that some ISP's do different processing on ICMP's used
for ping packets. They either prioritize them high to provide
artificially good response (better marketing numbers); or
prioritize them low since they aren't useful traffic.
There are also filters that only allow N ICMP requests per second
which means repeated probes will be dropped.

--

^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bloat] Burst Loss
  2011-05-05 16:10     ` Stephen Hemminger
@ 2011-05-05 16:49       ` Neil Davies
  2011-05-08 12:42         ` Richard Scheffenegger
  0 siblings, 1 reply; 8+ messages in thread
From: Neil Davies @ 2011-05-05 16:49 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: bloat

On the issue of loss - we did a study of the UK's ADSL access network back
in 2006 over several weeks, looking at the loss and delay that was
introduced into the bi-directional traffic. We found that the delay
variability (that bit left over after you've taken the effects of geography
and line sync rates) was broadly the same over the half dozen locations we
studied - it was there all the time to the same level of variance and that
what did vary by time of day was the loss rate.

We also found out, at the time much to our surprise - but we understand why
now, that loss was broadly independent of the offered load - we used a
constant data rate (with either fixed or variable packet sizes). We found
that loss rates were in the range 1% to 3% (which is what would be expected
from a large number of TCP streams contending for a limiting resource).

As for burst loss, yes it does occur - but it could be argued that this
more the fault of the sending TCP stack than the network. This phenomenon
was well covered in the academic literature in the '90s (if I remember
correctly folks at INRIA lead the way) - it is all down to the nature of
random processes and how you observe them.

Back to back packets see higher loss rates than packets more spread out in
time. Consider a pair of packets, back to back, arriving over a 1Gbit/sec
link into a queue being serviced at 34Mbit/sec, the first packet being
'lost' is equivalent to saying that the first packet 'observed' the queue
full - the system's state is no longer a random variable - it is known to
be full. The second packet (lets assume it is also a full one) 'makes an
observation' of the state of that queue about 12us later - but that is only
3% of the time that it takes to service such large packets at 34 Mbit/sec.
The system has not had any time to 'relax' anywhere near to back its steady
state, it is highly likely that it is still full.

Fixing this makes a phenomenal difference on the goodput (with the usual
delay effects that implies), we've even built and deployed systems with
this sort of engineering embedded (deployed as a network 'wrap') that mean
that end users can sustainably (days on end) achieve effective throughput
that is better than 98% of (the transmission media imposed) maximum. What
we had done is make the network behave closer to the underlying statistical
assumptions made in TCP's design.

Neil

On 5 May 2011, at 17:10, Stephen Hemminger wrote:

> On Thu, 05 May 2011 12:01:22 -0400
> Jim Gettys <jg@freedesktop.org> wrote:
>
>> On 04/30/2011 03:18 PM, Richard Scheffenegger wrote:
>>> I'm curious, has anyone done some simulations to check if the
>>> following qualitative statement holds true, and if, what the
>>> quantitative effect is:
>>>
>>> With bufferbloat, the TCP congestion control reaction is unduely
>>> delayed. When it finally happens, the tcp stream is likely facing a
>>> "burst loss" event - multiple consecutive packets get dropped. Worse
>>> yet, the sender with the lowest RTT across the bottleneck will likely
>>> start to retransmit while the (tail-drop) queue is still overflowing.
>>>
>>> And a lost retransmission means a major setback in bandwidth (except
>>> for Linux with bulk transfers and SACK enabled), as the standard (RFC
>>> documented) behaviour asks for a RTO (1sec nominally, 200-500 ms
>>> typically) to recover such a lost retransmission...
>>>
>>> The second part (more important as an incentive to the ISPs actually),
>>> how does the fraction of goodput vs. throughput change, when AQM
>>> schemes are deployed, and TCP CC reacts in a timely manner? Small ISPs
>>> have to pay for their upstream volume, regardless if that is "real"
>>> work (goodput) or unneccessary retransmissions.
>>>
>>> When I was at a small cable ISP in switzerland last week, surely
>>> enough bufferbloat was readily observable (17ms -> 220ms after 30 sec
>>> of a bulk transfer), but at first they had the "not our problem" view,
>>> until I started discussing burst loss / retransmissions / goodput vs
>>> throughput - with the latest point being a real commercial incentive
>>> to them. (They promised to check if AQM would be available in the CPE
>>> / CMTS, and put latency bounds in their tenders going forward).
>>>
>> I wish I had a good answer to your very good questions. Simulation
>> would be interesting though real daa is more convincing.
>>
>> I haven't looked in detail at all that many traces to try to get a feel
>> for how much bandwidth waste there actually is, and more formal studies
>> like Netalyzr, SamKnows, or the Bismark project would be needed to
>> quantify the loss on the network as a whole.
>>
>> I did spend some time last fall with the traces I've taken. In those,
>> I've typically been seeing 1-3% packet loss in the main TCP transfers.
>> On the wireless trace I took, I saw 9% loss, but whether that is
>> bufferbloat induced loss or not, I don't know (the data is out there for
>> those who might want to dig). And as you note, the losses are
>> concentrated in bursts (probably due to the details of Cubic, so I'm told).
>>
>> I've had anecdotal reports (and some first hand experience) with much
>> higher loss rates, for example from Nick Weaver at ICSI; but I believe
>> in playing things conservatively with any numbers I quote and I've not
>> gotten consistent results when I've tried, so I just report what's in
>> the packet captures I did take.
>>
>> A phenomena that could be occurring is that during congestion avoidance
>> (until TCP loses its cookies entirely and probes for a higher operating
>> point) that TCP is carefully timing it's packets to keep the buffers
>> almost exactly full, so that competing flows (in my case, simple pings)
>> are likely to arrive just when there is no buffer space to accept them
>> and therefore you see higher losses on them than you would on the single
>> flow I've been tracing and getting loss statistics from.
>>
>> People who want to look into this further would be a great help.
>> - Jim
>
> I would not put a lot of trust in measuring loss with pings.
> I heard that some ISP's do different processing on ICMP's used
> for ping packets. They either prioritize them high to provide
> artificially good response (better marketing numbers); or
> prioritize them low since they aren't useful traffic.
> There are also filters that only allow N ICMP requests per second
> which means repeated probes will be dropped.
>
> --
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat

^ permalink raw reply [flat|nested] 8+ messages in thread
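
The arithmetic behind the 1 Gbit/s into 34 Mbit/s example above can be reproduced
directly (assuming full-size 1500-byte frames):

    frame_bits = 1500 * 8                 # assume a full-size Ethernet frame
    arrival_gap = frame_bits / 1e9        # spacing of back-to-back frames on a 1 Gbit/s link
    service_time = frame_bits / 34e6      # time to drain one such frame at 34 Mbit/s
    print(f"arrival gap  = {arrival_gap * 1e6:6.1f} us")      # ~12 us
    print(f"service time = {service_time * 1e6:6.1f} us")     # ~353 us
    print(f"ratio        = {arrival_gap / service_time:.1%}") # ~3%: the queue has barely drained
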
* Re: [Bloat] Burst Loss
  2011-05-05 16:49       ` [Bloat] Burst Loss Neil Davies
@ 2011-05-08 12:42         ` Richard Scheffenegger
  2011-05-09 18:06           ` Rick Jones
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Scheffenegger @ 2011-05-08 12:42 UTC (permalink / raw)
  To: Neil Davies, Stephen Hemminger; +Cc: bloat

I'm not an expert in TSO / GSO, and NIC driver design, but what I gathered
is, that with these schemes, and mordern NICs that do scatter/gather DMA of
dotzends of "independent" header/data chuncks directly from memory, the NIC
will typically send out non-interleaved trains of segments all belonging to
single TCP sessions. With the implicit assumption, that these burst of up to
180 segments (Intel supports 256kB data per chain) can be absorped by the
buffer at the bottleneck and spread out in time there...

From my perspective, having such GSO / TSO to "cycle" through all the
different chains belonging to different sessions (to not introduce
reordering at the sender even), should already help pace the segments per
session somewhat; a slightly more sophisticated DMA engine could check each
of the chains for how much data is to be sent by those, and then clock an
appropriate number of interleaved segmets out... I do understand that this
is "work" for a HW DMA engine and slows down GSO software implementations,
but may severly reduce the instantaneous rate of a single session, and
thereby the impact of burst loss to to momenary buffer overload...

(Let me know if I should draw a picture of the way I understand TSO / HW DMA
is currently working, and where it could be improved upon):

Best regards,
Richard

----- Original Message -----
> Back to back packets see higher loss rates than packets more spread out in
> time. Consider a pair of packets, back to back, arriving over a 1Gbit/sec
> link into a queue being serviced at 34Mbit/sec, the first packet being
> 'lost' is equivalent to saying that the first packet 'observed' the queue
> full - the system's state is no longer a random variable - it is known to
> be full. The second packet (lets assume it is also a full one) 'makes an
> observation' of the state of that queue about 12us later - but that is
> only 3% of the time that it takes to service such large packets at 34
> Mbit/sec. The system has not had any time to 'relax' anywhere near to back
> its steady state, it is highly likely that it is still full.
>
> Fixing this makes a phenomenal difference on the goodput (with the usual
> delay effects that implies), we've even built and deployed systems with
> this sort of engineering embedded (deployed as a network 'wrap') that mean
> that end users can sustainably (days on end) achieve effective throughput
> that is better than 98% of (the transmission media imposed) maximum. What
> we had done is make the network behave closer to the underlying
> statistical assumptions made in TCP's design.
>
> Neil
>
> On 5 May 2011, at 17:10, Stephen Hemminger wrote:
>
>> On Thu, 05 May 2011 12:01:22 -0400
>> Jim Gettys <jg@freedesktop.org> wrote:
>>
>>> On 04/30/2011 03:18 PM, Richard Scheffenegger wrote:
>>>> I'm curious, has anyone done some simulations to check if the
>>>> following qualitative statement holds true, and if, what the
>>>> quantitative effect is:
>>>>
>>>> With bufferbloat, the TCP congestion control reaction is unduely
>>>> delayed. When it finally happens, the tcp stream is likely facing a
>>>> "burst loss" event - multiple consecutive packets get dropped. Worse
>>>> yet, the sender with the lowest RTT across the bottleneck will likely
>>>> start to retransmit while the (tail-drop) queue is still overflowing.
>>>>
>>>> And a lost retransmission means a major setback in bandwidth (except
>>>> for Linux with bulk transfers and SACK enabled), as the standard (RFC
>>>> documented) behaviour asks for a RTO (1sec nominally, 200-500 ms
>>>> typically) to recover such a lost retransmission...
>>>>
>>>> The second part (more important as an incentive to the ISPs actually),
>>>> how does the fraction of goodput vs. throughput change, when AQM
>>>> schemes are deployed, and TCP CC reacts in a timely manner? Small ISPs
>>>> have to pay for their upstream volume, regardless if that is "real"
>>>> work (goodput) or unneccessary retransmissions.
>>>>
>>>> When I was at a small cable ISP in switzerland last week, surely
>>>> enough bufferbloat was readily observable (17ms -> 220ms after 30 sec
>>>> of a bulk transfer), but at first they had the "not our problem" view,
>>>> until I started discussing burst loss / retransmissions / goodput vs
>>>> throughput - with the latest point being a real commercial incentive
>>>> to them. (They promised to check if AQM would be available in the CPE
>>>> / CMTS, and put latency bounds in their tenders going forward).
>>>>
>>> I wish I had a good answer to your very good questions. Simulation
>>> would be interesting though real daa is more convincing.
>>>
>>> I haven't looked in detail at all that many traces to try to get a feel
>>> for how much bandwidth waste there actually is, and more formal studies
>>> like Netalyzr, SamKnows, or the Bismark project would be needed to
>>> quantify the loss on the network as a whole.
>>>
>>> I did spend some time last fall with the traces I've taken. In those,
>>> I've typically been seeing 1-3% packet loss in the main TCP transfers.
>>> On the wireless trace I took, I saw 9% loss, but whether that is
>>> bufferbloat induced loss or not, I don't know (the data is out there for
>>> those who might want to dig). And as you note, the losses are
>>> concentrated in bursts (probably due to the details of Cubic, so I'm
>>> told).
>>>
>>> I've had anecdotal reports (and some first hand experience) with much
>>> higher loss rates, for example from Nick Weaver at ICSI; but I believe
>>> in playing things conservatively with any numbers I quote and I've not
>>> gotten consistent results when I've tried, so I just report what's in
>>> the packet captures I did take.
>>>
>>> A phenomena that could be occurring is that during congestion avoidance
>>> (until TCP loses its cookies entirely and probes for a higher operating
>>> point) that TCP is carefully timing it's packets to keep the buffers
>>> almost exactly full, so that competing flows (in my case, simple pings)
>>> are likely to arrive just when there is no buffer space to accept them
>>> and therefore you see higher losses on them than you would on the single
>>> flow I've been tracing and getting loss statistics from.
>>>
>>> People who want to look into this further would be a great help.
>>> - Jim
>>
>> I would not put a lot of trust in measuring loss with pings.
>> I heard that some ISP's do different processing on ICMP's used
>> for ping packets. They either prioritize them high to provide
>> artificially good response (better marketing numbers); or
>> prioritize them low since they aren't useful traffic.
>> There are also filters that only allow N ICMP requests per second
>> which means repeated probes will be dropped.
>>
>> --
>> _______________________________________________
>> Bloat mailing list
>> Bloat@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/bloat
>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat

^ permalink raw reply [flat|nested] 8+ messages in thread
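
The interleaving idea sketched above amounts to a round-robin over per-session
segment chains. A minimal illustration (the data structures are assumptions for the
example, not a real NIC or driver API):

    from collections import deque

    def interleave_chains(chains):
        """Emit one segment per session per round, preserving per-session order."""
        queues = deque(deque(chain) for chain in chains if chain)
        wire_order = []
        while queues:
            q = queues.popleft()
            wire_order.append(q.popleft())   # next segment of this session goes on the wire
            if q:
                queues.append(q)             # session still has segments: rotate to the back
        return wire_order

    # Two sessions of four segments each come out A1 B1 A2 B2 ... instead of A1..A4 B1..B4,
    # halving the largest per-session back-to-back burst seen by the bottleneck queue.
    print(interleave_chains([["A1", "A2", "A3", "A4"], ["B1", "B2", "B3", "B4"]]))
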
* Re: [Bloat] Burst Loss
  2011-05-08 12:42         ` Richard Scheffenegger
@ 2011-05-09 18:06           ` Rick Jones
  2011-05-12 16:31             ` Fred Baker
  0 siblings, 1 reply; 8+ messages in thread
From: Rick Jones @ 2011-05-09 18:06 UTC (permalink / raw)
  To: Richard Scheffenegger; +Cc: Stephen Hemminger, bloat

On Sun, 2011-05-08 at 14:42 +0200, Richard Scheffenegger wrote:
> I'm not an expert in TSO / GSO, and NIC driver design, but what I gathered
> is, that with these schemes, and mordern NICs that do scatter/gather DMA of
> dotzends of "independent" header/data chuncks directly from memory, the NIC
> will typically send out non-interleaved trains of segments all belonging to
> single TCP sessions. With the implicit assumption, that these burst of up to
> 180 segments (Intel supports 256kB data per chain) can be absorped by the
> buffer at the bottleneck and spread out in time there...
>
> From my perspective, having such GSO / TSO to "cycle" through all the
> different chains belonging to different sessions (to not introduce
> reordering at the sender even), should already help pace the segments per
> session somewhat; a slightly more sophisticated DMA engine could check each
> of the chains for how much data is to be sent by those, and then clock an
> appropriate number of interleaved segmets out... I do understand that this
> is "work" for a HW DMA engine and slows down GSO software implementations,
> but may severly reduce the instantaneous rate of a single session, and
> thereby the impact of burst loss to to momenary buffer overload...
>
> (Let me know if I should draw a picture of the way I understand TSO / HW DMA
> is currently working, and where it could be improved upon):

GSO/TSO can be thought of as a symptom of standards bodies (eg the IEEE)
refusing to standardize an increase in frame sizes. Put another way,
they are a "poor man's jumbo frames."

Within the context of a given "priority" at least, NICs are setup/designed
to do things in order. I too cannot claim to be a NIC designer, but suspect
it would be a non-trivial, if straight-forward exercise to get a NIC to
cycle through multiple GSO/TSO sends. Yes, they could probably (ab)use any
prioritization support they have. NICs and drivers are accustomed to "in
order" processing - grab packet, send packet, update status, lather, rinse,
repeat (modulo some pre-fetching). Those rings aren't really amenable to
"out of order" completion notifications, so the NIC would have to still do
"in order" retirement of packets or the driver model will loose simplicity.

As for the issue below, even if the NIC(s) upstream did interleave between
two GSO'd sends, you are simply trading back-to-back frames of a single
flow for back-to-back frames of different flows. And if there is only the
one flow upstream of this bottleneck, whether GSO is on or not probably
won't make a huge difference in the timing - only how much CPU is burned on
the source host.

> Best regards,
> Richard
>
> ----- Original Message -----
> > Back to back packets see higher loss rates than packets more spread out in
> > time. Consider a pair of packets, back to back, arriving over a 1Gbit/sec
> > link into a queue being serviced at 34Mbit/sec, the first packet being
> > 'lost' is equivalent to saying that the first packet 'observed' the queue
> > full - the system's state is no longer a random variable - it is known to
> > be full. The second packet (lets assume it is also a full one) 'makes an
> > observation' of the state of that queue about 12us later - but that is
> > only 3% of the time that it takes to service such large packets at 34
> > Mbit/sec. The system has not had any time to 'relax' anywhere near to back
> > its steady state, it is highly likely that it is still full.
> >
> > Fixing this makes a phenomenal difference on the goodput (with the usual
> > delay effects that implies), we've even built and deployed systems with
> > this sort of engineering embedded (deployed as a network 'wrap') that mean
> > that end users can sustainably (days on end) achieve effective throughput
> > that is better than 98% of (the transmission media imposed) maximum. What
> > we had done is make the network behave closer to the underlying
> > statistical assumptions made in TCP's design.
> >
> > Neil
> >
> > On 5 May 2011, at 17:10, Stephen Hemminger wrote:
> >
> >> On Thu, 05 May 2011 12:01:22 -0400
> >> Jim Gettys <jg@freedesktop.org> wrote:
> >>
> >>> On 04/30/2011 03:18 PM, Richard Scheffenegger wrote:
> >>>> I'm curious, has anyone done some simulations to check if the
> >>>> following qualitative statement holds true, and if, what the
> >>>> quantitative effect is:
> >>>>
> >>>> With bufferbloat, the TCP congestion control reaction is unduely
> >>>> delayed. When it finally happens, the tcp stream is likely facing a
> >>>> "burst loss" event - multiple consecutive packets get dropped. Worse
> >>>> yet, the sender with the lowest RTT across the bottleneck will likely
> >>>> start to retransmit while the (tail-drop) queue is still overflowing.
> >>>>
> >>>> And a lost retransmission means a major setback in bandwidth (except
> >>>> for Linux with bulk transfers and SACK enabled), as the standard (RFC
> >>>> documented) behaviour asks for a RTO (1sec nominally, 200-500 ms
> >>>> typically) to recover such a lost retransmission...
> >>>>
> >>>> The second part (more important as an incentive to the ISPs actually),
> >>>> how does the fraction of goodput vs. throughput change, when AQM
> >>>> schemes are deployed, and TCP CC reacts in a timely manner? Small ISPs
> >>>> have to pay for their upstream volume, regardless if that is "real"
> >>>> work (goodput) or unneccessary retransmissions.
> >>>>
> >>>> When I was at a small cable ISP in switzerland last week, surely
> >>>> enough bufferbloat was readily observable (17ms -> 220ms after 30 sec
> >>>> of a bulk transfer), but at first they had the "not our problem" view,
> >>>> until I started discussing burst loss / retransmissions / goodput vs
> >>>> throughput - with the latest point being a real commercial incentive
> >>>> to them. (They promised to check if AQM would be available in the CPE
> >>>> / CMTS, and put latency bounds in their tenders going forward).
> >>>>
> >>> I wish I had a good answer to your very good questions. Simulation
> >>> would be interesting though real daa is more convincing.
> >>>
> >>> I haven't looked in detail at all that many traces to try to get a feel
> >>> for how much bandwidth waste there actually is, and more formal studies
> >>> like Netalyzr, SamKnows, or the Bismark project would be needed to
> >>> quantify the loss on the network as a whole.
> >>>
> >>> I did spend some time last fall with the traces I've taken. In those,
> >>> I've typically been seeing 1-3% packet loss in the main TCP transfers.
> >>> On the wireless trace I took, I saw 9% loss, but whether that is
> >>> bufferbloat induced loss or not, I don't know (the data is out there for
> >>> those who might want to dig). And as you note, the losses are
> >>> concentrated in bursts (probably due to the details of Cubic, so I'm
> >>> told).
> >>>
> >>> I've had anecdotal reports (and some first hand experience) with much
> >>> higher loss rates, for example from Nick Weaver at ICSI; but I believe
> >>> in playing things conservatively with any numbers I quote and I've not
> >>> gotten consistent results when I've tried, so I just report what's in
> >>> the packet captures I did take.
> >>>
> >>> A phenomena that could be occurring is that during congestion avoidance
> >>> (until TCP loses its cookies entirely and probes for a higher operating
> >>> point) that TCP is carefully timing it's packets to keep the buffers
> >>> almost exactly full, so that competing flows (in my case, simple pings)
> >>> are likely to arrive just when there is no buffer space to accept them
> >>> and therefore you see higher losses on them than you would on the single
> >>> flow I've been tracing and getting loss statistics from.
> >>>
> >>> People who want to look into this further would be a great help.
> >>> - Jim
> >>
> >> I would not put a lot of trust in measuring loss with pings.
> >> I heard that some ISP's do different processing on ICMP's used
> >> for ping packets. They either prioritize them high to provide
> >> artificially good response (better marketing numbers); or
> >> prioritize them low since they aren't useful traffic.
> >> There are also filters that only allow N ICMP requests per second
> >> which means repeated probes will be dropped.
> >>
> >> --
> >> _______________________________________________
> >> Bloat mailing list
> >> Bloat@lists.bufferbloat.net
> >> https://lists.bufferbloat.net/listinfo/bloat
> >
> > _______________________________________________
> > Bloat mailing list
> > Bloat@lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/bloat
>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Bloat] Burst Loss
  2011-05-09 18:06           ` Rick Jones
@ 2011-05-12 16:31             ` Fred Baker
  2011-05-13  5:00               ` Kevin Gross
  0 siblings, 1 reply; 8+ messages in thread
From: Fred Baker @ 2011-05-12 16:31 UTC (permalink / raw)
  To: rick.jones2; +Cc: Stephen Hemminger, bloat

On May 9, 2011, at 11:06 AM, Rick Jones wrote:

> GSO/TSO can be thought of as a symptom of standards bodies (eg the IEEE)
> refusing to standardize an increase in frame sizes. Put another way,
> they are a "poor man's jumbo frames."

I'll agree, but only half; once the packets are transferred on the local
wire, any jumbo-ness is lost. GSO/TSO mostly squeezes interframe gaps out
of the wire and perhaps limits the amount of work the driver has to do.
The real value of an end to end (IP) jumbo frame is that the receiving
system experiences less interrupt load - a 9K frame replaces half a dozen
1500 byte frames, and as a result the receiver experiences 1/5 or 1/6 of
the interrupts. Given that it has to save state, activate the kernel
thread, and at least enqueue and perhaps acknowledge the received message,
reducing interrupt load on the receiver makes it far more effective. This
has the greatest effect on multi-gigabit file transfers.

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Bloat] Burst Loss
  2011-05-12 16:31             ` Fred Baker
@ 2011-05-13  5:00               ` Kevin Gross
  2011-05-13 14:35                 ` Rick Jones
  0 siblings, 1 reply; 8+ messages in thread
From: Kevin Gross @ 2011-05-13 5:00 UTC (permalink / raw)
  To: bloat

[-- Attachment #1: Type: text/plain, Size: 1571 bytes --]

One of the principal reasons jumbo frames have not been standardized is due
to latency concerns. I assume this group can appreciate the IEEE holding
ground on this.

For a short time, servers with gigabit NICs suffered but smarter NICs were
developed (TSO, LRO, other TLAs) and OSs upgraded to support them and I
believe it is no longer a significant issue.

Kevin Gross

On Thu, May 12, 2011 at 10:31 AM, Fred Baker <fred@cisco.com> wrote:
>
> On May 9, 2011, at 11:06 AM, Rick Jones wrote:
>
> > GSO/TSO can be thought of as a symptom of standards bodies (eg the IEEE)
> > refusing to standardize an increase in frame sizes. Put another way,
> > they are a "poor man's jumbo frames."
>
> I'll agree, but only half; once the packets are transferred on the local
> wire, any jumbo-ness is lost. GSO/TSO mostly squeezes interframe gaps out of
> the wire and perhaps limits the amount of work the driver has to do. The
> real value of an end to end (IP) jumbo frame is that the receiving system
> experiences less interrupt load - a 9K frame replaces half a dozen 1500 byte
> frames, and as a result the receiver experiences 1/5 or 1/6 of the
> interrupts. Given that it has to save state, activate the kernel thread, and
> at least enqueue and perhaps acknowledge the received message, reducing
> interrupt load on the receiver makes it far more effective. This has the
> greatest effect on multi-gigabit file transfers.
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat
>

[-- Attachment #2: Type: text/html, Size: 1997 bytes --]

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Bloat] Burst Loss
  2011-05-13  5:00               ` Kevin Gross
@ 2011-05-13 14:35                 ` Rick Jones
  2011-05-13 14:54                   ` Dave Taht
  0 siblings, 1 reply; 8+ messages in thread
From: Rick Jones @ 2011-05-13 14:35 UTC (permalink / raw)
  To: Kevin Gross; +Cc: bloat

On Thu, 2011-05-12 at 23:00 -0600, Kevin Gross wrote:
> One of the principal reasons jumbo frames have not been standardized
> is due to latency concerns. I assume this group can appreciate the
> IEEE holding ground on this.

Thusfar at least, bloaters are fighting to eliminate 10s of milliseconds
of queuing delay. I don't think this list is worrying about the tens of
microseconds difference between the transmission time of a 9000 byte
frame at 1 GbE vs a 1500 byte frame, or the single digit microseconds
difference at 10 GbE. The "lets try to get onto the Top 500 list" crowd
might, but official sanction for a 9000 byte MTU (or larger) doesn't mean
it *must* be used.

> For a short time, servers with gigabit NICs suffered but smarter NICs
> were developed (TSO, LRO, other TLAs) and OSs upgraded to support them
> and I believe it is no longer a significant issue.

Are TSO and LRO going to be sufficient at 40 and 100 GbE? Cores aren't
getting any faster. Only more plentiful. And while it isn't the strongest
point in the world, one might even argue that the need to use TSO/LRO to
achieve performance hinders new transport protocol adoption - the presence
of NIC offloads for only TCP (or UDP) leaves a new transport protocol
(perhaps SCTP) at a disadvantage.

rick jones

> Kevin Gross
>
> On Thu, May 12, 2011 at 10:31 AM, Fred Baker <fred@cisco.com> wrote:
>
> > On May 9, 2011, at 11:06 AM, Rick Jones wrote:
> >
> > > GSO/TSO can be thought of as a symptom of standards bodies (eg the IEEE)
> > > refusing to standardize an increase in frame sizes. Put another way,
> > > they are a "poor man's jumbo frames."
> >
> > I'll agree, but only half; once the packets are transferred on the local
> > wire, any jumbo-ness is lost. GSO/TSO mostly squeezes interframe gaps out
> > of the wire and perhaps limits the amount of work the driver has to do.
> > The real value of an end to end (IP) jumbo frame is that the receiving
> > system experiences less interrupt load - a 9K frame replaces half a dozen
> > 1500 byte frames, and as a result the receiver experiences 1/5 or 1/6 of
> > the interrupts. Given that it has to save state, activate the kernel
> > thread, and at least enqueue and perhaps acknowledge the received message,
> > reducing interrupt load on the receiver makes it far more effective. This
> > has the greatest effect on multi-gigabit file transfers.
> > _______________________________________________
> > Bloat mailing list
> > Bloat@lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/bloat
>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Bloat] Burst Loss
  2011-05-13 14:35                 ` Rick Jones
@ 2011-05-13 14:54                   ` Dave Taht
  2011-05-13 20:03                     ` [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss) Kevin Gross
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Taht @ 2011-05-13 14:54 UTC (permalink / raw)
  To: rick.jones2; +Cc: bloat

[-- Attachment #1: Type: text/plain, Size: 1280 bytes --]

On Fri, May 13, 2011 at 8:35 AM, Rick Jones <rick.jones2@hp.com> wrote:
> On Thu, 2011-05-12 at 23:00 -0600, Kevin Gross wrote:
> > One of the principal reasons jumbo frames have not been standardized
> > is due to latency concerns. I assume this group can appreciate the
> > IEEE holding ground on this.
>
> Thusfar at least, bloaters are fighting to eliminate 10s of milliseconds
> of queuing delay. I don't think this list is worrying about the tens of
> microseconds difference between the transmission time of a 9000 byte
> frame at 1 GbE vs a 1500 byte frame, or the single digit microseconds
> difference at 10 GbE.

Heh. With the first iteration of the bismark project I'm trying to get to
where I have less than 30ms latency under load and have far larger problems
to worry about than jumbo frames. I'll be lucky to manage 1/10th that
(300ms) at this point.

Not, incidentally that I mind the idea of jumbo frames. It seems silly to
be saddled with default frame sizes that made sense in the 70s, and in an
age where we will be seeing ever more packet encapsulation, reducing the
header size as a ratio to data size strikes me as a very worthy goal.

--
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://the-edge.blogspot.com

[-- Attachment #2: Type: text/html, Size: 1699 bytes --]

^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss)
  2011-05-13 14:54                   ` Dave Taht
@ 2011-05-13 20:03                     ` Kevin Gross
  2011-05-14 20:48                       ` Fred Baker
  0 siblings, 1 reply; 8+ messages in thread
From: Kevin Gross @ 2011-05-13 20:03 UTC (permalink / raw)
  To: bloat

[-- Attachment #1: Type: text/plain, Size: 2529 bytes --]

Do we think that bufferbloat is just a WAN problem? I work on live media
applications for LANs and campus networks. I'm seeing what I think could be
characterized as bufferbloat in LAN equipment. The timescales on 1 Gb
Ethernet are orders of magnitude shorter and the performance problems
caused are in many cases a bit different but root cause and potential
solutions are, I'm hoping, very similar.

Keeping the frame byte size small while the frame time has shrunk maintains
the overhead at the same level. Again, this has been a conscious decision
not a stubborn relic. Ethernet improvements have increased bandwidth by
orders of magnitude. Do we really need to increase it by a couple
percentage points more by reducing overhead for large payloads?

The cost of that improved marginal bandwidth efficiency is a 6x increase in
latency. Many applications would not notice an increase from 12 us to 72 us
for a Gigabit switch hop. But on a large network it adds up, some
applications are absolutely that sensitive (transaction processing, cluster
computing, SANs) and (I thought I'd be preaching to the choir here) there's
no way to ever recover the lost performance.

Kevin Gross

From: Dave Taht [mailto:dave.taht@gmail.com]
Sent: Friday, May 13, 2011 8:54 AM
To: rick.jones2@hp.com
Cc: Kevin Gross; bloat@lists.bufferbloat.net
Subject: Re: [Bloat] Burst Loss

On Fri, May 13, 2011 at 8:35 AM, Rick Jones <rick.jones2@hp.com> wrote:
On Thu, 2011-05-12 at 23:00 -0600, Kevin Gross wrote:
> One of the principal reasons jumbo frames have not been standardized
> is due to latency concerns. I assume this group can appreciate the
> IEEE holding ground on this.

Thusfar at least, bloaters are fighting to eliminate 10s of milliseconds
of queuing delay. I don't think this list is worrying about the tens of
microseconds difference between the transmission time of a 9000 byte
frame at 1 GbE vs a 1500 byte frame, or the single digit microseconds
difference at 10 GbE.

Heh. With the first iteration of the bismark project I'm trying to get to
where I have less than 30ms latency under load and have far larger problems
to worry about than jumbo frames. I'll be lucky to manage 1/10th that
(300ms) at this point.

Not, incidentally that I mind the idea of jumbo frames. It seems silly to
be saddled with default frame sizes that made sense in the 70s, and in an
age where we will be seeing ever more packet encapsulation, reducing the
header size as a ratio to data size strikes me as a very worthy goal.

[-- Attachment #2: Type: text/html, Size: 8491 bytes --]

^ permalink raw reply [flat|nested] 8+ messages in thread
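
The 12 us and 72 us figures quoted above are simply the store-and-forward
serialization delay of a standard versus a jumbo frame at 1 Gbit/s:

    for frame_bytes in (1500, 9000):        # standard vs. jumbo frame (preamble/IFG ignored)
        delay_us = frame_bytes * 8 / 1e9 * 1e6
        print(f"{frame_bytes} byte frame at 1 Gbit/s -> {delay_us:.0f} us per store-and-forward hop")
    # 1500 B -> 12 us, 9000 B -> 72 us: the 6x increase per switch hop
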
* Re: [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss)
  2011-05-13 20:03                     ` [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss) Kevin Gross
@ 2011-05-14 20:48                       ` Fred Baker
  2011-05-15 18:28                         ` Jonathan Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Fred Baker @ 2011-05-14 20:48 UTC (permalink / raw)
  To: Kevin Gross; +Cc: bloat

[-- Attachment #1: Type: text/plain, Size: 5084 bytes --]

On May 13, 2011, at 1:03 PM, Kevin Gross wrote:

> Do we think that bufferbloat is just a WAN problem? I work on live media
> applications for LANs and campus networks. I'm seeing what I think could
> be characterized as bufferbloat in LAN equipment. The timescales on 1 Gb
> Ethernet are orders of magnitude shorter and the performance problems
> caused are in many cases a bit different but root cause and potential
> solutions are, I'm hoping, very similar.

Bufferbloat is most noticeable on WANs, because they have longer delays,
but yes LAN equipment does the same thing. It shows up as extended delay or
as an increase in loss rates. A lot of LAN equipment has very shallow
buffers due to cost (LAN markets are very cost-sensitive). One myth with
bufferbloat is that a reasonable solution is to make the buffer shallow;
no, because when the queue fills you now have an increased loss rate, which
shows up in timeout-driven retransmissions - you really want a deep buffer
(for bursts and temporary surges) that you keep shallow using AQM
techniques.

> Keeping the frame byte size small while the frame time has shrunk
> maintains the overhead at the same level. Again, this has been a conscious
> decision not a stubborn relic. Ethernet improvements have increased
> bandwidth by orders of magnitude. Do we really need to increase it by a
> couple percentage points more by reducing overhead for large payloads?

You might talk with folks who do the LAN Speed records. They generally view
end to end jumboframes as material to the achievement. It's not about
changing the serialization delay, it's about changing the amount of
processing at the endpoints.

> The cost of that improved marginal bandwidth efficiency is a 6x increase
> in latency. Many applications would not notice an increase from 12 us to
> 72 us for a Gigabit switch hop. But on a large network it adds up, some
> applications are absolutely that sensitive (transaction processing,
> cluster computing, SANs) and (I thought I'd be preaching to the choir
> here) there's no way to ever recover the lost performance.

Well, the extra delay is solvable in the transport. The question isn't
really what the impact on the network is; it's what the requirements of the
application are. For voice, if a voice sample is delayed 50 ms the jitter
buffer in the codec resolves that - microseconds are irrelevant. Video
codecs generally keep at least three video frames in their jitter buffer;
at 30 fps, that's 100 milliseconds of acceptable variation in delay.

Where it gets dicey is in elastic applications (applications using
transports with the characteristics of TCP) that are retransmitting or
otherwise reacting in timeframes comparable to the RTT and the RTT is
small, or in elastic applications in which the timeout-retransmission
interval is on the order of hundreds of milliseconds to seconds (true of
most TCPs) but the RTT is on the order of microseconds to milliseconds. In
the former, a deep queue buildup can trigger a transmission that further
builds the queue; in the latter, a hiccup can have dramatic side effects.

There is ongoing research on how best to do such things in data centers. My
suspicion is that the right approach is something akin to 802.2 at the link
layer, but with NACK retransmission - system A enumerates the data it sends
to system B, and if system B sees a number skip it asks A to retransmit the
indicated datagram. You might take a look at RFC 5401/5740/5776 for
implementation suggestions.

> Kevin Gross
>
> From: Dave Taht [mailto:dave.taht@gmail.com]
> Sent: Friday, May 13, 2011 8:54 AM
> To: rick.jones2@hp.com
> Cc: Kevin Gross; bloat@lists.bufferbloat.net
> Subject: Re: [Bloat] Burst Loss
>
> On Fri, May 13, 2011 at 8:35 AM, Rick Jones <rick.jones2@hp.com> wrote:
> On Thu, 2011-05-12 at 23:00 -0600, Kevin Gross wrote:
> > One of the principal reasons jumbo frames have not been standardized
> > is due to latency concerns. I assume this group can appreciate the
> > IEEE holding ground on this.
>
> Thusfar at least, bloaters are fighting to eliminate 10s of milliseconds
> of queuing delay. I don't think this list is worrying about the tens of
> microseconds difference between the transmission time of a 9000 byte
> frame at 1 GbE vs a 1500 byte frame, or the single digit microseconds
> difference at 10 GbE.
>
> Heh. With the first iteration of the bismark project I'm trying to get to
> where I have less than 30ms latency under load and have far larger
> problems to worry about than jumbo frames. I'll be lucky to manage 1/10th
> that (300ms) at this point.
>
> Not, incidentally that I mind the idea of jumbo frames. It seems silly to
> be saddled with default frame sizes that made sense in the 70s, and in an
> age where we will be seeing ever more packet encapsulation, reducing the
> header size as a ratio to data size strikes me as a very worthy goal.
>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat

[-- Attachment #2: Type: text/html, Size: 15595 bytes --]

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss)
  2011-05-14 20:48 ` Fred Baker
@ 2011-05-15 18:28   ` Jonathan Morton
  2011-05-15 20:49     ` Fred Baker
  0 siblings, 1 reply; 8+ messages in thread
From: Jonathan Morton @ 2011-05-15 18:28 UTC (permalink / raw)
To: Fred Baker; +Cc: bloat

On 14 May, 2011, at 11:48 pm, Fred Baker wrote:

> My suspicion is that the right approach is something akin to 802.2 at the
> link layer, but with NACK retransmission - system A enumerates the data it
> sends to system B, and if system B sees a number skip it asks A to
> retransmit the indicated datagram. You might take a look at RFC
> 5401/5740/5776 for implementation suggestions.

This sounds like "reliable datagram" semantics to me. It also sounds a lot
like ARQ as used in amateur packet radio. I believe similar mechanisms are
built into 802.11.

The fundamental thing is that the sender must be able to know when sent frames
can be flushed from the buffer because they don't need to be retransmitted. So
if there's a NACK, there must also be an ACK - at which point the ACK serves
the purpose of the NACK, as it does in TCP. The only alternative is a
wall-time TTL, which is doable on single hops but requires careful design.

Let's face it. UDP is unreliable by design - applications using it *must*
anticipate and cope with dropped and delayed packets, either by exponential
RTO or ARQ or NACK or FEC, all at the application layer. And, in a congested
network, some UDP packets *will* be lost.

TCP is reliable but needs to maintain appropriate window sizes - which it
doesn't at present, because a lossless network without ECN provides
insufficient feedback (and AQM, which is required for good ECN signals, is
usually absent), and in the quest for performance the trend has been
inexorably towards more aggressive window sizing (of which TCP-Fit is the
latest example). At the receiver end, it is possible to restrain this trend by
reducing the receive window.

Unfortunately, it's useless to expect Ethernet switches to turn on ECN. They
operate at a lower layer of the stack than IP, so they will not modify the IP
header's ECN bits. However, recent versions of Ethernet *do* support a
throttling feedback mechanism, and this can and should be exploited to tell
the edge host or router that ECN *might* be needed. Also, with throttling
feedback throughout the LAN, the Ethernet can for practical purposes be
treated as almost-reliable. This is *better* in terms of packet loss than ARQ
or NACK, although if the Ethernet's buffers are large, it will still increase
delay. (With small buffers, it will just decrease throughput to the capacity,
which is fine.)

 - Jonathan
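One way to act on Jonathan's point about restraining the receive window is to
cap the socket receive buffer, which bounds the window TCP can advertise. A
minimal sketch, assuming a Linux-like sockets API (SO_RCVBUF itself is
standard, but the effective cap and the 64 KiB figure here are illustrative
assumptions, not recommendations):

# Minimal sketch: cap the TCP receive buffer so the advertised window -
# and thus the sender's in-flight data - stays bounded.

import socket

def make_restrained_listener(port, rcvbuf=64 * 1024):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Must be set before listen()/accept() so it applies to the window
    # scaling negotiated for accepted connections.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    s.bind(("", port))
    s.listen(5)
    return s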
* Re: [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss)
  2011-05-15 18:28 ` Jonathan Morton
@ 2011-05-15 20:49   ` Fred Baker
  2011-05-16  0:31     ` Jonathan Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Fred Baker @ 2011-05-15 20:49 UTC (permalink / raw)
To: Jonathan Morton; +Cc: bloat

On May 15, 2011, at 11:28 AM, Jonathan Morton wrote:

> The fundamental thing is that the sender must be able to know when sent
> frames can be flushed from the buffer because they don't need to be
> retransmitted. So if there's a NACK, there must also be an ACK - at which
> point the ACK serves the purpose of the NACK, as it does in TCP. The only
> alternative is a wall-time TTL, which is doable on single hops but requires
> careful design.

To a point. NORM holds a frame for possible retransmission for a stated period
of time, and if retransmission isn't requested in that interval, forgets it.
So the ack isn't actually necessary; what is necessary is that the retention
interval be long enough that a nack has a high probability of succeeding in
getting the message through. A 100 Gbit interface can handle 97656 128-byte
frames per millisecond (100G / (8 * 128 * 1000)). We're looking at something
on the order of 18 bits (4 ms to retransmit without falling back to TCP) for a
rational sequence number at 100 Gbps; 16 bits would be enough at 10 Gbps, and
12 bits would be enough at 1 Gbps.

> ...recent versions of Ethernet *do* support a throttling feedback mechanism,
> and this can and should be exploited to tell the edge host or router that
> ECN *might* be needed. Also, with throttling feedback throughout the LAN,
> the Ethernet can for practical purposes be treated as almost-reliable. This
> is *better* in terms of packet loss than ARQ or NACK, although if the
> Ethernet's buffers are large, it will still increase delay. (With small
> buffers, it will just decrease throughput to the capacity, which is fine.)

It increases the delay anyway. It just pushes the retention buffer to another
place. What do you think the packet is doing during the "don't transmit"
interval?

Throughput never exceeds capacity. If I have a 10 Gbps link, I will never get
more than 10 Gbps through it. Buffer fill rate is statistically predictable.
With small buffers, the fill rate reaches the top sooner. They increase the
probability that the buffers are full, which is to say the drop probability.
Which puts us back to an end-to-end retransmission, which is the worst case of
what you were worried about.

I'm not going to argue against letting retransmission go end to end; it's an
endless debate. I'll simply note that several link layers, including but not
limited to those you mention, find that applications using them work better if
there is a high probability of retransmission in an interval on the order of
the link RTT as opposed to the end-to-end RTT. You brought up data centers
(aka variable delays in LAN networks); those have been heavily the province of
Fibre Channel, which is a link layer protocol with retransmission. Think about
it.
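As a check on those figures, here is the arithmetic spelled out - a rough
worked example assuming 128-byte frames and a 4 ms retention window, as in the
paragraph above (note that the 100 Gbps case strictly rounds up to 19 bits;
"on the order of 18 bits" is the right ballpark):

# Rough worked example of the sequence-number sizing above, assuming
# 128-byte frames and a 4 ms link-layer retransmit window.

import math

FRAME_BITS = 128 * 8          # 128-byte frames
RETENTION_S = 0.004           # 4 ms to request a link-layer retransmit

for rate_bps in (1e9, 10e9, 100e9):
    frames_per_ms = rate_bps / FRAME_BITS / 1000
    outstanding = rate_bps / FRAME_BITS * RETENTION_S
    bits = math.ceil(math.log2(outstanding))
    print(f"{rate_bps/1e9:>5.0f} Gbps: {frames_per_ms:>8.0f} frames/ms, "
          f"{outstanding:>8.0f} frames in 4 ms -> {bits} sequence bits")

#   1 Gbps:   ~977 frames/ms,   ~3906 outstanding -> 12 bits
#  10 Gbps:  ~9766 frames/ms,  ~39063 outstanding -> 16 bits
# 100 Gbps: ~97656 frames/ms, ~390625 outstanding -> 19 bits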
* Re: [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss)
  2011-05-15 20:49 ` Fred Baker
@ 2011-05-16  0:31   ` Jonathan Morton
  2011-05-16  7:51     ` Richard Scheffenegger
  0 siblings, 1 reply; 8+ messages in thread
From: Jonathan Morton @ 2011-05-16 0:31 UTC (permalink / raw)
To: Fred Baker; +Cc: bloat

On 15 May, 2011, at 11:49 pm, Fred Baker wrote:

> On May 15, 2011, at 11:28 AM, Jonathan Morton wrote:
>> The fundamental thing is that the sender must be able to know when sent
>> frames can be flushed from the buffer because they don't need to be
>> retransmitted. So if there's a NACK, there must also be an ACK - at which
>> point the ACK serves the purpose of the NACK, as it does in TCP. The only
>> alternative is a wall-time TTL, which is doable on single hops but requires
>> careful design.
>
> To a point. NORM holds a frame for possible retransmission for a stated
> period of time, and if retransmission isn't requested in that interval,
> forgets it. So the ack isn't actually necessary; what is necessary is that
> the retention interval be long enough that a nack has a high probability of
> succeeding in getting the message through.

Okay, so because it can fall back to TCP's retransmit, the retention
requirements can be relaxed.

>> ...recent versions of Ethernet *do* support a throttling feedback
>> mechanism, and this can and should be exploited to tell the edge host or
>> router that ECN *might* be needed. Also, with throttling feedback
>> throughout the LAN, the Ethernet can for practical purposes be treated as
>> almost-reliable. This is *better* in terms of packet loss than ARQ or
>> NACK, although if the Ethernet's buffers are large, it will still increase
>> delay. (With small buffers, it will just decrease throughput to the
>> capacity, which is fine.)
>
> It increases the delay anyway. It just pushes the retention buffer to
> another place. What do you think the packet is doing during the "don't
> transmit" interval?

Most packets delayed by Ethernet throttling would, with small buffers, end up
waiting in the sending host (or router). They thus spend more time in a
potentially active queue instead of in a dumb one. But even if the host queue
is dumb, the overall delay is no worse than with the larger Ethernet buffers.

> Throughput never exceeds capacity. If I have a 10 Gbps link, I will never
> get more than 10 Gbps through it. Buffer fill rate is statistically
> predictable. With small buffers, the fill rate reaches the top sooner. They
> increase the probability that the buffers are full, which is to say the
> drop probability. Which puts us back to an end-to-end retransmission, which
> is the worst case of what you were worried about.

Let's suppose someone has generously provisioned an office with GigE
throughout, using a two-level hierarchy of switches. Some dumb schmuck then
schedules every single computer to run its backups (to a single fileserver) at
the same time. That's, say, 100 computers all competing for one GigE link to
the fileserver. If the switches are fair, each computer should get 10 Mbps -
that's the capacity.

With throttling, each computer sees the link closed 99% of the time. It can
send at link rate for the remaining 1% of the time. On medium timescales, that
looks like a 10 Mbps bottleneck at the first link. So the throughput on that
link equals the capacity, and hopefully the goodput is also thus. The only
queue that is likely to overflow is the one on the sending computer, and one
would hope there is enough feedback in a host's own TCP/IP stack to prevent
that.
Without throttling but with ARQ, NACK or whatever you want to call it, the
host has no signal to tell it to slow down - so the throughput on the edge
link is more than 10 Mbps (but the goodput will be less). The buffer in the
outer switch fills up - no matter how big or small it is - and starts dropping
packets. The switch then won't ask for retransmission of packets it's just
dropped, because it has nowhere to put them. The same process then repeats at
the inner switch. Finally, the server sees the missing packets and asks for
the retransmission - but these requests have to be switched all the way back
to the clients, because the missing packets aren't in the switches' buffers.
It's therefore no better than a TCP SACK retransmission.

So there you have a classic congested-network scenario in which throttling
solves the problem, but link-level retransmission can't.

Where ARQ and/or NACK come in handy is where the link itself is unreliable,
such as on WLANs (hence the use in amateur radio) and last-mile links. In that
case, the reason for the packet loss is not a full receive buffer, so asking
for a retransmission is not inherently self-defeating.

> I'm not going to argue against letting retransmission go end to end; it's
> an endless debate. I'll simply note that several link layers, including but
> not limited to those you mention, find that applications using them work
> better if there is a high probability of retransmission in an interval on
> the order of the link RTT as opposed to the end-to-end RTT. You brought up
> data centers (aka variable delays in LAN networks); those have been heavily
> the province of Fibre Channel, which is a link layer protocol with
> retransmission. Think about it.

What I'd like to see is a complete absence of need for retransmission on a
properly built wired network. Obviously the capability still needs to be there
to cope with the parts that aren't properly built or aren't wired, but TCP can
do that. Throttling (in the form of Ethernet PAUSE) is simply the third
possible method of signalling congestion in the network, alongside delay and
loss - and it happens to be quite widely deployed already.

 - Jonathan
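For a sense of the timescales PAUSE can impose: an 802.3x PAUSE frame
expresses its pause time in quanta of 512 bit times, so the quantum and the
maximum single pause depend on link speed. A quick worked calculation
(illustrative only):

# 802.3x PAUSE arithmetic: pause_time is a 16-bit count of 512-bit-time
# quanta, so the maximum single pause shrinks as the link gets faster.

def pause_quantum_seconds(link_bps):
    return 512 / link_bps            # one quantum = 512 bit times

for name, bps in (("100 Mbps", 100e6), ("1 GbE", 1e9), ("10 GbE", 10e9)):
    q = pause_quantum_seconds(bps)
    print(f"{name}: quantum = {q*1e9:.0f} ns, "
          f"max pause (0xFFFF quanta) = {q * 0xFFFF * 1e3:.2f} ms")

# 100 Mbps: quantum = 5120 ns, max pause ~ 335.5 ms
# 1 GbE:    quantum =  512 ns, max pause ~  33.6 ms
# 10 GbE:   quantum =   51 ns, max pause ~   3.4 ms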
* Re: [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss)
  2011-05-16  0:31 ` Jonathan Morton
@ 2011-05-16  7:51   ` Richard Scheffenegger
  2011-05-16  9:49     ` Fred Baker
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Scheffenegger @ 2011-05-16 7:51 UTC (permalink / raw)
To: Jonathan Morton, Fred Baker; +Cc: bloat

Jonathan,

> What I'd like to see is a complete absence of need for retransmission on a
> properly built wired network. Obviously the capability still needs to be
> there to cope with the parts that aren't properly built or aren't wired,
> but TCP can do that. Throttling (in the form of Ethernet PAUSE) is simply
> the third possible method of signalling congestion in the network,
> alongside delay and loss - and it happens to be quite widely deployed
> already.

Two comments:

TCP currently can NOT deal properly with non-congestion loss (in other words,
any loss will lead to a congestion control reaction - a reduction of the
sending rate). TCP can only (mostly) deal with the recovery part in a
hopefully timely fashion. In this area you'll find a high number of possible
approaches, none of which is quite backwards-compatible with "standard" TCP.

Second, you wouldn't want to deploy basic 802.3x to any network consisting of
more than a single switch. If you do, you can run into an effect called
congestion tree formation, where (simplified) the slowest receiver determines
the global speed of your Ethernet network. 802.1Qbb is also prone to
congestion trees, even though the probability is somewhat reduced provided all
priority classes are being used. Unfortunately, most traffic is in the same
802.1p class...

Adequate solutions (more complex than the FCP buffer-credit based congestion
avoidance), like 802.1Qau / QCN, are not available commercially afaik. (They
need new NICs and new switches for the HW support.)

But I agree, an L3 device should be able to distribute L2 congestion
information into the L3 header (even though today, cheap generic Broadcom and
perhaps even Realtek chipsets support ECN marking even when they are running
as an L2 switch; a special firmware (see the DCTCP papers) is required
though).

Best regards,
   Richard
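For reference, the sender-side reaction described in the DCTCP papers Richard
mentions scales the window back in proportion to the fraction of ECN-marked
packets, rather than halving on any single mark. A minimal sketch of that
update rule (illustrative only, not a full implementation; the gain value
follows the published experiments):

# Minimal sketch of the DCTCP sender reaction: estimate the fraction of
# ECN-marked packets per window (alpha) and cut cwnd in proportion to it,
# instead of TCP's blanket halving on any congestion signal.

class DctcpState:
    def __init__(self, g=1.0 / 16):
        self.g = g          # EWMA gain, as used in the DCTCP paper
        self.alpha = 0.0    # running estimate of the marking fraction

    def on_window_of_acks(self, acked_packets, marked_packets, cwnd):
        frac = marked_packets / max(acked_packets, 1)
        self.alpha = (1 - self.g) * self.alpha + self.g * frac
        if marked_packets:
            cwnd = max(cwnd * (1 - self.alpha / 2), 1.0)   # proportional backoff
        return cwnd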
* Re: [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss)
  2011-05-16  7:51 ` Richard Scheffenegger
@ 2011-05-16  9:49   ` Fred Baker
  2011-05-16 11:23     ` [Bloat] Jumbo frames and LAN buffers Jim Gettys
  0 siblings, 1 reply; 8+ messages in thread
From: Fred Baker @ 2011-05-16 9:49 UTC (permalink / raw)
To: Richard Scheffenegger; +Cc: bloat

On May 16, 2011, at 9:51 AM, Richard Scheffenegger wrote:

> Second, you wouldn't want to deploy basic 802.3x to any network consisting
> of more than a single switch.

actually, it's pretty common practice. Three layers, even. People build
backbones, and then ring them with workgroup switches, and then put small
switches on their desks.
* Re: [Bloat] Jumbo frames and LAN buffers
  2011-05-16  9:49 ` Fred Baker
@ 2011-05-16 11:23   ` Jim Gettys
  2011-05-16 13:15     ` Kevin Gross
  0 siblings, 1 reply; 8+ messages in thread
From: Jim Gettys @ 2011-05-16 11:23 UTC (permalink / raw)
To: bloat

On 05/16/2011 05:49 AM, Fred Baker wrote:
> On May 16, 2011, at 9:51 AM, Richard Scheffenegger wrote:
>
>> Second, you wouldn't want to deploy basic 802.3x to any network consisting
>> of more than a single switch.
> actually, it's pretty common practice. Three layers, even. People build
> backbones, and then ring them with workgroup switches, and then put small
> switches on their desks.
>
Not necessarily out of knowledge or desire (since it isn't usually
controllable in the small switches you buy for home). It can cause trouble
even in small environments such as your house.

http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html

I know I'm at least three consumer switches deep, and it's not by choice.
 - Jim
* Re: [Bloat] Jumbo frames and LAN buffers
  2011-05-16 11:23 ` [Bloat] Jumbo frames and LAN buffers Jim Gettys
@ 2011-05-16 13:15   ` Kevin Gross
  2011-05-16 13:22     ` Jim Gettys
  2011-05-16 18:36     ` Richard Scheffenegger
  0 siblings, 2 replies; 8+ messages in thread
From: Kevin Gross @ 2011-05-16 13:15 UTC (permalink / raw)
To: bloat

All the stand-alone switches I've looked at recently either do not support
802.3x or support it in the (desirable) manner described in the last paragraph
of the linked blog post. I don't believe Ethernet flow control is a factor in
current LANs. I'd be interested to know the specifics if anyone sees it
differently.

My understanding is that 802.1Qau, "lossless Ethernet", was designed primarily
to allow Fibre Channel to be carried over 10 GbE so that SAN and LAN can share
a common infrastructure in datacenters. I don't believe anyone intends for it
to be enabled for traffic classes carrying TCP.

Kevin Gross

-----Original Message-----
From: bloat-bounces@lists.bufferbloat.net
[mailto:bloat-bounces@lists.bufferbloat.net] On Behalf Of Jim Gettys
Sent: Monday, May 16, 2011 5:24 AM
To: bloat@lists.bufferbloat.net
Subject: Re: [Bloat] Jumbo frames and LAN buffers

Not necessarily out of knowledge or desire (since it isn't usually
controllable in the small switches you buy for home). It can cause trouble
even in small environments such as your house.

http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html

I know I'm at least three consumer switches deep, and it's not by choice.
 - Jim
* Re: [Bloat] Jumbo frames and LAN buffers
  2011-05-16 13:15 ` Kevin Gross
@ 2011-05-16 13:22   ` Jim Gettys
  2011-05-16 13:42     ` Kevin Gross
       [not found]     ` <-854731558634984958@unknownmsgid>
  1 sibling, 2 replies; 8+ messages in thread
From: Jim Gettys @ 2011-05-16 13:22 UTC (permalink / raw)
To: bloat

On 05/16/2011 09:15 AM, Kevin Gross wrote:
> All the stand-alone switches I've looked at recently either do not support
> 802.3x or support it in the (desirable) manner described in the last
> paragraph of the linked blog post. I don't believe Ethernet flow control is
> a factor in current LANs. I'd be interested to know the specifics if anyone
> sees it differently.

Heh. Plug wireshark into current off-the-shelf cheap consumer switches
intended for the home. You won't like what you see. And you have no way to
manage them. I was quite surprised last fall when doing my home experiments to
see 802.3x pause frames; I had been blissfully unaware of their existence, and
had to go read up on them as a result.

I don't think any of the enterprise switches are so brain damaged. So I
suspect it's mostly lurking to cause trouble in home and small office
environments, exactly where no-one will know what's going on.
 - Jim

> My understanding is that 802.1Qau, "lossless Ethernet", was designed
> primarily to allow Fibre Channel to be carried over 10 GbE so that SAN and
> LAN can share a common infrastructure in datacenters. I don't believe
> anyone intends for it to be enabled for traffic classes carrying TCP.
>
> Kevin Gross
>
> -----Original Message-----
> From: bloat-bounces@lists.bufferbloat.net
> [mailto:bloat-bounces@lists.bufferbloat.net] On Behalf Of Jim Gettys
> Sent: Monday, May 16, 2011 5:24 AM
> To: bloat@lists.bufferbloat.net
> Subject: Re: [Bloat] Jumbo frames and LAN buffers
>
> Not necessarily out of knowledge or desire (since it isn't usually
> controllable in the small switches you buy for home). It can cause trouble
> even in small environments such as your house.
>
> http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html
>
> I know I'm at least three consumer switches deep, and it's not by choice.
> - Jim
>
>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat
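Watching for these frames, as Jim suggests, works with any capture tool that
can see a mirrored or inline port. As one hedged illustration, a small Python
sketch (assuming the scapy package is installed; the interface name is a
placeholder) that flags frames sent to the reserved MAC-control multicast
address used for PAUSE:

# Hypothetical helper to spot 802.3x PAUSE frames: they are addressed to
# the reserved multicast 01:80:c2:00:00:01 with EtherType 0x8808.

from scapy.all import sniff

def report(pkt):
    print(f"PAUSE-class frame from {pkt.src}, {len(pkt)} bytes")

# Requires root privileges; "eth0" is an illustrative interface name.
sniff(iface="eth0", filter="ether dst 01:80:c2:00:00:01",
      prn=report, store=False)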
* Re: [Bloat] Jumbo frames and LAN buffers
  2011-05-16 13:22 ` Jim Gettys
@ 2011-05-16 13:42   ` Kevin Gross
  2011-05-16 15:23     ` Jim Gettys
  [not found]   ` <-854731558634984958@unknownmsgid>
  1 sibling, 1 reply; 8+ messages in thread
From: Kevin Gross @ 2011-05-16 13:42 UTC (permalink / raw)
To: bloat

I would like to try this. Can you suggest specific equipment to look at? Due
to integration and low port count, most of the cheap consumer stuff has
surprisingly good layer-2 performance. I've tested a bunch of Linksys and
other small/medium business 5 to 24 port gigabit switches. Since I measure
latency, I expect I would have noticed if flow control were kicking in.

Kevin Gross

-----Original Message-----
From: bloat-bounces@lists.bufferbloat.net
[mailto:bloat-bounces@lists.bufferbloat.net] On Behalf Of Jim Gettys
Sent: Monday, May 16, 2011 7:23 AM
To: bloat@lists.bufferbloat.net
Subject: Re: [Bloat] Jumbo frames and LAN buffers

On 05/16/2011 09:15 AM, Kevin Gross wrote:
> All the stand-alone switches I've looked at recently either do not support
> 802.3x or support it in the (desirable) manner described in the last
> paragraph of the linked blog post. I don't believe Ethernet flow control is
> a factor in current LANs. I'd be interested to know the specifics if anyone
> sees it differently.

Heh. Plug wireshark into current off-the-shelf cheap consumer switches
intended for the home. You won't like what you see. And you have no way to
manage them. I was quite surprised last fall when doing my home experiments to
see 802.3x pause frames; I had been blissfully unaware of their existence, and
had to go read up on them as a result.

I don't think any of the enterprise switches are so brain damaged. So I
suspect it's mostly lurking to cause trouble in home and small office
environments, exactly where no-one will know what's going on.
 - Jim
* Re: [Bloat] Jumbo frames and LAN buffers
  2011-05-16 13:42 ` Kevin Gross
@ 2011-05-16 15:23   ` Jim Gettys
  0 siblings, 0 replies; 8+ messages in thread
From: Jim Gettys @ 2011-05-16 15:23 UTC (permalink / raw)
To: bloat

On 05/16/2011 09:42 AM, Kevin Gross wrote:
> I would like to try this. Can you suggest specific equipment to look at?
> Due to integration and low port count, most of the cheap consumer stuff has
> surprisingly good layer-2 performance. I've tested a bunch of Linksys and
> other small/medium business 5 to 24 port gigabit switches. Since I measure
> latency, I expect I would have noticed if flow control were kicking in.

I think I was using a D-Link DGS2208 (8 port consumer switch). I then went and
looked at the spec sheets of some of the other consumer kit out there and
found they all had the "feature" of 802.3x flow control.

I may have been using iperf to tickle it, rather than ssh. I was also playing
around with an old 100 Mbps switch, as documented in my blog; I don't remember
if I saw it there.
 - Jim

> Kevin Gross
>
> -----Original Message-----
> From: bloat-bounces@lists.bufferbloat.net
> [mailto:bloat-bounces@lists.bufferbloat.net] On Behalf Of Jim Gettys
> Sent: Monday, May 16, 2011 7:23 AM
> To: bloat@lists.bufferbloat.net
> Subject: Re: [Bloat] Jumbo frames and LAN buffers
>
> On 05/16/2011 09:15 AM, Kevin Gross wrote:
>> All the stand-alone switches I've looked at recently either do not support
>> 802.3x or support it in the (desirable) manner described in the last
>> paragraph of the linked blog post. I don't believe Ethernet flow control
>> is a factor in current LANs. I'd be interested to know the specifics if
>> anyone sees it differently.
>
> Heh. Plug wireshark into current off-the-shelf cheap consumer switches
> intended for the home. You won't like what you see. And you have no way to
> manage them. I was quite surprised last fall when doing my home experiments
> to see 802.3x pause frames; I had been blissfully unaware of their
> existence, and had to go read up on them as a result.
>
> I don't think any of the enterprise switches are so brain damaged. So I
> suspect it's mostly lurking to cause trouble in home and small office
> environments, exactly where no-one will know what's going on.
>  - Jim
>
>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat
[parent not found: <-854731558634984958@unknownmsgid>]
* Re: [Bloat] Jumbo frames and LAN buffers
  [not found] ` <-854731558634984958@unknownmsgid>
@ 2011-05-16 13:45   ` Dave Taht
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Taht @ 2011-05-16 13:45 UTC (permalink / raw)
To: Kevin Gross; +Cc: bloat

On Mon, May 16, 2011 at 7:42 AM, Kevin Gross <kevin.gross@avanw.com> wrote:

> I would like to try this. Can you suggest specific equipment to look at?
> Due to integration and low port count, most of the cheap consumer stuff has
> surprisingly good layer-2 performance. I've tested a bunch of Linksys and
> other small/medium business 5 to 24 port gigabit switches. Since I measure
> latency, I expect I would have noticed if flow control were kicking in.

I would certainly appreciate more people looking at the switch in the
wndr3700v2 we're using on the bismark project. I'm seeing some pretty deep
buffering on it.

> Kevin Gross
>
> -----Original Message-----
> From: bloat-bounces@lists.bufferbloat.net
> [mailto:bloat-bounces@lists.bufferbloat.net] On Behalf Of Jim Gettys
> Sent: Monday, May 16, 2011 7:23 AM
> To: bloat@lists.bufferbloat.net
> Subject: Re: [Bloat] Jumbo frames and LAN buffers
>
> On 05/16/2011 09:15 AM, Kevin Gross wrote:
>> All the stand-alone switches I've looked at recently either do not support
>> 802.3x or support it in the (desirable) manner described in the last
>> paragraph of the linked blog post. I don't believe Ethernet flow control
>> is a factor in current LANs. I'd be interested to know the specifics if
>> anyone sees it differently.
>
> Heh. Plug wireshark into current off-the-shelf cheap consumer switches
> intended for the home. You won't like what you see. And you have no way to
> manage them. I was quite surprised last fall when doing my home experiments
> to see 802.3x pause frames; I had been blissfully unaware of their
> existence, and had to go read up on them as a result.
>
> I don't think any of the enterprise switches are so brain damaged. So I
> suspect it's mostly lurking to cause trouble in home and small office
> environments, exactly where no-one will know what's going on.
>  - Jim
>
>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat

-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://the-edge.blogspot.com
* Re: [Bloat] Jumbo frames and LAN buffers
  2011-05-16 13:15 ` Kevin Gross
  2011-05-16 13:22   ` Jim Gettys
@ 2011-05-16 18:36   ` Richard Scheffenegger
  1 sibling, 0 replies; 8+ messages in thread
From: Richard Scheffenegger @ 2011-05-16 18:36 UTC (permalink / raw)
To: Kevin Gross, bloat

Kevin,

> My understanding is that 802.1Qau, "lossless Ethernet", was designed
> primarily to allow Fibre Channel to be carried over 10 GbE so that SAN and
> LAN can share a common infrastructure in datacenters. I don't believe
> anyone intends for it to be enabled for traffic classes carrying TCP.

Well, QCN requires L2 MAC sender, network and receiver cooperation (thus you
need fancy "CNA" converged network adapters to start using it - these would be
reaction/reflection points; plus the congestion points - switches - would need
HW support too; nothing one can buy today; higher-grade (carrier?) switches
may have the reaction/reflection points built into them, and could use legacy
802.3x signalling outside the 802.1Qau cloud).

The following may be too simplistic:

Once the hardware has reaction point support, it classifies traffic and
calculates the per-flow congestion of the path (with "flow" really being the
classification rules of the sender); the intermediates / receiver sample the
flow and return the congestion back to the sender - and within the sender, a
token bucket-like rate limiter will adjust the sending rate of the appropriate
flow(s) to adapt to the observed network conditions.

http://www.stanford.edu/~balaji/presentations/au-prabhakar-qcn-description.pdf
http://www.ieee802.org/1/files/public/docs2007/au-pan-qcn-details-053007.pdf

The congestion control loop has a lot of similarities to TCP CC, as you will
note...

Also, I haven't found out how fine-grained the classification is supposed to
be (per L2 address pair? Group of flows? Which hashing then to use for mapping
L2 flows into those groups between reaction/congestion/reflection points...).

Anyway, for the here and now, this is pretty much esoteric stuff not relevant
in this context :)

Best regards,
   Richard

----- Original Message -----
From: "Kevin Gross" <kevin.gross@avanw.com>
To: <bloat@lists.bufferbloat.net>
Sent: Monday, May 16, 2011 3:15 PM
Subject: Re: [Bloat] Jumbo frames and LAN buffers

> All the stand-alone switches I've looked at recently either do not support
> 802.3x or support it in the (desirable) manner described in the last
> paragraph of the linked blog post. I don't believe Ethernet flow control is
> a factor in current LANs. I'd be interested to know the specifics if anyone
> sees it differently.
>
> My understanding is that 802.1Qau, "lossless Ethernet", was designed
> primarily to allow Fibre Channel to be carried over 10 GbE so that SAN and
> LAN can share a common infrastructure in datacenters. I don't believe
> anyone intends for it to be enabled for traffic classes carrying TCP.
>
> Kevin Gross
>
> -----Original Message-----
> From: bloat-bounces@lists.bufferbloat.net
> [mailto:bloat-bounces@lists.bufferbloat.net] On Behalf Of Jim Gettys
> Sent: Monday, May 16, 2011 5:24 AM
> To: bloat@lists.bufferbloat.net
> Subject: Re: [Bloat] Jumbo frames and LAN buffers
>
> Not necessarily out of knowledge or desire (since it isn't usually
> controllable in the small switches you buy for home). It can cause trouble
> even in small environments such as your house.
>
> http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html
>
> I know I'm at least three consumer switches deep, and it's not by choice.
> - Jim
>
>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat
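To give a flavour of the reaction-point behaviour Richard describes (a rate
limiter driven by quantized congestion feedback), here is a highly simplified
Python sketch based on the public QCN presentations linked above; the gains,
units, and structure are illustrative assumptions, not the 802.1Qau
specification:

# Very simplified sketch of a QCN-style reaction point: cut the current
# rate in proportion to the feedback value Fb from a congestion point,
# remember the pre-cut rate as a target, and probe back toward it
# ("fast recovery") while no further congestion messages arrive.

class QcnReactionPoint:
    def __init__(self, line_rate_bps, gd=1.0 / 128):
        self.gd = gd                        # decrease gain (illustrative)
        self.target_rate = line_rate_bps    # rate before the last cut
        self.current_rate = line_rate_bps
        self.min_rate = 1e6                 # 1 Mbps floor (illustrative)

    def on_congestion_message(self, fb):
        # fb: quantized congestion feedback from a congestion point (> 0 = congested)
        self.target_rate = self.current_rate
        cut = min(self.gd * fb, 0.5)        # never cut by more than half
        self.current_rate = max(self.current_rate * (1 - cut), self.min_rate)

    def on_recovery_cycle(self):
        # Called periodically when no congestion messages arrive.
        self.current_rate = (self.current_rate + self.target_rate) / 2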
end of thread, other threads:[~2011-05-16 18:37 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-16 18:40 [Bloat] Jumbo frames and LAN buffers Richard Scheffenegger
  -- strict thread matches above, loose matches on Subject: below --
2011-04-26 17:05 [Bloat] Network computing article on bloat Dave Taht
2011-04-26 18:13 ` Dave Hart
2011-04-26 18:17   ` Dave Taht
2011-04-26 18:32     ` Wesley Eddy
2011-04-30 19:18       ` [Bloat] Goodput fraction w/ AQM vs bufferbloat Richard Scheffenegger
2011-05-05 16:01         ` Jim Gettys
2011-05-05 16:10           ` Stephen Hemminger
2011-05-05 16:49             ` [Bloat] Burst Loss Neil Davies
2011-05-08 12:42               ` Richard Scheffenegger
2011-05-09 18:06                 ` Rick Jones
2011-05-12 16:31                   ` Fred Baker
2011-05-13  5:00                     ` Kevin Gross
2011-05-13 14:35                       ` Rick Jones
2011-05-13 14:54                         ` Dave Taht
2011-05-13 20:03                           ` [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss) Kevin Gross
2011-05-14 20:48                             ` Fred Baker
2011-05-15 18:28                               ` Jonathan Morton
2011-05-15 20:49                                 ` Fred Baker
2011-05-16  0:31                                   ` Jonathan Morton
2011-05-16  7:51                                     ` Richard Scheffenegger
2011-05-16  9:49                                       ` Fred Baker
2011-05-16 11:23                                         ` [Bloat] Jumbo frames and LAN buffers Jim Gettys
2011-05-16 13:15                                           ` Kevin Gross
2011-05-16 13:22                                             ` Jim Gettys
2011-05-16 13:42                                               ` Kevin Gross
2011-05-16 15:23                                                 ` Jim Gettys
     [not found]                                               ` <-854731558634984958@unknownmsgid>
2011-05-16 13:45                                                 ` Dave Taht
2011-05-16 18:36                                             ` Richard Scheffenegger