* Re: [Bloat] Jumbo frames and LAN buffers
@ 2011-05-16 18:40 Richard Scheffenegger
0 siblings, 0 replies; 8+ messages in thread
From: Richard Scheffenegger @ 2011-05-16 18:40 UTC (permalink / raw)
To: Kevin Gross, bloat
Also found this:
http://www.stanford.edu/~balaji/papers/QCN.pdf
Jim, you may notice that the congestion feedback probability function looks
just like the basic RED marking function :)
Regards,
Richard
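
For context, the "basic RED marking function" being compared against here is just a
linear ramp of marking probability between two average-queue thresholds. A minimal
sketch (illustrative parameter names, not taken from the QCN paper):

    def red_mark_probability(avg_q, min_th, max_th, max_p):
        """Basic RED: marking probability grows linearly with average queue depth."""
        if avg_q < min_th:
            return 0.0          # below the low threshold nothing is marked
        if avg_q >= max_th:
            return 1.0          # above the high threshold everything is marked/dropped
        return max_p * (avg_q - min_th) / (max_th - min_th)
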
----- Original Message -----
From: "Richard Scheffenegger" <rscheff@gmx.at>
To: "Kevin Gross" <kevin.gross@avanw.com>; <bloat@lists.bufferbloat.net>
Sent: Monday, May 16, 2011 8:36 PM
Subject: Re: [Bloat] Jumbo frames and LAN buffers
> Kevin,
>
>> My understanding is that 802.1Qau, "lossless Ethernet", was designed
>> primarily to allow Fibre Channel to be carried over 10 GbE so that SAN
>> and
>> LAN can share a common infrastructure in datacenters. I don't believe
>> anyone
>> intends for it to be enabled for traffic classes carrying TCP.
>
> Well, QCN requires cooperation between the L2 MAC sender, the network and
> the receiver (thus you need fancy "CNA" converged network adapters to
> start using it - these would be the reaction/reflection points; the
> congestion points - switches - would need HW support too; nothing one can
> buy today; higher-grade (carrier?) switches may have the
> reaction/reflection points built into them, and could use legacy 802.3x
> signalling outside the 802.1Qau cloud).
>
> The following may be too simplistic:
>
> Once the hardware has reaction point support, it classifies traffic and
> calculates the per-flow congestion of the path (with a "flow" really being
> whatever the sender's classification rules define). The intermediates /
> receiver sample the flow and return the congestion feedback to the sender,
> and within the sender a token-bucket-like rate limiter adjusts the sending
> rate of the appropriate flow(s) to match the observed network conditions.
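
A rough sketch of such a reaction-point rate limiter (a simplified model with assumed
constants, not the actual 802.1Qau state machine):

    class QcnRateLimiterSketch:
        """Simplified sketch of an 802.1Qau-style reaction point (illustrative only)."""
        GD = 1.0 / 128      # assumed feedback gain: a maximal feedback value roughly halves the rate
        R_AI = 5e6          # assumed self-increase step, bits/s

        def __init__(self, link_rate_bps):
            self.current_rate = link_rate_bps   # rate the token bucket is clocked at
            self.target_rate = link_rate_bps

        def on_congestion_notification(self, fb):
            """Feedback frame from a congestion point: multiplicative rate decrease."""
            self.target_rate = self.current_rate
            self.current_rate *= 1.0 - self.GD * abs(fb)

        def on_recovery_timer(self):
            """No recent feedback: probe back towards (and eventually past) the old rate."""
            self.target_rate += self.R_AI
            self.current_rate = (self.current_rate + self.target_rate) / 2.0
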
>
> http://www.stanford.edu/~balaji/presentations/au-prabhakar-qcn-description.pdf
> http://www.ieee802.org/1/files/public/docs2007/au-pan-qcn-details-053007.pdf
>
> The congestion control loop has a lot of similarities to TCP CC as you
> will note...
>
> Also, I haven't found out how fine-grained the classification is supposed
> to be (per L2 address pair? Group of flows? Which hashing then to use for
> mapping L2 flows into those groups between reaction/congestion/reflection
> points...).
>
>
> Anyway, for the here and now, this is pretty much esoteric stuff not
> relevant in this context :)
>
> Best regards,
> Richard
>
> ----- Original Message -----
> From: "Kevin Gross" <kevin.gross@avanw.com>
> To: <bloat@lists.bufferbloat.net>
> Sent: Monday, May 16, 2011 3:15 PM
> Subject: Re: [Bloat] Jumbo frames and LAN buffers
>
>
>> All the stand-alone switches I've looked at recently either do not
>> support
>> 802.3x or support it in the (desirable) manner described in the last
>> paragraph of the linked blog post. I don't believe Ethernet flow control
>> is
>> a factor in current LANs. I'd be interested to know the specifics if
>> anyone
>> sees it differently.
>>
>> My understanding is that 802.1Qau, "lossless Ethernet", was designed
>> primarily to allow Fibre Channel to be carried over 10 GbE so that SAN
>> and
>> LAN can share a common infrastructure in datacenters. I don't believe
>> anyone
>> intends for it to be enabled for traffic classes carrying TCP.
>>
>> Kevin Gross
>>
>> -----Original Message-----
>> From: bloat-bounces@lists.bufferbloat.net
>> [mailto:bloat-bounces@lists.bufferbloat.net] On Behalf Of Jim Gettys
>> Sent: Monday, May 16, 2011 5:24 AM
>> To: bloat@lists.bufferbloat.net
>> Subject: Re: [Bloat] Jumbo frames and LAN buffers
>>
>> Not necessarily out of knowledge or desire (since it isn't usually
>> controllable in the small switches you buy for home). It can cause
>> trouble even in small environments such as your house.
>>
>> http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html
>>
>> I know I'm at least three consumer switches deep, and it's not by choice.
>> - Jim
>>
>>
>> _______________________________________________
>> Bloat mailing list
>> Bloat@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/bloat
>>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bloat] Network computing article on bloat
@ 2011-04-26 17:05 Dave Taht
  2011-04-26 18:13 ` Dave Hart
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Taht @ 2011-04-26 17:05 UTC (permalink / raw)
  To: bloat

Not bad, although I can live without the title. Coins a new-ish phrase
"insertion latency"

http://www.networkcomputing.com/end-to-end-apm/bufferbloat-and-the-collapse-of-the-internet.php

--
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://the-edge.blogspot.com

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Bloat] Network computing article on bloat
  2011-04-26 17:05 [Bloat] Network computing article on bloat Dave Taht
@ 2011-04-26 18:13 ` Dave Hart
  2011-04-26 18:17   ` Dave Taht
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Hart @ 2011-04-26 18:13 UTC (permalink / raw)
  To: Dave Taht; +Cc: bloat

On Tue, Apr 26, 2011 at 17:05 UTC, Dave Taht <dave.taht@gmail.com> wrote:
> Not bad, although I can live without the title. Coins a new-ish phrase
> "insertion latency"
>
> http://www.networkcomputing.com/end-to-end-apm/bufferbloat-and-the-collapse-of-the-internet.php

The piece ends with a paragraph claiming preventing packet loss is
addressing a more fundamental problem which contributes to bufferbloat.
As long as the writer and readers believe packet loss is an unmitigated
evil, the battle is lost. More encouraging would have been a statement
that packet loss is preferable to excessive queueing and a required TCP
feedback signal when ECN isn't in play.

Cheers,
Dave Hart

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Bloat] Network computing article on bloat
  2011-04-26 18:13 ` Dave Hart
@ 2011-04-26 18:17   ` Dave Taht
  2011-04-26 18:32     ` Wesley Eddy
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Taht @ 2011-04-26 18:17 UTC (permalink / raw)
  To: bloat; +Cc: dave greenfield

"Big Buffers Bad. Small Buffers Good."

"*Some* packet loss is essential for the correct operation of the Internet"

are two of the memes I try to propagate, in their simplicity. Even then
there are so many qualifiers to both of those that the core message gets
lost.

On Tue, Apr 26, 2011 at 12:13 PM, Dave Hart <davehart@gmail.com> wrote:
> On Tue, Apr 26, 2011 at 17:05 UTC, Dave Taht <dave.taht@gmail.com> wrote:
>> Not bad, although I can live without the title. Coins a new-ish phrase
>> "insertion latency"
>>
>> http://www.networkcomputing.com/end-to-end-apm/bufferbloat-and-the-collapse-of-the-internet.php
>
> The piece ends with a paragraph claiming preventing packet loss is
> addressing a more fundamental problem which contributes to bufferbloat.
> As long as the writer and readers believe packet loss is an unmitigated
> evil, the battle is lost. More encouraging would have been a statement
> that packet loss is preferable to excessive queueing and a required TCP
> feedback signal when ECN isn't in play.
>
> Cheers,
> Dave Hart

--
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://the-edge.blogspot.com

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Bloat] Network computing article on bloat
  2011-04-26 18:17   ` Dave Taht
@ 2011-04-26 18:32     ` Wesley Eddy
  2011-04-30 19:18       ` [Bloat] Goodput fraction w/ AQM vs bufferbloat Richard Scheffenegger
  0 siblings, 1 reply; 8+ messages in thread
From: Wesley Eddy @ 2011-04-26 18:32 UTC (permalink / raw)
  To: bloat

On 4/26/2011 2:17 PM, Dave Taht wrote:
> "Big Buffers Bad. Small Buffers Good."
>
> "*Some* packet loss is essential for the correct operation of the Internet"
>
> are two of the memes I try to propagate, in their simplicity. Even
> then there are so many qualifiers to both of those that the core
> message gets lost.

The second one is actually backwards; it should be "the Internet can
operate correctly with some packet loss".

--
Wes Eddy
MTI Systems

^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bloat] Goodput fraction w/ AQM vs bufferbloat
  2011-04-26 18:32     ` Wesley Eddy
@ 2011-04-30 19:18       ` [Bloat] Goodput fraction w/ AQM vs bufferbloat Richard Scheffenegger
  2011-05-05 16:01         ` Jim Gettys
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Scheffenegger @ 2011-04-30 19:18 UTC (permalink / raw)
  To: bloat

I'm curious, has anyone done some simulations to check if the following
qualitative statement holds true, and if, what the quantitative effect is:

With bufferbloat, the TCP congestion control reaction is unduely delayed.
When it finally happens, the tcp stream is likely facing a "burst loss"
event - multiple consecutive packets get dropped. Worse yet, the sender
with the lowest RTT across the bottleneck will likely start to retransmit
while the (tail-drop) queue is still overflowing.

And a lost retransmission means a major setback in bandwidth (except for
Linux with bulk transfers and SACK enabled), as the standard (RFC
documented) behaviour asks for a RTO (1sec nominally, 200-500 ms typically)
to recover such a lost retransmission...

The second part (more important as an incentive to the ISPs actually), how
does the fraction of goodput vs. throughput change, when AQM schemes are
deployed, and TCP CC reacts in a timely manner? Small ISPs have to pay for
their upstream volume, regardless if that is "real" work (goodput) or
unneccessary retransmissions.

When I was at a small cable ISP in switzerland last week, surely enough
bufferbloat was readily observable (17ms -> 220ms after 30 sec of a bulk
transfer), but at first they had the "not our problem" view, until I
started discussing burst loss / retransmissions / goodput vs throughput -
with the latest point being a real commercial incentive to them. (They
promised to check if AQM would be available in the CPE / CMTS, and put
latency bounds in their tenders going forward).

Best regards,
Richard

^ permalink raw reply [flat|nested] 8+ messages in thread
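
As a back-of-the-envelope illustration of the goodput-vs-throughput question above
(the monthly volume figure is invented; only the ratios matter):

    monthly_upstream_tb = 500.0                 # hypothetical monthly upstream volume for a small ISP
    for retransmit_rate in (0.01, 0.03, 0.09):  # loss rates of the order reported later in this thread
        wasted_tb = monthly_upstream_tb * retransmit_rate
        print(f"{retransmit_rate:.0%} retransmissions -> about {wasted_tb:.0f} TB of "
              f"paid-for upstream volume that carries no new data")
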
* Re: [Bloat] Goodput fraction w/ AQM vs bufferbloat
  2011-04-30 19:18 ` [Bloat] Goodput fraction w/ AQM vs bufferbloat Richard Scheffenegger
@ 2011-05-05 16:01   ` Jim Gettys
  2011-05-05 16:10     ` Stephen Hemminger
  0 siblings, 1 reply; 8+ messages in thread
From: Jim Gettys @ 2011-05-05 16:01 UTC (permalink / raw)
  To: bloat

On 04/30/2011 03:18 PM, Richard Scheffenegger wrote:
> I'm curious, has anyone done some simulations to check if the
> following qualitative statement holds true, and if, what the
> quantitative effect is:
>
> With bufferbloat, the TCP congestion control reaction is unduely
> delayed. When it finally happens, the tcp stream is likely facing a
> "burst loss" event - multiple consecutive packets get dropped. Worse
> yet, the sender with the lowest RTT across the bottleneck will likely
> start to retransmit while the (tail-drop) queue is still overflowing.
>
> And a lost retransmission means a major setback in bandwidth (except
> for Linux with bulk transfers and SACK enabled), as the standard (RFC
> documented) behaviour asks for a RTO (1sec nominally, 200-500 ms
> typically) to recover such a lost retransmission...
>
> The second part (more important as an incentive to the ISPs actually),
> how does the fraction of goodput vs. throughput change, when AQM
> schemes are deployed, and TCP CC reacts in a timely manner? Small ISPs
> have to pay for their upstream volume, regardless if that is "real"
> work (goodput) or unneccessary retransmissions.
>
> When I was at a small cable ISP in switzerland last week, surely
> enough bufferbloat was readily observable (17ms -> 220ms after 30 sec
> of a bulk transfer), but at first they had the "not our problem" view,
> until I started discussing burst loss / retransmissions / goodput vs
> throughput - with the latest point being a real commercial incentive
> to them. (They promised to check if AQM would be available in the CPE
> / CMTS, and put latency bounds in their tenders going forward).
>

I wish I had a good answer to your very good questions. Simulation would
be interesting though real daa is more convincing.

I haven't looked in detail at all that many traces to try to get a feel
for how much bandwidth waste there actually is, and more formal studies
like Netalyzr, SamKnows, or the Bismark project would be needed to
quantify the loss on the network as a whole.

I did spend some time last fall with the traces I've taken. In those,
I've typically been seeing 1-3% packet loss in the main TCP transfers.
On the wireless trace I took, I saw 9% loss, but whether that is
bufferbloat induced loss or not, I don't know (the data is out there for
those who might want to dig). And as you note, the losses are
concentrated in bursts (probably due to the details of Cubic, so I'm told).

I've had anecdotal reports (and some first hand experience) with much
higher loss rates, for example from Nick Weaver at ICSI; but I believe
in playing things conservatively with any numbers I quote and I've not
gotten consistent results when I've tried, so I just report what's in
the packet captures I did take.

A phenomena that could be occurring is that during congestion avoidance
(until TCP loses its cookies entirely and probes for a higher operating
point) that TCP is carefully timing it's packets to keep the buffers
almost exactly full, so that competing flows (in my case, simple pings)
are likely to arrive just when there is no buffer space to accept them
and therefore you see higher losses on them than you would on the single
flow I've been tracing and getting loss statistics from.

People who want to look into this further would be a great help.
- Jim

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Bloat] Goodput fraction w/ AQM vs bufferbloat
  2011-05-05 16:01   ` Jim Gettys
@ 2011-05-05 16:10     ` Stephen Hemminger
  2011-05-05 16:49       ` [Bloat] Burst Loss Neil Davies
  0 siblings, 1 reply; 8+ messages in thread
From: Stephen Hemminger @ 2011-05-05 16:10 UTC (permalink / raw)
  To: Jim Gettys; +Cc: bloat

On Thu, 05 May 2011 12:01:22 -0400
Jim Gettys <jg@freedesktop.org> wrote:

> On 04/30/2011 03:18 PM, Richard Scheffenegger wrote:
> > I'm curious, has anyone done some simulations to check if the
> > following qualitative statement holds true, and if, what the
> > quantitative effect is:
> >
> > With bufferbloat, the TCP congestion control reaction is unduely
> > delayed. When it finally happens, the tcp stream is likely facing a
> > "burst loss" event - multiple consecutive packets get dropped. Worse
> > yet, the sender with the lowest RTT across the bottleneck will likely
> > start to retransmit while the (tail-drop) queue is still overflowing.
> >
> > And a lost retransmission means a major setback in bandwidth (except
> > for Linux with bulk transfers and SACK enabled), as the standard (RFC
> > documented) behaviour asks for a RTO (1sec nominally, 200-500 ms
> > typically) to recover such a lost retransmission...
> >
> > The second part (more important as an incentive to the ISPs actually),
> > how does the fraction of goodput vs. throughput change, when AQM
> > schemes are deployed, and TCP CC reacts in a timely manner? Small ISPs
> > have to pay for their upstream volume, regardless if that is "real"
> > work (goodput) or unneccessary retransmissions.
> >
> > When I was at a small cable ISP in switzerland last week, surely
> > enough bufferbloat was readily observable (17ms -> 220ms after 30 sec
> > of a bulk transfer), but at first they had the "not our problem" view,
> > until I started discussing burst loss / retransmissions / goodput vs
> > throughput - with the latest point being a real commercial incentive
> > to them. (They promised to check if AQM would be available in the CPE
> > / CMTS, and put latency bounds in their tenders going forward).
> >
> I wish I had a good answer to your very good questions. Simulation
> would be interesting though real daa is more convincing.
>
> I haven't looked in detail at all that many traces to try to get a feel
> for how much bandwidth waste there actually is, and more formal studies
> like Netalyzr, SamKnows, or the Bismark project would be needed to
> quantify the loss on the network as a whole.
>
> I did spend some time last fall with the traces I've taken. In those,
> I've typically been seeing 1-3% packet loss in the main TCP transfers.
> On the wireless trace I took, I saw 9% loss, but whether that is
> bufferbloat induced loss or not, I don't know (the data is out there for
> those who might want to dig). And as you note, the losses are
> concentrated in bursts (probably due to the details of Cubic, so I'm told).
>
> I've had anecdotal reports (and some first hand experience) with much
> higher loss rates, for example from Nick Weaver at ICSI; but I believe
> in playing things conservatively with any numbers I quote and I've not
> gotten consistent results when I've tried, so I just report what's in
> the packet captures I did take.
>
> A phenomena that could be occurring is that during congestion avoidance
> (until TCP loses its cookies entirely and probes for a higher operating
> point) that TCP is carefully timing it's packets to keep the buffers
> almost exactly full, so that competing flows (in my case, simple pings)
> are likely to arrive just when there is no buffer space to accept them
> and therefore you see higher losses on them than you would on the single
> flow I've been tracing and getting loss statistics from.
>
> People who want to look into this further would be a great help.
> - Jim

I would not put a lot of trust in measuring loss with pings.
I heard that some ISP's do different processing on ICMP's used
for ping packets. They either prioritize them high to provide
artificially good response (better marketing numbers); or
prioritize them low since they aren't useful traffic.
There are also filters that only allow N ICMP requests per second
which means repeated probes will be dropped.

--

^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bloat] Burst Loss
  2011-05-05 16:10     ` Stephen Hemminger
@ 2011-05-05 16:49       ` Neil Davies
  2011-05-08 12:42         ` Richard Scheffenegger
  0 siblings, 1 reply; 8+ messages in thread
From: Neil Davies @ 2011-05-05 16:49 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: bloat

On the issue of loss - we did a study of the UK's ADSL access network back
in 2006 over several weeks, looking at the loss and delay that was
introduced into the bi-directional traffic. We found that the delay
variability (that bit left over after you've taken the effects of geography
and line sync rates) was broadly the same over the half dozen locations we
studied - it was there all the time to the same level of variance and that
what did vary by time of day was the loss rate.

We also found out, at the time much to our surprise - but we understand why
now, that loss was broadly independent of the offered load - we used a
constant data rate (with either fixed or variable packet sizes). We found
that loss rates were in the range 1% to 3% (which is what would be expected
from a large number of TCP streams contending for a limiting resource).

As for burst loss, yes it does occur - but it could be argued that this
more the fault of the sending TCP stack than the network. This phenomenon
was well covered in the academic literature in the '90s (if I remember
correctly folks at INRIA lead the way) - it is all down to the nature of
random processes and how you observe them.

Back to back packets see higher loss rates than packets more spread out in
time. Consider a pair of packets, back to back, arriving over a 1Gbit/sec
link into a queue being serviced at 34Mbit/sec, the first packet being
'lost' is equivalent to saying that the first packet 'observed' the queue
full - the system's state is no longer a random variable - it is known to
be full. The second packet (lets assume it is also a full one) 'makes an
observation' of the state of that queue about 12us later - but that is only
3% of the time that it takes to service such large packets at 34 Mbit/sec.
The system has not had any time to 'relax' anywhere near to back its steady
state, it is highly likely that it is still full.

Fixing this makes a phenomenal difference on the goodput (with the usual
delay effects that implies), we've even built and deployed systems with
this sort of engineering embedded (deployed as a network 'wrap') that mean
that end users can sustainably (days on end) achieve effective throughput
that is better than 98% of (the transmission media imposed) maximum. What
we had done is make the network behave closer to the underlying statistical
assumptions made in TCP's design.

Neil

On 5 May 2011, at 17:10, Stephen Hemminger wrote:

> On Thu, 05 May 2011 12:01:22 -0400
> Jim Gettys <jg@freedesktop.org> wrote:
>
>> On 04/30/2011 03:18 PM, Richard Scheffenegger wrote:
>>> I'm curious, has anyone done some simulations to check if the
>>> following qualitative statement holds true, and if, what the
>>> quantitative effect is:
>>>
>>> With bufferbloat, the TCP congestion control reaction is unduely
>>> delayed. When it finally happens, the tcp stream is likely facing a
>>> "burst loss" event - multiple consecutive packets get dropped. Worse
>>> yet, the sender with the lowest RTT across the bottleneck will likely
>>> start to retransmit while the (tail-drop) queue is still overflowing.
>>>
>>> And a lost retransmission means a major setback in bandwidth (except
>>> for Linux with bulk transfers and SACK enabled), as the standard (RFC
>>> documented) behaviour asks for a RTO (1sec nominally, 200-500 ms
>>> typically) to recover such a lost retransmission...
>>>
>>> The second part (more important as an incentive to the ISPs actually),
>>> how does the fraction of goodput vs. throughput change, when AQM
>>> schemes are deployed, and TCP CC reacts in a timely manner? Small ISPs
>>> have to pay for their upstream volume, regardless if that is "real"
>>> work (goodput) or unneccessary retransmissions.
>>>
>>> When I was at a small cable ISP in switzerland last week, surely
>>> enough bufferbloat was readily observable (17ms -> 220ms after 30 sec
>>> of a bulk transfer), but at first they had the "not our problem" view,
>>> until I started discussing burst loss / retransmissions / goodput vs
>>> throughput - with the latest point being a real commercial incentive
>>> to them. (They promised to check if AQM would be available in the CPE
>>> / CMTS, and put latency bounds in their tenders going forward).
>>>
>> I wish I had a good answer to your very good questions. Simulation
>> would be interesting though real daa is more convincing.
>>
>> I haven't looked in detail at all that many traces to try to get a feel
>> for how much bandwidth waste there actually is, and more formal studies
>> like Netalyzr, SamKnows, or the Bismark project would be needed to
>> quantify the loss on the network as a whole.
>>
>> I did spend some time last fall with the traces I've taken. In those,
>> I've typically been seeing 1-3% packet loss in the main TCP transfers.
>> On the wireless trace I took, I saw 9% loss, but whether that is
>> bufferbloat induced loss or not, I don't know (the data is out there for
>> those who might want to dig). And as you note, the losses are
>> concentrated in bursts (probably due to the details of Cubic, so I'm told).
>>
>> I've had anecdotal reports (and some first hand experience) with much
>> higher loss rates, for example from Nick Weaver at ICSI; but I believe
>> in playing things conservatively with any numbers I quote and I've not
>> gotten consistent results when I've tried, so I just report what's in
>> the packet captures I did take.
>>
>> A phenomena that could be occurring is that during congestion avoidance
>> (until TCP loses its cookies entirely and probes for a higher operating
>> point) that TCP is carefully timing it's packets to keep the buffers
>> almost exactly full, so that competing flows (in my case, simple pings)
>> are likely to arrive just when there is no buffer space to accept them
>> and therefore you see higher losses on them than you would on the single
>> flow I've been tracing and getting loss statistics from.
>>
>> People who want to look into this further would be a great help.
>> - Jim
>
> I would not put a lot of trust in measuring loss with pings.
> I heard that some ISP's do different processing on ICMP's used
> for ping packets. They either prioritize them high to provide
> artificially good response (better marketing numbers); or
> prioritize them low since they aren't useful traffic.
> There are also filters that only allow N ICMP requests per second
> which means repeated probes will be dropped.
>
> --
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat

^ permalink raw reply [flat|nested] 8+ messages in thread
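
The arithmetic behind the 1 Gbit/s into 34 Mbit/s example above can be reproduced
directly (assuming full-size 1500-byte frames):

    frame_bits = 1500 * 8                 # assume a full-size Ethernet frame
    arrival_gap = frame_bits / 1e9        # spacing of back-to-back frames on a 1 Gbit/s link
    service_time = frame_bits / 34e6      # time to drain one such frame at 34 Mbit/s
    print(f"arrival gap  = {arrival_gap * 1e6:6.1f} us")      # ~12 us
    print(f"service time = {service_time * 1e6:6.1f} us")     # ~353 us
    print(f"ratio        = {arrival_gap / service_time:.1%}") # ~3%: the queue has barely drained
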
* Re: [Bloat] Burst Loss
  2011-05-05 16:49       ` [Bloat] Burst Loss Neil Davies
@ 2011-05-08 12:42         ` Richard Scheffenegger
  2011-05-09 18:06           ` Rick Jones
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Scheffenegger @ 2011-05-08 12:42 UTC (permalink / raw)
  To: Neil Davies, Stephen Hemminger; +Cc: bloat

I'm not an expert in TSO / GSO, and NIC driver design, but what I gathered
is, that with these schemes, and mordern NICs that do scatter/gather DMA of
dotzends of "independent" header/data chuncks directly from memory, the NIC
will typically send out non-interleaved trains of segments all belonging to
single TCP sessions. With the implicit assumption, that these burst of up to
180 segments (Intel supports 256kB data per chain) can be absorped by the
buffer at the bottleneck and spread out in time there...

From my perspective, having such GSO / TSO to "cycle" through all the
different chains belonging to different sessions (to not introduce
reordering at the sender even), should already help pace the segments per
session somewhat; a slightly more sophisticated DMA engine could check each
of the chains for how much data is to be sent by those, and then clock an
appropriate number of interleaved segmets out... I do understand that this
is "work" for a HW DMA engine and slows down GSO software implementations,
but may severly reduce the instantaneous rate of a single session, and
thereby the impact of burst loss to to momenary buffer overload...

(Let me know if I should draw a picture of the way I understand TSO / HW DMA
is currently working, and where it could be improved upon):

Best regards,
Richard

----- Original Message -----
> Back to back packets see higher loss rates than packets more spread out in
> time. Consider a pair of packets, back to back, arriving over a 1Gbit/sec
> link into a queue being serviced at 34Mbit/sec, the first packet being
> 'lost' is equivalent to saying that the first packet 'observed' the queue
> full - the system's state is no longer a random variable - it is known to
> be full. The second packet (lets assume it is also a full one) 'makes an
> observation' of the state of that queue about 12us later - but that is
> only 3% of the time that it takes to service such large packets at 34
> Mbit/sec. The system has not had any time to 'relax' anywhere near to back
> its steady state, it is highly likely that it is still full.
>
> Fixing this makes a phenomenal difference on the goodput (with the usual
> delay effects that implies), we've even built and deployed systems with
> this sort of engineering embedded (deployed as a network 'wrap') that mean
> that end users can sustainably (days on end) achieve effective throughput
> that is better than 98% of (the transmission media imposed) maximum. What
> we had done is make the network behave closer to the underlying
> statistical assumptions made in TCP's design.
>
> Neil
>
> On 5 May 2011, at 17:10, Stephen Hemminger wrote:
>
>> On Thu, 05 May 2011 12:01:22 -0400
>> Jim Gettys <jg@freedesktop.org> wrote:
>>
>>> On 04/30/2011 03:18 PM, Richard Scheffenegger wrote:
>>>> I'm curious, has anyone done some simulations to check if the
>>>> following qualitative statement holds true, and if, what the
>>>> quantitative effect is:
>>>>
>>>> With bufferbloat, the TCP congestion control reaction is unduely
>>>> delayed. When it finally happens, the tcp stream is likely facing a
>>>> "burst loss" event - multiple consecutive packets get dropped. Worse
>>>> yet, the sender with the lowest RTT across the bottleneck will likely
>>>> start to retransmit while the (tail-drop) queue is still overflowing.
>>>>
>>>> And a lost retransmission means a major setback in bandwidth (except
>>>> for Linux with bulk transfers and SACK enabled), as the standard (RFC
>>>> documented) behaviour asks for a RTO (1sec nominally, 200-500 ms
>>>> typically) to recover such a lost retransmission...
>>>>
>>>> The second part (more important as an incentive to the ISPs actually),
>>>> how does the fraction of goodput vs. throughput change, when AQM
>>>> schemes are deployed, and TCP CC reacts in a timely manner? Small ISPs
>>>> have to pay for their upstream volume, regardless if that is "real"
>>>> work (goodput) or unneccessary retransmissions.
>>>>
>>>> When I was at a small cable ISP in switzerland last week, surely
>>>> enough bufferbloat was readily observable (17ms -> 220ms after 30 sec
>>>> of a bulk transfer), but at first they had the "not our problem" view,
>>>> until I started discussing burst loss / retransmissions / goodput vs
>>>> throughput - with the latest point being a real commercial incentive
>>>> to them. (They promised to check if AQM would be available in the CPE
>>>> / CMTS, and put latency bounds in their tenders going forward).
>>>>
>>> I wish I had a good answer to your very good questions. Simulation
>>> would be interesting though real daa is more convincing.
>>>
>>> I haven't looked in detail at all that many traces to try to get a feel
>>> for how much bandwidth waste there actually is, and more formal studies
>>> like Netalyzr, SamKnows, or the Bismark project would be needed to
>>> quantify the loss on the network as a whole.
>>>
>>> I did spend some time last fall with the traces I've taken. In those,
>>> I've typically been seeing 1-3% packet loss in the main TCP transfers.
>>> On the wireless trace I took, I saw 9% loss, but whether that is
>>> bufferbloat induced loss or not, I don't know (the data is out there for
>>> those who might want to dig). And as you note, the losses are
>>> concentrated in bursts (probably due to the details of Cubic, so I'm
>>> told).
>>>
>>> I've had anecdotal reports (and some first hand experience) with much
>>> higher loss rates, for example from Nick Weaver at ICSI; but I believe
>>> in playing things conservatively with any numbers I quote and I've not
>>> gotten consistent results when I've tried, so I just report what's in
>>> the packet captures I did take.
>>>
>>> A phenomena that could be occurring is that during congestion avoidance
>>> (until TCP loses its cookies entirely and probes for a higher operating
>>> point) that TCP is carefully timing it's packets to keep the buffers
>>> almost exactly full, so that competing flows (in my case, simple pings)
>>> are likely to arrive just when there is no buffer space to accept them
>>> and therefore you see higher losses on them than you would on the single
>>> flow I've been tracing and getting loss statistics from.
>>>
>>> People who want to look into this further would be a great help.
>>> - Jim
>>
>> I would not put a lot of trust in measuring loss with pings.
>> I heard that some ISP's do different processing on ICMP's used
>> for ping packets. They either prioritize them high to provide
>> artificially good response (better marketing numbers); or
>> prioritize them low since they aren't useful traffic.
>> There are also filters that only allow N ICMP requests per second
>> which means repeated probes will be dropped.
>>
>> --
>> _______________________________________________
>> Bloat mailing list
>> Bloat@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/bloat
>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat

^ permalink raw reply [flat|nested] 8+ messages in thread
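
The interleaving idea sketched above amounts to a round-robin over per-session
segment chains. A minimal illustration (the data structures are assumptions for the
example, not a real NIC or driver API):

    from collections import deque

    def interleave_chains(chains):
        """Emit one segment per session per round, preserving per-session order."""
        queues = deque(deque(chain) for chain in chains if chain)
        wire_order = []
        while queues:
            q = queues.popleft()
            wire_order.append(q.popleft())   # next segment of this session goes on the wire
            if q:
                queues.append(q)             # session still has segments: rotate to the back
        return wire_order

    # Two sessions of four segments each come out A1 B1 A2 B2 ... instead of A1..A4 B1..B4,
    # halving the largest per-session back-to-back burst seen by the bottleneck queue.
    print(interleave_chains([["A1", "A2", "A3", "A4"], ["B1", "B2", "B3", "B4"]]))
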
* Re: [Bloat] Burst Loss
  2011-05-08 12:42         ` Richard Scheffenegger
@ 2011-05-09 18:06           ` Rick Jones
  2011-05-12 16:31             ` Fred Baker
  0 siblings, 1 reply; 8+ messages in thread
From: Rick Jones @ 2011-05-09 18:06 UTC (permalink / raw)
  To: Richard Scheffenegger; +Cc: Stephen Hemminger, bloat

On Sun, 2011-05-08 at 14:42 +0200, Richard Scheffenegger wrote:
> I'm not an expert in TSO / GSO, and NIC driver design, but what I gathered
> is, that with these schemes, and mordern NICs that do scatter/gather DMA of
> dotzends of "independent" header/data chuncks directly from memory, the NIC
> will typically send out non-interleaved trains of segments all belonging to
> single TCP sessions. With the implicit assumption, that these burst of up to
> 180 segments (Intel supports 256kB data per chain) can be absorped by the
> buffer at the bottleneck and spread out in time there...
>
> From my perspective, having such GSO / TSO to "cycle" through all the
> different chains belonging to different sessions (to not introduce
> reordering at the sender even), should already help pace the segments per
> session somewhat; a slightly more sophisticated DMA engine could check each
> of the chains for how much data is to be sent by those, and then clock an
> appropriate number of interleaved segmets out... I do understand that this
> is "work" for a HW DMA engine and slows down GSO software implementations,
> but may severly reduce the instantaneous rate of a single session, and
> thereby the impact of burst loss to to momenary buffer overload...
>
> (Let me know if I should draw a picture of the way I understand TSO / HW DMA
> is currently working, and where it could be improved upon):

GSO/TSO can be thought of as a symptom of standards bodies (eg the IEEE)
refusing to standardize an increase in frame sizes. Put another way,
they are a "poor man's jumbo frames."

Within the context of a given "priority" at least, NICs are setup/designed
to do things in order. I too cannot claim to be a NIC designer, but suspect
it would be a non-trivial, if straight-forward exercise to get a NIC to
cycle through multiple GSO/TSO sends. Yes, they could probably (ab)use any
prioritization support they have. NICs and drivers are accustomed to "in
order" processing - grab packet, send packet, update status, lather, rinse,
repeat (modulo some pre-fetching). Those rings aren't really amenable to
"out of order" completion notifications, so the NIC would have to still do
"in order" retirement of packets or the driver model will loose simplicity.

As for the issue below, even if the NIC(s) upstream did interleave between
two GSO'd sends, you are simply trading back-to-back frames of a single
flow for back-to-back frames of different flows. And if there is only the
one flow upstream of this bottleneck, whether GSO is on or not probably
won't make a huge difference in the timing - only how much CPU is burned on
the source host.

> Best regards,
> Richard
>
> ----- Original Message -----
> > Back to back packets see higher loss rates than packets more spread out in
> > time. Consider a pair of packets, back to back, arriving over a 1Gbit/sec
> > link into a queue being serviced at 34Mbit/sec, the first packet being
> > 'lost' is equivalent to saying that the first packet 'observed' the queue
> > full - the system's state is no longer a random variable - it is known to
> > be full. The second packet (lets assume it is also a full one) 'makes an
> > observation' of the state of that queue about 12us later - but that is
> > only 3% of the time that it takes to service such large packets at 34
> > Mbit/sec. The system has not had any time to 'relax' anywhere near to back
> > its steady state, it is highly likely that it is still full.
> >
> > Fixing this makes a phenomenal difference on the goodput (with the usual
> > delay effects that implies), we've even built and deployed systems with
> > this sort of engineering embedded (deployed as a network 'wrap') that mean
> > that end users can sustainably (days on end) achieve effective throughput
> > that is better than 98% of (the transmission media imposed) maximum. What
> > we had done is make the network behave closer to the underlying
> > statistical assumptions made in TCP's design.
> >
> > Neil
> >
> > On 5 May 2011, at 17:10, Stephen Hemminger wrote:
> >
> >> On Thu, 05 May 2011 12:01:22 -0400
> >> Jim Gettys <jg@freedesktop.org> wrote:
> >>
> >>> On 04/30/2011 03:18 PM, Richard Scheffenegger wrote:
> >>>> I'm curious, has anyone done some simulations to check if the
> >>>> following qualitative statement holds true, and if, what the
> >>>> quantitative effect is:
> >>>>
> >>>> With bufferbloat, the TCP congestion control reaction is unduely
> >>>> delayed. When it finally happens, the tcp stream is likely facing a
> >>>> "burst loss" event - multiple consecutive packets get dropped. Worse
> >>>> yet, the sender with the lowest RTT across the bottleneck will likely
> >>>> start to retransmit while the (tail-drop) queue is still overflowing.
> >>>>
> >>>> And a lost retransmission means a major setback in bandwidth (except
> >>>> for Linux with bulk transfers and SACK enabled), as the standard (RFC
> >>>> documented) behaviour asks for a RTO (1sec nominally, 200-500 ms
> >>>> typically) to recover such a lost retransmission...
> >>>>
> >>>> The second part (more important as an incentive to the ISPs actually),
> >>>> how does the fraction of goodput vs. throughput change, when AQM
> >>>> schemes are deployed, and TCP CC reacts in a timely manner? Small ISPs
> >>>> have to pay for their upstream volume, regardless if that is "real"
> >>>> work (goodput) or unneccessary retransmissions.
> >>>>
> >>>> When I was at a small cable ISP in switzerland last week, surely
> >>>> enough bufferbloat was readily observable (17ms -> 220ms after 30 sec
> >>>> of a bulk transfer), but at first they had the "not our problem" view,
> >>>> until I started discussing burst loss / retransmissions / goodput vs
> >>>> throughput - with the latest point being a real commercial incentive
> >>>> to them. (They promised to check if AQM would be available in the CPE
> >>>> / CMTS, and put latency bounds in their tenders going forward).
> >>>>
> >>> I wish I had a good answer to your very good questions. Simulation
> >>> would be interesting though real daa is more convincing.
> >>>
> >>> I haven't looked in detail at all that many traces to try to get a feel
> >>> for how much bandwidth waste there actually is, and more formal studies
> >>> like Netalyzr, SamKnows, or the Bismark project would be needed to
> >>> quantify the loss on the network as a whole.
> >>>
> >>> I did spend some time last fall with the traces I've taken. In those,
> >>> I've typically been seeing 1-3% packet loss in the main TCP transfers.
> >>> On the wireless trace I took, I saw 9% loss, but whether that is
> >>> bufferbloat induced loss or not, I don't know (the data is out there for
> >>> those who might want to dig). And as you note, the losses are
> >>> concentrated in bursts (probably due to the details of Cubic, so I'm
> >>> told).
> >>>
> >>> I've had anecdotal reports (and some first hand experience) with much
> >>> higher loss rates, for example from Nick Weaver at ICSI; but I believe
> >>> in playing things conservatively with any numbers I quote and I've not
> >>> gotten consistent results when I've tried, so I just report what's in
> >>> the packet captures I did take.
> >>>
> >>> A phenomena that could be occurring is that during congestion avoidance
> >>> (until TCP loses its cookies entirely and probes for a higher operating
> >>> point) that TCP is carefully timing it's packets to keep the buffers
> >>> almost exactly full, so that competing flows (in my case, simple pings)
> >>> are likely to arrive just when there is no buffer space to accept them
> >>> and therefore you see higher losses on them than you would on the single
> >>> flow I've been tracing and getting loss statistics from.
> >>>
> >>> People who want to look into this further would be a great help.
> >>> - Jim
> >>
> >> I would not put a lot of trust in measuring loss with pings.
> >> I heard that some ISP's do different processing on ICMP's used
> >> for ping packets. They either prioritize them high to provide
> >> artificially good response (better marketing numbers); or
> >> prioritize them low since they aren't useful traffic.
> >> There are also filters that only allow N ICMP requests per second
> >> which means repeated probes will be dropped.
> >>
> >> --
> >> _______________________________________________
> >> Bloat mailing list
> >> Bloat@lists.bufferbloat.net
> >> https://lists.bufferbloat.net/listinfo/bloat
> >
> > _______________________________________________
> > Bloat mailing list
> > Bloat@lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/bloat
>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Bloat] Burst Loss
  2011-05-09 18:06           ` Rick Jones
@ 2011-05-12 16:31             ` Fred Baker
  2011-05-13  5:00               ` Kevin Gross
  0 siblings, 1 reply; 8+ messages in thread
From: Fred Baker @ 2011-05-12 16:31 UTC (permalink / raw)
  To: rick.jones2; +Cc: Stephen Hemminger, bloat

On May 9, 2011, at 11:06 AM, Rick Jones wrote:

> GSO/TSO can be thought of as a symptom of standards bodies (eg the IEEE)
> refusing to standardize an increase in frame sizes. Put another way,
> they are a "poor man's jumbo frames."

I'll agree, but only half; once the packets are transferred on the local
wire, any jumbo-ness is lost. GSO/TSO mostly squeezes interframe gaps out
of the wire and perhaps limits the amount of work the driver has to do.
The real value of an end to end (IP) jumbo frame is that the receiving
system experiences less interrupt load - a 9K frame replaces half a dozen
1500 byte frames, and as a result the receiver experiences 1/5 or 1/6 of
the interrupts. Given that it has to save state, activate the kernel
thread, and at least enqueue and perhaps acknowledge the received message,
reducing interrupt load on the receiver makes it far more effective. This
has the greatest effect on multi-gigabit file transfers.

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Bloat] Burst Loss
  2011-05-12 16:31             ` Fred Baker
@ 2011-05-13  5:00               ` Kevin Gross
  2011-05-13 14:35                 ` Rick Jones
  0 siblings, 1 reply; 8+ messages in thread
From: Kevin Gross @ 2011-05-13 5:00 UTC (permalink / raw)
  To: bloat

[-- Attachment #1: Type: text/plain, Size: 1571 bytes --]

One of the principal reasons jumbo frames have not been standardized is due
to latency concerns. I assume this group can appreciate the IEEE holding
ground on this.

For a short time, servers with gigabit NICs suffered but smarter NICs were
developed (TSO, LRO, other TLAs) and OSs upgraded to support them and I
believe it is no longer a significant issue.

Kevin Gross

On Thu, May 12, 2011 at 10:31 AM, Fred Baker <fred@cisco.com> wrote:
>
> On May 9, 2011, at 11:06 AM, Rick Jones wrote:
>
> > GSO/TSO can be thought of as a symptom of standards bodies (eg the IEEE)
> > refusing to standardize an increase in frame sizes. Put another way,
> > they are a "poor man's jumbo frames."
>
> I'll agree, but only half; once the packets are transferred on the local
> wire, any jumbo-ness is lost. GSO/TSO mostly squeezes interframe gaps out of
> the wire and perhaps limits the amount of work the driver has to do. The
> real value of an end to end (IP) jumbo frame is that the receiving system
> experiences less interrupt load - a 9K frame replaces half a dozen 1500 byte
> frames, and as a result the receiver experiences 1/5 or 1/6 of the
> interrupts. Given that it has to save state, activate the kernel thread, and
> at least enqueue and perhaps acknowledge the received message, reducing
> interrupt load on the receiver makes it far more effective. This has the
> greatest effect on multi-gigabit file transfers.
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat
>

[-- Attachment #2: Type: text/html, Size: 1997 bytes --]

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Bloat] Burst Loss
  2011-05-13  5:00               ` Kevin Gross
@ 2011-05-13 14:35                 ` Rick Jones
  2011-05-13 14:54                   ` Dave Taht
  0 siblings, 1 reply; 8+ messages in thread
From: Rick Jones @ 2011-05-13 14:35 UTC (permalink / raw)
  To: Kevin Gross; +Cc: bloat

On Thu, 2011-05-12 at 23:00 -0600, Kevin Gross wrote:
> One of the principal reasons jumbo frames have not been standardized
> is due to latency concerns. I assume this group can appreciate the
> IEEE holding ground on this.

Thusfar at least, bloaters are fighting to eliminate 10s of milliseconds
of queuing delay. I don't think this list is worrying about the tens of
microseconds difference between the transmission time of a 9000 byte
frame at 1 GbE vs a 1500 byte frame, or the single digit microseconds
difference at 10 GbE. The "lets try to get onto the Top 500 list" crowd
might, but official sanction for a 9000 byte MTU (or larger) doesn't mean
it *must* be used.

> For a short time, servers with gigabit NICs suffered but smarter NICs
> were developed (TSO, LRO, other TLAs) and OSs upgraded to support them
> and I believe it is no longer a significant issue.

Are TSO and LRO going to be sufficient at 40 and 100 GbE? Cores aren't
getting any faster. Only more plentiful. And while it isn't the strongest
point in the world, one might even argue that the need to use TSO/LRO to
achieve performance hinders new transport protocol adoption - the presence
of NIC offloads for only TCP (or UDP) leaves a new transport protocol
(perhaps SCTP) at a disadvantage.

rick jones

> Kevin Gross
>
> On Thu, May 12, 2011 at 10:31 AM, Fred Baker <fred@cisco.com> wrote:
>
> > On May 9, 2011, at 11:06 AM, Rick Jones wrote:
> >
> > > GSO/TSO can be thought of as a symptom of standards bodies (eg the IEEE)
> > > refusing to standardize an increase in frame sizes. Put another way,
> > > they are a "poor man's jumbo frames."
> >
> > I'll agree, but only half; once the packets are transferred on the local
> > wire, any jumbo-ness is lost. GSO/TSO mostly squeezes interframe gaps out
> > of the wire and perhaps limits the amount of work the driver has to do.
> > The real value of an end to end (IP) jumbo frame is that the receiving
> > system experiences less interrupt load - a 9K frame replaces half a dozen
> > 1500 byte frames, and as a result the receiver experiences 1/5 or 1/6 of
> > the interrupts. Given that it has to save state, activate the kernel
> > thread, and at least enqueue and perhaps acknowledge the received message,
> > reducing interrupt load on the receiver makes it far more effective. This
> > has the greatest effect on multi-gigabit file transfers.
> > _______________________________________________
> > Bloat mailing list
> > Bloat@lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/bloat
>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Bloat] Burst Loss
  2011-05-13 14:35                 ` Rick Jones
@ 2011-05-13 14:54                   ` Dave Taht
  2011-05-13 20:03                     ` [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss) Kevin Gross
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Taht @ 2011-05-13 14:54 UTC (permalink / raw)
  To: rick.jones2; +Cc: bloat

[-- Attachment #1: Type: text/plain, Size: 1280 bytes --]

On Fri, May 13, 2011 at 8:35 AM, Rick Jones <rick.jones2@hp.com> wrote:
> On Thu, 2011-05-12 at 23:00 -0600, Kevin Gross wrote:
> > One of the principal reasons jumbo frames have not been standardized
> > is due to latency concerns. I assume this group can appreciate the
> > IEEE holding ground on this.
>
> Thusfar at least, bloaters are fighting to eliminate 10s of milliseconds
> of queuing delay. I don't think this list is worrying about the tens of
> microseconds difference between the transmission time of a 9000 byte
> frame at 1 GbE vs a 1500 byte frame, or the single digit microseconds
> difference at 10 GbE.

Heh. With the first iteration of the bismark project I'm trying to get to
where I have less than 30ms latency under load and have far larger problems
to worry about than jumbo frames. I'll be lucky to manage 1/10th that
(300ms) at this point.

Not, incidentally that I mind the idea of jumbo frames. It seems silly to
be saddled with default frame sizes that made sense in the 70s, and in an
age where we will be seeing ever more packet encapsulation, reducing the
header size as a ratio to data size strikes me as a very worthy goal.

--
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://the-edge.blogspot.com

[-- Attachment #2: Type: text/html, Size: 1699 bytes --]

^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss)
  2011-05-13 14:54                   ` Dave Taht
@ 2011-05-13 20:03                     ` Kevin Gross
  2011-05-14 20:48                       ` Fred Baker
  0 siblings, 1 reply; 8+ messages in thread
From: Kevin Gross @ 2011-05-13 20:03 UTC (permalink / raw)
  To: bloat

[-- Attachment #1: Type: text/plain, Size: 2529 bytes --]

Do we think that bufferbloat is just a WAN problem? I work on live media
applications for LANs and campus networks. I'm seeing what I think could be
characterized as bufferbloat in LAN equipment. The timescales on 1 Gb
Ethernet are orders of magnitude shorter and the performance problems
caused are in many cases a bit different but root cause and potential
solutions are, I'm hoping, very similar.

Keeping the frame byte size small while the frame time has shrunk maintains
the overhead at the same level. Again, this has been a conscious decision
not a stubborn relic. Ethernet improvements have increased bandwidth by
orders of magnitude. Do we really need to increase it by a couple
percentage points more by reducing overhead for large payloads?

The cost of that improved marginal bandwidth efficiency is a 6x increase in
latency. Many applications would not notice an increase from 12 us to 72 us
for a Gigabit switch hop. But on a large network it adds up, some
applications are absolutely that sensitive (transaction processing, cluster
computing, SANs) and (I thought I'd be preaching to the choir here) there's
no way to ever recover the lost performance.

Kevin Gross

From: Dave Taht [mailto:dave.taht@gmail.com]
Sent: Friday, May 13, 2011 8:54 AM
To: rick.jones2@hp.com
Cc: Kevin Gross; bloat@lists.bufferbloat.net
Subject: Re: [Bloat] Burst Loss

On Fri, May 13, 2011 at 8:35 AM, Rick Jones <rick.jones2@hp.com> wrote:
On Thu, 2011-05-12 at 23:00 -0600, Kevin Gross wrote:
> One of the principal reasons jumbo frames have not been standardized
> is due to latency concerns. I assume this group can appreciate the
> IEEE holding ground on this.

Thusfar at least, bloaters are fighting to eliminate 10s of milliseconds
of queuing delay. I don't think this list is worrying about the tens of
microseconds difference between the transmission time of a 9000 byte
frame at 1 GbE vs a 1500 byte frame, or the single digit microseconds
difference at 10 GbE.

Heh. With the first iteration of the bismark project I'm trying to get to
where I have less than 30ms latency under load and have far larger problems
to worry about than jumbo frames. I'll be lucky to manage 1/10th that
(300ms) at this point.

Not, incidentally that I mind the idea of jumbo frames. It seems silly to
be saddled with default frame sizes that made sense in the 70s, and in an
age where we will be seeing ever more packet encapsulation, reducing the
header size as a ratio to data size strikes me as a very worthy goal.

[-- Attachment #2: Type: text/html, Size: 8491 bytes --]

^ permalink raw reply [flat|nested] 8+ messages in thread
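
The 12 us and 72 us figures quoted above are simply the store-and-forward
serialization delay of a standard versus a jumbo frame at 1 Gbit/s:

    for frame_bytes in (1500, 9000):        # standard vs. jumbo frame (preamble/IFG ignored)
        delay_us = frame_bytes * 8 / 1e9 * 1e6
        print(f"{frame_bytes} byte frame at 1 Gbit/s -> {delay_us:.0f} us per store-and-forward hop")
    # 1500 B -> 12 us, 9000 B -> 72 us: the 6x increase per switch hop
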
* Re: [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss)
  2011-05-13 20:03                     ` [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss) Kevin Gross
@ 2011-05-14 20:48                       ` Fred Baker
  2011-05-15 18:28                         ` Jonathan Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Fred Baker @ 2011-05-14 20:48 UTC (permalink / raw)
  To: Kevin Gross; +Cc: bloat

[-- Attachment #1: Type: text/plain, Size: 5084 bytes --]

On May 13, 2011, at 1:03 PM, Kevin Gross wrote:

> Do we think that bufferbloat is just a WAN problem? I work on live media
> applications for LANs and campus networks. I'm seeing what I think could
> be characterized as bufferbloat in LAN equipment. The timescales on 1 Gb
> Ethernet are orders of magnitude shorter and the performance problems
> caused are in many cases a bit different but root cause and potential
> solutions are, I'm hoping, very similar.

Bufferbloat is most noticeable on WANs, because they have longer delays,
but yes LAN equipment does the same thing. It shows up as extended delay or
as an increase in loss rates. A lot of LAN equipment has very shallow
buffers due to cost (LAN markets are very cost-sensitive). One myth with
bufferbloat is that a reasonable solution is to make the buffer shallow;
no, because when the queue fills you now have an increased loss rate, which
shows up in timeout-driven retransmissions - you really want a deep buffer
(for bursts and temporary surges) that you keep shallow using AQM
techniques.

> Keeping the frame byte size small while the frame time has shrunk
> maintains the overhead at the same level. Again, this has been a conscious
> decision not a stubborn relic. Ethernet improvements have increased
> bandwidth by orders of magnitude. Do we really need to increase it by a
> couple percentage points more by reducing overhead for large payloads?

You might talk with folks who do the LAN Speed records. They generally view
end to end jumboframes as material to the achievement. It's not about
changing the serialization delay, it's about changing the amount of
processing at the endpoints.

> The cost of that improved marginal bandwidth efficiency is a 6x increase
> in latency. Many applications would not notice an increase from 12 us to
> 72 us for a Gigabit switch hop. But on a large network it adds up, some
> applications are absolutely that sensitive (transaction processing,
> cluster computing, SANs) and (I thought I'd be preaching to the choir
> here) there's no way to ever recover the lost performance.

Well, the extra delay is solvable in the transport. The question isn't
really what the impact on the network is; it's what the requirements of the
application are. For voice, if a voice sample is delayed 50 ms the jitter
buffer in the codec resolves that - microseconds are irrelevant. Video
codecs generally keep at least three video frames in their jitter buffer;
at 30 fps, that's 100 milliseconds of acceptable variation in delay.

Where it gets dicey is in elastic applications (applications using
transports with the characteristics of TCP) that are retransmitting or
otherwise reacting in timeframes comparable to the RTT and the RTT is
small, or in elastic applications in which the timeout-retransmission
interval is on the order of hundreds of milliseconds to seconds (true of
most TCPs) but the RTT is on the order of microseconds to milliseconds. In
the former, a deep queue buildup can trigger a transmission that further
builds the queue; in the latter, a hiccup can have dramatic side effects.

There is ongoing research on how best to do such things in data centers. My
suspicion is that the right approach is something akin to 802.2 at the link
layer, but with NACK retransmission - system A enumerates the data it sends
to system B, and if system B sees a number skip it asks A to retransmit the
indicated datagram. You might take a look at RFC 5401/5740/5776 for
implementation suggestions.

> Kevin Gross
>
> From: Dave Taht [mailto:dave.taht@gmail.com]
> Sent: Friday, May 13, 2011 8:54 AM
> To: rick.jones2@hp.com
> Cc: Kevin Gross; bloat@lists.bufferbloat.net
> Subject: Re: [Bloat] Burst Loss
>
> On Fri, May 13, 2011 at 8:35 AM, Rick Jones <rick.jones2@hp.com> wrote:
> On Thu, 2011-05-12 at 23:00 -0600, Kevin Gross wrote:
> > One of the principal reasons jumbo frames have not been standardized
> > is due to latency concerns. I assume this group can appreciate the
> > IEEE holding ground on this.
>
> Thusfar at least, bloaters are fighting to eliminate 10s of milliseconds
> of queuing delay. I don't think this list is worrying about the tens of
> microseconds difference between the transmission time of a 9000 byte
> frame at 1 GbE vs a 1500 byte frame, or the single digit microseconds
> difference at 10 GbE.
>
> Heh. With the first iteration of the bismark project I'm trying to get to
> where I have less than 30ms latency under load and have far larger
> problems to worry about than jumbo frames. I'll be lucky to manage 1/10th
> that (300ms) at this point.
>
> Not, incidentally that I mind the idea of jumbo frames. It seems silly to
> be saddled with default frame sizes that made sense in the 70s, and in an
> age where we will be seeing ever more packet encapsulation, reducing the
> header size as a ratio to data size strikes me as a very worthy goal.
>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat

[-- Attachment #2: Type: text/html, Size: 15595 bytes --]

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss)
  2011-05-14 20:48 ` Fred Baker
@ 2011-05-15 18:28   ` Jonathan Morton
  2011-05-15 20:49     ` Fred Baker
  0 siblings, 1 reply; 8+ messages in thread
From: Jonathan Morton @ 2011-05-15 18:28 UTC (permalink / raw)
To: Fred Baker; +Cc: bloat

On 14 May, 2011, at 11:48 pm, Fred Baker wrote:

> My suspicion is that the right approach is something akin to 802.2 at the
> link layer, but with NACK retransmission - system A enumerates the data it
> sends to system B, and if system B sees a number skip it asks A to
> retransmit the indicated datagram. You might take a look at RFC
> 5401/5740/5776 for implementation suggestions.

This sounds like "reliable datagram" semantics to me. It also sounds a lot
like ARQ as used in amateur packet radio. I believe similar mechanisms are
built into 802.11.

The fundamental thing is that the sender must be able to know when sent frames
can be flushed from the buffer because they don't need to be retransmitted. So
if there's a NACK, there must also be an ACK - at which point the ACK serves
the purpose of the NACK, as it does in TCP. The only alternative is a
wall-time TTL, which is doable on single hops but requires careful design.

Let's face it. UDP is unreliable by design - applications using it *must*
anticipate and cope with dropped and delayed packets, either by exponential
RTO or ARQ or NACK or FEC, all at the application layer. And, in a congested
network, some UDP packets *will* be lost.

TCP is reliable but needs to maintain appropriate window sizes - which it
doesn't at present, because a lossless network without ECN provides
insufficient feedback (and AQM, which is required for good ECN signals, is
usually absent), and in the quest for performance the trend has been
inexorably towards more aggressive window sizing (of which TCP-Fit is the
latest example). At the receiver end, it is possible to restrain this trend by
reducing the receive window.

Unfortunately, it's useless to expect Ethernet switches to turn on ECN. They
operate at a lower layer of the stack than IP, so they will not modify the IP
header's ECN bits. However, recent versions of Ethernet *do* support a
throttling feedback mechanism, and this can and should be exploited to tell
the edge host or router that ECN *might* be needed. Also, with throttling
feedback throughout the LAN, the Ethernet can for practical purposes be
treated as almost-reliable. This is *better* in terms of packet loss than ARQ
or NACK, although if the Ethernet's buffers are large, it will still increase
delay. (With small buffers, it will just decrease throughput to the capacity,
which is fine.)

 - Jonathan
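One way to act on Jonathan's point about restraining the receive window is to
cap the socket receive buffer, which bounds the window TCP can advertise. A
minimal sketch, assuming a Linux-like sockets API (SO_RCVBUF itself is
standard, but the effective cap and the 64 KiB figure here are illustrative
assumptions, not recommendations):

# Minimal sketch: cap the TCP receive buffer so the advertised window -
# and thus the sender's in-flight data - stays bounded.

import socket

def make_restrained_listener(port, rcvbuf=64 * 1024):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Must be set before listen()/accept() so it applies to the window
    # scaling negotiated for accepted connections.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    s.bind(("", port))
    s.listen(5)
    return s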
* Re: [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss)
  2011-05-15 18:28 ` Jonathan Morton
@ 2011-05-15 20:49   ` Fred Baker
  2011-05-16  0:31     ` Jonathan Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Fred Baker @ 2011-05-15 20:49 UTC (permalink / raw)
To: Jonathan Morton; +Cc: bloat

On May 15, 2011, at 11:28 AM, Jonathan Morton wrote:

> The fundamental thing is that the sender must be able to know when sent
> frames can be flushed from the buffer because they don't need to be
> retransmitted. So if there's a NACK, there must also be an ACK - at which
> point the ACK serves the purpose of the NACK, as it does in TCP. The only
> alternative is a wall-time TTL, which is doable on single hops but requires
> careful design.

To a point. NORM holds a frame for possible retransmission for a stated period
of time, and if retransmission isn't requested in that interval, forgets it.
So the ack isn't actually necessary; what is necessary is that the retention
interval be long enough that a nack has a high probability of succeeding in
getting the message through. A 100 Gbit interface can handle 97656 128-byte
frames per millisecond (100G / (8 * 128 * 1000)). We're looking at something
on the order of 18 bits (4 ms to retransmit without falling back to TCP) for a
rational sequence number at 100 Gbps; 16 bits would be enough at 10 Gbps, and
12 bits would be enough at 1 Gbps.

> ...recent versions of Ethernet *do* support a throttling feedback mechanism,
> and this can and should be exploited to tell the edge host or router that
> ECN *might* be needed. Also, with throttling feedback throughout the LAN,
> the Ethernet can for practical purposes be treated as almost-reliable. This
> is *better* in terms of packet loss than ARQ or NACK, although if the
> Ethernet's buffers are large, it will still increase delay. (With small
> buffers, it will just decrease throughput to the capacity, which is fine.)

It increases the delay anyway. It just pushes the retention buffer to another
place. What do you think the packet is doing during the "don't transmit"
interval?

Throughput never exceeds capacity. If I have a 10 Gbps link, I will never get
more than 10 Gbps through it. Buffer fill rate is statistically predictable.
With small buffers, the fill rate reaches the top sooner. They increase the
probability that the buffers are full, which is to say the drop probability.
Which puts us back to an end-to-end retransmission, which is the worst case of
what you were worried about.

I'm not going to argue against letting retransmission go end to end; it's an
endless debate. I'll simply note that several link layers, including but not
limited to those you mention, find that applications using them work better if
there is a high probability of retransmission in an interval on the order of
the link RTT as opposed to the end-to-end RTT. You brought up data centers
(aka variable delays in LAN networks); those have been heavily the province of
Fibre Channel, which is a link layer protocol with retransmission. Think about
it.
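As a check on those figures, here is the arithmetic spelled out - a rough
worked example assuming 128-byte frames and a 4 ms retention window, as in the
paragraph above (note that the 100 Gbps case strictly rounds up to 19 bits;
"on the order of 18 bits" is the right ballpark):

# Rough worked example of the sequence-number sizing above, assuming
# 128-byte frames and a 4 ms link-layer retransmit window.

import math

FRAME_BITS = 128 * 8          # 128-byte frames
RETENTION_S = 0.004           # 4 ms to request a link-layer retransmit

for rate_bps in (1e9, 10e9, 100e9):
    frames_per_ms = rate_bps / FRAME_BITS / 1000
    outstanding = rate_bps / FRAME_BITS * RETENTION_S
    bits = math.ceil(math.log2(outstanding))
    print(f"{rate_bps/1e9:>5.0f} Gbps: {frames_per_ms:>8.0f} frames/ms, "
          f"{outstanding:>8.0f} frames in 4 ms -> {bits} sequence bits")

#   1 Gbps:   ~977 frames/ms,   ~3906 outstanding -> 12 bits
#  10 Gbps:  ~9766 frames/ms,  ~39063 outstanding -> 16 bits
# 100 Gbps: ~97656 frames/ms, ~390625 outstanding -> 19 bits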
* Re: [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss)
  2011-05-15 20:49 ` Fred Baker
@ 2011-05-16  0:31   ` Jonathan Morton
  2011-05-16  7:51     ` Richard Scheffenegger
  0 siblings, 1 reply; 8+ messages in thread
From: Jonathan Morton @ 2011-05-16 0:31 UTC (permalink / raw)
To: Fred Baker; +Cc: bloat

On 15 May, 2011, at 11:49 pm, Fred Baker wrote:

> On May 15, 2011, at 11:28 AM, Jonathan Morton wrote:
>> The fundamental thing is that the sender must be able to know when sent
>> frames can be flushed from the buffer because they don't need to be
>> retransmitted. So if there's a NACK, there must also be an ACK - at which
>> point the ACK serves the purpose of the NACK, as it does in TCP. The only
>> alternative is a wall-time TTL, which is doable on single hops but requires
>> careful design.
>
> To a point. NORM holds a frame for possible retransmission for a stated
> period of time, and if retransmission isn't requested in that interval,
> forgets it. So the ack isn't actually necessary; what is necessary is that
> the retention interval be long enough that a nack has a high probability of
> succeeding in getting the message through.

Okay, so because it can fall back to TCP's retransmit, the retention
requirements can be relaxed.

>> ...recent versions of Ethernet *do* support a throttling feedback
>> mechanism, and this can and should be exploited to tell the edge host or
>> router that ECN *might* be needed. Also, with throttling feedback
>> throughout the LAN, the Ethernet can for practical purposes be treated as
>> almost-reliable. This is *better* in terms of packet loss than ARQ or
>> NACK, although if the Ethernet's buffers are large, it will still increase
>> delay. (With small buffers, it will just decrease throughput to the
>> capacity, which is fine.)
>
> It increases the delay anyway. It just pushes the retention buffer to
> another place. What do you think the packet is doing during the "don't
> transmit" interval?

Most packets delayed by Ethernet throttling would, with small buffers, end up
waiting in the sending host (or router). They thus spend more time in a
potentially active queue instead of in a dumb one. But even if the host queue
is dumb, the overall delay is no worse than with the larger Ethernet buffers.

> Throughput never exceeds capacity. If I have a 10 Gbps link, I will never
> get more than 10 Gbps through it. Buffer fill rate is statistically
> predictable. With small buffers, the fill rate reaches the top sooner. They
> increase the probability that the buffers are full, which is to say the
> drop probability. Which puts us back to an end-to-end retransmission, which
> is the worst case of what you were worried about.

Let's suppose someone has generously provisioned an office with GigE
throughout, using a two-level hierarchy of switches. Some dumb schmuck then
schedules every single computer to run its backups (to a single fileserver) at
the same time. That's, say, 100 computers all competing for one GigE link to
the fileserver. If the switches are fair, each computer should get 10 Mbps -
that's the capacity.

With throttling, each computer sees the link closed 99% of the time. It can
send at link rate for the remaining 1% of the time. On medium timescales, that
looks like a 10 Mbps bottleneck at the first link. So the throughput on that
link equals the capacity, and hopefully the goodput is also thus. The only
queue that is likely to overflow is the one on the sending computer, and one
would hope there is enough feedback in a host's own TCP/IP stack to prevent
that.
Without throttling but with ARQ, NACK or whatever you want to call it, the
host has no signal to tell it to slow down - so the throughput on the edge
link is more than 10 Mbps (but the goodput will be less). The buffer in the
outer switch fills up - no matter how big or small it is - and starts dropping
packets. The switch then won't ask for retransmission of packets it's just
dropped, because it has nowhere to put them. The same process then repeats at
the inner switch. Finally, the server sees the missing packets and asks for
the retransmission - but these requests have to be switched all the way back
to the clients, because the missing packets aren't in the switches' buffers.
It's therefore no better than a TCP SACK retransmission.

So there you have a classic congested-network scenario in which throttling
solves the problem, but link-level retransmission can't.

Where ARQ and/or NACK come in handy is where the link itself is unreliable,
such as on WLANs (hence the use in amateur radio) and last-mile links. In that
case, the reason for the packet loss is not a full receive buffer, so asking
for a retransmission is not inherently self-defeating.

> I'm not going to argue against letting retransmission go end to end; it's
> an endless debate. I'll simply note that several link layers, including but
> not limited to those you mention, find that applications using them work
> better if there is a high probability of retransmission in an interval on
> the order of the link RTT as opposed to the end-to-end RTT. You brought up
> data centers (aka variable delays in LAN networks); those have been heavily
> the province of Fibre Channel, which is a link layer protocol with
> retransmission. Think about it.

What I'd like to see is a complete absence of need for retransmission on a
properly built wired network. Obviously the capability still needs to be there
to cope with the parts that aren't properly built or aren't wired, but TCP can
do that. Throttling (in the form of Ethernet PAUSE) is simply the third
possible method of signalling congestion in the network, alongside delay and
loss - and it happens to be quite widely deployed already.

 - Jonathan
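For a sense of the timescales PAUSE can impose: an 802.3x PAUSE frame
expresses its pause time in quanta of 512 bit times, so the quantum and the
maximum single pause depend on link speed. A quick worked calculation
(illustrative only):

# 802.3x PAUSE arithmetic: pause_time is a 16-bit count of 512-bit-time
# quanta, so the maximum single pause shrinks as the link gets faster.

def pause_quantum_seconds(link_bps):
    return 512 / link_bps            # one quantum = 512 bit times

for name, bps in (("100 Mbps", 100e6), ("1 GbE", 1e9), ("10 GbE", 10e9)):
    q = pause_quantum_seconds(bps)
    print(f"{name}: quantum = {q*1e9:.0f} ns, "
          f"max pause (0xFFFF quanta) = {q * 0xFFFF * 1e3:.2f} ms")

# 100 Mbps: quantum = 5120 ns, max pause ~ 335.5 ms
# 1 GbE:    quantum =  512 ns, max pause ~  33.6 ms
# 10 GbE:   quantum =   51 ns, max pause ~   3.4 ms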
* Re: [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss)
  2011-05-16  0:31 ` Jonathan Morton
@ 2011-05-16  7:51   ` Richard Scheffenegger
  2011-05-16  9:49     ` Fred Baker
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Scheffenegger @ 2011-05-16 7:51 UTC (permalink / raw)
To: Jonathan Morton, Fred Baker; +Cc: bloat

Jonathan,

> What I'd like to see is a complete absence of need for retransmission on a
> properly built wired network. Obviously the capability still needs to be
> there to cope with the parts that aren't properly built or aren't wired,
> but TCP can do that. Throttling (in the form of Ethernet PAUSE) is simply
> the third possible method of signalling congestion in the network,
> alongside delay and loss - and it happens to be quite widely deployed
> already.

Two comments:

TCP currently can NOT deal properly with non-congestion loss (in other words,
any loss will lead to a congestion control reaction - a reduction of the
sending rate). TCP can only (mostly) deal with the recovery part in a
hopefully timely fashion. In this area you'll find a high number of possible
approaches, none of which is quite backwards-compatible with "standard" TCP.

Second, you wouldn't want to deploy basic 802.3x to any network consisting of
more than a single switch. If you do, you can run into an effect called
congestion tree formation, where (simplified) the slowest receiver determines
the global speed of your Ethernet network. 802.1Qbb is also prone to
congestion trees, even though the probability is somewhat reduced provided all
priority classes are being used. Unfortunately, most traffic is in the same
802.1p class...

Adequate solutions (more complex than the FCP buffer-credit based congestion
avoidance), like 802.1Qau / QCN, are not available commercially afaik. (They
need new NICs and new switches for the HW support.)

But I agree, an L3 device should be able to distribute L2 congestion
information into the L3 header (even though today, cheap generic Broadcom and
perhaps even Realtek chipsets support ECN marking even when they are running
as an L2 switch; a special firmware (see the DCTCP papers) is required
though).

Best regards,
   Richard
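For reference, the sender-side reaction described in the DCTCP papers Richard
mentions scales the window back in proportion to the fraction of ECN-marked
packets, rather than halving on any single mark. A minimal sketch of that
update rule (illustrative only, not a full implementation; the gain value
follows the published experiments):

# Minimal sketch of the DCTCP sender reaction: estimate the fraction of
# ECN-marked packets per window (alpha) and cut cwnd in proportion to it,
# instead of TCP's blanket halving on any congestion signal.

class DctcpState:
    def __init__(self, g=1.0 / 16):
        self.g = g          # EWMA gain, as used in the DCTCP paper
        self.alpha = 0.0    # running estimate of the marking fraction

    def on_window_of_acks(self, acked_packets, marked_packets, cwnd):
        frac = marked_packets / max(acked_packets, 1)
        self.alpha = (1 - self.g) * self.alpha + self.g * frac
        if marked_packets:
            cwnd = max(cwnd * (1 - self.alpha / 2), 1.0)   # proportional backoff
        return cwnd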
* Re: [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss)
  2011-05-16  7:51 ` Richard Scheffenegger
@ 2011-05-16  9:49   ` Fred Baker
  2011-05-16 11:23     ` [Bloat] Jumbo frames and LAN buffers Jim Gettys
  0 siblings, 1 reply; 8+ messages in thread
From: Fred Baker @ 2011-05-16 9:49 UTC (permalink / raw)
To: Richard Scheffenegger; +Cc: bloat

On May 16, 2011, at 9:51 AM, Richard Scheffenegger wrote:

> Second, you wouldn't want to deploy basic 802.3x to any network consisting
> of more than a single switch.

actually, it's pretty common practice. Three layers, even. People build
backbones, and then ring them with workgroup switches, and then put small
switches on their desks.
* Re: [Bloat] Jumbo frames and LAN buffers
  2011-05-16  9:49 ` Fred Baker
@ 2011-05-16 11:23   ` Jim Gettys
  2011-05-16 13:15     ` Kevin Gross
  0 siblings, 1 reply; 8+ messages in thread
From: Jim Gettys @ 2011-05-16 11:23 UTC (permalink / raw)
To: bloat

On 05/16/2011 05:49 AM, Fred Baker wrote:
> On May 16, 2011, at 9:51 AM, Richard Scheffenegger wrote:
>
>> Second, you wouldn't want to deploy basic 802.3x to any network consisting
>> of more than a single switch.
> actually, it's pretty common practice. Three layers, even. People build
> backbones, and then ring them with workgroup switches, and then put small
> switches on their desks.
>
Not necessarily out of knowledge or desire (since it isn't usually
controllable in the small switches you buy for home). It can cause trouble
even in small environments such as your house.

http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html

I know I'm at least three consumer switches deep, and it's not by choice.
 - Jim
* Re: [Bloat] Jumbo frames and LAN buffers
  2011-05-16 11:23 ` [Bloat] Jumbo frames and LAN buffers Jim Gettys
@ 2011-05-16 13:15   ` Kevin Gross
  2011-05-16 13:22     ` Jim Gettys
  2011-05-16 18:36     ` Richard Scheffenegger
  0 siblings, 2 replies; 8+ messages in thread
From: Kevin Gross @ 2011-05-16 13:15 UTC (permalink / raw)
To: bloat

All the stand-alone switches I've looked at recently either do not support
802.3x or support it in the (desirable) manner described in the last paragraph
of the linked blog post. I don't believe Ethernet flow control is a factor in
current LANs. I'd be interested to know the specifics if anyone sees it
differently.

My understanding is that 802.1Qau, "lossless Ethernet", was designed primarily
to allow Fibre Channel to be carried over 10 GbE so that SAN and LAN can share
a common infrastructure in datacenters. I don't believe anyone intends for it
to be enabled for traffic classes carrying TCP.

Kevin Gross

-----Original Message-----
From: bloat-bounces@lists.bufferbloat.net
[mailto:bloat-bounces@lists.bufferbloat.net] On Behalf Of Jim Gettys
Sent: Monday, May 16, 2011 5:24 AM
To: bloat@lists.bufferbloat.net
Subject: Re: [Bloat] Jumbo frames and LAN buffers

Not necessarily out of knowledge or desire (since it isn't usually
controllable in the small switches you buy for home). It can cause trouble
even in small environments such as your house.

http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html

I know I'm at least three consumer switches deep, and it's not by choice.
 - Jim
* Re: [Bloat] Jumbo frames and LAN buffers
  2011-05-16 13:15 ` Kevin Gross
@ 2011-05-16 13:22   ` Jim Gettys
  2011-05-16 13:42     ` Kevin Gross
       [not found]     ` <-854731558634984958@unknownmsgid>
  1 sibling, 2 replies; 8+ messages in thread
From: Jim Gettys @ 2011-05-16 13:22 UTC (permalink / raw)
To: bloat

On 05/16/2011 09:15 AM, Kevin Gross wrote:
> All the stand-alone switches I've looked at recently either do not support
> 802.3x or support it in the (desirable) manner described in the last
> paragraph of the linked blog post. I don't believe Ethernet flow control is
> a factor in current LANs. I'd be interested to know the specifics if anyone
> sees it differently.

Heh. Plug wireshark into current off-the-shelf cheap consumer switches
intended for the home. You won't like what you see. And you have no way to
manage them. I was quite surprised last fall when doing my home experiments to
see 802.3x pause frames; I had been blissfully unaware of their existence, and
had to go read up on them as a result.

I don't think any of the enterprise switches are so brain damaged. So I
suspect it's mostly lurking to cause trouble in home and small office
environments, exactly where no-one will know what's going on.
 - Jim

> My understanding is that 802.1Qau, "lossless Ethernet", was designed
> primarily to allow Fibre Channel to be carried over 10 GbE so that SAN and
> LAN can share a common infrastructure in datacenters. I don't believe
> anyone intends for it to be enabled for traffic classes carrying TCP.
>
> Kevin Gross
>
> -----Original Message-----
> From: bloat-bounces@lists.bufferbloat.net
> [mailto:bloat-bounces@lists.bufferbloat.net] On Behalf Of Jim Gettys
> Sent: Monday, May 16, 2011 5:24 AM
> To: bloat@lists.bufferbloat.net
> Subject: Re: [Bloat] Jumbo frames and LAN buffers
>
> Not necessarily out of knowledge or desire (since it isn't usually
> controllable in the small switches you buy for home). It can cause trouble
> even in small environments such as your house.
>
> http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html
>
> I know I'm at least three consumer switches deep, and it's not by choice.
> - Jim
>
>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat
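Watching for these frames, as Jim suggests, works with any capture tool that
can see a mirrored or inline port. As one hedged illustration, a small Python
sketch (assuming the scapy package is installed; the interface name is a
placeholder) that flags frames sent to the reserved MAC-control multicast
address used for PAUSE:

# Hypothetical helper to spot 802.3x PAUSE frames: they are addressed to
# the reserved multicast 01:80:c2:00:00:01 with EtherType 0x8808.

from scapy.all import sniff

def report(pkt):
    print(f"PAUSE-class frame from {pkt.src}, {len(pkt)} bytes")

# Requires root privileges; "eth0" is an illustrative interface name.
sniff(iface="eth0", filter="ether dst 01:80:c2:00:00:01",
      prn=report, store=False)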
* Re: [Bloat] Jumbo frames and LAN buffers
  2011-05-16 13:22 ` Jim Gettys
@ 2011-05-16 13:42   ` Kevin Gross
  2011-05-16 15:23     ` Jim Gettys
  [not found]   ` <-854731558634984958@unknownmsgid>
  1 sibling, 1 reply; 8+ messages in thread
From: Kevin Gross @ 2011-05-16 13:42 UTC (permalink / raw)
To: bloat

I would like to try this. Can you suggest specific equipment to look at? Due
to integration and low port count, most of the cheap consumer stuff has
surprisingly good layer-2 performance. I've tested a bunch of Linksys and
other small/medium business 5 to 24 port gigabit switches. Since I measure
latency, I expect I would have noticed if flow control were kicking in.

Kevin Gross

-----Original Message-----
From: bloat-bounces@lists.bufferbloat.net
[mailto:bloat-bounces@lists.bufferbloat.net] On Behalf Of Jim Gettys
Sent: Monday, May 16, 2011 7:23 AM
To: bloat@lists.bufferbloat.net
Subject: Re: [Bloat] Jumbo frames and LAN buffers

On 05/16/2011 09:15 AM, Kevin Gross wrote:
> All the stand-alone switches I've looked at recently either do not support
> 802.3x or support it in the (desirable) manner described in the last
> paragraph of the linked blog post. I don't believe Ethernet flow control is
> a factor in current LANs. I'd be interested to know the specifics if anyone
> sees it differently.

Heh. Plug wireshark into current off-the-shelf cheap consumer switches
intended for the home. You won't like what you see. And you have no way to
manage them. I was quite surprised last fall when doing my home experiments to
see 802.3x pause frames; I had been blissfully unaware of their existence, and
had to go read up on them as a result.

I don't think any of the enterprise switches are so brain damaged. So I
suspect it's mostly lurking to cause trouble in home and small office
environments, exactly where no-one will know what's going on.
 - Jim
* Re: [Bloat] Jumbo frames and LAN buffers
  2011-05-16 13:42 ` Kevin Gross
@ 2011-05-16 15:23   ` Jim Gettys
  0 siblings, 0 replies; 8+ messages in thread
From: Jim Gettys @ 2011-05-16 15:23 UTC (permalink / raw)
To: bloat

On 05/16/2011 09:42 AM, Kevin Gross wrote:
> I would like to try this. Can you suggest specific equipment to look at?
> Due to integration and low port count, most of the cheap consumer stuff has
> surprisingly good layer-2 performance. I've tested a bunch of Linksys and
> other small/medium business 5 to 24 port gigabit switches. Since I measure
> latency, I expect I would have noticed if flow control were kicking in.

I think I was using a D-Link DGS2208 (8 port consumer switch). I then went and
looked at the spec sheets of some of the other consumer kit out there and
found they all had the "feature" of 802.3x flow control.

I may have been using iperf to tickle it, rather than ssh. I was also playing
around with an old 100 Mbps switch, as documented in my blog; I don't remember
if I saw it there.
 - Jim

> Kevin Gross
>
> -----Original Message-----
> From: bloat-bounces@lists.bufferbloat.net
> [mailto:bloat-bounces@lists.bufferbloat.net] On Behalf Of Jim Gettys
> Sent: Monday, May 16, 2011 7:23 AM
> To: bloat@lists.bufferbloat.net
> Subject: Re: [Bloat] Jumbo frames and LAN buffers
>
> On 05/16/2011 09:15 AM, Kevin Gross wrote:
>> All the stand-alone switches I've looked at recently either do not support
>> 802.3x or support it in the (desirable) manner described in the last
>> paragraph of the linked blog post. I don't believe Ethernet flow control
>> is a factor in current LANs. I'd be interested to know the specifics if
>> anyone sees it differently.
>
> Heh. Plug wireshark into current off-the-shelf cheap consumer switches
> intended for the home. You won't like what you see. And you have no way to
> manage them. I was quite surprised last fall when doing my home experiments
> to see 802.3x pause frames; I had been blissfully unaware of their
> existence, and had to go read up on them as a result.
>
> I don't think any of the enterprise switches are so brain damaged. So I
> suspect it's mostly lurking to cause trouble in home and small office
> environments, exactly where no-one will know what's going on.
>  - Jim
>
>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat
[parent not found: <-854731558634984958@unknownmsgid>]
* Re: [Bloat] Jumbo frames and LAN buffers
  [not found] ` <-854731558634984958@unknownmsgid>
@ 2011-05-16 13:45   ` Dave Taht
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Taht @ 2011-05-16 13:45 UTC (permalink / raw)
To: Kevin Gross; +Cc: bloat

On Mon, May 16, 2011 at 7:42 AM, Kevin Gross <kevin.gross@avanw.com> wrote:

> I would like to try this. Can you suggest specific equipment to look at?
> Due to integration and low port count, most of the cheap consumer stuff has
> surprisingly good layer-2 performance. I've tested a bunch of Linksys and
> other small/medium business 5 to 24 port gigabit switches. Since I measure
> latency, I expect I would have noticed if flow control were kicking in.

I would certainly appreciate more people looking at the switch in the
wndr3700v2 we're using on the bismark project. I'm seeing some pretty deep
buffering on it.

> Kevin Gross
>
> -----Original Message-----
> From: bloat-bounces@lists.bufferbloat.net
> [mailto:bloat-bounces@lists.bufferbloat.net] On Behalf Of Jim Gettys
> Sent: Monday, May 16, 2011 7:23 AM
> To: bloat@lists.bufferbloat.net
> Subject: Re: [Bloat] Jumbo frames and LAN buffers
>
> On 05/16/2011 09:15 AM, Kevin Gross wrote:
>> All the stand-alone switches I've looked at recently either do not support
>> 802.3x or support it in the (desirable) manner described in the last
>> paragraph of the linked blog post. I don't believe Ethernet flow control
>> is a factor in current LANs. I'd be interested to know the specifics if
>> anyone sees it differently.
>
> Heh. Plug wireshark into current off-the-shelf cheap consumer switches
> intended for the home. You won't like what you see. And you have no way to
> manage them. I was quite surprised last fall when doing my home experiments
> to see 802.3x pause frames; I had been blissfully unaware of their
> existence, and had to go read up on them as a result.
>
> I don't think any of the enterprise switches are so brain damaged. So I
> suspect it's mostly lurking to cause trouble in home and small office
> environments, exactly where no-one will know what's going on.
>  - Jim
>
>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat

-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://the-edge.blogspot.com
* Re: [Bloat] Jumbo frames and LAN buffers
  2011-05-16 13:15 ` Kevin Gross
  2011-05-16 13:22   ` Jim Gettys
@ 2011-05-16 18:36   ` Richard Scheffenegger
  1 sibling, 0 replies; 8+ messages in thread
From: Richard Scheffenegger @ 2011-05-16 18:36 UTC (permalink / raw)
To: Kevin Gross, bloat

Kevin,

> My understanding is that 802.1Qau, "lossless Ethernet", was designed
> primarily to allow Fibre Channel to be carried over 10 GbE so that SAN and
> LAN can share a common infrastructure in datacenters. I don't believe
> anyone intends for it to be enabled for traffic classes carrying TCP.

Well, QCN requires L2 MAC sender, network and receiver cooperation (thus you
need fancy "CNA" converged network adapters to start using it - these would be
reaction/reflection points; plus the congestion points - switches - would need
HW support too; nothing one can buy today; higher-grade (carrier?) switches
may have the reaction/reflection points built into them, and could use legacy
802.3x signalling outside the 802.1Qau cloud).

The following may be too simplistic:

Once the hardware has reaction point support, it classifies traffic and
calculates the per-flow congestion of the path (with "flow" really being the
classification rules of the sender); the intermediates / receiver sample the
flow and return the congestion back to the sender - and within the sender, a
token bucket-like rate limiter will adjust the sending rate of the appropriate
flow(s) to adapt to the observed network conditions.

http://www.stanford.edu/~balaji/presentations/au-prabhakar-qcn-description.pdf
http://www.ieee802.org/1/files/public/docs2007/au-pan-qcn-details-053007.pdf

The congestion control loop has a lot of similarities to TCP CC, as you will
note...

Also, I haven't found out how fine-grained the classification is supposed to
be (per L2 address pair? Group of flows? Which hashing then to use for mapping
L2 flows into those groups between reaction/congestion/reflection points...).

Anyway, for the here and now, this is pretty much esoteric stuff not relevant
in this context :)

Best regards,
   Richard

----- Original Message -----
From: "Kevin Gross" <kevin.gross@avanw.com>
To: <bloat@lists.bufferbloat.net>
Sent: Monday, May 16, 2011 3:15 PM
Subject: Re: [Bloat] Jumbo frames and LAN buffers

> All the stand-alone switches I've looked at recently either do not support
> 802.3x or support it in the (desirable) manner described in the last
> paragraph of the linked blog post. I don't believe Ethernet flow control is
> a factor in current LANs. I'd be interested to know the specifics if anyone
> sees it differently.
>
> My understanding is that 802.1Qau, "lossless Ethernet", was designed
> primarily to allow Fibre Channel to be carried over 10 GbE so that SAN and
> LAN can share a common infrastructure in datacenters. I don't believe
> anyone intends for it to be enabled for traffic classes carrying TCP.
>
> Kevin Gross
>
> -----Original Message-----
> From: bloat-bounces@lists.bufferbloat.net
> [mailto:bloat-bounces@lists.bufferbloat.net] On Behalf Of Jim Gettys
> Sent: Monday, May 16, 2011 5:24 AM
> To: bloat@lists.bufferbloat.net
> Subject: Re: [Bloat] Jumbo frames and LAN buffers
>
> Not necessarily out of knowledge or desire (since it isn't usually
> controllable in the small switches you buy for home). It can cause trouble
> even in small environments such as your house.
>
> http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html
>
> I know I'm at least three consumer switches deep, and it's not by choice.
> - Jim
>
>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat
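To give a flavour of the reaction-point behaviour Richard describes (a rate
limiter driven by quantized congestion feedback), here is a highly simplified
Python sketch based on the public QCN presentations linked above; the gains,
units, and structure are illustrative assumptions, not the 802.1Qau
specification:

# Very simplified sketch of a QCN-style reaction point: cut the current
# rate in proportion to the feedback value Fb from a congestion point,
# remember the pre-cut rate as a target, and probe back toward it
# ("fast recovery") while no further congestion messages arrive.

class QcnReactionPoint:
    def __init__(self, line_rate_bps, gd=1.0 / 128):
        self.gd = gd                        # decrease gain (illustrative)
        self.target_rate = line_rate_bps    # rate before the last cut
        self.current_rate = line_rate_bps
        self.min_rate = 1e6                 # 1 Mbps floor (illustrative)

    def on_congestion_message(self, fb):
        # fb: quantized congestion feedback from a congestion point (> 0 = congested)
        self.target_rate = self.current_rate
        cut = min(self.gd * fb, 0.5)        # never cut by more than half
        self.current_rate = max(self.current_rate * (1 - cut), self.min_rate)

    def on_recovery_cycle(self):
        # Called periodically when no congestion messages arrive.
        self.current_rate = (self.current_rate + self.target_rate) / 2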
end of thread, other threads:[~2011-05-16 18:37 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-16 18:40 [Bloat] Jumbo frames and LAN buffers Richard Scheffenegger
  -- strict thread matches above, loose matches on Subject: below --
2011-04-26 17:05 [Bloat] Network computing article on bloat Dave Taht
2011-04-26 18:13 ` Dave Hart
2011-04-26 18:17   ` Dave Taht
2011-04-26 18:32     ` Wesley Eddy
2011-04-30 19:18       ` [Bloat] Goodput fraction w/ AQM vs bufferbloat Richard Scheffenegger
2011-05-05 16:01         ` Jim Gettys
2011-05-05 16:10           ` Stephen Hemminger
2011-05-05 16:49             ` [Bloat] Burst Loss Neil Davies
2011-05-08 12:42               ` Richard Scheffenegger
2011-05-09 18:06                 ` Rick Jones
2011-05-12 16:31                   ` Fred Baker
2011-05-13  5:00                     ` Kevin Gross
2011-05-13 14:35                       ` Rick Jones
2011-05-13 14:54                         ` Dave Taht
2011-05-13 20:03                           ` [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss) Kevin Gross
2011-05-14 20:48                             ` Fred Baker
2011-05-15 18:28                               ` Jonathan Morton
2011-05-15 20:49                                 ` Fred Baker
2011-05-16  0:31                                   ` Jonathan Morton
2011-05-16  7:51                                     ` Richard Scheffenegger
2011-05-16  9:49                                       ` Fred Baker
2011-05-16 11:23                                         ` [Bloat] Jumbo frames and LAN buffers Jim Gettys
2011-05-16 13:15                                           ` Kevin Gross
2011-05-16 13:22                                             ` Jim Gettys
2011-05-16 13:42                                               ` Kevin Gross
2011-05-16 15:23                                                 ` Jim Gettys
     [not found]                                               ` <-854731558634984958@unknownmsgid>
2011-05-16 13:45                                                 ` Dave Taht
2011-05-16 18:36                                             ` Richard Scheffenegger