[Bloat] Goodput fraction w/ AQM vs bufferbloat

Thu May 5 12:30:31 EDT 2011

On 05/05/2011 12:10 PM, Stephen Hemminger wrote:
> On Thu, 05 May 2011 12:01:22 -0400
> Jim Gettys<jg at freedesktop.org>  wrote:
>
>> On 04/30/2011 03:18 PM, Richard Scheffenegger wrote:
>>> I'm curious, has anyone done some simulations to check if the
>>> following qualitative statement holds true, and if so, what the
>>> quantitative effect is:
>>>
>>> With bufferbloat, the TCP congestion control reaction is unduly
>>> delayed. When it finally happens, the tcp stream is likely facing a
>>> "burst loss" event - multiple consecutive packets get dropped. Worse
>>> yet, the sender with the lowest RTT across the bottleneck will likely
>>> start to retransmit while the (tail-drop) queue is still overflowing.
>>>
>>> And a lost retransmission means a major setback in bandwidth (except
>>> for Linux with bulk transfers and SACK enabled), as the standard (RFC
>>> documented) behaviour asks for an RTO (1 sec nominally, 200-500 ms
>>> typically) to recover such a lost retransmission...
>>>
>>> The second part (more important as an incentive to the ISPs actually),
>>> how does the fraction of goodput vs. throughput change, when AQM
>>> schemes are deployed, and TCP CC reacts in a timely manner? Small ISPs
>>> have to pay for their upstream volume, regardless of whether that is
>>> "real" work (goodput) or unnecessary retransmissions.
>>>
>>> When I was at a small cable ISP in Switzerland last week, surely
>>> enough bufferbloat was readily observable (17ms ->  220ms after 30 sec
>>> of a bulk transfer), but at first they had the "not our problem" view,
>>> until I started discussing burst loss / retransmissions / goodput vs
>>> throughput - with the last point being a real commercial incentive
>>> to them. (They promised to check if AQM would be available in the CPE
>>> / CMTS, and put latency bounds in their tenders going forward).
>>>
>> I wish I had a good answer to your very good questions.  Simulation
>> would be interesting, though real data is more convincing.
>>
>> I haven't looked in detail at all that many traces to try to get a feel
>> for how much bandwidth waste there actually is, and more formal studies
>> like Netalyzr, SamKnows, or the Bismark project would be needed to
>> quantify the loss on the network as a whole.
>>
>> I did spend some time last fall with the traces I've taken.  In those,
>> I've typically been seeing 1-3% packet loss in the main TCP transfers.
>> On the wireless trace I took, I saw 9% loss, but whether that is
>> bufferbloat induced loss or not, I don't know (the data is out there for
>> those who might want to dig).  And as you note, the losses are
>> concentrated in bursts (probably due to the details of Cubic, so I'm told).
>>
>> I've had anecdotal reports (and some first-hand experience) of much
>> higher loss rates, for example from Nick Weaver at ICSI; but I believe
>> in playing things conservatively with any numbers I quote and I've not
>> gotten consistent results when I've tried, so I just report what's in
>> the packet captures I did take.
>>
>> A phenomenon that could be occurring is that during congestion avoidance
>> (until TCP loses its cookies entirely and probes for a higher operating
>> point) TCP is carefully timing its packets to keep the buffers
>> almost exactly full, so that competing flows (in my case, simple pings)
>> are likely to arrive just when there is no buffer space to accept them.
>> You would therefore see higher losses on them than on the single
>> flow I've been tracing and getting loss statistics from.
>>
>> People who want to look into this further would be a great help.
>>                   - Jim
> I would not put a lot of trust in measuring loss with pings.
> I heard that some ISPs process the ICMP packets used for ping
> differently. They either prioritize them high to provide
> artificially good response (better marketing numbers), or
> prioritize them low since they aren't useful traffic.
> There are also filters that only allow N ICMP requests per second
> which means repeated probes will be dropped.
I didn't use ping for my loss measurements above, but derived them from
the traces themselves using tstat (see
http://tstat.tlc.polito.it/index.shtml).
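
To make the trace-derived numbers concrete, here is a minimal sketch of how a retransmission rate and a goodput fraction could be computed from per-flow counters. The field names below are hypothetical and are not tstat's actual log columns; you would have to map them to whatever your trace tool reports. Note also that retransmissions are only a proxy for loss, since spurious retransmissions inflate the figure.

```python
from dataclasses import dataclass


@dataclass
class FlowCounters:
    """Per-flow totals taken from a packet trace (hypothetical field names)."""
    data_segments: int  # all data segments sent, including retransmissions
    retx_segments: int  # segments that were retransmissions
    data_bytes: int     # all payload bytes sent, including retransmissions
    retx_bytes: int     # payload bytes carried by retransmissions


def retransmission_rate(flow: FlowCounters) -> float:
    """Fraction of sent segments that were retransmissions (a proxy for loss)."""
    if flow.data_segments == 0:
        return 0.0
    return flow.retx_segments / flow.data_segments


def goodput_fraction(flow: FlowCounters) -> float:
    """Fraction of payload bytes on the wire that were not retransmissions."""
    if flow.data_bytes == 0:
        return 0.0
    return (flow.data_bytes - flow.retx_bytes) / flow.data_bytes
```

With invented counters (not taken from any of the traces discussed here), `FlowCounters(10_000, 200, 14_480_000, 289_600)` gives a retransmission rate of 0.02 and a goodput fraction of 0.98.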

Your explanation is part of why I don't use what I've seen when using
ping for loss rates (though I have yet to actually see the behaviour of
messing with priorities or preferentially dropping that many have claimed).

Ping does often get processed on network gear slow paths, and it is
believable that on loaded routers or broadband head ends the
pings might get dropped, classified or otherwise messed with.  So I made
sure to avoid that in the loss numbers I quote on the traces I looked at.

It's also why I worked with Folkert Van Heusden last summer and fall to
ensure that there was a TCP-based ping program available: httping
(http://www.vanheusden.com/httping/) has options to do an HTTP-based
ping over persistent HTTP connections, with one packet out and
exactly one back, so it should be prioritised exactly as web traffic.
So far, it and conventional ICMP ping have always returned effectively
identical results on the paths I've probed.
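
For anyone who wants to experiment with the idea without httping, here is a rough Python sketch of an HTTP-based ping over one reused connection. It is not httping's implementation, and the function name and parameters are my own assumptions. It times HEAD requests at the application level, so it does not guarantee exactly one packet in each direction, and if the server closes the connection after a response the next sample will include a fresh TCP handshake.

```python
import http.client
import time
from typing import List


def http_ping(host: str, port: int = 80, path: str = "/", count: int = 5,
              timeout: float = 5.0) -> List[float]:
    """Return round-trip times in milliseconds for `count` HEAD requests.

    One connection object is reused for all requests, so the TCP handshake
    is paid only once if the server keeps the connection alive.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    rtts_ms: List[float] = []
    try:
        conn.connect()  # open the TCP connection before timing starts
        for _ in range(count):
            start = time.perf_counter()
            conn.request("HEAD", path)
            response = conn.getresponse()
            response.read()  # finish the response so the connection is reusable
            rtts_ms.append((time.perf_counter() - start) * 1000.0)
    finally:
        conn.close()
    return rtts_ms
```

Calling `http_ping("example.org", count=10)` (the host is a placeholder) returns ten timings that can be compared with ICMP ping results on the same path.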

How much of the anecdotal information about ISPs doing this or that with
ICMP I'd believe is therefore not clear.  But at least with httping we
can figure out to what extent it may be true, and certainly care is in
order on any measurements.
                     Best regards,
                                 - jim
