[Bloat] [e2e] bufferbloat paper

Tue Jan 8 10:29:35 EST 2013

Re: "the only thing that counts is peak throughput" - it's a pretty cynical stance to say "I'm a professional engineer, but the marketing guys don't have a clue, so I'm not going to build a usable system".

It's even worse when fellow engineers *disparage* or downplay the work of engineers who are actually trying hard to fix this across the entire Internet.

Does competition require such foolishness? Have any of the folks who work for operators and equipment suppliers followed the lead of Richard Woundy (an SVP at Comcast) and tried to *fix* the problem and get the fix deployed? Richard is an engineer, and took the time to develop a proposed fix to DOCSIS 3.0, and also to write a "best practices" document about how to deploy that fix. The one thing he could not do is get Comcast or its competitors to invest money in deploying the fix more rapidly.

First, it's important to measure the "right thing" - which in this case is "how much queueing *delay* builds up in the bottleneck link under load" and how bad the user experience is when that queueing delay stabilizes at more than about 20 msec.

That cannot be determined by measuring throughput, which is all the operators measure. (I have the sworn testimony of every provider in Canada: when asked by the CRTC "do you measure latency on your internet service", the answer was uniformly that they measure throughput *only*, and that by Little's Lemma they can determine latency.)
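
As a rough illustration of why throughput alone is not enough: Little's Law relates mean queue occupancy L, drain rate lambda, and mean time in queue W as L = lambda * W. To get W you need the occupancy as well as the rate. The sketch below is in Python; the function name and the two queue sizes are made up for illustration, not measurements.

```python
def queueing_delay_seconds(queued_bytes: float, throughput_bps: float) -> float:
    """Little's Law rearranged, W = L / lambda: time in queue = occupancy / drain rate."""
    if throughput_bps <= 0:
        raise ValueError("throughput must be positive")
    return queued_bytes * 8 / throughput_bps


# Same measured throughput (15 Mbit/s), two hypothetical standing queues.
throughput_bps = 15e6
small_queue = queueing_delay_seconds(30_000, throughput_bps)      # 30 KB queued
large_queue = queueing_delay_seconds(1_500_000, throughput_bps)   # 1.5 MB queued
print(f"{small_queue * 1000:.0f} ms vs {large_queue * 1000:.0f} ms")
```

Both cases would report the same throughput, yet the delay differs by a factor of fifty, so a throughput measurement by itself cannot tell them apart.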

Engineers actually have a positive duty to society, not just to profits.  And actually, in this case, better service *would* lead to more profits!  Not directly, but because there is competition for experience, even more than for "bitrate", despite the claims of engineers.

So talk to your CEOs. When I've done so, they say they have *never* heard of the issue. Maybe that's due to denial throughout the organization.

(By the way, what woke Comcast up was getting hauled in front of the FCC for deploying DPI-based RST injection that disrupted large classes of connections - because they had not realized what their problem was, and the marketers wanted to blame "pirates" for clogging the circuits - for which claim they had no data other than self-serving and proprietary "studies" from vendors like Sandvine and Ellacoya.)

Actual measurements of actual network behavior revealed the bufferbloat phenomenon was the cause of disruptive events due to load in *every* case observed by me, and I've looked at a lot.  It used to happen on Frame Relay links all the time, and in datacenter TCP/IP internal deployments.

So measure first. Measure the right thing (latency growth under load). Ask "why is this happening?" and don't jump to the non sequitur (pirates or "interference") without proving that the non sequitur actually explains the entire phenomenon (something Comcast failed to do, instead reasoning from anecdotal links between bittorrent and the problem).
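
A minimal sketch of "latency growth under load", assuming you already have two sets of RTT samples: one taken on an idle link and one taken while a bulk transfer saturates it. The function name and the sample values are hypothetical; the 20 ms threshold is the rough figure mentioned earlier in this message.

```python
from statistics import median


def latency_growth_ms(idle_rtts_ms, loaded_rtts_ms):
    """Return median RTT under load minus median idle RTT, in milliseconds."""
    if not idle_rtts_ms or not loaded_rtts_ms:
        raise ValueError("need at least one sample in each set")
    return median(loaded_rtts_ms) - median(idle_rtts_ms)


# Hypothetical samples: pings on an idle link, then during a bulk download.
idle = [42.0, 40.5, 43.1, 41.2]
loaded = [950.0, 1210.0, 1105.0, 1480.0]

THRESHOLD_MS = 20.0  # rough comfort limit for queueing delay, per the text above
growth = latency_growth_ms(idle, loaded)
print(f"induced queueing delay: about {growth:.0f} ms")
print("bloated" if growth > THRESHOLD_MS else "ok")
```

The median is used here only to keep a single outlier from dominating; other summaries (percentiles, for example) would serve equally well.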

And then when your measurements are right, and you can demonstrate a solution that *works* (rather than something that in academia would be an "interesting Ph.D. proposal"), then deploy it and monitor it.

-----Original Message-----
From: "Ingemar Johansson S" <ingemar.s.johansson at ericsson.com>
Sent: Tuesday, January 8, 2013 8:19am
To: "Keith Winstein" <keithw at mit.edu>
Cc: "mallman at icir.org" <mallman at icir.org>, "end2end-interest at postel.org" <end2end-interest at postel.org>, "bloat at lists.bufferbloat.net" <bloat at lists.bufferbloat.net>
Subject: Re: [e2e] bufferbloat paper

OK...

This likely means that AQM is not turned on in the eNodeB; I can't be 100% sure, but it seems so.
At least one company I know of offers AQM in the eNodeB. However, one problem seems to be that the only thing that counts is peak throughput; you have probably also seen these "up to X Mbps" slogans. Competition is fierce, and for this reason it could be tempting to turn off AQM, as it may reduce peak throughput slightly. I know, and most people on these mailing lists know, that peak throughput is the "megapixels" of the internet; one needs to address other aspects in the benchmarks.

/Ingemar

> -----Original Message-----
> From: winstein at gmail.com [mailto:winstein at gmail.com] On Behalf Of Keith
> Winstein
> Sent: 8 January 2013 13:44
> To: Ingemar Johansson S
> Cc: end2end-interest at postel.org; bloat at lists.bufferbloat.net;
> mallman at icir.org
> Subject: Re: [e2e] bufferbloat paper
> 
> Hello Ingemar,
> 
> Thanks for your feedback and your own graph.
> 
> This is testing the LTE downlink, not the uplink. It was a TCP download.
> 
> There was zero packet loss on the ICMP pings. I did not measure the TCP
> flow itself but I suspect packet loss was minimal if not also zero.
> 
> Best,
> Keith
> 
> On Tue, Jan 8, 2013 at 7:19 AM, Ingemar Johansson S
> <ingemar.s.johansson at ericsson.com> wrote:
> > Hi
> >
> > Interesting graph, thanks for sharing it.
> > It is likely that the delay is only limited by TCP's maximum congestion
> > window. For instance, at T=70 the throughput is ~15 Mbps and the RTT ~0.8 s,
> > giving a congestion window of 1.5e7/8*0.8 = 1500000 bytes; recalculations at
> > other time instants seem to give a similar figure.
> > Do you see any packet loss?
> >
> > The easiest way to mitigate bufferbloat in LTE UL is AQM in the terminal, as
> > the packets are buffered there.
> > The eNodeB does not buffer up packets in UL*, so I would in this particular
> > case argue that the problem is best solved in the terminal.
> > Implementing AQM for UL in the eNodeB is probably doable, but AFAIK it is
> > not standardized, and I cannot tell how feasible it is.
> >
> > /Ingemar
> >
> > BTW... UL = uplink
> > * RLC-AM retransmissions can be said to cause delay in the eNodeB, but
> > then again the main problem is that packets are being queued up in the
> > terminal's send buffer. The MAC layer HARQ can also cause some delay, but
> > this is a necessity to get optimal performance for LTE; moreover, the
> > added delay due to HARQ reTx is marginal in this context.
> >
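
The congestion-window estimate in the message above is a bandwidth-delay product: the bytes in flight equal throughput times RTT. A minimal Python sketch, using the approximate figures read off the graph at T=70 (about 15 Mbps and 0.8 s); the function name is illustrative only.

```python
def bytes_in_flight(throughput_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to sustain
    the given throughput at the given round-trip time."""
    return throughput_bps / 8 * rtt_s


window = bytes_in_flight(15e6, 0.8)
print(f"{window:.0f} bytes")
```

If the window is pinned at its maximum, the product stays roughly constant, so any drop in throughput shows up as a proportional rise in RTT.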
> >> -----Original Message-----
> >> From: winstein at gmail.com [mailto:winstein at gmail.com] On Behalf Of
> >> Keith Winstein
> >> Sent: 8 January 2013 11:42
> >> To: Ingemar Johansson S
> >> Cc: end2end-interest at postel.org; bloat at lists.bufferbloat.net;
> >> mallman at icir.org
> >> Subject: Re: [e2e] bufferbloat paper
> >>
> >> I'm sorry to report that the problem is not (in practice) better on
> >> LTE, even though the standard may support features that could be used
> >> to mitigate the problem.
> >>
> >> Here is a plot (also at
> >> http://web.mit.edu/keithw/www/verizondown.png)
> >> from a computer tethered to a Samsung Galaxy Nexus running Android
> >> 4.0.4 on Verizon LTE service, taken just now in Cambridge, Mass.
> >>
> >> The phone was stationary during the test and had four bars (a full
> >> signal) of "4G" service. The computer ran a single full-throttle TCP
> >> CUBIC download from one well-connected but unremarkable Linux host
> >> (ssh hostname 'cat /dev/urandom') while pinging at 4 Hz across the
> >> same tethered LTE interface. There were zero lost pings during the
> >> entire test
> >> (606/606 delivered).
> >>
> >> The RTT grows to 1-2 seconds and stays stable in that region for most
> >> of the test, except for one 12-second period of >5 seconds RTT. We
> >> have also tried measuring only "one-way delay" (instead of RTT) by
> >> sending UDP datagrams out of the computer's Ethernet interface over
> >> the Internet, over LTE to the cell phone and back to the originating
> >> computer via USB tethering. This gives similar results to ICMP ping.
> >>
> >> I don't doubt that the carriers could implement reasonable AQM or
> >> even a smaller buffer at the head-end, or that the phone could
> >> implement AQM for the uplink. For that matter, I'm not sure the details
> >> of the air interface (LTE vs. UMTS vs. 1xEV-DO) necessarily make a
> >> difference here.
> >>
> >> But at present, at least with AT&T, Verizon, Sprint and T-Mobile in
> >> Eastern Massachusetts, the carrier is willing to queue and hold on to
> >> packets for >1 second. Even a single long-running TCP download (>15
> >> megabytes) is enough to tickle this problem.
> >>
> >> In the CCR paper, even flows >1 megabyte were almost nonexistent,
> >> which may be part of how these findings are compatible.
> >>
> >> On Tue, Jan 8, 2013 at 2:35 AM, Ingemar Johansson S
> >> <ingemar.s.johansson at ericsson.com> wrote:
> >> > Hi
> >> >
> >> > I include Mark's original post (below), as it was scrubbed.
> >> >
> >> > I don't have any data on bufferbloat for wireline access, and the
> >> > fiber connection that I have at home shows little evidence of
> >> > bufferbloat.
> >> >
> >> > Wireless access seems to be a different story though.
> >> > After reading the "Tackling Bufferbloat in 3G/4G Mobile Networks"
> >> > by Jiang et al. I decided to make a few measurements of my own
> >> > (hope that the attached png is not removed)
> >> >
> >> > The measurement setup was quite simple: a laptop running Ubuntu 12.04
> >> > with a 3G modem attached.
> >> > The throughput was computed from the Wireshark logs and the RTT was
> >> > measured with ping (towards a web server hosted by Akamai). The
> >> > location was Luleå city centre, Sweden (fixed location), and the
> >> > measurement was made at lunchtime on Dec 6 2012.
> >> >
> >> > During the measurement session I did some close-to-normal web surfing,
> >> > including watching embedded video clips and YouTube. In some cases the
> >> > effects of bufferbloat were clearly noticeable.
> >> > I admit that this is just one sample; a more elaborate study with
> >> > more samples would be interesting to see.
> >> >
> >> > 3G has the interesting feature that packets are very seldom lost in
> >> > the downlink (data going to the terminal). I did not see a single packet
> >> > loss in this test! I won't elaborate on the reasons in this email.
> >> > I would however believe that LTE is better off in this respect as
> >> > long as AQM is implemented, mainly because LTE is a packet-switched
> >> > architecture.
> >> >
> >> > /Ingemar
> >> >
> >> > Mark's post.
> >> > ********
> >> > [I tried to post this in a couple places to ensure I hit folks who
> >> > would  be interested.  If you end up with multiple copies of the
> >> > email, my  apologies.  --allman]
> >> >
> >> > I know bufferbloat has been an interest of lots of folks recently.
> >> > So, I thought I'd flog a recent paper that presents a little data
> >> > on the topic ...
> >> >
> >> >     Mark Allman.  Comments on Bufferbloat, ACM SIGCOMM Computer
> >> >     Communication Review, 43(1), January 2013.
> >> >     http://www.icir.org/mallman/papers/bufferbloat-ccr13.pdf
> >> >
> >> > It's an initial paper.  I think more data would be great!
> >> >
> >> > allman
> >> >
> >> >
> >> > --
> >> > http://www.icir.org/mallman/
> >> >
> >> >
> >> >
> >> >
