* [Bloat] http/2
From: Kartik Agaram @ 2015-03-06 21:38 UTC
To: bloat; +Cc: Jordan Peacock
Has HTTP/2[1] been discussed on this list?[2] I've been thinking about
bufferbloat as I read the spec, and had a couple of questions that weren't
answered in the FAQ[3]:
1. HTTP/2 reduces the number of connections per webpage. Assume for a
second that all players instantaneously adopt HTTP/2 and so reduce their
buffer sizes everywhere. Latencies will improve and there'll be less
congestion. Now back to the real world with people building websites,
trying to improve performance of websites and devices all over the place.
Will bufferbloat stay eradicated, or will the gains be temporary?
2. More generally, is there any technical way for bufferbloat to stay
solved? Or is it an inevitable tragedy of the commons dynamic that we just
have to live with and make temporary dents in?
3. Has there been discussion of solving bufferbloat at the TCP layer, by
making buffers harder to fill up? I'm thinking of heuristics like
disallowing a single site from using 80% of the buffer, thereby leaving
some slack available for other bursty requirements.
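Purely to illustrate the heuristic I have in mind in question 3 - a toy sketch
in Python, not a proposal for actual TCP or router code:

BUFFER_CAPACITY = 100         # packets the shared buffer can hold
PER_FLOW_CAP = 0.8            # no single flow may occupy more than 80% of it
buffer_occupancy = {}         # flow id -> packets currently queued

def try_enqueue(flow_id):
    """Accept a packet unless the buffer, or this flow's share of it, is full."""
    total = sum(buffer_occupancy.values())
    mine = buffer_occupancy.get(flow_id, 0)
    if total >= BUFFER_CAPACITY or mine + 1 > PER_FLOW_CAP * BUFFER_CAPACITY:
        return False          # drop: buffer full, or this flow already at its cap
    buffer_occupancy[flow_id] = mine + 1
    return True

for _ in range(95):           # a bulk flow tries to grab the whole buffer...
    try_enqueue("bulk")
print(buffer_occupancy)       # {'bulk': 80} -- capped at 80%
print(try_enqueue("dns"))     # True -- slack remains for a small, bursty flow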
I'm sure these questions are quite naive. Pointers to further reading
greatly appreciated.
Kartik
http://akkartik.name/about
[1] https://insouciant.org/tech/http-slash-2-considerations-and-tradeoffs
[2] Google search on "site:https://lists.bufferbloat.net" didn't turn up
anything, and I get "permission denied" when trying to access the
downloadable archives at https://lists.bufferbloat.net/pipermail/bloat.
[3] https://gettys.wordpress.com/bufferbloat-faq
* Re: [Bloat] http/2
From: Jonathan Morton @ 2015-03-12 15:02 UTC
To: Kartik Agaram; +Cc: Jordan Peacock, bloat
I think you may be conflating several different buffers which exist in
different places and are controlled by different means. I'll try to
illustrate this using a single scenario with which I'm personally familiar:
a half megabit 3G connection without AQM.
Status quo is that loading a web page with many resources on it is
unreliable. Early connections succeed and become established, the
congestion window opens, the buffer in the 3G tower begins to fill up,
inducing several seconds of latency, and subsequent DNS lookups and TCP
handshakes tend to time out. End result: often, half the images on the page
are broken.
Status quo is also that a single big, continuous download (such as a
software update) is capable of inducing 45 seconds of latency on the same
connection, making it virtually impossible to do anything else with it
concurrently. This corresponds to several megabytes of dumb buffering in
the tower AND several megabytes of TCP receive window AND several megabytes
of TCP congestion window. Lose any one of those three things and the
induced latency disappears. But it's there, with a single connection.
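To put rough numbers on that (a back-of-envelope sketch; the 50 ms figure in
the last line is just an illustrative "sane" queue depth, not a measurement):

link_rate_bps = 0.5e6         # half a megabit per second
induced_latency_s = 45        # latency observed under a single bulk download
queued_bytes = link_rate_bps * induced_latency_s / 8
print("standing queue ~ %.1f MB" % (queued_bytes / 1e6))                    # ~2.8 MB
print("a 50 ms queue would be ~ %.0f bytes" % (link_rate_bps * 0.050 / 8))  # ~3 KB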
As far as bufferbloat is concerned, HTTP 2 just converts the first
situation into the second one. If images and other resources are loaded
from the same server as the base page, as they should be, then they'll load
more reliably. But any resource loaded externally (even one merely sharded off)
will, if anything, become less reliable in the presence of bufferbloat, because
a separate connection still has to be made per host server.
If the queue in the tower were less dumb, then TCP would be given congestion
signals as that queue began to fill up. In that situation HTTP 2 helps, because
fewer connections need to receive that signal for it to be effective.
- Jonathan Morton
* Re: [Bloat] http/2
From: Rich Brown @ 2015-03-12 18:05 UTC
To: Kartik Agaram; +Cc: Jordan Peacock, bloat
Hi Kartik,
Thanks for the questions.
> Has HTTP/2[1] been discussed on this list?[2] I've been thinking about bufferbloat as I read the spec, and had a couple of questions that weren't answered in the FAQ[3]:
>
> 1. HTTP/2 reduces the number of connections per webpage. Assume for a second that all players instantaneously adopt HTTP/2 and so reduce their buffer sizes everywhere. Latencies will improve and there'll be less congestion. Now back to the real world with people building websites, trying to improve performance of websites and devices all over the place. Will bufferbloat stay eradicated, or will the gains be temporary?
An anti-bufferbloat algorithm (fq_codel, or another form of SQM - smart queue
management) is required to minimize the amount of data queued at *any*
bottleneck in a network. The bottleneck frequently sits at the home router, at
the edge of the network, but it can appear anywhere. Wherever a queue begins to
build up, optimal performance demands some kind of SQM.
HTTP/2 may well help by opening fewer connections, but the SQM in the router
will still be in effect. If the smaller number of HTTP/2 connections from the
browser doesn't create a queue, then SQM won't even become active. But if the
browser traffic does manage to generate a queue, the router's SQM will keep it
under control.
I want to emphasize that point: SQM doesn't impose any fixed allocation of
bandwidth, packet rate, etc. It actually measures the queuing delay (in msec)
for each traffic flow. If all the packets are whistling through without any
congestion, every sender gets the full rate of the link. SQM only becomes
active when there *is* congestion, and it then throttles the flows that are
sending the most traffic, preserving the link capacity for the time-sensitive
"little flows".
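To make that concrete, here is a drastically simplified toy sketch of the idea
in Python. It is not the real fq_codel code (the real algorithm also tracks how
long the queue has stayed above the target and paces its drops), but it shows
the key property: nothing is dropped unless a standing queue has actually
built up.

import time
from collections import deque, defaultdict

TARGET_DELAY = 0.005            # 5 ms target sojourn time (CoDel's default target)
queues = defaultdict(deque)     # flow id -> deque of (enqueue_time, packet)

def enqueue(flow, packet):
    queues[flow].append((time.monotonic(), packet))

def dequeue(flow):
    """Return the next packet for this flow, dropping any that sat too long."""
    while queues[flow]:
        enqueued_at, packet = queues[flow].popleft()
        if time.monotonic() - enqueued_at <= TARGET_DELAY:
            return packet       # queue is short: nothing gets dropped
        # Standing queue: drop the packet. A heavy flow loses more packets
        # simply because it has more packets sitting in its own queue.
    return None

enqueue("web", b"GET / ...")
print(dequeue("web"))           # returned immediately: no congestion, no drop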
> 2. More generally, is there any technical way for bufferbloat to stay solved? Or is it an inevitable tragedy of the commons dynamic that we just have to live with and make temporary dents in?
Yes, it will stay solved. No, there's no tragedy of the commons. (Great question, though.) The SQM algorithm only examines packets within a single router, so multiple routers are essentially independent. There's no central communication required - it's all local to a router.
In fact, the "tragedy" of solving bufferbloat is that it needs to be solved *everywhere*. That is to say that *every* router, cell phone, DSLAM, Cable modem (home and head-end), personal computer OS, and other piece of equipment on the planet needs to be updated. This is the hard part.
> 3. Has there been discussion of solving bufferbloat at the TCP layer, by making buffers harder to fill up? I'm thinking of heuristics like disallowing a single site from using 80% of the buffer, thereby leaving some slack available for other bursty requirements.
I am personally not hopeful about this kind of approach. a) The TCP algorithm
in hosts isn't easily made aware of congestion elsewhere in the network, so it
can't react to that congestion; b) there aren't a lot of tested proposals
(beyond dropping packets) to make things better; c) it suffers from exactly the
same problem as solving bufferbloat - it needs to be rolled out in every piece
of gear. (We can't even attract the attention of vendors (Apple, Microsoft,
most routing gear, etc.) to implement the algorithms that already mitigate
bufferbloat. Sigh.)
* Re: [Bloat] http/2
From: Narseo Vallina Rodriguez @ 2015-03-12 18:18 UTC
To: Jonathan Morton; +Cc: Jordan Peacock, Kartik Agaram, bloat
Hi Jonathan
> Status quo is that loading a web page with many resources on it is
> unreliable. Early connections succeed and become established, the congestion
> window opens, the buffer in the 3G tower begins to fill up, inducing several
> seconds of latency, and subsequent DNS lookups and TCP handshakes tend to
> time out. End result: often, half the images on the page are broken.
>
The way you're describing this specific part sounds to me more like a
control-plane latency issue (i.e., the time for the RNC to allocate a
radio channel to the client by promoting it from IDLE/FACH to DCH) than
a buffer-size issue (buffering which is actually introduced both on the
handset and at the RNC/eNB to deal with the C-plane latency):
https://www.qualcomm.com/media/documents/files/qualcomm-research-latency-in-hspa-data-networks.pdf
* Re: [Bloat] http/2
From: Jonathan Morton @ 2015-03-12 18:39 UTC
To: Narseo Vallina Rodriguez; +Cc: Kartik Agaram, Jordan Peacock, bloat
On 12 Mar 2015 20:18, "Narseo Vallina Rodriguez" <narseo@icsi.berkeley.edu>
wrote:
>
> Hi Jonathan
>
> > Status quo is that loading a web page with many resources on it is
> > unreliable. Early connections succeed and become established, the
> > congestion window opens, the buffer in the 3G tower begins to fill up,
> > inducing several seconds of latency, and subsequent DNS lookups and TCP
> > handshakes tend to time out. End result: often, half the images on the
> > page are broken.
>
> The way you're describing this specific part sounds to me more like a
> control-plane latency issue (i.e., the time for the RNC to allocate a
> radio channel to the client by promoting it from IDLE/FACH to DCH) than
> a buffer-size issue (buffering which is actually introduced both on the
> handset and at the RNC/eNB to deal with the C-plane latency):
>
> https://www.qualcomm.com/media/documents/files/qualcomm-research-latency-in-hspa-data-networks.pdf
No, that's backwards. The first connection is the most reliable, because
the link isn't loaded yet, and trying to make later connections times out
because the buffers are full from the first ones, still in progress. If
C-plane latency was the problem, the symptoms would be reversed - unless
the system is inexplicably reverting to the idle state between packets in a
continuous stream, and I refuse to believe it's that dumb without firm
evidence.
Unloaded latency on this link is on the order of 100ms.
- Jonathan Morton
* Re: [Bloat] http/2
From: Narseo Vallina Rodriguez @ 2015-03-12 18:56 UTC
To: Jonathan Morton; +Cc: Kartik Agaram, Jordan Peacock, bloat
>> > Status quo is that loading a web page with many resources on it is
>> > unreliable. Early connections succeed and become established, the
>> > congestion window opens, the buffer in the 3G tower begins to fill up,
>> > inducing several seconds of latency, and subsequent DNS lookups and TCP
>> > handshakes tend to time out. End result: often, half the images on the
>> > page are broken.
>> >
>>
>> The way you're describing this specific part sounds to me more like a
>> control-plane latency issue (i.e., the time for the RNC to allocate a
>> radio channel to the client by promoting it from IDLE/FACH to DCH) than
>> a buffer-size issue (buffering which is actually introduced both on the
>> handset and at the RNC/eNB to deal with the C-plane latency):
>>
>>
>> https://www.qualcomm.com/media/documents/files/qualcomm-research-latency-in-hspa-data-networks.pdf
>
> No, that's backwards. The first connection is the most reliable, because the
> link isn't loaded yet, and trying to make later connections times out
> because the buffers are full from the first ones, still in progress. If
> C-plane latency was the problem, the symptoms would be reversed - unless the
> system is inexplicably reverting to the idle state between packets in a
> continuous stream, and I refuse to believe it's that dumb without firm
> evidence.
>
> Unloaded latency on this link is on the order of 100ms.
>
It depends, and I'm not sure we're on the same page now :).
Control-plane latency can matter more than you might think, and the
control-plane dynamics can be very complex, including promotions and
demotions between plain UMTS channels and HS(D/U)PA(+) channels, which
also increase user-plane latency. The latter mostly affects long flows,
as a result of fairness policies implemented by the RNC, since the number
of HSPA channels is limited (each HSPA category has a defined number of
channels using TDM).
The most common demotion (inactivity) timeout from DCH to FACH/IDLE is
6 seconds in most mobile operators; it is triggered even if a TCP
connection is kept alive but no packet is transmitted during that
interval. The timeout can be lower for operators with more aggressive
configurations, larger for more conservative ones (at the expense of
draining the phone's battery), or even 0 s for operators and handsets
supporting "Fast Dormancy". If the handset is demoted, the next packet
suffers the control-plane latency again, which is on the order of 1 to 2
seconds depending on signaling congestion at the RNC, the SNR, and the
3GPP standard in use.
There's a lot of evidence of these dynamics.
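As a toy model of that state machine (timer values taken from the typical
figures above; real RRC behaviour is considerably more complex):

DEMOTION_TIMEOUT_S = 6.0    # typical DCH -> FACH/IDLE inactivity timer
PROMOTION_LATENCY_S = 1.5   # middle of the 1-2 s promotion cost mentioned above

def extra_latency(idle_gap_s):
    """Extra delay hitting the first packet sent after an idle gap."""
    return PROMOTION_LATENCY_S if idle_gap_s >= DEMOTION_TIMEOUT_S else 0.0

for gap in (0.5, 5.0, 6.0, 30.0):
    print("idle %4.1f s -> +%.1f s on the next packet" % (gap, extra_latency(gap)))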
* Re: [Bloat] http/2
From: Jonathan Morton @ 2015-03-12 19:07 UTC
To: Narseo Vallina Rodriguez; +Cc: Kartik Agaram, Jordan Peacock, bloat
Tell me: does a Nokia E70 support fast dormancy? And how does a setup latency
of less than 2 seconds translate into 45 seconds of latency under continuous
load with continuous, smooth packet delivery? And why does that latency
evaporate to half a second if I clamp the TCP receive window down to a sane
value?
Those are the facts I established years ago. They're still true today with a
newer handset/dongle tethered.
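For illustration, one way to clamp the window for a single connection from
Python (a sketch of the general idea, not necessarily how it was done on that
link; system-wide clamping is usually done through OS tunables instead):

import socket

# Clamping the receive buffer caps the window this end advertises, so a single
# bulk transfer cannot keep megabytes of data queued up in the 3G tower.
# Set it before connect() so it takes effect when the connection is set up.
# 32 KB is an illustrative value: at 0.5 Mbit/s it bounds the standing queue
# to roughly half a second (32768 * 8 / 500000 ~= 0.5 s).
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 32 * 1024)
# Linux reports back roughly double the requested value (bookkeeping overhead).
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
sock.close()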
- Jonathan Morton
* Re: [Bloat] http/2
From: Narseo Vallina Rodriguez @ 2015-03-12 19:28 UTC
To: Jonathan Morton; +Cc: Kartik Agaram, Jordan Peacock, bloat
> Tell me: does a Nokia E70 support fast dormancy? And how does a setup
> latency of less than 2 seconds translate into 45 seconds of latency under
> continuous load with continuous, smooth packet delivery? And why does that
> latency evaporate to half a second if I clamp the TCP receive window down to
> a sane value?
>
Well, I'm not saying that you're not right.
I'm just saying that there are many more dynamics at the control plane
that you cannot control, as they are transparent to the handset.
In the particular case you're describing it's very likely a buffering
issue, but it could also be a network issue, since you could be connected
through an old APN (gateway). The E70 does not have fast dormancy, and it
does not support HSPA. It's a basic UMTS phone from 2005, which is not
very representative today.
The latency you're describing could be caused by signaling overload or
poor radio conditions, but also, very likely, by buffers, as you pointed
out. Are there any other buffers at the RIL interface of the handset? How
do they behave? I know some operators use a leaky bucket.
In any case, I was referring to a general scenario, and the way you
initially framed it ("first packet", DNS, etc.) sounded like a
control-plane case. That's why I said in the previous email that I wasn't
sure we were on the same page.
* Re: [Bloat] http/2
From: Jonathan Morton @ 2015-03-12 19:42 UTC
To: Narseo Vallina Rodriguez; +Cc: Kartik Agaram, Jordan Peacock, bloat
I have sometimes noticed that the first packet after an idle period gets
additional latency on 3G. It's not much, maybe a quarter second. I'm not
worried about that, and it doesn't really cause any problems for me. I
expect that's the C-plane latency you're on about.
I was talking specifically about what happens when traffic is already
flowing - NOT idle - and then an additional, concurrent flow wants to start
up. If it goes to a different server, it's likely to start with a DNS
lookup, and resolvers tend to have remarkably short timeouts these days. A few
seconds of induced latency, due to plain old bufferbloat, is enough to make it
fail.
- Jonathan Morton
* Re: [Bloat] http/2
From: David Lang @ 2015-03-15 7:13 UTC
To: Kartik Agaram; +Cc: Jordan Peacock, bloat
On Fri, 6 Mar 2015, Kartik Agaram wrote:
> 3. Has there been discussion of solving bufferbloat at the TCP layer, by
> making buffers harder to fill up? I'm thinking of heuristics like
> disallowing a single site from using 80% of the buffer, thereby leaving
> some slack available for other bursty requirements.
There are already multiple solutions at the TCP layer:
1. TCP slows down if it gets ECN responses, so it won't fill up buffers.
   However, not everything implements ECN, and some firewall-type devices
   strip out or lie about ECN data.
2. TCP slows down if packets get dropped, and not all dropping needs to wait
   until the buffers are full (a toy sketch of this follows below).
   2a. fq_codel does exactly this: it drops packets when there is congestion,
       before the buffers fill up.
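As the toy sketch promised above: classic TCP additive-increase /
multiplicative-decrease, where the sender backs off whenever it sees a
congestion signal, whether that signal is a drop or an ECN mark. Values and
the drop pattern are invented purely for illustration.

cwnd = 10.0                      # congestion window, in packets

def on_ack():
    global cwnd
    cwnd += 1.0 / cwnd           # additive increase: roughly +1 packet per RTT

def on_congestion_signal():      # a detected drop, or an ECN echo from the receiver
    global cwnd
    cwnd = max(cwnd / 2.0, 1.0)  # multiplicative decrease

for rtt in range(1, 11):
    for _ in range(int(cwnd)):
        on_ack()
    if rtt % 4 == 0:             # pretend the AQM drops/marks every 4th round trip
        on_congestion_signal()
    print("RTT %2d: cwnd ~ %.1f packets" % (rtt, cwnd))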
The problem is just getting these changes into new equipment/software and then
replacing the equipment/software in the field.
David Lang
* Re: [Bloat] http/2
From: Mikael Abrahamsson @ 2015-03-15 7:23 UTC
To: Narseo Vallina Rodriguez; +Cc: bloat
On Thu, 12 Mar 2015, Narseo Vallina Rodriguez wrote:
> Control-plane latency can matter more than you might think, and the
> control-plane dynamics can be very complex, including promotions and
> demotions between plain UMTS channels and HS(D/U)PA(+) channels, which
> also increase user-plane latency. The latter mostly affects long flows,
> as a result of fairness policies implemented by the RNC, since the number
> of HSPA channels is limited (each HSPA category has a defined number of
> channels using TDM).
OK, I understand you're trying to get this right; however, I don't see
this as the most probable explanation for the use case described.
For this use case, most of the time you'll see the HSPA channels get
properly established after approximately 1 second, and they'll stay up
until the transfer is done.
One RNC vendor I know fairly well would have 400 packets of buffering in
the GGSN->RNC->eNodeB->Handset direction. I don't know about the others.
With half a megabit/s of buffer drain, that means at most about 10 seconds
of buffering, if my calculations are correct. There can potentially be
buffering in the GGSN/SGSN as well. And that is if everything is working
perfectly; if there are other problems, the drain rate might be slower
than half a megabit/s, which would induce further latency.
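Spelling out that calculation (a quick sketch; the 1500-byte packet size is an
assumption on my part, not a figure from the RNC vendor):

packets_buffered = 400
packet_size_bytes = 1500        # assumed full-size MTU packets
drain_rate_bps = 0.5e6          # half a megabit per second
buffered_bits = packets_buffered * packet_size_bytes * 8
print("max queueing delay ~ %.1f s" % (buffered_bits / drain_rate_bps))  # ~9.6 s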
--
Mikael Abrahamsson email: swmike@swm.pp.se
Thread overview: 11+ messages
2015-03-06 21:38 [Bloat] http/2 Kartik Agaram
2015-03-12 15:02 ` Jonathan Morton
2015-03-12 18:18 ` Narseo Vallina Rodriguez
2015-03-12 18:39 ` Jonathan Morton
2015-03-12 18:56 ` Narseo Vallina Rodriguez
2015-03-12 19:07 ` Jonathan Morton
2015-03-12 19:28 ` Narseo Vallina Rodriguez
2015-03-12 19:42 ` Jonathan Morton
2015-03-15 7:23 ` Mikael Abrahamsson
2015-03-12 18:05 ` Rich Brown
2015-03-15 7:13 ` David Lang