General list for discussing Bufferbloat
* [Bloat]  Detecting bufferbloat from outside a node
@ 2015-04-27  9:48 Paolo Valente
  2015-04-27  9:54 ` Neil Davies
  2015-04-27  9:57 ` Toke Høiland-Jørgensen
  0 siblings, 2 replies; 46+ messages in thread
From: Paolo Valente @ 2015-04-27  9:48 UTC (permalink / raw)
  To: bloat

Hi,
a network-monitoring company got curious about bufferbloat issues and asked me to investigate a little bit the following issue (quite interesting in my opinion). Is it possible to detect, from outside a node, if the node is bufferbloated? In particular, the only action allowed would be to observe the packets entering and leaving the node (plus, of course, their timing).

If such a general problem is too hard or impossible to solve, do you think it is still possible, at least for some type of application, to understand whether the application is experiencing high latency because of bloated buffers inside the node? (As above, by just observing packet flows from outside the node.)

Thanks,
Paolo


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27  9:48 [Bloat] Detecting bufferbloat from outside a node Paolo Valente
@ 2015-04-27  9:54 ` Neil Davies
  2015-04-27 10:45   ` Toke Høiland-Jørgensen
  2015-04-27 11:54   ` Paolo Valente
  2015-04-27  9:57 ` Toke Høiland-Jørgensen
  1 sibling, 2 replies; 46+ messages in thread
From: Neil Davies @ 2015-04-27  9:54 UTC (permalink / raw)
  To: Paolo Valente; +Cc: bloat

Paolo

Yes, it is - there is a whole methodology for detecting this and associated algebra for manipulation. It has been used at CERN, in various telcos and in various large scale, real time distributed systems to relate end user outcomes to the delay/loss characteristics of the network.

Take a look at http://www.pnsol.com/publications.html; you may find http://www.pnsol.com/public/PP-PNS-2009-02.pdf a good starting point.

Neil

On 27 Apr 2015, at 10:48, Paolo Valente <paolo.valente@unimore.it> wrote:

> Hi,
> a network-monitoring company got curious about bufferbloat issues and asked me to investigate a little bit the following issue (quite interesting in my opinion). Is it possible to detect, from outside a node, if the node is bufferbloated? In particular, the only action allowed would be to observe the packets entering and leaving the node (plus, of course, their timing).
> 
> If such a general problem is too hard or impossible to solve, do you think it is still possible, at least for some type of application, to understand whether the application is experiencing high latency because of bloated buffers inside the node? (As above, by just observing packet flows from outside the node.)
> 
> Thanks,
> Paolo
> 
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat



* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27  9:48 [Bloat] Detecting bufferbloat from outside a node Paolo Valente
  2015-04-27  9:54 ` Neil Davies
@ 2015-04-27  9:57 ` Toke Høiland-Jørgensen
  2015-04-27 10:10   ` Paolo Valente
  1 sibling, 1 reply; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2015-04-27  9:57 UTC (permalink / raw)
  To: Paolo Valente; +Cc: bloat

Paolo Valente <paolo.valente@unimore.it> writes:

> a network-monitoring company got curious about bufferbloat issues and
> asked me to investigate a little bit the following issue (quite
> interesting in my opinion). Is it possible to detect, from outside a
> node, if the node is bufferbloated? In particular, the only action
> allowed would be to observe the packets entering and leaving the node
> (plus, of course, their timing).

Sure. Just measure the timing when the network is unloaded and compare
it to when it is loaded to capacity. We do that all the time.

The details of course depend on what you define by a 'node', what role
it plays in the network (does it forward or originate packets?), and
what control you have over the traffic flowing through it. :)

-Toke
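The unloaded-vs-loaded comparison described above can be sketched in a few lines (illustrative Python; the RTT samples, and the choice of the idle minimum as a propagation baseline, are assumptions for the sketch, not measurements from this thread):

```python
# Estimate induced queueing delay by comparing RTT samples taken while
# the link is idle against samples taken while it is loaded to capacity.

def induced_delay_ms(idle_rtts, loaded_rtts):
    """(Upper) median loaded RTT minus the minimum idle RTT baseline."""
    baseline = min(idle_rtts)
    loaded_sorted = sorted(loaded_rtts)
    median_loaded = loaded_sorted[len(loaded_sorted) // 2]
    return median_loaded - baseline

# Hypothetical numbers: ~20 ms baseline path, ~600 ms RTT under load
idle = [21.0, 20.3, 22.1, 20.8]
loaded = [580.0, 612.5, 595.2, 640.1, 601.7]
print(induced_delay_ms(idle, loaded))  # hundreds of ms => bloated buffer
```

A result of hundreds of milliseconds of induced delay on a ~20 ms path is the classic bufferbloat signature.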


* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27  9:57 ` Toke Høiland-Jørgensen
@ 2015-04-27 10:10   ` Paolo Valente
  2015-04-27 10:19     ` Paolo Valente
  0 siblings, 1 reply; 46+ messages in thread
From: Paolo Valente @ 2015-04-27 10:10 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: bloat


Il giorno 27/apr/2015, alle ore 11:57, Toke Høiland-Jørgensen <toke@toke.dk> ha scritto:

> Paolo Valente <paolo.valente@unimore.it> writes:
> 
>> a network-monitoring company got curious about bufferbloat issues and
>> asked me to investigate a little bit the following issue (quite
>> interesting in my opinion). Is it possible to detect, from outside a
>> node, if the node is bufferbloated? In particular, the only action
>> allowed would be to observe the packets entering and leaving the node
>> (plus, of course, their timing).
> 
> Sure. Just measure the timing when the network is unloaded and compare
> it to when it is loaded to capacity. We do that all the time.
> 
> The details of course depend on what you define by a 'node', what role
> it plays in the network (does it forward or originate packets?), and
> what control you have over the traffic flowing through it. :)
> 

Let us consider, for example, a host with a VoIP call and a large-file transfer in progress. My concern is: from inside the host, we can measure the delays experienced by the VoIP application, but, from outside, how can we detect that the application is experiencing a high latency, or, indirectly, that there is bufferbloat and hence that the application is likely to be experiencing a high latency? (Of course, I am also about to read the documents suggested by Neil.)

Thanks,
Paolo

> -Toke


--
Paolo Valente                                                 
Algogroup
Dipartimento di Fisica, Informatica e Matematica		
Via Campi, 213/B
41125 Modena - Italy        				  
homepage:  http://algogroup.unimore.it/people/paolo/



* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27 10:10   ` Paolo Valente
@ 2015-04-27 10:19     ` Paolo Valente
  2015-04-27 10:23       ` Toke Høiland-Jørgensen
  2015-04-27 10:26       ` Neil Davies
  0 siblings, 2 replies; 46+ messages in thread
From: Paolo Valente @ 2015-04-27 10:19 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: bloat


Il giorno 27/apr/2015, alle ore 12:10, Paolo Valente <paolo.valente@unimore.it> ha scritto:

> 
> Il giorno 27/apr/2015, alle ore 11:57, Toke Høiland-Jørgensen <toke@toke.dk> ha scritto:
> 
>> Paolo Valente <paolo.valente@unimore.it> writes:
>> 
>>> a network-monitoring company got curious about bufferbloat issues and
>>> asked me to investigate a little bit the following issue (quite
>>> interesting in my opinion). Is it possible to detect, from outside a
>>> node, if the node is bufferbloated? In particular, the only action
>>> allowed would be to observe the packets entering and leaving the node
>>> (plus, of course, their timing).
>> 
>> Sure. Just measure the timing when the network is unloaded and compare
>> it to when it is loaded to capacity. We do that all the time.
>> 
>> The details of course depend on what you define by a 'node', what role
>> it plays in the network (does it forward or originate packets?), and
>> what control you have over the traffic flowing through it. :)
>> 
> 
> Let us consider, for example, a host with a VoIP call and a large-file transfer in progress. My concern is: from inside the host, we can measure the delays experienced by the VoIP application, but, from outside, how can we detect that the application is experiencing a high latency, or, indirectly, that there is bufferbloat and hence that the application is likely to be experiencing a high latency? (Of course, I am also about to read the documents suggested by Neil.)
> 

I am sorry, but I realized that what I said was incomplete. The main cause of my concern is that, from outside the node, we do not know whether a VoIP packet departs at a given time because the application wants it to be sent at that time or because it has waited in the buffer for a long time. Similarly, we do not know how long the VoIP application will wait before getting its incoming packets delivered.

Of course, if a bufferbloated state can be measured by other external measurements, then we can infer the problem indirectly.

Are there flaws in my above considerations?

Thanks,
Paolo

> Thanks,
> Paolo
> 
>> -Toke
> 
> 
> --
> Paolo Valente                                                 
> Algogroup
> Dipartimento di Fisica, Informatica e Matematica		
> Via Campi, 213/B
> 41125 Modena - Italy        				  
> homepage:  http://algogroup.unimore.it/people/paolo/


--
Paolo Valente                                                 
Algogroup
Dipartimento di Fisica, Informatica e Matematica		
Via Campi, 213/B
41125 Modena - Italy        				  
homepage:  http://algogroup.unimore.it/people/paolo/
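Paolo's worry above, that observed departure times conflate application intent with queueing delay, can be partially probed when the application's sending pattern is known a priori. A minimal sketch, assuming a VoIP codec with a fixed 20 ms packet cadence (the cadence and the trace below are hypothetical):

```python
# If the VoIP codec's nominal cadence is known (assumed 20 ms here),
# deviations in observed inter-departure times hint at queueing inside
# the node: the application sends at a fixed rate, so large gaps
# followed by bursts suggest packets were held in a buffer.

NOMINAL_MS = 20.0

def cadence_deviation_ms(departure_times_ms):
    """Worst-case deviation of inter-departure gaps from the nominal cadence."""
    gaps = [b - a for a, b in zip(departure_times_ms, departure_times_ms[1:])]
    return max(abs(g - NOMINAL_MS) for g in gaps)

# Hypothetical trace: a stall, then a burst of bunched-up packets
times = [0.0, 20.1, 40.2, 95.0, 96.1, 97.3]
print(cadence_deviation_ms(times))
```

This only distinguishes buffering from intent under the stated assumption; if the codec uses silence suppression or variable cadence, the inference breaks down.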



* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27 10:19     ` Paolo Valente
@ 2015-04-27 10:23       ` Toke Høiland-Jørgensen
  2015-04-27 10:53         ` Paolo Valente
  2015-04-27 10:26       ` Neil Davies
  1 sibling, 1 reply; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2015-04-27 10:23 UTC (permalink / raw)
  To: Paolo Valente; +Cc: bloat

Paolo Valente <paolo.valente@unimore.it> writes:

> I am sorry, but I realized that what I said was incomplete. The main
> cause of my concern is that, from outside the node, we do not know
> whether a VoIP packet departs at a given time because the application
> wants it to be sent at that time or because it has waited in the
> buffer for a lot of time. Similarly, we do not know how long the VoIP
> application will wait before getting its incoming packets delivered.

No, not unless the application tells you (by, for instance,
timestamping; depending on where in the network stack the timestamp is
applied, you can measure different instances of bloat). Or if you know
that an application is supposed to answer you immediately (as is the
case with a regular 'ping'), you can measure if it does so even when
otherwise loaded.

Of course, you also might not measure anything, if the bottleneck is
elsewhere. But if you can control the conditions well enough, you can
probably avoid this; just be aware of it. In Linux, combating
bufferbloat has been quite the game of whack-a-mole over the last
several years :)

-Toke


* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27 10:19     ` Paolo Valente
  2015-04-27 10:23       ` Toke Høiland-Jørgensen
@ 2015-04-27 10:26       ` Neil Davies
  2015-04-27 10:32         ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 46+ messages in thread
From: Neil Davies @ 2015-04-27 10:26 UTC (permalink / raw)
  To: Paolo Valente; +Cc: bloat


Paolo

You are asking about the epistemology! Good start. The only things you can “know” outside the node are the things you can observe. You can infer, from the characteristics of the observations, conformance to some avowed model of behaviour.

Welcome to the world of operational, denotational and intentional semantics as it relates to network performance!

Neil

On 27 Apr 2015, at 11:19, Paolo Valente <paolo.valente@unimore.it> wrote:

> 
> Il giorno 27/apr/2015, alle ore 12:10, Paolo Valente <paolo.valente@unimore.it> ha scritto:
> 
>> 
>> Il giorno 27/apr/2015, alle ore 11:57, Toke Høiland-Jørgensen <toke@toke.dk> ha scritto:
>> 
>>> Paolo Valente <paolo.valente@unimore.it> writes:
>>> 
>>>> a network-monitoring company got curious about bufferbloat issues and
>>>> asked me to investigate a little bit the following issue (quite
>>>> interesting in my opinion). Is it possible to detect, from outside a
>>>> node, if the node is bufferbloated? In particular, the only action
>>>> allowed would be to observe the packets entering and leaving the node
>>>> (plus, of course, their timing).
>>> 
>>> Sure. Just measure the timing when the network is unloaded and compare
>>> it to when it is loaded to capacity. We do that all the time.
>>> 
>>> The details of course depend on what you define by a 'node', what role
>>> it plays in the network (does it forward or originate packets?), and
>>> what control you have over the traffic flowing through it. :)
>>> 
>> 
>> Let us consider, for example, a host with a VoIP call and a large-file transfer in progress. My concern is: from inside the host, we can measure the delays experienced by the VoIP application, but, from outside, how can we detect that the application is experiencing a high latency, or, indirectly, that there is bufferbloat and hence that the application is likely to be experiencing a high latency? (Of course, I am also about to read the documents suggested by Neil.)
>> 
> 
> I am sorry, but I realized that what I said was incomplete. The main cause of my concern is that, from outside the node, we do not know whether a VoIP packet departs at a given time because the application wants it to be sent at that time or because it has waited in the buffer for a long time. Similarly, we do not know how long the VoIP application will wait before getting its incoming packets delivered.
> 
> Of course, if a bufferbloated state can be measured by other external measurements, then we can infer the problem indirectly.
> 
> Are there flaws in my above considerations?
> 
> Thanks,
> Paolo
> 
>> Thanks,
>> Paolo
>> 
>>> -Toke
>> 
>> 
>> --
>> Paolo Valente                                                 
>> Algogroup
>> Dipartimento di Fisica, Informatica e Matematica		
>> Via Campi, 213/B
>> 41125 Modena - Italy        				  
>> homepage:  http://algogroup.unimore.it/people/paolo/
> 
> 
> --
> Paolo Valente                                                 
> Algogroup
> Dipartimento di Fisica, Informatica e Matematica		
> Via Campi, 213/B
> 41125 Modena - Italy        				  
> homepage:  http://algogroup.unimore.it/people/paolo/
> 
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat




* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27 10:26       ` Neil Davies
@ 2015-04-27 10:32         ` Toke Høiland-Jørgensen
  2015-04-27 10:38           ` Neil Davies
  0 siblings, 1 reply; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2015-04-27 10:32 UTC (permalink / raw)
  To: Neil Davies; +Cc: bloat

Neil Davies <neil.davies@pnsol.com> writes:

> You are asking about the epistemology! Good start. The only things you
> can “know” outside the node are the things you can observe. You can
> infer, from the characteristics of the observations, conformance to
> some avowed model of behaviour.

If we're going all philosophy of science on this, I'll add that the nice
thing about networking is that the computers attached to it can give us
some very specific measurements and we can (with care) provoke quite
specific behaviour to reason about the performance. Try asking a
biologist about the interactions between the life cycle of parasitic
species, their hosts and the ecosystem they live in! :)

-Toke


* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27 10:32         ` Toke Høiland-Jørgensen
@ 2015-04-27 10:38           ` Neil Davies
  2015-04-27 10:52             ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 46+ messages in thread
From: Neil Davies @ 2015-04-27 10:38 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: bloat

Toke

Absolutely - the individual nodes can make (locally informed) decisions as to how to “behave”.

The interesting thing is making all those local decisions add up to a (set of) end-to-end outcomes, and the
answer is not to make the same decision(s) everywhere - unfortunately that doesn’t stack up.

We have (some level) of control over our “universe of discourse” - my joke with my mates at CERN is that
they only have one universe to investigate, we can create three in one day and still be home in time for tea!

Neil

On 27 Apr 2015, at 11:32, Toke Høiland-Jørgensen <toke@toke.dk> wrote:

> Neil Davies <neil.davies@pnsol.com> writes:
> 
>> You are asking about the epistemology! Good start. The only things you
>> can “know” outside the node are the things you can observe. You can
>> infer, from the characteristics of the observations, conformance to
>> some avowed model of behaviour.
> 
> If we're going all philosophy of science on this, I'll add that the nice
> thing about networking is that the computers attached to it can give us
> some very specific measurements and we can (with care) provoke quite
> specific behaviour to reason about the performance. Try asking a
> biologist about the interactions between the life cycle of parasitic
> species, their hosts and the ecosystem they live in! :)
> 
> -Toke



* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27  9:54 ` Neil Davies
@ 2015-04-27 10:45   ` Toke Høiland-Jørgensen
  2015-04-27 10:57     ` Neil Davies
  2015-04-27 11:54   ` Paolo Valente
  1 sibling, 1 reply; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2015-04-27 10:45 UTC (permalink / raw)
  To: Neil Davies; +Cc: bloat

Neil Davies <neil.davies@pnsol.com> writes:

> Take a look at http://www.pnsol.com/publications.html, you may find
> http://www.pnsol.com/public/PP-PNS-2009-02.pdf as a good starting
> point.

I've seen this referred on the list before (I assume by you ;)), but
haven't really grok'ed it before. I like the delta-Q measure as a way of
thinking about overall network performance; and there's definitely
parallels to thinking about the end-to-end 'latency budget', which I
also find quite instructive (I think I picked it up from Joe Touch at
some point).

How do you deal with the fact that loss and delay can be exchanged for
each other (via retransmissions)?

Also, which publication would you recommend if I'm interested in
specifically how you 'Mathematically model behaviour and delta-Q' as you
mention in that slide set? :)

-Toke


* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27 10:38           ` Neil Davies
@ 2015-04-27 10:52             ` Toke Høiland-Jørgensen
  2015-04-27 11:03               ` Neil Davies
  0 siblings, 1 reply; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2015-04-27 10:52 UTC (permalink / raw)
  To: Neil Davies; +Cc: bloat

Neil Davies <neil.davies@pnsol.com> writes:

> The interesting thing is making all those local decisions add up to a
> (set of) end-to-end outcomes, and the answer is not to make the same
> decision(s) everywhere - unfortunately that doesn’t stack up.

Yes, well, I do also like the E2E principle of not making too many
decisions within the network, instead letting the endpoints sort it out.
For me, the fight against bufferbloat is mostly about restoring the
assumptions that it has eroded (i.e. "packet loss is not to be feared,
but on the contrary is an important indicator that we're hitting
congestion"). I'd really rather prefer the network itself to be fairly
dumb...

> We have (some level) of control over our “universe of discourse” - my
> joke with my mates at CERN is that they only have one universe to
> investigate, we can create three in one day and still be home in time
> for tea!

Hehe, quite. That is both fascinating and frustrating! :)

-Toke


* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27 10:23       ` Toke Høiland-Jørgensen
@ 2015-04-27 10:53         ` Paolo Valente
  2015-04-27 20:39           ` David Lang
  0 siblings, 1 reply; 46+ messages in thread
From: Paolo Valente @ 2015-04-27 10:53 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: bloat


Il giorno 27/apr/2015, alle ore 12:23, Toke Høiland-Jørgensen <toke@toke.dk> ha scritto:

> Paolo Valente <paolo.valente@unimore.it> writes:
> 
>> I am sorry, but I realized that what I said was incomplete. The main
>> cause of my concern is that, from outside the node, we do not know
>> whether a VoIP packet departs ad a given time because the application
>> wants it to be sent at that time or because it has waited in the
>> buffer for a lot of time. Similarly, we do not know how long the VoIP
>> application will wait before getting its incoming packets delivered.
> 
> No, not unless the application tells you (by, for instance,
> timestamping; depending on where in the network stack the timestamp is
> applied, you can measure different instances of bloat).

That’s exactly what I was thinking about. Actually it seems the only solution to me.

What apparently makes things more difficult is that I am not allowed either to choose the applications to run or to interfere in any way with the flows (e.g., by injecting some extra packet).

Any pointer to previous/current work on this topic?

> Or if you know
> that an application is supposed to answer you immediately (as is the
> case with a regular 'ping'), you can measure if it does so even when
> otherwise loaded.
> 

A ping was one of the first simple actions I suggested, but the answer was, as above: no you cannot ‘touch' the network!

> Of course, you also might not measure anything, if the bottleneck is
> elsewhere. But if you can control the conditions well enough, you can
> probably avoid this; just be aware of it. In Linux, combating
> bufferbloat has been quite the game of whack-a-mole over the last
> several years :)
> 

Then I guess that now I am trying to build a good mallet according to the rules of the game for this company :)

In any case, the target networks should be observable at such a level that, yes, all relevant conditions should be under control (if one makes no mistakes). My problem is, as I wrote above, finding out what information I can, and need to, look at.

Thanks,
Paolo


> -Toke


--
Paolo Valente                                                 
Algogroup
Dipartimento di Fisica, Informatica e Matematica		
Via Campi, 213/B
41125 Modena - Italy        				  
homepage:  http://algogroup.unimore.it/people/paolo/



* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27 10:45   ` Toke Høiland-Jørgensen
@ 2015-04-27 10:57     ` Neil Davies
  2015-04-27 14:22       ` Toke Høiland-Jørgensen
  2015-04-27 15:51       ` Toke Høiland-Jørgensen
  0 siblings, 2 replies; 46+ messages in thread
From: Neil Davies @ 2015-04-27 10:57 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: bloat


On 27 Apr 2015, at 11:45, Toke Høiland-Jørgensen <toke@toke.dk> wrote:

> Neil Davies <neil.davies@pnsol.com> writes:
> 
>> Take a look at http://www.pnsol.com/publications.html, you may find
>> http://www.pnsol.com/public/PP-PNS-2009-02.pdf as a good starting
>> point.
> 
> I've seen this referred on the list before (I assume by you ;)), but
> haven't really grok'ed it before. I like the delta-Q measure as a way of
> thinking about overall network performance; and there's definitely
> parallels to thinking about the end-to-end 'latency budget', which I
> also find quite instructive (I think I picked it up from Joe Touch at
> some point).

You'll find that the way ∆Q can be decomposed into a basis set of
∆Q|G, ∆Q|S and ∆Q|V helps work out which parts of the budget
get eaten up by different elements of the network design/architecture.

> 
> How do you deal with the fact that loss and delay can be exchanged for
> each other (via retransmissions)?

Yes, the QTA/∆Q framework is recursive in that sense - you can relate (for 
example) the "TCP layer ∆Q" to the "IP packet level ∆Q".

> 
> Also, which publication would you recommend if I'm interested in
> specifically how you 'Mathematically model behaviour and delta-Q' as you
> mention in that slide set? :)
> 

Depends on your starting point:
 - if it is "how does this relate to the end user", look at "the properties and mathematics of data transport quality"
 - if it is "how do I measure this stuff in a real network", take a look at "advanced network performance measurement techniques" (you can get to it from the links tab)

> -Toke

Neil
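One way to picture the ∆Q|G / ∆Q|S / ∆Q|V decomposition Neil mentions: G is the size-independent part of delay (e.g. propagation), S the size-proportional part (serialization), and V the leftover variable part (queueing). The two-point minimum-delay fit below is my own illustrative reading of that split, not PNSol's actual tooling, and the samples are invented:

```python
# Rough sketch: split one-way delay samples into G (fixed), S (per-byte
# serialization) and V (variable residue, i.e. queueing). G and S are
# fitted from the minimum delay observed at two distinct packet sizes,
# since the minimum is the sample least affected by queueing.

def fit_g_s(samples):
    """samples: list of (size_bytes, delay_ms). Returns (G_ms, S_ms_per_byte)."""
    best = {}  # minimum delay observed per packet size
    for size, delay in samples:
        best[size] = min(delay, best.get(size, float("inf")))
    (s1, d1), (s2, d2) = sorted(best.items())[:2]
    S = (d2 - d1) / (s2 - s1)
    G = d1 - S * s1
    return G, S

samples = [(100, 10.8), (100, 10.0), (100, 25.0), (1500, 24.0), (1500, 31.0)]
G, S = fit_g_s(samples)
V = 25.0 - (G + S * 100)  # variable component of one 100-byte sample
print(G, S, V)
```

With these invented samples the fit attributes 9 ms to G, 0.01 ms/byte to S, and the 15 ms residue of the slow 100-byte sample to V, the component queueing (and hence bufferbloat) inflates.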


* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27 10:52             ` Toke Høiland-Jørgensen
@ 2015-04-27 11:03               ` Neil Davies
  2015-04-27 12:03                 ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 46+ messages in thread
From: Neil Davies @ 2015-04-27 11:03 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: bloat


On 27 Apr 2015, at 11:52, Toke Høiland-Jørgensen <toke@toke.dk> wrote:

> Neil Davies <neil.davies@pnsol.com> writes:
> 
>> The interesting thing is making all those local decisions add up to a
>> (set of) end-to-end outcomes, and the answer is not to make the same
>> decision(s) everywhere - unfortunately that doesn’t stack up.
> 
> Yes, well, I do also like the E2E principle of not making too many
> decisions within the network, instead letting the endpoints sort it out.
> For me, the fight against bufferbloat is mostly about restoring the
> assumptions that it has eroded (i.e. "packet loss is not to be feared,
> but on the contrary is an important indicator that we're hitting
> congestion"). I'd really rather prefer the network itself to be fairly
> dumb...

I don't think that the E2E principle can manage the performance hazards
that are emerging. We've seen this recently in practice: take a look
at http://www.martingeddes.com/how-far-can-the-internet-scale/ - it is based
on a real problem we'd encountered.

In someways this is just control theory 101 rearing its head... in another it
is a large technical challenge for internet provision.

>> We have (some level) of control over our “universe of discourse” - my
>> joke with my mates at CERN is that they only have one universe to
>> investigate, we can create three in one day and still be home in time
>> for tea!
> 
> Hehe, quite. That is both fascinating and frustrating! :)
> 
> -Toke



* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27  9:54 ` Neil Davies
  2015-04-27 10:45   ` Toke Høiland-Jørgensen
@ 2015-04-27 11:54   ` Paolo Valente
  2015-04-27 15:25     ` Jonathan Morton
  2015-04-27 20:13     ` Neil Davies
  1 sibling, 2 replies; 46+ messages in thread
From: Paolo Valente @ 2015-04-27 11:54 UTC (permalink / raw)
  To: Neil Davies; +Cc: bloat

Thanks for the pointers. As for the epistemological implications of my concerns, I must admit that I find them a little bit frightening :)

After browsing some of the presentations, the relevant component for my problem seems to be the variability V. However, it is not clear to me how I can measure or infer it in my situation, i.e., if I cannot:
1) have any feedback on the user experience;
2) measure the time that elapses between when a time-sensitive application puts a message in a socket and when that message is actually sent by the node in a packet;
3) in the opposite direction, measure the time that elapses between when a packet arrives at the node and when the application receives the message contained in the packet.

On which documents should I concentrate more to better understand this point?

Thanks,
Paolo
 
Il giorno 27/apr/2015, alle ore 11:54, Neil Davies <neil.davies@pnsol.com> ha scritto:

> Paolo
> 
> Yes, it is - there is a whole methodology for detecting this and associated algebra for manipulation. It has been used at CERN, in various telcos and in various large scale, real time distributed systems to relate end user outcomes to the delay/loss characteristics of the network.
> 
> Take a look at http://www.pnsol.com/publications.html, you may find http://www.pnsol.com/public/PP-PNS-2009-02.pdf as a good starting point.
> 
> Neil
> 
> On 27 Apr 2015, at 10:48, Paolo Valente <paolo.valente@unimore.it> wrote:
> 
>> Hi,
>> a network-monitoring company got curious about bufferbloat issues and asked me to investigate a little bit the following issue (quite interesting in my opinion). Is it possible to detect, from outside a node, if the node is bufferbloated? In particular, the only action allowed would be to observe the packets entering and leaving the node (plus, of course, their timing).
>> 
>> If such a general problem is too hard or impossible to solve, do you think it is still possible, at least for some type of application, to understand whether the application is experiencing high latency because of bloated buffers inside the node? (As above, by just observing packet flows from outside the node.)
>> 
>> Thanks,
>> Paolo
>> 
>> _______________________________________________
>> Bloat mailing list
>> Bloat@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/bloat
> 


--
Paolo Valente                                                 
Algogroup
Dipartimento di Fisica, Informatica e Matematica		
Via Campi, 213/B
41125 Modena - Italy        				  
homepage:  http://algogroup.unimore.it/people/paolo/



* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27 11:03               ` Neil Davies
@ 2015-04-27 12:03                 ` Toke Høiland-Jørgensen
  2015-04-27 20:19                   ` Neil Davies
  2015-05-19 21:23                   ` Alan Jenkins
  0 siblings, 2 replies; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2015-04-27 12:03 UTC (permalink / raw)
  To: Neil Davies; +Cc: bloat

Neil Davies <neil.davies@pnsol.com> writes:

> I don't think that the E2E principle can manage the emerging
> performance hazards that are arising.

Well, probably not entirely (smart queueing certainly has a place). My
worry is, however, that going too far in the other direction will turn
into a Gordian knot of constraints, where anything that doesn't fit into
the preconceived traffic classes is impossible to do something useful
with.

Or, to put it another way, I'd like the network to have exactly as much
intelligence as is needed, but no more. And I'm not sure I trust my ISP
to make that tradeoff... :(

> We've seen this recently in practice: take a look at
> http://www.martingeddes.com/how-far-can-the-internet-scale/ - it is
> based on a real problem we'd encountered.

Well that, and the post linked to from it
(http://www.martingeddes.com/think-tank/the-future-of-the-internet-the-end-to-end-argument/),
is certainly quite the broadside against the end-to-end principle. Colour me
intrigued.

> In someways this is just control theory 101 rearing its head... in
> another it is a large technical challenge for internet provision.

It's been bugging me for a while that most control theory analysis (of
AQMs in particular) seems to completely ignore transient behaviour and
jump straight to the steady state.

-Toke


* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27 10:57     ` Neil Davies
@ 2015-04-27 14:22       ` Toke Høiland-Jørgensen
  2015-04-27 20:27         ` Neil Davies
  2015-04-27 15:51       ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2015-04-27 14:22 UTC (permalink / raw)
  To: Neil Davies; +Cc: bloat

Neil Davies <neil.davies@pnsol.com> writes:

> You'll find the way that ∆Q can be decomposed into basis set of ∆Q|G,
> ∆Q|S and ∆Q|V - helps work out which parts of the budget get eaten up
> by different elements of the network design/architecture.

Right, I got that part. What I'm missing is how you define the actual
measure for ∆Q -- but that is application dependent? E.g. for web sites
it might be load time, for VoIP it might be MOS score?

-Toke

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27 11:54   ` Paolo Valente
@ 2015-04-27 15:25     ` Jonathan Morton
  2015-04-27 20:30       ` Neil Davies
  2015-04-27 20:13     ` Neil Davies
  1 sibling, 1 reply; 46+ messages in thread
From: Jonathan Morton @ 2015-04-27 15:25 UTC (permalink / raw)
  To: Paolo Valente; +Cc: bloat

[-- Attachment #1: Type: text/plain, Size: 687 bytes --]

One thing that might help you here is the TCP Timestamps option. The
timestamps thus produced are opaque, but you can observe them and measure
the time intervals between their production and echo. You should be able to
infer something from that, with care.

To determine the difference between loaded and unloaded states, you may
need to observe for an extended period of time. Eventually you'll observe
some sort of bulk flow, even if it's just a software update cycle. It's not
quite so certain that you'll observe an idle state, but it is sufficient to
observe an instance of the link not being completely saturated, which is
likely to occur at least occasionally.
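A minimal sketch of that pairing step (my illustration, not part of the thread): given timestamped observations of the TCP Timestamps option taken at the monitoring point, match each outgoing TSval against the first incoming packet that echoes it in TSecr, relying only on your own capture clock. The record layout is assumed; in practice you would extract these fields from a pcap.

```python
def echo_intervals(records):
    """records: iterable of (t, direction, tsval, tsecr) tuples, where
    direction is 'out' for packets leaving the observed node and 'in'
    for packets arriving at it, sorted by capture time t.  Returns
    (tsval, interval) pairs: the wall-clock gap between first seeing a
    TSval go out and first seeing it echoed back in a TSecr."""
    sent = {}        # tsval -> capture time of first packet carrying it
    intervals = []
    for t, direction, tsval, tsecr in records:
        if direction == 'out':
            sent.setdefault(tsval, t)
        elif tsecr in sent:
            intervals.append((tsecr, t - sent.pop(tsecr)))
    return intervals

# Toy trace: TSval 100 echoed 40 ms later, TSval 101 echoed 35 ms later.
trace = [
    (0.000, 'out', 100, 0),
    (0.040, 'in',  900, 100),
    (0.050, 'out', 101, 900),
    (0.085, 'in',  901, 101),
]
print([(v, round(dt, 3)) for v, dt in echo_intervals(trace)])
# [(100, 0.04), (101, 0.035)]
```

The intervals are upper bounds on the path RTT plus remote processing, which is exactly why the loaded/unloaded comparison over a long observation window is needed.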

- Jonathan Morton

[-- Attachment #2: Type: text/html, Size: 762 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27 10:57     ` Neil Davies
  2015-04-27 14:22       ` Toke Høiland-Jørgensen
@ 2015-04-27 15:51       ` Toke Høiland-Jørgensen
  2015-04-27 20:38         ` Neil Davies
  1 sibling, 1 reply; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2015-04-27 15:51 UTC (permalink / raw)
  To: Neil Davies; +Cc: bloat

Neil Davies <neil.davies@pnsol.com> writes:

> Depends on your starting point:

Right, having looked a bit more at this:

>  - if it is "how does this relate to the end user" - look at "the
>  properties and mathematics of data transport quality"

This mentions, on slide 30, an analytical model for predicting (changes
in) ∆Q. Is this Judy Holyer's "A Queueing Theory Model for Real Data
Networks", or does it refer to something else?

-Toke

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27 11:54   ` Paolo Valente
  2015-04-27 15:25     ` Jonathan Morton
@ 2015-04-27 20:13     ` Neil Davies
  1 sibling, 0 replies; 46+ messages in thread
From: Neil Davies @ 2015-04-27 20:13 UTC (permalink / raw)
  To: Paolo Valente; +Cc: bloat

Paolo

On 27 Apr 2015, at 12:54, Paolo Valente <paolo.valente@unimore.it> wrote:

> Thanks for the pointers. As for the epistemological implications of my concerns, I must admit that I find them a little bit frightening :)
> 
> After browsing some of the presentations, the relevant component for my problem seems to be the variability V. However, it is not clear to me how I can measure or infer it in my situation, i.e., if I cannot:
> 1) have any feedback on the user experience;

You'll need to consider the ∆Q|G and the ∆Q|S as well - they contribute to the overall delay, and that contributes to the delivered performance. Response time to reach a certain data flow rate (say, for VoD), recovery from a packet loss (for TCP), etc., all interact with these factors.

We've created models (both analytic and experimental) which relate the delivered ∆Q (all of its factors) to QoE for various apps.
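To make the G/S/V split concrete, here is a minimal sketch (mine, not from the thread) of the usual estimation trick: plot one-way delay against packet size; the lower envelope's intercept approximates ∆Q|G (fixed propagation), its slope approximates ∆Q|S (per-byte serialization), and each packet's excess over the envelope is a ∆Q|V sample. The two probe sizes and the numbers are entirely synthetic.

```python
# Toy data: true G = 10 ms fixed delay, S = 1 us/byte serialization,
# plus non-negative queueing noise (the V component).
samples = [
    (100,  0.0101 + 0.0004), (100,  0.0101 + 0.0000), (100,  0.0101 + 0.0012),
    (1500, 0.0115 + 0.0000), (1500, 0.0115 + 0.0007), (1500, 0.0115 + 0.0002),
]

def decompose(samples):
    """Estimate (G, S) from the per-size minimum delays (the lower
    envelope) and return each packet's V residual above that envelope."""
    mins = {}
    for size, delay in samples:
        mins[size] = min(delay, mins.get(size, float('inf')))
    (s1, d1), (s2, d2) = sorted(mins.items())
    S = (d2 - d1) / (s2 - s1)      # seconds per byte (serialization)
    G = d1 - S * s1                # fixed component at size zero
    V = [delay - (G + S * size) for size, delay in samples]
    return G, S, V

G, S, V = decompose(samples)
```

With more than two sizes one would fit the envelope by quantile regression rather than solving through two points, but the principle is the same.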

> 2) measure the time that elapses between when a time-sensitive application puts a message in a socket and when that message is actually sent by the node in a packet;

This is why the capture model has to be one of "timed traces" of "observables". In the TCP model, the socket is one location to measure (at the two ends); the network interfaces would be another.

> 3) in the opposite direction, measure the time that elapses between when a packet arrives to the node and when the application receives the message contained in the packet.

Yes, you need to view this in both directions, as you need to isolate the effect of the network transport from the processing (which also has an influence on the performance).

> 
> On which documents should I concentrate more to better understand this point?
> 
> Thanks,
> Paolo
> 
> Il giorno 27/apr/2015, alle ore 11:54, Neil Davies <neil.davies@pnsol.com> ha scritto:
> 
>> Paolo
>> 
>> Yes, it is - there is a whole methodology for detecting this and associated algebra for manipulation. It has been used at CERN, in various telcos and in various large scale, real time distributed systems to relate end user outcomes to the delay/loss characteristics of the network.
>> 
>> Take a look at http://www.pnsol.com/publications.html, you may find http://www.pnsol.com/public/PP-PNS-2009-02.pdf as a good starting point.
>> 
>> Neil
>> 
>> On 27 Apr 2015, at 10:48, Paolo Valente <paolo.valente@unimore.it> wrote:
>> 
>>> Hi,
>>> a network-monitoring company got curious about bufferbloat issues and asked me to investigate a little bit the following issue (quite interesting in my opinion). Is it possible to detect, from outside a node, if the node is bufferbloated? In particular, the only action allowed would be to observe the packets entering and leaving the node (plus, of course, their timing).
>>> 
>>> If such a general problem is too hard or impossible to solve, do you think it is still possible at least to understand, for some type of application, if the application is experiencing a high latency because of bloated buffers inside the node? (As above, by just observing packet flows from outside the node.)
>>> 
>>> Thanks,
>>> Paolo
>>> 
>>> _______________________________________________
>>> Bloat mailing list
>>> Bloat@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/bloat
>> 
> 
> 
> --
> Paolo Valente                                                 
> Algogroup
> Dipartimento di Fisica, Informatica e Matematica		
> Via Campi, 213/B
> 41125 Modena - Italy        				  
> homepage:  http://algogroup.unimore.it/people/paolo/
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27 12:03                 ` Toke Høiland-Jørgensen
@ 2015-04-27 20:19                   ` Neil Davies
  2015-05-19 21:23                   ` Alan Jenkins
  1 sibling, 0 replies; 46+ messages in thread
From: Neil Davies @ 2015-04-27 20:19 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: bloat

Toke


On 27 Apr 2015, at 13:03, Toke Høiland-Jørgensen <toke@toke.dk> wrote:

> Neil Davies <neil.davies@pnsol.com> writes:
> 
>> I don't think that the E2E principle can manage the emerging
>> performance hazards that are arising.
> 
> Well, probably not entirely (smart queueing certainly has a place). My
> worry is, however, that going too far in the other direction will turn
> into a Gordian knot of constraints, where anything that doesn't fit into
> the preconceived traffic classes is impossible to do something useful
> with.
> 
> Or, to put it another way, I'd like the network to have exactly as much
> intelligence as is needed, but no more. And I'm not sure I trust my ISP
> to make that tradeoff... :(

Ah - no such thing as intelligence here - you need to go for stochastics: 
there is no way that E2E (or any other non-local control mechanism, cf. SDN) 
can respond quickly enough - the system is more "ballistic" (hence the stochastic
dynamics as a better framing).

> 
>> We've seen this recently in practice: take a look at
>> http://www.martingeddes.com/how-far-can-the-internet-scale/ - it is
>> based on a real problem we'd encountered.
> 
> Well that, and the post linked to from it
> (http://www.martingeddes.com/think-tank/the-future-of-the-internet-the-end-to-end-argument/),
> is certainly quite the broadside against end-to-end principle. Colour me
> intrigued.

Yep - direct consequence of packet-neutral behaviour!

> 
>> In someways this is just control theory 101 rearing its head... in
>> another it is a large technical challenge for internet provision.
> 
> It's been bugging me for a while that most control theory analysis (of
> AQMs in particular) seems to completely ignore transient behaviour and
> jump straight to the steady state.

Several years ago I calculated how long it would take a gigabit ethernet (with a 100-packet 
buffer queue) to reach steady state (be within 1 part in 10^5) when offered 100% 
load - it was six months! (usual mathematical caveats apply). Networks are
*NEVER* in steady state! We tend to try and make them "predictable" over 10 seconds -
at least beyond 10 seconds control theory has a chance!
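The qualitative point is easy to reproduce (this toy is mine, not Neil's calculation): a queue offered exactly its service rate behaves like a critically loaded random walk, so its occupancy wanders on ever longer timescales instead of converging. A seeded sketch:

```python
import random

random.seed(42)

# Slotted queue: one arrival per slot with p = 0.5, at most one
# departure per slot with p = 0.5 -- offered load equals capacity.
q, trace = 0, []
for _ in range(100_000):
    q += random.random() < 0.5           # Bernoulli arrival
    if q and random.random() < 0.5:      # Bernoulli service completion
        q -= 1
    trace.append(q)

# Mean occupancy over successive 10,000-slot windows: at critical load
# these keep drifting rather than settling to a steady-state value.
windows = [sum(trace[i:i + 10_000]) / 10_000
           for i in range(0, 100_000, 10_000)]
print([round(w, 1) for w in windows])
```

At any load strictly below 1 the same simulation does converge, just slowly as load approaches 1 - which is the transient-versus-steady-state issue Toke raises.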

> 
> -Toke


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27 14:22       ` Toke Høiland-Jørgensen
@ 2015-04-27 20:27         ` Neil Davies
  0 siblings, 0 replies; 46+ messages in thread
From: Neil Davies @ 2015-04-27 20:27 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: bloat

Toke


On 27 Apr 2015, at 15:22, Toke Høiland-Jørgensen <toke@toke.dk> wrote:

> Neil Davies <neil.davies@pnsol.com> writes:
> 
>> You'll find the way that ∆Q can be decomposed into basis set of ∆Q|G,
>> ∆Q|S and ∆Q|V - helps work out which parts of the budget get eaten up
>> by different elements of the network design/architecture.
> 
> Right, I got that part. What I'm missing is how you define the actual
> measure for ∆Q -- but that is application dependent? E.g. for web sites
> it might be load time, for VoIP it might be MOS score?
> 
> -Toke


For VoIP the aspects of ∆Q that are of interest (after the total one-way delay) are the 
delay variance and loss rate - take a look at https://ivanovic.web.cern.ch/ivanovic/articles/open-2004-007.pdf 
(which is the tech report behind what was published in SPECTS'03: Beuran, R., Ivanovici, M., Dobinson,
B., Davies, N., Thompson, P.: Network Quality of Service Measurement System for
Application Requirements Evaluation).

In other aspects it might be "time to first frame" (VoD) or "possibility of a buffer starvation event per operating hour" (VoD);
the QoE (Quality of Experience) metric needs to be one that has a relationship with the users' perception.

Neil

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27 15:25     ` Jonathan Morton
@ 2015-04-27 20:30       ` Neil Davies
  2015-04-27 23:11         ` Jonathan Morton
  0 siblings, 1 reply; 46+ messages in thread
From: Neil Davies @ 2015-04-27 20:30 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: bloat

[-- Attachment #1: Type: text/plain, Size: 1266 bytes --]

Hi Jonathan
On 27 Apr 2015, at 16:25, Jonathan Morton <chromatix99@gmail.com> wrote:

> One thing that might help you here is the TCP Timestamps option. The timestamps thus produced are opaque, but you can observe them and measure the time intervals between their production and echo. You should be able to infer something from that, with care.
> 
> To determine the difference between loaded and unloaded states, you may need to observe for an extended period of time. Eventually you'll observe some sort of bulk flow, even if it's just a software update cycle. It's not quite so certain that you'll observe an idle state, but it is sufficient to observe an instance of the link not being completely saturated, which is likely to occur at least occasionally.
> 
> - Jonathan Morton
We looked at using TCP timestamps early on in our work. The problem is that they don't really help extract the fine-grained information needed. The timestamps can move in very large steps, and the accuracy (and precision) can vary widely from implementation to implementation.

The timestamps are there to try and get a gross (if my memory serves me right ~100ms) approximation to the RTT - not good enough for reasoning about TCP based interactive/"real time" apps

Neil

[-- Attachment #2: Type: text/html, Size: 1698 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27 15:51       ` Toke Høiland-Jørgensen
@ 2015-04-27 20:38         ` Neil Davies
  2015-04-27 21:37           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 46+ messages in thread
From: Neil Davies @ 2015-04-27 20:38 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: bloat

Toke

∆Q is both the concept (quality attenuation - the fact that delay and potential for loss are both conserved and only ever increase[1]) and its representation as an improper random variable (one whose CDF doesn't necessarily reach one).

One of my adages is that "network quality" doesn't exist - just like you can't buy a box of "dark" and make a room dark by opening the box, you can't buy a box of "network quality" - delivering quality in networks is managing (through bounding) the "quality attenuation".

Neil
[1] Delay and loss can be traded - i.e. resends or even forward error correction - but you can't reduce the ∆Q - resends mean increased delay to cover loss, forward error correction means increased delay to cover bit error rates, etc. This is true at any (and all) layers and in *every* queueing and scheduling mechanism - just one of those nasty universal properties that we can't get away from.
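A small sketch (my illustration, not PNSol's formalism) of ∆Q as an improper random variable: represent each hop as a loss probability plus a delay distribution conditioned on delivery. Composing hops multiplies the survival probabilities and convolves the delays, so the combined CDF tops out at one minus the combined loss - the attenuation only accumulates.

```python
def compose(dq1, dq2):
    """Each ∆Q is (loss_prob, pmf) where pmf maps delay -> probability,
    conditioned on delivery (weights sum to 1).  The improper CDF of the
    pair is (1 - loss_prob) * CDF(pmf).  Composition in series:
    survival probabilities multiply, conditional delays convolve."""
    p1, d1 = dq1
    p2, d2 = dq2
    loss = 1 - (1 - p1) * (1 - p2)
    delays = {}
    for t1, w1 in d1.items():
        for t2, w2 in d2.items():
            delays[t1 + t2] = delays.get(t1 + t2, 0.0) + w1 * w2
    return loss, delays

hop1 = (0.01, {1: 0.5, 2: 0.5})   # 1% loss, 1 or 2 ms with equal weight
hop2 = (0.02, {3: 1.0})           # 2% loss, fixed 3 ms
loss, delays = compose(hop1, hop2)
print(round(loss, 4), delays)     # 0.0298 {4: 0.5, 5: 0.5}
```

Note the combined loss exceeds either hop's alone and no delay got shorter - the "conserved, only ever increases" property in miniature.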

On 27 Apr 2015, at 16:51, Toke Høiland-Jørgensen <toke@toke.dk> wrote:

> Neil Davies <neil.davies@pnsol.com> writes:
> 
>> Depends on your starting point:
> 
> Right, having looked a bit more at this:
> 
>> - if it is "how does this relate to the end user" - look at "the
>> properties and mathematics of data transport quality"
> 
> This mentions, on slide 30, an analytical model for predicting (changes
> in) ∆Q. Is this Judy Holyer's "A Queueing Theory Model for Real Data
> Networks", or does it refer to something else?
> 
> -Toke


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27 10:53         ` Paolo Valente
@ 2015-04-27 20:39           ` David Lang
  2015-05-04 10:31             ` Paolo Valente
  0 siblings, 1 reply; 46+ messages in thread
From: David Lang @ 2015-04-27 20:39 UTC (permalink / raw)
  To: Paolo Valente; +Cc: bloat

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3401 bytes --]

On Mon, 27 Apr 2015, Paolo Valente wrote:

> Il giorno 27/apr/2015, alle ore 12:23, Toke Høiland-Jørgensen <toke@toke.dk> ha scritto:
>
>> Paolo Valente <paolo.valente@unimore.it> writes:
>>
>>> I am sorry, but I realized that what I said was incomplete. The main
>>> cause of my concern is that, from outside the node, we do not know
>>> whether a VoIP packet departs ad a given time because the application
>>> wants it to be sent at that time or because it has waited in the
>>> buffer for a lot of time. Similarly, we do not know how long the VoIP
>>> application will wait before getting its incoming packets delivered.
>>
>> No, not unless the application tells you (by, for instance,
>> timestamping; depending on where in the network stack the timestamp is
>> applied, you can measure different instances of bloat).
>
> That’s exactly what I was thinking about. Actually it seems the only solution to me.
>
> What apparently makes things more difficult is that I am not allowed either to choose the applications to run or to interfere in any way with the flows (e.g., by injecting some extra packet).
>
> Any pointer to previous/current work on this topic?
>
>> Or if you know
>> that an application is supposed to answer you immediately (as is the
>> case with a regular 'ping'), you can measure if it does so even when
>> otherwise loaded.
>>
>
> A ping was one of the first simple actions I suggested, but the answer was, as above: no you cannot ‘touch' the network!
>
>> Of course, you also might not measure anything, if the bottleneck is
>> elsewhere. But if you can control the conditions well enough, you can
>> probably avoid this; just be aware of it. In Linux, combating
>> bufferbloat has been quite the game of whack-a-mole over the last
>> several years :)
>>
>
> Then I guess that now I am trying to build a good mallet according to the 
> rules of the game for this company :)
>
> In any case, the target networks should be observable at such a level that, 
> yes, all relevant conditions should be under control (if one does not make 
> mistakes). My problem is, as I wrote above, to find out what information I can 
> and have to look at.

What is it that you do have available?

Bufferbloat usually isn't a huge problem on the leaf node where the applications 
are running. They usually have a fast local LAN link.

Bufferbloat causes most of its problems when it's on a middlebox where the 
available bandwidth changes so that one link becomes congested.

If you can monitor packets going in and out of such links, you should be able to 
exactly measure the latency you get going through the device.

If you are trying to probe the network from the outside, without being able to 
even generate ping packets, then you have a problem.

If you can monitor ping packets going into the network, you can figure out how 
long they take to get back out.

Look for other protocols that should have a very fast response time. DNS and NTP 
are probably pretty good options. HTTP requests for small static pages aren't 
always reliable, but can be useful (or especially ones that check for cache 
expiration, HTTP HEAD commands for example)

If you can look at such traffic over a shortish, but not tiny, timeframe, you 
should be able to find the minimum response time for such traffic, and that can 
give you a pretty good idea of the amount of minimum latency involved.
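A sketch of that last step (mine; the record layout is illustrative): given passively observed (time, response_time) pairs for DNS/NTP/HTTP-like exchanges, take the minimum over a trailing window as the unloaded baseline, and treat each sample's excess over it as induced queueing delay.

```python
def excess_over_baseline(samples, window=60.0):
    """samples: list of (t, rtt) pairs, sorted by t.  For each sample,
    the baseline is the minimum rtt seen within the trailing window
    seconds; the excess above it is attributed to queueing.  Brute
    force (O(n^2)) for clarity."""
    out = []
    for i, (t, rtt) in enumerate(samples):
        recent = [r for (u, r) in samples[:i + 1] if t - u <= window]
        base = min(recent)
        out.append((t, base, rtt - base))
    return out

# Toy observations: a DNS-like exchange that is ~20 ms unloaded,
# with one sample taken while a bulk flow fills the bottleneck queue.
obs = [(0, 0.020), (10, 0.021), (20, 0.150), (30, 0.020)]
for t, base, excess in excess_over_baseline(obs):
    print(t, base, round(excess, 3))
```

The window length is the tuning knob David mentions: too short and you may never catch an idle baseline, too long and the baseline goes stale if the path changes.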

David Lang

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27 20:38         ` Neil Davies
@ 2015-04-27 21:37           ` Toke Høiland-Jørgensen
  2015-04-28  7:14             ` Neil Davies
  0 siblings, 1 reply; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2015-04-27 21:37 UTC (permalink / raw)
  To: Neil Davies; +Cc: bloat

Neil Davies <neil.davies@pnsol.com> writes:

> ∆Q is both the concept (quality attenuation - the fact that delay and
> potential for loss are both conserved and only ever increase[1]) and
> its representation as an improper random variable (one whose CDF
> doesn't necessarily reach one).

Right, got it.

> One of my adages is that "network quality" doesn' t exist - just like
> you can't buy a box of "dark" and make a room dark by opening the box,
> you can't buy a box of "network quality" - delivering quality in
> networks is managing (through bounding) the "quality attenuation"

That much seems obvious. But do you have any analytical models that can
actual predict the magnitude of the quality attenuation for a given
network, say? And if so, are they available somewhere?

Also, some of the documents linked to from your web site seems to allude
to a scheduling algorithm of some sort. Is that available in paper form
(or better, code!) anywhere?


Thanks for your answers, will also take a look at the paper you linked
in your other email. :)

-Toke

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27 20:30       ` Neil Davies
@ 2015-04-27 23:11         ` Jonathan Morton
  2015-04-28  7:17           ` Neil Davies
  0 siblings, 1 reply; 46+ messages in thread
From: Jonathan Morton @ 2015-04-27 23:11 UTC (permalink / raw)
  To: Neil Davies; +Cc: bloat

[-- Attachment #1: Type: text/plain, Size: 2293 bytes --]

On 27 Apr 2015 23:31, "Neil Davies" <neil.davies@pnsol.com> wrote:
>
> Hi Jonathan
>
> On 27 Apr 2015, at 16:25, Jonathan Morton <chromatix99@gmail.com> wrote:
>
>> One thing that might help you here is the TCP Timestamps option. The
timestamps thus produced are opaque, but you can observe them and measure
the time intervals between their production and echo. You should be able to
infer something from that, with care.
>>
>> To determine the difference between loaded and unloaded states, you may
need to observe for an extended period of time. Eventually you'll observe
some sort of bulk flow, even if it's just a software update cycle. It's not
quite so certain that you'll observe an idle state, but it is sufficient to
observe an instance of the link not being completely saturated, which is
likely to occur at least occasionally.
>>
>> - Jonathan Morton
>
> We looked at using TCP timestamps early on in our work. The problem is
that they don't really help extract the fine-grained information needed.
The timestamps can move in very large steps, and the accuracy (and
precision) can vary widely from implementation to implementation.

Well, that's why you have to treat them as opaque, just like I said. Ignore
whatever meaning the end host producing them might embed in them, and
simply watch which ones get echoed back and when. You only have to rely on
the resolution of your own clocks.

> The timestamps are there to try and get a gross (if my memory serves me
right ~100ms) approximation to the RTT - not good enough for reasoning
about TCP based interactive/"real time" apps

On the contrary, these timestamps can indicate much better precision than
that; in particular they indicate an upper bound on the instantaneous RTT
which can be quite tight under favourable circumstances. On a LAN, you
could reliably determine that the RTT was below 1ms this way.

Now, what it doesn't give you is a strict lower bound. But you can often
look at what's going on in that TCP stream and determine that favourable
circumstances exist, such that the upper bound RTT estimate is probably
reasonably tight. Or you could observe that the stream is mostly idle, and
thus probably influenced by delayed acks and Nagle's algorithm, and
discount that measurement accordingly.
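Jonathan's filtering idea can be sketched as follows (my code, with illustrative thresholds): keep only echo-interval samples taken while the echoing side was actively streaming - a long idle gap before the echo suggests delayed ACKs or Nagle inflated the sample - and take the minimum of what survives as the RTT upper-bound estimate.

```python
def rtt_upper_bound(ts_samples, busy_gap=0.005):
    """ts_samples: list of (echo_interval, gap_before_echo) pairs, where
    gap_before_echo is the idle time observed on the echoing side just
    before it responded.  Samples taken from an idle stream are likely
    inflated by delayed ACKs / Nagle, so discard them; the minimum of
    the remaining intervals is a (probably tight) upper bound on RTT."""
    busy = [iv for iv, gap in ts_samples if gap <= busy_gap]
    return min(busy) if busy else None

ts_samples = [
    (0.210, 0.150),   # mostly idle stream: delayed-ack suspect, discarded
    (0.042, 0.001),   # back-to-back segments: trustworthy sample
    (0.039, 0.002),   # ditto, slightly tighter
]
print(rtt_upper_bound(ts_samples))   # 0.039
```

The 5 ms busy threshold is an assumption, not a standard value; in practice it would need to reflect the delayed-ACK timer of the stacks being observed.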

- Jonathan Morton

[-- Attachment #2: Type: text/html, Size: 2665 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27 21:37           ` Toke Høiland-Jørgensen
@ 2015-04-28  7:14             ` Neil Davies
  0 siblings, 0 replies; 46+ messages in thread
From: Neil Davies @ 2015-04-28  7:14 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: bloat


On 27 Apr 2015, at 22:37, Toke Høiland-Jørgensen <toke@toke.dk> wrote:

> Neil Davies <neil.davies@pnsol.com> writes:
> 
> .......
>> One of my adages is that "network quality" doesn't exist - just like
>> you can't buy a box of "dark" and make a room dark by opening the box,
>> you can't buy a box of "network quality" - delivering quality in
>> networks is managing (through bounding) the "quality attenuation"
> 
> That much seems obvious. But do you have any analytical models that can
> actual predict the magnitude of the quality attenuation for a given
> network, say? And if so, are they available somewhere?

Yes, we have models for various network elements (see Lucian's
thesis for the stuff at CERN for example - see website) and the rest we can ascertain
by measurement.

The problem with a generic model is that it is configuration dependent; you can 
estimate the overall ∆Q|V (as a starting point) using existing queueing theory. 
The issue is then how such V is distributed on various timescales (see below).

We measure and model this stuff commercially, and customers tend to be very
sensitive about such metrics!  We do have a technical report that provides
a lot more background for a specific deployment (funded by a public body)
that has been accepted for publication and should be published soon.

> 
> Also, some of the documents linked to from your web site seems to allude
> to a scheduling algorithm of some sort. Is that available in paper form
> (or better, code!) anywhere?

Toke, once you accept that ∆Q exists and is conserved, then the only role of 
queueing and scheduling is to "share out the disappointment" (i.e. assign ∆Q|V
to the set of competing streams/flows/aggregates).

Yes there is better code (and better approaches, see the patents) but we've found
that the key issue is creating the configurations. Given ∆Q|V's conservation and
that network elements are overbooked (and they are definitely more overbooked in 
the desire for low loss and and consistent low latency than they are for capacity) the
question becomes "what is the overall desired outcome as the network element
reaches saturation" - we can tailor that to any (feasible) collection of desires - now
get people to express, in any way that can be quantified, those desires!

The "code" consists of a collection "quality attenuators" - like cherish/urgency
multiplexers, stochastic shaper/policers arranged in an acyclic graph (all in the 
patent disclosures) - that make up the data path. Associated with that is 
the configuration system: given a set of QTAs - "quantities of quality required (bounds 
on ∆Q) and precedence for breach during saturation"  - it constructs a configuration 
(if one exists) that fulfils that set of requirements - also returning probabilistic measures
of the extent to which individual QTAs are likely not to be fulfilled.
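For intuition, here is a minimal toy version of a cherish/urgency multiplexer as I read the description (my sketch, not PNSol's patented implementation): urgency governs the order in which packets leave, while cherish governs who bears the loss when the element saturates.

```python
class CherishUrgencyMux:
    """Toy two-attribute multiplexer: 'urgent' packets are served first
    (delay falls on the non-urgent), and when the queue saturates,
    loss falls on the non-cherished."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.q = []                      # list of (urgent, cherished, pkt)

    def enqueue(self, pkt, urgent, cherished):
        if len(self.q) >= self.capacity:
            if not cherished:
                return False             # uncherished arrival absorbs the loss
            victim = next((i for i, (_u, c, _p) in enumerate(self.q)
                           if not c), None)
            if victim is None:
                return False             # everyone cherished: tail-drop arrival
            del self.q[victim]           # displace a queued uncherished packet
        self.q.append((urgent, cherished, pkt))
        return True

    def dequeue(self):
        if not self.q:
            return None
        # Serve the oldest urgent packet if any, else plain FIFO.
        i = next((i for i, (u, _c, _p) in enumerate(self.q) if u), 0)
        return self.q.pop(i)[2]
```

Usage: with capacity 2, an uncherished packet already queued is displaced by a cherished arrival under saturation, and the urgent packet jumps the dequeue order. The real mechanism described above composes such elements (plus shaper/policers) into an acyclic graph and adds the configuration search; this toy only shows the two orderings.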

> 
> 
> Thanks for your answers, will also take a look at the paper you linked
> in your other email. :)
> 
> -Toke

Neil



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27 23:11         ` Jonathan Morton
@ 2015-04-28  7:17           ` Neil Davies
  2015-04-28  9:58             ` Sebastian Moeller
  2015-04-28 16:05             ` Rick Jones
  0 siblings, 2 replies; 46+ messages in thread
From: Neil Davies @ 2015-04-28  7:17 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: bloat

[-- Attachment #1: Type: text/plain, Size: 2776 bytes --]

Jonathan

The timestamps don't change very quickly - dozens (or more) of packets can have the same timestamp, so it doesn't give you the appropriate discrimination power. Timed observations at key points gives you all you need (actually, appropriately gathered they give you all you can possibly know - by observation)

Neil

On 28 Apr 2015, at 00:11, Jonathan Morton <chromatix99@gmail.com> wrote:

> On 27 Apr 2015 23:31, "Neil Davies" <neil.davies@pnsol.com> wrote:
> >
> > Hi Jonathan
> >
> > On 27 Apr 2015, at 16:25, Jonathan Morton <chromatix99@gmail.com> wrote:
> >
> >> One thing that might help you here is the TCP Timestamps option. The timestamps thus produced are opaque, but you can observe them and measure the time intervals between their production and echo. You should be able to infer something from that, with care.
> >>
> >> To determine the difference between loaded and unloaded states, you may need to observe for an extended period of time. Eventually you'll observe some sort of bulk flow, even if it's just a software update cycle. It's not quite so certain that you'll observe an idle state, but it is sufficient to observe an instance of the link not being completely saturated, which is likely to occur at least occasionally.
> >>
> >> - Jonathan Morton
> >
> > We looked at using TCP timestamps early on in our work. The problem is that they don't really help extract the fine-grained information needed. The timestamps can move in very large steps, and the accuracy (and precision) can vary widely from implementation to implementation.
> 
> Well, that's why you have to treat them as opaque, just like I said. Ignore whatever meaning the end host producing them might embed in them, and simply watch which ones get echoed back and when. You only have to rely on the resolution of your own clocks.
> 
> > The timestamps are there to try and get a gross (if my memory serves me right ~100ms) approximation to the RTT - not good enough for reasoning about TCP based interactive/"real time" apps
> 
> On the contrary, these timestamps can indicate much better precision than that; in particular they indicate an upper bound on the instantaneous RTT which can be quite tight under favourable circumstances. On a LAN, you could reliably determine that the RTT was below 1ms this way.
> 
> Now, what it doesn't give you is a strict lower bound. But you can often look at what's going on in that TCP stream and determine that favourable circumstances exist, such that the upper bound RTT estimate is probably reasonably tight. Or you could observe that the stream is mostly idle, and thus probably influenced by delayed acks and Nagle's algorithm, and discount that measurement accordingly.
> 
> - Jonathan Morton
> 


[-- Attachment #2: Type: text/html, Size: 3412 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-28  7:17           ` Neil Davies
@ 2015-04-28  9:58             ` Sebastian Moeller
  2015-04-28 10:23               ` Neil Davies
  2015-04-28 16:05             ` Rick Jones
  1 sibling, 1 reply; 46+ messages in thread
From: Sebastian Moeller @ 2015-04-28  9:58 UTC (permalink / raw)
  To: Neil Davies; +Cc: Jonathan Morton, bloat

Hi Neil,


On Apr 28, 2015, at 09:17 , Neil Davies <neil.davies@pnsol.com> wrote:

> Jonathan
> 
> The timestamps don't change very quickly - dozens (or more) of packets can have the same timestamp, so it doesn't give you the appropriate discrimination power. Timed observations at key points gives you all you need (actually, appropriately gathered they give you all you can possibly know - by observation)

	But this has two issues:
1) “timed observations”: relatively easy if all nodes are under your control, otherwise hard. I know about the CERN paper, but they had all nodes under their control, symmetric bandwidth, and a shipload of samples, so over the wild internet “timed observations” are still hard (and get harder as the temporal precision requirement goes up)

2) “key points”: once you know the key points you already must have a decent understanding on the effective topology of the network, which again over the wider internet is much harder than if one has all nodes under control.


I am not sure how Paolo’s “no-touching” problem fits into the requirements for your deltaQ (meta-)math ;)

Best Regards
	Sebastian

> 
> Neil
> 
> On 28 Apr 2015, at 00:11, Jonathan Morton <chromatix99@gmail.com> wrote:
> 
>> On 27 Apr 2015 23:31, "Neil Davies" <neil.davies@pnsol.com> wrote:
>> >
>> > Hi Jonathan
>> >
>> > On 27 Apr 2015, at 16:25, Jonathan Morton <chromatix99@gmail.com> wrote:
>> >
>> >> One thing that might help you here is the TCP Timestamps option. The timestamps thus produced are opaque, but you can observe them and measure the time intervals between their production and echo. You should be able to infer something from that, with care.
>> >>
>> >> To determine the difference between loaded and unloaded states, you may need to observe for an extended period of time. Eventually you'll observe some sort of bulk flow, even if it's just a software update cycle. It's not quite so certain that you'll observe an idle state, but it is sufficient to observe an instance of the link not being completely saturated, which is likely to occur at least occasionally.
>> >>
>> >> - Jonathan Morton
>> >
>> > We looked at using TCP timestamps early on in our work. The problem is that they don't really help extract the fine-grained information needed. The timestamps can move in very large steps, and the accuracy (and precision) can vary widely from implementation to implementation.
>> 
>> Well, that's why you have to treat them as opaque, just like I said. Ignore whatever meaning the end host producing them might embed in them, and simply watch which ones get echoed back and when. You only have to rely on the resolution of your own clocks.
>> 
>> > The timestamps are there to try and get a gross (if my memory serves me right ~100ms) approximation to the RTT - not good enough for reasoning about TCP based interactive/"real time" apps
>> 
>> On the contrary, these timestamps can indicate much better precision than that; in particular they indicate an upper bound on the instantaneous RTT which can be quite tight under favourable circumstances. On a LAN, you could reliably determine that the RTT was below 1ms this way.
>> 
>> Now, what it doesn't give you is a strict lower bound. But you can often look at what's going on in that TCP stream and determine that favourable circumstances exist, such that the upper bound RTT estimate is probably reasonably tight. Or you could observe that the stream is mostly idle, and thus probably influenced by delayed acks and Nagle's algorithm, and discount that measurement accordingly.
>> 
>> - Jonathan Morton
>> 
> 
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat
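The timestamp-echo technique Jonathan describes above can be sketched passively: treat each TSval as an opaque token, note when it first leaves through the tap, and note when it first comes back echoed in a TSecr. The gap is an upper bound on the instantaneous RTT beyond the observation point (it also includes the peer's delayed-ack time, which is why it is only an upper bound). This is a rough sketch under assumed inputs, not anything from the thread; the packet records here are hypothetical `(time, direction, tsval, tsecr)` tuples such as one might extract from a libpcap capture.

```python
# Sketch: passive RTT upper bound from TCP timestamp echoes at one tap point.
# Timestamps are treated as opaque tokens per Jonathan's suggestion; only our
# own clock's resolution matters.

def rtt_upper_bounds(packets):
    """Return per-token upper bounds (seconds) on the RTT beyond the tap.

    packets: iterable of (time_s, direction, tsval, tsecr) tuples,
    direction being "out" (toward the peer) or "in" (from the peer).
    """
    first_sent = {}   # TSval -> time we FIRST saw it leave (retransmits ignored)
    bounds = []
    for t, direction, tsval, tsecr in packets:
        if direction == "out":
            first_sent.setdefault(tsval, t)           # keep first occurrence only
        elif direction == "in" and tsecr in first_sent:
            bounds.append(t - first_sent.pop(tsecr))  # first echo closes the sample
    return bounds

# Hypothetical trace: TSval 100 leaves at t=0.000 and is echoed at t=0.012,
# so the instantaneous RTT beyond the tap is at most 12 ms at that moment.
trace = [
    (0.000, "out", 100, 50),
    (0.004, "out", 101, 50),
    (0.012, "in",  51, 100),
    (0.020, "in",  51, 101),
]
print([round(b, 3) for b in rtt_upper_bounds(trace)])  # -> [0.012, 0.016]
```

Note this also answers Rick's question later in the thread: using only the *first* sighting of a timestamp in each direction sidesteps the coarse granularity of the timestamp values themselves.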


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-28  9:58             ` Sebastian Moeller
@ 2015-04-28 10:23               ` Neil Davies
  2015-05-04 10:10                 ` Paolo Valente
  0 siblings, 1 reply; 46+ messages in thread
From: Neil Davies @ 2015-04-28 10:23 UTC (permalink / raw)
  To: Sebastian Moeller; +Cc: Jonathan Morton, bloat


On 28 Apr 2015, at 10:58, Sebastian Moeller <moeller0@gmx.de> wrote:

> Hi Neil,
> 
> 
> On Apr 28, 2015, at 09:17 , Neil Davies <neil.davies@pnsol.com> wrote:
> 
>> Jonathan
>> 
>> The timestamps don't change very quickly - dozens (or more) of packets can have the same timestamp, so it doesn't give you the appropriate discrimination power. Timed observations at key points gives you all you need (actually, appropriately gathered they give you all you can possibly know - by observation)
> 
> 	But this has two issues:
> 1) “timed observations”: relatively easy if all nodes are under your control otherwise hard. I know about the CERN paper, but they had all nodes under their control, symmetric bandwidth and shipload of samples, so over the wild internet “timed observations” are still hard (and harder as the temporal precision requirement goes up)

∆Q (with its improper CDF semantics and G, S and V basis set) has composition and de-composition properties - this means that you don’t need to be able to observe everywhere - even in Lucian’s case his observation points were limited (certain systems) - the rest of the analysis is derived using the properties of the ∆Q calculus.

Lucian also demonstrated how the standard timing issues (including clock drift and distributed accuracy) can be resolved in a practical situation - starting from nothing more than libpcap captures on ordinary machines, he reproduced results for which the CERN team had built specialist hardware with better-than-20ns timing only five years earlier.

The good thing about Lucian’s thesis is that it is in the public domain - we use the same approach over wide (i.e. world-scale) networks and get the same properties (unfortunately that work is done in a commercial context). This all arises because we can perform the appropriate measurement error analysis, and hence use standard statistical techniques.
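The composition property mentioned above can be illustrated with a toy model (my own construction, not PNSol's implementation): represent each hop's ∆Q as an "improper" discrete delay distribution, i.e. probabilities summing to less than 1, the missing mass being loss. Independent hops then compose by convolution, and loss multiplies through automatically.

```python
# Toy sketch of ∆Q composition via convolution of improper delay PMFs.
# Each PMF maps delay (ms) -> probability; total mass < 1 encodes loss.

def compose(dq_a, dq_b):
    """Convolve two improper delay PMFs {delay_ms: probability}."""
    out = {}
    for da, pa in dq_a.items():
        for db, pb in dq_b.items():
            out[da + db] = out.get(da + db, 0.0) + pa * pb
    return out

hop1 = {1: 0.50, 5: 0.49}   # mass 0.99 -> 1% loss on this hop
hop2 = {2: 0.90, 10: 0.08}  # mass 0.98 -> 2% loss on this hop
path = compose(hop1, hop2)

delivered = sum(path.values())
print(round(1 - delivered, 4))  # end-to-end loss: 1 - 0.99*0.98 -> 0.0298
```

De-composition works the same way in reverse: given the end-to-end ∆Q and one hop's ∆Q, the other hop's contribution can be recovered (deconvolution), which is why not every point needs an observer.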

> 
> 2) “key points”: once you know the key points you already must have a decent understanding on the effective topology of the network, which again over the wider internet is much harder than if one has all nodes under control.

Not really - the key points (as a start) are the end ones - and those you have (reasonable) access to - and even if you don’t have access to the *actual* end points - you can easily spin up a measurement point that is very close (in ∆Q terms) to the ones you are interested in - AWS and Google Compute are your friends here.

> 
> 
> I am not sure how Paolo’s “no-touching” problem fits into the requirements for your deltaQ (meta-)math ;)

I see “no touching” as “no modification” - you can’t deduce information in the absence of data - what you need to understand is the minimum data requirements to achieve the measurement outcome - ∆Q calculus gives you that handle.

> 
> Best Regards
> 	Sebastian
> 
>> 
>> Neil
>> 
>> On 28 Apr 2015, at 00:11, Jonathan Morton <chromatix99@gmail.com> wrote:
>> 
>>> On 27 Apr 2015 23:31, "Neil Davies" <neil.davies@pnsol.com> wrote:
>>>> 
>>>> Hi Jonathan
>>>> 
>>>> On 27 Apr 2015, at 16:25, Jonathan Morton <chromatix99@gmail.com> wrote:
>>>> 
>>>>> One thing that might help you here is the TCP Timestamps option. The timestamps thus produced are opaque, but you can observe them and measure the time intervals between their production and echo. You should be able to infer something from that, with care.
>>>>> 
>>>>> To determine the difference between loaded and unloaded states, you may need to observe for an extended period of time. Eventually you'll observe some sort of bulk flow, even if it's just a software update cycle. It's not quite so certain that you'll observe an idle state, but it is sufficient to observe an instance of the link not being completely saturated, which is likely to occur at least occasionally.
>>>>> 
>>>>> - Jonathan Morton
>>>> 
>>>> We looked at using TCP timestamps early on in our work. The problem is that they don't really help extract the fine-grained information needed. The timestamps can move in very large steps, and the accuracy (and precision) can vary widely from implementation to implementation.
>>> 
>>> Well, that's why you have to treat them as opaque, just like I said. Ignore whatever meaning the end host producing them might embed in them, and simply watch which ones get echoed back and when. You only have to rely on the resolution of your own clocks.
>>> 
>>>> The timestamps are there to try and get a gross (if my memory serves me right ~100ms) approximation to the RTT - not good enough for reasoning about TCP based interactive/"real time" apps
>>> 
>>> On the contrary, these timestamps can indicate much better precision than that; in particular they indicate an upper bound on the instantaneous RTT which can be quite tight under favourable circumstances. On a LAN, you could reliably determine that the RTT was below 1ms this way.
>>> 
>>> Now, what it doesn't give you is a strict lower bound. But you can often look at what's going on in that TCP stream and determine that favourable circumstances exist, such that the upper bound RTT estimate is probably reasonably tight. Or you could observe that the stream is mostly idle, and thus probably influenced by delayed acks and Nagle's algorithm, and discount that measurement accordingly.
>>> 
>>> - Jonathan Morton
>>> 
>> 
>> _______________________________________________
>> Bloat mailing list
>> Bloat@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/bloat
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-28  7:17           ` Neil Davies
  2015-04-28  9:58             ` Sebastian Moeller
@ 2015-04-28 16:05             ` Rick Jones
  1 sibling, 0 replies; 46+ messages in thread
From: Rick Jones @ 2015-04-28 16:05 UTC (permalink / raw)
  To: bloat

On 04/28/2015 12:17 AM, Neil Davies wrote:
> Jonathan
>
> The timestamps don't change very quickly - dozens (or more) of packets
> can have the same timestamp, so it doesn't give you the appropriate
> discrimination power. Timed observations at key points gives you all you
> need (actually, appropriately gathered they give you all you can
> possibly know - by observation)

So probably a silly question, but can't you just consider the first time 
you see a timestamp going in the one direction and then the first time 
you see that timestamp coming back?

rick jones


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-28 10:23               ` Neil Davies
@ 2015-05-04 10:10                 ` Paolo Valente
  2015-05-04 10:21                   ` Neil Davies
  2015-05-04 10:28                   ` Jonathan Morton
  0 siblings, 2 replies; 46+ messages in thread
From: Paolo Valente @ 2015-05-04 10:10 UTC (permalink / raw)
  To: Neil Davies; +Cc: Jonathan Morton, bloat

I have tried to fully digest this information (thanks), but there is still some piece that I am missing. To highlight it, I would like to try with an oversimplified example. I hope this will make it easier to point out flaws in my understanding.

Suppose that one wants/needs to discover whether outbound and/or inbound packets experience high internal queueing delays in a given node A, because some buffers inside the node are bloated. For any packet leaving or entering the node, the per-hop or end-to-end delays experienced by the packet outside the node are exactly the same, regardless of whether the packet exits the node after experiencing a high internal output-queueing delay, or experiences a high internal input-queueing delay after being received. If this statement is true, then, since no information of any sort is available about queueing delays inside the node, and since the delays measurable from outside the node are invariant with respect to the internal queueing delays, how can we deduce internal delays from external ones?

Thanks,
Paolo

Il giorno 28/apr/2015, alle ore 12:23, Neil Davies <neil.davies@pnsol.com> ha scritto:

> 
> On 28 Apr 2015, at 10:58, Sebastian Moeller <moeller0@gmx.de> wrote:
> 
>> Hi Neil,
>> 
>> 
>> On Apr 28, 2015, at 09:17 , Neil Davies <neil.davies@pnsol.com> wrote:
>> 
>>> Jonathan
>>> 
>>> The timestamps don't change very quickly - dozens (or more) of packets can have the same timestamp, so it doesn't give you the appropriate discrimination power. Timed observations at key points gives you all you need (actually, appropriately gathered they give you all you can possibly know - by observation)
>> 
>> 	But this has two issues:
>> 1) “timed observations”: relatively easy if all nodes are under your control otherwise hard. I know about the CERN paper, but they had all nodes under their control, symmetric bandwidth and shipload of samples, so over the wild internet “timed observations” are still hard (and harder as the temporal precision requirement goes up)
> 
> ∆Q (with its improper CDF semantics and G, S and V basis set) has composition and de-composition properties - this means that you don’t need to be able to observe everywhere - even in Lucian’s case his observation points were limited (certain systems) - the rest of the analysis is derived using the properties of the ∆Q calculus.
> 
> Lucian also demonstrated how the standard timing observations (which include issues of clock drift and distributed accuracy) can be resolved in a practical situation - he reproduced - starting from libpcap captures on machines - results that CERN guys build specialist h/w with better than 20ns timing only 5 years before.
> 
> The good thing about Lucian’s thesis is that it is in the public domain - but we use the same approach over wide (i.e world) networks and get same properties (unfortunately that is done in a commercial context). This all arises because we can perform the appropriate measurement error analysis, and hence use standard statistical techniques.
> 
>> 
>> 2) “key points”: once you know the key points you already must have a decent understanding on the effective topology of the network, which again over the wider internet is much harder than if one has all nodes under control.
> 
> Not really - the key points (as a start) are the end ones - and those you have (reasonable) access to - and even if you don’t have access to the *actual* end points - you can easily spin up a measurement point that is very close (in ∆Q terms) to the ones you are interested in - AWS and Google Compute are your friends here.
> 
>> 
>> 
>> I am not sure how Paolo’s “no-touching” problem fits into the requirements for your deltaQ (meta-)math ;)
> 
> I see “no touching” as “no modification” - you can’t deduce information in the absence of data - what you need to understand is the minimum data requirements to achieve the measurement outcome - ∆Q calculus gives you that handle.
> 
>> 
>> Best Regards
>> 	Sebastian
>> 
>>> 
>>> Neil
>>> 
>>> On 28 Apr 2015, at 00:11, Jonathan Morton <chromatix99@gmail.com> wrote:
>>> 
>>>> On 27 Apr 2015 23:31, "Neil Davies" <neil.davies@pnsol.com> wrote:
>>>>> 
>>>>> Hi Jonathan
>>>>> 
>>>>> On 27 Apr 2015, at 16:25, Jonathan Morton <chromatix99@gmail.com> wrote:
>>>>> 
>>>>>> One thing that might help you here is the TCP Timestamps option. The timestamps thus produced are opaque, but you can observe them and measure the time intervals between their production and echo. You should be able to infer something from that, with care.
>>>>>> 
>>>>>> To determine the difference between loaded and unloaded states, you may need to observe for an extended period of time. Eventually you'll observe some sort of bulk flow, even if it's just a software update cycle. It's not quite so certain that you'll observe an idle state, but it is sufficient to observe an instance of the link not being completely saturated, which is likely to occur at least occasionally.
>>>>>> 
>>>>>> - Jonathan Morton
>>>>> 
>>>>> We looked at using TCP timestamps early on in our work. The problem is that they don't really help extract the fine-grained information needed. The timestamps can move in very large steps, and the accuracy (and precision) can vary widely from implementation to implementation.
>>>> 
>>>> Well, that's why you have to treat them as opaque, just like I said. Ignore whatever meaning the end host producing them might embed in them, and simply watch which ones get echoed back and when. You only have to rely on the resolution of your own clocks.
>>>> 
>>>>> The timestamps are there to try and get a gross (if my memory serves me right ~100ms) approximation to the RTT - not good enough for reasoning about TCP based interactive/"real time" apps
>>>> 
>>>> On the contrary, these timestamps can indicate much better precision than that; in particular they indicate an upper bound on the instantaneous RTT which can be quite tight under favourable circumstances. On a LAN, you could reliably determine that the RTT was below 1ms this way.
>>>> 
>>>> Now, what it doesn't give you is a strict lower bound. But you can often look at what's going on in that TCP stream and determine that favourable circumstances exist, such that the upper bound RTT estimate is probably reasonably tight. Or you could observe that the stream is mostly idle, and thus probably influenced by delayed acks and Nagle's algorithm, and discount that measurement accordingly.
>>>> 
>>>> - Jonathan Morton
>>>> 
>>> 
>>> _______________________________________________
>>> Bloat mailing list
>>> Bloat@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/bloat
>> 
> 
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat


--
Paolo Valente                                                 
Algogroup
Dipartimento di Fisica, Informatica e Matematica		
Via Campi, 213/B
41125 Modena - Italy        				  
homepage:  http://algogroup.unimore.it/people/paolo/


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-05-04 10:10                 ` Paolo Valente
@ 2015-05-04 10:21                   ` Neil Davies
  2015-05-04 10:28                   ` Jonathan Morton
  1 sibling, 0 replies; 46+ messages in thread
From: Neil Davies @ 2015-05-04 10:21 UTC (permalink / raw)
  To: Paolo Valente; +Cc: Jonathan Morton, bloat


On 4 May 2015, at 11:10, Paolo Valente <paolo.valente@unimore.it> wrote:

> I have tried to fully digest this information (thanks), but there is still some piece that I am missing. To highlight it, I would like to try with an oversimplified example. I hope this will make it easier to point out flaws in my understanding.
> 
> Suppose that one wants/needs to discover whether outbound and/or inbound packets experience high, internal queueing delays, in a given node A, because some buffers are bloated (inside the node). For any packet leaving or entering the node, we have that, regardless of whether the packet exits from the node after experiencing a high internal output-queueing delay, or whether the packet will experience a high internal input-queueing delay after being received by node, the per-hop or end-to-end delays experienced by the packet outside the node are exactly the same. If this statement is true, then, since no information of any sort is available about queueing delays inside the node, and since the delays measurable from outside the node are invariant with respect to the internal queueing delays, how can we deduce internal delays from external ones?

Paolo, 

as you surmise - without appropriate observation points you can’t *definitively* isolate where the “delay” (really ∆Q) is accruing. If you can construct time-traces at more than one observation point, you can isolate it to the segment between those observation points.

If you have a model of how it is supposed to behave (i.e. knowledge of the intermediate network elements, along with some idea of their configuration), you can start - by observing the pattern of change in the delay/loss - to infer that certain elements are being driven in a certain way. But beware: the model of inference is key - reasoning in terms of “bandwidth”, or even just “delay”, doesn’t work, whereas it does appear to work if you use ∆Q with G, S and V as the basis set.

Neil
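A crude illustration of the G, S, V idea (my own construction, not the ∆Q calculus itself): G is the fixed geographic/propagation component, S the packet-size-dependent serialization component, and V the variable contention/queueing component. Passively, the minimum delay observed at each packet size traces out the line delay = G + S * size; the per-packet excess over that line is an estimate of V. The function and sample values below are hypothetical.

```python
# Sketch: estimate G (fixed), S (per-byte serialization) and per-packet V
# (variable/queueing) from passively observed (size, delay) samples.
# Requires samples at two or more distinct packet sizes.

def decompose_gsv(samples):
    """samples: list of (size_bytes, delay_s). Returns (G, S, list of V)."""
    best = {}                                  # size -> minimum delay seen
    for size, d in samples:
        best[size] = min(d, best.get(size, float("inf")))
    sizes = sorted(best)
    s0, s1 = sizes[0], sizes[-1]               # two-point fit; use least squares in practice
    S = (best[s1] - best[s0]) / (s1 - s0)      # slope: seconds per byte
    G = best[s0] - S * s0                      # intercept: fixed delay
    V = [d - (G + S * size) for size, d in samples]
    return G, S, V

# Synthetic link: G = 10 ms, S = 1 us/byte, with queueing on some packets.
samples = [(100, 0.0101), (100, 0.0151), (1500, 0.0115), (1500, 0.0415)]
G, S, V = decompose_gsv(samples)
print(round(G, 4), round(S * 1500, 4))         # -> 0.01 0.0015
```

Persistent growth in the V component under load, with G and S stable, is the signature one would associate with a filling (bloated) buffer.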

> 
> Thanks,
> Paolo
> 
> Il giorno 28/apr/2015, alle ore 12:23, Neil Davies <neil.davies@pnsol.com> ha scritto:
> 
>> 
>> On 28 Apr 2015, at 10:58, Sebastian Moeller <moeller0@gmx.de> wrote:
>> 
>>> Hi Neil,
>>> 
>>> 
>>> On Apr 28, 2015, at 09:17 , Neil Davies <neil.davies@pnsol.com> wrote:
>>> 
>>>> Jonathan
>>>> 
>>>> The timestamps don't change very quickly - dozens (or more) of packets can have the same timestamp, so it doesn't give you the appropriate discrimination power. Timed observations at key points gives you all you need (actually, appropriately gathered they give you all you can possibly know - by observation)
>>> 
>>> 	But this has two issues:
>>> 1) “timed observations”: relatively easy if all nodes are under your control otherwise hard. I know about the CERN paper, but they had all nodes under their control, symmetric bandwidth and shipload of samples, so over the wild internet “timed observations” are still hard (and harder as the temporal precision requirement goes up)
>> 
>> ∆Q (with its improper CDF semantics and G,S and V basis set) has composition and de-composisition properties - this means that you don’t need to be able to observe everywhere - even in Lucian’s case his observation points were limited (certain systems) - the rest of the analysis is derived using the properies of the ∆Q calculus.
>> 
>> Lucian also demonstrated how the standard timing observations (which include issues of clock drift and distributed accuracy) can be resolved in a practical situation - he reproduced - starting from libpcap captures on machines - results that CERN guys build specialist h/w with better than 20ns timing only 5 years before.
>> 
>> The good thing about Lucian’s thesis is that it is in the public domain - but we use the same approach over wide (i.e world) networks and get same properties (unfortunately that is done in a commercial context). This all arises because we can perform the appropriate measurement error analysis, and hence use standard statistical techniques.
>> 
>>> 
>>> 2) “key points”: once you know the key points you already must have a decent understanding on the effective topology of the network, which again over the wider internet is much harder than if one has all nodes under control.
>> 
>> Not really - the key points (as a start) are the end ones - and those you have (reasonable) access to - and even if you don’t have access to the *actual* end points - you can easily spin up a measurement point that is very close (in ∆Q terms) to the ones you are interested in - AWS and Google Compute are your friends here.
>> 
>>> 
>>> 
>>> I am not sure how Paolo’s “no-touching” problem fits into the requirements for your deltaQ (meta-)math ;)
>> 
>> I see “no touching” as “no modification” - you can’t deduce information in the absence of data - what you need to understand is the minimum data requirements to achieve the measurement outcome - ∆Q calculus gives you that handle.
>> 
>>> 
>>> Best Regards
>>> 	Sebastian
>>> 
>>>> 
>>>> Neil
>>>> 
>>>> On 28 Apr 2015, at 00:11, Jonathan Morton <chromatix99@gmail.com> wrote:
>>>> 
>>>>> On 27 Apr 2015 23:31, "Neil Davies" <neil.davies@pnsol.com> wrote:
>>>>>> 
>>>>>> Hi Jonathan
>>>>>> 
>>>>>> On 27 Apr 2015, at 16:25, Jonathan Morton <chromatix99@gmail.com> wrote:
>>>>>> 
>>>>>>> One thing that might help you here is the TCP Timestamps option. The timestamps thus produced are opaque, but you can observe them and measure the time intervals between their production and echo. You should be able to infer something from that, with care.
>>>>>>> 
>>>>>>> To determine the difference between loaded and unloaded states, you may need to observe for an extended period of time. Eventually you'll observe some sort of bulk flow, even if it's just a software update cycle. It's not quite so certain that you'll observe an idle state, but it is sufficient to observe an instance of the link not being completely saturated, which is likely to occur at least occasionally.
>>>>>>> 
>>>>>>> - Jonathan Morton
>>>>>> 
>>>>>> We looked at using TCP timestamps early on in our work. The problem is that they don't really help extract the fine-grained information needed. The timestamps can move in very large steps, and the accuracy (and precision) can vary widely from implementation to implementation.
>>>>> 
>>>>> Well, that's why you have to treat them as opaque, just like I said. Ignore whatever meaning the end host producing them might embed in them, and simply watch which ones get echoed back and when. You only have to rely on the resolution of your own clocks.
>>>>> 
>>>>>> The timestamps are there to try and get a gross (if my memory serves me right ~100ms) approximation to the RTT - not good enough for reasoning about TCP based interactive/"real time" apps
>>>>> 
>>>>> On the contrary, these timestamps can indicate much better precision than that; in particular they indicate an upper bound on the instantaneous RTT which can be quite tight under favourable circumstances. On a LAN, you could reliably determine that the RTT was below 1ms this way.
>>>>> 
>>>>> Now, what it doesn't give you is a strict lower bound. But you can often look at what's going on in that TCP stream and determine that favourable circumstances exist, such that the upper bound RTT estimate is probably reasonably tight. Or you could observe that the stream is mostly idle, and thus probably influenced by delayed acks and Nagle's algorithm, and discount that measurement accordingly.
>>>>> 
>>>>> - Jonathan Morton
>>>>> 
>>>> 
>>>> _______________________________________________
>>>> Bloat mailing list
>>>> Bloat@lists.bufferbloat.net
>>>> https://lists.bufferbloat.net/listinfo/bloat
>>> 
>> 
>> _______________________________________________
>> Bloat mailing list
>> Bloat@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/bloat
> 
> 
> --
> Paolo Valente                                                 
> Algogroup
> Dipartimento di Fisica, Informatica e Matematica		
> Via Campi, 213/B
> 41125 Modena - Italy        				  
> homepage:  http://algogroup.unimore.it/people/paolo/
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-05-04 10:10                 ` Paolo Valente
  2015-05-04 10:21                   ` Neil Davies
@ 2015-05-04 10:28                   ` Jonathan Morton
  2015-05-04 10:41                     ` Paolo Valente
  2015-05-04 10:42                     ` Neil Davies
  1 sibling, 2 replies; 46+ messages in thread
From: Jonathan Morton @ 2015-05-04 10:28 UTC (permalink / raw)
  To: Paolo Valente; +Cc: bloat


Generally, the minimum observed delay will correspond to the case when both
inbound and outbound queues are empty throughout the path. This delay
should correspond to basic propagation and forwarding delays, which can't
be reduced further without altering some aspect of the network.

Higher observed delays than this will tend to correspond to one or both of
the buffers at the bottleneck being persistently filled. To work out which
one, you'll need to estimate the network load in each direction. This is of
course easiest if you can see all or most of the traffic passing the
bottleneck link, or if you yourself are participating in that load, but
it's probably possible in some other situations if you get creative.

To determine that bloat is NOT present, you need to observe delays that are
close to the baseline unloaded condition, while also being fairly sure that
the bottleneck link is saturated in the relevant direction.

The most reliable indication of link saturation is to observe ECN marked
packets, which will only normally be produced by an AQM algorithm
signalling link congestion (where both endpoints of the flow have
negotiated ECN support). A slightly less reliable indication of saturation
is to observe lost packets, either via retransmission or ack patterns,
especially if they occur in bursts or at remarkably regular intervals.

- Jonathan Morton
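The decision rule sketched in this message could look something like the following (a rough heuristic of my own, with hypothetical names and a hypothetical inflation factor; the thread does not prescribe thresholds): compare typical RTTs to the minimum-observed baseline, and only draw a conclusion when there is saturation evidence such as ECN marks or losses.

```python
# Heuristic sketch: classify a link as bloated / not bloated / inconclusive
# from passively observed RTT samples plus saturation indicators.

def classify(rtts_s, ecn_marks, losses, factor=4.0):
    """rtts_s: observed RTT samples (s); ecn_marks/losses: counts seen."""
    baseline = min(rtts_s)                    # proxy for unloaded path delay
    typical = sorted(rtts_s)[len(rtts_s) // 2]  # median of observed RTTs
    saturated = ecn_marks > 0 or losses > 0   # evidence the bottleneck is full
    if not saturated:
        return "inconclusive: link may simply be idle"
    if typical > factor * baseline:
        return "bloated"
    return "not bloated (AQM or short queue likely)"

print(classify([0.020, 0.180, 0.200, 0.210], ecn_marks=0, losses=3))
# -> bloated
print(classify([0.020, 0.022, 0.025, 0.024], ecn_marks=5, losses=0))
# -> not bloated (AQM or short queue likely)
```

The `factor` threshold is an arbitrary placeholder; in practice it would need to account for path length, since a fixed multiple of a 1 ms LAN baseline means something very different from the same multiple of a 150 ms intercontinental baseline.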


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27 20:39           ` David Lang
@ 2015-05-04 10:31             ` Paolo Valente
  0 siblings, 0 replies; 46+ messages in thread
From: Paolo Valente @ 2015-05-04 10:31 UTC (permalink / raw)
  To: David Lang; +Cc: bloat


Il giorno 27/apr/2015, alle ore 22:39, David Lang <david@lang.hm> ha scritto:

> On Mon, 27 Apr 2015, Paolo Valente wrote:
> 
>> Il giorno 27/apr/2015, alle ore 12:23, Toke Høiland-Jørgensen <toke@toke.dk> ha scritto:
>> 
>>> Paolo Valente <paolo.valente@unimore.it> writes:
>>> 
>>>> I am sorry, but I realized that what I said was incomplete. The main
>>>> cause of my concern is that, from outside the node, we do not know
>>>> whether a VoIP packet departs ad a given time because the application
>>>> wants it to be sent at that time or because it has waited in the
>>>> buffer for a lot of time. Similarly, we do not know how long the VoIP
>>>> application will wait before getting its incoming packets delivered.
>>> 
>>> No, not unless the application tells you (by, for instance,
>>> timestamping; depending on where in the network stack the timestamp is
>>> applied, you can measure different instances of bloat).
>> 
>> That’s exactly what I was thinking about. Actually it seems the only solution to me.
>> 
>> What apparently makes things more difficult is that I am not allowed either to choose the applications to run or to interfere in any way with the flows (e.g., by injecting some extra packet).
>> 
>> Any pointer to previous/current work on this topic?
>> 
>>> Or if you know
>>> that an application is supposed to answer you immediately (as is the
>>> case with a regular 'ping'), you can measure if it does so even when
>>> otherwise loaded.
>>> 
>> 
>> A ping was one of the first simple actions I suggested, but the answer was, as above: no you cannot ‘touch' the network!
>> 
>>> Of course, you also might not measure anything, if the bottleneck is
>>> elsewhere. But if you can control the conditions well enough, you can
>>> probably avoid this; just be aware of it. In Linux, combating
>>> bufferbloat has been quite the game of whack-a-mole over the last
>>> several years :)
>>> 
>> 
>> Then I guess that now I am trying to build a good mallet according to the rules of the game for this company :)
>> 
>> In any case, the target networks should be observable at such a level that, yes, all relevant conditions should be under control (if one does not make mistakes). My problem is, as I wrote above, to find out what information I can and have to look at.
> 
> What is it that you do have available?
> 
> Bufferbloat usually isn't a huge problem on the leaf node where the applications are running. They usually have a fast local LAN link.
> 
> Bufferbloat causes most of it's problems when it's on a middlebox where the available bandwidth changes so that one link becomes congested.

Thanks for this clarification. So there are at least two classes of problems to look at: endogenous bufferbloat (unlikely, but probably still worth checking for in a sound network-monitoring tool) and external bufferbloat.

> 
> If you can monitor packets going in and out of such links, you should be able to exactly measure the latency you get going through the device.
> 
> If you are trying to probe the network from the outside, without being able to even generate ping packets, then you have a problem.
> 
> If you can monitor ping packets going into the network, you can figure out how long they take to get back out.
> 

Since the measurements are passive, I can observe such packets only if some user or application happens to generate them.

> Look for other protocols that should have a very fast response time. DNS and NTP are probably pretty good options. HTTP requests for small static pages aren't always reliable, but can be useful (or especially ones that check for cache expiration, HTTP HEAD commands for example)
> 
> If you can look at such traffic over a shortish, but not tiny, timeframe, you should be able to find the minimum response time for such traffic, and that can give you a pretty good idea of the about of minimum latency involved.
> 
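David's windowed-minimum suggestion above could be sketched like this (a rough construction for illustration, with hypothetical class and method names): pair up naturally occurring request/response transactions (DNS, NTP, small HTTP) and keep a windowed minimum of their response times as the baseline-latency estimate; the excess of any individual sample over that baseline approximates the queueing delay at the time.

```python
# Sketch: track passive request/response latencies (e.g. DNS lookups) and use
# the windowed minimum as a baseline; excess over baseline suggests queueing.

from collections import deque

class BaselineTracker:
    def __init__(self, window=100):
        self.samples = deque(maxlen=window)   # keep only the last `window` RTTs

    def add(self, rtt_s):
        self.samples.append(rtt_s)

    def baseline(self):
        return min(self.samples)              # minimum over the window

    def excess(self, rtt_s):
        return rtt_s - self.baseline()        # delay above the baseline

tr = BaselineTracker(window=4)
for rtt in (0.030, 0.025, 0.090, 0.120):      # e.g. four DNS transactions
    tr.add(rtt)
print(tr.baseline(), round(tr.excess(0.120), 3))  # -> 0.025 0.095
```

The window matters: too short and the minimum never samples an unloaded moment; too long and it cannot track genuine path changes (rerouting, handover).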

Thanks a lot. Unlike ping traffic, these types of traffic are likely to be frequently in progress in typical networks.

Could your considerations be further extended/generalized as follows: for any node belonging to the network being monitored, if the node forwards packets, then we can somehow detect whether it is suffering from bufferbloat by measuring how long it holds forwarded packets? (Which is probably related to what Neil patiently tried to explain to me in his just-arrived email.)

Thanks,
Paolo

> David Lang


--
Paolo Valente                                                 
Algogroup
Dipartimento di Fisica, Informatica e Matematica		
Via Campi, 213/B
41125 Modena - Italy        				  
homepage:  http://algogroup.unimore.it/people/paolo/


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-05-04 10:28                   ` Jonathan Morton
@ 2015-05-04 10:41                     ` Paolo Valente
  2015-05-04 10:44                       ` Neil Davies
  2015-05-04 10:42                     ` Neil Davies
  1 sibling, 1 reply; 46+ messages in thread
From: Paolo Valente @ 2015-05-04 10:41 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: bloat

Thanks for this extra information and these suggestions. Just to be certain that I am not missing anything: I assume that the observed delay you mention is also the delay observed from outside the endpoints, and not the total delay one would obtain by adding in the queueing delays inside the endpoints (which, in my case, are among the unobservable quantities).

Il giorno 04/mag/2015, alle ore 12:28, Jonathan Morton <chromatix99@gmail.com> ha scritto:

> Generally, the minimum observed delay will correspond to the case when both inbound and outbound queues are empty throughout the path. This delay should correspond to basic propagation and forwarding delays, which can't be reduced further without altering some aspect of the network.
> 
> Higher observed delays than this will tend to correspond to one or both of the buffers at the bottleneck being persistently filled. To work out which one, you'll need to estimate the network load in each direction. This is of course easiest if you can see all or most of the traffic passing the bottleneck link, or if you yourself are participating in that load, but it's probably possible in some other situations if you get creative.
> 
> To determine that bloat is NOT present, you need to observe delays that are close to the baseline unloaded condition, while also being fairly sure that the bottleneck link is saturated in the relevant direction.
> 
> The most reliable indication of link saturation is to observe ECN marked packets, which will only normally be produced by an AQM algorithm signalling link congestion (where both endpoints of the flow have negotiated ECN support). A slightly less reliable indication of saturation is to observe lost packets, either via retransmission or ack patterns, especially if they occur in bursts or at remarkably regular intervals.
> 
> - Jonathan Morton


--
Paolo Valente                                                 
Algogroup
Dipartimento di Fisica, Informatica e Matematica		
Via Campi, 213/B
41125 Modena - Italy        				  
homepage:  http://algogroup.unimore.it/people/paolo/


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-05-04 10:28                   ` Jonathan Morton
  2015-05-04 10:41                     ` Paolo Valente
@ 2015-05-04 10:42                     ` Neil Davies
  2015-05-04 11:33                       ` Jonathan Morton
  1 sibling, 1 reply; 46+ messages in thread
From: Neil Davies @ 2015-05-04 10:42 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: bloat

[-- Attachment #1: Type: text/plain, Size: 2418 bytes --]


On 4 May 2015, at 11:28, Jonathan Morton <chromatix99@gmail.com> wrote:

> Generally, the minimum observed delay will correspond to the case when both inbound and outbound queues are empty throughout the path. This delay should correspond to basic propagation and forwarding delays, which can't be reduced further without altering some aspect of the network.
> 
> 

Yep, that corresponds to (∆Q|G + ∆Q|S(min packet size)) - note that the composition/de-composition only works on the individual bases - i.e. you need to deal with the delay in terms of ∆Q|G separately. We call ∆Q|G and ∆Q|S the "structural delay" - as you point out, reducing it needs changes in some aspects of the network elements or their arrangement.

> Higher observed delays than this will tend to correspond to one or both of the buffers at the bottleneck being persistently filled.
> 
The ∆Q|V (which can be established by measuring the delay and subtracting ∆Q|G + ∆Q|S(packet size)) can capture this as an instantaneous value (not just under persistent filling) - we've got measurements that show queues filling and emptying.
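For illustration, the structural/variable split described here could be estimated roughly as follows (a sketch under my own assumptions, not PNSol's actual method): fit the best-case delay as a line in packet size to recover ∆Q|G and ∆Q|S, then treat each packet's residual as its ∆Q|V.

```python
# Illustrative sketch of the ∆Q decomposition (variable names and the
# two-point fit are my choices, not PNSol's method): the *minimum*
# delay at each packet size approximates ∆Q|G (fixed propagation/
# forwarding) plus ∆Q|S (per-byte serialisation); what remains per
# packet is ∆Q|V, the contention-induced variable part.

def structural_fit(samples):
    """samples: list of (size_bytes, delay_s) with >= 2 distinct sizes.
    Fit a line through the best-case delays of two packet sizes."""
    by_size = {}
    for size, d in samples:
        by_size[size] = min(d, by_size.get(size, float("inf")))
    (s1, d1), (s2, d2) = sorted(by_size.items())[:2]
    S = (d2 - d1) / (s2 - s1)     # seconds per byte (∆Q|S slope)
    G = d1 - S * s1               # fixed component (∆Q|G)
    return G, S

def delta_q_v(size, delay, G, S):
    return delay - (G + S * size)  # instantaneous queueing component

samples = [(100, 0.0011), (100, 0.0051), (1500, 0.0025), (1500, 0.0305)]
G, S = structural_fit(samples)
print(delta_q_v(1500, 0.0305, G, S))  # ~0.028 s of queueing delay
```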


> To work out which one, you'll need to estimate the network load in each direction. This is of course easiest if you can see all or most of the traffic passing the bottleneck link, or if you yourself are participating in that load, but it's probably possible in some other situations if you get creative.
> 
> 

To estimate the contention for the common resource you need more than the load - you need the traffic pattern as well.
> To determine that bloat is NOT present, you need to observe delays that are close to the baseline unloaded condition, while also being fairly sure that the bottleneck link is saturated in the relevant direction.
> 
> 

Noting that delay and loss are, of course, a natural consequence of having a shared medium, and that (sorry for being a bit contentious) bloat is a subjective, not an objective, term.
> The most reliable indication of link saturation is to observe ECN marked packets, which will only normally be produced by an AQM algorithm signalling link congestion (where both endpoints of the flow have negotiated ECN support). A slightly less reliable indication of saturation is to observe lost packets, either via retransmission or ack patterns, especially if they occur in bursts or at remarkably regular intervals.
> 
> - Jonathan Morton



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-05-04 10:41                     ` Paolo Valente
@ 2015-05-04 10:44                       ` Neil Davies
  0 siblings, 0 replies; 46+ messages in thread
From: Neil Davies @ 2015-05-04 10:44 UTC (permalink / raw)
  To: Paolo Valente; +Cc: Jonathan Morton, bloat


On 4 May 2015, at 11:41, Paolo Valente <paolo.valente@unimore.it> wrote:

> Thanks for this extra information and suggestions. Just to be certain that I am not missing anything: I am assuming that also the observed delay you mention is the delay observed from outside endpoints, and not the total delay that one would obtain by adding also the queueing delays inside end points (which, in my case, is one of the unobservable quantities).

Paolo, this is where the difference between ∆Q|G, ∆Q|S and ∆Q|V are important - the extraction of the ∆Q|G and ∆Q|S establishes the structural delay, differences from that structural delay are caused by the contention for the onward transmission resources.

> Il giorno 04/mag/2015, alle ore 12:28, Jonathan Morton <chromatix99@gmail.com> ha scritto:
> 
>> Generally, the minimum observed delay will correspond to the case when both inbound and outbound queues are empty throughout the path. This delay should correspond to basic propagation and forwarding delays, which can't be reduced further without altering some aspect of the network.
>> 
>> Higher observed delays than this will tend to correspond to one or both of the buffers at the bottleneck being persistently filled. To work out which one, you'll need to estimate the network load in each direction. This is of course easiest if you can see all or most of the traffic passing the bottleneck link, or if you yourself are participating in that load, but it's probably possible in some other situations if you get creative.
>> 
>> To determine that bloat is NOT present, you need to observe delays that are close to the baseline unloaded condition, while also being fairly sure that the bottleneck link is saturated in the relevant direction.
>> 
>> The most reliable indication of link saturation is to observe ECN marked packets, which will only normally be produced by an AQM algorithm signalling link congestion (where both endpoints of the flow have negotiated ECN support). A slightly less reliable indication of saturation is to observe lost packets, either via retransmission or ack patterns, especially if they occur in bursts or at remarkably regular intervals.
>> 
>> - Jonathan Morton
> 
> 
> --
> Paolo Valente                                                 
> Algogroup
> Dipartimento di Fisica, Informatica e Matematica		
> Via Campi, 213/B
> 41125 Modena - Italy        				  
> homepage:  http://algogroup.unimore.it/people/paolo/
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-05-04 10:42                     ` Neil Davies
@ 2015-05-04 11:33                       ` Jonathan Morton
  2015-05-04 11:39                         ` Neil Davies
  0 siblings, 1 reply; 46+ messages in thread
From: Jonathan Morton @ 2015-05-04 11:33 UTC (permalink / raw)
  To: Neil Davies; +Cc: bloat


> On 4 May, 2015, at 13:42, Neil Davies <neil.davies@pnsol.com> wrote:
> 
> Noting that, delay and loss is, of course, a natural consequence of having a shared medium

Not so.  Delay and loss are inherent to link oversubscription, not to contention.  Without ECN, delay is traded off against loss by the size of the buffer; a higher loss rate keeps the queue shorter and thus the induced delay lower.

You can have just as much delay and loss in a single flow on a dedicated, point-to-point, full-duplex link (in other words, one that is *not* a shared medium) as on the same link with multiple flows contending for it.

Conversely, we can demonstrate almost zero flow-to-flow induced delay and zero loss by adding AQM, FQ and ECN, even in a fairly heavy multi-flow, multi-host scenario.

AQM with ECN solves the oversubscription problem (send rates will oscillate around the true link rate instead of exceeding it), without causing packet loss (because ECN can signal congestion instead), and FQ further reduces the most easily perceived delay (ie. flow-to-flow induced) as well as improving fairness.

Of course, loss can also be caused by poor link quality, but that’s an entirely separate problem.
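A minimal Linux sketch of the combination described above (AQM + FQ + ECN); the interface name is a placeholder and fq_codel's exact defaults vary by kernel:

```shell
# Sketch only: apply CoDel AQM + flow queueing + ECN marking on an
# example bottleneck interface.  Requires root; "eth0" is a placeholder.
tc qdisc replace dev eth0 root fq_codel ecn

# Inspect marks vs drops to see ECN signalling congestion without loss:
tc -s qdisc show dev eth0
```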

 - Jonathan Morton


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-05-04 11:33                       ` Jonathan Morton
@ 2015-05-04 11:39                         ` Neil Davies
  2015-05-04 12:17                           ` Jonathan Morton
  0 siblings, 1 reply; 46+ messages in thread
From: Neil Davies @ 2015-05-04 11:39 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: bloat


On 4 May 2015, at 12:33, Jonathan Morton <chromatix99@gmail.com> wrote:

> 
>> On 4 May, 2015, at 13:42, Neil Davies <neil.davies@pnsol.com> wrote:
>> 
>> Noting that, delay and loss is, of course, a natural consequence of having a shared medium
> 
> Not so.  Delay and loss are inherent to link oversubscription, not to contention.  Without ECN, delay is traded off against loss by the size of the buffer; a higher loss rate keeps the queue shorter and thus the induced delay lower.

Sorry Jonathan - that’s not what we’ve observed. We’ve measured “excessive” delay on links whose average load is << 0.1% (as measured over a 15-min period) - I can supply pointers to the graphs for that.

> 
> You can have just as much delay and loss in a single flow on a dedicated, point-to-point, full-duplex link (in other words, one that is *not* a shared medium) as on the same link with multiple flows contending for it.

A single flow can contend for the medium just as much as multiple ones - it is the total arrival pattern that is important, which may be related to the number of flows (in that there is more freedom in the system).

> 
> Conversely, we can demonstrate almost zero flow-to-flow induced delay and zero loss by adding AQM, FQ and ECN, even in a fairly heavy multi-flow, multi-host scenario.
> 
> AQM with ECN solves the oversubscription problem (send rates will oscillate around the true link rate instead of exceeding it), without causing packet loss (because ECN can signal congestion instead), and FQ further reduces the most easily perceived delay (ie. flow-to-flow induced) as well as improving fairness.
> 
> Of course, loss can also be caused by poor link quality, but that’s an entirely separate problem.

It is a separate cause, agreed, but it has a similar effect...

> 
> - Jonathan Morton
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-05-04 11:39                         ` Neil Davies
@ 2015-05-04 12:17                           ` Jonathan Morton
  2015-05-04 12:35                             ` Neil Davies
  0 siblings, 1 reply; 46+ messages in thread
From: Jonathan Morton @ 2015-05-04 12:17 UTC (permalink / raw)
  To: Neil Davies; +Cc: bloat


> On 4 May, 2015, at 14:39, Neil Davies <neil.davies@pnsol.com> wrote:
> 
>>> Noting that, delay and loss is, of course, a natural consequence of having a shared medium
>> 
>> Not so.  Delay and loss are inherent to link oversubscription, not to contention.  Without ECN, delay is traded off against loss by the size of the buffer; a higher loss rate keeps the queue shorter and thus the induced delay lower.
> 
> Sorry Jonathan - that’s not what we’ve observed. We’ve measured “excessive” delay on links that are averagely loaded << 0.1% (as measured over a 15 min period) - I can supply pointers to the graphs for that. 

Presumably those would involve oversubscription on short timescales, and a lot of link idle time between those episodes.

One ISP I know of charges by data volume per month, currently in units of 75GB (minimum 2 per month, so 150GB).  This is on ADSL lines where the link rate might reasonably be 15Mbps or so in the relevant direction.  At that speed, it would take 100,000 seconds to exhaust the first two units - which is not much more than 24 hours.  There is therefore roughly a 26-fold mismatch between the peak rate available to the user and the average rate he must maintain to stay within the data allowance.

(I am ignoring small niceties in the calculations here, in favour of revealing the big picture without too much heavy maths.)

By your measure, that would mean that the link could only ever be 3.85% utilised (1/26th) on month-long timescales, and is therefore undersubscribed.  But I can assure you that, during the small percentage of time that the link is in active use, it will spend some time at 100% utilisation on RTT timescales, with TCP/IP straining to achieve more than that.  That is link oversubscription which results in high induced delay.

More precisely, instantaneous link oversubscription results in either *increasing* induced delay (as the buffer fills) or lost packets (which *will* happen if the buffer becomes completely full), while instantaneous link undersubscription results in either *decreasing* induced delay (as the buffer drains) or link idle periods.  Long-timescale measures of link utilisation are simply averages of these instantaneous measures.

> A single flow can contend the medium just as much as a multiple ones

I think here, again, we are using wildly different terminology.

There is no contention for the medium on the dedicated full-duplex link I described, only for queue space - and given a single flow, it cannot contend with itself.

The same goes for a full-duplex shared-access medium (such as DOCSIS cable) with only one host active.  There is no contention for the medium, because it is always available when that single host requests it, which it will as soon as it has at least one packet in its queue.  There is a more-or-less fixed latency for medium access, which becomes part of what you call the structural delay.  The rest is down to over- or under-subscription on short timescales, as above.

On a half-duplex medium, such as obsolete bus Ethernet or not-so-obsolete wifi, then there can be some contention for the medium between forward data and reverse ack packets.  But I was *not* talking about half-duplex.  Full-duplex is an important enough subset of the problem - covering at least ADSL, cable, VDSL, satellite, fibre - on which most of the important effects can be observed, including the ones we’re talking about.

 - Jonathan Morton


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-05-04 12:17                           ` Jonathan Morton
@ 2015-05-04 12:35                             ` Neil Davies
  2015-05-04 17:39                               ` David Lang
  0 siblings, 1 reply; 46+ messages in thread
From: Neil Davies @ 2015-05-04 12:35 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: bloat

Jonathan

We see the problem as the difference between averages and instantaneous. 

Network media are never “average” used - a medium is either “in-use” or “idle”. What we were seeing (and it was not an ISP but the core of a public service network here in the UK) was that delay can be “high” even when the loading is “low” (in the particular 5-minute period the actual offered traffic was <0.01% of the capacity). The path under examination happened to be the constraining factor for a bulk transfer, and the induced delay was high enough to place other real-time applications (as defined by the public service network’s users) at risk.

The reasoning that you seem to be applying below assumes a time-homogeneity that doesn’t correspond to the network traffic patterns we have seen in the engagements we’ve done over the last 15 years. The graph I was referring to is the one example that we can publicly discuss (all the rest are under NDA!).

What you are describing - if I’m understanding it properly - is the “busy period”. I would accept that network providers (ISPs, telcos etc.) have a problem in that they are relying on the system becoming idle frequently (the busy periods not accreting into longer and longer periods of non-idleness). However, that is a pattern-dependent as well as a load-dependent phenomenon.

Neil

On 4 May 2015, at 13:17, Jonathan Morton <chromatix99@gmail.com> wrote:

> 
>> On 4 May, 2015, at 14:39, Neil Davies <neil.davies@pnsol.com> wrote:
>> 
>>>> Noting that, delay and loss is, of course, a natural consequence of having a shared medium
>>> 
>>> Not so.  Delay and loss are inherent to link oversubscription, not to contention.  Without ECN, delay is traded off against loss by the size of the buffer; a higher loss rate keeps the queue shorter and thus the induced delay lower.
>> 
>> Sorry Jonathan - that’s not what we’ve observed. We’ve measured “excessive” delay on links that are averagely loaded << 0.1% (as measured over a 15 min period) - I can supply pointers to the graphs for that. 
> 
> Presumably those would involve oversubscription on short timescales, and a lot of link idle time between those episodes.
> 
> One ISP I know of charges by data volume per month, currently in units of 75GB (minimum 2 per month, so 150GB).  This is on ADSL lines where the link rate might reasonably be 15Mbps or so in the relevant direction.  At that speed, it would take 100,000 seconds to exhaust the first two units - which is not much more than 24 hours.  There is therefore roughly a 26-fold mismatch between the peak rate available to the user and the average rate he must maintain to stay within the data allowance.
> 
> (I am ignoring small niceties in the calculations here, in favour of revealing the big picture without too much heavy maths.)
> 
> By your measure, that would mean that the link could only ever be 3.85% utilised (1/26th) on month-long timescales, and is therefore undersubscribed.  But I can assure you that, during the small percentage of time that the link is in active use, it will spend some time at 100% utilisation on RTT timescales, with TCP/IP straining to achieve more than that.  That is link oversubscription which results in high induced delay.
> 
> More precisely, instantaneous link oversubscription results in either *increasing* induced delay (as the buffer fills) or lost packets (which *will* happen if the buffer becomes completely full), while instantaneous link undersubscription results in either *decreasing* induced delay (as the buffer drains) or link idle periods.  Long-timescale measures of link utilisation are simply averages of these instantaneous measures.
> 
>> A single flow can contend the medium just as much as a multiple ones
> 
> I think here, again, we are using wildly different terminology.
> 
> There is no contention for the medium on the dedicated full-duplex link I described, only for queue space - and given a single flow, it cannot contend with itself.
> 
> The same goes for a full-duplex shared-access medium (such as DOCSIS cable) with only one host active.  There is no contention for the medium, because it is always available when that single host requests it, which it will as soon as it has at least one packet in its queue.  There is a more-or-less fixed latency for medium access, which becomes part of what you call the structural delay.  The rest is down to over- or under-subscription on short timescales, as above.
> 
> On a half-duplex medium, such as obsolete bus Ethernet or not-so-obsolete wifi, then there can be some contention for the medium between forward data and reverse ack packets.  But I was *not* talking about half-duplex.  Full-duplex is an important enough subset of the problem - covering at least ADSL, cable, VDSL, satellite, fibre - on which most of the important effects can be observed, including the ones we’re talking about.
> 
> - Jonathan Morton
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-05-04 12:35                             ` Neil Davies
@ 2015-05-04 17:39                               ` David Lang
  2015-05-04 19:09                                 ` Jonathan Morton
  0 siblings, 1 reply; 46+ messages in thread
From: David Lang @ 2015-05-04 17:39 UTC (permalink / raw)
  To: Neil Davies; +Cc: Jonathan Morton, bloat

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1452 bytes --]

On Mon, 4 May 2015, Neil Davies wrote:

> Jonathan
>
> We see the problem as the difference between averages and instantaneous.
>
> Network media is never “average” used - it is either “in-use” or “idle” - what 
> we were seeing (and it was not an ISP but the core of a public service network 
> here in the UK) was that delay can be “high” even when the loading is “low” 
> (in the particular 5minute period the actual offered traffic was <0.01% of the 
> capacity) - it was that the path under examination happened to be the 
> constraining factor for a bulk transfer - the induced delay was high enough to 
> place at risk other real-time applications (as defined by the public service 
> network’s users).

If you are doing a single bulk data transfer through a link and are at a small 
percentage of the capacity but yet experiencing long delays, then something is 
wrong.

Either you have a small window size so that you aren't actually using the full 
capacity of the link, you are in the ramp-up phase, or something is dropping 
packets and preventing you from getting up to full speed. But all of this 
should be affecting the sending machine and the speed at which it is 
generating packets.

The device should not be buffering anything noticeable if it's at such a small 
percentage of utilization. Check to see if there is a QoS configuration on the 
system that may be slowing down this traffic (and therefore causing it to 
queue up).
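The window-size case above can be made concrete with a back-of-envelope bound (the numbers below are hypothetical, not from this thread): a sender is limited to at most one window per round trip, so a classic 64 KB window over a 100 ms RTT caps a flow at about 5 Mbit/s regardless of how fast the link is, leaving a fast link mostly idle.

```python
# Back-of-envelope check (illustrative numbers): a flow whose window
# never grows past 64 KB is window-limited, not link-limited, so it
# can leave a fast link almost idle while still being slow end to end.

def max_throughput_bps(window_bytes, rtt_s):
    """TCP throughput upper bound: at most one window per round trip."""
    return window_bytes * 8 / rtt_s

wnd = 64 * 1024      # 65536-byte window
rtt = 0.1            # 100 ms round trip
print(max_throughput_bps(wnd, rtt) / 1e6)  # ≈ 5.24 Mbit/s
```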

David Lang

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-05-04 17:39                               ` David Lang
@ 2015-05-04 19:09                                 ` Jonathan Morton
  0 siblings, 0 replies; 46+ messages in thread
From: Jonathan Morton @ 2015-05-04 19:09 UTC (permalink / raw)
  To: David Lang; +Cc: bloat

[-- Attachment #1: Type: text/plain, Size: 189 bytes --]

Or you are defining various terms in a way that makes no sense to us. Given
that you're obviously working with core rather than edge networks, that is
entirely possible.

- Jonathan Morton


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Bloat] Detecting bufferbloat from outside a node
  2015-04-27 12:03                 ` Toke Høiland-Jørgensen
  2015-04-27 20:19                   ` Neil Davies
@ 2015-05-19 21:23                   ` Alan Jenkins
  1 sibling, 0 replies; 46+ messages in thread
From: Alan Jenkins @ 2015-05-19 21:23 UTC (permalink / raw)
  To: bloat

On 27/04/15 13:03, Toke Høiland-Jørgensen wrote:
> Neil Davies <neil.davies@pnsol.com> writes:
>
>> I don't think that the E2E principle can manage the emerging
>> performance hazards that are arising.
>
> Well, probably not entirely (smart queueing certainly has a place). My
> worry is, however, that going too far in the other direction will turn
> into a Gordian knot of constraints, where anything that doesn't fit into
> the preconceived traffic classes is impossible to do something useful
> with.
>
> Or, to put it another way, I'd like the network to have exactly as much
> intelligence as is needed, but no more. And I'm not sure I trust my ISP
> to make that tradeoff... :(
>
>> We've seen this recently in practice: take a look at
>> http://www.martingeddes.com/how-far-can-the-internet-scale/ - it is
>> based on a real problem we'd encountered.
>
> Well that, and the post linked to from it
> (http://www.martingeddes.com/think-tank/the-future-of-the-internet-the-end-to-end-argument/),
> is certainly quite the broadside against end-to-end principle. Colour me
> intrigued.
>
>> In someways this is just control theory 101 rearing its head... in
>> another it is a large technical challenge for internet provision.
>
> It's been bugging me for a while that most control theory analysis (of
> AQMs in particular) seems to completely ignore transient behaviour and
> jump straight to the steady state.
>
> -Toke

I may be too slow and obvious to be interesting or just plain wrong, but...

A network developer at Google seems to think end-to-end is not yet 
played out.  And that they *do* have an incentive to improve behavior.

https://lists.bufferbloat.net/pipermail/bloat/2015-April/002764.html
https://lists.bufferbloat.net/pipermail/bloat/2015-April/002776.html

Pacing in sch_fq should improve video-on-demand.

HTTP/2 also provides some improvement for web traffic.  *And* the 
multiplexing should remove incentives for websites to stop forcing 
multiple connections ("sharding").  The incentive then reverses because 
connect() (still) requires an RTT.

The two big applications blamed by the article, mitigated out of 
self-interest?  :-).

I can believe ∆Q / other math might require more than that.  That hiding 
problems with more bandwidth doesn't scale.  ISPs suffering is more 
difficult to swallow from a customer point of view.  But still... 
'Worse is better' [just-in-time fixes] has been a very powerful 
strategy.  <rhetorical> What does the first step look like, and what is 
the cost for customers?

Strawman: How hard is a global _lower_-priority class?  Couldn't 
video-on-demand utilize it to fill an over-size buffer and then smooth 
over these 30 seconds of transient congestion?

Alan


^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2015-05-19 21:23 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-27  9:48 [Bloat] Detecting bufferbloat from outside a node Paolo Valente
2015-04-27  9:54 ` Neil Davies
2015-04-27 10:45   ` Toke Høiland-Jørgensen
2015-04-27 10:57     ` Neil Davies
2015-04-27 14:22       ` Toke Høiland-Jørgensen
2015-04-27 20:27         ` Neil Davies
2015-04-27 15:51       ` Toke Høiland-Jørgensen
2015-04-27 20:38         ` Neil Davies
2015-04-27 21:37           ` Toke Høiland-Jørgensen
2015-04-28  7:14             ` Neil Davies
2015-04-27 11:54   ` Paolo Valente
2015-04-27 15:25     ` Jonathan Morton
2015-04-27 20:30       ` Neil Davies
2015-04-27 23:11         ` Jonathan Morton
2015-04-28  7:17           ` Neil Davies
2015-04-28  9:58             ` Sebastian Moeller
2015-04-28 10:23               ` Neil Davies
2015-05-04 10:10                 ` Paolo Valente
2015-05-04 10:21                   ` Neil Davies
2015-05-04 10:28                   ` Jonathan Morton
2015-05-04 10:41                     ` Paolo Valente
2015-05-04 10:44                       ` Neil Davies
2015-05-04 10:42                     ` Neil Davies
2015-05-04 11:33                       ` Jonathan Morton
2015-05-04 11:39                         ` Neil Davies
2015-05-04 12:17                           ` Jonathan Morton
2015-05-04 12:35                             ` Neil Davies
2015-05-04 17:39                               ` David Lang
2015-05-04 19:09                                 ` Jonathan Morton
2015-04-28 16:05             ` Rick Jones
2015-04-27 20:13     ` Neil Davies
2015-04-27  9:57 ` Toke Høiland-Jørgensen
2015-04-27 10:10   ` Paolo Valente
2015-04-27 10:19     ` Paolo Valente
2015-04-27 10:23       ` Toke Høiland-Jørgensen
2015-04-27 10:53         ` Paolo Valente
2015-04-27 20:39           ` David Lang
2015-05-04 10:31             ` Paolo Valente
2015-04-27 10:26       ` Neil Davies
2015-04-27 10:32         ` Toke Høiland-Jørgensen
2015-04-27 10:38           ` Neil Davies
2015-04-27 10:52             ` Toke Høiland-Jørgensen
2015-04-27 11:03               ` Neil Davies
2015-04-27 12:03                 ` Toke Høiland-Jørgensen
2015-04-27 20:19                   ` Neil Davies
2015-05-19 21:23                   ` Alan Jenkins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox