* [Cerowrt-devel] better business bufferbloat monitoring tools?
From: Dave Taht @ 2015-05-12 16:00 UTC
To: bloat, cerowrt-devel
One thread bothering me on dslreports.com is that some folk seem to
think you only get bufferbloat if you stress test the network, whereas
transient bufferbloat is in fact happening all the time, everywhere.
On one of my main SQM'd network gateways, day in, day out, it reports
about 6000 drops or ECN marks on ingress, and about 300 on egress.
Before I doubled the bandwidth that main box got, the drop rate used
to be much higher, and a great deal of the bloat, drops, etc. has now
moved into the wifi APs deeper into the network, where I am not
monitoring it effectively.
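For anyone who wants to watch the same counters on their own gateway,
here is a minimal sketch, assuming a Linux box with an fq_codel qdisc
and iproute2's tc; the interface name is illustrative, and tc's stats
layout varies a bit across iproute2 versions:

#!/usr/bin/env python3
# Sketch: poll fq_codel drop / ECN-mark counters via `tc -s qdisc`.
# Assumes a Linux host with iproute2 and fq_codel on IFACE; the
# interface name is an assumption, and the statistics layout can
# differ a little between iproute2 versions.
import re
import subprocess
import time

IFACE = "eth0"  # assumption: point this at your SQM'd interface

def read_counters(iface):
    out = subprocess.run(["tc", "-s", "qdisc", "show", "dev", iface],
                         capture_output=True, text=True, check=True).stdout
    # fq_codel reports "dropped N" and "ecn_mark N" in its stats block.
    dropped = sum(int(n) for n in re.findall(r"dropped (\d+)", out))
    marked = sum(int(n) for n in re.findall(r"ecn_mark (\d+)", out))
    return dropped, marked

if __name__ == "__main__":
    prev = read_counters(IFACE)
    while True:
        time.sleep(10)  # sample well below the classic 5-minute window
        cur = read_counters(IFACE)
        print("last 10s: drops=%d ce_marks=%d" % (cur[0] - prev[0],
                                                  cur[1] - prev[1]))
        prev = cur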
I would love to see tools like mrtg, cacti, nagios and smokeping[1]
be more closely integrated, with bloat-related plugins, and in
particular, as things like fq_codel and other ECN-enabled AQMs deploy,
start also tracking congestive events like loss and ECN CE markings on
the bandwidth tracking graphs.
This would counteract, to some extent, the classic 5-minute bandwidth
summaries everyone looks at, which hide real traffic bursts, latency
and loss at sub-5-minute timescales.
mrtg and cacti rely on SNMP. While loss statistics are deeply part of
SNMP, I am not aware of there being a MIB for CE events, and a quick
Google search was unrevealing. Is there one?
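The closest standard objects I know of are IF-MIB's discard counters.
A hedged sketch of polling those, assuming net-snmp's snmpget is
installed (the host, community string and ifIndex are made up), with
the caveat that there is no CE-mark counterpart:

#!/usr/bin/env python3
# Sketch: read the standard SNMP discard counters with net-snmp's
# snmpget. Host, community string and ifIndex are assumptions; there
# appears to be no equivalent MIB object for ECN CE marks.
import subprocess

HOST, COMMUNITY, IFINDEX = "192.0.2.1", "public", "2"  # assumptions

def snmp_get(oid):
    out = subprocess.run(
        ["snmpget", "-v2c", "-c", COMMUNITY, "-Oqv", HOST, oid],
        capture_output=True, text=True, check=True).stdout
    return int(out.strip())

print("ifInDiscards =", snmp_get("IF-MIB::ifInDiscards.%s" % IFINDEX))
print("ifOutDiscards =", snmp_get("IF-MIB::ifOutDiscards.%s" % IFINDEX))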
There is also a need for more cross-network monitoring, using tools
such as those used in this excellent paper:
http://www.caida.org/publications/papers/2014/measurement_analysis_internet_interconnection/measurement_analysis_internet_interconnection.pdf
[1] The network monitoring tools market is quite vast and has many
commercial offerings, like intermapper, forks of nagios, and
vendor-specific products from Cisco. Far too many to list, and so far
as I know, none report ECN-related stats or combine latency and loss
with bandwidth graphs. I would love to know if any products,
commercial or open source, do....
--
Dave Täht
Open Networking needs **Open Source Hardware**
https://plus.google.com/u/0/+EricRaymond/posts/JqxCe2pFr67
* Re: [Cerowrt-devel] better business bufferbloat monitoring tools?
From: David Lang @ 2015-05-12 22:17 UTC
To: Dave Taht; +Cc: cerowrt-devel, bloat
On Tue, 12 May 2015, Dave Taht wrote:
> This would counteract, to some extent, the classic 5-minute bandwidth
> summaries everyone looks at, which hide real traffic bursts, latency
> and loss at sub-5-minute timescales.
The problem is that too many people don't realize that network utilization is
never 50%; it's always 0% or 100% if you look at a small enough timeslice.
With a 5-minute average, 20% utilization could mean the link was 100% maxed out
and buffering for 60 seconds, then completely idle for the remaining four minutes.
I always set my graphs to 1-minute samples (and am very tempted to go shorter)
just because 5 minutes hides so much.
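Back-of-the-envelope, with numbers invented to match the example above:

#!/usr/bin/env python3
# Illustration: how a 5-minute average hides saturation. Models a
# 10 Mbit/s link that is 100% busy for 60 s and idle for 240 s;
# all numbers are invented to mirror the example above.
LINK_BPS = 10_000_000
samples = [LINK_BPS] * 60 + [0] * 240  # per-second throughput, bits/s

five_min_avg = sum(samples) / len(samples)
print("5-minute average: %.0f%% utilization" % (100 * five_min_avg / LINK_BPS))
print("worst 1-second sample: %.0f%%" % (100 * max(samples) / LINK_BPS))
# The 5-minute graph shows a calm 20% link even though the queue was
# saturated (and bloated) for a full minute.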
David Lang
* Re: [Cerowrt-devel] [Bloat] better business bufferbloat monitoring tools?
From: Bill Ver Steeg (versteb) @ 2015-05-13 13:20 UTC
To: Dave Taht, bloat, cerowrt-devel
Time scales are important. Any time you use TCP to send a moderately large file, you drive the link into congestion. Sometimes this is for a few milliseconds per hour, and sometimes this is for tens of minutes per hour.
For instance, watching a 3 Mbps video (Netflix/YouTube/whatever) on a 4 Mbps link with no cross traffic can cause significant bloat, particularly on older tail-drop middleboxes. The host code does an HTTP GET every N seconds and drives the link as hard as it can until it gets the video chunk. It waits a second or two and then does it again. Rinse and repeat. You end up with a very characteristic delay plot: the bloat starts at 0, builds until the middlebox provides congestion feedback, then sawtooths around at about the buffer size. When the burst ends, the middlebox burns down its buffer and bloat goes back to zero. Wait a second or two and do it again.
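To make that concrete, here is a toy queue model of the tail-drop sawtooth (a sketch, not a TCP simulator; every constant is invented for illustration):

#!/usr/bin/env python3
# Toy model of ABR chunk fetching over a tail-drop bottleneck: during
# each burst the sender offers more than the 4 Mbit/s link can drain,
# so the buffer fills to its limit; between chunks it drains to zero.
# Not a TCP simulator -- every constant here is illustrative.
LINK_BPS = 4_000_000           # bottleneck rate
OFFER_BPS = 20_000_000         # sender ramped well past the link rate
BUF_BITS = 512 * 1500 * 8      # ~512 full-size packets of buffer
BURST_S, IDLE_S = 2.0, 2.0     # fetch a chunk, then go quiet

queue_bits, dt = 0.0, 0.01
for step in range(int(12.0 / dt)):
    t = step * dt
    in_burst = (t % (BURST_S + IDLE_S)) < BURST_S
    arriving = OFFER_BPS * dt if in_burst else 0.0
    queue_bits = min(BUF_BITS, max(0.0, queue_bits + arriving - LINK_BPS * dt))
    if step % 50 == 0:  # report twice a second
        print("t=%5.2fs  queueing delay ~%7.1f ms"
              % (t, 1000 * queue_bits / LINK_BPS))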
You can't fix this by adding bandwidth to the link. The endpoint's TCP sessions will simply ramp up to fill the link. You will shorten the congested phase of the cycle, but TCP will ALWAYS FILL THE LINK (given enough time to ramp up).
The new AQM (and FQ_AQM) algorithms do a much better job of controlling the oscillatory bloat, but you can still see ABR video patterns in the delay figures.
Bvs
* Re: [Cerowrt-devel] [Bloat] better business bufferbloat monitoring tools?
From: Dave Taht @ 2015-05-13 13:36 UTC
To: Bill Ver Steeg (versteb); +Cc: cerowrt-devel, bloat
On Wed, May 13, 2015 at 6:20 AM, Bill Ver Steeg (versteb)
<versteb@cisco.com> wrote:
> Time scales are important. Any time you use TCP to send a moderately large file, you drive the link into congestion. Sometimes this is for a few milliseconds per hour, and sometimes this is for tens of minutes per hour.
>
> For instance, watching a 3 Mbps video (Netflix/YouTube/whatever) on a 4 Mbps link with no cross traffic can cause significant bloat, particularly on older tail-drop middleboxes. The host code does an HTTP GET every N seconds and drives the link as hard as it can until it gets the video chunk. It waits a second or two and then does it again. Rinse and repeat. You end up with a very characteristic delay plot: the bloat starts at 0, builds until the middlebox provides congestion feedback, then sawtooths around at about the buffer size. When the burst ends, the middlebox burns down its buffer and bloat goes back to zero. Wait a second or two and do it again.
The dslreports tests are opening 8 or more full-rate streams at once.
Not pretty results.
Web browsers spend most of their flows entirely in slow start.
Etc.
I am very concerned with what 4K streaming looks like, and just got an
Amazon box to take a look at it (but have not yet put out the cash for
a suitable monitor).
> You can't fix this by adding bandwidth to the link. The endpoint's TCP sessions will simply ramp up to fill the link. You will shorten the congested phase of the cycle, but TCP will ALWAYS FILL THE LINK (given enough time to ramp up).
It is important to keep stressing this point as the memes propagate outwards.
>
> The new AQM (and FQ_AQM) algorithms do a much better job of controlling the oscillatory bloat, but you can still see ABR video patterns in the delay figures.
It has generally been my hope that most of the big movie streaming
folk have moved to some form of pacing by now, but I have no data on
it.
Certainly I'm happy with what I saw of QUIC, and I have hope that
HTTP/2 will cut the number of simultaneous flows in progress.
But to return to my original point: I would like to continue to
find more ways to make the sub-5-minute behaviors visible and
comprehensible to more people...
--
Dave Täht
Open Networking needs **Open Source Hardware**
https://plus.google.com/u/0/+EricRaymond/posts/JqxCe2pFr67
* Re: [Cerowrt-devel] [Bloat] better business bufferbloat monitoring tools?
From: Jim Gettys @ 2015-05-13 15:30 UTC
To: Bill Ver Steeg (versteb); +Cc: cerowrt-devel, bloat
On Wed, May 13, 2015 at 9:20 AM, Bill Ver Steeg (versteb) <versteb@cisco.com> wrote:
> Time scales are important. Any time you use TCP to send a moderately large
> file, you drive the link into congestion. Sometimes this is for a few
> milliseconds per hour, and sometimes this is for tens of minutes per hour.
>
> For instance, watching a 3 Mbps video (Netflix/YouTube/whatever) on a 4
> Mbps link with no cross traffic can cause significant bloat, particularly
> on older tail-drop middleboxes. The host code does an HTTP GET every N
> seconds and drives the link as hard as it can until it gets the video
> chunk. It waits a second or two and then does it again. Rinse and repeat.
> You end up with a very characteristic delay plot: the bloat starts at 0,
> builds until the middlebox provides congestion feedback, then sawtooths
> around at about the buffer size. When the burst ends, the middlebox burns
> down its buffer and bloat goes back to zero. Wait a second or two and do it
> again.
>
It's time to do some packet traces to see what the video providers are
doing. In YouTube's case, I believe the traffic is using the new sched_fq
qdisc, which does packet pacing; but exactly how this plays out by the time
packets reach the home isn't entirely clear to me. Other video
providers/CDNs may or may not have started generating clues.
Also note that so far, no one is trying to pace the IW (initial window) transmission at all.
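For what it's worth, sch_fq honours a per-socket pacing cap, so a sender
that wants to experiment can do something like this minimal sketch
(assuming a Linux sender with fq as the root qdisc; the rate is
illustrative, and the option number is spelled out because not every
Python build exposes the constant):

#!/usr/bin/env python3
# Sketch: cap a flow's pacing rate under Linux's sch_fq. Assumes the
# sender's root qdisc is fq (e.g. `tc qdisc replace dev eth0 root fq`).
# SO_MAX_PACING_RATE takes bytes per second; 47 is its Linux value,
# spelled out because not every Python build exposes the constant.
import socket

SO_MAX_PACING_RATE = getattr(socket, "SO_MAX_PACING_RATE", 47)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Pace this flow at ~3 Mbit/s (375000 bytes/s), roughly the video's
# play-out rate, instead of letting each chunk blast out at line rate.
sock.setsockopt(socket.SOL_SOCKET, SO_MAX_PACING_RATE, 3_000_000 // 8)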
>
> You can't fix this by adding bandwidth to the link. The endpoint's TCP
> sessions will simply ramp up to fill the link. You will shorten the
> congested phase of the cycle, but TCP will ALWAYS FILL THE LINK (given
> enough time to ramp up).
>
That has been the behavior in the past, but it's no longer safe to
presume we should tar everyone with the same brush. Rather, we should do a
bit of science, and then try to hold the feet of those who do not "play
nice" with the network to the fire.
Some packet captures in the home can easily sort this out.
Jim
* Re: [Cerowrt-devel] [Bloat] better business bufferbloat monitoring tools?
From: Bill Ver Steeg (versteb) @ 2015-05-13 17:51 UTC
To: Dave Taht; +Cc: cerowrt-devel, bloat
Dave Taht said: "It has generally been my hope that most of the big movie streaming folk have moved to some form of pacing by now, but I have no data on it."
Bill VerSteeg replies: Based on my recent tests, the production ABR flows are still quite bursty. There has been some work done in this area, but I do not think bloat is top-of-mind for the ABR folks, and I do not think it has made it into production systems. Some of the work is in the area of pacing TCP's micro-bursts using sch_fq-like methods. Some has been in the area of application rate estimation. Some of the IW10 pacing stuff may also be useful.
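To sketch the first of those ideas, application-level pacing of a chunk might look something like this (pure illustration; the socket handling and rates are assumptions, not anyone's production code):

#!/usr/bin/env python3
# Sketch of application-level chunk pacing: instead of one big write
# that lets TCP ramp to line rate, trickle the chunk out in pieces at
# roughly the video's play-out rate. Purely illustrative.
import time

def paced_send(sock, chunk, rate_bps, piece=16 * 1024):
    """Write `chunk` to `sock` at ~rate_bps, `piece` bytes at a time."""
    interval = piece * 8 / rate_bps      # seconds per piece at target rate
    next_deadline = time.monotonic()
    for off in range(0, len(chunk), piece):
        sock.sendall(chunk[off:off + piece])
        next_deadline += interval
        delay = next_deadline - time.monotonic()
        if delay > 0:                    # only sleep if we're ahead
            time.sleep(delay)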
I am actually giving a talk on AQM to a small ABR video conference next week. The executive summary of my talk is "AQM makes bursty ABR flows less impactful to the network buffers (and thus cross traffic), but the bursts still cause problems. The problems are really bad on legacy buffer management algorithms. The new AQM algorithms take care of most of the issues, but bursts of data make the new algorithms work harder and do cause some second-order problems."
The main problem that I have seen in my testing has been in the CoDel/PIE (as opposed to FQ_XXX) variants. When the bottleneck link drops packets as the elephant bursts, the mice flows suffer. Rather than completing in a handful of RTTs, it takes several times longer for the timeouts and rexmits to complete the transfer. When running FQ_Codel or FQ_PIE, the elephant flow only impacts itself, as the mice are on their own queues. There are also some corner cases when the offered load is extremely high, but these seem to be third order effects.
I will let the list know what the current state of the art on pacing is after next week's conference, but I suspect that the ABR folks are still on a learning curve here.
Bvs
* Re: [Cerowrt-devel] [Bloat] better business bufferbloat monitoring tools?
From: dpreed @ 2015-05-14 14:28 UTC
To: Jim Gettys; +Cc: cerowrt-devel, Bill Ver Steeg (versteb), bloat
Tools, tools, tools. Make it trivially easy to capture packets in the home (don't require CeroWrt, for obvious reasons). For example, an iPhone app that does a tcpdump and sends it to us would be fantastic for diagnosing "make wifi fast" issues and also bufferbloat issues. Give feedback that is helpful to everyone who contributes data. (That's what made Netalyzr work so well: you got feedback ASAP that could be used to understand your own situation.)
Not sure an iPhone app can be disseminated. An Android app might be, as could a MacBook app and a Windows app.
Linux/FreeBSD options: one could generate a memory-stick image that would boot Linux on a standard Windows laptop to run tcpdump and upload the results, or something that would run in Parallels or VMware Fusion on a Mac.
I've started looking at a hardware measurement platform for my "make WiFi fast" work; currently it looks like a Rangeley board will do the trick. But that won't scale well outside my home, since it costs a few hundred bucks for the hardware.
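The capture-and-contribute piece could be as small as this sketch (it requires root; the interface, duration, and upload URL, a hypothetical collection endpoint, are all assumptions):

#!/usr/bin/env python3
# Sketch: capture 60 s of headers and upload the pcap. Requires root;
# the interface, duration and upload URL (a hypothetical collection
# endpoint) are all assumptions.
import subprocess
import urllib.request

IFACE, SECONDS = "wlan0", "60"
UPLOAD_URL = "https://example.org/pcap-upload"  # hypothetical endpoint

# -G/-W make tcpdump exit after one 60-second capture file; -s 128
# snaps to headers so no payloads leave the house.
subprocess.run(["tcpdump", "-i", IFACE, "-G", SECONDS, "-W", "1",
                "-s", "128", "-w", "home.pcap"], check=True)

with open("home.pcap", "rb") as f:
    req = urllib.request.Request(
        UPLOAD_URL, data=f.read(), method="POST",
        headers={"Content-Type": "application/vnd.tcpdump.pcap"})
    urllib.request.urlopen(req)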
* [Cerowrt-devel] RE : [Bloat] better business bufferbloat monitoring tools?
From: luca.muscariello @ 2015-05-14 15:40 UTC
To: Bill Ver Steeg (versteb), Dave Taht; +Cc: cerowrt-devel, bloat
Bill,
I believe you've hit the limit of what you can do with AQM without FQ.
Something more can be achieved with paced sources, as discussed in this thread.
However, I do not see incentives for the ABR folks to do true pacing.
Doing partial pacing to fix the TSO/GSO problem is of course a must, but it won't solve the problem you mention.
See you on Monday at the conference. I'm giving a talk right before yours.
Luca
Thread overview: 8 messages
2015-05-12 16:00 [Cerowrt-devel] better business bufferbloat monitoring tools? Dave Taht
2015-05-12 22:17 ` David Lang
2015-05-13 13:20 ` [Cerowrt-devel] [Bloat] " Bill Ver Steeg (versteb)
2015-05-13 13:36 ` Dave Taht
2015-05-13 17:51 ` Bill Ver Steeg (versteb)
2015-05-14 15:40 ` [Cerowrt-devel] RE : " luca.muscariello
2015-05-13 15:30 ` [Cerowrt-devel] " Jim Gettys
2015-05-14 14:28 ` dpreed