Historic archive of defunct list bloat-devel@lists.bufferbloat.net
* Network tests as discussed in Washington, DC
@ 2012-11-10 10:54 Dave Taht
  2012-11-10 14:18 ` [Cerowrt-devel] " dpreed
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Dave Taht @ 2012-11-10 10:54 UTC (permalink / raw)
  To: bloat, cerowrt-devel, bloat-devel

The RRUL test idea presented earlier this week would do best with many
underlying servers instrumented with TCP stats collection, plus a
large backend for analysis, so I got interested in the current state
of affairs on the internet as to how to pull that together in
conjunction with larger labs and universities.

So, I started sorting through the debates at the FCC about network
testing. A wide range of opinion is presented, AND represented - I
randomly clicked on the names I recognised (like isc, verizon, karl
auerbach, measurement labs, new horizon foundation, and many, many
others. I'm pretty sure to get a balanced view I need to click on the
names I don't recognise!)

see position papers here:

http://apps.fcc.gov/ecfs/proceeding/view.action?name=04-36

After reading a dozen, I sighed, and went back to coding.

I'm just this guy, trying to fix bufferbloat, by all technical means
necessary. Additionally we're working on fixing ipv6, home routers,
home gateways, naming, etc - and I rarely pay attention to politics!

If anyone can provide a summary here of the debate as it stands now
and who the players are, it would be enlightening.

In other news:

I was very pleased to hear that Srikanth Sundaresan of the Bismark
project ( http://projectbismark.net/ ) won the "Applied Networking
Research Prize" at this past week's ietf ( http://irtf.org/anrp ),
for:

"Broadband Internet Performance: A View From the Gateway"

http://conferences.sigcomm.org/sigcomm/2011/papers/sigcomm/p134.pdf

An early version of cerowrt was used in some versions of bismark (they
later went to openwrt stable). I continue to build their repos for
possible use in cerowrt (or vice versa), and I'm contemplating
extending their open-source backend database schema to include data
from the rrul tests....

https://github.com/dtaht/dashboard-db


-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [Cerowrt-devel] Network tests as discussed in Washington, DC
  2012-11-10 10:54 Network tests as discussed in Washington, DC Dave Taht
@ 2012-11-10 14:18 ` dpreed
  2012-11-11  8:35 ` [Bloat] " Daniel Berger
  2012-11-11 23:59 ` [Cerowrt-devel] " Michael Richardson
  2 siblings, 0 replies; 7+ messages in thread
From: dpreed @ 2012-11-10 14:18 UTC (permalink / raw)
  To: Dave Taht; +Cc: bloat-devel, cerowrt-devel, bloat

I follow this pretty closely.  The person you want to talk to is Sascha Meinrath, about M-Lab - and if you have a good proposal, he has money to fund a measurement project.
 
M-Lab has been a partial failure politically.  The reason is this: the FCC does not want to do anything that would annoy incumbents running current Internet access systems.   Direct measurements of latency are too dangerous.  If you speak to the only Commissioner who ever tried to get unbiased measurements (Copps) you will find that his attempts were sabotaged on every side.
 
The same thing happened in the CRTC inquiry regarding Net Neutrality, where I testified as an "expert witness" on the argument that prioritization was necessary to reduce congestion.  My testimony pointed out that congestion is best measured by direct latency measurements.  The incumbent operators *refused* to provide latency data, claiming a) they never measured it, and b) latency can be calculated from average traffic volume measurements by "Little's Lemma", so measuring it was unnecessary.  (This last claim is categorically false, but it was "one expert vs. another".)
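To see why average volume cannot substitute for direct latency measurement: Little's Law relates only long-run averages (L = λW), so two traffic patterns with identical average volume can produce wildly different queueing delay. A toy FIFO-queue simulation (illustrative only, with made-up traffic patterns, not any operator's data):

```python
def simulate_queue(arrivals, service_time=1.0):
    """Single FIFO server; returns each packet's waiting time."""
    free_at = 0.0
    waits = []
    for t in arrivals:
        start = max(t, free_at)      # wait until the server frees up
        waits.append(start - t)
        free_at = start + service_time
    return waits

n = 10000
# Smooth traffic: one packet every 2 time units (50% utilization).
smooth = [2.0 * i for i in range(n)]
# Bursty traffic: the same average rate, delivered in back-to-back bursts of 20.
bursty = [40.0 * (i // 20) for i in range(n)]

smooth_wait = sum(simulate_queue(smooth)) / n
bursty_wait = sum(simulate_queue(bursty)) / n
print(smooth_wait, bursty_wait)  # identical average volume, very different delay
```

The smooth pattern queues nothing at all; the bursty one, at the very same average volume, waits 9.5 service times per packet on average. Only direct measurement sees the difference.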
 
I wish you godspeed in getting direct latency measurements publicized.  The dataset collected by Netalyzr has been attacked pretty strongly by the incumbent operators as "biased".  (Of course, the measurements the FCC commissioned from a contractor recommended by a Congressional staffer are "unbiased".)
 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Bloat] Network tests as discussed in Washington, DC
  2012-11-10 10:54 Network tests as discussed in Washington, DC Dave Taht
  2012-11-10 14:18 ` [Cerowrt-devel] " dpreed
@ 2012-11-11  8:35 ` Daniel Berger
  2012-11-11 13:39   ` [Cerowrt-devel] " Dave Taht
  2012-11-11 23:59 ` [Cerowrt-devel] " Michael Richardson
  2 siblings, 1 reply; 7+ messages in thread
From: Daniel Berger @ 2012-11-11  8:35 UTC (permalink / raw)
  To: bloat; +Cc: bloat-devel, cerowrt-devel

Hi everybody,

I totally love the idea of testing for browsing performance. Thanks for
that ;-)
Nevertheless, I have a critical question about this 40-second network
test idea: has anyone considered the robustness of the results? That
is, has anybody checked for statistical significance?
As I see it, there are two steps: first, the test under light load,
which (I guess) shows low jitter/variance; second, busy queues. This
second "phase" is probably when jitter/variance will inflate a lot,
right? Then the mean (and most other statistical summary measures)
won't be stable either. Thus I doubt we can rely on this to compute an
aggregate "score" in all cases.

Obviously the best solution would be to run the test long enough that
the confidence intervals come out small and similar for both steps. It
is probably not feasible to extend the test to unusually long
intervals, but at least computing a 95% confidence interval would give
me a better sense of the results.

Doing this might also help account for a broad variety of
testing/real-world environments while still getting reliable results.
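For what it's worth, a 95% interval for the mean is cheap to compute from per-phase samples. A sketch using a normal approximation (the RTT figures below are invented; note the approximation is optimistic for the heavy-tailed loaded phase, which is exactly the concern above):

```python
import math
import statistics

def mean_ci95(samples):
    """Mean and normal-approximation 95% confidence interval for it."""
    m = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    half = 1.96 * sem  # z for 95%; fine for large n, optimistic for heavy tails
    return m, m - half, m + half

# Hypothetical RTT samples (ms) from the quiet and loaded phases of a test.
quiet  = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 20.1, 20.0]
loaded = [45.0, 180.0, 95.0, 310.0, 60.0, 250.0, 130.0, 85.0]

for name, s in (("quiet", quiet), ("loaded", loaded)):
    m, lo, hi = mean_ci95(s)
    print(f"{name}: mean={m:.1f} ms, 95% CI [{lo:.1f}, {hi:.1f}]")
```

The quiet-phase interval is tight; the loaded-phase interval is enormous relative to its mean, which quantifies the instability described above.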

Anyone else with this thought?

Cheers,
Daniel


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Cerowrt-devel] [Bloat] Network tests as discussed in Washington, DC
  2012-11-11  8:35 ` [Bloat] " Daniel Berger
@ 2012-11-11 13:39   ` Dave Taht
  0 siblings, 0 replies; 7+ messages in thread
From: Dave Taht @ 2012-11-11 13:39 UTC (permalink / raw)
  To: Daniel Berger, frank.rowand; +Cc: bloat-devel, cerowrt-devel, bloat

On Sun, Nov 11, 2012 at 9:40 AM, Daniel Berger <dberger@student.ethz.ch> wrote:
> Hi everybody,
>
> I totally love the idea to test for browsing performance. Thanks for
> that ;-)

Jim's demos of the effect of network load on the performance of web
sites are quite revealing,

http://gettys.wordpress.com/2012/02/01/bufferbloat-demonstration-videos/

using the chrome web page benchmark available here:

https://chrome.google.com/webstore/detail/page-benchmarker/channimfdomahekjcahlbpccbgaopjll

You can fairly easily replicate his results on your own hardware, both
locally and over the internet. Go for it!

However, in trying to get to a general-purpose test, the simplicity
of his demo (which used a very short path to MIT) didn't hold up;
thus I came up with the methods described in the RRUL document. They
seem to scale fairly well up past 60ms RTT. More testers would be
nice!
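The shape of the test is easy to sketch: saturate the link with several bulk TCP flows while a thin probe stream samples latency. A minimal single-direction sketch over loopback (ports and buffer sizes are arbitrary; loopback won't show bufferbloat, and this only illustrates the structure - the real test uses netperf against remote servers):

```python
import socket
import threading
import time

def start_server(port, echo):
    """TCP server: either discards bytes (bulk sink) or echoes them (latency probe)."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(8)

    def serve(conn):
        try:
            while True:
                data = conn.recv(65536)
                if not data:
                    return
                if echo:
                    conn.sendall(data)
        except OSError:
            pass

    def accept_loop():
        while True:
            conn, _ = srv.accept()
            threading.Thread(target=serve, args=(conn,), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()

start_server(50007, echo=False)   # bulk sink
start_server(50008, echo=True)    # echo server for the probe
time.sleep(0.1)

stop = threading.Event()

def bulk_flow():
    """One saturating upload, standing in for a netperf TCP_STREAM."""
    s = socket.create_connection(("127.0.0.1", 50007))
    chunk = b"\x00" * 65536
    try:
        while not stop.is_set():
            s.sendall(chunk)
    finally:
        s.close()

for _ in range(4):
    threading.Thread(target=bulk_flow, daemon=True).start()

# The probe: tiny echo round trips sampled while the bulk flows run.
probe = socket.create_connection(("127.0.0.1", 50008))
rtts = []
for _ in range(20):
    t0 = time.monotonic()
    probe.sendall(b"p")
    probe.recv(64)
    rtts.append((time.monotonic() - t0) * 1000.0)
    time.sleep(0.01)
stop.set()
probe.close()
print(f"probe RTT under load: median sample {sorted(rtts)[len(rtts)//2]:.3f} ms")
```

On a real bottleneck link, the probe samples during the loaded phase are where the bufferbloat shows up.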

One of the things that really bugs me about today's overbuffered
networks is watching things like a file upload via scp nearly
complete, then stall and retransmit, over and over again - like Jon
Corbet's example of what happened to him at a conference hotel last
year.

http://lwn.net/Articles/496509/

> Nevertheless, I have a critical question about this 40-second network
> test idea: has anyone considered the robustness of the results? That
> is, has anybody checked for statistical significance?

Presently, the effects on multiple sorts of networks are interesting.
As one example, here are runs of one RRUL prototype, on wired and on
wifi, that Toke put together:

http://www.teklibre.com/~d/bloat/rrul-denmark-germany-wired-pfifo-fast.pdf

vs

http://www.teklibre.com/~d/bloat/rrul-denmark-germany-wlan2.pdf

I LOVE the first graph (configured for pfifo_fast on the gateways), as
it clearly shows classic drop-tail "TCP global synchronization" on the
egress gateway, and the resulting loss of utilization. It's nice to
have been able to capture it on a 50+ms *real-world* path.

It also shows how traffic classification of TCP doesn't work very well
across the internet, as the TCP flows, classified in different ways,
evolve and change places.

The second graph (taken on a good wifi) shows how noisy the data is.

(I note that using a TCP "ping" is a bad idea, except for showing why
TCP encapsulated inside TCP is a bad idea - something that gets
progressively worse at longer RTTs. Anyone have a decent RTP test we
can replace it with?)

A graph taken against a loaded wifi network is pretty horrifying...

http://www.teklibre.com/~d/bloat/Not_every_packet_is_sacred-Battling_Bufferbloat_on_wifi.pdf

(don't look. Halloween is over)

I have a ton of interesting statistics gathered at ietf and linuxcon
this past week... but finding good ways to present them remains a
problem. And I note that most of the stuff above is intended to run as
a BACKGROUND process, while loading web pages and doing useful work
like making phone calls is the real intended result of the benchmark.

So, no, the only statistical significance calculated so far is that
tests like this can cause a network to have one to three orders of
magnitude of latency inserted into it. Compared to that, I'm not
terribly concerned with a few percentage points here or there at this
time, but I'd welcome analysis.

The biggest unknowns in the test are the optimal TCP ack count, and
TCP's response to packet loss (retransmits), which could account for a
great deal of the actual data transmitted versus the amount of useful
data transmitted.

"Useful data transmitted under no load and under load" would be a
tremendously useful statistic.

It is my hope that the volume of web and dns traffic projected to be
in the test will be fairly minimal compared to the rest of it, but I'm
not counting on it. That needs to be measured too.

It's a pretty big project to do this up right in other words!

> As I see it, there are two steps: first, the test under light load,
> which (I guess) shows low jitter/variance; second, busy queues. This
> second "phase" is probably when jitter/variance will inflate a lot,
> right? Then the mean (and most other statistical summary measures)
> won't be stable either.

Correct.

> Thus I doubt we can rely on this to compute an aggregate "score" in
> all cases.

The "score" as a ratio of various measured parameters, unloaded versus
loaded, seems viable.
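As a sketch of that ratio idea (a hypothetical scalar for illustration, not a settled RRUL metric): take the utilization achieved under load and divide it by how much the load inflates the idle RTT.

```python
def rrul_style_score(idle_rtt_ms, loaded_rtt_ms, goodput, capacity):
    """Toy aggregate: utilization achieved, penalized by latency inflation.
    1.0 would be perfect; badly bloated links score near zero."""
    utilization = goodput / capacity          # 0..1, higher is better
    inflation = loaded_rtt_ms / idle_rtt_ms   # >= 1, lower is better
    return utilization / inflation

# Made-up numbers: a debloated link vs a badly overbuffered one.
debloated = rrul_style_score(idle_rtt_ms=20, loaded_rtt_ms=30, goodput=95, capacity=100)
bloated = rrul_style_score(idle_rtt_ms=20, loaded_rtt_ms=2000, goodput=98, capacity=100)
print(f"{debloated:.3f} vs {bloated:.4f}")
```

The overbuffered link moves slightly more data yet scores two orders of magnitude worse, because the ratio is dominated by the hundredfold latency inflation.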

> Obviously the best solution would be to run the test long enough that
> the confidence intervals come out small and similar for both steps.

There is nothing stopping a network engineer, device driver writer,
device maker, mathematician, network queueing theorist, sysadmin,
manager, or concerned citizen...

from running the test continuously - going from unloaded, to loaded,
to unloaded, to loaded again - while tweaking various underlying
variables in the network stack and path. I do this all the time!

It is my hope, certainly, that those who should do so, will do so. A
core component IS the "mtr" tool, which will point at the issues on
the bottleneck link - which might be anything from the local OS or
device, to the wireless AP, to the CPE, to somewhere else on the path.
Giving the end user data that (occasionally) points at something other
than their ISP to blame would be a goodness, and having tools
available to find and fix the problem, even better.

However, the average citizen is not going to sit still for 60 seconds
on a regular basis, which is the purpose of trying to come up with a
score and panel of useful results that can be presented briefly and
clearly.

I also have hope that a test as robust and stressful as this can be
run on edge gateways automatically, in the background, on selected
routers throughout the world, much as bismark already does. See
examples at:

http://networkdashboard.org/

> It is probably not feasible to extend the test to unusually long
> intervals, but at least computing a 95% confidence interval would
> give me a better sense of the results.

Go for it!

"Bufferbloat.net: Making network research fun since 2011!"

I note that the RRUL work being done right now is the spare-time
project of myself and one grad student, leveraging the hard work that
has been put into the Linux OS over the last year by so many, and the
multitude of useful enhancements - classification, priority, and
congestion control algorithms - that Rick Jones has put into netperf
over the past year, also in his spare time.

No funding for this work has yet arrived. Various proposals for grants
have been ignored, but we're not grant writing experts.

Cerowrt is getting some lovely support from interested users, but the
size of the task to get code written, analyzed, and tests deployed is
intimidating.

There are a wealth of other tests that can be performed, while under a
RRUL-like load. For example, this december I'll be at the connext
conference in Nice, with some early results from the lincs.fr lab
regarding the interactions of AQM and LEDBAT. I hope to be doing some
follow up work on that paper also in december, against codel and
fq_codel, and more realistic representations of uTP.

A RRUL-like test would be useful for analyzing and comparing the
results from any congestion control algorithm, alone or in
combination - such as TCP-LP, or DC-TCP, or (as one potentially very
interesting example) the latest work done at MIT on their TCP, whose
name I forget right now.

I am very interested in how video sharding technologies work - what
often happens there is an HTTP GET of a 10-second chunk of video,
encoded at various rates. The client measures the delivery time of
that 10-second shard and increases or decreases the rate of the next
GET to suit.

This generally pushes TCP into slow start, repeatedly, and slams the
downstream portion of the network, repeatedly.
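The client-side adaptation loop behind that pattern can be sketched in a few lines (the thresholds and bitrate ladder below are made up; real players smooth their estimates and keep a playout buffer):

```python
def next_bitrate(ladder, current, segment_s, download_s):
    """Pick the next rung of the bitrate ladder from the last shard's delivery time."""
    i = ladder.index(current)
    if download_s < 0.5 * segment_s and i + 1 < len(ladder):
        return ladder[i + 1]      # plenty of headroom: step up
    if download_s > 0.8 * segment_s and i > 0:
        return ladder[i - 1]      # nearly late: step down
    return current

ladder = [500, 1000, 2500, 5000]  # kbit/s rungs
rate = 1000
# Each fetch below restarts at or near slow start and slams the downlink.
for download_s in [3.0, 2.0, 9.5, 9.8, 4.0]:
    rate = next_bitrate(ladder, rate, segment_s=10.0, download_s=download_s)
print(rate)  # 2500: stepped up, backed off, and recovered
```

Every one of those decisions triggers a fresh burst, which is exactly the repeated slow-start-and-slam behavior described above.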

Then there's videoconferencing. Which I care about a lot. I like it
when people's lips match up with what they are saying, being partially
deaf, myself.

And gaming. I'd like very much to have a better picture (packet
captures!) of how various online games such as quake, starcraft, and
world of warcraft interact with the network.

(I think this last item would be rather fun for a team of grad
students to take on. Heck, I'd enjoy "working" on this portion of the
problem, too. :) )

> Doing this might also help account for a broad variety of
> testing/real-world environments while still getting reliable results.

I would argue that settling on a clear definition of the tests,
writing the code, and collecting a large set of data would be
"interesting". As for being able to draw general conclusions from it,
I generally propose that we prototype tests and iterate - going deep
into packet captures until we get things that make sense in the lab
and in the field - and rapidly bug-report everything that is found.

A great number of the pathological behaviors we've discovered so far
have turned out to be bugs at various levels in various stacks. It's
generally been rather difficult to get to a "paper-writing stage" -
that's just the way my life seems to work.

> Anyone else with this thought?

An example of how you can fool yourself with network statistics, misapplied:

https://lists.bufferbloat.net/pipermail/bloat/2011-November/000715.html

Frank Rowand gave a very good (heretical!) presentation on core
analysis and presentation ideas at last week's linuxcon - particularly
when it comes to analyzing the real-time performance of anything. I
don't know if it's up yet.

I have generally found that mountain and CDF plots are the best ways
to deal with the extremely noisy data collected from wifi and over the
open internet, and that having packet captures and TCP instrumentation
is useful as well.
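The CDF half of that is trivial to compute, and it makes the point about noisy data nicely: a mean buries the tail that a CDF exposes. (The RTT samples below are invented.)

```python
def ecdf(samples):
    """Empirical CDF as sorted (value, fraction of samples <= value) pairs."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

# Invented noisy wifi RTTs (ms): mostly ~24 ms with a couple of huge outliers.
rtts = [22, 25, 24, 23, 26, 24, 25, 480, 23, 610]
for x, p in ecdf(rtts):
    if p in (0.5, 0.9, 1.0):
        print(f"P{int(round(p * 100))} <= {x} ms")
```

Here the mean is 128.2 ms while the median is 24 ms - precisely the gap a single summary number can hide, and exactly what the CDF's long flat tail shows at a glance.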


-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Cerowrt-devel] Network tests as discussed in Washington, DC
  2012-11-10 10:54 Network tests as discussed in Washington, DC Dave Taht
  2012-11-10 14:18 ` [Cerowrt-devel] " dpreed
  2012-11-11  8:35 ` [Bloat] " Daniel Berger
@ 2012-11-11 23:59 ` Michael Richardson
  2012-11-13  2:11   ` dpreed
  2 siblings, 1 reply; 7+ messages in thread
From: Michael Richardson @ 2012-11-11 23:59 UTC (permalink / raw)
  To: Dave Taht; +Cc: bloat-devel, cerowrt-devel, bloat

Re: FCC.
The IETF IAB plenary last Wednesday had a presenter from the FCC,
who claimed to care about latency in response to my question.

I don't claim it's a good use of your time to listen to these, but
probably having the links in the archive is useful.

http://www.ietf.org/proceedings/85/slides/slides-85-iesg-opsandtech-6
http://www.ietf.org/proceedings/85/slides/slides-85-iesg-opsandtech-7
http://www.ietf.org/proceedings/85/slides/slides-85-iesg-opsandtech-8

http://ietf85.conf.meetecho.com/index.php/Recorded_Sessions
http://recordings.conf.meetecho.com/Recordings/watch.jsp?recording=IETF85_PLENARY&chapter=part_11

5. Technical Topic: Measurement Issues in the Internet

   Moderator:  Alissa Cooper
   Speakers:  Sam Crawford and Henning Schulzrinne

   Although network performance measurement has been a topic of research,
   standardization, and development for decades, recent efforts to create
   national, regional, and global access network measurement testbeds and
   frameworks are drawing renewed interest.  These efforts seek to provide
   consistent measurements of fine-grained performance metrics such as
   packet loss, delay, and throughput, as well as higher order tests to
   capture quality of experience with respect to specific applications and
   services.  Some standardized metrics and tests exist, including those
   developed in the IPPM WG, but a unified framework for observing and
   reporting the quality metrics that define users experiences across
   different networks has yet to be developed.  This plenary will examine
   the implications of large-scale network performance measurement for the
   Internet by exploring the approaches, results, and challenges involved
   with measurement efforts from around the world.
-- 
Michael Richardson <mcr+IETF@sandelman.ca>, Sandelman Software Works 




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Cerowrt-devel] Network tests as discussed in Washington, DC
  2012-11-11 23:59 ` [Cerowrt-devel] " Michael Richardson
@ 2012-11-13  2:11   ` dpreed
  2012-12-08  6:53     ` Dave's "Water Videos" posted to YouTube Richard Brown
  0 siblings, 1 reply; 7+ messages in thread
From: dpreed @ 2012-11-13  2:11 UTC (permalink / raw)
  To: Michael Richardson; +Cc: bloat-devel, cerowrt-devel, bloat

I personally am happy that Henning Schulzrinne is pushing for this.  However, it should be remembered that the Commissioners rule the day at the FCC.  I hope he doesn't suffer what the Spectrum Policy Task Force at the FCC suffered; I was one of the many people involved in that effort.  Let's just say a lot of staffers at the FCC were forced to leave because they dared to focus on technology "third rails".



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Dave's "Water Videos" posted to YouTube
  2012-11-13  2:11   ` dpreed
@ 2012-12-08  6:53     ` Richard Brown
  0 siblings, 0 replies; 7+ messages in thread
From: Richard Brown @ 2012-12-08  6:53 UTC (permalink / raw)
  Cc: bloat-devel, cerowrt-devel, bloat

Folks,

I have pushed all the videos from teklibre to YouTube. (The raw .mp4 files are still available on teklibre.) There's a little bit of description on each page telling what each segment is about. They're available from the CeroWrt wiki page:

http://www.bufferbloat.net/projects/cerowrt/wiki/Bloat-videos

Dave: Did you have a chance to post the slides for your talk?

Best regards,

Rich Brown
Hanover, NH USA

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2012-12-08  6:54 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-10 10:54 Network tests as discussed in Washington, DC Dave Taht
2012-11-10 14:18 ` [Cerowrt-devel] " dpreed
2012-11-11  8:35 ` [Bloat] " Daniel Berger
2012-11-11 13:39   ` [Cerowrt-devel] " Dave Taht
2012-11-11 23:59 ` [Cerowrt-devel] " Michael Richardson
2012-11-13  2:11   ` dpreed
2012-12-08  6:53     ` Dave's "Water Videos" posted to YouTube Richard Brown
