[Bloat] Comparing bufferbloat tests (was: We built a new bufferbloat test and keen for feedback)

Thu Nov 5 11:06:14 EST 2020

Hello everyone! 

My name is Arshan and I’m one of the developers of the Bufferbloat project! First, thank you so much for your feedback! So many good points were raised, and we will take them into consideration for later versions. I will try to answer your questions to the best of my knowledge so far (I am an intern, after all :) ).

Caution: very long email ahead. I apologize in advance for this; I was offline for a while, so I’m replying to a lot of points here, and I have tried to address each of your questions individually.

@Michael Richardson
> the latency measurement involves a TCP three-way handshake, with
>   the fourth packet providing the end of the process.
>   No TLS, I hope?

@Toke Høiland-Jørgensen
> The latency measurement puzzled me a
> bit (your tool says 16.6ms, but I get half that when I ping the
> cloudfront.net CDN, which I think is what you're measuring against?),
> but it does seem to stay fairly constant.

I believe TLS handshake time is not included here. I’m using the Resource Timing API <https://developer.mozilla.org/en-US/docs/Web/API/Resource_Timing_API> to measure the time-to-first-byte for a request that retrieves a static file. The resource loading phases <https://developer.mozilla.org/en-US/docs/Web/API/Resource_Timing_API/Using_the_Resource_Timing_API> section of the documentation explicitly shows the separate stages for DNS lookup, TCP connection establishment, and so on. I’m using the difference between the requestStart and responseStart values, and requestStart is recorded only after the connection (including any TLS negotiation) has been set up. This value should be the same as the time-to-first-byte <https://stackoverflow.com/questions/6533456/time-to-first-byte-with-javascript> shown in the inspector’s network tab.
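
To make the subtraction above concrete, here is a minimal sketch. It is written in Python purely for illustration (the test itself runs in the browser), and the dictionaries with made-up values are stand-ins for Resource Timing entries; only the field names requestStart and responseStart come from the API.

```python
def time_to_first_byte(entry):
    """Latency in ms for one request: responseStart minus requestStart.

    Connection setup (DNS, TCP, TLS) happens before requestStart,
    so it is not part of this difference.
    """
    return entry["responseStart"] - entry["requestStart"]


# Hypothetical timing entries (milliseconds), for illustration only.
entries = [
    {"connectStart": 100.0, "requestStart": 130.0, "responseStart": 146.5},
    {"connectStart": 400.0, "requestStart": 400.0, "responseStart": 417.0},
]

latencies_ms = [time_to_first_byte(e) for e in entries]
print(latencies_ms)  # [16.5, 17.0]
```

The first entry spends 30 ms on connection setup, but only the 16.5 ms between sending the request and receiving the first byte is counted.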

We’re using this static file <https://fonts.gstatic.com/l/font?kit=KFOmCnqEu92Fr1Me4GZNCzcPK4I&skey=a0a0114a1dcab3ac&v=v20>, which is hosted on a Google CDN. We tried several different files, and this one had the lowest latency in both locations where we tested it (I’m in Toronto, and my colleague Sina is in San Francisco).

@Michael Richardson
> Would webrtc APIs have helped?

We took a look at WebRTC, and it would be a really good option: it uses UDP, so we could even measure things like packet loss (packetlosstest.com <http://packetlosstest.com/> does this). However, it would require us to host a WebRTC backend, with multiple globally distributed deployments so that the latency values are consistent everywhere. Between that and fetching a static file backed by Google’s CDN, we chose the latter for simplicity.

@Toke Høiland-Jørgensen
> Your test does a decent job and comes pretty close, at least
> in Chromium (about 800 Mbps which is not too far off at the application
> layer, considering I have a constant 100Mbps flow in the background
> taking up some of the bandwidth). Firefox seems way off (one test said
> 500Mbps the other >1000).

The way I’m measuring download is by making multiple simultaneous requests to Cloudflare’s backend, each requesting a 100MB file. Their backend simply returns a body of “0”s repeated until 100MB has been generated. I then use readable streams <https://developer.mozilla.org/en-US/docs/Web/API/Streams_API/Using_readable_streams> to collect multiple measurements of (total bytes downloaded, timestamp), fit a line to those measurements, and take the slope of that line as the calculated bandwidth.

For gigabit connections, the download happens very quickly, so it may be that not many points are collected. In that case the fitted line is not accurate, and one might get overly large bandwidths, as in the >1000 case in your Firefox browser. I think this might be fixed by increasing the download time. Currently it’s 5s; maybe changing that to 10-20s would help.

In general, I think it’d be good to have a “more advanced options” feature that lets the user adjust some parameters of the test (such as the number of parallel connections, the download stage’s duration, the upload stage’s duration, etc.).

The reason I do this line fitting is that I want to exclude the bandwidth ramp-up time at the start of the download.
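
As a rough illustration of the line fitting (not the project's actual code), the sketch below computes an ordinary least-squares slope over (timestamp, total bytes) samples and converts it to bits per second. It is in Python for brevity; the function name and the sample values are made up, and the real test may fit the line differently.

```python
def fit_bandwidth_bps(samples):
    """Least-squares slope of bytes over time, in bits per second.

    samples: list of (timestamp_seconds, total_bytes_downloaded) pairs.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    mean_t = sum(t for t, _ in samples) / n
    mean_b = sum(b for _, b in samples) / n
    num = sum((t - mean_t) * (b - mean_b) for t, b in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("all samples share the same timestamp")
    return 8 * num / den  # bytes/s -> bits/s


# Made-up samples: a steady 12.5 MB/s, i.e. 100 Mbps.
samples = [(0.0, 0), (1.0, 12_500_000), (2.0, 25_000_000), (3.0, 37_500_000)]
bandwidth_bps = fit_bandwidth_bps(samples)
print(bandwidth_bps / 1e6, "Mbps")
```

With only a handful of samples, a single mistimed point can shift the slope substantially, which is consistent with the inflated readings on very fast links.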

Real-time Bandwidth Reporting
Using readable streams also allows for instantaneous bandwidth reporting (maybe using a moving-window average), similar to what fast.com <http://fast.com/> or speedtest.net <http://speedtest.net/> do. Unfortunately, I am not able to do the same thing for upload: getting progress on HTTP uploads adds pre-flight OPTIONS requests, which Cloudflare’s speedtest backend <https://speed.cloudflare.com/> doesn’t allow. For this test we are hitting Cloudflare’s backend directly; you can see this in the network tab:

Our download is done by sending an HTTP GET request to this endpoint: https://speed.cloudflare.com/__down?bytes=100000000
and our upload is done by sending an HTTP POST request to this endpoint: https://speed.cloudflare.com/__up

Since we are using Cloudflare’s backend, we are limited by what they allow us to do.

I did try making my own worker that essentially does the same thing as Cloudflare’s speedtest backend (they have this template worker <https://github.com/cloudflare/worker-speedtest-template> that for the most part does the same thing). I modified that worker a bit so that it allows HTTP progress on upload for real-time measurements, but we hit another wall with that: we could not saturate gigabit internet connections. It turns out that Cloudflare has tiers for workers, and the bundle tier that we are using doesn’t get the highest priority in terms of bandwidth, so we could only measure up to 600 Mbps. Their own speed test is hosted on an enterprise tier, which costs around $6-7k USD and is way too expensive. We are, however, requesting a demo from them, and that is still in progress.

So, since we can measure instantaneous download speeds but not upload speeds, we don’t report instantaneous values for either one. But I can still make the adjustment to report them for download at least.
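
A moving-window report for the download direction could look roughly like the sketch below. This is an assumption about one possible approach, written in Python for illustration; the function name, the one-second window, and the sample values are all hypothetical.

```python
def windowed_bandwidth_bps(samples, window_s=1.0):
    """Bandwidth over the trailing window that ends at the latest sample.

    samples: chronologically ordered (timestamp_seconds, total_bytes) pairs.
    """
    if len(samples) < 2:
        return 0.0
    t_end, b_end = samples[-1]
    # Oldest sample that still falls inside the window.
    t_start, b_start = next(
        (t, b) for t, b in samples if t >= t_end - window_s
    )
    if t_start == t_end:
        # Window holds only the latest sample: fall back to the previous one.
        t_start, b_start = samples[-2]
    if t_start == t_end:
        return 0.0
    return 8 * (b_end - b_start) / (t_end - t_start)


# Made-up cumulative byte counts while a download ramps up.
samples = [
    (0.0, 0),
    (0.5, 1_000_000),
    (1.0, 3_000_000),
    (1.5, 6_000_000),
    (2.0, 10_000_000),
]
current_bps = windowed_bandwidth_bps(samples, window_s=1.0)
print(current_bps / 1e6, "Mbps")  # 56.0 Mbps
```

Calling this each time a new chunk arrives gives a number that tracks the current rate instead of the average since the start of the download.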

@Toke Høiland-Jørgensen
> How do you calculate the jitter score? It's not obvious how you get from
> the percentiles to the jitter.

Jitter here is the standard deviation of the latency measurements in each stage. Is this a good definition?
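
For reference, that definition amounts to the following (a Python sketch with made-up latency values). Whether the population or the sample standard deviation is used is an assumption here; the sketch uses the population form.

```python
from statistics import pstdev

# Hypothetical latency measurements (ms) from a single stage.
latencies_ms = [20.0, 22.0, 19.0, 21.0, 48.0]

# Jitter as the (population) standard deviation of the measurements.
jitter_ms = pstdev(latencies_ms)
print(round(jitter_ms, 2), "ms")
```

A single 48 ms outlier dominates the result, so this definition is sensitive to spikes.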

@Toke Høiland-Jørgensen
> I found it hard to tell whether it was doing anything while the test was
> running. Most other tests have some kind of very obvious feedback
> (moving graphs of bandwidth-over-time for cloudflare/dslreports, a
> honking big number going up and down for fast.com), which I was missing
> here. I would also have liked to a measure of bandwidth over time, it
> seems a bit suspicious (from a "verify that this is doing something
> reasonable" PoV) that it just spits out a number at the end without
> telling me how long it ran, or how it got to that number.

Yeah, I think we need to either report real-time bandwidth or add some sort of animation.

@Toke Høiland-Jørgensen
> It wasn't obvious at first either that the header changes from
> "bufferbloat test" to "your bufferbloat grade" once the test is over I
> think the stages + result would be better put somewhere else where it's
> more obvious (the rest of the page grows downwards, so why isn't the
> result at the "end"?)

Good point!

@Y intruder_tkyf at yahoo.fr
> Great job. This is the result of my slow internees. I would like to know the criteria for the grade.

@Toke Høiland-Jørgensen
> Also, what are the shields below the grade supposed to mean? Do they
> change depending on the result? On which criteria? 

They do change! The criteria are listed below. Note that in the criteria below:
Latency is calculated as the maximum of the median latencies across all three stages.
Latency with Jitter is calculated as the maximum of (median + std) across all three stages.

Web Browsing:
Downlink: > 1 Mbps
Uplink: > 1 Mbps

Audio Calls:
Downlink: > 3 Mbps
Uplink: > 3 Mbps
Latency: < 150 ms
Latency with Jitter: < 200 ms

4K Video Streaming:
Downlink: > 40 Mbps

Video Conferencing:
Downlink: > 2.5 Mbps
Uplink: > 2.5 Mbps
Latency: < 150 ms
Latency with Jitter: < 200 ms

Online Gaming:
Downlink: > 3 Mbps
Uplink: > 0.5 Mbps
Latency: < 100 ms
Latency with Jitter: < 150 ms

For the bufferbloat grade, we use the same criteria as DSLReports <http://www.dslreports.com/faq/17930>.

@Toke Høiland-Jørgensen
> And it's telling me I have an A+ grade, so why is there a link to fix my bufferbloat issues?

We should hide that for A+ grades. 😬

> Less than 5ms (average of down bloat and up bloat) - A+
> Less than 30ms - A
> Less than 60ms - B
> Less than 200ms - C
> Less than 400ms - D
> 400ms+ - F
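
Read as code, the thresholds quoted above might look like this Python sketch. It assumes that the average of down bloat and up bloat is used for every tier, not only for A+; the function name is hypothetical.

```python
def bufferbloat_grade(down_bloat_ms, up_bloat_ms):
    """Map the average of download and upload bloat (ms) to a letter grade."""
    average = (down_bloat_ms + up_bloat_ms) / 2
    thresholds = [(5, "A+"), (30, "A"), (60, "B"), (200, "C"), (400, "D")]
    for limit, grade in thresholds:
        if average < limit:
            return grade
    return "F"


print(bufferbloat_grade(2.0, 4.0))      # A+
print(bufferbloat_grade(0.0, 111.7))    # B
print(bufferbloat_grade(500.0, 400.0))  # F
```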

@Toke Høiland-Jørgensen 
> Smaller nit, I found the up/down arrows in "up saturated" and "down
> saturated" a bit hard to grasp at first, I think spelling out
> upload/download would be better. Also not sure I like the "saturated"
> term in the first place; do people know what that means in a networking
> context? And can you be sure the network is actually *being* saturated?
> Why is the "before you start" text below the page? Shouldn't it be at
> the top? And maybe explain *why* this is important?

All amazing points! Thanks! 

@Toke Høiland-Jørgensen 
> As far as the web page itself is concerned, holy cross-domain script
> deluge, Batman! I'm one of those people who run uMatrix in my regular
> Firefox session, and I disallow all cross-site script loads by default.
> I had to allow 15(!) different cross-site requests, *and* whitelist a
> few domains in my ad blocker as well to even get the test to run. Please
> fix this! I get that you can't completely avoid cross-domain requests
> due to the nature of the test, but why is a speedtest pulling in scripts
> from 'shopify.com' and three different ad tracking networks?

Hahahah, this is because we’re using Shopify App Proxies <https://shopify.dev/tutorials/display-data-on-an-online-store-with-an-application-proxy-app-extension>. It’s a technique we use to import assets from our main store and make the test appear to be part of the main store, whereas in reality it’s a separately hosted application. This allows us to bring in the header, the footer, and the chatbot. This is a really good point though; I wonder what we can do about that.

@Dave Collier-Brown
>   *   Why is unloaded a large number, and loaded a small one?
>   *   milliseconds sound like delay, so 111.7 ms sounds slower than 0.0 ms

This is a good point! We measure bufferbloat as the “change” in latency, so the value reported as loaded is “relative” to the unloaded latency, and not an “absolute” value. For a good router with little bufferbloat, this value will be smaller than the unloaded latency. We should probably include some text explaining that.

@Dave Collier-Brown
>   *   Is bloat and latency something bad? The zeroes are in green, does that mean they're good?
>   *   Is max "bad"? In that case I'd call it "worst" and min "best"
>   *   Is median the middle or the average? (no kidding, I've been asked that! I'd call it average)

It’s actually the middle value. We meant to report the median, since on some bad networks a huge latency spike tends to move the average pretty dramatically.
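
A quick example with made-up numbers shows the difference (Python used for illustration):

```python
from statistics import mean, median

# Hypothetical latencies (ms) with one large spike.
latencies_ms = [20, 21, 19, 22, 20, 400]

spike_mean = mean(latencies_ms)
spike_median = median(latencies_ms)
print(round(spike_mean, 1))  # 83.7
print(spike_median)          # 20.5
```

One 400 ms sample quadruples the average, while the median stays close to the typical value.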

@Dave Collier-Brown
>   *   Is 25% twenty-five percent of the packets? (I suspect it's a percentile)
>   *   What does this mean in terms of how many Skype calls I can have happening at my house? I have two kids, a wife and a grandmother, all of whom Skype a lot.
@Dave Collier-Brown
> Looking at the cool stuff in the banner, it looks like I can browse, do audio calls, video calls (just one, or many?) but not streaming (any or just 4k?) or gaming.  Emphasizing that would be instantly understandable by grandma and the kids.

All very good questions. I think we should try to answer them in the FAQs below the test.

@Dave Collier-Brown
> DSLReports says
> 
>   *   144.7 Mb/s down
>   *   14.05 MB/s up
>   *   bufferbloat A+
>   *   downloading lag 40-100 ms
> 
> Waveform says:
> 
>   *   43.47 Mbps down
>   *   16.05 Mbps up
>   *   bufferbloat grade A+
>   *   unloaded latency 93.5 ms
> 
> So we're reporting different speeds and RTTs. Are we using different units or definitions, I wonder?

This is most likely an issue with our test. Do you consistently get low downlink values with our test? Maybe increasing the download stage duration will help with this.

@Sebastian Moeller
>         [SM] This is a decent starting point. In addition it might be helpful to at least optionally include a test with with bidirectional saturating load, in the past such tests typically were quite successful in detecting bufferbloat sources, that were less obvious in the uni-directional load tests. I am not sure however how well that can work with a browser based test?
>  
> [SM] Mmmh, I like that this is a relevant latency measure, it might make sense though to make sure users realize that this is not the eqivalent number to runing a ICMP eche request against the same endpoint?
> 
> [SM] On heavily bufferbloated links it often takes a considerable amount of time for the bottleneck buffers to drain after a uni-directional test, so it might make sense to separate the two direction test with an additional phase of idle latency measurements. If that latency is like the initial unloaded latency, all is well, but if latency slowly ramps down in that phase you have a smoking gun for bad bufferbloat.

All very good points! I think having a fixed idle time between stages 2 and 3 would be good. It appears to me, however, that the ICMP ping values are still very close to the measured latency.

Thank you everyone for your feedback! 

Arshan