[Bloat] review: Deployment of RITE mechanisms, in use-case trial testbeds report part 1

Dave Taht dave.taht at gmail.com
Mon Feb 29 12:55:26 EST 2016


On Sun, Feb 28, 2016 at 5:33 AM, Alan Jenkins
<alan.christopher.jenkins at gmail.com> wrote:
> On 27/02/2016, Dave Täht <dave at taht.net> wrote:
>> On 2/26/16 3:23 AM, De Schepper, Koen (Nokia - BE) wrote:
>> I am aware of several unpublished studies. There was also something
>> from several years back out of Stanford that compared 1-3 HAS flows,
>> which I've long wanted to see repeated against these AQM technologies.
>>
>> https://reproducingnetworkresearch.wordpress.com/2014/06/03/cs244-14-confused-timid-and-unstable-picking-a-video-streaming-rate-is-hard/
>
> Wow.

Updated for 2015, here:
https://reproducingnetworkresearch.wordpress.com/2015/05/31/cs244-15-confused-timid-and-unstable-picking-a-video-streaming-rate-is-hard/

I love this class over at Stanford. I've read everything they've
published. With a huge fraction of research papers shown to be
irreproducible, and networking being a moving target rather than
rooted in the laws of physics, chemistry, or biology, CS244 is
desperately needed.

If *every* university doing networking work had a second-year class
that took on classic and recently popular papers and updated them for
current conditions, we'd make short work of the backlog of stuff I'm
dubious about and have better assumptions to stand on going forward.
In particular, there are decades of work reporting results at
<4 Mbit that I do not think hold at >20 Mbit or >100 Mbit.

Actually, more work spent reproducing papers (and having reproducible
papers) would be great across all fields of science.....

>
>
>>>    o the results are very poor with a particular popular AQM
>>
>> Define "very poor"?
>
> Heh.  After skimming the same sections as you, I think my restraint
> must be vastly inferior to yours.

I wrote this a month ago, hit save, and edited it Saturday. I
shouldn't have left in as much grumpiness as I did.

That said, we could build on this experimental design. The experiment
would look more like:

phase 1)

20 Mbit link, with a base RTT to the video server of 4 ms (Google
Fiber), 10 ms (cable), or 20 ms (DSL) (all assuming co-location of the
video server), or 40 ms (random source). If pressed for time, just
pick one of the above.

First provide a reference for the actual measured behavior of the
technology under emulation, and emulate the actual bandwidth asymmetry.

An HAS app shooting for 18 Mbit video: how long does the flow take to ramp up?

Add *one* long-running upload flow: how long does the HAS app take to ramp up now?
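
A crude way to measure that ramp-up on the client would be something
like this (untested Python sketch; the interface name and the 18 Mbit
target are placeholders):

import time

def rx_bytes(iface):
    with open(f"/sys/class/net/{iface}/statistics/rx_bytes") as f:
        return int(f.read())

def ramp_up_time(iface, target_bps, interval=0.1, threshold=0.95):
    # Sample the kernel's receive byte counter until the measured
    # throughput first reaches 95% of the target rate.
    start, last = time.time(), rx_bytes(iface)
    while True:
        time.sleep(interval)
        now = rx_bytes(iface)
        if (now - last) * 8 / interval >= threshold * target_bps:
            return time.time() - start
        last = now

print(ramp_up_time("eth0", 18e6))  # placeholder interface, 18 Mbit target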

Vary the AQMs on both sides of the link. Run with ECN and without.
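
Something like this could drive that sweep (untested Python sketch,
run as root on a two-interface Linux router; the interface names and
layout are my placeholders, and a full run would repeat the sweep on
the uplink as well):

import subprocess

LAN, WAN = "eth0", "eth1"        # placeholder interface names
RATE = "20mbit"
BASE_RTT_MS = {"fiber": 4, "cable": 10, "dsl": 20, "random": 40}
AQMS = ["pfifo", "codel", "fq_codel", "pie"]

def sh(cmd):
    print("+", cmd)
    subprocess.run(cmd.split(), check=True)

def set_delay(iface, rtt_ms):
    # Emulate the whole base RTT in one direction, on the WAN side,
    # so the bottleneck AQM sees undelayed packets.
    sh(f"tc qdisc replace dev {iface} root netem delay {rtt_ms}ms")

def set_bottleneck(iface, rate, aqm, ecn):
    # HTB rate limiter with the AQM under test as its leaf qdisc.
    sh(f"tc qdisc replace dev {iface} root handle 1: htb default 10")
    sh(f"tc class replace dev {iface} parent 1: classid 1:10 htb rate {rate}")
    sh(f"tc qdisc replace dev {iface} parent 1:10 {aqm}{' ecn' if ecn else ''}")

for name, rtt_ms in BASE_RTT_MS.items():
    set_delay(WAN, rtt_ms)
    for aqm in AQMS:
        for ecn in (False, True):
            if ecn and aqm == "pfifo":
                continue              # plain FIFO can't mark ECN
            set_bottleneck(LAN, RATE, aqm, ecn)
            # ... run the HAS / upload / web workloads here ...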

Apply web traffic as background: what rates are chosen, and what is
the page load time (PLT)? I'd lean towards a web test that emulates a
search, decision, page-fetch loop.
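
A minimal sketch of that loop (the URLs point at a hypothetical local
test server, and timing only the base page understates real PLT):

import random, time, urllib.request

PAGES = ["http://testserver/search",     # placeholder testbed URLs
         "http://testserver/page1",
         "http://testserver/page2"]

def fetch(url):
    t0 = time.time()
    urllib.request.urlopen(url).read()
    return time.time() - t0

while True:
    print("search PLT:", fetch(PAGES[0]))
    time.sleep(random.uniform(1, 5))          # "decision" think time
    print("page PLT:", fetch(random.choice(PAGES[1:])))
    time.sleep(random.uniform(1, 10))         # read the page, then loop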

(Honestly, I personally have a low bar for HAS traffic quality: merely
never stopping playback to rebuffer. I am hard pressed to remember the
last non-wifi interruption I've had on a video download using
sqm-scripts.)

phase 2) Try 1-3 HAS flows on that link (as per the Stanford
experiment, but shooting for 18 Mbit per flow), varying the AQMs. Then
add a single upload flow. Then try web traffic. Then all three....

phase 3) Add web traffic, a twitch game, a videoconference/VoIP
session, a (set of) uploads, and a (set of) downloads.


To look further forward, pick an even higher encoding rate for the HAS
video. Run the downlink at rates like 75-200 Mbit. Try it over wifi,
too....

Getting to high quality fast and staying there is *hard*, and not just
for mixed HAS traffic. The videoconferencing servers I've looked at
thus far (Jitsi, FreeSWITCH) are doing *no* congestion control to
speak of, although there are some patches in flight; I am seeing
8 Mbit UDP down, 2 Mbit up from FreeSWITCH's conference server....

>
> I didn't like to complain on the AQM list.

Well, the EU commission is unlikely to read that list either, but
perhaps they will read that paper and other RITE outputs. I have no
idea what impact those outputs will have.

I look forward (with some dread) to seeing what sorts of head-end
AQM/FQ technologies are adopted in the future on cable, fiber,
cellular, etc., if any, ever. At this point I am kind of expecting 5G
to more or less roll out first with better AQM/FQ technologies, as
that's where the heavy hardware/software investment seems to be going.
It looks like we'll see DOCSIS 3.1 trials in a few cities this year,
but there's no public data on what improvements, if any, were made to
the CMTSes.

And gfiber (and services like Comcast's Blast) are so freaking fast
that nearly all the bloat moves to the wifi. At the same time, the
only things that seriously stress out wifi are multiple devices and
trying to watch HAS traffic over it!

I did some testing recently on the latest/greatest of the
Ethernet-over-powerline AV1200 devices from Netgear. It was another
one of those write-my-comments, hit-save-for-a-month sorts of exercises.

>
> The HAS over TCP must be more bursty than a simple TCP download.  All
> this could be explained by a competitive advantage in queues without
> fair scheduling.  There's no effort to rule it out here, at least.

The typical pattern is to try to fetch 10 seconds of video in under
2-4(?) seconds, and otherwise degrade to a lower rate.
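
In pseudo-Python, that adaptation loop looks roughly like this (the
rate ladder, thresholds, and segment URL are all made-up placeholders,
not any particular player's logic):

import time, urllib.request

LADDER_BPS = [2e6, 5e6, 10e6, 18e6]   # hypothetical encoding rates
SEG_SECONDS = 10

def fetch_segment(rate_bps):
    # Placeholder fetch sized to 10 s of video at the current rate.
    nbytes = int(rate_bps * SEG_SECONDS / 8)
    t0 = time.time()
    urllib.request.urlopen(f"http://testserver/seg?bytes={nbytes}").read()
    return time.time() - t0

level = len(LADDER_BPS) - 1
while True:
    took = fetch_segment(LADDER_BPS[level])
    if took > 4 and level > 0:
        level -= 1            # too slow: degrade to a lower rate
    elif took < 2 and level < len(LADDER_BPS) - 1:
        level += 1            # plenty of headroom: step back up
    time.sleep(max(0, SEG_SECONDS - took))  # stay about one segment ahead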

>
> You can see in the first link above, HAS used to basically die in
> competition with a simple TCP download.  Clearly it's been fixed for
> this case - assuming current commonly deployed queues.  SFQ looks
> *very* different to these queues, as they point out, and you see in
> the throughput graphs.  So it is possible there's an issue with the
> closed-source HAS being tested.  Somehow I doubt the Google Videos of
> this world would remain degraded, if there's a real issue here.  (I
> know, game theory of incremental deployment, sigh).

Google switched to using sch_fq (host FQ + pacing) pretty universally
a year or two back; YouTube switched first.
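
For anyone wanting to try it, host FQ + pacing is just a qdisc swap,
optionally plus a per-socket pacing cap (untested, Linux-only Python
sketch; the interface name and rate are placeholders):

import socket, subprocess

# Swap the root qdisc for sch_fq (needs root).
subprocess.run("tc qdisc replace dev eth0 root fq".split(), check=True)

# Optionally cap one socket's pacing rate, in bytes per second.
# The numeric fallback (47) is the Linux value of SO_MAX_PACING_RATE,
# for Pythons that don't export the constant.
SO_MAX_PACING_RATE = getattr(socket, "SO_MAX_PACING_RATE", 47)
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, SO_MAX_PACING_RATE, 2_500_000)  # ~20 Mbit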

There was a TCP patch that landed recently in the mainline kernel so
that a connection no longer slams the link with the full prior
congestion window after an idle period. That looked pretty
important... (I have no idea how long it was in production.)

