[Cerowrt-devel] [Bloat] DC behaviors today

Jonathan Morton chromatix99 at gmail.com
Wed Dec 13 16:06:15 EST 2017

(Awesome development - I have a computer with a sane e-mail client again.  One that doesn’t assume I want to top-post if I quote anything at all, *and* lets me type with an actual keyboard.  Luxury!)

>> One of the features well observed in real measurements of real systems is that packet flows are "fractal", which means that there is a self-similarity of rate variability at all time scales from micro to macro.
> I remember this debate and its evolution, Hurst parameters and all that. I also understand that a collection of on/off Poisson sources looks fractal - I found that “the universe is fractal - live with it” ethos of limited practical use (except to help people say it was not solvable).

>> Designers may imagine that their networks have "smooth averaging" properties. There's a strong thread in networking literature that makes this pretty-much-always-false assumption the basis of protocol designs, thinking about "Quality of Service" and other sorts of things. You can teach graduate students about a reality that does not exist, and get papers accepted in conferences where the reviewers have been trained in the same tradition of unreal assumptions.
> Agreed - there is a massive disconnect between a lot of the literature (and the people who make their living generating it - [to those people, please don’t take offence, queueing theory is really useful; it is just that the real world is a lot more non-stationary than you model]) and reality.

Probably a lot of theoreticians would be horrified at the extent to which I ignored mathematics and relied on intuition (and observations of real traffic, i.e. eating my own dogfood) while building Cake.

That approach, however, led me to some novel algorithms and combinations thereof which seem to work well in practice, as well as to some practical observations about the present state of the Internet.  I’ve also used some contributions from others, but only where they made sense at an intuitive level.

However, Cake isn’t designed to work in the datacentre.  Nor is it likely to work optimally in an ISP’s core networks.  The combination of features in Cake is not optimised for those environments, rather for last-mile links which are typically the bottlenecks experienced by ordinary Internet users.  Some of Cake's algorithms could reasonably be reused in a different combination for a different environment.

> I see large scale (i.e. public internets) not as a mono-service but as a “poly service” - there are multiple demands for timeliness etc that exist out there for “real services”.

This is definitely true.  However, the biggest problem I’ve noticed is with distinguishing these traffic types from each other.  In some cases there are heuristics which are accurate enough to be useful.  In others, there are not.  Rarely is the distinction *explicitly* marked in any way, and some common protocols explicitly obscure themselves due to historical mistreatment.

Diffserv is very hard to use in practice.  There’s a controversial fact for you to chew on.

> We’ve worked with people who have created risks for Netflix delivery (accidentally I might add - they thought they were doing “the right thing”) by increasing their network infrastructure to 100G delivery everywhere. That change (combined with others made by CDN people - TCP offload engines) created so much non-stationarity in the load as to cause delay and loss spikes that *did* cause VoD playout buffers to empty.  This is an example of where “more capacity” produced worse outcomes.

That’s an interesting and counter-intuitive result.  I’ll hazard a guess that it had something to do with burst loss in dumb tail-drop FIFOs?  Offload engines tend to produce extremely bursty traffic which - with a nod to another thread presently ongoing - makes a mockery of any ack-clocking or pacing which TCP designers normally assume is in effect.

One of the things that fq_codel and Cake can do really well is to take a deep queue full of consecutive line-rate bursts and turn them into interleaved packet streams, which are at least slightly better “paced” than the originals. They also specifically try to avoid burst loss and (at least in Cake’s case) tail loss.
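
As a rough illustration of that interleaving effect, here is a minimal Python sketch. It is not the scheduler fq_codel or Cake actually uses (those work on byte quanta and give priority to sparse flows); it is plain one-packet-per-turn round robin over per-flow queues, assuming equal-sized packets that are all queued before dequeueing starts. The name `interleave` and the flow labels are made up for the example.

```python
from collections import OrderedDict, deque

def interleave(arrivals):
    """Dequeue per-flow queues in simple round-robin order.

    arrivals: list of (flow_id, seq) tuples in arrival order.
    Assumption: everything is already queued and packets are equal-sized.
    Returns the list of (flow_id, seq) in departure order.
    """
    queues = OrderedDict()
    for flow_id, seq in arrivals:
        queues.setdefault(flow_id, deque()).append((flow_id, seq))

    departures = []
    while queues:
        # One packet per backlogged flow per round.
        for flow_id in list(queues):
            q = queues[flow_id]
            departures.append(q.popleft())
            if not q:
                del queues[flow_id]
    return departures

# Two back-to-back line-rate bursts, as an offload engine might emit them.
burst = [("A", i) for i in range(4)] + [("B", i) for i in range(4)]
order = interleave(burst)
print([f"{flow}{seq}" for flow, seq in order])
```

With a single FIFO the departure order would equal the arrival order (all of A, then all of B); with per-flow queues each flow's packets leave spaced apart by the other flow's packets.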

It is of course regrettable that this behaviour conflicts with the assumptions of most network acceleration hardware, and that maximum throughput might therefore be compromised.  The *qualitative* behaviour is however improved.

> I would suggest that there are other ways of dealing with the impact of “peak” (i.e. where instantaneous demand exceeds supply over a long enough timescale to start affecting the most delay/loss sensitive application in the collective multiplexed stream).

Such as signalling to the traffic that congestion exists, and to please slow down a bit to make room?  ECN and AQM are great ways of doing that, especially in combination with flow isolation - the latter shares out the capacity fairly on short timescales, *and* avoids the need to signal congestion to flows which are already using less than their fair share.
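
To make the "fair share" point concrete, here is an idealised Python sketch of max-min (water-filling) allocation. It assumes each flow's demand is known up front, which a real AQM never knows; real implementations infer this from per-flow queue occupancy or sojourn time. The function name, the flow names and the numbers are all invented for illustration.

```python
def max_min_share(demands, capacity):
    """Idealised max-min fair allocation.

    demands:  dict of flow name -> offered rate (any consistent unit).
    capacity: link rate in the same unit.
    Returns (allocation dict, set of flows that would need a congestion signal).
    """
    alloc = {}
    remaining = capacity
    pending = sorted(demands.items(), key=lambda kv: kv[1])
    for i, (flow, demand) in enumerate(pending):
        # Equal split of what is left among the flows not yet served.
        share = remaining / (len(pending) - i)
        alloc[flow] = min(demand, share)
        remaining -= alloc[flow]
    # Only flows held below their demand need to be told to slow down.
    signalled = {f for f, d in demands.items() if alloc[f] < d}
    return alloc, signalled

demands = {"voip": 0.1, "web": 2.0, "bulk1": 50.0, "bulk2": 50.0}  # Mbit/s
alloc, signalled = max_min_share(demands, capacity=10.0)
print(alloc)
print(sorted(signalled))
```

In this example the two light flows get everything they ask for and are never signalled; only the two bulk flows, which want more than an equal split of the remainder, are candidates for an ECN mark or a drop.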

> I would also agree that if all the streams have the same “bound on delay and loss” requirements (i.e. *all* Netflix), then at 100%+ load (over, again, the appropriate timescale - which for Netflix VoD streaming is about 20s to 30s) end-user disappointment is the only thing that can occur.

I think the importance of measurement timescales is consistently underrated in industry and academia alike.  An hour-long bucket of traffic tells you about a very different set of characteristics than a millisecond-long bucket, and there are several timescales between those extremes of great practical interest.
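
A small Python sketch of why the bucket size matters, using an entirely synthetic trace (not real measurements): 100 packets of 1500 bytes sent back-to-back at 100 µs spacing once per second, for ten seconds. Timestamps are integer microseconds to avoid floating-point bucket boundaries; `peak_rate_bps` is a made-up helper name.

```python
def peak_rate_bps(trace, bucket_us):
    """Peak rate in bits/s seen when the trace is binned into fixed buckets.

    trace:     iterable of (timestamp_us, size_bytes), integer microseconds.
    bucket_us: bucket width in integer microseconds.
    """
    buckets = {}
    for t_us, size in trace:
        idx = t_us // bucket_us
        buckets[idx] = buckets.get(idx, 0) + size
    peak_bytes = max(buckets.values())
    return peak_bytes * 8 * 1_000_000 / bucket_us

# Synthetic bursty source: a 10 ms burst at the start of every second.
trace = [(s * 1_000_000 + k * 100, 1500)
         for s in range(10) for k in range(100)]

for bucket_us in (1_000_000, 10_000):
    mbps = peak_rate_bps(trace, bucket_us) / 1e6
    print(f"bucket {bucket_us} us: peak {mbps} Mbit/s")
```

The same trace looks like a gentle 1.2 Mbit/s source in one-second buckets and like a 120 Mbit/s source in 10 ms buckets, which is the kind of difference that decides whether a given buffer overflows.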

 - Jonathan Morton
