* [Cake] upstreaming cake in 2017? @ 2016-12-22 19:43 Dave Taht 2016-12-22 20:02 ` Sebastian Moeller 2016-12-30 7:42 ` Y 0 siblings, 2 replies; 14+ messages in thread From: Dave Taht @ 2016-12-22 19:43 UTC (permalink / raw) To: cake, Stephen Hemminger I think most of the reasons why cake could not be upstreamed are now on their way towards being resolved, and after lede ships, I can't think of any left to stop an upstreaming push. Some reasons for not upstreaming were: * Because the algorithms weren't stable enough * Because it wasn't feature complete until last month (denatting, triple-isolate, and a 3 tier sqm) * Because it had to work on embedded products going back to 3.12 or so * Because I was busy with make-wifi-fast - which we got upstream as soon as humanly possible. * Because it was gated on having the large tester base we have with lede (4.4 based) * Because it rather abuses the tc statistics tool to generate tons of stats * Because DSCP markings remain in flux at the ietf * We ignore the packet priority fields entirely * We don't know what diffserv models and ratios truly make sense Anyone got more reasons not to upstream? Any more desirable features? In looking over the sources today I see a couple issues: * usage of // comments and overlong lines * could just use constants for the diffserv lookup tables (I just pushed the revised gen_cake_const.c file for the sqm mode, but didn't rip out the relevant code in sch_cake). I note that several of my boxes have 64 hw queues now * I would rather like to retire "precedence" entirely * cake cannot shape above 40Gbit (32 bit setting). Someday +40Gbit is possible * we could split gso segments at quantum rather than always * could use some profiling on x86, arm, and mips arches * Need long RTT tests and stuff that abuses cobalt features * Are we convinced the atm and overhead compensators are correct? * ipv6 nat? * ipsec recognition and prioritization? * I liked deprioritizing ping in sqm-scripts Hardware mq is bugging me - a single queued version of cake on the root qdisc has much lower latency than a bql'd mq with cake on each queue and *almost* the same throughput. -- Dave Täht Let's go make home routers and wifi faster! With better software! http://blog.cerowrt.org ^ permalink raw reply [flat|nested] 14+ messages in thread
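For concreteness, the hardware-mq comparison described above can be reproduced with a minimal sketch like the following; the interface name and the number of hardware queues are illustrative rather than taken from the thread, and both variants are left unshaped so that BQL governs device buffering:

    # single cake instance on the root qdisc (the lower-latency case reported above)
    tc qdisc replace dev eth0 root cake besteffort

    # versus mq on the root with one cake instance per hardware tx queue
    tc qdisc replace dev eth0 root handle 1: mq
    tc qdisc replace dev eth0 parent 1:1 cake besteffort
    tc qdisc replace dev eth0 parent 1:2 cake besteffort
    # ...and so on for the remaining hardware queues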
* Re: [Cake] upstreaming cake in 2017? 2016-12-22 19:43 [Cake] upstreaming cake in 2017? Dave Taht @ 2016-12-22 20:02 ` Sebastian Moeller 2016-12-23 1:43 ` Stephen Hemminger 2016-12-30 7:42 ` Y 1 sibling, 1 reply; 14+ messages in thread From: Sebastian Moeller @ 2016-12-22 20:02 UTC (permalink / raw) To: Dave Täht; +Cc: cake, Stephen Hemminger Hi Dave, > On Dec 22, 2016, at 20:43, Dave Taht <dave.taht@gmail.com> wrote: > > I think most of the reasons why cake could not be upstreamed are now > on their way towards being resolved, and after lede ships, I can't > think of any left to stop an > upstreaming push. > > Some reasons for not upstreaming were: > > * Because the algorithms weren't stable enough > * Because it wasn't feature complete until last month (denatting, > triple-isolate, and a 3 tier sqm) > * Because it had to work on embedded products going back to 3.12 or so > * Because I was busy with make-wifi-fast - which we got upstream as > soon as humanly possible. > * Because it was gated on having the large tester base we have with > lede (4.4 based) > * Because it rather abuses the tc statistics tool to generate tons of stats > * Because DSCP markings remain in flux at the ietf But does that matter? Is there really a hope that DSCPs will ever work outside of a well-controlled DSCP domain? Because inside one, you can make any DSCP mean anything you want. Trusting ingress DSCPs to do the right thing and/or be well enough conserved is a lottery ticket. And also trusting that the right applications use the right ietf-compatible markings while no app tries to abuse those seems optimistic. And finally, for end-users the problem is not so much which DSCP-to-priority-band/tier scheme is used, but rather how to convince their important applications to actually mark their packets accordingly. > * We ignore the packet priority fields entirely > * We don't know what diffserv models and ratios truly make sense Well, IMHO that is a good indicator that making it configurable, in addition to offering a few well-reasoned configurations, seems not the worst thing to do, no? > > Anyone got more reasons not to upstream? Any more desirable features? > > In looking over the sources today I see a couple issues: > > * usage of // comments and overlong lines > * could just use constants for the diffserv lookup tables (I just pushed the > revised gen_cake_const.c file for the sqm mode, but didn't rip out the > relevant code in sch_cake). I note that several of my boxes have 64 > hw queues now > * I would rather like to retire "precedence" entirely Why? At least it is a scheme that can be reasonably well described even if it rarely will be a good match for what people want. What it does get right, IIRC, is sticking to half of the DSCP bits... > * cake cannot shape above 40Gbit (32 bit setting). Someday +40Gbit is possible > * we could split gso segments at quantum rather than always > * could use some profiling on x86, arm, and mips arches > * Need long RTT tests and stuff that abuses cobalt features > * Are we convinced the atm and overhead compensators are correct? The ATM compensation itself is quite nice; the PTM compensation IMHO is not doing the right thing (less precise and more computationally intensive than required, even though probably only by a little). I still have not become a friend of the keywords (it does not help that at least one of them seems not in accordance with the relevant ITU documents). Then again I am sure the keywords do not need me as a friend.
But all of this is optional and hence no showstopper for merging (as long as none of them become default options, changing them later seems doable to me). > * ipv6 nat? The current belief seems to be that whoever does IPv6 NAT with a /128 and port remapping can keep the pieces. I have no idea how widespread such a configuration actually is, and adding an option for that after upstreaming also seems not unreasonable? > * ipsec recognition and prioritization? Why? Best Regards Sebastian P.S.: The only part where I can claim some level of expertise (for a low value of expertise) is the overhead accounting stuff, so take the rest with a smile. > * I liked deprioritizing ping in sqm-scripts > > Hardware mq is bugging me - a single queued version of cake on the > root qdisc has much lower latency than a bql'd mq with cake on each > queue and *almost* the same throughput. > > -- > Dave Täht > Let's go make home routers and wifi faster! With better software! > http://blog.cerowrt.org > _______________________________________________ > Cake mailing list > Cake@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/cake ^ permalink raw reply [flat|nested] 14+ messages in thread
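For readers following the ATM/PTM compensation discussion, here is a minimal sketch of how the compensators are invoked from tc; device names, rates and overhead bytes are illustrative only and must be chosen to match the actual link, and the ptm keyword assumes a cake build that already carries the PTM compensation discussed above:

    # ADSL-style link carried over ATM cells, with a fixed per-packet overhead
    tc qdisc replace dev pppoe-wan root cake bandwidth 900kbit atm overhead 40

    # VDSL2-style link using PTM (64/65 encoding) instead of cells
    tc qdisc replace dev eth0.7 root cake bandwidth 36mbit ptm overhead 30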
* Re: [Cake] upstreaming cake in 2017? 2016-12-22 20:02 ` Sebastian Moeller @ 2016-12-23 1:43 ` Stephen Hemminger 2016-12-23 3:44 ` Jonathan Morton 0 siblings, 1 reply; 14+ messages in thread From: Stephen Hemminger @ 2016-12-23 1:43 UTC (permalink / raw) To: Sebastian Moeller; +Cc: Dave Täht, cake On Thu, 22 Dec 2016 21:02:28 +0100 Sebastian Moeller <moeller0@gmx.de> wrote: > Hi Dave, > > > On Dec 22, 2016, at 20:43, Dave Taht <dave.taht@gmail.com> wrote: > > > > I think most of the reasons why cake could not be upstreamed are now > > on their way towards being resolved, and after lede ships, I can't > > think of any left to stop an > > upstreaming push. > > > > Some reasons for not upstreaming were: > > > > * Because the algorithms weren't stable enough > > * Because it wasn't feature complete until last month (denatting, > > triple-isolate, and a 3 tier sqm) > > * Because it had to work on embedded products going back to 3.12 or so > > * Because I was busy with make-wifi-fast - which we got upstream as > > soon as humanly possible. > > * Because it was gated on having the large tester base we have with > > lede (4.4 based) > > * Because it rather abuses the tc statistics tool to generate tons of stats > > * Because DSCP markings remain in flux at the ietf > > But does that matter? Is there really a hope that DSCPs will ever work outside of a well controlled DS/cP-domain? Because inside one, you can make any DSCP mean anything you want. Trusting ingress DSCPs to do the right thing and/or be well enough conserved is a lottery ticket. And also trusting that the right applications use the right ietf-compatible markings while no app tries to abuse those seems optimistic. And finally to end-users the problem is not so much which DSCP to priority bands/tier scheme was used, but rather how to convince their important applications to actually mark their packets such. > > > * We ignore the packet priority fields entirely > > * We don't know what diffserv models and ratios truly make sense > > Well, IMHO that is a good indicator that making it configurable in addition to a few well reasoned configuration seems not the worst thing to do, no? > > > > > Anyone got more reasons not to upstream? Any more desirable features? > > > > In looking over the sources today I see a couple issues: > > > > * usage of // comments and overlong lines > > * could just use constants for the diffserv lookup tables (I just pushed the > > revised gen_cake_const.c file for the sqm mode, but didn't rip out the > > relevant code in sch_cake). I note that several of my boxes have 64 > > hw queues now > > * I would rather like to retire "precedence” entirely > > Why? At least it is a scheme that can be reasonably well described even if it rarely will be a good match for what people want. What is does get right IIRCC is sticking to half of the DSCP bits... > > > * cake cannot shape above 40Gbit (32 bit setting). Someday +40Gbit is possible > > * we could split gso segments at quantum rather than always > > * could use some profiling on x86, arm, and mips arches > > * Need long RTT tests and stuff that abuses cobalt features > > * Are we convinced the atm and overhead compensators are correct? > > The ATM compensation itself is quite nice, the PTM compensation IMHO is not doing the right thing (less precise and more computationally intensive than required, even though by probably only little). 
I still have not become a friend of the keywords (it does not help that at least one of them seems not on accordance with the relevant ITU documents). Then again I am sure the keywords do not need me as a friend. But all of this is optional and hence no showstopper for merging (as long as none of them become default options changing them later seems doable to me). > > > > * ipv6 nat? > > The current believe seems to be that whoever does IPv6 NAT with a /128 and port remapping can keep the pieces. I have no idea how widespread such a configuration actually is, and adding an option for that after upstreaming also seems not unreasonable? > > > * ipsec recognition and prioritization? > > Why? > > Best Regards > Sebastian > > P.S.: The only part where I can claim some level of expertise (for a low value of expertise) is the overhead accounting stuff, so take the rest with a smile. > > > * I liked deprioritizing ping in sqm-scripts > > > > Hardware mq is bugging me - a single queued version of cake on the > > root qdisc has much lower latency than a bql'd mq with cake on each > > queue and *almost* the same throughput. > > It would also help to have a description of which use-case cake is trying to solve: - how much configuration (lots HTB) or zero (fq_codel) - AP, CPE, backbone router, host system? Also what assumptions about the network are being made? Ideally this could end up in both iproute2 and kernel documentation. Don't worry if it is too much effort right away, LWN might help out. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Cake] upstreaming cake in 2017? 2016-12-23 1:43 ` Stephen Hemminger @ 2016-12-23 3:44 ` Jonathan Morton 2016-12-23 8:42 ` Sebastian Moeller 0 siblings, 1 reply; 14+ messages in thread From: Jonathan Morton @ 2016-12-23 3:44 UTC (permalink / raw) To: Stephen Hemminger; +Cc: Sebastian Moeller, cake > On 23 Dec, 2016, at 03:43, Stephen Hemminger <stephen@networkplumber.org> wrote: > > It would also help to have a description of which use-case cake is trying to solve: > - how much configuration (lots HTB) or zero (fq_codel) One of Cake’s central goals is that configuration should be straightforward for non-experts. Some flexibility is sacrificed as a result, but many common use-cases are covered with very concise configuration. That is why there are so many keywords. > - AP, CPE, backbone router, host system? The principal use-case is for either end of last-mile links, ie. CPE and head-end equipment - though actual deployment in the latter is much less likely than in the former, it remains a goal worth aspiring to. This is very often a bottleneck link for consumers and businesses alike. Cake could also be used in strategic locations in internal (corporate or ISP) networks, eg. building-to-building or site-to-site links. For APs, the make-wifi-fast stuff is a better choice, because it adapts natively to the wifi environment. Cake could gainfully be used on the wired LAN side of an AP, if inbound wifi traffic can saturate the wired link. Deployment on core backbone networks is not a goal. For that, you need hardware-accelerated simple AQM, if anything, simply to keep up. > Also what assumptions about the network are being made? As far as Diffserv is concerned, I explicitly assume that the standard RFC-defined DSCPs and PHBs are in use, which obviates any concerns about Diffserv policy boundaries. No other assumption makes sense, other than that Diffserv should be ignored entirely (which is also RFC-compliant), or that legacy Precedence codes are in use (which is deprecated but remains plausible) - and both of these additional cases are also supported. Cake does *not* assume that DSCPs are trustworthy. It respects them as given, but employs straightforward countermeasures against misuse (eg. higher “priority” applies only up to some fraction of capacity), and incentives for correct use (eg. latency-sensitive tins get more aggressive AQM). This improves deployability, and thus solves one half of the classic chicken-and-egg deployment problem. So, if Cake gets deployed widely, an incentive for applications to correctly mark their traffic will emerge. Incidentally, the biggest arguments against Precedence are: that it has no class of *lower* priority than the default (which is useful for swarm traffic), and that it was intended for use with strict priority, which only makes sense in a trusted network (which the Internet isn’t). If you have complex or unusual Diffserv needs, you can still use Cake as leaf qdiscs to a classifier, ignoring its internal Diffserv support. Cake's shaper assumes that the link has consistent throughput. This assumption tends to break down on wireless links; you have to set the shaped bandwidth conservatively and still accept some occasional reversion to device buffering. BQL helps a lot, but implementing it for certain types of device is very hard. Conversely, Cake’s shaper carefully tries *not* to rely on downstream devices having large buffers of their own, unlike token-bucket shapers. 
Indeed, avoiding this assumption improves latency performance at a given throughput and vice versa. Cake also assumes in general that the number of flows on the link at any given instant is not too large - a few hundred is acceptable. Behaviour should degrade fairly gracefully once flow-hash collisions can no longer be avoided, and will self-recover to peak performance after anomalous load spikes. This assumption is however likely to break down on backbones and major backhaul networks. Cake does support treating entire IP addresses as single flows, which may extend its applicability. - Jonathan Morton ^ permalink raw reply [flat|nested] 14+ messages in thread
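To make the last-mile/CPE use-case concrete, a minimal sketch of the usual sqm-style deployment follows; device names and rates are illustrative, and the diffserv3 and nat keywords are the ones discussed elsewhere in this thread:

    # egress (upload) shaping on the WAN interface
    tc qdisc replace dev eth0 root cake bandwidth 10mbit diffserv3 nat

    # ingress (download) shaping via an IFB device
    ip link add ifb4eth0 type ifb
    ip link set ifb4eth0 up
    tc qdisc add dev eth0 handle ffff: ingress
    tc filter add dev eth0 parent ffff: protocol all prio 10 u32 match u32 0 0 flowid 1:1 \
        action mirred egress redirect dev ifb4eth0
    tc qdisc replace dev ifb4eth0 root cake bandwidth 95mbit diffserv3 nat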
* Re: [Cake] upstreaming cake in 2017? 2016-12-23 3:44 ` Jonathan Morton @ 2016-12-23 8:42 ` Sebastian Moeller 2016-12-23 9:53 ` Jonathan Morton 0 siblings, 1 reply; 14+ messages in thread From: Sebastian Moeller @ 2016-12-23 8:42 UTC (permalink / raw) To: Jonathan Morton; +Cc: Stephen Hemminger, cake Hi Jonathan, > On Dec 23, 2016, at 04:44, Jonathan Morton <chromatix99@gmail.com> wrote: > > >> On 23 Dec, 2016, at 03:43, Stephen Hemminger <stephen@networkplumber.org> wrote: >> >> It would also help to have a description of which use-case cake is trying to solve: > >> - how much configuration (lots HTB) or zero (fq_codel) > > One of Cake’s central goals is that configuration should be straightforward for non-experts. Some flexibility is sacrificed as a result, This does not compute: offering simple configuration options is not at odds with also exposing more detailed configuration methods. The best thing for novices is arguably picking sane defaults... > but many common use-cases are covered with very concise configuration. That is why there are so many keywords. > >> - AP, CPE, backbone router, host system? > > The principal use-case is for either end of last-mile links, ie. CPE and head-end equipment - though actual deployment in the latter is much less likely than in the former, it remains a goal worth aspiring to. This is very often a bottleneck link for consumers and businesses alike. > > Cake could also be used in strategic locations in internal (corporate or ISP) networks, eg. building-to-building or site-to-site links. > > For APs, the make-wifi-fast stuff is a better choice, because it adapts natively to the wifi environment. Cake could gainfully be used on the wired LAN side of an AP, if inbound wifi traffic can saturate the wired link. > > Deployment on core backbone networks is not a goal. For that, you need hardware-accelerated simple AQM, if anything, simply to keep up. > >> Also what assumptions about the network are being made? > > As far as Diffserv is concerned, I explicitly assume that the standard RFC-defined DSCPs and PHBs are in use, which obviates any concerns about Diffserv policy boundaries. ??? This comes close to ignoring reality. The RFCs are less important than what people actually send down the internet. I know I keep harping on this, but to help non-experts it is better to deal with the state of DSCP on the internet as it is right now rather than how it should be. > No other assumption makes sense, other than that Diffserv should be ignored entirely (which is also RFC-compliant), I beg to differ. As I tried to argue before, coming up with a completely different system (preferably randomized for each home network) will make gaming the DSCPs much harder, and no matter which DSCP scheme is selected, marking in the home network is the biggest stumbling block to experts and non-experts alike (judged from trying to support users in the OpenWrt forum, admittedly a biased sample). Specific marking will be required to teach applications to use the wanted DSCPs (I assume OS support to override the application's choice of DSCP is the only real option for forward progress; waiting for all networked applications to expose configurable DSCP options makes waiting for Godot a better waste of one's time). > or that legacy Precedence codes are in use (which is deprecated but remains plausible) - and both of these additional cases are also supported. > > Cake does *not* assume that DSCPs are trustworthy.
It respects them as given, but employs straightforward countermeasures against misuse (eg. higher “priority” applies only up to some fraction of capacity), But doesn’t that automatically mean that an attacker can degrade performance of a well-configured high-priority tier (with appropriate access control) by overloading that band, which will affect the priority of the whole band, no? That might not be the worst alternative, but it certainly is not side-effect free. > and incentives for correct use (eg. latency-sensitive tins get more aggressive AQM). This improves deployability, and thus solves one half of the classic chicken-and-egg deployment problem. > > So, if Cake gets deployed widely, an incentive for applications to correctly mark their traffic will emerge. For which value of “correct” exactly? > > Incidentally, the biggest arguments against Precedence are: that it has no class of *lower* priority than the default (which is useful for swarm traffic), and that it was intended for use with strict priority, which only makes sense in a trusted network (which the Internet isn’t). But almost no program uses CS1 to label its data as lower priority, and that includes torrent-style applications and Microsoft's distributed update mechanism. Which brings us back to the DSCP re-mapping problems that make DSCP almost useless for non-experts. I would claim that instead of making a special bin for background, one should use CS0 for that and move everything more important to higher-valued DSCPs. This has the advantage that it will degrade to the status quo... > > If you have complex or unusual Diffserv needs, you can still use Cake as leaf qdiscs to a classifier, ignoring its internal Diffserv support. > > Cake's shaper assumes that the link has consistent throughput. This assumption tends to break down on wireless links; you have to set the shaped bandwidth conservatively and still accept some occasional reversion to device buffering. BQL helps a lot, but implementing it for certain types of device is very hard. > > Conversely, Cake’s shaper carefully tries *not* to rely on downstream devices having large buffers of their own, unlike token-bucket shapers. Indeed, avoiding this assumption improves latency performance at a given throughput and vice versa. Another noteworthy difference between cake and token-bucket systems is that under CPU starvation cake will try to honor the configured bandwidth at the cost of slightly increased latency, while token-bucket shapers (with small/no configured bursting) will sacrifice bandwidth. > > Cake also assumes in general that the number of flows on the link at any given instant is not too large - a few hundred is acceptable. I assume there is a build-time parameter that caters to a specific number of flows; would recompiling with a higher value for that constant allow one to tailor cake for environments with a larger number of concurrent flows? Best Regards Sebastian > Behaviour should degrade fairly gracefully once flow-hash collisions can no longer be avoided, and will self-recover to peak performance after anomalous load spikes. This assumption is however likely to break down on backbones and major backhaul networks. Cake does support treating entire IP addresses as single flows, which may extend its applicability. > > - Jonathan Morton > ^ permalink raw reply [flat|nested] 14+ messages in thread
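Since the token-bucket comparison comes up here, a minimal sketch of the two shaper styles being contrasted; rates and device names are illustrative:

    # classic token-bucket arrangement: HTB shaper with an fq_codel leaf
    tc qdisc replace dev eth0 root handle 1: htb default 10
    tc class add dev eth0 parent 1: classid 1:10 htb rate 50mbit ceil 50mbit
    tc qdisc replace dev eth0 parent 1:10 fq_codel

    # the cake alternative, with its integrated deficit-mode shaper, in one line
    tc qdisc replace dev eth0 root cake bandwidth 50mbit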
* Re: [Cake] upstreaming cake in 2017? 2016-12-23 8:42 ` Sebastian Moeller @ 2016-12-23 9:53 ` Jonathan Morton 2016-12-23 12:40 ` Sebastian Moeller 2016-12-24 15:55 ` Benjamin Cronce 0 siblings, 2 replies; 14+ messages in thread From: Jonathan Morton @ 2016-12-23 9:53 UTC (permalink / raw) To: Sebastian Moeller; +Cc: Stephen Hemminger, cake >> As far as Diffserv is concerned, I explicitly assume that the standard RFC-defined DSCPs and PHBs are in use, which obviates any concerns about Diffserv policy boundaries. > > ??? This comes close to ignoring reality. The RFCs are less important than what people actually send down the internet. What is actually sent down the Internet right now is mostly best-effort only - the default CS0 codepoint. My inbound shaper currently shows 96GB best-effort, 46MB CS1 and 4.3MB “low latency”. This is called the “chicken and egg” problem; applications mostly ignore Diffserv’s existence because it has no effect in most environments, and CPE ignores Diffserv’s existence because little traffic is observed using it. To solve the chicken-and-egg problem, you have to break that vicious cycle. It turns out to be easier to do that on the network side, creating an environment where DSCPs *do* have effects which applications might find useful. > coming up with a completely different system (preferable randomized for each home network) will make gaming the DSCPs much harder With all due respect, that is the single most boneheaded idea I’ve come across on this list. If the effect of applying a given DSCP is unpredictable, and may even be opposite to the desired behaviour - or, equivalently, if the correct DSCP to achieve a given behaviour is unpredictable - then Diffserv will *never* be used by mainstream users and applications. >> Cake does *not* assume that DSCPs are trustworthy. It respects them as given, but employs straightforward countermeasures against misuse (eg. higher “priority” applies only up to some fraction of capacity), > > But doesn’t that automatically mean that an attacker can degrade performance of a well configured high priority tier (with appropriate access control) by overloading that band, which will affect the priority of the whole band, no? That might not be the worst alternative, but it certainly is not side-effect free. If an attacker wants to cause side-effects like that, he’ll always be able to do so - unless he’s filtered at source. As a more direct counterpoint, if we weren’t using Diffserv at all, the very same attack would degrade performance for all traffic, not just the subset with equivalent DSCPs. Therefore, I have chosen to focus on incentivising legitimate traffic in appropriate directions. >> So, if Cake gets deployed widely, an incentive for applications to correctly mark their traffic will emerge. > > For which value of “correct” exactly? RFC-compliant, obviously. There are a few very well-established DSCPs which mean “minimise latency” (TOS4, EF) or “yield priority” (CS1). The default configuration recognises those and treats them accordingly. >> But almost no program uses CS1 to label its data as lower priority See chicken-and-egg argument above. There are signs that CS1 is in fact being used in its modern sense; indeed, while downloading the latest Star Citizen test version the other day, 46MB of data ended up in CS1. Star Citizen uses libtorrent, as I suspect do several other prominent games, so adding CS1 support there would probably increase coverage quite quickly. 
>> Cake also assumes in general that the number of flows on the link at any given instant is not too large - a few hundred is acceptable. > > I assume there is a build time parameter that will cater to a specific set of flows, would recompiling with a higher value for that constant allow to taylor cake for environments with a larger number of concurrent flows? There is a compile-time constant in the code which could, in principle, be exposed to the kernel configuration system. Increasing the queue count from 1K to 32K would allow “several hundred” to be replaced with “about ten thousand”. That’s still not backbone-grade, but might be useful for a very small ISP to manage its backhaul, such as an apartment complex FTTP installation or a village initiative. - Jonathan Morton ^ permalink raw reply [flat|nested] 14+ messages in thread
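For anyone who wants to experiment with that flow-queue count before it is exposed anywhere, a rough sketch of an out-of-tree rebuild; the constant name and file are assumptions (check the define actually used in the tree you build), and only the module build invocation itself is standard:

    # assumed: the queue count is a CAKE_QUEUES define in sch_cake.c - verify before editing
    sed -i 's/CAKE_QUEUES (1024)/CAKE_QUEUES (32768)/' sch_cake.c
    make -C /lib/modules/$(uname -r)/build M=$PWD modules
    # then unload and reload the rebuilt module, e.g. rmmod sch_cake; insmod ./sch_cake.ko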
* Re: [Cake] upstreaming cake in 2017? 2016-12-23 9:53 ` Jonathan Morton @ 2016-12-23 12:40 ` Sebastian Moeller 2016-12-23 14:06 ` Jonathan Morton 2016-12-24 15:55 ` Benjamin Cronce 1 sibling, 1 reply; 14+ messages in thread From: Sebastian Moeller @ 2016-12-23 12:40 UTC (permalink / raw) To: Jonathan Morton; +Cc: Stephen Hemminger, cake Hi Jonathan, > On Dec 23, 2016, at 10:53, Jonathan Morton <chromatix99@gmail.com> wrote: > >>> As far as Diffserv is concerned, I explicitly assume that the standard RFC-defined DSCPs and PHBs are in use, which obviates any concerns about Diffserv policy boundaries. >> >> ??? This comes close to ignoring reality. The RFCs are less important than what people actually send down the internet. > > What is actually sent down the Internet right now is mostly best-effort only - the default CS0 codepoint. My inbound shaper currently shows 96GB best-effort, 46MB CS1 and 4.3MB “low latency”. > > This is called the “chicken and egg” problem; applications mostly ignore Diffserv’s existence because it has no effect in most environments, and CPE ignores Diffserv’s existence because little traffic is observed using it. > > To solve the chicken-and-egg problem, you have to break that vicious cycle. It turns out to be easier to do that on the network side, creating an environment where DSCPs *do* have effects which applications might find useful. You seem to completely ignore that, given DSCP domains, the DSCP markings are not considered an immutable property of the sender but are used as scratch space by intermediate transport domains. Any scheme that does not account for that will never reach end-to-end reliability of DSCP-coded intent. Your chicken-and-egg phrasing of the challenge completely ignores this, and that is what I only half-jokingly called ignoring reality... > >> coming up with a completely different system (preferable randomized for each home network) will make gaming the DSCPs much harder > > With all due respect, that is the single most boneheaded idea I’ve come across on this list. If the effect of applying a given DSCP is unpredictable, and may even be opposite to the desired behaviour - or, equivalently, if the correct DSCP to achieve a given behaviour is unpredictable - then Diffserv will *never* be used by mainstream users and applications. Exactly: Diffserv will only ever make sense inside a DSCP domain, and inside one the codepoint-to-priority mapping can be completely arbitrary, as it cannot be assumed to make sense in other settings. Anything else will a) be open to being gamed by sufficiently motivated application writers and b) need to survive the re-mapping ISPs might/will do during transit. I know there are proposals to split the 6-bit Diffserv field into two sets of 3 bits, one for signaling end-to-end intent and one for the current domain's re-mapping of that intent (or lack thereof). But you knew that. Also, calling what is essentially randomization to avoid attacks boneheaded seems a bit strong; it is not as if this idea is without precedent (see KASLR, even though 6 bits are not much to work with). > >>> Cake does *not* assume that DSCPs are trustworthy. It respects them as given, but employs straightforward countermeasures against misuse (eg. higher “priority” applies only up to some fraction of capacity), >> >> But doesn’t that automatically mean that an attacker can degrade performance of a well configured high priority tier (with appropriate access control) by overloading that band, which will affect the priority of the whole band, no?
That might not be the worst alternative, but it certainly is not side-effect free. > > If an attacker wants to cause side-effects like that, he’ll always be able to do so - unless he’s filtered at source. As a more direct counterpoint, if we weren’t using Diffserv at all, the very same attack would degrade performance for all traffic, not just the subset with equivalent DSCPs. Well, then the damage would be shared instead of allowing an attacker to selectively degrade the higher-priority data, but I see your point: the malicious actor will cause problems, Diffserv or not, and it is debatable which problem is actually worse. But I also note that it is generally advised to re-map CS7 on ingress, to basically take away a remote offender's ability to affect network-management traffic. > > Therefore, I have chosen to focus on incentivising legitimate traffic in appropriate directions. > >>> So, if Cake gets deployed widely, an incentive for applications to correctly mark their traffic will emerge. >> >> For which value of “correct” exactly? > > RFC-compliant, obviously. This is the other half of my "ignoring reality" claim: if you put RFCs over observable data you are in for interesting challenges. > > There are a few very well-established DSCPs which mean “minimise latency” (TOS4, EF) or “yield priority” (CS1). The default configuration recognises those and treats them accordingly. Which sounds fine, with the caveat that those cannot be trusted on ingress without checking them first. Or, put differently, the internet is no overarching DSCP domain. > >>> But almost no program uses CS1 to label its data as lower priority > > See chicken-and-egg argument above. There are signs that CS1 is in fact being used in its modern sense; indeed, while downloading the latest Star Citizen test version the other day, 46MB of data ended up in CS1. Star Citizen uses libtorrent, as I suspect do several other prominent games, so adding CS1 support there would probably increase coverage quite quickly. The fact that torrent applications have not jumped on the CS1 marking idea (even though they typically do try to be good scavengers/netizens) should give us pause. IMHO the reason is that torrents want to be robust against ISP interference, and hence easily identifiable markers like CS1 seem a hard sell to the torrent community. But that basically diminishes the potential usefulness of the CS1 marking. > >>> Cake also assumes in general that the number of flows on the link at any given instant is not too large - a few hundred is acceptable. >> >> I assume there is a build time parameter that will cater to a specific set of flows, would recompiling with a higher value for that constant allow to taylor cake for environments with a larger number of concurrent flows? > > There is a compile-time constant in the code which could, in principle, be exposed to the kernel configuration system. Increasing the queue count from 1K to 32K would allow “several hundred” to be replaced with “about ten thousand”. That’s still not backbone-grade, but might be useful for a very small ISP to manage its backhaul, such as an apartment complex FTTP installation or a village initiative. Ah, this sounds great; if average flow counts increase in the future, cake can be made to cope better with a simple edit and recompile. Best Regards Sebastian > > - Jonathan Morton > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Cake] upstreaming cake in 2017? 2016-12-23 12:40 ` Sebastian Moeller @ 2016-12-23 14:06 ` Jonathan Morton 2016-12-23 16:24 ` Sebastian Moeller 0 siblings, 1 reply; 14+ messages in thread From: Jonathan Morton @ 2016-12-23 14:06 UTC (permalink / raw) To: Sebastian Moeller; +Cc: Stephen Hemminger, cake > On 23 Dec, 2016, at 14:40, Sebastian Moeller <moeller0@gmx.de> wrote: > > You seem to completely ignore that given DSCP-domains the DSCP markings are not considered to be immutable property of the sender but are used as a scratch space for intermediate transport domains. Any scheme that does not account for that will never reach end2end reliability of DSCP-coded intent. And this is why I listed RFC-compliant DSCPs as an assumption about the network, when the question arose. I happen to believe it’s reasonable to assume that DSCP remapping will *normally* use RFC-allocated DSCPs for their RFC-compliant meanings, and DSCPs in the unallocated and private-use spaces for internal meanings. That’s sufficient for RFC-compliant Diffserv behaviour to be both useful and expected. There will be occasional exceptions, but so far those have not showed up to any noticeable degree in my corner of the Internet. > But I also note that it is generally advised to re-map CS7 on ingress to basically take the remote offenders capability away to affect the network management traffic. Cake does not assume that it is at the edge of a Diffserv domain, which is where such remapping would be appropriate. I leave it to firewall rules if required. > if you put RFC over observable data you are in for interesting challenges. My observations of reality are mostly consistent with the RFCs. >> There are a few very well-established DSCPs which mean “minimise latency” (TOS4, EF) or “yield priority” (CS1). The default configuration recognises those and treats them accordingly. > > Which sounds fine with the caveat that those can not be trusted on ingress without checking them first. And as I have said several times in this thread alone, Cake does not trust the DSCP field blindly - yet it does not need to “verify” it, either. It interprets each DSCP as a request for a particular type of service, and each type of service has both advantages and disadvantages relative to other types. Using the new “diffserv3” mode as an example, there are just three tins. The default CS0 code, along with almost all the others which might randomly occur, end up in Best Effort, which is tuned for a general mix of traffic with “normal” Codel parameters. CS1 gets shunted into the Bulk tin, which is guaranteed only 1/16th of the link capacity, and yields any use of the remainder to the other two tins - there is clearly no incentive to use that rather than CS0, except for altruism. TOS4, VA, EF, CS6 and CS7 all go in the Voice tin, which is tuned for minimising latency - even if there is no competing traffic, bulk TCP flows will tend to get reduced throughput due to the more aggressive Codel parameters. Priority is substantially raised (by way of a large WRR quantum) over Best Effort - but only as long as tin throughput stays below 1/4 of the link capacity. Trying to increase bulk throughput by using one of these DSCPs will therefore be counterproductive, while trying to reduce peak and average latency is exactly what it’s for in the first place. An example of a Diffserv implementation that *did* blindly trust DSCPs would be a strict-priority scheme without any failsafes. Cake is not one of those. 
> Or put differently the internet is no overarching dscp-domain. No, but in the absence of an explicitly administered alternative, RFC compliance *is* the default and expected mode of the Internet. If you no longer believe that, then perhaps you should stop trusting your TCP/IP packets entirely. In any case, if it is necessary to remap DSCPs in any way to bring them into RFC compliance, that is not Cake’s job. Use firewall rules or a classifier qdisc. - Jonathan Morton ^ permalink raw reply [flat|nested] 14+ messages in thread
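As one concrete reading of "use firewall rules", here is a minimal sketch of remapping inbound DSCPs before they reach the shaper; the interface name and the choice to preserve only EF are illustrative, not a recommendation from this thread. Note the ordering caveat raised in the next message: an ingress qdisc or IFB sees packets before these mangle rules do.

    # flatten every inbound DSCP to best effort
    iptables -t mangle -A PREROUTING -i eth0 -j DSCP --set-dscp 0

    # or: keep EF (46) and flatten everything else
    iptables -t mangle -A PREROUTING -i eth0 -m dscp ! --dscp 46 -j DSCP --set-dscp 0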
* Re: [Cake] upstreaming cake in 2017? 2016-12-23 14:06 ` Jonathan Morton @ 2016-12-23 16:24 ` Sebastian Moeller 2016-12-23 17:01 ` Dave Taht 0 siblings, 1 reply; 14+ messages in thread From: Sebastian Moeller @ 2016-12-23 16:24 UTC (permalink / raw) To: Jonathan Morton; +Cc: Stephen Hemminger, cake Hi Jonathan, > On Dec 23, 2016, at 15:06, Jonathan Morton <chromatix99@gmail.com> wrote: > > >> On 23 Dec, 2016, at 14:40, Sebastian Moeller <moeller0@gmx.de> wrote: >> >> You seem to completely ignore that given DSCP-domains the DSCP markings are not considered to be immutable property of the sender but are used as a scratch space for intermediate transport domains. Any scheme that does not account for that will never reach end2end reliability of DSCP-coded intent. > > And this is why I listed RFC-compliant DSCPs as an assumption about the network, when the question arose. > > I happen to believe it’s reasonable to assume that DSCP remapping will *normally* use RFC-allocated DSCPs for their RFC-compliant meanings, and DSCPs in the unallocated and private-use spaces for internal meanings. That’s sufficient for RFC-compliant Diffserv behaviour to be both useful and expected. > > There will be occasional exceptions, but so far those have not showed up to any noticeable degree in my corner of the Internet. > >> But I also note that it is generally advised to re-map CS7 on ingress to basically take the remote offenders capability away to affect the network management traffic. > > Cake does not assume that it is at the edge of a Diffserv domain, which is where such remapping would be appropriate. I leave it to firewall rules if required. One of the use cases for cake that we constantly push is on the WAN interface of a CPE; if you do not consider a personal home net a different DSCP domain than the ISP's access network, I do not know what you would call a domain boundary. But I guess I have presented my arguments and will stop now. > >> if you put RFC over observable data you are in for interesting challenges. > > My observations of reality are mostly consistent with the RFCs. Well, only if you consider it RFC-conformant when an intermediary network re-maps to e.g. zero (in my case flent RRUL packets are re-mapped to zero for IPv4 but conserve their markings for IPv6, and I believe Dave reported his ISP delivering a considerable portion of CS1 packets that appeared to have originated as !CS1). > >>> There are a few very well-established DSCPs which mean “minimise latency” (TOS4, EF) or “yield priority” (CS1). The default configuration recognises those and treats them accordingly. >> >> Which sounds fine with the caveat that those can not be trusted on ingress without checking them first. > > And as I have said several times in this thread alone, Cake does not trust the DSCP field blindly - yet it does not need to “verify” it, either. It interprets each DSCP as a request for a particular type of service, and each type of service has both advantages and disadvantages relative to other types. > > Using the new “diffserv3” mode as an example, there are just three tins. The default CS0 code, along with almost all the others which might randomly occur, end up in Best Effort, which is tuned for a general mix of traffic with “normal” Codel parameters. > > CS1 gets shunted into the Bulk tin, which is guaranteed only 1/16th of the link capacity, and yields any use of the remainder to the other two tins - there is clearly no incentive to use that rather than CS0, except for altruism.
> > TOS4, VA, EF, CS6 and CS7 all go in the Voice tin, which is tuned for minimising latency - even if there is no competing traffic, bulk TCP flows will tend to get reduced throughput due to the more aggressive Codel parameters. Priority is substantially raised (by way of a large WRR quantum) over Best Effort - but only as long as tin throughput stays below 1/4 of the link capacity. Trying to increase bulk throughput by using one of these DSCPs will therefore be counterproductive, while trying to reduce peak and average latency is exactly what it’s for in the first place. > > An example of a Diffserv implementation that *did* blindly trust DSCPs would be a strict-priority scheme without any failsafes. Cake is not one of those. > >> Or put differently the internet is no overarching dscp-domain. > > No, but in the absence of an explicitly administered alternative, RFC compliance *is* the default and expected mode of the Internet. But those RFCs seem to state that one cannot expect specific mappings on ingress and that one is free to do whatever one wants inside a domain. Now, an optimistic interpretation is that, barring better reasons, people will slowly and automatically evolve their mapping schemes to be closer to some RFCs. > If you no longer believe that, then perhaps you should stop trusting your TCP/IP packets entirely. This is not so much a question of my belief system as the fact that I am not convinced that the data fully supports your view. I will shut up now and try to get a few packet captures in my network; while not definitive, they should give me an idea of whether my pessimism is unjustified. > > In any case, if it is necessary to remap DSCPs in any way to bring them into RFC compliance, Here I disagree; RFC compliance has, in my eyes, zero bearing on re-mapping. The only question one needs to answer is whether the re-mapping makes sense inside a DSCP domain. > that is not Cake’s job. Use firewall rules or a classifier qdisc. Funny you should say that, but I believe that on ingress, cake on an IFB will run before the firewall (but I might be completely wrong). Best Regards Sebastian > > - Jonathan Morton > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Cake] upstreaming cake in 2017? 2016-12-23 16:24 ` Sebastian Moeller @ 2016-12-23 17:01 ` Dave Taht 0 siblings, 0 replies; 14+ messages in thread From: Dave Taht @ 2016-12-23 17:01 UTC (permalink / raw) To: Sebastian Moeller; +Cc: Jonathan Morton, cake I have largely been focused on make-wifi-fast until the last week, only doing a code review on cake over the last few days and submitting a few patches thus far. I have sometimes been frustrated enough on the squash/wash issue to want to fork cake to "just do it", because RFC compliance actually mandates that the traffic be re-marked appropriately upon transiting a domain, and seeing CS1 enter my network from Comcast really messes things up if it is passed directly through to wifi. It is far, far better to squash on ingress, and to do it without a tc filter. In other patches: https://github.com/dtaht/sch_cake/pull/42 constifies things. (compile tested only, btw) And: I was not aware of this "feature": https://github.com/dtaht/sch_cake/issues/41 until yesterday and it needs testing against real traffic, on real RTTs. I think a lot of latency-sensitive (and marked thus) traffic is now pretty bursty (videoconferencing in particular, as well as anything transiting a wifi hop beforehand) and (IMHO) the literature on short queues is now invalid and out of date for all but voice traffic. (that said, I'm willing to test first) ... stats keeping seems broken: https://github.com/dtaht/sch_cake/issues/43 I have always been allergic to all the stats in the first place. Keeping inaccurate stats is worse than no stats at all. ... I will gladly re-roll a patch for squashing. ... I would really like cake reviewed for suitability for mainlining before it becomes part of lede's next release. In particular the cobalt.*c* include has gotta go. ... I've done a bit of profiling on it, basically cpu-wise it eats about the same as htb+fq-codel does. ^ permalink raw reply [flat|nested] 14+ messages in thread
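For anyone wanting to repeat the rough profiling mentioned here, a minimal sketch using perf while a load test (e.g. flent's rrul) runs through the shaped interface; the duration and options are illustrative:

    # live view of where CPU time is going, including kernel qdisc symbols
    perf top -g

    # or record system-wide for 30 seconds and inspect afterwards
    perf record -a -g -- sleep 30
    perf report --sort symbol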
* Re: [Cake] upstreaming cake in 2017? 2016-12-23 9:53 ` Jonathan Morton 2016-12-23 12:40 ` Sebastian Moeller @ 2016-12-24 15:55 ` Benjamin Cronce 2016-12-24 17:22 ` Jonathan Morton 1 sibling, 1 reply; 14+ messages in thread From: Benjamin Cronce @ 2016-12-24 15:55 UTC (permalink / raw) To: Jonathan Morton; +Cc: Sebastian Moeller, cake [-- Attachment #1: Type: text/plain, Size: 5779 bytes --] On Fri, Dec 23, 2016 at 3:53 AM, Jonathan Morton <chromatix99@gmail.com> wrote: > >> As far as Diffserv is concerned, I explicitly assume that the standard > RFC-defined DSCPs and PHBs are in use, which obviates any concerns about > Diffserv policy boundaries. > > > > ??? This comes close to ignoring reality. The RFCs are less > important than what people actually send down the internet. > > What is actually sent down the Internet right now is mostly best-effort > only - the default CS0 codepoint. My inbound shaper currently shows 96GB > best-effort, 46MB CS1 and 4.3MB “low latency”. > > This is called the “chicken and egg” problem; applications mostly ignore > Diffserv’s existence because it has no effect in most environments, and CPE > ignores Diffserv’s existence because little traffic is observed using it. > > To solve the chicken-and-egg problem, you have to break that vicious > cycle. It turns out to be easier to do that on the network side, creating > an environment where DSCPs *do* have effects which applications might find > useful. > > > coming up with a completely different system (preferable randomized for > each home network) will make gaming the DSCPs much harder > > With all due respect, that is the single most boneheaded idea I’ve come > across on this list. If the effect of applying a given DSCP is > unpredictable, and may even be opposite to the desired behaviour - or, > equivalently, if the correct DSCP to achieve a given behaviour is > unpredictable - then Diffserv will *never* be used by mainstream users and > applications. > > >> Cake does *not* assume that DSCPs are trustworthy. It respects them as > given, but employs straightforward countermeasures against misuse (eg. > higher “priority” applies only up to some fraction of capacity), > > > > But doesn’t that automatically mean that an attacker can degrade > performance of a well configured high priority tier (with appropriate > access control) by overloading that band, which will affect the priority of > the whole band, no? That might not be the worst alternative, but it > certainly is not side-effect free. > > If an attacker wants to cause side-effects like that, he’ll always be able > to do so - unless he’s filtered at source. As a more direct counterpoint, > if we weren’t using Diffserv at all, the very same attack would degrade > performance for all traffic, not just the subset with equivalent DSCPs. > > Therefore, I have chosen to focus on incentivising legitimate traffic in > appropriate directions. > > >> So, if Cake gets deployed widely, an incentive for applications to > correctly mark their traffic will emerge. > > > > For which value of “correct” exactly? > > RFC-compliant, obviously. > > There are a few very well-established DSCPs which mean “minimise latency” > (TOS4, EF) or “yield priority” (CS1). The default configuration recognises > those and treats them accordingly. > > >> But almost no program uses CS1 to label its data as lower priority > > See chicken-and-egg argument above. 
There are signs that CS1 is in fact > being used in its modern sense; indeed, while downloading the latest Star > Citizen test version the other day, 46MB of data ended up in CS1. Star > Citizen uses libtorrent, as I suspect do several other prominent games, so > adding CS1 support there would probably increase coverage quite quickly. > > >> Cake also assumes in general that the number of flows on the link at > any given instant is not too large - a few hundred is acceptable. > > > > I assume there is a build time parameter that will cater to a > specific set of flows, would recompiling with a higher value for that > constant allow to taylor cake for environments with a larger number of > concurrent flows? > > There is a compile-time constant in the code which could, in principle, be > exposed to the kernel configuration system. Increasing the queue count > from 1K to 32K would allow “several hundred” to be replaced with “about ten > thousand”. That’s still not backbone-grade, but might be useful for a very > small ISP to manage its backhaul, such as an apartment complex FTTP > installation or a village initiative. > A few years back, when reading about fq_Codel and Cake, one of the research articles that I came across talked about how many flows are actually in a buffer at any given time. They looked at the buffers of backbone links from 155Mb to 10Gb and they got the same numbers every time. While these links may be servicing hundreds of thousands of active flows, at any given instant there were fewer than 200 flows in the buffer; nearly all flows had exactly one packet in the buffer, and in the ballpark of 10 flows had 2 or more packets in the buffer. You could say the buffer follows the 80/20 rule: 20% of the flows in the buffer comprise 80% of the buffer. Regardless, the total number of flows in the buffer is almost fixed. What was also interesting is that the flows consuming the majority of the buffer were always in flux. You would think the same few flows that were consuming the buffer at one moment would continue to, but that is not the case; TCP keeps them alternating. When all is said and done, assuming your link is not horribly buffer-bloated, and it shouldn't be in this discussion because we're talking about fq_Codel/Cake, then there will probably be very little reason to have 32k buckets, ever. Cake especially, since it has "ways". > > - Jonathan Morton > > _______________________________________________ > Cake mailing list > Cake@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/cake > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Cake] upstreaming cake in 2017? 2016-12-24 15:55 ` Benjamin Cronce @ 2016-12-24 17:22 ` Jonathan Morton 2016-12-24 21:15 ` Benjamin Cronce 0 siblings, 1 reply; 14+ messages in thread From: Jonathan Morton @ 2016-12-24 17:22 UTC (permalink / raw) To: Benjamin Cronce; +Cc: Sebastian Moeller, cake > On 24 Dec, 2016, at 17:55, Benjamin Cronce <bcronce@gmail.com> wrote: > > What was also interesting is the flows consuming the majority of the buffer were always in flux. You would think the same few flows that were consuming the buffer at one moment would continue to, but that is not the case, TCP keeps them alternating. That sounds like the links are not actually congested on average, and the flows which temporarily collect in the buffer are due to transitory bursts - which is what you’d expect from a competently-managed backbone. A flow-isolating AQM doesn’t really help there, though Cake should be capable of scaling up to 10Gbps on a modern CPU. Conversely, there have been well-publicised instances of congestion at peering points, which have had substantial impacts on performance. I imagine the relevant flow counts there would be very much higher, definitely in the thousands. Even well-managed networks occasionally experience congestion due to exceptional loads. The workings of DRR++ are also somewhat more subtle than simply counting the flows instantaneously in the buffer. Each queue has a deficit, and in Cake an empty queue is not normally released from tracking a particular flow until the deficit has been repaid (by cycling through all the other flows and probably servicing them) and decaying the AQM state to rest, which may often take long enough for another packet to arrive for that flow. The number of active bulk flows can therefore exceed the number of packets actually in the queue. This is especially true if the AQM is working optimally and keeping the queue almost empty on average. While fq_codel does not explicitly assign queues to specific flows (ie. to avoid hash collisions), the effects of hash collisions are similarly felt under the same circumstances, resulting in the colliding flows failing to receive their theoretical fair share of the link, even if they never have packets physically in the queue at the same time. With that said, both fq_codel and Cake should work okay with statistical multiplexing to handle exceptional flow counts. In such cases, Cake’s triple-isolate feature should be turned off, by selecting either “hosts” or “flows” modes. I could run an analysis to show how even the multiplexing should be. - Jonathan Morton ^ permalink raw reply [flat|nested] 14+ messages in thread
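For completeness, the mode selection described above is a one-keyword change; rates and device names are illustrative:

    # plain per-flow fairness, no per-host accounting (suited to very large flow counts)
    tc qdisc replace dev eth0 root cake bandwidth 1gbit flows

    # pure per-host fairness instead
    tc qdisc replace dev eth0 root cake bandwidth 1gbit hosts

    # the default combined behaviour, for comparison
    tc qdisc replace dev eth0 root cake bandwidth 1gbit triple-isolate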
* Re: [Cake] upstreaming cake in 2017? 2016-12-24 17:22 ` Jonathan Morton @ 2016-12-24 21:15 ` Benjamin Cronce 0 siblings, 0 replies; 14+ messages in thread From: Benjamin Cronce @ 2016-12-24 21:15 UTC (permalink / raw) To: Jonathan Morton; +Cc: Sebastian Moeller, cake On Sat, Dec 24, 2016 at 11:22 AM, Jonathan Morton <chromatix99@gmail.com> wrote: > > > On 24 Dec, 2016, at 17:55, Benjamin Cronce <bcronce@gmail.com> wrote: > > > > What was also interesting is the flows consuming the majority of the > buffer were always in flux. You would think the same few flows that were > consuming the buffer at one moment would continue to, but that is not the > case, TCP keeps them alternating. > > That sounds like the links are not actually congested on average, and the > flows which temporarily collect in the buffer are due to transitory bursts > - which is what you’d expect from a competently-managed backbone. A > flow-isolating AQM doesn’t really help there, though Cake should be capable > of scaling up to 10Gbps on a modern CPU. > > Conversely, there have been well-publicised instances of congestion at > peering points, which have had substantial impacts on performance. I > imagine the relevant flow counts there would be very much higher, > definitely in the thousands. Even well-managed networks occasionally > experience congestion due to exceptional loads. > At least in my experience, most of the issues with congested peering are just bufferbloat. You can get a 10Gb switch with 4GiB of buffer, which is like 4 seconds of buffer. Nearly every time I see someone talking about congestion, you see pings increasing by 200ms+, many times into the thousands of milliseconds. A "congested" link should show maybe 50ms of latency increase with an increase in packet loss, but everyone knows how bad loss is and bloats the buffers. My argument is that an unbloated buffer has very few flow states in the buffer at any given time. I would like to see some numbers from fq_Codel or Cake about actual unique flow states at any given moment. > > The workings of DRR++ are also somewhat more subtle than simply counting > the flows instantaneously in the buffer. Each queue has a deficit, and in > Cake an empty queue is not normally released from tracking a particular > flow until the deficit has been repaid (by cycling through all the other > flows and probably servicing them) and decaying the AQM state to rest, > which may often take long enough for another packet to arrive for that flow. > > The number of active bulk flows can therefore exceed the number of packets > actually in the queue. This is especially true if the AQM is working > optimally and keeping the queue almost empty on average. > > While fq_codel does not explicitly assign queues to specific flows (ie. to > avoid hash collisions), the effects of hash collisions are similarly felt > under the same circumstances, resulting in the colliding flows failing to > receive their theoretical fair share of the link, even if they never have > packets physically in the queue at the same time. > > With that said, both fq_codel and Cake should work okay with statistical > multiplexing to handle exceptional flow counts. In such cases, Cake’s > triple-isolate feature should be turned off, by selecting either “hosts” or > “flows” modes. I could run an analysis to show how even the multiplexing > should be.
> > - Jonathan Morton > > ^ permalink raw reply [flat|nested] 14+ messages in thread
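Regarding the wish for real numbers on how many distinct flow states cake is tracking at an instant: cake already pushes a large amount of per-tin bookkeeping through the standard stats interface (the "tons of stats" mentioned at the top of the thread), so a periodic snapshot is one way to collect them, though the exact fields depend on the version in use:

    # dump cake's per-tin statistics once
    tc -s qdisc show dev eth0

    # or sample once per second while a test runs
    watch -n 1 'tc -s qdisc show dev eth0'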
* Re: [Cake] upstreaming cake in 2017? 2016-12-22 19:43 [Cake] upstreaming cake in 2017? Dave Taht 2016-12-22 20:02 ` Sebastian Moeller @ 2016-12-30 7:42 ` Y 1 sibling, 0 replies; 14+ messages in thread From: Y @ 2016-12-30 7:42 UTC (permalink / raw) To: Dave Taht, cake, Stephen Hemminger Hi, I am Yutaka. I want to use cake codel without ECN. bye bye :) On 2016-12-22 (Thu) at 11:43 -0800, Dave Taht wrote: > I think most of the reasons why cake could not be upstreamed are now > on their way towards being resolved, and after lede ships, I can't > think of any left to stop an > upstreaming push. > > Some reasons for not upstreaming were: > > * Because the algorithms weren't stable enough > * Because it wasn't feature complete until last month (denatting, > triple-isolate, and a 3 tier sqm) > * Because it had to work on embedded products going back to 3.12 or > so > * Because I was busy with make-wifi-fast - which we got upstream as > soon as humanly possible. > * Because it was gated on having the large tester base we have with > lede (4.4 based) > * Because it rather abuses the tc statistics tool to generate tons of > stats > * Because DSCP markings remain in flux at the ietf > * We ignore the packet priority fields entirely > * We don't know what diffserv models and ratios truly make sense > > Anyone got more reasons not to upstream? Any more desirable features? > > In looking over the sources today I see a couple issues: > > * usage of // comments and overlong lines > * could just use constants for the diffserv lookup tables (I just > pushed the > revised gen_cake_const.c file for the sqm mode, but didn't rip out > the > relevant code in sch_cake). I note that several of my boxes have > 64 > hw queues now > * I would rather like to retire "precedence" entirely > * cake cannot shape above 40Gbit (32 bit setting). Someday +40Gbit is > possible > * we could split gso segments at quantum rather than always > * could use some profiling on x86, arm, and mips arches > * Need long RTT tests and stuff that abuses cobalt features > * Are we convinced the atm and overhead compensators are correct? > * ipv6 nat? > * ipsec recognition and prioritization? > * I liked deprioritizing ping in sqm-scripts > > Hardware mq is bugging me - a single queued version of cake on the > root qdisc has much lower latency than a bql'd mq with cake on each > queue and *almost* the same throughput. > ^ permalink raw reply [flat|nested] 14+ messages in thread