Cake - FQ_codel the next generation
* [Cake] Master branch updated
@ 2016-10-04  7:22 Jonathan Morton
  2016-10-04  8:46 ` moeller0
  2016-10-04 15:22 ` Kevin Darbyshire-Bryant
  0 siblings, 2 replies; 9+ messages in thread
From: Jonathan Morton @ 2016-10-04  7:22 UTC (permalink / raw)
  To: cake

I’ve just merged the NAT, PTM and Linux-4.8 compatibility stuff into the master branch of Cake.  It’s stable code and a definite improvement.

This frees up the Cobalt branch for more experimentation, such as the rewrite of triple-isolate that I also just pushed.  I found a way to make it more DRR-like, by simply scaling down the quantum used for each host by the number of flows attached to that host.

I still need to test whether it works as well as the old version, but it should at least be less CPU intensive.  In particular it should no longer require bursts of CPU activity when the host deficits expire, and host deficit expiry should no longer be explicitly synchronised.

See if, between you, you can break it before I get back from shopping.  :-)

 - Jonathan Morton



* Re: [Cake] Master branch updated
  2016-10-04  7:22 [Cake] Master branch updated Jonathan Morton
@ 2016-10-04  8:46 ` moeller0
  2016-10-04 11:18   ` Jonathan Morton
  2016-10-04 15:22 ` Kevin Darbyshire-Bryant
  1 sibling, 1 reply; 9+ messages in thread
From: moeller0 @ 2016-10-04  8:46 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: cake

Hi Jonathan,

> On Oct 4, 2016, at 09:22 , Jonathan Morton <chromatix99@gmail.com> wrote:
> 
> I’ve just merged the NAT, PTM

	About that PTM accounting, could you explain why you want to perform the adjustment as a “virtual” size increase per packet instead of a “virtual” rate reduction? The arguments for adjusting the rate are as follows:
1) You only need to adjust the rate when the configured rate changes, rather than performing an adjustment for each packet, which should save CPU time and be more efficient.

2) For most users the rate (expressed in bits per second) will be numerically much larger than the typical packet size (around 1540 bytes), so adjusting the rate should introduce less rounding imprecision. To put this in numbers: 1540 bytes equal 12.32 Kbit.

I am confident that you have good reasons for your implementation decisions; all I want is to learn what those are.

Best Regards
	Sebastian

P.S.: I realize that I am looking like a one-trick pony, totally hung up on the overhead adjustment issue. I would prefer to let go, but I want to be certain that this is in good hands before I do, as the value of doing these compensations IMHO depends on absolutely meticulous attention (so any additional pair of eyes to peer over it is welcome).



> and Linux-4.8 compatibility stuff into the master branch of Cake.  It’s stable code and a definite improvement.
> 
> This frees up the Cobalt branch for more experimentation, such as the rewrite of triple-isolate that I also just pushed.  I found a way to make it more DRR-like, by simply scaling down the quantum used for each host by the number of flows attached to that host.
> 
> I still need to test whether it works as well as the old version, but it should at least be less CPU intensive.  In particular it should no longer require bursts of CPU activity when the host deficits expire, and host deficit expiry should no longer be explicitly synchronised.
> 
> See if, between you, you can break it before I get back from shopping.  :-)
> 
> - Jonathan Morton
> 
> _______________________________________________
> Cake mailing list
> Cake@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cake



* Re: [Cake] Master branch updated
  2016-10-04  8:46 ` moeller0
@ 2016-10-04 11:18   ` Jonathan Morton
  2016-10-04 11:54     ` moeller0
  0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Morton @ 2016-10-04 11:18 UTC (permalink / raw)
  To: moeller0; +Cc: cake


> On 4 Oct, 2016, at 11:46, moeller0 <moeller0@gmx.de> wrote:
> 
> About that PTM accounting, could you explain why you want to perform the adjustment as a “virtual” size increase per packet instead of a “virtual” rate reduction?

The shaper works by calculating the time occupied by each packet on the wire, and advancing a virtual clock in step with a continuous stream of packets.

The time occupation, in turn, is calculated as the number of bytes which appear on the wire divided by the number of bytes that wire can pass per second.  As an optimisation, the division is turned into a multiplication by the reciprocal.

I’m quite keen to keep the “bytes per second” purely derived from the raw bitrate of the link, because that is the value widely advertised by ISPs and network equipment manufacturers everywhere.  Hence, overhead compensation is implemented purely by increasing the accounted size of the packets.

I have been careful to calculate ceil(len * 65/64) here, so that the overhead is never underestimated.  For example, a 1500-byte IP packet becomes 1519 with bridged PTM or 1527 with PPPoE over PTM, before the PTM calculation itself.  These both round up to 1536 before division, so 24 more bytes will be added in both cases.

This is less than 2 bits more than actually required (on average), so wastes less than 1/6200 of the bandwidth when full-sized packets dominate the link (as is the usual case).  Users are unlikely to notice this in practice.

Next to all the other stuff Cake does for each packet, the overhead compensation is extremely quick.  And, although the code looks very similar, the PTM compensation is faster than the ATM compensation, because the factor involved is a power of two (which GCC is very good at optimising into shifts and masks).  This is fortunate, since PTM is typically used on higher-bandwidth links than ATM.

Now, if you can show me that the above is in fact incorrect - that significant bandwidth is wasted on some real traffic profile, or that cake_overhead() figures highly in a CPU profile on real hardware - then I will reconsider.

 - Jonathan Morton



* Re: [Cake] Master branch updated
  2016-10-04 11:18   ` Jonathan Morton
@ 2016-10-04 11:54     ` moeller0
  2016-10-04 16:23       ` Loganaden Velvindron
  0 siblings, 1 reply; 9+ messages in thread
From: moeller0 @ 2016-10-04 11:54 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: cake

Hi Jonathan,

> On Oct 4, 2016, at 13:18 , Jonathan Morton <chromatix99@gmail.com> wrote:
> 
> 
>> On 4 Oct, 2016, at 11:46, moeller0 <moeller0@gmx.de> wrote:
>> 
>> About that PTM accounting, could you explain why you want to perform the adjustment as a “virtual” size increase per packet instead of a “virtual” rate reduction?
> 
> The shaper works by calculating the time occupied by each packet on the wire, and advancing a virtual clock in step with a continuous stream of packets.
> 
> The time occupation, in turn, is calculated as the number of bytes which appear on the wire divided by the number of bytes that wire can pass per second.  As an optimisation, the division is turned into a multiplication by the reciprocal.

	Okay, but that is not really relevant to the topic at hand: in PTM systems the effective payload rate is 64/65 of the raw bit rate. The 65th byte is independent of the actual packet size sent, so it is theoretically better modeled as a rate reduction than as a size increase, even though in essence a shaper can account for it either way.

> 
> I’m quite keen to keep the “bytes per second” purely derived from the raw bitrate of the link, because that is the value widely advertised by ISPs and network equipment manufacturers everywhere.  Hence, overhead compensation is implemented purely by increasing the accounted size of the packets.

	Sorry, that does not make much sense to me. I realize that mathematically they are interchangeable, but that does not make them the same IMHO. Per-packet overhead needs to be accounted on a per-packet basis; you have no other real option (unless you work with a fixed packet length). But generic rate reductions do not need to be recomputed for each packet.


> 
> I have been careful to calculate ceil(len * 65/64) here, so that the overhead is never underestimated.

	Which is Jonathanese for “might be overestimated”, so you at least agree with my point that the precision of the accounting is relevant. As I proposed in one of my comments, “floor(shaper_rate * 64/65)” has the same property of being conservative, only with a lower possible error.

>  For example, a 1500-byte IP packet becomes 1519 with bridged PTM or 1527 with PPPoE over PTM, before the PTM calculation itself.  These both round up to 1536 before division, so 24 more bytes will be added in both cases.

	That is not one of the arguments I have made, but thanks for pointing that out.

> 
> This is less than 2 bits more than actually required (on average), so wastes less than 1/6200 of the bandwidth when full-sized packets dominate the link (as is the usual case).  Users are unlikely to notice this in practice.

	Erm, VoIP packets are not close to full MTU, so I am not sure that “as is the usual case” is very convincing. Actually, your tendency to always “wing it” instead of doing research, as shown when you claimed 64/65 bit encoding for PTM instead of looking into the relevant standards (which I had to cite twice to make you at least fix that misconception), does not fill me with confidence about those parts of cake where I do have zero expertise.

> 
> Next to all the other stuff Cake does for each packet, the overhead compensation is extremely quick.  And, although the code looks very similar, the PTM compensation is faster than the ATM compensation, because the factor involved is a power of two (which GCC is very good at optimising into shifts and masks).  This is fortunate, since PTM is typically used on higher-bandwidth links than ATM.

	I venture a guess that I have forgotten more about ATM/PTM ADSL/VDSL than you ever bothered to read up on, so why do you keep telling me these observations? If the goal is to annoy me, then mission accomplished.

> 
> Now, if you can show me that the above is in fact incorrect - that significant bandwidth is wasted on some real traffic profile, or that cake_overhead() figures highly in a CPU profile on real hardware - then I will reconsider.

	And that is great fun: the guy (you) who most often argues from first principles instead of from real-world data requests actual data in one of the cases where first principles seem quite applicable: when an operation can be (almost) completely avoided.

	But I guess we just keep it as in the past; you keep not fully grasping the intricacies of the different xDSL/DOCSIS encodings, and I keep ridiculing you for the demonstrated lack of attention to detail in these matters.
	In the past I sometimes wondered whether I had done anything to offend you by voicing my concerns in too brash or impolite a way, but now I simply assume that you (like most of us) simply do not react well to criticism (even if justified) and prefer to just harass the messenger.

> 
> - Jonathan Morton
> 



* Re: [Cake] Master branch updated
  2016-10-04  7:22 [Cake] Master branch updated Jonathan Morton
  2016-10-04  8:46 ` moeller0
@ 2016-10-04 15:22 ` Kevin Darbyshire-Bryant
  2016-10-04 16:28   ` Jonathan Morton
  1 sibling, 1 reply; 9+ messages in thread
From: Kevin Darbyshire-Bryant @ 2016-10-04 15:22 UTC (permalink / raw)
  To: cake



On 04/10/16 08:22, Jonathan Morton wrote:
> I’ve just merged the NAT, PTM and Linux-4.8 compatibility stuff into the master branch of Cake.  It’s stable code and a definite improvement.
>
> This frees up the Cobalt branch for more experimentation, such as the rewrite of triple-isolate that I also just pushed.  I found a way to make it more DRR-like, by simply scaling down the quantum used for each host by the number of flows attached to that host.
>
> I still need to test whether it works as well as the old version, but it should at least be less CPU intensive.  In particular it should no longer require bursts of CPU activity when the host deficits expire, and host deficit expiry should no longer be explicitly synchronised.
>
> See if, between you, you can break it before I get back from shopping.  :-)

Ha ha!  I don't know if you're back from shopping yet...and I'm not sure 
that I've broken it (cobalt branch)...but it has broken my router!

Beyond 'spontaneously reboots' as part of sqm-scripts instantiating the 
shaper I can't offer any more info.  Archer C7 v2.  Reverting to master 
is ok.  It's not even as if I use 'triple-isolate' *but* I know that 
section of code is used by 'dual-src/dsthost' anyway.

Sorry can't offer more info...away from home a lot and no serial console 
access.

Kevin



* Re: [Cake] Master branch updated
  2016-10-04 11:54     ` moeller0
@ 2016-10-04 16:23       ` Loganaden Velvindron
  0 siblings, 0 replies; 9+ messages in thread
From: Loganaden Velvindron @ 2016-10-04 16:23 UTC (permalink / raw)
  To: moeller0; +Cc: Jonathan Morton, cake

On Tue, Oct 4, 2016 at 3:54 PM, moeller0 <moeller0@gmx.de> wrote:
> Hi Jonathan,
>
>> On Oct 4, 2016, at 13:18 , Jonathan Morton <chromatix99@gmail.com> wrote:
>>
>>
>>> On 4 Oct, 2016, at 11:46, moeller0 <moeller0@gmx.de> wrote:
>>>
>>> About that PTM accounting, could you explain why you want to perform the adjustment as a “virtual” size increase per packet instead of a “virtual” rate reduction?
>>
>> The shaper works by calculating the time occupied by each packet on the wire, and advancing a virtual clock in step with a continuous stream of packets.
>>
>> The time occupation, in turn, is calculated as the number of bytes which appear on the wire divided by the number of bytes that wire can pass per second.  As an optimisation, the division is turned into a multiplication by the reciprocal.
>
>         Okay, but that is not really relevant to the topic at hand: in PTM systems the effective payload rate is 64/65 of the raw bit rate. The 65th byte is independent of the actual packet size sent, so it is theoretically better modeled as a rate reduction than as a size increase, even though in essence a shaper can account for it either way.
>
>>
>> I’m quite keen to keep the “bytes per second” purely derived from the raw bitrate of the link, because that is the value widely advertised by ISPs and network equipment manufacturers everywhere.  Hence, overhead compensation is implemented purely by increasing the accounted size of the packets.
>
>         Sorry, that does not make much sense to me. I realize that mathematically they are interchangeable, but that does not make them the same IMHO. Per-packet overhead needs to be accounted on a per-packet basis; you have no other real option (unless you work with a fixed packet length). But generic rate reductions do not need to be recomputed for each packet.
>
>
>>
>> I have been careful to calculate ceil(len * 65/64) here, so that the overhead is never underestimated.
>
>         Which is Jonathanese for “might be overestimated”, so you at least agree with my point that the precision of the accounting is relevant. As I proposed in one of my comments, “floor(shaper_rate * 64/65)” has the same property of being conservative, only with a lower possible error.
>
>>  For example, a 1500-byte IP packet becomes 1519 with bridged PTM or 1527 with PPPoE over PTM, before the PTM calculation itself.  These both round up to 1536 before division, so 24 more bytes will be added in both cases.
>
>         That is not one of the arguments I have made, but thanks for pointing that out.
>
>>
>> This is less than 2 bits more than actually required (on average), so wastes less than 1/6200 of the bandwidth when full-sized packets dominate the link (as is the usual case).  Users are unlikely to notice this in practice.
>
>         Erm, VoIP packets are not close to full MTU, so I am not sure that “as is the usual case” is very convincing. Actually, your tendency to always “wing it” instead of doing research, as shown when you claimed 64/65 bit encoding for PTM instead of looking into the relevant standards (which I had to cite twice to make you at least fix that misconception), does not fill me with confidence about those parts of cake where I do have zero expertise.
>
>>
>> Next to all the other stuff Cake does for each packet, the overhead compensation is extremely quick.  And, although the code looks very similar, the PTM compensation is faster than the ATM compensation, because the factor involved is a power of two (which GCC is very good at optimising into shifts and masks).  This is fortunate, since PTM is typically used on higher-bandwidth links than ATM.
>
>         I venture a guess that I have forgotten more about ATM/PTM ADSL/VDSL than you ever bothered to read up on, so why do you keep telling me these observations? If the goal is to annoy me, then mission accomplished.
>
>>
>> Now, if you can show me that the above is in fact incorrect - that significant bandwidth is wasted on some real traffic profile, or that cake_overhead() figures highly in a CPU profile on real hardware - then I will reconsider.
>
>         And that is great fun: the guy (you) who most often argues from first principles instead of from real-world data requests actual data in one of the cases where first principles seem quite applicable: when an operation can be (almost) completely avoided.
>
>         But I guess we just keep it as in the past; you keep not fully grasping the intricacies of the different xDSL/DOCSIS encodings, and I keep ridiculing you for the demonstrated lack of attention to detail in these matters.
>         In the past I sometimes wondered whether I had done anything to offend you by voicing my concerns in too brash or impolite a way, but now I simply assume that you (like most of us) simply do not react well to criticism (even if justified) and prefer to just harass the messenger.
>
>>
>> - Jonathan Morton
>>
>

Perhaps a good way to move forward on this might be to have some kind of
hackathon, and a face-to-face discussion, so that we can move on?


> _______________________________________________
> Cake mailing list
> Cake@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cake


* Re: [Cake] Master branch updated
  2016-10-04 15:22 ` Kevin Darbyshire-Bryant
@ 2016-10-04 16:28   ` Jonathan Morton
  2016-10-11  5:41     ` Jonathan Morton
  0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Morton @ 2016-10-04 16:28 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant; +Cc: cake


> On 4 Oct, 2016, at 18:22, Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote:
> 
> Ha ha!  I don't know if you're back from shopping yet...and I'm not sure that I've broken it (cobalt branch)...but it has broken my router!

Hmm.  It’s been running all day with plenty of traffic over here - but it did crash the very first time I loaded it, just not the second.  I will need to exercise it some more, preferably on a non-critical machine.

 - Jonathan Morton



* Re: [Cake] Master branch updated
  2016-10-04 16:28   ` Jonathan Morton
@ 2016-10-11  5:41     ` Jonathan Morton
  2016-10-11 12:09       ` Luis E. Garcia
  0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Morton @ 2016-10-11  5:41 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant; +Cc: cake


> On 4 Oct, 2016, at 19:28, Jonathan Morton <chromatix99@gmail.com> wrote:
> 
>> Ha ha!  I don't know if you're back from shopping yet...and I'm not sure that I've broken it (cobalt branch)...but it has broken my router!
> 
> Hmm.  It’s been running all day with plenty of traffic over here - but it did crash the very first time I loaded it, just not the second.  I will need to exercise it some more, preferably on a non-critical machine.

Okay, that bug is fixed and I’ve made further improvements to the triple-isolate algorithm.  It no longer needs quite as much spaghetti logic in the fast path, and might even be easier to understand from reading the code, since it’s now more obviously a modification of DRR++ rather than a brute-force wrapper around it.  It should certainly give smoother behaviour and be less CPU intensive in common cases.

In brief, what I now do is to scale the *flow* quantum down by the higher of the two hosts’ flow counts.  I’ve even dealt with underflow of the quotient using a dithering mechanism, which should also ensure that flows random-walk out of lockstep with each other.

It works sufficiently well that I was able to set Cake to 2.5Mbit besteffort triple-isolate, then watch a 720p YouTube video on one machine while another was downloading a game update using a 30-flow swarm.  I’d call that a success.

Hammer away at it, and then we’ll see if we can merge it up to master.

 - Jonathan Morton



* Re: [Cake] Master branch updated
  2016-10-11  5:41     ` Jonathan Morton
@ 2016-10-11 12:09       ` Luis E. Garcia
  0 siblings, 0 replies; 9+ messages in thread
From: Luis E. Garcia @ 2016-10-11 12:09 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Kevin Darbyshire-Bryant, cake


If we can replicate the results of your test then I would say we're onto
something.

On Monday, 10 October 2016, Jonathan Morton <chromatix99@gmail.com> wrote:

>
> > On 4 Oct, 2016, at 19:28, Jonathan Morton <chromatix99@gmail.com> wrote:
> >
> >> Ha ha!  I don't know if you're back from shopping yet...and I'm not
> sure that I've broken it (cobalt branch)...but it has broken my router!
> >
> > Hmm.  It’s been running all day with plenty of traffic over here - but
> it did crash the very first time I loaded it, just not the second.  I will
> need to exercise it some more, preferably on a non-critical machine.
>
> Okay, that bug is fixed and I’ve made further improvements to the
> triple-isolate algorithm.  It no longer needs quite as much spaghetti logic
> in the fast path, and might even be easier to understand from reading the
> code, since it’s now more obviously a modification of DRR++ rather than a
> brute-force wrapper around it.  It should certainly give smoother behaviour
> and be less CPU intensive in common cases.
>
> In brief, what I now do is to scale the *flow* quantum down by the higher
> of the two hosts’ flow counts.  I’ve even dealt with underflow of the
> quotient using a dithering mechanism, which should also ensure that flows
> random-walk out of lockstep with each other.
>
> It works sufficiently well that I was able to set Cake to 2.5Mbit
> besteffort triple-isolate, then watch a 720p YouTube video on one machine
> while another was downloading a game update using a 30-flow swarm.  I’d
> call that a success.
>
> Hammer away at it, and then we’ll see if we can merge it up to master.
>
>  - Jonathan Morton
>
> _______________________________________________
> Cake mailing list
> Cake@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cake
>


