[Cake] Understanding the Achieved Rate Multiplication Effect in FlowQueue-based AQM Bottleneck

Cake - FQ_codel the next generation

* [Cake] Understanding the Achieved Rate Multiplication Effect in FlowQueue-based AQM Bottleneck
@ 2021-12-03 22:27 Dave Taht
  2021-12-04  0:09 ` Jonathan Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Taht @ 2021-12-03 22:27 UTC (permalink / raw)
  To: jonathan.kua, Cake List

this was really good:

https://jonathankua.github.io/preprints/jkua-ieeelcn2021_understanding_ar_preprint-20jul2021.pdf

I would love it if somehow the measured effects of chunklets against
cake's per-host/per-flow fq were examined one day.

(I can't remember if I'd made this comment before)

-- 
I tried to build a better future, a few times:
https://wayforward.archive.org/?site=https%3A%2F%2Fwww.icei.org

Dave Täht CEO, TekLibre, LLC


* Re: [Cake] Understanding the Achieved Rate Multiplication Effect in FlowQueue-based AQM Bottleneck
  2021-12-03 22:27 [Cake] Understanding the Achieved Rate Multiplication Effect in FlowQueue-based AQM Bottleneck Dave Taht
@ 2021-12-04  0:09 ` Jonathan Morton
  2021-12-04 18:44   ` Dave Taht
  2021-12-04 23:01   ` David P. Reed
  0 siblings, 2 replies; 8+ messages in thread
From: Jonathan Morton @ 2021-12-04  0:09 UTC (permalink / raw)
  To: Dave Taht; +Cc: jonathan.kua, Cake List

> On 4 Dec, 2021, at 12:27 am, Dave Taht <dave.taht@gmail.com> wrote:
> 
> https://jonathankua.github.io/preprints/jkua-ieeelcn2021_understanding_ar_preprint-20jul2021.pdf
> 
> I would love it if somehow the measured effects of chunklets against cake's per-host/per flow fq was examined one day.

I haven't actually measured it, but based on what the above paper says, I can make some firm predictions:

1: When competing against traffic to the same local host, the performance effects they describe will be present.

2: When competing against traffic to a different local-network host, the performance effects they describe will be attenuated or even entirely absent.

3: They noted one or two cases of observable effects of hash collisions in their tests with FQ-Codel.  These will be greatly reduced in prevalence with Cake, due to the set-associative hash function which specifically addresses that phenomenon.
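
To illustrate prediction 3, here is a toy Python model of why a set-associative flow table sees fewer collisions than a direct-mapped one. It is only a sketch and not Cake's code: the table size (1024 queues), the 8-way associativity, the random 32-bit "hashes", and all function names are assumptions made for the example.

```python
import random

NUM_QUEUES = 1024   # assumed number of flow queues
WAYS = 8            # assumed set associativity

def direct_mapped_collisions(flow_hashes):
    """Count flows whose queue is already taken by another flow."""
    used = set()
    collisions = 0
    for h in flow_hashes:
        q = h % NUM_QUEUES
        if q in used:
            collisions += 1
        else:
            used.add(q)
    return collisions

def set_associative_collisions(flow_hashes):
    """Each hash selects a set of WAYS queues; a flow takes any free way.
    A collision only occurs once every queue in that set is occupied."""
    num_sets = NUM_QUEUES // WAYS
    occupancy = [0] * num_sets
    collisions = 0
    for h in flow_hashes:
        s = h % num_sets
        if occupancy[s] < WAYS:
            occupancy[s] += 1
        else:
            collisions += 1
    return collisions

def compare(num_flows=100, seed=1):
    """Return (direct-mapped, set-associative) collision counts."""
    rng = random.Random(seed)
    hashes = [rng.getrandbits(32) for _ in range(num_flows)]
    return direct_mapped_collisions(hashes), set_associative_collisions(hashes)

if __name__ == "__main__":
    print(compare())
```

In this model a set-associative collision needs more than 8 flows to hash into the same set, whereas a direct-mapped collision needs only two flows in the same queue, so the second count can never exceed the first.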

 - Jonathan Morton


* Re: [Cake] Understanding the Achieved Rate Multiplication Effect in FlowQueue-based AQM Bottleneck
  2021-12-04  0:09 ` Jonathan Morton
@ 2021-12-04 18:44   ` Dave Taht
  2021-12-04 22:29     ` David P. Reed
  2021-12-04 23:01   ` David P. Reed
  1 sibling, 1 reply; 8+ messages in thread
From: Dave Taht @ 2021-12-04 18:44 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: jonathan.kua, Cake List

It was the ConQuest tool they referenced that really caught my eye:

https://www.youtube.com/watch?v=Q3FFzB0SUjc

"ConQuest: Fine-Grained Queue Measurement in the Data Plane"


* Re: [Cake] Understanding the Achieved Rate Multiplication Effect in FlowQueue-based AQM Bottleneck
  2021-12-04 18:44   ` Dave Taht
@ 2021-12-04 22:29     ` David P. Reed
  2021-12-05  1:20       ` Dave Taht
  0 siblings, 1 reply; 8+ messages in thread
From: David P. Reed @ 2021-12-04 22:29 UTC (permalink / raw)
  To: Dave Taht; +Cc: Jonathan Morton, jonathan.kua, Cake List


I just watched it. His assumption that "carrier networks can't solve the problem because they can't control the hosts" is JUST WRONG!

The Internet solution is to require the flows' source hosts to regulate their transmission based on dynamic feedback.

And this ignorance on his part is clearly his advisors' fault.

The pattern here is:

I make an assumption that rules out better solutions.

I then invent some complicated kludge "inside the network" and claim it solves the problem.

Then I demand that networks put this kludge into the network.

In other words, he takes an end-to-end problem (regulating source rates to achieve low internal queue delay), and instead of implementing a solution at the ends, he adds much more complexity inside the network.

Violating the whole end-to-end argument.

Or, simplifying the point: "we have smarts in the routers, that we aren't using, so let's invent something to use them, even though there are better solutions."

Yuck!

This is how we ended up with CISC computers, with operating systems that shove huge amounts of function into protected mode with heavy use of shared global variables protected by complicated locks.

OK, this creates the need for complicated PhD theses where the coolness is how complicated the code was to get working.


* Re: [Cake] Understanding the Achieved Rate Multiplication Effect in FlowQueue-based AQM Bottleneck
  2021-12-04 22:29     ` David P. Reed
@ 2021-12-05  1:20       ` Dave Taht
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Taht @ 2021-12-05  1:20 UTC (permalink / raw)
  To: David P. Reed; +Cc: Jonathan Morton, jonathan.kua, Cake List

On Sat, Dec 4, 2021 at 2:30 PM David P. Reed <dpreed@deepplum.com> wrote:
>
> I just watched it. His assumption that "carrier networks can't solve the problem because they can't control the hosts" is JUST WRONG!

Step 1: Understand that microbursts exist: Win
Step 2: Find a way to measure them at 100Gbit scale using the Tofino
switch and a viable data structure: Big win
Step 3: Find a way to use that measurement in that switch to "do
something": OK, you object to this
Step 4: Publish at CoNEXT: Normal

Alternate step 3: Being able to leverage this tool to gain data about
e2e behavior at these scales, and so make better applications,
is a huge win. From looking at the "before" sawtooth in what they did,
they had a brick-wall ECN configuration going against RFC 3168 flows,
and originally marked all packets over the threshold, leveraging
RFC 3168's non-response to more than one marked packet per RTT.
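
A toy Python sketch of that interaction, for illustration only. It assumes a "brick wall" marker that marks every packet seeing a queue above a threshold, and a sender that (RFC 3168 style) reduces its window at most once per RTT however many marks arrive. The function names, the threshold, and the numbers are hypothetical and not taken from the ConQuest talk.

```python
def brick_wall_marks(queue_depths, threshold):
    """Mark every packet that sees a queue deeper than the threshold."""
    return [depth > threshold for depth in queue_depths]

def window_reductions(arrival_times_us, marks, rtt_us):
    """Count window reductions for a sender that reacts to a mark
    at most once per RTT (times are integer microseconds >= 0)."""
    reductions = 0
    next_allowed_us = 0
    for t, marked in zip(arrival_times_us, marks):
        if marked and t >= next_allowed_us:
            reductions += 1
            next_allowed_us = t + rtt_us
    return reductions

# Ten packets 1 ms apart; the queue is over threshold for packets 3..8.
times_us = [i * 1000 for i in range(10)]
depths = [5, 8, 12, 30, 35, 40, 38, 33, 31, 9]
marks = brick_wall_marks(depths, threshold=20)

if __name__ == "__main__":
    print("marked packets:", sum(marks))
    print("reductions at 10 ms RTT:", window_reductions(times_us, marks, 10_000))
```

With these made-up numbers a whole burst of marks inside one RTT costs the sender a single window reduction, which is the behaviour the brick-wall configuration relies on.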

The graph missing for me (and perhaps I should look again) was the
effect of that, vs not marking at all, or marking more smartly, as
they did.

The second question for me has always been to what extent ECN of any
form is being used in the datacenter today, in RFC 3168 semi-compliant
ways like this. It is, essentially, half on by default....


>
>
> The Internet solution is to require the flows' source hosts to regulate their transmission based on dynamic feedback.

From where? TCP timestamps aren't granular enough.


* Re: [Cake] Understanding the Achieved Rate Multiplication Effect in FlowQueue-based AQM Bottleneck
  2021-12-04  0:09 ` Jonathan Morton
  2021-12-04 18:44   ` Dave Taht
@ 2021-12-04 23:01   ` David P. Reed
  2021-12-05  1:24     ` Dave Taht
  1 sibling, 1 reply; 8+ messages in thread
From: David P. Reed @ 2021-12-04 23:01 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Dave Taht, jonathan.kua, Cake List


I agree with your broad assessment, Jonathan.

The self-interference problem within a host isn't just a network problem. It's a user-space scheduler problem as well.

There are lots of interactions between the user-space scheduler (in the case of Linux, the "Completely Fair Scheduler" and its quantum, which is set by the HZ variable at boot) and the network stack in the kernel. These interactions have non-trivial effects when multiple flows are independently created by concurrent processes.

Lately, I've been studying, for reasons related to my day job, the complex interactions of timing at sub-millisecond scale among threads and processes on a single system in Linux. I/O driven by threads becomes highly correlated, and so assuming "independence" among flow timing is just not a good assumption.

The paper observes the results of "dependencies" that couple/resonate.
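
A deliberately crude Python model of how such coupling can arise. It assumes two flows whose intended send times are independent and random, and a scheduler that defers every send to the next tick boundary. The 4 ms tick, the 200 sends per flow, and the function names are all assumptions for the example, and real scheduler behaviour is far more complicated than this.

```python
import random

def coincident_sends(times_a, times_b):
    """Number of distinct timestamps at which both flows send."""
    return len(set(times_a) & set(times_b))

def quantize(times_us, tick_us):
    """Defer each send to the next tick boundary (toy wakeup model)."""
    return [-(-t // tick_us) * tick_us for t in times_us]

def demo(seed=0, sends=200, span_us=1_000_000, tick_us=4_000):
    """Return (coincidences before, coincidences after) quantization."""
    rng = random.Random(seed)
    a = [rng.randrange(span_us) for _ in range(sends)]
    b = [rng.randrange(span_us) for _ in range(sends)]
    before = coincident_sends(a, b)
    after = coincident_sends(quantize(a, tick_us), quantize(b, tick_us))
    return before, after

if __name__ == "__main__":
    print(demo())
```

Independent microsecond-resolution send times almost never coincide, but once both flows are released on shared tick boundaries they frequently hit the network at the same instant, so treating them as independent arrivals stops being reasonable.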


* Re: [Cake] Understanding the Achieved Rate Multiplication Effect in FlowQueue-based AQM Bottleneck
  2021-12-04 23:01   ` David P. Reed
@ 2021-12-05  1:24     ` Dave Taht
  2021-12-07 20:04       ` David P. Reed
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Taht @ 2021-12-05  1:24 UTC (permalink / raw)
  To: David P. Reed; +Cc: Jonathan Morton, jonathan.kua, Cake List

I too have been trying to get below 1ms (heck, 3ms) precision or, at
least, resolution. I think I have come up with the most promising thing
I can think of yet for interactions in a multithreaded environment.
glibc has long mapped the kernel clock page into process memory, so I
was thinking (hoping) that by mmapping that on top of itself a zillion
times and using that as my test data source for writes out across the
network, I'd get some really fine-grained insights.

Haven't got around to it yet.
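
As a much simpler first step (this does not implement the mmap idea above), a short Python sketch can show what clock resolution is actually visible from user space on a given box. It uses only the standard library `time` module; the helper name and sample count are assumptions for the example.

```python
import time

def smallest_observed_step_ns(samples=100_000):
    """Smallest nonzero difference between back-to-back clock reads."""
    best = None
    prev = time.perf_counter_ns()
    for _ in range(samples):
        now = time.perf_counter_ns()
        step = now - prev
        if step > 0 and (best is None or step < best):
            best = step
        prev = now
    return best

if __name__ == "__main__":
    info = time.get_clock_info("perf_counter")
    print("reported resolution (s):", info.resolution)
    print("smallest observed step (ns):", smallest_observed_step_ns())
```

The observed step includes interpreter overhead, so it is an upper bound on the clock's granularity rather than a measurement of it.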


* Re: [Cake] Understanding the Achieved Rate Multiplication Effect in FlowQueue-based AQM Bottleneck
  2021-12-05  1:24     ` Dave Taht
@ 2021-12-07 20:04       ` David P. Reed
  0 siblings, 0 replies; 8+ messages in thread
From: David P. Reed @ 2021-12-07 20:04 UTC (permalink / raw)
  To: Dave Taht; +Cc: Jonathan Morton, jonathan.kua, Cake List



There are lots of easy ways (using standard Intel CPUs) to get *submicrosecond* precision on clock times.
 
At TidalScale, where I work, our hyperkernel (hypervisor) synchronizes the TSC (which has about 0.3 nsec resolution) between servers connected by a 10 GigE switch to an accuracy of maybe 100 nsec. We use a custom algorithm I designed, but the Precision Time Protocol standard easily gets sub-microsecond accurate measurements.
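
For reference, the arithmetic behind a PTP-style two-way time exchange can be sketched in a few lines of Python. This is the textbook offset/delay calculation, which assumes a symmetric path; it is not the custom TidalScale algorithm, and the function names and example timestamps are made up.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Two-way time transfer, all values in nanoseconds.
    t1: master sends Sync        (master clock)
    t2: slave receives Sync      (slave clock)
    t3: slave sends Delay_Req    (slave clock)
    t4: master receives request  (master clock)
    Returns (slave offset from master, one-way delay),
    assuming equal delay in both directions."""
    offset = ((t2 - t1) - (t4 - t3)) / 2
    delay = ((t2 - t1) + (t4 - t3)) / 2
    return offset, delay

def tick_rate_ghz(resolution_ns):
    """Counter frequency implied by a given tick resolution."""
    return 1.0 / resolution_ns

if __name__ == "__main__":
    # Slave clock 500 ns ahead of master, 2000 ns one-way delay.
    print(ptp_offset_and_delay(t1=0, t2=2500, t3=3000, t4=4500))
    # A 0.3 ns tick corresponds to a counter running at roughly 3.3 GHz.
    print(round(tick_rate_ghz(0.3), 2))
```

Any asymmetry between the two directions shows up directly as offset error, which is one reason sub-microsecond results depend on a short, predictable path such as a single switch.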
 
Now, if you aren't measuring in a hypervisor or a very low-level part of the Linux kernel stack, I suggest using DPDK (another Intel goodie that works on Linux), which lets you do a lot in userspace code, including running the whole IP stack in userspace.
 
I understand that a lot of Cake usage is about non-Intel, low-end consumer router processors, and on those it is really hard to time anything (and the scheduler itself is often struggling to schedule stuff predictably), but not every idea where measurement leads to significant latency improvement has to be initially explored on those low-end consumer MIPS and ARM chips.
 
There are a number of single board computers that have at least two GigE ports and are Intel-64 Celerons or better that have good nanosecond TSC clocks, PCIe slots that support high-end wifi, etc.
 
I can recommend some...
 
 
