Cake - FQ_codel the next generation
* [Cake] and the bad cpu news on arm is
@ 2015-12-19 20:03 Dave Taht
  2015-12-20  8:55 ` Björn Grönvall
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Taht @ 2015-12-19 20:03 UTC (permalink / raw)
  To: cake

is that bcake, configured via sqm to do 200mbit/20mbit, rings
in at 82% idle over a 2 minute period spanning a 1 minute rrul test.
cake, in all its glory of statistics, extra features, bells, and whistles...

is also 82% idle.

vs sqm's htb + fq_codel at 86% idle.

(In other words, this is not a direct measurement of the code under
'load'; it is an average over a long sampling interval that covers
both the idle and the loaded state. AND (sigh) htb + fq_codel uses
less cpu. I am using "mpstat 1 120" from the openwrt sysstat package
for this.)
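
As a rough reading of those figures (arithmetic only, not a new measurement): %idle averaged over the window converts to busy time as 100 - idle, so 82% idle is 18% busy and 86% idle is 14% busy. A small sketch in C; the helper names busy_pct and relative_overhead are made up for illustration:

```c
/* CPU busy percentage implied by an averaged mpstat %idle figure. */
static double busy_pct(double idle_pct)
{
	return 100.0 - idle_pct;
}

/* Extra busy time of a candidate qdisc setup relative to a baseline,
 * expressed as a fraction of the baseline's busy time.
 */
static double relative_overhead(double idle_candidate, double idle_baseline)
{
	double base = busy_pct(idle_baseline);

	return (busy_pct(idle_candidate) - base) / base;
}
```

With the numbers above, relative_overhead(82.0, 86.0) comes to 4/14, or roughly 29% more cpu for cake than for htb + fq_codel. Because the 120 second window also contains idle time, this is an average over the window and not the cost under load.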

The good news from my perspective:

A) that it works at all with Linux 4.4-rc4 on the linksys 1200ac, which
was certainly not the case until last week
B) We CAN profile now
C) All systems for bandwidth (htb and cake) are accurate to 200mbit at
least, on this hardware

The bad news from my perspective:

A) I ran out of time for this back in August. I really have a ton of
wifi work stacked up.

B) I'd like to see someone show an instance where cake uses less cpu
or is better in any way than htb+ fq_codel.

There is a very small latency improvement (.8ms vs 1ms on this path).

I do not see any other appreciable difference in cpu usage or network
behavior... aside from negative ones.

Someone that's saying "ooh cake's better" *please* go measure rrul_be
while running "mpstat 1 120" for both sqm with fq_codel and sqm with
cake.

C) I will try to summon the energy to try it on mips myself, tomorrow.

--
Dave Täht
Let's go make home routers and wifi faster! With better software!
https://www.gofundme.com/savewifi


* Re: [Cake] and the bad cpu news on arm is
  2015-12-19 20:03 [Cake] and the bad cpu news on arm is Dave Taht
@ 2015-12-20  8:55 ` Björn Grönvall
  2015-12-20 12:59   ` Dave Taht
  2015-12-22  0:31   ` Jonathan Morton
  0 siblings, 2 replies; 6+ messages in thread
From: Björn Grönvall @ 2015-12-20  8:55 UTC (permalink / raw)
  To: Dave Taht; +Cc: cake

[-- Attachment #1: Type: text/plain, Size: 514 bytes --]

Hi Dave,

The data cache on the wndr3800 and the archer is only 32 KB. Some ARM models also have a small cache.

Since the cake data structures are comparatively large it is important to only bring data into the cache that will actually make a difference when enqueueing.

The attached patch avoids reading data that will not be used. It will also save instructions on the hit path, i.e. whenever “q->way_hits” is incremented.

N.B. If your ARM has a large data cache this change may not make much of a difference.
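
For readers without the source at hand, here is a minimal userspace sketch of the lookup order the patch is after: scan only the compact tag array first, and look at per-queue state only when no tag matches. It is an illustration, not the sch_cake.c code. The names toy_tin and toy_lookup, the busy[] array (a stand-in for the list_empty() check on flowchain), the table size of 32 and the associativity of 8 are all assumptions, and whatever branch precedes the "else" in the real function is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define SET_WAYS 8	/* assumed associativity; stands in for CAKE_SET_WAYS */
#define QUEUES   32	/* toy table size; must be a multiple of SET_WAYS */

struct toy_tin {
	uint32_t tags[QUEUES];	/* compact array, scanned on every lookup */
	bool     busy[QUEUES];	/* stand-in for a non-empty flowchain */
	unsigned way_hits, way_misses, way_collisions;
};

/* Return the queue index for flow_hash, reserving a way if needed. */
static uint32_t toy_lookup(struct toy_tin *q, uint32_t flow_hash)
{
	uint32_t reduced = flow_hash % QUEUES;
	uint32_t inner = reduced % SET_WAYS;
	uint32_t outer = reduced - inner;
	uint32_t i, k;

	/* Pass 1: touch only the tag array. */
	for (i = 0, k = inner; i < SET_WAYS; i++, k = (k + 1) % SET_WAYS) {
		if (q->tags[outer + k] == flow_hash) {
			q->way_hits++;
			goto found;
		}
	}

	/* Pass 2: no tag matched, so only now read per-queue state. */
	for (i = 0, k = inner; i < SET_WAYS; i++, k = (k + 1) % SET_WAYS) {
		if (!q->busy[outer + k]) {
			q->way_misses++;
			goto found;
		}
	}

	/* Set is full: fall back to the original way, accept the collision. */
	k = inner;
	q->way_collisions++;

found:
	/* Reserve the way. Marking it busy here is a simplification; the
	 * real qdisc considers a queue occupied while packets are queued.
	 */
	q->tags[outer + k] = flow_hash;
	q->busy[outer + k] = true;
	return outer + k;
}
```

In this toy version a repeated lookup of the same flow never reaches the second loop, which is the point of the change: the hit path reads tags[] and nothing else.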

Cheers,
/b


[-- Attachment #2: sch_cake.c.diff --]
[-- Type: application/octet-stream, Size: 1847 bytes --]

diff --git a/sch_cake.c b/sch_cake.c
index 2d03d5a..ac4cb83 100644
--- a/sch_cake.c
+++ b/sch_cake.c
@@ -329,42 +329,38 @@ cake_hash(struct cake_tin_data *q, const struct sk_buff *skb, int flow_mode)
 	} else {
 		u32 inner_hash = reduced_hash % CAKE_SET_WAYS;
 		u32 outer_hash = reduced_hash - inner_hash;
-		u32 i, j, k;
+		u32 i, k;
 
 		/* check if any active queue in the set is reserved for
-		 * this flow. count the empty queues in the set, too
+		 * this flow.
 		 */
 
-		for (i = j = 0, k = inner_hash; i < CAKE_SET_WAYS;
+		for (i = 0, k = inner_hash; i < CAKE_SET_WAYS;
 		     i++, k = (k + 1) % CAKE_SET_WAYS) {
 			if (q->tags[outer_hash + k] == flow_hash) {
 				q->way_hits++;
 				goto found;
-			} else if (list_empty(&q->flows[outer_hash + k].
-					      flowchain)) {
-				j++;
 			}
 		}
 
-		/* no queue is reserved for this flow */
-		if (j) {
-			/* there's at least one empty queue, so find one
-			 * to reserve.
-			 */
-			q->way_misses++;
-
-			for (i = 0; i < CAKE_SET_WAYS; i++, k = (k + 1)
-				     % CAKE_SET_WAYS)
-				if (list_empty(&q->flows[outer_hash + k].
-					       flowchain))
-					goto found;
-		} else {
-			/* With no empty queues default to the original
-			 * queue and accept the collision.
-			 */
-			q->way_collisions++;
+		/* no queue is reserved for this flow. find an empty
+		 * queue to reserve.
+		 */
+		for (i = 0, k = inner_hash; i < CAKE_SET_WAYS;
+		     i++, k = (k + 1) % CAKE_SET_WAYS) {
+			if (list_empty(&q->flows[outer_hash + k].
+					      flowchain)) {
+				q->way_misses++;
+				goto found;
+			}
 		}
 
+		/* With no empty queues default to the original
+		 * queue and accept the collision.
+		 */
+		k = inner_hash;	 /* evict this entry. */
+		q->way_collisions++;
+
 found:
 		/* reserve queue for future packets in same flow */
 		reduced_hash = outer_hash + k;

[-- Attachment #3: Type: text/plain, Size: 2154 bytes --]






--

Björn Grönvall, bjorngx@gmail.com, Cell +46 70 768 06 35, Jabber/XMPP: bg@kth.se

“The ultimate tragedy is not the oppression and cruelty by the bad people but the silence over that by the good people.”

― Martin Luther King Jr.




* Re: [Cake] and the bad cpu news on arm is
  2015-12-20  8:55 ` Björn Grönvall
@ 2015-12-20 12:59   ` Dave Taht
  2015-12-22  0:30     ` Jonathan Morton
  2015-12-22  4:42     ` Jonathan Morton
  2015-12-22  0:31   ` Jonathan Morton
  1 sibling, 2 replies; 6+ messages in thread
From: Dave Taht @ 2015-12-20 12:59 UTC (permalink / raw)
  To: Björn Grönvall; +Cc: cake

A worthy optimization to eliminate a useless stat, yes!

(However this code path is not hit very often until you stress out a
system with tons of flows.)

At the moment my crusade is to actually convince the main author and
others here to eliminate nearly every statistic being kept, as well as
the ingress rate estimator, the sqrt cache, the different version of
codel, and a couple other things I laid out in a previous email.....

I do happen to like the set associativity idea, but it's proving
really hard to show its value with any existing tests we have.

Dave Täht
Let's go make home routers and wifi faster! With better software!
https://www.gofundme.com/savewifi




* Re: [Cake] and the bad cpu news on arm is
  2015-12-20 12:59   ` Dave Taht
@ 2015-12-22  0:30     ` Jonathan Morton
  2015-12-22  4:42     ` Jonathan Morton
  1 sibling, 0 replies; 6+ messages in thread
From: Jonathan Morton @ 2015-12-22  0:30 UTC (permalink / raw)
  To: Dave Taht; +Cc: Björn Grönvall, cake


> On 20 Dec, 2015, at 14:59, Dave Taht <dave.taht@gmail.com> wrote:
> 
> I do happen to like the set associativity idea, but it's proving
> really hard to show its value with any existing tests we have.

I’m sure we had those sorts of tests earlier on.  The 50:1 test might have been one of them.  Certainly, measuring Jain’s Fairness Index across the 50 flows should be revealing; with perfect flow isolation it should be close to unity, while with uncorrected hash collisions it should be noticeably inferior.
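
For reference, Jain's index over n per-flow throughputs x_i is (sum of x_i)^2 / (n * sum of x_i^2). A minimal sketch in C, assuming the per-flow throughputs have already been pulled out of the test results; the function name jain_index is made up here:

```c
#include <stddef.h>

/* Jain's fairness index for n per-flow throughputs.
 * Returns 1.0 for perfectly equal shares and 1/n when a single flow
 * gets everything. Returns 0.0 for an empty or all-zero input, which
 * is a convention chosen for this sketch.
 */
static double jain_index(const double *x, size_t n)
{
	double sum = 0.0, sum_sq = 0.0;
	size_t i;

	for (i = 0; i < n; i++) {
		sum += x[i];
		sum_sq += x[i] * x[i];
	}
	if (n == 0 || sum_sq == 0.0)
		return 0.0;
	return (sum * sum) / ((double)n * sum_sq);
}
```

For 50 flows the index therefore ranges from 0.02 (one flow starves the rest) up to 1.0 (equal shares).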

It’s certainly easy to detect when it’s avoiding hash collisions through the stats output, as that’s when the “indirect hit” counters start incrementing.

Perhaps when the next stage of flow isolation is actually in and working, the benefits will be easier to measure directly on a human scale.  I don’t think I’d be confident in my current approach to the problem unless the set-associative hash was in place.

 - Jonathan Morton



* Re: [Cake] and the bad cpu news on arm is
  2015-12-20  8:55 ` Björn Grönvall
  2015-12-20 12:59   ` Dave Taht
@ 2015-12-22  0:31   ` Jonathan Morton
  1 sibling, 0 replies; 6+ messages in thread
From: Jonathan Morton @ 2015-12-22  0:31 UTC (permalink / raw)
  To: Björn Grönvall; +Cc: Dave Taht, cake


> On 20 Dec, 2015, at 10:55, Björn Grönvall <bjorngx@gmail.com> wrote:
> 
> The attached patch avoids reading data that will not be used. It will also save instructions when there are “q->way_hits”.

I made some small improvements on top of your patch before applying it, since it is unnecessary to re-initialise k after each loop has unsuccessfully completed.
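
A likely reading of why that is safe (a sketch, not the committed change): each loop in the patch advances k by one modulo CAKE_SET_WAYS exactly CAKE_SET_WAYS times, so a loop that finds nothing leaves k back at the value it started from. The helper walk_set and the set size of 8 are assumptions for illustration:

```c
#define SET_WAYS 8	/* assumed; stands in for CAKE_SET_WAYS */

/* Walk one full set the way the lookup loops do, never matching,
 * and return the final value of k. Expects start < SET_WAYS.
 */
static unsigned walk_set(unsigned start)
{
	unsigned i, k;

	for (i = 0, k = start; i < SET_WAYS; i++, k = (k + 1) % SET_WAYS)
		;	/* no match in this pass */

	return k;	/* SET_WAYS steps modulo SET_WAYS: back at start */
}
```

So after an unsuccessful pass k already equals inner_hash again, both for the second loop and for the collision fallback.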

 - Jonathan Morton



* Re: [Cake] and the bad cpu news on arm is
  2015-12-20 12:59   ` Dave Taht
  2015-12-22  0:30     ` Jonathan Morton
@ 2015-12-22  4:42     ` Jonathan Morton
  1 sibling, 0 replies; 6+ messages in thread
From: Jonathan Morton @ 2015-12-22  4:42 UTC (permalink / raw)
  To: Dave Taht; +Cc: Björn Grönvall, cake


> On 20 Dec, 2015, at 14:59, Dave Taht <dave.taht@gmail.com> wrote:
> 
> I do happen to like the set associativity idea, but it's proving
> really hard to show its value with any existing tests we have.

Actually, your little experiment with dual-isolation inadvertently showed how robust the set-associative hash is.  Despite being effectively reduced to an 8-queue system (between a single pair of hosts), it was still able to keep four TCP ack flows, four UDP probe flows and a ping flow effectively separate from four TCP bulk flows, to the point where you couldn’t distinguish it from a properly functioning version of Cake under the same load.  Try *that* with a plain, collision-prone hash function.

 - Jonathan Morton


