* [Cake] [PATCH net v3] net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop
@ 2025-11-13 3:53 Xiang Mei
2025-11-13 4:05 ` [Cake] " Xiang Mei
2025-11-13 13:21 ` Toke Høiland-Jørgensen
0 siblings, 2 replies; 8+ messages in thread
From: Xiang Mei @ 2025-11-13 3:53 UTC (permalink / raw)
To: security; +Cc: netdev, toke, cake, bestswngs, Xiang Mei
In cake_drop(), qdisc_tree_reduce_backlog() is called to decrement
the qlen of the qdisc hierarchy. However, this can incorrectly reduce
qlen when the dropped packet was never enqueued, leading to a possible
NULL dereference (e.g., when QFQ is the parent qdisc).
This happens when cake_enqueue() returns NET_XMIT_CN: the parent
qdisc does not enqueue the skb, but cake_drop() still reduces backlog.
This patch avoids the extra reduction by checking whether the packet
was actually enqueued. It also moves qdisc_tree_reduce_backlog()
out of cake_drop() to keep backlog accounting consistent.
Fixes: 15de71d06a40 ("net/sched: Make cake_enqueue return NET_XMIT_CN when past buffer_limit")
Signed-off-by: Xiang Mei <xmei5@asu.edu>
---
v2: add missing cc
v3: move qdisc_tree_reduce_backlog out of cake_drop
net/sched/sch_cake.c | 40 ++++++++++++++++++++++++----------------
1 file changed, 24 insertions(+), 16 deletions(-)
diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 32bacfc314c2..179cafe05085 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1597,7 +1597,6 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
qdisc_drop_reason(skb, sch, to_free, SKB_DROP_REASON_QDISC_OVERLIMIT);
sch->q.qlen--;
- qdisc_tree_reduce_backlog(sch, 1, len);
cake_heapify(q, 0);
@@ -1750,7 +1749,9 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
ktime_t now = ktime_get();
struct cake_tin_data *b;
struct cake_flow *flow;
- u32 idx, tin;
+ u32 dropped = 0;
+ u32 idx, tin, prev_qlen, prev_backlog, drop_id;
+ bool same_flow = false;
/* choose flow to insert into */
idx = cake_classify(sch, &b, skb, q->flow_mode, &ret);
@@ -1927,24 +1928,31 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
if (q->buffer_used > q->buffer_max_used)
q->buffer_max_used = q->buffer_used;
- if (q->buffer_used > q->buffer_limit) {
- bool same_flow = false;
- u32 dropped = 0;
- u32 drop_id;
+ if (q->buffer_used <= q->buffer_limit)
+ return NET_XMIT_SUCCESS;
- while (q->buffer_used > q->buffer_limit) {
- dropped++;
- drop_id = cake_drop(sch, to_free);
+ prev_qlen = sch->q.qlen;
+ prev_backlog = sch->qstats.backlog;
- if ((drop_id >> 16) == tin &&
- (drop_id & 0xFFFF) == idx)
- same_flow = true;
- }
- b->drop_overlimit += dropped;
+ while (q->buffer_used > q->buffer_limit) {
+ dropped++;
+ drop_id = cake_drop(sch, to_free);
+ if ((drop_id >> 16) == tin &&
+ (drop_id & 0xFFFF) == idx)
+ same_flow = true;
+ }
+ b->drop_overlimit += dropped;
+
+ /* Compute the dropped qlen and pkt length */
+ prev_qlen -= sch->q.qlen;
+ prev_backlog -= sch->qstats.backlog;
- if (same_flow)
- return NET_XMIT_CN;
+ if (same_flow) {
+ qdisc_tree_reduce_backlog(sch, prev_qlen - 1,
+ prev_backlog - len);
+ return NET_XMIT_CN;
}
+ qdisc_tree_reduce_backlog(sch, prev_qlen, prev_backlog);
return NET_XMIT_SUCCESS;
}
--
2.43.0
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Cake] Re: [PATCH net v3] net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop
2025-11-13 3:53 [Cake] [PATCH net v3] net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop Xiang Mei
@ 2025-11-13 4:05 ` Xiang Mei
2025-11-13 13:35 ` Toke Høiland-Jørgensen
2025-11-13 13:21 ` Toke Høiland-Jørgensen
1 sibling, 1 reply; 8+ messages in thread
From: Xiang Mei @ 2025-11-13 4:05 UTC (permalink / raw)
To: security; +Cc: netdev, toke, cake, bestswngs
On Wed, Nov 12, 2025 at 08:53:03PM -0700, Xiang Mei wrote:
> In cake_drop(), qdisc_tree_reduce_backlog() is called to decrement
> the qlen of the qdisc hierarchy. However, this can incorrectly reduce
> qlen when the dropped packet was never enqueued, leading to a possible
> NULL dereference (e.g., when QFQ is the parent qdisc).
>
> This happens when cake_enqueue() returns NET_XMIT_CN: the parent
> qdisc does not enqueue the skb, but cake_drop() still reduces backlog.
>
> This patch avoids the extra reduction by checking whether the packet
> was actually enqueued. It also moves qdisc_tree_reduce_backlog()
> out of cake_drop() to keep backlog accounting consistent.
>
> Fixes: 15de71d06a40 ("net/sched: Make cake_enqueue return NET_XMIT_CN when past buffer_limit")
> Signed-off-by: Xiang Mei <xmei5@asu.edu>
> ---
> v2: add missing cc
> v3: move qdisc_tree_reduce_backlog out of cake_drop
>
> net/sched/sch_cake.c | 40 ++++++++++++++++++++++++----------------
> 1 file changed, 24 insertions(+), 16 deletions(-)
>
> diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
> index 32bacfc314c2..179cafe05085 100644
> --- a/net/sched/sch_cake.c
> +++ b/net/sched/sch_cake.c
> @@ -1597,7 +1597,6 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
>
> qdisc_drop_reason(skb, sch, to_free, SKB_DROP_REASON_QDISC_OVERLIMIT);
> sch->q.qlen--;
> - qdisc_tree_reduce_backlog(sch, 1, len);
>
> cake_heapify(q, 0);
>
> @@ -1750,7 +1749,9 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> ktime_t now = ktime_get();
> struct cake_tin_data *b;
> struct cake_flow *flow;
> - u32 idx, tin;
> + u32 dropped = 0;
> + u32 idx, tin, prev_qlen, prev_backlog, drop_id;
> + bool same_flow = false;
>
> /* choose flow to insert into */
> idx = cake_classify(sch, &b, skb, q->flow_mode, &ret);
> @@ -1927,24 +1928,31 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> if (q->buffer_used > q->buffer_max_used)
> q->buffer_max_used = q->buffer_used;
>
> - if (q->buffer_used > q->buffer_limit) {
> - bool same_flow = false;
> - u32 dropped = 0;
> - u32 drop_id;
> + if (q->buffer_used <= q->buffer_limit)
> + return NET_XMIT_SUCCESS;
>
> - while (q->buffer_used > q->buffer_limit) {
> - dropped++;
> - drop_id = cake_drop(sch, to_free);
> + prev_qlen = sch->q.qlen;
> + prev_backlog = sch->qstats.backlog;
>
> - if ((drop_id >> 16) == tin &&
> - (drop_id & 0xFFFF) == idx)
> - same_flow = true;
> - }
> - b->drop_overlimit += dropped;
> + while (q->buffer_used > q->buffer_limit) {
> + dropped++;
> + drop_id = cake_drop(sch, to_free);
> + if ((drop_id >> 16) == tin &&
> + (drop_id & 0xFFFF) == idx)
> + same_flow = true;
> + }
> + b->drop_overlimit += dropped;
> +
> + /* Compute the dropped qlen and pkt length */
> + prev_qlen -= sch->q.qlen;
> + prev_backlog -= sch->qstats.backlog;
>
> - if (same_flow)
> - return NET_XMIT_CN;
> + if (same_flow) {
> + qdisc_tree_reduce_backlog(sch, prev_qlen - 1,
> + prev_backlog - len);
> + return NET_XMIT_CN;
> }
> + qdisc_tree_reduce_backlog(sch, prev_qlen, prev_backlog);
> return NET_XMIT_SUCCESS;
> }
>
> --
> 2.43.0
>
Thanks to Toke for the suggestion to move qdisc_tree_reduce_backlog out of
cake_drop; it makes the logic cleaner.
The patch passed CAKE's self-test:
```log
ok 1 1212 - Create CAKE with default setting
ok 2 3281 - Create CAKE with bandwidth limit
ok 3 c940 - Create CAKE with autorate-ingress flag
ok 4 2310 - Create CAKE with rtt time
ok 5 2385 - Create CAKE with besteffort flag
ok 6 a032 - Create CAKE with diffserv8 flag
ok 7 2349 - Create CAKE with diffserv4 flag
ok 8 8472 - Create CAKE with flowblind flag
ok 9 2341 - Create CAKE with dsthost and nat flag
ok 10 5134 - Create CAKE with wash flag
ok 11 2302 - Create CAKE with flowblind and no-split-gso flag
ok 12 0768 - Create CAKE with dual-srchost and ack-filter flag
ok 13 0238 - Create CAKE with dual-dsthost and ack-filter-aggressive flag
ok 14 6572 - Create CAKE with memlimit and ptm flag
ok 15 2436 - Create CAKE with fwmark and atm flag
ok 16 3984 - Create CAKE with overhead and mpu
ok 17 5421 - Create CAKE with conservative and ingress flag
ok 18 6854 - Delete CAKE with conservative and ingress flag
ok 19 2342 - Replace CAKE with mpu
ok 20 2313 - Change CAKE with mpu
ok 21 4365 - Show CAKE class
```
There is still one problem I am not sure about, since I am not very
experienced with cake and GSO. It concerns the GSO branch [1]. slen is
the length added to the cake sch, and that branch uses
`qdisc_tree_reduce_backlog(sch, 1-numsegs, len-slen);` to inform the
parent sched. However, when we drop the packet, it could be problematic,
since we should reduce slen instead of len. Is this a potential problem?
[1] https://elixir.bootlin.com/linux/v6.6.116/source/net/sched/sch_cake.c#L1803
Thanks,
Xiang
* [Cake] Re: [PATCH net v3] net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop
2025-11-13 3:53 [Cake] [PATCH net v3] net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop Xiang Mei
2025-11-13 4:05 ` [Cake] " Xiang Mei
@ 2025-11-13 13:21 ` Toke Høiland-Jørgensen
2025-11-15 9:03 ` Xiang Mei
1 sibling, 1 reply; 8+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-11-13 13:21 UTC (permalink / raw)
To: Xiang Mei, security; +Cc: netdev, cake, bestswngs, Xiang Mei
Xiang Mei <xmei5@asu.edu> writes:
> In cake_drop(), qdisc_tree_reduce_backlog() is called to decrement
> the qlen of the qdisc hierarchy. However, this can incorrectly reduce
> qlen when the dropped packet was never enqueued, leading to a possible
> NULL dereference (e.g., when QFQ is the parent qdisc).
>
> This happens when cake_enqueue() returns NET_XMIT_CN: the parent
> qdisc does not enqueue the skb, but cake_drop() still reduces backlog.
>
> This patch avoids the extra reduction by checking whether the packet
> was actually enqueued. It also moves qdisc_tree_reduce_backlog()
> out of cake_drop() to keep backlog accounting consistent.
>
> Fixes: 15de71d06a40 ("net/sched: Make cake_enqueue return NET_XMIT_CN when past buffer_limit")
> Signed-off-by: Xiang Mei <xmei5@asu.edu>
> ---
> v2: add missing cc
> v3: move qdisc_tree_reduce_backlog out of cake_drop
>
> net/sched/sch_cake.c | 40 ++++++++++++++++++++++++----------------
> 1 file changed, 24 insertions(+), 16 deletions(-)
>
> diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
> index 32bacfc314c2..179cafe05085 100644
> --- a/net/sched/sch_cake.c
> +++ b/net/sched/sch_cake.c
> @@ -1597,7 +1597,6 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
>
> qdisc_drop_reason(skb, sch, to_free, SKB_DROP_REASON_QDISC_OVERLIMIT);
> sch->q.qlen--;
> - qdisc_tree_reduce_backlog(sch, 1, len);
>
> cake_heapify(q, 0);
>
> @@ -1750,7 +1749,9 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> ktime_t now = ktime_get();
> struct cake_tin_data *b;
> struct cake_flow *flow;
> - u32 idx, tin;
> + u32 dropped = 0;
> + u32 idx, tin, prev_qlen, prev_backlog, drop_id;
> + bool same_flow = false;
>
> /* choose flow to insert into */
> idx = cake_classify(sch, &b, skb, q->flow_mode, &ret);
> @@ -1927,24 +1928,31 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> if (q->buffer_used > q->buffer_max_used)
> q->buffer_max_used = q->buffer_used;
>
> - if (q->buffer_used > q->buffer_limit) {
> - bool same_flow = false;
> - u32 dropped = 0;
> - u32 drop_id;
> + if (q->buffer_used <= q->buffer_limit)
> + return NET_XMIT_SUCCESS;
While this does reduce indentation, I don't think it's an overall
improvement; the overflow condition is the exceptional case, so it's
clearer to keep it inside the if statement (which also keeps the
variable scope smaller).
> - while (q->buffer_used > q->buffer_limit) {
> - dropped++;
> - drop_id = cake_drop(sch, to_free);
> + prev_qlen = sch->q.qlen;
> + prev_backlog = sch->qstats.backlog;
>
> - if ((drop_id >> 16) == tin &&
> - (drop_id & 0xFFFF) == idx)
> - same_flow = true;
> - }
> - b->drop_overlimit += dropped;
> + while (q->buffer_used > q->buffer_limit) {
> + dropped++;
> + drop_id = cake_drop(sch, to_free);
> + if ((drop_id >> 16) == tin &&
> + (drop_id & 0xFFFF) == idx)
> + same_flow = true;
> + }
> + b->drop_overlimit += dropped;
The 'dropped' variable is wholly redundant after this change, so let's
just get rid of it and use the prev_qlen for this statistic instead.
-Toke
* [Cake] Re: [PATCH net v3] net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop
2025-11-13 4:05 ` [Cake] " Xiang Mei
@ 2025-11-13 13:35 ` Toke Høiland-Jørgensen
2025-11-15 9:23 ` Xiang Mei
0 siblings, 1 reply; 8+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-11-13 13:35 UTC (permalink / raw)
To: Xiang Mei, security; +Cc: netdev, cake, bestswngs
Xiang Mei <xmei5@asu.edu> writes:
> There is still one problem I am not sure about, since I am not very
> experienced with cake and GSO. It concerns the GSO branch [1]. slen is
> the length added to the cake sch, and that branch uses
> `qdisc_tree_reduce_backlog(sch, 1-numsegs, len-slen);` to inform the
> parent sched. However, when we drop the packet, it could be
> problematic, since we should reduce slen instead of len. Is this a
> potential problem?
Hmm, no I think it's fine? The qdisc_tree_reduce_backlog(sch, 1-numsegs,
len-slen) *increases* the backlog with the difference between the
original length and the number of new segments. And then we *decrease*
the backlog with the number of bytes we dropped.
The compensation we're doing is for the backlog update of the parent,
which is still using the original packet length regardless of any
splitting, so that doesn't change the compensation value.
-Toke
* [Cake] Re: [PATCH net v3] net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop
2025-11-13 13:21 ` Toke Høiland-Jørgensen
@ 2025-11-15 9:03 ` Xiang Mei
0 siblings, 0 replies; 8+ messages in thread
From: Xiang Mei @ 2025-11-15 9:03 UTC (permalink / raw)
To: Toke Høiland-Jørgensen; +Cc: security, netdev, cake, bestswngs
On Thu, Nov 13, 2025 at 02:21:54PM +0100, Toke Høiland-Jørgensen wrote:
> Xiang Mei <xmei5@asu.edu> writes:
>
> > In cake_drop(), qdisc_tree_reduce_backlog() is called to decrement
> > the qlen of the qdisc hierarchy. However, this can incorrectly reduce
> > qlen when the dropped packet was never enqueued, leading to a possible
> > NULL dereference (e.g., when QFQ is the parent qdisc).
> >
> > This happens when cake_enqueue() returns NET_XMIT_CN: the parent
> > qdisc does not enqueue the skb, but cake_drop() still reduces backlog.
> >
> > This patch avoids the extra reduction by checking whether the packet
> > was actually enqueued. It also moves qdisc_tree_reduce_backlog()
> > out of cake_drop() to keep backlog accounting consistent.
> >
> > Fixes: 15de71d06a40 ("net/sched: Make cake_enqueue return NET_XMIT_CN when past buffer_limit")
> > Signed-off-by: Xiang Mei <xmei5@asu.edu>
> > ---
> > v2: add missing cc
> > v3: move qdisc_tree_reduce_backlog out of cake_drop
> >
> > net/sched/sch_cake.c | 40 ++++++++++++++++++++++++----------------
> > 1 file changed, 24 insertions(+), 16 deletions(-)
> >
> > diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
> > index 32bacfc314c2..179cafe05085 100644
> > --- a/net/sched/sch_cake.c
> > +++ b/net/sched/sch_cake.c
> > @@ -1597,7 +1597,6 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
> >
> > qdisc_drop_reason(skb, sch, to_free, SKB_DROP_REASON_QDISC_OVERLIMIT);
> > sch->q.qlen--;
> > - qdisc_tree_reduce_backlog(sch, 1, len);
> >
> > cake_heapify(q, 0);
> >
> > @@ -1750,7 +1749,9 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> > ktime_t now = ktime_get();
> > struct cake_tin_data *b;
> > struct cake_flow *flow;
> > - u32 idx, tin;
> > + u32 dropped = 0;
> > + u32 idx, tin, prev_qlen, prev_backlog, drop_id;
> > + bool same_flow = false;
> >
> > /* choose flow to insert into */
> > idx = cake_classify(sch, &b, skb, q->flow_mode, &ret);
> > @@ -1927,24 +1928,31 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> > if (q->buffer_used > q->buffer_max_used)
> > q->buffer_max_used = q->buffer_used;
> >
> > - if (q->buffer_used > q->buffer_limit) {
> > - bool same_flow = false;
> > - u32 dropped = 0;
> > - u32 drop_id;
> > + if (q->buffer_used <= q->buffer_limit)
> > + return NET_XMIT_SUCCESS;
>
> While this does reduce indentation, I don't think it's an overall
> improvement; the overflow condition is the exceptional case, so it's
> clearer to keep it inside the if statement (which also keeps the
> variable scope smaller).
>
> > - while (q->buffer_used > q->buffer_limit) {
> > - dropped++;
> > - drop_id = cake_drop(sch, to_free);
> > + prev_qlen = sch->q.qlen;
> > + prev_backlog = sch->qstats.backlog;
> >
> > - if ((drop_id >> 16) == tin &&
> > - (drop_id & 0xFFFF) == idx)
> > - same_flow = true;
> > - }
> > - b->drop_overlimit += dropped;
> > + while (q->buffer_used > q->buffer_limit) {
> > + dropped++;
> > + drop_id = cake_drop(sch, to_free);
> > + if ((drop_id >> 16) == tin &&
> > + (drop_id & 0xFFFF) == idx)
> > + same_flow = true;
> > + }
> > + b->drop_overlimit += dropped;
>
> The 'dropped' variable is wholly redundant after this change, so let's
> just get rid of it and use the prev_qlen for this statistic instead.
>
> -Toke
Thanks for the reminder, and sorry for the delayed reply.
You are right that we should remove the unused variable.
Xiang
* [Cake] Re: [PATCH net v3] net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop
2025-11-13 13:35 ` Toke Høiland-Jørgensen
@ 2025-11-15 9:23 ` Xiang Mei
2025-11-17 13:23 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 8+ messages in thread
From: Xiang Mei @ 2025-11-15 9:23 UTC (permalink / raw)
To: Toke Høiland-Jørgensen; +Cc: security, netdev, cake, bestswngs
On Thu, Nov 13, 2025 at 02:35:18PM +0100, Toke Høiland-Jørgensen wrote:
> Xiang Mei <xmei5@asu.edu> writes:
>
> > There is still one problem I am not sure about, since I am not very
> > experienced with cake and GSO. It concerns the GSO branch [1]. slen is
> > the length added to the cake sch, and that branch uses
> > `qdisc_tree_reduce_backlog(sch, 1-numsegs, len-slen);` to inform the
> > parent sched. However, when we drop the packet, it could be
> > problematic, since we should reduce slen instead of len. Is this a
> > potential problem?
>
> Hmm, no I think it's fine? The qdisc_tree_reduce_backlog(sch, 1-numsegs,
> len-slen) *increases* the backlog with the difference between the
> original length and the number of new segments. And then we *decrease*
> the backlog with the number of bytes we dropped.
>
> The compensation we're doing is for the backlog update of the parent,
> which is still using the original packet length regardless of any
> splitting, so that doesn't change the compensation value.
>
> -Toke
I still think the current method of reducing the backlog may be problematic.
What you said holds for the GSO branch when cake_enqueue returns
NET_XMIT_SUCCESS, but it may lead to issues when it returns NET_XMIT_CN.
For the normal case where no dropping happens, the current implementation
is correct. We can see how qlen and backlog change as follows:
backlog:
-(len - slen) Reason: qdisc_tree_reduce_backlog(sch, 1 - numsegs, len - slen);
+len Reason: parent enqueue
Total: slen
qlen:
-(1 - numsegs) Reason: qdisc_tree_reduce_backlog(sch, 1 - numsegs, len - slen);
+1 Reason: parent enqueue
Total: numsegs
This makes sense because we split one packet into numsegs packets of total
length slen and enqueue them all. When a drop happens, we must fix both
qlen and backlog.
In the unpatched code, cake_drop() calls qdisc_tree_reduce_backlog() for
dropped packets. This works in most cases but ignores the scenario where
we drop (parts of) the incoming packet, meaning the expected:
```
backlog += len
qlen += 1
```
will not run because the parent scheduler stops enqueueing after seeing
NET_XMIT_CN. For normal packets (non-GSO), it's easy to fix: just do
qdisc_tree_reduce_backlog(sch, 1, len). However, GSO splitting makes this
difficult because we may have already added multiple segments into the
flow, and we don’t know how many of them were dequeued.
The number of dequeued segments can be anywhere in [0, numsegs], and the
dequeued length in [0, slen]. We cannot know the exact number without
checking the tin/flow index of each dropped packet. Therefore, we should
check inside the loop (as v1 did):
```
cake_drop(...)
{
...
if (likely(current_flow != idx + (tin << 16)))
qdisc_tree_reduce_backlog(sch, 1, len);
...
}
```
This solution also has a problem, as you mentioned:
if the flow already contains packets, dropping those packets should
trigger backlog reduction, but our check would incorrectly skip that. One
possible solution is to track the number of packets/segments enqueued
in the current cake_enqueue (numsegs or 1), and then avoid calling
`qdisc_tree_reduce_backlog(sch, 1, len)` for the 1 or numsegs dropped
packets. If that makes sense, I'll make the patch and test it.
-----
Besides, I have a question about the condition for returning NET_XMIT_CN.
Do we return NET_XMIT_CN when:
The incoming packet itself is dropped? (makes more sense to me)
or
The same flow dequeued once? (This is the current logic)
If we keep the current logic, the above patch approach works. If not, we
need additional checks because we append the incoming packet to the tail
but drops occur at the head.
Thanks,
Xiang
* [Cake] Re: [PATCH net v3] net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop
2025-11-15 9:23 ` Xiang Mei
@ 2025-11-17 13:23 ` Toke Høiland-Jørgensen
2025-11-21 20:33 ` Xiang Mei
0 siblings, 1 reply; 8+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-11-17 13:23 UTC (permalink / raw)
To: Xiang Mei; +Cc: security, netdev, cake, bestswngs
> will not run because the parent scheduler stops enqueueing after seeing
> NET_XMIT_CN. For normal packets (non-GSO), it's easy to fix: just do
> qdisc_tree_reduce_backlog(sch, 1, len). However, GSO splitting makes this
> difficult because we may have already added multiple segments into the
> flow, and we don’t know how many of them were dequeued.
Huh, dequeued? This is all running under the qdisc lock, nothing gets
dequeued in the meantime.
Besides, the ACK thinning is irrelevant to the drop compensation. Here's
an example:
Without ACK splitting - we enqueue 1 packet of 100 bytes, then drop 1
packet of 200 bytes, so we should end up with the same qlen, but 100
fewer bytes in the queue:
start: parent qlen = X, parent backlog = Y
len = 100;
cake_drop() drops 1 pkt / 200 bytes
if (same_flow) {
qdisc_reduce_backlog(0, 100) // parent qlen == X, parent backlog == Y - 100
return NET_XMIT_CN;
// no change in parent, so parent qlen == X, parent backlog == Y - 100
} else {
qdisc_reduce_backlog(1, 200); // parent qlen == X - 1, backlog == Y - 200
return NET_XMIT_SUCCESS;
// parent does qlen +=1, backlog += 100, so parent qlen == x, parent backlog == Y - 100
}
With ACK splitting - we enqueue 10 segments totalling 110 bytes, then
drop 1 packet of 200 bytes, so we should end up with 9 packets more in
the queue, but 90 bytes less:
start: parent qlen = X, parent backlog = Y
len = 100;
/* split ack: slen == 110, numsegs == 10 */
qdisc_tree_reduce_backlog(-9, -10); // parent qlen == X + 9, backlog == Y + 10
cake_drop() drops 1 pkt / 200 bytes
if (same_flow) {
qdisc_reduce_backlog(0, 100) // parent qlen == X + 9, backlog == Y - 90
return NET_XMIT_CN;
// no change in parent, so parent qlen == X + 9, backlog == Y - 90
} else {
qdisc_reduce_backlog(1, 200); // parent qlen == X + 8, backlog == Y - 190
return NET_XMIT_SUCCESS;
// parent does qlen +=1, backlog += 100, so parent qlen == X + 9, backlog == Y - 90
}
In both cases, what happens is that we drop one or more packets, reduce
the backlog by the number of packets/bytes dropped *while compensating
for what the parent qdisc does on return*. So the adjustments made by
the segmentation makes no difference to the final outcome.
However, we do have one problem with the ACK thinning code: in the 'if
(ack)' branch, we currently adjust 'len' if we drop an ACK. Meaning that
if we use that value later to compensate for what the parent qdisc does,
the value will no longer agree with the parent's own accounting. So we'll
have to introduce a new variable for the length used in the ACK thinning
calculation.
> The number of dequeued segments can be anywhere in [0, numsegs], and the
> dequeued length in [0, slen]. We cannot know the exact number without
> checking the tin/flow index of each dropped packet. Therefore, we should
> check inside the loop (as v1 did):
>
> ```
> cake_drop(...)
> {
> ...
> if (likely(current_flow != idx + (tin << 16)))
> qdisc_tree_reduce_backlog(sch, 1, len);
> ...
> }
> ```
No, this is not needed - the calculation involving prev_qlen and
prev_backlog will correctly give us the total number of packets/bytes
dropped.
>
> This solution also has a problem, as you mentioned:
> if the flow already contains packets, dropping those packets should
> trigger backlog reduction, but our check would incorrectly skip that. One
> possible solution is to track the number of packets/segments enqueued
> in the current cake_enqueue (numsegs or 1), and then avoid calling
> `qdisc_tree_reduce_backlog(sch, 1, len)` for the 1 or numsegs dropped
> packets. If that makes sense, I'll make the patch and test it.
It does not - see above.
> -----
>
> Besides, I have a question about the condition for returning NET_XMIT_CN.
> Do we return NET_XMIT_CN when:
>
> The incoming packet itself is dropped? (makes more sense to me)
> or
> The same flow dequeued once? (This is the current logic)
The same flow. The current logic is correct.
-Toke
* [Cake] Re: [PATCH net v3] net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop
2025-11-17 13:23 ` Toke Høiland-Jørgensen
@ 2025-11-21 20:33 ` Xiang Mei
0 siblings, 0 replies; 8+ messages in thread
From: Xiang Mei @ 2025-11-21 20:33 UTC (permalink / raw)
To: Toke Høiland-Jørgensen; +Cc: security, netdev, cake, bestswngs
On Mon, Nov 17, 2025 at 02:23:45PM +0100, Toke Høiland-Jørgensen wrote:
> > will not run because the parent scheduler stops enqueueing after seeing
> > NET_XMIT_CN. For normal packets (non-GSO), it's easy to fix: just do
> > qdisc_tree_reduce_backlog(sch, 1, len). However, GSO splitting makes this
> > difficult because we may have already added multiple segments into the
> > flow, and we don’t know how many of them were dequeued.
>
> Huh, dequeued? This is all running under the qdisc lock, nothing gets
> dequeued in the meantime.
Sorry for using the wrong word; "dequeued" should be "dropped".
>
> Besides, the ACK thinning is irrelevant to the drop compensation. Here's
> an example:
>
> Without ACK splitting - we enqueue 1 packet of 100 bytes, then drop 1
> packet of 200 bytes, so we should end up with the same qlen, but 100
> fewer bytes in the queue:
>
> start: parent qlen = X, parent backlog = Y
>
> len = 100;
> cake_drop() drops 1 pkt / 200 bytes
>
> if (same_flow) {
> qdisc_reduce_backlog(0, 100) // parent qlen == X, parent backlog == Y - 100
> return NET_XMIT_CN;
> // no change in parent, so parent qlen == X, parent backlog == Y - 100
> } else {
> qdisc_reduce_backlog(1, 200); // parent qlen == X - 1, backlog == Y - 200
> return NET_XMIT_SUCCESS;
> // parent does qlen +=1, backlog += 100, so parent qlen == x, parent backlog == Y - 100
> }
>
> With ACK splitting - we enqueue 10 segments totalling 110 bytes, then
> drop 1 packet of 200 bytes, so we should end up with 9 packets more in
> the queue, but 90 bytes less:
>
> start: parent qlen = X, parent backlog = Y
>
> len = 100;
> /* split ack: slen == 110, numsegs == 10 */
> qdisc_tree_reduce_backlog(-9, -10); // parent qlen == X + 9, backlog == Y + 10
>
> cake_drop() drops 1 pkt / 200 bytes
>
> if (same_flow) {
> qdisc_reduce_backlog(0, 100) // parent qlen == X + 9, backlog == Y - 90
> return NET_XMIT_CN;
> // no change in parent, so parent qlen == X + 9, backlog == Y - 90
>
> } else {
> qdisc_reduce_backlog(1, 200); // parent qlen == X + 8, backlog == Y - 190
> return NET_XMIT_SUCCESS;
> // parent does qlen +=1, backlog += 100, so parent qlen == X + 9, backlog == Y - 90
> }
>
>
> In both cases, what happens is that we drop one or more packets, reduce
> the backlog by the number of packets/bytes dropped *while compensating
> for what the parent qdisc does on return*. So the adjustments made by
> the segmentation makes no difference to the final outcome.
Thanks for the detailed explanations. You are right: the current patch
logic handles these cases correctly.
>
> However, we do have one problem with the ACK thinning code: in the 'if
> (ack)' branch, we currently adjust 'len' if we drop an ACK. Meaning that
> if we use that value later to adjust for what the parent qdisc, the
> value will no longer agree with what the parent does. So we'll have to
> introduce a new variable for the length used in the ACK thinning
> calculation.
>
I see the issue. It will be resolved in the new patch.
> > The number of dequeued segments can be anywhere in [0, numsegs], and the
> > dequeued length in [0, slen]. We cannot know the exact number without
> > checking the tin/flow index of each dropped packet. Therefore, we should
> > check inside the loop (as v1 did):
> >
> > ```
> > cake_drop(...)
> > {
> > ...
> > if (likely(current_flow != idx + (tin << 16)))
> > qdisc_tree_reduce_backlog(sch, 1, len);
> > ...
> > }
> > ```
>
> No, this is not needed - the calculation involving prev_qlen and
> prev_backlog will correctly give us the total number of packets/bytes
> dropped.
> >
> > This solution also has a problem, as you mentioned:
> > if the flow already contains packets, dropping those packets should
> > trigger backlog reduction, but our check would incorrectly skip that. One
> > possible solution is to track the number of packets/segments enqueued
> > in the current cake_enqueue (numsegs or 1), and then avoid calling
> > `qdisc_tree_reduce_backlog(sch, 1, len)` for the 1 or numsegs dropped
> > packets. If that makes sense, I'll make the patch and test it.
>
> It does not - see above.
>
> > -----
> >
> > Besides, I have a question about the condition for returning NET_XMIT_CN.
> > Do we return NET_XMIT_CN when:
> >
> > The incoming packet itself is dropped? (makes more sense to me)
> > or
> > The same flow dequeued once? (This is the current logic)
>
> The same flow. The current logic it correct.
>
> -Toke
Thanks for the explanations; I had misunderstood NET_XMIT_CN. A patch
(removing the unused 'dropped' variable and handling the ACK branch
correctly) will be tested and sent.
Thanks,
Xiang
end of thread [~2025-11-21 20:33 UTC]
Thread overview: 8+ messages
2025-11-13 3:53 [Cake] [PATCH net v3] net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop Xiang Mei
2025-11-13 4:05 ` [Cake] " Xiang Mei
2025-11-13 13:35 ` Toke Høiland-Jørgensen
2025-11-15 9:23 ` Xiang Mei
2025-11-17 13:23 ` Toke Høiland-Jørgensen
2025-11-21 20:33 ` Xiang Mei
2025-11-13 13:21 ` Toke Høiland-Jørgensen
2025-11-15 9:03 ` Xiang Mei