[Cake] cake infinite loop(?) with hfsc on one-armed router

Toke Høiland-Jørgensen toke at toke.dk
Fri Jan 4 17:34:23 EST 2019


Pete Heist <pete at heistp.net> writes:

>> On Jan 4, 2019, at 10:34 PM, Toke Høiland-Jørgensen <toke at toke.dk> wrote:
>> 
>> Pete Heist <pete at heistp.net> writes:
>> 
>>> Ok, the lockup goes away if you use no-split-gso on the cake qdiscs for the default traffic (noted below in the drr and hfsc cases with "!!! must use no-split-gso here !!!"). Only I’d like my 600 μs back. :)
>>> 
>>> This smells of a bug Toke fixed on Sep 12, 2018 in 42e87f12ea5c390bf5eeb658c942bc810046160a, but then reverted in the next commit because it was fixed upstream. However, if I re-apply that commit, it still doesn’t fix it.
>>> 
>>> Perhaps there are more cases where skb_reset_mac_len(skb) needs to be called somewhere for VLAN support?
>>> 
>>> I managed to capture some output from what happens to hfsc:
>>> 
>>> [  683.864456] ------------[ cut here ]------------
>>> [  683.869116] WARNING: CPU: 1 PID: 11 at net/sched/sch_hfsc.c:1427
>>> 0xf9ced4ef()
>> 
>> So this seems to be this line:
>> 
>> WARN_ON(next_time == 0);
>> 
>> See https://elixir.bootlin.com/linux/v3.16.7/source/net/sched/sch_hfsc.c#L1427
>> 
>> Which seems to indicate that HFSC can't find the next class to schedule.
>> Not entirely sure why, nor why this only happens with CAKE as a qdisc.
>> But I don't think it's actually an infinite loop that's causing it...
>
>
> Ok, fwiw one doesn’t actually need a one-armed router or VLANs to reproduce this. Just do this:
>
> tc qdisc add dev $IFACE root handle 1: hfsc default 1
> tc class add dev $IFACE parent 1: classid 1:1 hfsc ls rate $RATE ul rate $RATE
> tc qdisc add dev $IFACE parent 1:1 cake # add split-gso here, or else…
>
> I’ve tried it as far as 4.9.0-8, but no farther. It’s not much of a
> priority for me now that I have a workaround for it...

Ah, I think I know what's going on:

On enqueue, HFSC will increase its own internal notion of qlen (q.qlen++
in hfsc_enqueue()), and in dequeue, it will return immediately if this
qlen is 0. Now, with GSO packet splitting, a single packet on enqueue
can turn in to several packets on dequeue, which means that HFSC will
think the queue is empty after dequeueing the first on, and refuse to
dequeue any more packets.

This basically means that we can't use CAKE as a leaf qdisc with GSO
splitting as it stands currently. I *think* the solution is for CAKE to
notify its parents; could you try the patch below and see if it helps?

-Toke

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index b910cd5c56f7..77b0ebd673ac 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1617,6 +1617,30 @@ static u32 cake_classify(struct Qdisc *sch, struct cake_tin_data **t,
 
 static void cake_reconfigure(struct Qdisc *sch);
 
+void adjust_parent_qlen(struct Qdisc *sch, unsigned int n,
+			       unsigned int len)
+{
+	u32 parentid;
+	if (n == 0 && len == 0)
+		return;
+	rcu_read_lock();
+	while ((parentid = sch->parent)) {
+		if (TC_H_MAJ(parentid) == TC_H_MAJ(TC_H_INGRESS))
+			break;
+
+		if (sch->flags & TCQ_F_NOPARENT)
+			break;
+		sch = qdisc_lookup(qdisc_dev(sch), TC_H_MAJ(parentid));
+		if (sch == NULL) {
+			WARN_ON_ONCE(parentid != TC_H_ROOT);
+			break;
+		}
+		sch->q.qlen += n;
+		sch->qstats.backlog += len;
+	}
+	rcu_read_unlock();
+}
+
 static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 			struct sk_buff **to_free)
 {
@@ -1667,7 +1691,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	if (skb_is_gso(skb) && q->rate_flags & CAKE_FLAG_SPLIT_GSO) {
 		struct sk_buff *segs, *nskb;
 		netdev_features_t features = netif_skb_features(skb);
-		unsigned int slen = 0;
+		unsigned int slen = 0, numsegs = 0;
 
 		segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
 		if (IS_ERR_OR_NULL(segs))
@@ -1684,6 +1708,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 
 			sch->q.qlen++;
 			slen += segs->len;
+			numsegs++;
 			q->buffer_used += segs->truesize;
 			b->packets++;
 			segs = nskb;
@@ -1696,7 +1721,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		sch->qstats.backlog += slen;
 		q->avg_window_bytes += slen;
 
-		qdisc_tree_reduce_backlog(sch, 1, len);
+		adjust_parent_qlen(sch, numsegs - 1, slen - len);
 		consume_skb(skb);
 	} else {
 		/* not splitting */


More information about the Cake mailing list