* [Codel] [PATCH 2/2] Clamp interval to 32 bits
@ 2012-05-05 11:34 Dave Täht
2012-05-05 11:40 ` Dave Taht
0 siblings, 1 reply; 34+ messages in thread
From: Dave Täht @ 2012-05-05 11:34 UTC (permalink / raw)
To: codel; +Cc: Dave Täht
---
net/sched/sch_codel.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/sched/sch_codel.c b/net/sched/sch_codel.c
index 636f505..d26db8c 100644
--- a/net/sched/sch_codel.c
+++ b/net/sched/sch_codel.c
@@ -88,10 +88,10 @@ static unsigned int states;
* return interval/sqrt(x) with good precision
*/
-static u32 calc(u64 interval, unsigned long x)
+static u32 calc(u32 _interval, unsigned long x)
{
- /* scale for 16 bits precision */
- while (x < (1UL << 30)) {
+ u64 interval = _interval;
+ while (x < (1UL << (BITS_PER_LONG - 2))) {
x <<= 2;
interval <<= 1;
}
--
1.7.9.5
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Codel] [PATCH 2/2] Clamp interval to 32 bits
2012-05-05 11:34 [Codel] [PATCH 2/2] Clamp interval to 32 bits Dave Täht
@ 2012-05-05 11:40 ` Dave Taht
2012-05-05 11:53 ` Eric Dumazet
From: Dave Taht @ 2012-05-05 11:40 UTC (permalink / raw)
To: Dave Täht; +Cc: codel
On Sat, May 5, 2012 at 4:34 AM, Dave Täht <dave.taht@bufferbloat.net> wrote:
> ---
> net/sched/sch_codel.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/net/sched/sch_codel.c b/net/sched/sch_codel.c
> index 636f505..d26db8c 100644
> --- a/net/sched/sch_codel.c
> +++ b/net/sched/sch_codel.c
> @@ -88,10 +88,10 @@ static unsigned int states;
> * return interval/sqrt(x) with good precision
> */
>
> -static u32 calc(u64 interval, unsigned long x)
> +static u32 calc(u32 _interval, unsigned long x)
> {
> - /* scale for 16 bits precision */
> - while (x < (1UL << 30)) {
> + u64 interval = _interval;
> + while (x < (1UL << (BITS_PER_LONG - 2))) {
> x <<= 2;
> interval <<= 1;
> }
> --
> 1.7.9.5
>
Eric correctly pointed out that I'm crazy.
If I run either version of this with htb, it crashes after a brief load.
qdisc del dev eth0 root
qdisc add dev eth0 root handle 1: est 1sec 8sec htb default 1
class add dev eth0 parent 1: classid 1:1 est 1sec 8sec htb rate 2000kibit mtu 1500 quantum 1514
qdisc add dev eth0 parent 1:1 handle 10: est 1sec 4sec codel target 5ms interval 100ms limit 1000
I figure that's due to my still-botched netlink interface to the change routine.
I read some relevant code and the netlink doc, and it's becoming
clearer, but I'm going to bed.
Handing off now.
--
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://www.bufferbloat.net
* Re: [Codel] [PATCH 2/2] Clamp interval to 32 bits
2012-05-05 11:40 ` Dave Taht
@ 2012-05-05 11:53 ` Eric Dumazet
2012-05-05 14:49 ` [Codel] [PATCH v5] pkt_sched: codel: Controlled Delay AQM Eric Dumazet
From: Eric Dumazet @ 2012-05-05 11:53 UTC (permalink / raw)
To: Dave Taht; +Cc: codel, Dave Täht
On Sat, 2012-05-05 at 04:40 -0700, Dave Taht wrote:
> If I run either version of this with htb, it crashes after a brief load.
>
> qdisc del dev eth0 root
> qdisc add dev eth0 root handle 1: est 1sec 8sec htb default 1
> class add dev eth0 parent 1: classid 1:1 est 1sec 8sec htb rate 2000kibit mtu 1500 quantum 1514
> qdisc add dev eth0 parent 1:1 handle 10: est 1sec 4sec codel target 5ms interval 100ms limit 1000
>
> I figure that's due to my still-botched netlink interface to the change routine.
>
> I read some relevant code and the netlink doc and it's becoming more
> clear but I'm going to bed.
> handing off now.
Don't worry, I'll take care of this.
Have a good night!
* [Codel] [PATCH v5] pkt_sched: codel: Controlled Delay AQM
2012-05-05 11:53 ` Eric Dumazet
@ 2012-05-05 14:49 ` Eric Dumazet
2012-05-05 16:11 ` Dave Taht
2012-05-05 20:20 ` [Codel] [PATCH v5] pkt_sched: " Eric Dumazet
From: Eric Dumazet @ 2012-05-05 14:49 UTC (permalink / raw)
To: Dave Taht; +Cc: codel, Dave Täht
From: Dave Taht <dave.taht@gmail.com>
A nice changelog should go here, describing how nice CoDel is, with
pointers to documentation and all credits.
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/pkt_sched.h | 13 +
net/sched/Kconfig | 11
net/sched/Makefile | 1
net/sched/sch_codel.c | 425 ++++++++++++++++++++++++++++++++++++
4 files changed, 450 insertions(+)
diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index ffe975c..62a73bf 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -655,4 +655,17 @@ struct tc_qfq_stats {
__u32 lmax;
};
+/* CODEL */
+
+enum {
+ TCA_CODEL_UNSPEC,
+ TCA_CODEL_TARGET,
+ TCA_CODEL_LIMIT,
+ TCA_CODEL_MINBYTES,
+ TCA_CODEL_INTERVAL,
+ __TCA_CODEL_MAX
+};
+
+#define TCA_CODEL_MAX (__TCA_CODEL_MAX - 1)
+
#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 75b58f8..fadd252 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -250,6 +250,17 @@ config NET_SCH_QFQ
If unsure, say N.
+config NET_SCH_CODEL
+ tristate "Controlled Delay AQM (CODEL)"
+ help
+ Say Y here if you want to use the Controlled Delay (CODEL)
+ packet scheduling algorithm.
+
+ To compile this driver as a module, choose M here: the module
+ will be called sch_codel.
+
+ If unsure, say N.
+
config NET_SCH_INGRESS
tristate "Ingress Qdisc"
depends on NET_CLS_ACT
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 8cdf4e2..30fab03 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -37,6 +37,7 @@ obj-$(CONFIG_NET_SCH_PLUG) += sch_plug.o
obj-$(CONFIG_NET_SCH_MQPRIO) += sch_mqprio.o
obj-$(CONFIG_NET_SCH_CHOKE) += sch_choke.o
obj-$(CONFIG_NET_SCH_QFQ) += sch_qfq.o
+obj-$(CONFIG_NET_SCH_CODEL) += sch_codel.o
obj-$(CONFIG_NET_CLS_U32) += cls_u32.o
obj-$(CONFIG_NET_CLS_ROUTE4) += cls_route.o
diff --git a/net/sched/sch_codel.c b/net/sched/sch_codel.c
new file mode 100644
index 0000000..a19177f
--- /dev/null
+++ b/net/sched/sch_codel.c
@@ -0,0 +1,425 @@
+/*
+ * net/sched/sch_codel.c A Codel implementation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Codel, the COntrolled DELay Queueing discipline
+ * Based on ns2 simulation code presented by Kathie Nichols
+ *
+ * Authors: Dave Täht <d@taht.net>
+ * Eric Dumazet <edumazet@google.com>
+ */
+
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/ktime.h>
+#include <linux/skbuff.h>
+#include <net/pkt_sched.h>
+
+#define MS2TIME(a) (ns_to_ktime((u64)(a) * NSEC_PER_MSEC))
+#define DEFAULT_CODEL_LIMIT 1000
+#define PRECALC_MAX 64
+
+/*
+ * Via patch found at:
+ * http://lkml.indiana.edu/hypermail/linux/kernel/0802.0/0659.html
+ * I don't know why this isn't in ktime.h as it seemed sane...
+ */
+
+/*
+ * ktime_compare - Compares two ktime_t variables
+ *
+ * Return val:
+ * lhs < rhs: < 0
+ * lhs == rhs: 0
+ * lhs > rhs: > 0
+ */
+
+#if (BITS_PER_LONG == 64) || defined(CONFIG_KTIME_SCALAR)
+static inline int ktime_compare(const ktime_t lhs, const ktime_t rhs)
+{
+ if (lhs.tv64 < rhs.tv64)
+ return -1;
+ if (lhs.tv64 > rhs.tv64)
+ return 1;
+ return 0;
+}
+#else
+static inline int ktime_compare(const ktime_t lhs, const ktime_t rhs)
+{
+ if (lhs.tv.sec < rhs.tv.sec)
+ return -1;
+ if (lhs.tv.sec > rhs.tv.sec)
+ return 1;
+ return lhs.tv.nsec - rhs.tv.nsec;
+}
+#endif
+
+/* Per-queue state (codel_queue_t instance variables) */
+
+struct codel_sched_data {
+ u32 flags;
+ u32 minbytes;
+ u32 count; /* packets dropped since we went into drop state */
+ u32 drop_count;
+ bool dropping;
+ ktime_t target;
+ /* time to declare above q->target (0 if below)*/
+ ktime_t first_above_time;
+ ktime_t drop_next; /* time to drop next packet */
+ ktime_t interval16;
+ u32 interval;
+ u32 q_intervals[PRECALC_MAX];
+};
+
+struct codel_skb_cb {
+ ktime_t enqueue_time;
+};
+
+static unsigned int state1;
+static unsigned int state2;
+static unsigned int state3;
+static unsigned int states;
+
+/*
+ * return interval/sqrt(x) with good precision
+ */
+static u32 calc(u32 _interval, unsigned long x)
+{
+ u64 interval = _interval;
+
+ /* scale operands for max precision */
+ while (x < (1UL << (BITS_PER_LONG - 2))) {
+ x <<= 2;
+ interval <<= 1;
+ }
+ do_div(interval, int_sqrt(x));
+ return (u32)interval;
+}
+
+static void codel_fill_cache(struct codel_sched_data *q)
+{
+ int i;
+
+ q->q_intervals[0] = q->interval;
+ for (i = 2; i <= PRECALC_MAX; i++)
+ q->q_intervals[i - 1] = calc(q->interval, i);
+}
+
+static struct codel_skb_cb *get_codel_cb(const struct sk_buff *skb)
+{
+ qdisc_cb_private_validate(skb, sizeof(struct codel_skb_cb));
+ return (struct codel_skb_cb *)qdisc_skb_cb(skb)->data;
+}
+
+static ktime_t get_enqueue_time(const struct sk_buff *skb)
+{
+ return get_codel_cb(skb)->enqueue_time;
+}
+
+static void set_enqueue_time(struct sk_buff *skb)
+{
+ get_codel_cb(skb)->enqueue_time = ktime_get();
+}
+
+/*
+ * The original control_law required floating point.
+ *
+ * return ktime_add_ns(t, q->interval / sqrt(q->count));
+ *
+ */
+static ktime_t control_law(const struct codel_sched_data *q, ktime_t t)
+{
+ u32 inter;
+
+ if (q->count > PRECALC_MAX)
+ inter = calc(q->interval, q->count);
+ else
+ inter = q->q_intervals[q->count - 1];
+ return ktime_add_ns(t, inter);
+}
+
+static bool should_drop(struct sk_buff *skb, struct Qdisc *sch, ktime_t now)
+{
+ struct codel_sched_data *q = qdisc_priv(sch);
+ ktime_t sojourn_time;
+ bool drop;
+
+ if (!skb) {
+ q->first_above_time.tv64 = 0;
+ return false;
+ }
+ sojourn_time = ktime_sub(now, get_enqueue_time(skb));
+
+ if (ktime_compare(sojourn_time, q->target) < 0 ||
+ sch->qstats.backlog < q->minbytes) {
+ /* went below so we'll stay below for at least q->interval */
+ q->first_above_time.tv64 = 0;
+ return false;
+ }
+ drop = false;
+ if (q->first_above_time.tv64 == 0) {
+ /* just went above from below. If we stay above
+ * for at least q->interval we'll say it's ok to drop
+ */
+ q->first_above_time = ktime_add_ns(now, q->interval);
+ } else if (ktime_compare(now, q->first_above_time) >= 0) {
+ drop = true;
+ state1++;
+ }
+ return drop;
+}
+
+static void codel_drop(struct Qdisc *sch, struct sk_buff *skb)
+{
+ struct codel_sched_data *q = qdisc_priv(sch);
+
+ sch->qstats.backlog -= qdisc_pkt_len(skb);
+ qdisc_drop(skb, sch);
+ q->drop_count++;
+}
+
+static struct sk_buff *codel_dequeue(struct Qdisc *sch)
+{
+ struct codel_sched_data *q = qdisc_priv(sch);
+ struct sk_buff *skb = __skb_dequeue(&sch->q);
+ ktime_t now;
+ bool drop;
+
+ if (!skb) {
+ q->dropping = false;
+ return skb;
+ }
+ now = ktime_get();
+ drop = should_drop(skb, sch, now);
+ if (q->dropping) {
+ if (!drop) {
+ /* sojourn time below target - leave dropping state */
+ q->dropping = false;
+ } else if (ktime_compare(now, q->drop_next) >= 0) {
+ state2++;
+ /* It's time for the next drop. Drop the current
+ * packet and dequeue the next. The dequeue might
+ * take us out of dropping state.
+ * If not, schedule the next drop.
+ * A large backlog might result in drop rates so high
+ * that the next drop should happen now,
+ * hence the while loop.
+ */
+ while (q->dropping &&
+ (ktime_compare(now, q->drop_next) >= 0)) {
+ codel_drop(sch, skb);
+ q->count++;
+ skb = __skb_dequeue(&sch->q);
+ if (!should_drop(skb, sch, now)) {
+ /* leave dropping state */
+ q->dropping = false;
+ } else {
+ /* and schedule the next drop */
+ q->drop_next =
+ control_law(q, q->drop_next);
+ }
+ }
+ }
+ } else if (drop &&
+ ((ktime_compare(ktime_sub(now, q->drop_next),
+ q->interval16) < 0) ||
+ (ktime_compare(ktime_sub(now, q->first_above_time),
+ ns_to_ktime(2 * q->interval)) >= 0 ))) {
+ codel_drop(sch, skb);
+ skb = __skb_dequeue(&sch->q);
+ drop = should_drop(skb, sch, now);
+ q->dropping = true;
+ state3++;
+ /*
+ * if min went above target close to when we last went below it
+ * assume that the drop rate that controlled the queue on the
+ * last cycle is a good starting point to control it now.
+ */
+ if (ktime_compare(ktime_sub(now, q->drop_next),
+ q->interval16) < 0) {
+ q->count = q->count > 1 ? q->count - 1 : 1;
+ } else {
+ q->count = 1;
+ }
+ q->drop_next = control_law(q, now);
+ }
+ if ((states++ % 64) == 0) {
+ pr_debug("s1: %u, s2: %u, s3: %u\n",
+ state1, state2, state3);
+ }
+ /* We can't call qdisc_tree_decrease_qlen() if our qlen is 0,
+ * or HTB crashes
+ */
+ if (q->drop_count && sch->q.qlen) {
+ qdisc_tree_decrease_qlen(sch, q->drop_count);
+ q->drop_count = 0;
+ }
+ if (skb) {
+ sch->qstats.backlog -= qdisc_pkt_len(skb);
+ qdisc_bstats_update(sch, skb);
+ }
+ return skb;
+}
+
+static int codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
+{
+ if (likely(skb_queue_len(&sch->q) < sch->limit)) {
+ set_enqueue_time(skb);
+ return qdisc_enqueue_tail(skb, sch);
+ }
+ return qdisc_drop(skb, sch);
+}
+
+static const struct nla_policy codel_policy[TCA_CODEL_MAX + 1] = {
+ [TCA_CODEL_TARGET] = { .type = NLA_U32 },
+ [TCA_CODEL_LIMIT] = { .type = NLA_U32 },
+ [TCA_CODEL_MINBYTES] = { .type = NLA_U32 },
+ [TCA_CODEL_INTERVAL] = { .type = NLA_U32 },
+};
+
+static int codel_change(struct Qdisc *sch, struct nlattr *opt)
+{
+ struct codel_sched_data *q = qdisc_priv(sch);
+ struct nlattr *tb[TCA_CODEL_MAX + 1];
+ unsigned int qlen;
+ int err;
+
+ if (opt == NULL)
+ return -EINVAL;
+
+ err = nla_parse_nested(tb, TCA_CODEL_MAX, opt, codel_policy);
+ if (err < 0)
+ return err;
+
+ sch_tree_lock(sch);
+ if (tb[TCA_CODEL_TARGET]) {
+ u32 target = nla_get_u32(tb[TCA_CODEL_TARGET]);
+
+ q->target = ns_to_ktime((u64) target * NSEC_PER_USEC);
+ }
+ if (tb[TCA_CODEL_INTERVAL]) {
+ u32 interval = nla_get_u32(tb[TCA_CODEL_INTERVAL]);
+
+ interval = min_t(u32, ~0U / NSEC_PER_USEC, interval);
+
+ q->interval = interval * NSEC_PER_USEC;
+ q->interval16 = ns_to_ktime(16 * (u64)q->interval);
+ codel_fill_cache(q);
+ }
+ if (tb[TCA_CODEL_LIMIT])
+ sch->limit = nla_get_u32(tb[TCA_CODEL_LIMIT]);
+
+ if (tb[TCA_CODEL_MINBYTES])
+ q->minbytes = nla_get_u32(tb[TCA_CODEL_MINBYTES]);
+
+ qlen = sch->q.qlen;
+ while (sch->q.qlen > sch->limit) {
+ struct sk_buff *skb = __skb_dequeue(&sch->q);
+
+ sch->qstats.backlog -= qdisc_pkt_len(skb);
+ qdisc_drop(skb, sch);
+ }
+ qdisc_tree_decrease_qlen(sch, qlen - sch->q.qlen);
+
+ q->drop_next.tv64 = q->first_above_time.tv64 = 0;
+ q->dropping = false;
+ sch_tree_unlock(sch);
+ return 0;
+}
+
+static int codel_init(struct Qdisc *sch, struct nlattr *opt)
+{
+ struct codel_sched_data *q = qdisc_priv(sch);
+
+ q->target = MS2TIME(5);
+ /* It should be possible to run with no limit,
+ * with infinite memory :)
+ */
+ sch->limit = DEFAULT_CODEL_LIMIT;
+ q->minbytes = psched_mtu(qdisc_dev(sch));
+ q->interval = 100 * NSEC_PER_MSEC;
+ q->interval16 = ns_to_ktime(16 * (u64)q->interval);
+ q->drop_next.tv64 = q->first_above_time.tv64 = 0;
+ q->dropping = false; /* exit dropping state */
+ q->count = 1;
+ codel_fill_cache(q);
+ if (opt) {
+ int err = codel_change(sch, opt);
+
+ if (err)
+ return err;
+ }
+
+ if (sch->limit >= 1)
+ sch->flags |= TCQ_F_CAN_BYPASS;
+ else
+ sch->flags &= ~TCQ_F_CAN_BYPASS;
+
+ return 0;
+}
+
+static int codel_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+ struct codel_sched_data *q = qdisc_priv(sch);
+ struct nlattr *opts;
+ u32 target = ktime_to_us(q->target);
+
+ opts = nla_nest_start(skb, TCA_OPTIONS);
+ if (opts == NULL)
+ goto nla_put_failure;
+ if (nla_put_u32(skb, TCA_CODEL_TARGET, target) ||
+ nla_put_u32(skb, TCA_CODEL_LIMIT, sch->limit) ||
+ nla_put_u32(skb, TCA_CODEL_INTERVAL, q->interval / NSEC_PER_USEC) ||
+ nla_put_u32(skb, TCA_CODEL_MINBYTES, q->minbytes))
+ goto nla_put_failure;
+
+ return nla_nest_end(skb, opts);
+
+nla_put_failure:
+ nla_nest_cancel(skb, opts);
+ return -1;
+}
+
+static void codel_reset(struct Qdisc *sch)
+{
+ struct codel_sched_data *q = qdisc_priv(sch);
+
+ qdisc_reset_queue(sch);
+ sch->q.qlen = 0;
+ q->dropping = false;
+ q->count = 1;
+}
+
+static struct Qdisc_ops codel_qdisc_ops __read_mostly = {
+ .id = "codel",
+ .priv_size = sizeof(struct codel_sched_data),
+
+ .enqueue = codel_enqueue,
+ .dequeue = codel_dequeue,
+ .peek = qdisc_peek_dequeued,
+ .init = codel_init,
+ .reset = codel_reset,
+ .change = codel_change,
+ .dump = codel_dump,
+ .owner = THIS_MODULE,
+};
+
+static int __init codel_module_init(void)
+{
+ return register_qdisc(&codel_qdisc_ops);
+}
+static void __exit codel_module_exit(void)
+{
+ unregister_qdisc(&codel_qdisc_ops);
+}
+module_init(codel_module_init)
+module_exit(codel_module_exit)
+MODULE_LICENSE("GPL");
+
* Re: [Codel] [PATCH v5] pkt_sched: codel: Controlled Delay AQM
2012-05-05 14:49 ` [Codel] [PATCH v5] pkt_sched: codel: Controlled Delay AQM Eric Dumazet
@ 2012-05-05 16:11 ` Dave Taht
2012-05-05 17:07 ` Eric Dumazet
2012-05-05 20:20 ` [Codel] [PATCH v5] pkt_sched: " Eric Dumazet
1 sibling, 1 reply; 34+ messages in thread
From: Dave Taht @ 2012-05-05 16:11 UTC (permalink / raw)
To: Eric Dumazet; +Cc: codel, Dave Täht
Nice!
Nits:
0) I figure you already have an iproute2 patch that you can send?
I thought 5 hours ago I had almost, but not entirely, grokked netlink.
The way you just did it was not at all how I thought it worked. :/
But I will read.
1) I take it that if a limit is not specified or set here, sch->limit
comes from txqueuelen?
I do kind of like infinite queues (and angels dancing on the heads of pins).
2) I woke up with a mod that could do ECN. I'll do an RFC patch.
3) Tom's already on the list.
4) I'd like to play with this a lot (and have others do so too) before
it goes upstream, and gain Kathie and Van's blessing, etc. Couple of
weeks? (see 2). In particular I was hoping to see actual pings under
load match the target setting. I'll get this going on two boxes and
see what happens... play with BQL, HTB, etc.
5) I thought the * 16 could be efficiently implemented by the compiler,
and saves a memory access.
6) Unless 2) happens we can kill q->flags.
7) Thx.
On Sat, May 5, 2012 at 7:49 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Dave Taht <dave.taht@gmail.com>
>
> A nice changelog here, to tell how nice is CoDel, giving pointers to
> documentation and all credits.
I don't have a link to the web site... nor have I read the paper, yet. Monday.
> [rest of patch snipped]
--
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://www.bufferbloat.net
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Codel] [PATCH v5] pkt_sched: codel: Controlled Delay AQM
2012-05-05 16:11 ` Dave Taht
@ 2012-05-05 17:07 ` Eric Dumazet
2012-05-05 17:22 ` Dave Taht
0 siblings, 1 reply; 34+ messages in thread
From: Eric Dumazet @ 2012-05-05 17:07 UTC (permalink / raw)
To: Dave Taht; +Cc: codel, Dave Täht
On Sat, 2012-05-05 at 09:11 -0700, Dave Taht wrote:
> Nice!
>
> Nits:
>
> 0) I figure you already have an iproute2 patch that you can send?
> I thought 5 hours ago I had almost, but not entirely grokked netlink.
> The way you just did it was not at all how I thought it worked. :/
> but I will read.
I'll send it (I am currently with friends...), but you don't need it to
use codel with default params :
qdisc add dev $DEV parent 1:1 handle 10: est 1sec 4sec codel
> 1) I take it if a limit is not specified or set here, sch->limit comes
> from txqueuelen?
No, you set a default limit of 1000 in your patch
> I do kind of like infinite queues (and angels dancing on the heads of pins)
> 2) I woke up with a mod that could do ecn. I'll do an rfc patch
Not sure it can play, if all packets are ECN, you need to drop at some
point.
> .
> 3) Tom's already on the list
ok
> 4) I'd like to play with this a lot (and have others do so too) before
> it goes upstream,
> gain kathie and vans blessing, etc. Couple weeks? (see 2). In
> particular I was
> hoping to see actual pings under load match the target setting.
> I'll get this
> going on two boxes and see what happens... play with bql, htb, etc...
Hey, you'll send the patch when ready.
> 5) thought the * 16 could be efficiently implemented by the compiler,
> and saves a mem
> access.
Well, just see the code on x86_32, the ktime conversion is expensive.
Access to memory is free since cache line is already in cpu cache.
By the way, I found the precalculated array of 64 values is not really
useful, we have q->count > 64 most of the time with 2 flows.
> 6) unless 2) happens we can kill q->flags
Yes
* Re: [Codel] [PATCH v5] pkt_sched: codel: Controlled Delay AQM
2012-05-05 17:07 ` Eric Dumazet
@ 2012-05-05 17:22 ` Dave Taht
2012-05-05 18:54 ` [Codel] [PATCH iproute2] " Eric Dumazet
0 siblings, 1 reply; 34+ messages in thread
From: Dave Taht @ 2012-05-05 17:22 UTC (permalink / raw)
To: Eric Dumazet; +Cc: codel, Dave Täht
On Sat, May 5, 2012 at 10:07 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sat, 2012-05-05 at 09:11 -0700, Dave Taht wrote:
>> Nice!
>>
>> Nits:
>>
>> 0) I figure you already have an iproute2 patch that you can send?
>> I thought 5 hours ago I had almost, but not entirely grokked netlink.
>> The way you just did it was not at all how I thought it worked. :/
>> but I will read.
>
>
> I'll send it (I am currently with friends...), but you don't need it to
> use codel with default params :
>
> qdisc add dev $DEV parent 1:1 handle 10: est 1sec 4sec codel
yep.
>> 1) I take it if a limit is not specified or set here, sch->limit comes
>> from txqueuelen?
>
> No, you set a default limit of 1000 in your patch
If I didn't set that at all, where does it come from?
(never mind I can find it, enjoy your saturday!)
>
>> I do kind of like infinite queues (and angels dancing on the heads of pins)
>> 2) I woke up with a mod that could do ecn. I'll do an rfc patch
>
> Not sure it can play, if all packets are ECN, you need to drop at some
> point.
Yeah, that's where BLUE had a few ideas, but the right thing is to set up
some long RTTs and see what happens.
>> .
>> 3) Tom's already on the list
> ok
>> 4) I'd like to play with this a lot (and have others do so too) before
>> it goes upstream,
>> gain kathie and vans blessing, etc. Couple weeks? (see 2). In
>> particular I was
>> hoping to see actual pings under load match the target setting.
>> I'll get this
>> going on two boxes and see what happens... play with bql, htb, etc...
>
> Hey, you'll send the patch when ready.
ok!
>
>> 5) thought the * 16 could be efficiently implemented by the compiler,
>> and saves a mem
>> access.
>
> Well, just see the code on x86_32, the ktime conversion is expensive.
> Access to memory is free since cache line is already in cpu cache.
on everything but x86_32 (do people still use that? :)). I would think that on
x86_64 and mips (16-bit memory bus) it would be cheaper...
but it's rather a nit regardless.
>
> By the way, I found the precalculated array of 64 values is not really
> useful, we have q->count > 64 most of the time with 2 flows.
yea, just noticed that. I figured 25 wasn't enough, didn't think 64 was
either. will get some data on various flows and find a better value.
>
>> 6) unless 2) happens we can kill q->flags
>
> Yes
Enjoy your saturday!
>
>
>
--
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://www.bufferbloat.net
* [Codel] [PATCH iproute2] codel: Controlled Delay AQM
2012-05-05 17:22 ` Dave Taht
@ 2012-05-05 18:54 ` Eric Dumazet
2012-05-05 19:08 ` Eric Dumazet
2012-05-05 21:30 ` Eric Dumazet
0 siblings, 2 replies; 34+ messages in thread
From: Eric Dumazet @ 2012-05-05 18:54 UTC (permalink / raw)
To: Dave Taht; +Cc: codel, Dave Täht
From: Eric Dumazet <edumazet@google.com>
tc qdisc .... codel [ limit PACKETS ] [ target TIME]
[ interval TIME ] [ minbytes BYTES ]
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/pkt_sched.h | 14 +++
tc/Makefile | 1
tc/q_codel.c | 134 ++++++++++++++++++++++++++++++++++++
3 files changed, 149 insertions(+)
diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index 410b33d..62a73bf 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -509,6 +509,7 @@ enum {
TCA_NETEM_CORRUPT,
TCA_NETEM_LOSS,
TCA_NETEM_RATE,
+ TCA_NETEM_ECN,
__TCA_NETEM_MAX,
};
@@ -654,4 +655,17 @@ struct tc_qfq_stats {
__u32 lmax;
};
+/* CODEL */
+
+enum {
+ TCA_CODEL_UNSPEC,
+ TCA_CODEL_TARGET,
+ TCA_CODEL_LIMIT,
+ TCA_CODEL_MINBYTES,
+ TCA_CODEL_INTERVAL,
+ __TCA_CODEL_MAX
+};
+
+#define TCA_CODEL_MAX (__TCA_CODEL_MAX - 1)
+
#endif
diff --git a/tc/Makefile b/tc/Makefile
index be8cd5a..8a7cc8d 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -47,6 +47,7 @@ TCMODULES += em_cmp.o
TCMODULES += em_u32.o
TCMODULES += em_meta.o
TCMODULES += q_mqprio.o
+TCMODULES += q_codel.o
TCSO :=
ifeq ($(TC_CONFIG_ATM),y)
diff --git a/tc/q_codel.c b/tc/q_codel.c
new file mode 100644
index 0000000..c711b6f
--- /dev/null
+++ b/tc/q_codel.c
@@ -0,0 +1,134 @@
+/*
+ * q_codel.c Codel.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors: Eric Dumazet <edumazet@google.com>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <syslog.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <string.h>
+#include <math.h>
+
+#include "utils.h"
+#include "tc_util.h"
+
+static void explain(void)
+{
+ fprintf(stderr, "Usage: ... codel [ limit PACKETS ] [ target TIME]\n");
+ fprintf(stderr, " [ interval TIME ] [ minbytes BYTES ]\n");
+}
+
+static int codel_parse_opt(struct qdisc_util *qu, int argc, char **argv,
+ struct nlmsghdr *n)
+{
+ struct tc_red_qopt opt;
+ unsigned limit = 0;
+ unsigned target = 0;
+ unsigned interval = 0;
+ unsigned minbytes = 0;
+ struct rtattr *tail;
+
+ while (argc > 0) {
+ if (strcmp(*argv, "limit") == 0) {
+ NEXT_ARG();
+ if (get_unsigned(&limit, *argv, 0)) {
+ fprintf(stderr, "Illegal \"limit\"\n");
+ return -1;
+ }
+ } else if (strcmp(*argv, "minbytes") == 0) {
+ NEXT_ARG();
+ if (get_unsigned(&minbytes, *argv, 0)) {
+ fprintf(stderr, "Illegal \"minbytes\"\n");
+ return -1;
+ }
+ } else if (strcmp(*argv, "target") == 0) {
+ NEXT_ARG();
+ if (get_time(&target, *argv)) {
+ fprintf(stderr, "Illegal \"target\"\n");
+ return -1;
+ }
+ } else if (strcmp(*argv, "interval") == 0) {
+ NEXT_ARG();
+ if (get_time(&interval, *argv)) {
+ fprintf(stderr, "Illegal \"interval\"\n");
+ return -1;
+ }
+ } else if (strcmp(*argv, "help") == 0) {
+ explain();
+ return -1;
+ } else {
+ fprintf(stderr, "What is \"%s\"?\n", *argv);
+ explain();
+ return -1;
+ }
+ argc--; argv++;
+ }
+
+ tail = NLMSG_TAIL(n);
+ if (limit)
+ addattr_l(n, 1024, TCA_CODEL_LIMIT, &limit, sizeof(limit));
+ if (minbytes)
+ addattr_l(n, 1024, TCA_CODEL_MINBYTES, &minbytes, sizeof(minbytes));
+ if (interval)
+ addattr_l(n, 1024, TCA_CODEL_INTERVAL, &interval, sizeof(interval));
+ if (target)
+ addattr_l(n, 1024, TCA_CODEL_TARGET, &target, sizeof(target));
+ tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
+ return 0;
+}
+
+static int codel_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
+{
+ struct rtattr *tb[TCA_CODEL_MAX + 1];
+ unsigned limit;
+ unsigned interval;
+ unsigned target;
+ unsigned minbytes;
+ SPRINT_BUF(b1);
+
+ if (opt == NULL)
+ return 0;
+
+ parse_rtattr_nested(tb, TCA_CODEL_MAX, opt);
+
+ if (tb[TCA_CODEL_LIMIT] &&
+ RTA_PAYLOAD(tb[TCA_CODEL_LIMIT]) >= sizeof(__u32)) {
+ limit = rta_getattr_u32(tb[TCA_CODEL_LIMIT]);
+ fprintf(f, "limit %up ", limit);
+ }
+ if (tb[TCA_CODEL_MINBYTES] &&
+ RTA_PAYLOAD(tb[TCA_CODEL_MINBYTES]) >= sizeof(__u32)) {
+ minbytes = rta_getattr_u32(tb[TCA_CODEL_MINBYTES]);
+ fprintf(f, "minbytes %u ", minbytes);
+ }
+ if (tb[TCA_CODEL_TARGET] &&
+ RTA_PAYLOAD(tb[TCA_CODEL_TARGET]) >= sizeof(__u32)) {
+ target = rta_getattr_u32(tb[TCA_CODEL_TARGET]);
+ fprintf(f, "target %s ", sprint_time(target, b1));
+ }
+ if (tb[TCA_CODEL_INTERVAL] &&
+ RTA_PAYLOAD(tb[TCA_CODEL_INTERVAL]) >= sizeof(__u32)) {
+ interval = rta_getattr_u32(tb[TCA_CODEL_INTERVAL]);
+ fprintf(f, "interval %s ", sprint_time(interval, b1));
+ }
+
+ return 0;
+}
+
+
+struct qdisc_util codel_qdisc_util = {
+ .id = "codel",
+ .parse_qopt = codel_parse_opt,
+ .print_qopt = codel_print_opt,
+};
* Re: [Codel] [PATCH iproute2] codel: Controlled Delay AQM
2012-05-05 18:54 ` [Codel] [PATCH iproute2] " Eric Dumazet
@ 2012-05-05 19:08 ` Eric Dumazet
2012-05-05 21:30 ` Eric Dumazet
1 sibling, 0 replies; 34+ messages in thread
From: Eric Dumazet @ 2012-05-05 19:08 UTC (permalink / raw)
To: Dave Taht; +Cc: codel, Dave Täht
On Sat, 2012-05-05 at 20:54 +0200, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> tc qdisc .... codel [ limit PACKETS ] [ target TIME]
> [ interval TIME ] [ minbytes BYTES ]
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> include/linux/pkt_sched.h | 14 +++
> tc/Makefile | 1
> tc/q_codel.c | 134 ++++++++++++++++++++++++++++++++++++
> 3 files changed, 149 insertions(+)
>
> diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
> index 410b33d..62a73bf 100644
> --- a/include/linux/pkt_sched.h
> +++ b/include/linux/pkt_sched.h
> @@ -509,6 +509,7 @@ enum {
> TCA_NETEM_CORRUPT,
> TCA_NETEM_LOSS,
> TCA_NETEM_RATE,
> + TCA_NETEM_ECN,
> __TCA_NETEM_MAX,
> };
>
> @@ -654,4 +655,17 @@ struct tc_qfq_stats {
> __u32 lmax;
> };
>
> +/* CODEL */
> +
> +enum {
> + TCA_CODEL_UNSPEC,
> + TCA_CODEL_TARGET,
> + TCA_CODEL_LIMIT,
> + TCA_CODEL_MINBYTES,
> + TCA_CODEL_INTERVAL,
> + __TCA_CODEL_MAX
> +};
> +
> +#define TCA_CODEL_MAX (__TCA_CODEL_MAX - 1)
> +
> #endif
> diff --git a/tc/Makefile b/tc/Makefile
> index be8cd5a..8a7cc8d 100644
> --- a/tc/Makefile
> +++ b/tc/Makefile
> @@ -47,6 +47,7 @@ TCMODULES += em_cmp.o
> TCMODULES += em_u32.o
> TCMODULES += em_meta.o
> TCMODULES += q_mqprio.o
> +TCMODULES += q_codel.o
>
> TCSO :=
> ifeq ($(TC_CONFIG_ATM),y)
> diff --git a/tc/q_codel.c b/tc/q_codel.c
> new file mode 100644
> index 0000000..c711b6f
> --- /dev/null
> +++ b/tc/q_codel.c
> @@ -0,0 +1,134 @@
> +/*
> + * q_codel.c Codel.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + *
> + * Authors: Eric Dumazet <edumazet@google.com>
> + */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <unistd.h>
> +#include <syslog.h>
> +#include <fcntl.h>
> +#include <sys/socket.h>
> +#include <netinet/in.h>
> +#include <arpa/inet.h>
> +#include <string.h>
> +#include <math.h>
> +
> +#include "utils.h"
> +#include "tc_util.h"
> +
> +static void explain(void)
> +{
> + fprintf(stderr, "Usage: ... codel [ limit PACKETS ] [ target TIME]\n");
> + fprintf(stderr, " [ interval TIME ] [ minbytes BYTES ]\n");
> +}
> +
> +static int codel_parse_opt(struct qdisc_util *qu, int argc, char **argv,
> + struct nlmsghdr *n)
> +{
> + struct tc_red_qopt opt;
> + unsigned limit = 0;
> + unsigned target = 0;
> + unsigned interval = 0;
> + unsigned minbytes = 0;
> + struct rtattr *tail;
> +
> + while (argc > 0) {
> + if (strcmp(*argv, "limit") == 0) {
> + NEXT_ARG();
> + if (get_unsigned(&limit, *argv, 0)) {
> + fprintf(stderr, "Illegal \"limit\"\n");
> + return -1;
> + }
> + } else if (strcmp(*argv, "minbytes") == 0) {
> + NEXT_ARG();
> + if (get_unsigned(&minbytes, *argv, 0)) {
> + fprintf(stderr, "Illegal \"minbytes\"\n");
> + return -1;
> + }
> + } else if (strcmp(*argv, "target") == 0) {
> + NEXT_ARG();
> + if (get_time(&target, *argv)) {
> + fprintf(stderr, "Illegal \"target\"\n");
> + return -1;
> + }
> + } else if (strcmp(*argv, "interval") == 0) {
> + NEXT_ARG();
> + if (get_time(&interval, *argv)) {
> + fprintf(stderr, "Illegal \"interval\"\n");
> + return -1;
> + }
> + } else if (strcmp(*argv, "help") == 0) {
> + explain();
> + return -1;
> + } else {
> + fprintf(stderr, "What is \"%s\"?\n", *argv);
> + explain();
> + return -1;
> + }
> + argc--; argv++;
> + }
> +
> + tail = NLMSG_TAIL(n);
A line is missing here, please add:
addattr_l(n, 1024, TCA_OPTIONS, NULL, 0);
> + if (limit)
> + addattr_l(n, 1024, TCA_CODEL_LIMIT, &limit, sizeof(limit));
> + if (minbytes)
> + addattr_l(n, 1024, TCA_CODEL_MINBYTES, &minbytes, sizeof(minbytes));
> + if (interval)
> + addattr_l(n, 1024, TCA_CODEL_INTERVAL, &interval, sizeof(interval));
> + if (target)
> + addattr_l(n, 1024, TCA_CODEL_TARGET, &target, sizeof(target));
> + tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
> + return 0;
> +}
> +
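What the missing addattr_l(n, 1024, TCA_OPTIONS, NULL, 0) line provides is the opening of a TCA_OPTIONS nest: a nested netlink attribute is just an outer attribute whose payload is more attributes, with the outer length patched once the children are in place (that is what the final tail->rta_len assignment does). A minimal sketch of the resulting wire format, with illustrative Python encoding helpers (the type numbers follow rtnetlink and the patch's enum):

```python
import struct

TCA_OPTIONS = 2        # from <linux/rtnetlink.h>
TCA_CODEL_LIMIT = 2    # from the patch's TCA_CODEL_* enum

def attr(kind, payload):
    """One netlink attribute: 4-byte header (u16 len, u16 type) in host
    order, then the payload padded to a 4-byte boundary."""
    length = 4 + len(payload)
    pad = (-length) % 4
    return struct.pack("=HH", length, kind) + payload + b"\x00" * pad

def nest(kind, children):
    """A nest is an attribute whose payload is more attributes. This is
    what addattr_l(n, 1024, TCA_OPTIONS, NULL, 0) opens and the final
    tail->rta_len assignment closes in the C code."""
    return attr(kind, b"".join(children))

opts = nest(TCA_OPTIONS, [attr(TCA_CODEL_LIMIT, struct.pack("=I", 1000))])
outer_len, outer_type = struct.unpack_from("=HH", opts)
print(outer_len, outer_type)   # 12 2: the outer length covers the child
```

Without the opening call, the child attributes land at the top level and the kernel's nla_parse_nested() never sees them.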
* Re: [Codel] [PATCH v5] pkt_sched: codel: Controlled Delay AQM
2012-05-05 14:49 ` [Codel] [PATCH v5] pkt_sched: codel: Controlled Delay AQM Eric Dumazet
2012-05-05 16:11 ` Dave Taht
@ 2012-05-05 20:20 ` Eric Dumazet
2012-05-05 20:36 ` Eric Dumazet
1 sibling, 1 reply; 34+ messages in thread
From: Eric Dumazet @ 2012-05-05 20:20 UTC (permalink / raw)
To: Dave Taht; +Cc: codel, Dave Täht
On Sat, 2012-05-05 at 16:49 +0200, Eric Dumazet wrote:
> From: Dave Taht <dave.taht@gmail.com>
>
> +static bool should_drop(struct sk_buff *skb, struct Qdisc *sch, ktime_t now)
> +{
> + struct codel_sched_data *q = qdisc_priv(sch);
> + ktime_t sojourn_time;
> + bool drop;
> +
> + if (!skb) {
> + q->first_above_time.tv64 = 0;
> + return false;
> + }
> + sojourn_time = ktime_sub(now, get_enqueue_time(skb));
> +
> + if (ktime_compare(sojourn_time, q->target) < 0 ||
> + sch->qstats.backlog < q->minbytes) {
> + /* went below so we'll stay below for at least q->interval */
> + q->first_above_time.tv64 = 0;
> + return false;
> + }
I believe we should allow the last packet to be sent even if
sch->qstats.backlog >= q->minbytes
Hmm... this means we should do the
sch->qstats.backlog -= qdisc_pkt_len(skb);
right after the calls to __skb_dequeue(&sch->q);
(and not in the codel_drop() or at end of codel_dequeue())
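The point of moving the backlog decrement to right after __skb_dequeue() is that the minbytes comparison should see the queue without the packet currently being dequeued, so a lone final packet can always drain. A toy sketch of that check (Python, illustrative names, not the kernel code):

```python
def should_keep_dropping(backlog, pkt_len, minbytes):
    """Illustrative: mirror the order of operations suggested above.
    Decrement the backlog for the packet being dequeued *before*
    comparing against minbytes, so a lone final packet is let out."""
    backlog -= pkt_len           # accounting moved next to __skb_dequeue()
    return backlog >= minbytes   # below minbytes: leave drop consideration

# One 1500-byte packet left, minbytes = 1500: it may leave the queue.
print(should_keep_dropping(1500, 1500, 1500))  # False
# Three such packets queued: enough backlog to keep CoDel engaged.
print(should_keep_dropping(4500, 1500, 1500))  # True
```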
* Re: [Codel] [PATCH v5] pkt_sched: codel: Controlled Delay AQM
2012-05-05 20:20 ` [Codel] [PATCH v5] pkt_sched: " Eric Dumazet
@ 2012-05-05 20:36 ` Eric Dumazet
2012-05-05 21:11 ` Eric Dumazet
0 siblings, 1 reply; 34+ messages in thread
From: Eric Dumazet @ 2012-05-05 20:36 UTC (permalink / raw)
To: Dave Taht; +Cc: codel, Dave Täht
On Sat, 2012-05-05 at 22:20 +0200, Eric Dumazet wrote:
> I believe we should allow the last packet to be sent even if
> sch->qstats.backlog >= q->minbytes
>
> Hmm... this means we should do the
> sch->qstats.backlog -= qdisc_pkt_len(skb);
> right after the calls to __skb_dequeue(&sch->q);
>
> (and not in the codel_drop() or at end of codel_dequeue())
>
>
I am also adding a dump_stats capability, so I'll resend a v6
(So that we can check runtime param like 'q->count' and other
interesting stuff)
* Re: [Codel] [PATCH v5] pkt_sched: codel: Controlled Delay AQM
2012-05-05 20:36 ` Eric Dumazet
@ 2012-05-05 21:11 ` Eric Dumazet
2012-05-05 21:12 ` dave taht
0 siblings, 1 reply; 34+ messages in thread
From: Eric Dumazet @ 2012-05-05 21:11 UTC (permalink / raw)
To: Dave Taht; +Cc: codel, Dave Täht
On Sat, 2012-05-05 at 22:36 +0200, Eric Dumazet wrote:
> On Sat, 2012-05-05 at 22:20 +0200, Eric Dumazet wrote:
>
> > I believe we should allow the last packet to be sent even if
> > sch->qstats.backlog >= q->minbytes
> >
> > Hmm... this means we should do the
> > sch->qstats.backlog -= qdisc_pkt_len(skb);
> > right after the calls to __skb_dequeue(&sch->q);
> >
> > (and not in the codel_drop() or at end of codel_dequeue())
> >
> >
>
> I am also adding a dump_stats capability, so I'll resend a v6
>
> (So that we can check runtime param like 'q->count' and other
> interesting stuff)
>
>
I believe we can remove the cache of 64 values, since q->count is way
bigger most of the time, according to my results.
* Re: [Codel] [PATCH v5] pkt_sched: codel: Controlled Delay AQM
2012-05-05 21:11 ` Eric Dumazet
@ 2012-05-05 21:12 ` dave taht
2012-05-05 21:20 ` Eric Dumazet
0 siblings, 1 reply; 34+ messages in thread
From: dave taht @ 2012-05-05 21:12 UTC (permalink / raw)
To: Eric Dumazet; +Cc: codel, Dave Täht
On 05/05/2012 02:11 PM, Eric Dumazet wrote:
> On Sat, 2012-05-05 at 22:36 +0200, Eric Dumazet wrote:
>> On Sat, 2012-05-05 at 22:20 +0200, Eric Dumazet wrote:
>>
>>> I believe we should allow the last packet to be sent even if
>>> sch->qstats.backlog>= q->minbytes
>>>
>>> Hmm... this means we should do the
>>> sch->qstats.backlog -= qdisc_pkt_len(skb);
>>> right after the calls to __skb_dequeue(&sch->q);
>>>
>>> (and not in the codel_drop() or at end of codel_dequeue())
>>>
>>>
>> I am also adding a dump_stats capability, so I'll resend a v6
>>
>> (So that we can check runtime param like 'q->count' and other
>> interesting stuff)
>>
>>
> I believe we can remove the cache of 64 values, since q->count is way
> bigger most of the time, according to my results.
>
>
>
I went the other way and made it be 1000. How much bigger?
* Re: [Codel] [PATCH v5] pkt_sched: codel: Controlled Delay AQM
2012-05-05 21:12 ` dave taht
@ 2012-05-05 21:20 ` Eric Dumazet
2012-05-05 21:28 ` [Codel] [PATCH v6] " Eric Dumazet
2012-05-05 22:03 ` [Codel] [PATCH v5] " dave taht
0 siblings, 2 replies; 34+ messages in thread
From: Eric Dumazet @ 2012-05-05 21:20 UTC (permalink / raw)
To: dave taht; +Cc: codel, Dave Täht
On Sat, 2012-05-05 at 14:12 -0700, dave taht wrote:
> On 05/05/2012 02:11 PM, Eric Dumazet wrote:
> > On Sat, 2012-05-05 at 22:36 +0200, Eric Dumazet wrote:
> >> On Sat, 2012-05-05 at 22:20 +0200, Eric Dumazet wrote:
> >>
> >>> I believe we should allow the last packet to be sent even if
> >>> sch->qstats.backlog>= q->minbytes
> >>>
> >>> Hmm... this means we should do the
> >>> sch->qstats.backlog -= qdisc_pkt_len(skb);
> >>> right after the calls to __skb_dequeue(&sch->q);
> >>>
> >>> (and not in the codel_drop() or at end of codel_dequeue())
> >>>
> >>>
> >> I am also adding a dump_stats capability, so I'll resend a v6
> >>
> >> (So that we can check runtime param like 'q->count' and other
> >> interesting stuff)
> >>
> >>
> > I believe we can remove the cache of 64 values, since q->count is way
> > bigger most if the time, according to my results.
> >
> >
> >
> I went the other way and made it be 1000. How much bigger?
>
We can compute the thing in less time than a cache line miss.
So I removed the cache, it makes code simpler.
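What is being computed on the fly is the CoDel control law, drop_next = now + interval/sqrt(count): the gap between successive drops shrinks as the drop count grows. A sketch of that schedule (Python, illustrative; the kernel's calc() does the same in integer arithmetic):

```python
import math

def control_law(now_ns, interval_ns, count):
    """drop_next = now + interval/sqrt(count): the gap between drops
    shrinks as the drop count grows (floating-point sketch only)."""
    return now_ns + int(interval_ns / math.sqrt(count))

interval = 100_000_000          # the default 100 ms interval, in ns
t, gaps = 0, []
for count in range(1, 6):       # successive drops while in dropping state
    nxt = control_law(t, interval, count)
    gaps.append(nxt - t)
    t = nxt

print(gaps)   # 100 ms, then ~70.7 ms, ~57.7 ms, 50 ms, ~44.7 ms
```

Since count routinely exceeds 64 under load, computing this directly is cheaper than maintaining (and missing on) a precalculated table.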
* [Codel] [PATCH v6] pkt_sched: codel: Controlled Delay AQM
2012-05-05 21:20 ` Eric Dumazet
@ 2012-05-05 21:28 ` Eric Dumazet
2012-05-05 21:40 ` Eric Dumazet
` (2 more replies)
2012-05-05 22:03 ` [Codel] [PATCH v5] " dave taht
1 sibling, 3 replies; 34+ messages in thread
From: Eric Dumazet @ 2012-05-05 21:28 UTC (permalink / raw)
To: dave taht; +Cc: codel, Dave Täht
include/linux/pkt_sched.h | 19 +
net/sched/Kconfig | 11
net/sched/Makefile | 1
net/sched/sch_codel.c | 414 ++++++++++++++++++++++++++++++++++++
4 files changed, 445 insertions(+)
diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index ffe975c..420ea95 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -655,4 +655,23 @@ struct tc_qfq_stats {
__u32 lmax;
};
+/* CODEL */
+
+enum {
+ TCA_CODEL_UNSPEC,
+ TCA_CODEL_TARGET,
+ TCA_CODEL_LIMIT,
+ TCA_CODEL_MINBYTES,
+ TCA_CODEL_INTERVAL,
+ __TCA_CODEL_MAX
+};
+
+#define TCA_CODEL_MAX (__TCA_CODEL_MAX - 1)
+
+struct tc_codel_xstats {
+ __u32 count;
+ __u32 delay; /* time elapsed since next packet was queued (in us) */
+ __u32 drop_next;
+};
+
#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 75b58f8..fadd252 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -250,6 +250,17 @@ config NET_SCH_QFQ
If unsure, say N.
+config NET_SCH_CODEL
+ tristate "Controlled Delay AQM (CODEL)"
+ help
+ Say Y here if you want to use the Controlled Delay (CODEL)
+ packet scheduling algorithm.
+
+ To compile this driver as a module, choose M here: the module
+ will be called sch_codel.
+
+ If unsure, say N.
+
config NET_SCH_INGRESS
tristate "Ingress Qdisc"
depends on NET_CLS_ACT
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 8cdf4e2..30fab03 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -37,6 +37,7 @@ obj-$(CONFIG_NET_SCH_PLUG) += sch_plug.o
obj-$(CONFIG_NET_SCH_MQPRIO) += sch_mqprio.o
obj-$(CONFIG_NET_SCH_CHOKE) += sch_choke.o
obj-$(CONFIG_NET_SCH_QFQ) += sch_qfq.o
+obj-$(CONFIG_NET_SCH_CODEL) += sch_codel.o
obj-$(CONFIG_NET_CLS_U32) += cls_u32.o
obj-$(CONFIG_NET_CLS_ROUTE4) += cls_route.o
diff --git a/net/sched/sch_codel.c b/net/sched/sch_codel.c
new file mode 100644
index 0000000..2e938c1
--- /dev/null
+++ b/net/sched/sch_codel.c
@@ -0,0 +1,414 @@
+/*
+ * net/sched/sch_codel.c A Codel implementation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Codel, the COntrolled DELay Queueing discipline
+ * Based on ns2 simulation code presented by Kathie Nichols
+ *
+ * Authors: Dave Täht <d@taht.net>
+ * Eric Dumazet <edumazet@google.com>
+ */
+
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/ktime.h>
+#include <linux/skbuff.h>
+#include <net/pkt_sched.h>
+
+#define MS2TIME(a) (ns_to_ktime( (u64) a * NSEC_PER_MSEC))
+#define DEFAULT_CODEL_LIMIT 1000
+
+/*
+ * Via patch found at:
+ * http://lkml.indiana.edu/hypermail/linux/kernel/0802.0/0659.html
+ * I don't know why this isn't in ktime.h as it seemed sane...
+ */
+
+/*
+ * ktime_compare - Compares two ktime_t variables
+ *
+ * Return val:
+ * lhs < rhs: < 0
+ * lhs == rhs: 0
+ * lhs > rhs: > 0
+ */
+
+#if (BITS_PER_LONG == 64) || defined(CONFIG_KTIME_SCALAR)
+static inline int ktime_compare(const ktime_t lhs, const ktime_t rhs)
+{
+ if (lhs.tv64 < rhs.tv64)
+ return -1;
+ if (lhs.tv64 > rhs.tv64)
+ return 1;
+ return 0;
+}
+#else
+static inline int ktime_compare(const ktime_t lhs, const ktime_t rhs)
+{
+ if (lhs.tv.sec < rhs.tv.sec)
+ return -1;
+ if (lhs.tv.sec > rhs.tv.sec)
+ return 1;
+ return lhs.tv.nsec - rhs.tv.nsec;
+}
+#endif
+
+/* Per-queue state (codel_queue_t instance variables) */
+
+struct codel_sched_data {
+ u32 minbytes;
+ u32 count; /* packets dropped since we went into drop state */
+ u32 drop_count;
+ bool dropping;
+ ktime_t target;
+ /* time to declare above q->target (0 if below)*/
+ ktime_t first_above_time;
+ ktime_t drop_next; /* time to drop next packet */
+ ktime_t interval16;
+ u32 interval;
+};
+
+struct codel_skb_cb {
+ ktime_t enqueue_time;
+};
+
+static unsigned int state1;
+static unsigned int state2;
+static unsigned int state3;
+static unsigned int states;
+
+/*
+ * return interval/sqrt(x) with good precision
+ */
+static u32 calc(u32 _interval, unsigned long x)
+{
+ u64 interval = _interval;
+
+ /* scale operands for max precision */
+ while (x < (1UL << (BITS_PER_LONG - 2))) {
+ x <<= 2;
+ interval <<= 1;
+ }
+ do_div(interval, int_sqrt(x));
+ return (u32)interval;
+}
+
+static struct codel_skb_cb *get_codel_cb(const struct sk_buff *skb)
+{
+ qdisc_cb_private_validate(skb, sizeof(struct codel_skb_cb));
+ return (struct codel_skb_cb *)qdisc_skb_cb(skb)->data;
+}
+
+static ktime_t get_enqueue_time(const struct sk_buff *skb)
+{
+ return get_codel_cb(skb)->enqueue_time;
+}
+
+static void set_enqueue_time(struct sk_buff *skb)
+{
+ get_codel_cb(skb)->enqueue_time = ktime_get();
+}
+
+static ktime_t control_law(const struct codel_sched_data *q, ktime_t t)
+{
+ return ktime_add_ns(t, calc(q->interval, q->count));
+}
+
+static bool should_drop(struct sk_buff *skb, struct Qdisc *sch, ktime_t now)
+{
+ struct codel_sched_data *q = qdisc_priv(sch);
+ ktime_t sojourn_time;
+ bool drop;
+
+ if (!skb) {
+ q->first_above_time.tv64 = 0;
+ return false;
+ }
+ sch->qstats.backlog -= qdisc_pkt_len(skb);
+ sojourn_time = ktime_sub(now, get_enqueue_time(skb));
+
+ if (ktime_compare(sojourn_time, q->target) < 0 ||
+ sch->qstats.backlog < q->minbytes) {
+ /* went below so we'll stay below for at least q->interval */
+ q->first_above_time.tv64 = 0;
+ return false;
+ }
+ drop = false;
+ if (q->first_above_time.tv64 == 0) {
+ /* just went above from below. If we stay above
+ * for at least q->interval we'll say it's ok to drop
+ */
+ q->first_above_time = ktime_add_ns(now, q->interval);
+ } else if (ktime_compare(now, q->first_above_time) >= 0) {
+ drop = true;
+ state1++;
+ }
+ return drop;
+}
+
+static void codel_drop(struct Qdisc *sch, struct sk_buff *skb)
+{
+ struct codel_sched_data *q = qdisc_priv(sch);
+
+ qdisc_drop(skb, sch);
+ q->drop_count++;
+}
+
+static struct sk_buff *codel_dequeue(struct Qdisc *sch)
+{
+ struct codel_sched_data *q = qdisc_priv(sch);
+ struct sk_buff *skb = __skb_dequeue(&sch->q);
+ ktime_t now;
+ bool drop;
+
+ if (!skb) {
+ q->dropping = false;
+ return skb;
+ }
+ now = ktime_get();
+ drop = should_drop(skb, sch, now);
+ if (q->dropping) {
+ if (!drop) {
+ /* sojourn time below target - leave dropping state */
+ q->dropping = false;
+ } else if (ktime_compare(now, q->drop_next) >=0) {
+ state2++;
+ /* It's time for the next drop. Drop the current
+ * packet and dequeue the next. The dequeue might
+ * take us out of dropping state.
+ * If not, schedule the next drop.
+ * A large backlog might result in drop rates so high
+ * that the next drop should happen now,
+ * hence the while loop.
+ */
+ while (q->dropping &&
+ (ktime_compare(now, q->drop_next) >= 0)) {
+ codel_drop(sch, skb);
+ q->count++;
+ skb = __skb_dequeue(&sch->q);
+ if (!should_drop(skb, sch, now)) {
+ /* leave dropping state */
+ q->dropping = false;
+ } else {
+ /* and schedule the next drop */
+ q->drop_next =
+ control_law(q, q->drop_next);
+ }
+ }
+ }
+ } else if (drop &&
+ ((ktime_compare(ktime_sub(now, q->drop_next), q->interval16) < 0) ||
+ (ktime_compare(ktime_sub(now, q->first_above_time),
+ ns_to_ktime(2 * q->interval)) >= 0 ))) {
+ codel_drop(sch, skb);
+ skb = __skb_dequeue(&sch->q);
+ drop = should_drop(skb, sch, now);
+ q->dropping = true;
+ state3++;
+ /*
+ * if min went above target close to when we last went below it
+ * assume that the drop rate that controlled the queue on the
+ * last cycle is a good starting point to control it now.
+ */
+ if (ktime_compare(ktime_sub(now, q->drop_next), q->interval16) < 0)
+ q->count = max(1U, q->count - 1);
+ else
+ q->count = 1;
+ q->drop_next = control_law(q, now);
+ }
+ if ((states++ % 64) == 0) {
+ pr_debug("s1: %u, s2: %u, s3: %u\n",
+ state1, state2, state3);
+ }
+ /* We can't call qdisc_tree_decrease_qlen() if our qlen is 0,
+ * or HTB crashes
+ */
+ if (q->drop_count && sch->q.qlen) {
+ qdisc_tree_decrease_qlen(sch, q->drop_count);
+ q->drop_count = 0;
+ }
+ if (skb)
+ qdisc_bstats_update(sch, skb);
+ return skb;
+}
+
+static int codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
+{
+ if (likely(skb_queue_len(&sch->q) < sch->limit)) {
+ set_enqueue_time(skb);
+ return qdisc_enqueue_tail(skb, sch);
+ }
+ return qdisc_drop(skb, sch);
+}
+
+static const struct nla_policy codel_policy[TCA_CODEL_MAX + 1] = {
+ [TCA_CODEL_TARGET] = { .type = NLA_U32 },
+ [TCA_CODEL_LIMIT] = { .type = NLA_U32 },
+ [TCA_CODEL_MINBYTES] = { .type = NLA_U32 },
+ [TCA_CODEL_INTERVAL] = { .type = NLA_U32 },
+};
+
+static int codel_change(struct Qdisc *sch, struct nlattr *opt)
+{
+ struct codel_sched_data *q = qdisc_priv(sch);
+ struct nlattr *tb[TCA_CODEL_MAX + 1];
+ unsigned int qlen;
+ int err;
+
+ if (opt == NULL)
+ return -EINVAL;
+
+ err = nla_parse_nested(tb, TCA_CODEL_MAX, opt, codel_policy);
+ if (err < 0)
+ return err;
+
+ sch_tree_lock(sch);
+ if (tb[TCA_CODEL_TARGET]) {
+ u32 target = nla_get_u32(tb[TCA_CODEL_TARGET]);
+
+ q->target = ns_to_ktime((u64) target * NSEC_PER_USEC);
+ }
+ if (tb[TCA_CODEL_INTERVAL]) {
+ u32 interval = nla_get_u32(tb[TCA_CODEL_INTERVAL]);
+
+ interval = min_t(u32, ~0U / NSEC_PER_USEC, interval);
+
+ q->interval = interval * NSEC_PER_USEC;
+ q->interval16 = ns_to_ktime(16 * (u64)q->interval);
+ }
+ if (tb[TCA_CODEL_LIMIT])
+ sch->limit = nla_get_u32(tb[TCA_CODEL_LIMIT]);
+
+ if (tb[TCA_CODEL_MINBYTES])
+ q->minbytes = nla_get_u32(tb[TCA_CODEL_MINBYTES]);
+
+ qlen = sch->q.qlen;
+ while (sch->q.qlen > sch->limit) {
+ struct sk_buff *skb = __skb_dequeue(&sch->q);
+
+ sch->qstats.backlog -= qdisc_pkt_len(skb);
+ qdisc_drop(skb, sch);
+ }
+ qdisc_tree_decrease_qlen(sch, qlen - sch->q.qlen);
+
+ q->drop_next.tv64 = q->first_above_time.tv64 = 0;
+ q->dropping = false;
+ sch_tree_unlock(sch);
+ return 0;
+}
+
+static int codel_init(struct Qdisc *sch, struct nlattr *opt)
+{
+ struct codel_sched_data *q = qdisc_priv(sch);
+
+ q->target = MS2TIME(5);
+ /* It should be possible to run with no limit,
+ * with infinite memory :)
+ */
+ sch->limit = DEFAULT_CODEL_LIMIT;
+ q->minbytes = psched_mtu(qdisc_dev(sch));
+ q->interval = 100 * NSEC_PER_MSEC;
+ q->interval16 = ns_to_ktime(16 * (u64)q->interval);
+ q->drop_next.tv64 = q->first_above_time.tv64 = 0;
+ q->dropping = false; /* exit dropping state */
+ q->count = 1;
+ if (opt) {
+ int err = codel_change(sch, opt);
+
+ if (err)
+ return err;
+ }
+
+ if (sch->limit >= 1)
+ sch->flags |= TCQ_F_CAN_BYPASS;
+ else
+ sch->flags &= ~TCQ_F_CAN_BYPASS;
+
+ return 0;
+}
+
+static int codel_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+ struct codel_sched_data *q = qdisc_priv(sch);
+ struct nlattr *opts;
+ u32 target = ktime_to_us(q->target);
+
+ opts = nla_nest_start(skb, TCA_OPTIONS);
+ if (opts == NULL)
+ goto nla_put_failure;
+ if (nla_put_u32(skb, TCA_CODEL_TARGET, target) ||
+ nla_put_u32(skb, TCA_CODEL_LIMIT, sch->limit) ||
+ nla_put_u32(skb, TCA_CODEL_INTERVAL, q->interval / NSEC_PER_USEC) ||
+ nla_put_u32(skb, TCA_CODEL_MINBYTES, q->minbytes))
+ goto nla_put_failure;
+
+ return nla_nest_end(skb, opts);
+
+nla_put_failure:
+ nla_nest_cancel(skb, opts);
+ return -1;
+}
+
+static int codel_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
+{
+ struct codel_sched_data *q = qdisc_priv(sch);
+ struct sk_buff *skb = skb_peek(&sch->q);
+ ktime_t now = ktime_get();
+ s64 delay;
+ struct tc_codel_xstats st = {
+ .count = q->count,
+ };
+
+ if (skb) {
+ delay = ktime_us_delta(now, get_enqueue_time(skb));
+ st.delay = min_t(u64, ~0U, delay);
+ }
+ delay = ktime_us_delta(q->drop_next, now);
+ st.drop_next = delay > 0 ? min_t(u64, ~0U, delay) : 0;
+ return gnet_stats_copy_app(d, &st, sizeof(st));
+}
+
+static void codel_reset(struct Qdisc *sch)
+{
+ struct codel_sched_data *q = qdisc_priv(sch);
+
+ qdisc_reset_queue(sch);
+ sch->q.qlen = 0;
+ q->dropping = false;
+ q->count = 1;
+}
+
+static struct Qdisc_ops codel_qdisc_ops __read_mostly = {
+ .id = "codel",
+ .priv_size = sizeof(struct codel_sched_data),
+
+ .enqueue = codel_enqueue,
+ .dequeue = codel_dequeue,
+ .peek = qdisc_peek_dequeued,
+ .init = codel_init,
+ .reset = codel_reset,
+ .change = codel_change,
+ .dump = codel_dump,
+ .dump_stats = codel_dump_stats,
+ .owner = THIS_MODULE,
+};
+
+static int __init codel_module_init(void)
+{
+ return register_qdisc(&codel_qdisc_ops);
+}
+static void __exit codel_module_exit(void)
+{
+ unregister_qdisc(&codel_qdisc_ops);
+}
+module_init(codel_module_init)
+module_exit(codel_module_exit)
+MODULE_LICENSE("GPL");
+
^ permalink raw reply [flat|nested] 34+ messages in thread
* [Codel] [PATCH iproute2] codel: Controlled Delay AQM
2012-05-05 18:54 ` [Codel] [PATCH iproute2] " Eric Dumazet
2012-05-05 19:08 ` Eric Dumazet
@ 2012-05-05 21:30 ` Eric Dumazet
2012-05-06 18:56 ` [Codel] [PATCH v8 " Eric Dumazet
1 sibling, 1 reply; 34+ messages in thread
From: Eric Dumazet @ 2012-05-05 21:30 UTC (permalink / raw)
To: Dave Taht; +Cc: codel, Dave Täht
include/linux/pkt_sched.h | 19 ++++
tc/Makefile | 1
tc/q_codel.c | 155 ++++++++++++++++++++++++++++++++++++
3 files changed, 175 insertions(+)
diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index 410b33d..fbece83 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -654,4 +654,23 @@ struct tc_qfq_stats {
__u32 lmax;
};
+/* CODEL */
+
+enum {
+ TCA_CODEL_UNSPEC,
+ TCA_CODEL_TARGET,
+ TCA_CODEL_LIMIT,
+ TCA_CODEL_MINBYTES,
+ TCA_CODEL_INTERVAL,
+ __TCA_CODEL_MAX
+};
+
+#define TCA_CODEL_MAX (__TCA_CODEL_MAX - 1)
+
+struct tc_codel_xstats {
+ __u32 count;
+ __u32 delay; /* time elapsed since next packet was queued (in us) */
+ __u32 drop_next;
+};
+
#endif
diff --git a/tc/Makefile b/tc/Makefile
index be8cd5a..8a7cc8d 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -47,6 +47,7 @@ TCMODULES += em_cmp.o
TCMODULES += em_u32.o
TCMODULES += em_meta.o
TCMODULES += q_mqprio.o
+TCMODULES += q_codel.o
TCSO :=
ifeq ($(TC_CONFIG_ATM),y)
diff --git a/tc/q_codel.c b/tc/q_codel.c
new file mode 100644
index 0000000..ec23ebb
--- /dev/null
+++ b/tc/q_codel.c
@@ -0,0 +1,155 @@
+/*
+ * q_codel.c Codel.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors: Eric Dumazet <edumazet@google.com>
+ * Dave Taht <dave.taht@bufferbloat.net>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <syslog.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <string.h>
+
+#include "utils.h"
+#include "tc_util.h"
+
+static void explain(void)
+{
+ fprintf(stderr, "Usage: ... codel [ limit PACKETS ] [ target TIME]\n");
+ fprintf(stderr, " [ interval TIME ] [ minbytes BYTES ]\n");
+}
+
+static int codel_parse_opt(struct qdisc_util *qu, int argc, char **argv,
+ struct nlmsghdr *n)
+{
+ unsigned limit = 0;
+ unsigned target = 0;
+ unsigned interval = 0;
+ unsigned minbytes = 0;
+ struct rtattr *tail;
+
+ while (argc > 0) {
+ if (strcmp(*argv, "limit") == 0) {
+ NEXT_ARG();
+ if (get_unsigned(&limit, *argv, 0)) {
+ fprintf(stderr, "Illegal \"limit\"\n");
+ return -1;
+ }
+ } else if (strcmp(*argv, "minbytes") == 0) {
+ NEXT_ARG();
+ if (get_unsigned(&minbytes, *argv, 0)) {
+ fprintf(stderr, "Illegal \"minbytes\"\n");
+ return -1;
+ }
+ } else if (strcmp(*argv, "target") == 0) {
+ NEXT_ARG();
+ if (get_time(&target, *argv)) {
+ fprintf(stderr, "Illegal \"target\"\n");
+ return -1;
+ }
+ } else if (strcmp(*argv, "interval") == 0) {
+ NEXT_ARG();
+ if (get_time(&interval, *argv)) {
+ fprintf(stderr, "Illegal \"interval\"\n");
+ return -1;
+ }
+ } else if (strcmp(*argv, "help") == 0) {
+ explain();
+ return -1;
+ } else {
+ fprintf(stderr, "What is \"%s\"?\n", *argv);
+ explain();
+ return -1;
+ }
+ argc--; argv++;
+ }
+
+ tail = NLMSG_TAIL(n);
+ addattr_l(n, 1024, TCA_OPTIONS, NULL, 0);
+ if (limit)
+ addattr_l(n, 1024, TCA_CODEL_LIMIT, &limit, sizeof(limit));
+ if (minbytes)
+ addattr_l(n, 1024, TCA_CODEL_MINBYTES, &minbytes, sizeof(minbytes));
+ if (interval)
+ addattr_l(n, 1024, TCA_CODEL_INTERVAL, &interval, sizeof(interval));
+ if (target)
+ addattr_l(n, 1024, TCA_CODEL_TARGET, &target, sizeof(target));
+ tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
+ return 0;
+}
+
+static int codel_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
+{
+ struct rtattr *tb[TCA_CODEL_MAX + 1];
+ unsigned limit;
+ unsigned interval;
+ unsigned target;
+ unsigned minbytes;
+ SPRINT_BUF(b1);
+
+ if (opt == NULL)
+ return 0;
+
+ parse_rtattr_nested(tb, TCA_CODEL_MAX, opt);
+
+ if (tb[TCA_CODEL_LIMIT] &&
+ RTA_PAYLOAD(tb[TCA_CODEL_LIMIT]) >= sizeof(__u32)) {
+ limit = rta_getattr_u32(tb[TCA_CODEL_LIMIT]);
+ fprintf(f, "limit %up ", limit);
+ }
+ if (tb[TCA_CODEL_MINBYTES] &&
+ RTA_PAYLOAD(tb[TCA_CODEL_MINBYTES]) >= sizeof(__u32)) {
+ minbytes = rta_getattr_u32(tb[TCA_CODEL_MINBYTES]);
+ fprintf(f, "minbytes %u ", minbytes);
+ }
+ if (tb[TCA_CODEL_TARGET] &&
+ RTA_PAYLOAD(tb[TCA_CODEL_TARGET]) >= sizeof(__u32)) {
+ target = rta_getattr_u32(tb[TCA_CODEL_TARGET]);
+ fprintf(f, "target %s ", sprint_time(target, b1));
+ }
+ if (tb[TCA_CODEL_INTERVAL] &&
+ RTA_PAYLOAD(tb[TCA_CODEL_INTERVAL]) >= sizeof(__u32)) {
+ interval = rta_getattr_u32(tb[TCA_CODEL_INTERVAL]);
+ fprintf(f, "interval %s ", sprint_time(interval, b1));
+ }
+
+ return 0;
+}
+
+static int codel_print_xstats(struct qdisc_util *qu, FILE *f,
+ struct rtattr *xstats)
+{
+ struct tc_codel_xstats *st;
+ SPRINT_BUF(b1);
+
+ if (xstats == NULL)
+ return 0;
+
+ if (RTA_PAYLOAD(xstats) < sizeof(*st))
+ return -1;
+
+ st = RTA_DATA(xstats);
+ fprintf(f, " count %u delay %s",
+ st->count, sprint_time(st->delay, b1));
+ if (st->drop_next)
+ fprintf(f, " drop_next %s", sprint_time(st->drop_next, b1));
+ return 0;
+
+}
+
+struct qdisc_util codel_qdisc_util = {
+ .id = "codel",
+ .parse_qopt = codel_parse_opt,
+ .print_qopt = codel_print_opt,
+ .print_xstats = codel_print_xstats,
+};
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Codel] [PATCH v6] pkt_sched: codel: Controlled Delay AQM
2012-05-05 21:28 ` [Codel] [PATCH v6] " Eric Dumazet
@ 2012-05-05 21:40 ` Eric Dumazet
2012-05-05 21:58 ` Eric Dumazet
2012-05-06 18:52 ` [Codel] [PATCH v8] " Eric Dumazet
2 siblings, 0 replies; 34+ messages in thread
From: Eric Dumazet @ 2012-05-05 21:40 UTC (permalink / raw)
To: dave taht; +Cc: codel, Dave Täht
On Sat, 2012-05-05 at 23:28 +0200, Eric Dumazet wrote:
> include/linux/pkt_sched.h | 19 +
> net/sched/Kconfig | 11
> net/sched/Makefile | 1
> net/sched/sch_codel.c | 414 ++++++++++++++++++++++++++++++++++++
> 4 files changed, 445 insertions(+)
Oops, hold on, this one freezes, I have to fix it :(
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Codel] [PATCH v6] pkt_sched: codel: Controlled Delay AQM
2012-05-05 21:28 ` [Codel] [PATCH v6] " Eric Dumazet
2012-05-05 21:40 ` Eric Dumazet
@ 2012-05-05 21:58 ` Eric Dumazet
2012-05-06 18:52 ` [Codel] [PATCH v8] " Eric Dumazet
2 siblings, 0 replies; 34+ messages in thread
From: Eric Dumazet @ 2012-05-05 21:58 UTC (permalink / raw)
To: dave taht; +Cc: codel, Dave Täht
On Sat, 2012-05-05 at 23:28 +0200, Eric Dumazet wrote:
OK I found the bug
> + * if min went above target close to when we last went below it
> + * assume that the drop rate that controlled the queue on the
> + * last cycle is a good starting point to control it now.
> + */
> + if (ktime_compare(ktime_sub(now, q->drop_next), q->interval16) < 0)
> + q->count = min(1U, q->count - 1);
should be :
q->count = max(1U, q->count - 1);
> + else
> + q->count = 1;
> + q->drop_next = control_law(q, now);
> + }
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Codel] [PATCH v5] pkt_sched: codel: Controlled Delay AQM
2012-05-05 21:20 ` Eric Dumazet
2012-05-05 21:28 ` [Codel] [PATCH v6] " Eric Dumazet
@ 2012-05-05 22:03 ` dave taht
2012-05-05 22:09 ` Eric Dumazet
1 sibling, 1 reply; 34+ messages in thread
From: dave taht @ 2012-05-05 22:03 UTC (permalink / raw)
To: Eric Dumazet; +Cc: codel, Dave Täht
On 05/05/2012 02:20 PM, Eric Dumazet wrote:
> On Sat, 2012-05-05 at 14:12 -0700, dave taht wrote:
>> On 05/05/2012 02:11 PM, Eric Dumazet wrote:
>>> On Sat, 2012-05-05 at 22:36 +0200, Eric Dumazet wrote:
>>>> On Sat, 2012-05-05 at 22:20 +0200, Eric Dumazet wrote:
>>>>
>>>>> I believe we should allow the last packet to be sent even if
>>>>> sch->qstats.backlog>= q->minbytes
>>>>>
>>>>> Hmm... this means we should do the
>>>>> sch->qstats.backlog -= qdisc_pkt_len(skb);
>>>>> right after the calls to __skb_dequeue(&sch->q);
>>>>>
>>>>> (and not in the codel_drop() or at end of codel_dequeue())
>>>>>
>>>>>
>>>> I am also adding a dump_stats capability, so I'll resend a v6
>>>>
>>>> (So that we can check runtime param like 'q->count' and other
>>>> interesting stuff)
>>>>
>>>>
>>> I believe we can remove the cache of 64 values, since q->count is way
>>> bigger most of the time, according to my results.
>>>
>>>
>>>
>> I went the other way and made it be 1000. How much bigger?
>>
> We can compute the thing in less time than a cache line miss.
Maybe on your arch, but highly doubtful on a 680Mhz mips that isn't even
superscalar.
I'd prefer to leave it in and be able to compile it out, and actually
measure the difference.
>
> So I removed the cache, it makes code simpler.
Well, it was fun while it lasted.
>
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Codel] [PATCH v5] pkt_sched: codel: Controlled Delay AQM
2012-05-05 22:03 ` [Codel] [PATCH v5] " dave taht
@ 2012-05-05 22:09 ` Eric Dumazet
2012-05-05 22:12 ` Eric Dumazet
` (2 more replies)
0 siblings, 3 replies; 34+ messages in thread
From: Eric Dumazet @ 2012-05-05 22:09 UTC (permalink / raw)
To: dave taht; +Cc: codel, Dave Täht
On Sat, 2012-05-05 at 15:03 -0700, dave taht wrote:
> Maybe on your arch, but highly doubtful on a 680Mhz mips that isn't even
> superscalar.
>
CPU are fast, memory is slow.
> I'd prefer to leave it in and be able to compile it out, and actually
> measure the difference.
You optimize the case where there is no need to optimize (small queue)
I can see count bigger than 100000 with 20 concurrent netperf
This makes no sense to have a cache so big.
Or there is a bug in codel
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Codel] [PATCH v5] pkt_sched: codel: Controlled Delay AQM
2012-05-05 22:09 ` Eric Dumazet
@ 2012-05-05 22:12 ` Eric Dumazet
2012-05-05 22:16 ` dave taht
2012-05-05 22:15 ` dave taht
2012-05-05 22:34 ` dave taht
2 siblings, 1 reply; 34+ messages in thread
From: Eric Dumazet @ 2012-05-05 22:12 UTC (permalink / raw)
To: dave taht; +Cc: codel, Dave Täht
On Sun, 2012-05-06 at 00:09 +0200, Eric Dumazet wrote:
> I can see count bigger than 100000 with 20 concurrent netperf
and htb rate 200Mbit
(If you want to reproduce the problem)
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Codel] [PATCH v5] pkt_sched: codel: Controlled Delay AQM
2012-05-05 22:09 ` Eric Dumazet
2012-05-05 22:12 ` Eric Dumazet
@ 2012-05-05 22:15 ` dave taht
2012-05-05 22:34 ` dave taht
2 siblings, 0 replies; 34+ messages in thread
From: dave taht @ 2012-05-05 22:15 UTC (permalink / raw)
To: Eric Dumazet; +Cc: codel, Dave Täht
On 05/05/2012 03:09 PM, Eric Dumazet wrote:
> On Sat, 2012-05-05 at 15:03 -0700, dave taht wrote:
>
>> Maybe on your arch, but highly doubtful on a 680Mhz mips that isn't even
>> superscalar.
>>
> CPU are fast, memory is slow.
>
>> I'd prefer to leave it in and be able to compile it out, and actually
>> measure the difference.
> You optimize the case where there is no need to optimize (small queue)
>
> I can see count bigger than 100000 with 20 concurrent netperf
At what speeds?
Are you testing 10GigE? I'm dying to know what happens there...
If I could encourage you to ratchet down to 2 or 100Mbit
it would be easier to get comparable results.
I like using the ethtool trick for 10 and 100Mbit as that
leaves just bql and the qdisc in the mix.
We've discussed elsewhere some of the issues with htb.
Also I'd had an 'interesting' result at 2Mbit that I haven't had time
to duplicate, which I'm going to do now on mainstream hardware.
> This makes no sense to have a cache so big.
concur, which is why I'd also asked how big it got, and how fast you
were running
> Or there is a bug in codel
hmm.
>
>
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Codel] [PATCH v5] pkt_sched: codel: Controlled Delay AQM
2012-05-05 22:12 ` Eric Dumazet
@ 2012-05-05 22:16 ` dave taht
0 siblings, 0 replies; 34+ messages in thread
From: dave taht @ 2012-05-05 22:16 UTC (permalink / raw)
To: Eric Dumazet; +Cc: codel, Dave Täht
On 05/05/2012 03:12 PM, Eric Dumazet wrote:
> On Sun, 2012-05-06 at 00:09 +0200, Eric Dumazet wrote:
>
>> I can see count bigger than 100000 with 20 concurrent netperf
>
> and htb rate 200Mbit
>
> (If you want to reproduce the problem)
I will, thx. It's WAY easier to get and analyze packet captures if you
play around at 10-100Mbit and below, btw...
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Codel] [PATCH v5] pkt_sched: codel: Controlled Delay AQM
2012-05-05 22:09 ` Eric Dumazet
2012-05-05 22:12 ` Eric Dumazet
2012-05-05 22:15 ` dave taht
@ 2012-05-05 22:34 ` dave taht
2012-05-05 22:39 ` Eric Dumazet
2 siblings, 1 reply; 34+ messages in thread
From: dave taht @ 2012-05-05 22:34 UTC (permalink / raw)
To: Eric Dumazet; +Cc: codel, Dave Täht
On 05/05/2012 03:09 PM, Eric Dumazet wrote:
> On Sat, 2012-05-05 at 15:03 -0700, dave taht wrote:
>
>> Maybe on your arch, but highly doubtful on a 680Mhz mips that isn't even
>> superscalar.
>>
> CPU are fast, memory is slow.
>
>> I'd prefer to leave it in and be able to compile it out, and actually
>> measure the difference.
> You optimize the case where there is no need to optimize (small queue)
>
> I can see count bigger than 100000 with 20 concurrent netperf
>
> This makes no sense to have a cache so big.
>
> Or there is a bug in codel
The original reciprocal approximation test code rapidly goes AWOL after
exceeding 2^8.
I went looking for butterflies and didn't see any in the scaled code in
the range 0-100000,
and they would only take flight briefly, so...
However I have not corrected it for BITS_PER_LONG as per our 4AM
discussion.
I will get a build going of your latest code with the stats collection
and look at it harder
after dinner. Get some sleep, too! fun day.
value    sqrt recip    inv sqrt relative err    scaled inv/sqrt value    relative err
interval/sqrt(256)=6250000 approx :6250190 1.00003040 interval/scaled: 6250000 1.00000000
interval/sqrt(257)=6237828 approx :6250190 1.00198178 interval/scaled: 6238006 1.00002854
interval/sqrt(258)=6225728 approx :6250190 1.00392918 interval/scaled: 6225870 1.00002281
interval/sqrt(259)=6213697 approx :6250190 1.00587299 interval/scaled: 6213780 1.00001336
interval/sqrt(260)=6201736 approx :6250190 1.00781297 interval/scaled: 6201738 1.00000032
interval/sqrt(261)=6189844 approx :6250190 1.00974920 interval/scaled: 6189929 1.00001373
interval/sqrt(262)=6178020 approx :6250190 1.01168174 interval/scaled: 6178165 1.00002347
interval/sqrt(263)=6166264 approx :6250190 1.01361051 interval/scaled: 6166445 1.00002935
interval/sqrt(264)=6154574 approx :6250190 1.01553576 interval/scaled: 6154585 1.00000179
interval/sqrt(265)=6142951 approx :6250190 1.01745724 interval/scaled: 6142955 1.00000065
interval/sqrt(266)=6131393 approx :6250190 1.01937521 interval/scaled: 6131552 1.00002593
interval/sqrt(267)=6119900 approx :6250190 1.02128956 interval/scaled: 6120009 1.00001781
interval/sqrt(268)=6108472 approx :6250190 1.02320024 interval/scaled: 6108509 1.00000606
interval/sqrt(269)=6097107 approx :6250190 1.02510748 interval/scaled: 6097234 1.00002083
interval/sqrt(270)=6085806 approx :6250190 1.02701105 interval/scaled: 6085819 1.00000214
interval/sqrt(271)=6074567 approx :6250190 1.02891120 interval/scaled: 6074627 1.00000988
interval/sqrt(272)=6063390 approx :6250190 1.03080785 interval/scaled: 6063477 1.00001435
...
interval/sqrt(99999)=316229 approx :6250190 19.76475908 interval/scaled: 316236 1.00002214
>
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Codel] [PATCH v5] pkt_sched: codel: Controlled Delay AQM
2012-05-05 22:34 ` dave taht
@ 2012-05-05 22:39 ` Eric Dumazet
2012-05-05 22:48 ` dave taht
0 siblings, 1 reply; 34+ messages in thread
From: Eric Dumazet @ 2012-05-05 22:39 UTC (permalink / raw)
To: dave taht; +Cc: codel, Dave Täht
On Sat, 2012-05-05 at 15:34 -0700, dave taht wrote:
> On 05/05/2012 03:09 PM, Eric Dumazet wrote:
> > On Sat, 2012-05-05 at 15:03 -0700, dave taht wrote:
> >
> >> Maybe on your arch, but highly doubtful on a 680Mhz mips that isn't even
> >> superscalar.
> >>
> > CPU are fast, memory is slow.
> >
> >> I'd prefer to leave it in and be able to compile it out, and actually
> >> measure the difference.
> > You optimize the case where there is no need to optimize (small queue)
> >
> > I can see count bigger than 100000 with 20 concurrent netperf
> >
> > This makes no sense to have a cache so big.
> >
> > Or there is a bug in codel
> The original reciprocol approximation test code rapidly goes AWOL after
> exceeding 2^8.
>
> I went looking for butterflies and didn't see any in the scaled code in
> the range 0-100000,
> and they would only take flight briefly, so...
>
> However I have not corrected it for BITS_PER_LONG as per our 4AM
> discussion.
You should use the exact code in kernel. (using BITS_PER_LONG)
>
> ...
>
> interval/sqrt(99999)=316229 approx :6250190 19.76475908 interval/scaled:
> 316236 1.00002214
>
> >
> >
>
If you read the code , there is no possible overflow, even with very
large 'u32 count'
anyway the problem is q->count keeps increasing under load.
Only when load is stopped for a while, count is reset to 1
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Codel] [PATCH v5] pkt_sched: codel: Controlled Delay AQM
2012-05-05 22:39 ` Eric Dumazet
@ 2012-05-05 22:48 ` dave taht
2012-05-05 23:07 ` Eric Dumazet
2012-05-05 23:09 ` dave taht
0 siblings, 2 replies; 34+ messages in thread
From: dave taht @ 2012-05-05 22:48 UTC (permalink / raw)
To: Eric Dumazet; +Cc: codel, Dave Täht
On 05/05/2012 03:39 PM, Eric Dumazet wrote:
> On Sat, 2012-05-05 at 15:34 -0700, dave taht wrote:
>> On 05/05/2012 03:09 PM, Eric Dumazet wrote:
>>> On Sat, 2012-05-05 at 15:03 -0700, dave taht wrote:
>>>
>>>> Maybe on your arch, but highly doubtful on a 680Mhz mips that isn't even
>>>> superscalar.
>>>>
>>> CPU are fast, memory is slow.
>>>
>>>> I'd prefer to leave it in and be able to compile it out, and actually
>>>> measure the difference.
>>> You optimize the case where there is no need to optimize (small queue)
>>>
>>> I can see count bigger than 100000 with 20 concurrent netperf
>>>
>>> This makes no sense to have a cache so big.
>>>
>>> Or there is a bug in codel
>> The original reciprocol approximation test code rapidly goes AWOL after
>> exceeding 2^8.
>>
>> I went looking for butterflies and didn't see any in the scaled code in
>> the range 0-100000,
>> and they would only take flight briefly, so...
>>
>> However I have not corrected it for BITS_PER_LONG as per our 4AM
>> discussion.
> You should use the exact code in kernel. (using BITS_PER_LONG)
>
>> ...
>>
>> interval/sqrt(99999)=316229 approx :6250190 19.76475908 interval/scaled:
>> 316236 1.00002214
>>
>>>
> If you read the code , there is no possible overflow, even with very
> large 'u32 count'
>
> anyway the problem is q->count keeps increasing under load.
>
> Only when load is stopped for a while, count is reset to 1
>
Stalking butterflies. (
http://en.wikipedia.org/wiki/File:Lorenz_attractor_yb.svg )
I suspected also we would have issues as we hit some natural quantums
(clock rate/interrupt rate/bql estimator etc) but for all I know it's
just a plain bug. I need a reboot. Goin to dinner.
>
>
>
>
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Codel] [PATCH v5] pkt_sched: codel: Controlled Delay AQM
2012-05-05 22:48 ` dave taht
@ 2012-05-05 23:07 ` Eric Dumazet
2012-05-05 23:19 ` dave taht
2012-05-05 23:09 ` dave taht
1 sibling, 1 reply; 34+ messages in thread
From: Eric Dumazet @ 2012-05-05 23:07 UTC (permalink / raw)
To: dave taht; +Cc: codel, Dave Täht
On Sat, 2012-05-05 at 15:48 -0700, dave taht wrote:
> Stalking butterflies. (
> http://en.wikipedia.org/wiki/File:Lorenz_attractor_yb.svg )
>
> I suspected also we would have issues as we hit some natural quantums
> (clock rate/interrupt rate/bql estimator etc) but for all I know it's
> just a plain bug. I need a reboot. Goin to dinner.
>
This part of Codel seems suspicious (last page):
// If min went above target close to when we last went below it
// assume that the drop rate that controlled the queue on the
// last cycle is a good starting point to control it now.
if (now - drop_next < 16.*interval) {
	int c = count - 1;
	count = c < 1 ? 1 : c;
} else {
	count = 1;
}
I suggest replacing it with a more conservative algorithm:
// If min went above target close to when we last went below it
// assume that sqrt(half) the drop rate that controlled the queue on the
// last cycle is a good starting point to control it now.
if (now - drop_next < 16.*interval) {
	int c = count >> 1;
	count = c < 1 ? 1 : c;
} else {
	count = 1;
}
With this change, my q->count max value is 12000
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Codel] [PATCH v5] pkt_sched: codel: Controlled Delay AQM
2012-05-05 22:48 ` dave taht
2012-05-05 23:07 ` Eric Dumazet
@ 2012-05-05 23:09 ` dave taht
2012-05-05 23:15 ` dave taht
1 sibling, 1 reply; 34+ messages in thread
From: dave taht @ 2012-05-05 23:09 UTC (permalink / raw)
To: Eric Dumazet; +Cc: codel, Dave Täht
On 05/05/2012 03:48 PM, dave taht wrote:
> On 05/05/2012 03:39 PM, Eric Dumazet wrote:
>> On Sat, 2012-05-05 at 15:34 -0700, dave taht wrote:
>>> On 05/05/2012 03:09 PM, Eric Dumazet wrote:
>>>> On Sat, 2012-05-05 at 15:03 -0700, dave taht wrote:
>>>>
>>>>> Maybe on your arch, but highly doubtful on a 680Mhz mips that
>>>>> isn't even
>>>>> superscalar.
>>>>>
>>>> CPU are fast, memory is slow.
>>>>
>>>>> I'd prefer to leave it in and be able to compile it out, and actually
>>>>> measure the difference.
>>>> You optimize the case where there is no need to optimize (small queue)
>>>>
>>>> I can see count bigger than 100000 with 20 concurrent netperf
>>>>
>>>> This makes no sense to have a cache so big.
>>>>
>>>> Or there is a bug in codel
>>> The original reciprocol approximation test code rapidly goes AWOL after
>>> exceeding 2^8.
>>>
>>> I went looking for butterflies and didn't see any in the scaled code in
>>> the range 0-100000,
>>> and they would only take flight briefly, so...
>>>
>>> However I have not corrected it for BITS_PER_LONG as per our 4AM
>>> discussion.
>> You should use the exact code in kernel. (using BITS_PER_LONG)
>>
>>> ...
>>>
>>> interval/sqrt(99999)=316229 approx :6250190 19.76475908
>>> interval/scaled:
>>> 316236 1.00002214
>>>
>>>>
>> If you read the code , there is no possible overflow, even with very
>> large 'u32 count'
>>
>> anyway the problem is q->count keeps increasing under load.
>>
>> Only when load is stopped for a while, count is reset to 1
>>
> Stalking butterflies. (
> http://en.wikipedia.org/wiki/File:Lorenz_attractor_yb.svg )
>
> I suspected also we would have issues as we hit some natural quantums
> (clock rate/interrupt rate/bql estimator etc) but for all I know it's
> just a plain bug. I need a reboot. Goin to dinner.
>
If we just compare us to us rather than ns to ns you get chunkier drops,
by a lot...
interval/sqrt(370)=5198752 approx :6250190 1.20224815 interval/scaled: 5198761 1.00000173
interval/sqrt(371)=5191741 approx :6250190 1.20387169 interval/scaled: 5191776 1.00000674
Secondly, adding a fudge factor to the calculation would bring it closer
in line with an actual sqrt.
>>
>>
>>
>>
>>
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Codel] [PATCH v5] pkt_sched: codel: Controlled Delay AQM
2012-05-05 23:09 ` dave taht
@ 2012-05-05 23:15 ` dave taht
0 siblings, 0 replies; 34+ messages in thread
From: dave taht @ 2012-05-05 23:15 UTC (permalink / raw)
To: Eric Dumazet; +Cc: codel, Dave Täht
On 05/05/2012 04:09 PM, dave taht wrote:
> On 05/05/2012 03:48 PM, dave taht wrote:
>> On 05/05/2012 03:39 PM, Eric Dumazet wrote:
>>> On Sat, 2012-05-05 at 15:34 -0700, dave taht wrote:
>>>> On 05/05/2012 03:09 PM, Eric Dumazet wrote:
>>>>> On Sat, 2012-05-05 at 15:03 -0700, dave taht wrote:
>>>>>
>>>>>> Maybe on your arch, but highly doubtful on a 680Mhz mips that
>>>>>> isn't even
>>>>>> superscalar.
>>>>>>
>>>>> CPU are fast, memory is slow.
>>>>>
>>>>>> I'd prefer to leave it in and be able to compile it out, and
>>>>>> actually
>>>>>> measure the difference.
>>>>> You optimize the case where there is no need to optimize (small
>>>>> queue)
>>>>>
>>>>> I can see count bigger than 100000 with 20 concurrent netperf
>>>>>
>>>>> This makes no sense to have a cache so big.
>>>>>
>>>>> Or there is a bug in codel
>>>> The original reciprocol approximation test code rapidly goes AWOL
>>>> after
>>>> exceeding 2^8.
>>>>
>>>> I went looking for butterflies and didn't see any in the scaled
>>>> code in
>>>> the range 0-100000,
>>>> and they would only take flight briefly, so...
>>>>
>>>> However I have not corrected it for BITS_PER_LONG as per our 4AM
>>>> discussion.
>>> You should use the exact code in kernel. (using BITS_PER_LONG)
>>>
>>>> ...
>>>>
>>>> interval/sqrt(99999)=316229 approx :6250190 19.76475908
>>>> interval/scaled:
>>>> 316236 1.00002214
>>>>
>>>>>
>>> If you read the code , there is no possible overflow, even with very
>>> large 'u32 count'
>>>
>>> anyway the problem is q->count keeps increasing under load.
>>>
>>> Only when load is stopped for a while, count is reset to 1
>>>
>> Stalking butterflies. (
>> http://en.wikipedia.org/wiki/File:Lorenz_attractor_yb.svg )
>>
>> I suspected also we would have issues as we hit some natural quantums
>> (clock rate/interrupt rate/bql estimator etc) but for all I know it's
>> just a plain bug. I need a reboot. Goin to dinner.
>>
>
> If we just compare us to us rather than ns to ns you get chunkier
> drops, by a lot...
truncate at 10 µs or .1ms
seriously going to dinner now
> interval/sqrt(370)=5198752 approx :6250190 1.20224815 interval/scaled: 5198761 1.00000173
> interval/sqrt(371)=5191741 approx :6250190 1.20387169 interval/scaled: 5191776 1.00000674
>
>
> secondly adding a fudge factor to the calculation would bring it
> closer to inline with an actual sqrt.
>
>
>>>
>>>
>>>
>>>
>>>
>>
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Codel] [PATCH v5] pkt_sched: codel: Controlled Delay AQM
2012-05-05 23:07 ` Eric Dumazet
@ 2012-05-05 23:19 ` dave taht
2012-05-06 5:18 ` Eric Dumazet
0 siblings, 1 reply; 34+ messages in thread
From: dave taht @ 2012-05-05 23:19 UTC (permalink / raw)
To: Eric Dumazet; +Cc: codel, Dave Täht
On 05/05/2012 04:07 PM, Eric Dumazet wrote:
> On Sat, 2012-05-05 at 15:48 -0700, dave taht wrote:
>
>> Stalking butterflies. (
>> http://en.wikipedia.org/wiki/File:Lorenz_attractor_yb.svg )
>>
>> I suspected also we would have issues as we hit some natural quantums
>> (clock rate/interrupt rate/bql estimator etc) but for all I know it's
>> just a plain bug. I need a reboot. Goin to dinner.
>>
> This part of Codel seems suspicious (last page )
>
> // If min went above target close to when we last went below it
> // assume that the drop rate that controlled the queue on the
> // last cycle is a good starting point to control it now.
> if (now - drop_next < 16.*interval) {
> 	int c = count - 1;
> 	count = c < 1 ? 1 : c;
> } else {
> 	count = 1;
> }
>
>
> I suggest to replace it by a more conservative algo :
>
> // If min went above target close to when we last went below it
> // assume that sqrt(half) the drop rate that controlled the queue on the
> // last cycle is a good starting point to control it now.
> if (now - drop_next < 16.*interval) {
> 	int c = count >> 1;
> 	count = c < 1 ? 1 : c;
> } else {
> 	count = 1;
> }
>
I don't buy it. See previous mail.
count - some_x sure.
> With this change, my q->count max value is 12000
>
>
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Codel] [PATCH v5] pkt_sched: codel: Controlled Delay AQM
2012-05-05 23:19 ` dave taht
@ 2012-05-06 5:18 ` Eric Dumazet
0 siblings, 0 replies; 34+ messages in thread
From: Eric Dumazet @ 2012-05-06 5:18 UTC (permalink / raw)
To: dave taht; +Cc: codel, Dave Täht
On Sat, 2012-05-05 at 16:19 -0700, dave taht wrote:
> I don't buy it. See previous mail.
>
> count - some_x sure.
It depends if some_x is a constant or not.
TCP cwnd is divided by two after a drop.
cwnd = cwnd >> 1;
If we did cwnd = cwnd - some_x; it would not work very well.
I am now experimenting :
q->count = max(1U, q->count - (q->count >> 2));
(count = 75% of count)
^ permalink raw reply [flat|nested] 34+ messages in thread
* [Codel] [PATCH v8] pkt_sched: codel: Controlled Delay AQM
2012-05-05 21:28 ` [Codel] [PATCH v6] " Eric Dumazet
2012-05-05 21:40 ` Eric Dumazet
2012-05-05 21:58 ` Eric Dumazet
@ 2012-05-06 18:52 ` Eric Dumazet
2012-05-06 19:51 ` Eric Dumazet
2 siblings, 1 reply; 34+ messages in thread
From: Eric Dumazet @ 2012-05-06 18:52 UTC (permalink / raw)
To: dave taht; +Cc: codel, Dave Täht
Some stuff added for stats, and timing based on u32 fields to ease time
comparisons.
An optimization for 64bit arches in calc() to avoid 16 loops to prescale
values.
include/linux/pkt_sched.h | 25 ++
net/sched/Kconfig | 11
net/sched/Makefile | 1
net/sched/sch_codel.c | 419 ++++++++++++++++++++++++++++++++++++
4 files changed, 456 insertions(+)
diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index ffe975c..0ee40e7 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -655,4 +655,29 @@ struct tc_qfq_stats {
__u32 lmax;
};
+/* CODEL */
+
+enum {
+ TCA_CODEL_UNSPEC,
+ TCA_CODEL_TARGET,
+ TCA_CODEL_LIMIT,
+ TCA_CODEL_MINBYTES,
+ TCA_CODEL_INTERVAL,
+ __TCA_CODEL_MAX
+};
+
+#define TCA_CODEL_MAX (__TCA_CODEL_MAX - 1)
+
+struct tc_codel_xstats {
+ __u32 count;
+ __u32 delay; /* time elapsed since next packet was queued (in us) */
+ __u32 drop_next;
+ __u32 drop_overlimit;
+ __u32 dropping;
+ __u32 state1;
+ __u32 state2;
+ __u32 state3;
+ __u32 states;
+};
+
#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 75b58f8..fadd252 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -250,6 +250,17 @@ config NET_SCH_QFQ
If unsure, say N.
+config NET_SCH_CODEL
+ tristate "Controlled Delay AQM (CODEL)"
+ help
+ Say Y here if you want to use the Controlled Delay (CODEL)
+ packet scheduling algorithm.
+
+ To compile this driver as a module, choose M here: the module
+ will be called sch_codel.
+
+ If unsure, say N.
+
config NET_SCH_INGRESS
tristate "Ingress Qdisc"
depends on NET_CLS_ACT
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 8cdf4e2..30fab03 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -37,6 +37,7 @@ obj-$(CONFIG_NET_SCH_PLUG) += sch_plug.o
obj-$(CONFIG_NET_SCH_MQPRIO) += sch_mqprio.o
obj-$(CONFIG_NET_SCH_CHOKE) += sch_choke.o
obj-$(CONFIG_NET_SCH_QFQ) += sch_qfq.o
+obj-$(CONFIG_NET_SCH_CODEL) += sch_codel.o
obj-$(CONFIG_NET_CLS_U32) += cls_u32.o
obj-$(CONFIG_NET_CLS_ROUTE4) += cls_route.o
diff --git a/net/sched/sch_codel.c b/net/sched/sch_codel.c
new file mode 100644
index 0000000..8272a08
--- /dev/null
+++ b/net/sched/sch_codel.c
@@ -0,0 +1,419 @@
+/*
+ * net/sched/sch_codel.c A Codel implementation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Codel, the COntrolled DELay Queueing discipline
+ * Based on ns2 simulation code presented by Kathie Nichols
+ *
+ * Authors: Dave Täht <d@taht.net>
+ * Eric Dumazet <edumazet@google.com>
+ */
+
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/ktime.h>
+#include <linux/skbuff.h>
+#include <net/pkt_sched.h>
+
+/*
+ * codel uses a 1024 nsec clock, encoded in u32
+ */
+typedef u32 codel_time_t;
+#define CODEL_SHIFT 10
+
+static codel_time_t codel_get_time(void)
+{
+ u64 ns = ktime_to_ns(ktime_get());
+
+ return ns >> CODEL_SHIFT;
+}
+
+#define codel_time_after(a, b) ((int)(a) - (int)(b) > 0)
+#define codel_time_after_eq(a, b) ((int)(a) - (int)(b) >= 0)
+#define codel_time_before(a, b) ((int)(a) - (int)(b) < 0)
+#define codel_time_before_eq(a, b) ((int)(a) - (int)(b) >= 0)
+
+#define MS2TIME(a) ((a * NSEC_PER_MSEC) >> CODEL_SHIFT)
+
+#define DEFAULT_CODEL_LIMIT 1000
+
+/* Per-queue state (codel_queue_t instance variables) */
+
+struct codel_sched_data {
+ u32 minbytes;
+ u32 interval;
+ codel_time_t target;
+
+ u32 count; /* packets dropped since we went into drop state */
+ u32 drop_count;
+ bool dropping;
+ /* time to declare above q->target (0 if below)*/
+ codel_time_t first_above_time;
+ codel_time_t drop_next; /* time to drop next packet */
+
+ u32 state1;
+ u32 state2;
+ u32 state3;
+ u32 states;
+ u32 drop_overlimit;
+};
+
+struct codel_skb_cb {
+ codel_time_t enqueue_time;
+};
+
+
+/*
+ * return interval/sqrt(x) with good precision
+ */
+static u32 calc(u32 _interval, u32 _x)
+{
+ u64 interval = _interval;
+ unsigned long x = _x;
+
+ /* scale operands for max precision
+ * On 64bit arches, we can prescale x by 32bits
+ */
+ if (BITS_PER_LONG == 64) {
+ x <<= 32;
+ interval <<= 16;
+ }
+ while (x < (1UL << (BITS_PER_LONG - 2))) {
+ x <<= 2;
+ interval <<= 1;
+ }
+ do_div(interval, int_sqrt(x));
+ return (u32)interval;
+}
+
+static struct codel_skb_cb *get_codel_cb(const struct sk_buff *skb)
+{
+ qdisc_cb_private_validate(skb, sizeof(struct codel_skb_cb));
+ return (struct codel_skb_cb *)qdisc_skb_cb(skb)->data;
+}
+
+static codel_time_t get_enqueue_time(const struct sk_buff *skb)
+{
+ return get_codel_cb(skb)->enqueue_time;
+}
+
+static void set_enqueue_time(struct sk_buff *skb)
+{
+ get_codel_cb(skb)->enqueue_time = codel_get_time();
+}
+
+static codel_time_t control_law(const struct codel_sched_data *q, codel_time_t t)
+{
+ return t + calc(q->interval, q->count);
+}
+
+static bool should_drop(struct sk_buff *skb, struct Qdisc *sch, codel_time_t now)
+{
+ struct codel_sched_data *q = qdisc_priv(sch);
+ codel_time_t sojourn_time;
+ bool drop;
+
+ if (!skb) {
+ q->first_above_time = 0;
+ return false;
+ }
+ sch->qstats.backlog -= qdisc_pkt_len(skb);
+ sojourn_time = now - get_enqueue_time(skb);
+
+ if (codel_time_before(sojourn_time, q->target) ||
+ sch->qstats.backlog < q->minbytes) {
+ /* went below so we'll stay below for at least q->interval */
+ q->first_above_time = 0;
+ return false;
+ }
+ drop = false;
+ if (q->first_above_time == 0) {
+ /* just went above from below. If we stay above
+ * for at least q->interval we'll say it's ok to drop
+ */
+ q->first_above_time = now + q->interval;
+ } else if (codel_time_after(now, q->first_above_time)) {
+ drop = true;
+ q->state1++;
+ }
+ return drop;
+}
+
+static void codel_drop(struct Qdisc *sch, struct sk_buff *skb)
+{
+ struct codel_sched_data *q = qdisc_priv(sch);
+
+ qdisc_drop(skb, sch);
+ q->drop_count++;
+}
+
+static struct sk_buff *codel_dequeue(struct Qdisc *sch)
+{
+ struct codel_sched_data *q = qdisc_priv(sch);
+ struct sk_buff *skb = __skb_dequeue(&sch->q);
+ codel_time_t now;
+ bool drop;
+
+ if (!skb) {
+ q->dropping = false;
+ return skb;
+ }
+ now = codel_get_time();
+ drop = should_drop(skb, sch, now);
+ if (q->dropping) {
+ if (!drop) {
+ /* sojourn time below target - leave dropping state */
+ q->dropping = false;
+ } else if (codel_time_after_eq(now, q->drop_next)) {
+ q->state2++;
+ /* It's time for the next drop. Drop the current
+ * packet and dequeue the next. The dequeue might
+ * take us out of dropping state.
+ * If not, schedule the next drop.
+ * A large backlog might result in drop rates so high
+ * that the next drop should happen now,
+ * hence the while loop.
+ */
+ while (q->dropping &&
+ codel_time_after_eq(now, q->drop_next)) {
+ codel_drop(sch, skb);
+ q->count++;
+ skb = __skb_dequeue(&sch->q);
+ if (!should_drop(skb, sch, now)) {
+ /* leave dropping state */
+ q->dropping = false;
+ } else {
+ /* and schedule the next drop */
+ q->drop_next =
+ control_law(q, q->drop_next);
+ }
+ }
+ }
+ } else if (drop &&
+ (codel_time_before(now - q->drop_next,
+ 16 * q->interval) ||
+ codel_time_after_eq(now - q->first_above_time,
+ 2 * q->interval))) {
+ codel_drop(sch, skb);
+ skb = __skb_dequeue(&sch->q);
+ drop = should_drop(skb, sch, now);
+ q->dropping = true;
+ q->state3++;
+ /*
+ * if min went above target close to when we last went below it
+ * assume that the drop rate that controlled the queue on the
+ * last cycle is a good starting point to control it now.
+ */
+ if (codel_time_after(now - q->drop_next, 16 * q->interval)) {
+// u32 c = min(q->count - 1, q->count - (q->count >> 4));
+ u32 c = q->count - 1;
+ q->count = max(1U, c);
+ } else {
+ q->count = 1;
+ }
+ q->drop_next = control_law(q, now);
+ }
+ q->states++;
+ /* We can't call qdisc_tree_decrease_qlen() if our qlen is 0,
+ * or HTB crashes. Defer it for next round.
+ */
+ if (q->drop_count && sch->q.qlen) {
+ qdisc_tree_decrease_qlen(sch, q->drop_count);
+ q->drop_count = 0;
+ }
+ if (skb)
+ qdisc_bstats_update(sch, skb);
+ return skb;
+}
+
+static int codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
+{
+ struct codel_sched_data *q;
+
+ if (likely(skb_queue_len(&sch->q) < sch->limit)) {
+ set_enqueue_time(skb);
+ return qdisc_enqueue_tail(skb, sch);
+ }
+ q = qdisc_priv(sch);
+ q->drop_overlimit++;
+ return qdisc_drop(skb, sch);
+}
+
+static const struct nla_policy codel_policy[TCA_CODEL_MAX + 1] = {
+ [TCA_CODEL_TARGET] = { .type = NLA_U32 },
+ [TCA_CODEL_LIMIT] = { .type = NLA_U32 },
+ [TCA_CODEL_MINBYTES] = { .type = NLA_U32 },
+ [TCA_CODEL_INTERVAL] = { .type = NLA_U32 },
+};
+
+static int codel_change(struct Qdisc *sch, struct nlattr *opt)
+{
+ struct codel_sched_data *q = qdisc_priv(sch);
+ struct nlattr *tb[TCA_CODEL_MAX + 1];
+ unsigned int qlen;
+ int err;
+
+ if (!opt)
+ return -EINVAL;
+
+ err = nla_parse_nested(tb, TCA_CODEL_MAX, opt, codel_policy);
+ if (err < 0)
+ return err;
+
+ sch_tree_lock(sch);
+ if (tb[TCA_CODEL_TARGET]) {
+ u32 target = nla_get_u32(tb[TCA_CODEL_TARGET]);
+
+ q->target = ((u64)target * NSEC_PER_USEC) >> CODEL_SHIFT;
+ }
+ if (tb[TCA_CODEL_INTERVAL]) {
+ u32 interval = nla_get_u32(tb[TCA_CODEL_INTERVAL]);
+
+ q->interval = ((u64)interval * NSEC_PER_USEC) >> CODEL_SHIFT;
+ }
+ if (tb[TCA_CODEL_LIMIT])
+ sch->limit = nla_get_u32(tb[TCA_CODEL_LIMIT]);
+
+ if (tb[TCA_CODEL_MINBYTES])
+ q->minbytes = nla_get_u32(tb[TCA_CODEL_MINBYTES]);
+
+ qlen = sch->q.qlen;
+ while (sch->q.qlen > sch->limit) {
+ struct sk_buff *skb = __skb_dequeue(&sch->q);
+
+ sch->qstats.backlog -= qdisc_pkt_len(skb);
+ qdisc_drop(skb, sch);
+ }
+ qdisc_tree_decrease_qlen(sch, qlen - sch->q.qlen);
+
+ q->drop_next = q->first_above_time = 0;
+ q->dropping = false;
+ sch_tree_unlock(sch);
+ return 0;
+}
+
+static int codel_init(struct Qdisc *sch, struct nlattr *opt)
+{
+ struct codel_sched_data *q = qdisc_priv(sch);
+
+ q->target = MS2TIME(5);
+ /* It should be possible to run with no limit,
+ * with infinite memory :)
+ */
+ sch->limit = DEFAULT_CODEL_LIMIT;
+ q->minbytes = psched_mtu(qdisc_dev(sch));
+ q->interval = MS2TIME(100);
+ q->drop_next = q->first_above_time = 0;
+ q->dropping = false; /* exit dropping state */
+ q->count = 1;
+ if (opt) {
+ int err = codel_change(sch, opt);
+
+ if (err)
+ return err;
+ }
+
+ if (sch->limit >= 1)
+ sch->flags |= TCQ_F_CAN_BYPASS;
+ else
+ sch->flags &= ~TCQ_F_CAN_BYPASS;
+
+ return 0;
+}
+
+static u32 codel_time_to_us(codel_time_t val)
+{
+ u64 valns = ((u64)val << CODEL_SHIFT);
+
+ do_div(valns, NSEC_PER_USEC);
+ return (u32)valns;
+}
+
+static int codel_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+ struct codel_sched_data *q = qdisc_priv(sch);
+ struct nlattr *opts;
+
+ opts = nla_nest_start(skb, TCA_OPTIONS);
+ if (opts == NULL)
+ goto nla_put_failure;
+ if (nla_put_u32(skb, TCA_CODEL_TARGET, codel_time_to_us(q->target)) ||
+ nla_put_u32(skb, TCA_CODEL_LIMIT, sch->limit) ||
+ nla_put_u32(skb, TCA_CODEL_INTERVAL, codel_time_to_us(q->interval)) ||
+ nla_put_u32(skb, TCA_CODEL_MINBYTES, q->minbytes))
+ goto nla_put_failure;
+
+ return nla_nest_end(skb, opts);
+
+nla_put_failure:
+ nla_nest_cancel(skb, opts);
+ return -1;
+}
+
+static int codel_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
+{
+ struct codel_sched_data *q = qdisc_priv(sch);
+ struct sk_buff *skb = skb_peek(&sch->q);
+ codel_time_t now = codel_get_time();
+ struct tc_codel_xstats st = {
+ .count = q->count,
+ .state1 = q->state1,
+ .state2 = q->state2,
+ .state3 = q->state3,
+ .states = q->states,
+ .drop_overlimit = q->drop_overlimit,
+ .delay = skb ? now - get_enqueue_time(skb) : 0,
+ .drop_next = q->drop_next ? q->drop_next - now : 0,
+ .dropping = q->dropping,
+ };
+
+ return gnet_stats_copy_app(d, &st, sizeof(st));
+}
+
+static void codel_reset(struct Qdisc *sch)
+{
+ struct codel_sched_data *q = qdisc_priv(sch);
+
+ qdisc_reset_queue(sch);
+ sch->q.qlen = 0;
+ q->dropping = false;
+ q->count = 1;
+}
+
+static struct Qdisc_ops codel_qdisc_ops __read_mostly = {
+ .id = "codel",
+ .priv_size = sizeof(struct codel_sched_data),
+
+ .enqueue = codel_enqueue,
+ .dequeue = codel_dequeue,
+ .peek = qdisc_peek_dequeued,
+ .init = codel_init,
+ .reset = codel_reset,
+ .change = codel_change,
+ .dump = codel_dump,
+ .dump_stats = codel_dump_stats,
+ .owner = THIS_MODULE,
+};
+
+static int __init codel_module_init(void)
+{
+ return register_qdisc(&codel_qdisc_ops);
+}
+static void __exit codel_module_exit(void)
+{
+ unregister_qdisc(&codel_qdisc_ops);
+}
+module_init(codel_module_init)
+module_exit(codel_module_exit)
+
+MODULE_DESCRIPTION("Controlled Delay queue discipline");
+MODULE_AUTHOR("Dave Taht");
+MODULE_AUTHOR("Eric Dumazet");
+MODULE_LICENSE("GPL");
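A quick way to see why the timestamp macros in the patch subtract before comparing: a standalone sketch (int32_t stands in for the kernel's 32-bit int) shows the ordering staying correct across the u32 wrap of the 1024ns clock:

```c
#include <stdint.h>

/* Sketch of the patch's wrap-safe time comparison (same trick as the
 * kernel's time_after()): subtract first, then test the sign, so two
 * timestamps on opposite sides of the u32 wrap still order correctly
 * as long as they are less than 2^31 ticks apart.
 */
typedef uint32_t codel_time_t;

#define codel_time_after(a, b)	((int32_t)(a) - (int32_t)(b) > 0)
#define codel_time_before(a, b)	((int32_t)(a) - (int32_t)(b) < 0)
```

With a u32 clock in 1024ns units the counter wraps roughly every 73 minutes, so this signed-difference form is what keeps drop_next comparisons valid indefinitely.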
* [Codel] [PATCH v8 iproute2] codel: Controlled Delay AQM
2012-05-05 21:30 ` Eric Dumazet
@ 2012-05-06 18:56 ` Eric Dumazet
0 siblings, 0 replies; 34+ messages in thread
From: Eric Dumazet @ 2012-05-06 18:56 UTC (permalink / raw)
To: Dave Taht; +Cc: codel, Dave Täht
print some additional stats and state (tc -s -d qdisc dev ...)
include/linux/pkt_sched.h | 25 +++++
tc/Makefile | 1
tc/q_codel.c | 160 ++++++++++++++++++++++++++++++++++++
3 files changed, 186 insertions(+)
diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index 410b33d..900323a 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -654,4 +654,29 @@ struct tc_qfq_stats {
__u32 lmax;
};
+/* CODEL */
+
+enum {
+ TCA_CODEL_UNSPEC,
+ TCA_CODEL_TARGET,
+ TCA_CODEL_LIMIT,
+ TCA_CODEL_MINBYTES,
+ TCA_CODEL_INTERVAL,
+ __TCA_CODEL_MAX
+};
+
+#define TCA_CODEL_MAX (__TCA_CODEL_MAX - 1)
+
+struct tc_codel_xstats {
+ __u32 count;
+ __u32 delay; /* time elapsed since next packet was queued (in us) */
+ __u32 drop_next;
+ __u32 drop_overlimit;
+ __u32 dropping;
+ __u32 state1;
+ __u32 state2;
+ __u32 state3;
+ __u32 states;
+};
+
#endif
diff --git a/tc/Makefile b/tc/Makefile
index be8cd5a..8a7cc8d 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -47,6 +47,7 @@ TCMODULES += em_cmp.o
TCMODULES += em_u32.o
TCMODULES += em_meta.o
TCMODULES += q_mqprio.o
+TCMODULES += q_codel.o
TCSO :=
ifeq ($(TC_CONFIG_ATM),y)
diff --git a/tc/q_codel.c b/tc/q_codel.c
new file mode 100644
index 0000000..0175e18
--- /dev/null
+++ b/tc/q_codel.c
@@ -0,0 +1,160 @@
+/*
+ * q_codel.c Codel.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors: Eric Dumazet <edumazet@google.com>
+ * Dave Taht <dave.taht@bufferbloat.net>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <syslog.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <string.h>
+
+#include "utils.h"
+#include "tc_util.h"
+
+static void explain(void)
+{
+ fprintf(stderr, "Usage: ... codel [ limit PACKETS ] [ target TIME ]\n");
+ fprintf(stderr, " [ interval TIME ] [ minbytes BYTES ]\n");
+}
+
+static int codel_parse_opt(struct qdisc_util *qu, int argc, char **argv,
+ struct nlmsghdr *n)
+{
+ unsigned limit = 0;
+ unsigned target = 0;
+ unsigned interval = 0;
+ unsigned minbytes = 0;
+ struct rtattr *tail;
+
+ while (argc > 0) {
+ if (strcmp(*argv, "limit") == 0) {
+ NEXT_ARG();
+ if (get_unsigned(&limit, *argv, 0)) {
+ fprintf(stderr, "Illegal \"limit\"\n");
+ return -1;
+ }
+ } else if (strcmp(*argv, "minbytes") == 0) {
+ NEXT_ARG();
+ if (get_unsigned(&minbytes, *argv, 0)) {
+ fprintf(stderr, "Illegal \"minbytes\"\n");
+ return -1;
+ }
+ } else if (strcmp(*argv, "target") == 0) {
+ NEXT_ARG();
+ if (get_time(&target, *argv)) {
+ fprintf(stderr, "Illegal \"target\"\n");
+ return -1;
+ }
+ } else if (strcmp(*argv, "interval") == 0) {
+ NEXT_ARG();
+ if (get_time(&interval, *argv)) {
+ fprintf(stderr, "Illegal \"interval\"\n");
+ return -1;
+ }
+ } else if (strcmp(*argv, "help") == 0) {
+ explain();
+ return -1;
+ } else {
+ fprintf(stderr, "What is \"%s\"?\n", *argv);
+ explain();
+ return -1;
+ }
+ argc--; argv++;
+ }
+
+ tail = NLMSG_TAIL(n);
+ addattr_l(n, 1024, TCA_OPTIONS, NULL, 0);
+ if (limit)
+ addattr_l(n, 1024, TCA_CODEL_LIMIT, &limit, sizeof(limit));
+ if (minbytes)
+ addattr_l(n, 1024, TCA_CODEL_MINBYTES, &minbytes, sizeof(minbytes));
+ if (interval)
+ addattr_l(n, 1024, TCA_CODEL_INTERVAL, &interval, sizeof(interval));
+ if (target)
+ addattr_l(n, 1024, TCA_CODEL_TARGET, &target, sizeof(target));
+ tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
+ return 0;
+}
+
+static int codel_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
+{
+ struct rtattr *tb[TCA_CODEL_MAX + 1];
+ unsigned limit;
+ unsigned interval;
+ unsigned target;
+ unsigned minbytes;
+ SPRINT_BUF(b1);
+
+ if (opt == NULL)
+ return 0;
+
+ parse_rtattr_nested(tb, TCA_CODEL_MAX, opt);
+
+ if (tb[TCA_CODEL_LIMIT] &&
+ RTA_PAYLOAD(tb[TCA_CODEL_LIMIT]) >= sizeof(__u32)) {
+ limit = rta_getattr_u32(tb[TCA_CODEL_LIMIT]);
+ fprintf(f, "limit %up ", limit);
+ }
+ if (tb[TCA_CODEL_MINBYTES] &&
+ RTA_PAYLOAD(tb[TCA_CODEL_MINBYTES]) >= sizeof(__u32)) {
+ minbytes = rta_getattr_u32(tb[TCA_CODEL_MINBYTES]);
+ fprintf(f, "minbytes %u ", minbytes);
+ }
+ if (tb[TCA_CODEL_TARGET] &&
+ RTA_PAYLOAD(tb[TCA_CODEL_TARGET]) >= sizeof(__u32)) {
+ target = rta_getattr_u32(tb[TCA_CODEL_TARGET]);
+ fprintf(f, "target %s ", sprint_time(target, b1));
+ }
+ if (tb[TCA_CODEL_INTERVAL] &&
+ RTA_PAYLOAD(tb[TCA_CODEL_INTERVAL]) >= sizeof(__u32)) {
+ interval = rta_getattr_u32(tb[TCA_CODEL_INTERVAL]);
+ fprintf(f, "interval %s ", sprint_time(interval, b1));
+ }
+
+ return 0;
+}
+
+static int codel_print_xstats(struct qdisc_util *qu, FILE *f,
+ struct rtattr *xstats)
+{
+ struct tc_codel_xstats *st;
+ SPRINT_BUF(b1);
+
+ if (xstats == NULL)
+ return 0;
+
+ if (RTA_PAYLOAD(xstats) < sizeof(*st))
+ return -1;
+
+ st = RTA_DATA(xstats);
+ fprintf(f, " count %u delay %s",
+ st->count, sprint_time(st->delay, b1));
+ if (st->dropping)
+ fprintf(f, " dropping");
+ if (st->drop_next)
+ fprintf(f, " drop_next %s", sprint_time(st->drop_next, b1));
+ fprintf(f, "\n drop_overlimit %u", st->drop_overlimit);
+ fprintf(f, " states %u : %u %u %u",
+ st->states, st->state1, st->state2, st->state3);
+ return 0;
+
+}
+
+struct qdisc_util codel_qdisc_util = {
+ .id = "codel",
+ .parse_qopt = codel_parse_opt,
+ .print_qopt = codel_print_opt,
+ .print_xstats = codel_print_xstats,
+};
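Once the module and the tc patch are in place, usage follows the options q_codel.c parses (limit, target, interval, minbytes). A hypothetical invocation (device name and values are illustrative, not from the patch):

```shell
# Attach codel as the root qdisc; target/interval take tc time units.
tc qdisc add dev eth0 root codel limit 1000 target 5ms interval 100ms

# Show config plus the xstats printed by codel_print_xstats
# (count, delay, dropping, drop_next, drop_overlimit, states).
tc -s -d qdisc show dev eth0
```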
* Re: [Codel] [PATCH v8] pkt_sched: codel: Controlled Delay AQM
2012-05-06 18:52 ` [Codel] [PATCH v8] " Eric Dumazet
@ 2012-05-06 19:51 ` Eric Dumazet
0 siblings, 0 replies; 34+ messages in thread
From: Eric Dumazet @ 2012-05-06 19:51 UTC (permalink / raw)
To: dave taht; +Cc: codel, Dave Täht
On Sun, 2012-05-06 at 20:53 +0200, Eric Dumazet wrote:
> Some stuff added for stats and timing based on u32 fields to ease time
> compare.
>
> An optimization for 64bit arches in calc() to avoid 16 loops to prescale
> values.
...
> +#define codel_time_after(a, b) ((int)(a) - (int)(b) > 0)
> +#define codel_time_after_eq(a, b) ((int)(a) - (int)(b) >= 0)
> +#define codel_time_before(a, b) ((int)(a) - (int)(b) < 0)
> +#define codel_time_before_eq(a, b) ((int)(a) - (int)(b) >= 0)
> +
before_eq() is wrong here (but not used in this file)
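The typo is easy to demonstrate standalone: as posted, before_eq() keeps the ">= 0" from after_eq() and so answers the opposite question (sketch below; the fix flips the comparison to "<= 0"):

```c
#include <stdint.h>

/* The macro as posted in v8 (pasted from after_eq, ">= 0" kept): */
#define codel_time_before_eq_buggy(a, b) ((int32_t)(a) - (int32_t)(b) >= 0)

/* What before_eq should compute: */
#define codel_time_before_eq_fixed(a, b) ((int32_t)(a) - (int32_t)(b) <= 0)
```

Harmless for now since nothing in sch_codel.c calls it, but worth fixing before someone does.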
Thread overview: 34+ messages
2012-05-05 11:34 [Codel] [PATCH 2/2] Clamp interval to 32 bits Dave Täht
2012-05-05 11:40 ` Dave Taht
2012-05-05 11:53 ` Eric Dumazet
2012-05-05 14:49 ` [Codel] [PATCH v5] pkt_sched: codel: Controlled Delay AQM Eric Dumazet
2012-05-05 16:11 ` Dave Taht
2012-05-05 17:07 ` Eric Dumazet
2012-05-05 17:22 ` Dave Taht
2012-05-05 18:54 ` [Codel] [PATCH iproute2] " Eric Dumazet
2012-05-05 19:08 ` Eric Dumazet
2012-05-05 21:30 ` Eric Dumazet
2012-05-06 18:56 ` [Codel] [PATCH v8 " Eric Dumazet
2012-05-05 20:20 ` [Codel] [PATCH v5] pkt_sched: " Eric Dumazet
2012-05-05 20:36 ` Eric Dumazet
2012-05-05 21:11 ` Eric Dumazet
2012-05-05 21:12 ` dave taht
2012-05-05 21:20 ` Eric Dumazet
2012-05-05 21:28 ` [Codel] [PATCH v6] " Eric Dumazet
2012-05-05 21:40 ` Eric Dumazet
2012-05-05 21:58 ` Eric Dumazet
2012-05-06 18:52 ` [Codel] [PATCH v8] " Eric Dumazet
2012-05-06 19:51 ` Eric Dumazet
2012-05-05 22:03 ` [Codel] [PATCH v5] " dave taht
2012-05-05 22:09 ` Eric Dumazet
2012-05-05 22:12 ` Eric Dumazet
2012-05-05 22:16 ` dave taht
2012-05-05 22:15 ` dave taht
2012-05-05 22:34 ` dave taht
2012-05-05 22:39 ` Eric Dumazet
2012-05-05 22:48 ` dave taht
2012-05-05 23:07 ` Eric Dumazet
2012-05-05 23:19 ` dave taht
2012-05-06 5:18 ` Eric Dumazet
2012-05-05 23:09 ` dave taht
2012-05-05 23:15 ` dave taht