From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qc0-f171.google.com (mail-qc0-f171.google.com [209.85.216.171]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority" (verified OK)) by huchra.bufferbloat.net (Postfix) with ESMTPS id 4EC6420061E for ; Thu, 3 May 2012 13:47:57 -0700 (PDT) Received: by qcsp15 with SMTP id p15so2631854qcs.16 for ; Thu, 03 May 2012 13:47:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:message-id:date:from:organization:user-agent:mime-version:to :cc:subject:references:in-reply-to:content-type :content-transfer-encoding; bh=65GTdEEfG7rlPEBqRq9dZMJ7NiBd5+u45ENl9oRKvgE=; b=N9zURxCYr1wtS9nEs0MKYWUt5mia1f0+AvQWBXlBKkb0cTHJMrTF/MyrMF+y91Sb19 SqFoIUm7drXtKHJfu+xviMTWNYDuFuS4X7ClUEnbBL1mCZoIPVojyGNP4L9E1jl4lOQY HXzwmNSO+F1zO/TanvvP0SYvLiYzu3BwdGXfoavUkmJ+AdjmZqgweAeoRUBVemKBtZCd 9Wi8nGuywy/S+f4/6tg9wHAEMfP3t3I53oMuOQHCcRrVJQiGWTVBIIwk/c2p05P72PYW kHO4/sXA4NO+T0GvTxxLJLwdToCdUfC54SKlq3Dn/16SzcHbAsbWXhnW3FEiP1Lnb07H Uh9g== Received: by 10.224.27.5 with SMTP id g5mr6373307qac.2.1336078076033; Thu, 03 May 2012 13:47:56 -0700 (PDT) Received: from [192.168.1.81] (c-50-138-170-142.hsd1.ma.comcast.net. [50.138.170.142]) by mx.google.com with ESMTPS id gw8sm10923383qab.7.2012.05.03.13.47.54 (version=SSLv3 cipher=OTHER); Thu, 03 May 2012 13:47:54 -0700 (PDT) Sender: Jim Gettys Message-ID: <4FA2EEF8.2070109@freedesktop.org> Date: Thu, 03 May 2012 16:47:52 -0400 From: Jim Gettys Organization: Bell Labs User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120329 Thunderbird/11.0.1 MIME-Version: 1.0 To: =?ISO-8859-1?Q?Dave_T=E4ht?= References: <1336067533-16923-1-git-send-email-dave.taht@bufferbloat.net> <1336067533-16923-2-git-send-email-dave.taht@bufferbloat.net> In-Reply-To: <1336067533-16923-2-git-send-email-dave.taht@bufferbloat.net> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit Cc: codel@lists.bufferbloat.net Subject: Re: [Codel] [PATCH] Preliminary codel implementation X-BeenThere: codel@lists.bufferbloat.net X-Mailman-Version: 2.1.13 Precedence: list List-Id: CoDel AQM discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 May 2012 20:47:58 -0000 On 05/03/2012 01:52 PM, Dave Täht wrote: > --- > include/linux/pkt_sched.h | 29 +++++ > net/sched/Kconfig | 11 ++ > net/sched/Makefile | 1 + > net/sched/sch_codel.c | 296 +++++++++++++++++++++++++++++++++++++++++++++ > 4 files changed, 337 insertions(+) > create mode 100644 net/sched/sch_codel.c > > diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h > index 0d5b793..c21b720 100644 > --- a/include/linux/pkt_sched.h > +++ b/include/linux/pkt_sched.h > @@ -633,4 +633,33 @@ struct tc_qfq_stats { > __u32 lmax; > }; > > +/* CODEL */ > + > +enum { > + TCA_CODEL_UNSPEC, > + TCA_CODEL_PARMS, > + TCA_CODEL_TARGET, > + TCA_CODEL_DEPTH, > + TCA_CODEL_MINBYTES, > + TCA_CODEL_INTERVAL, > + __TCA_CODEL_MAX > +}; > + > +#define TCA_CODEL_MAX (__TCA_CODEL_MAX - 1) > +#define TC_CODEL_ECN 1 > + > +struct tc_codel_qopt { > + __u32 flags; /* flags (e.g. ecn) */ > + __u32 target; /* max delay, in us */ > + __u32 depth; /* queue depth in packets */ > + __u32 minbytes; /* MTU (usually) */ > + __u32 interval; /* Sliding min time window width (us) */ > +}; > + > +struct tc_codel_stats { > + __u64 drops; > + __u64 marks; > +}; > + > + > #endif > diff --git a/net/sched/Kconfig b/net/sched/Kconfig > index 2590e91..8106c42 100644 > --- a/net/sched/Kconfig > +++ b/net/sched/Kconfig > @@ -250,6 +250,17 @@ config NET_SCH_QFQ > > If unsure, say N. > > +config NET_SCH_CODEL > + tristate "Controlled Delay AQM (CODEL)" > + help > + Say Y here if you want to use the Controlled Delay (CODEL) > + packet scheduling algorithm. > + > + To compile this driver as a module, choose M here: the module > + will be called sch_codel. > + > + If unsure, say N. > + > config NET_SCH_INGRESS > tristate "Ingress Qdisc" > depends on NET_CLS_ACT > diff --git a/net/sched/Makefile b/net/sched/Makefile > index dc5889c..41130b5 100644 > --- a/net/sched/Makefile > +++ b/net/sched/Makefile > @@ -36,6 +36,7 @@ obj-$(CONFIG_NET_SCH_DRR) += sch_drr.o > obj-$(CONFIG_NET_SCH_MQPRIO) += sch_mqprio.o > obj-$(CONFIG_NET_SCH_CHOKE) += sch_choke.o > obj-$(CONFIG_NET_SCH_QFQ) += sch_qfq.o > +obj-$(CONFIG_NET_SCH_CODEL) += sch_codel.o > > obj-$(CONFIG_NET_CLS_U32) += cls_u32.o > obj-$(CONFIG_NET_CLS_ROUTE4) += cls_route.o > diff --git a/net/sched/sch_codel.c b/net/sched/sch_codel.c > new file mode 100644 > index 0000000..159df47 > --- /dev/null > +++ b/net/sched/sch_codel.c > @@ -0,0 +1,296 @@ > +/* > + * net/sched/sch_codel.c A Codel implementation > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public License > + * as published by the Free Software Foundation; either version > + * 2 of the License, or (at your option) any later version. > + * > + * Based on ns2 simulation code presented by Kathie Nichols > + * Authors: Dave Täht > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#define MS2TIME(a) (ns_to_ktime( (u64) a*1000000)) > +#define DEFAULT_CODEL_DEPTH 1000 > + > +/* Per-queue state (codel_queue_t instance variables) */ > + > +struct codel_sched_data { > + u32 flags; > + u32 minbytes; > + u32 count; /* packets dropped since we went into drop state */ > + bool dropping; /* 1 if in drop state, might just add to flags */ > + ktime_t target; > + ktime_t interval; > + /* time to declare above q->target (0 if below)*/ > + ktime_t first_above_time; > + ktime_t drop_next; /* time to drop next packet */ > +}; > + > +struct codel_skb_cb { > + ktime_t enqueue_time; > + char data[16]; > +}; > + > +static inline struct codel_skb_cb *get_codel_cb(const struct sk_buff *skb) > +{ > + return (struct codel_skb_cb *)skb->cb; > +} > + > +static inline ktime_t get_enqueue_time(const struct sk_buff *skb) { > + struct codel_skb_cb *cb = get_codel_cb(skb); > + return cb->enqueue_time; > +} > + > +static inline ktime_t set_enqueue_time(struct sk_buff *skb, ktime_t t ) { > + struct codel_skb_cb *cb = get_codel_cb(skb); > + cb->enqueue_time = t; > + return t; > +} > + > +static inline ktime_t control_law(const struct codel_sched_data *q, ktime_t t) > +{ > + t.tv64 = t.tv64 + q->interval.tv64 / int_sqrt(q->count); > + return t; > +} There is a note in the article at this point in its exposition of the pseudo code: "Note that for embedded systems or kernel implementation that the inverse *sqrt* can be computed efficiently using only integer multiplication." That will likely be faster than the division by sqrt; if/when this optimisation is done, we should leave comments about why it is division by a square root, to correspond to TCP's control law. Before we upstream this, I'd recommend taking the comments documenting the algorithm from the article and including them. I think Kathie had to trim on the slides. - Jim