From: Eric Dumazet
To: codel@lists.bufferbloat.net
Cc: Tomas Hruby, Nandita Dukkipati, netdev
Subject: [Codel] [RFC] fq_codel : interval servo on hosts
Date: Fri, 31 Aug 2012 06:50:31 -0700
Message-ID: <1346421031.2591.34.camel@edumazet-glaptop>
In-Reply-To: <1346396137.2586.301.camel@edumazet-glaptop>

On Thu, 2012-08-30 at 23:55 -0700, Eric Dumazet wrote:
> On locally generated TCP traffic (host), we can override the 100 ms
> interval value using the more accurate RTT estimation maintained by the
> TCP stack (tp->srtt).
>
> Datacenter workloads benefit from the shorter feedback loop (say, if the
> RTT is below 1 ms, we can react 100 times faster to congestion).
>
> Idea from Yuchung Cheng.
> The Linux patch would be the following:

I'll do tests next week, but I am sending a raw patch right now in case
anybody wants to try it.

Presumably we want to adjust the target as well.

To get more precise srtt values in the datacenter, we might avoid the
'one jiffie slack' on small values in tcp_rtt_estimator(), where we
currently force m to be 1 before the scaling by 8:

	if (m == 0)
		m = 1;

We only need to force the least significant bit of srtt to be set.
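Something along these lines (an untested sketch against tcp_rtt_estimator()
in net/ipv4/tcp_input.c, not part of the patch below):

	/* today, tiny samples are rounded up to one jiffy before the
	 * srtt update :
	 */
	if (m == 0)
		m = 1;

	/* instead we could leave m == 0 alone and only make sure the
	 * stored srtt never reads as zero, by setting its least
	 * significant bit after the update :
	 */
	tp->srtt |= 1;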
 net/sched/sch_fq_codel.c |   23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 9fc1c62..7d2fe35 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -25,6 +25,7 @@
 #include <net/pkt_sched.h>
 #include <net/flow_keys.h>
 #include <net/codel.h>
+#include <net/tcp.h>
 
 /*	Fair Queue CoDel.
  *
@@ -59,6 +60,7 @@ struct fq_codel_sched_data {
 	u32		perturbation;	/* hash perturbation */
 	u32		quantum;	/* psched_mtu(qdisc_dev(sch)); */
 	struct codel_params cparams;
+	codel_time_t	default_interval;
 	struct codel_stats cstats;
 	u32		drop_overlimit;
 	u32		new_flow_count;
@@ -211,6 +213,14 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	return NET_XMIT_SUCCESS;
 }
 
+/* Given TCP srtt evaluation, return codel interval.
+ * srtt is given in jiffies, scaled by 8.
+ */
+static codel_time_t tcp_srtt_to_codel(unsigned int srtt)
+{
+	return srtt * ((NSEC_PER_SEC >> (CODEL_SHIFT + 3)) / HZ);
+}
+
 /* This is the specific function called from codel_dequeue()
  * to dequeue a packet from queue. Note: backlog is handled in
  * codel, we dont need to reduce it here.
@@ -220,12 +230,21 @@ static struct sk_buff *dequeue(struct codel_vars *vars, struct Qdisc *sch)
 	struct fq_codel_sched_data *q = qdisc_priv(sch);
 	struct fq_codel_flow *flow;
 	struct sk_buff *skb = NULL;
+	struct sock *sk;
 
 	flow = container_of(vars, struct fq_codel_flow, cvars);
 	if (flow->head) {
 		skb = dequeue_head(flow);
 		q->backlogs[flow - q->flows] -= qdisc_pkt_len(skb);
 		sch->q.qlen--;
+		sk = skb->sk;
+		q->cparams.interval = q->default_interval;
+		if (sk && sk->sk_protocol == IPPROTO_TCP) {
+			u32 srtt = tcp_sk(sk)->srtt;
+
+			if (srtt)
+				q->cparams.interval = tcp_srtt_to_codel(srtt);
+		}
 	}
 	return skb;
 }
@@ -330,7 +349,7 @@ static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt)
 	if (tb[TCA_FQ_CODEL_INTERVAL]) {
 		u64 interval = nla_get_u32(tb[TCA_FQ_CODEL_INTERVAL]);
 
-		q->cparams.interval = (interval * NSEC_PER_USEC) >> CODEL_SHIFT;
+		q->default_interval = (interval * NSEC_PER_USEC) >> CODEL_SHIFT;
 	}
 
 	if (tb[TCA_FQ_CODEL_LIMIT])
@@ -441,7 +460,7 @@ static int fq_codel_dump(struct Qdisc *sch, struct sk_buff *skb)
 	    nla_put_u32(skb, TCA_FQ_CODEL_LIMIT,
 			sch->limit) ||
 	    nla_put_u32(skb, TCA_FQ_CODEL_INTERVAL,
-			codel_time_to_us(q->cparams.interval)) ||
+			codel_time_to_us(q->default_interval)) ||
 	    nla_put_u32(skb, TCA_FQ_CODEL_ECN,
 			q->cparams.ecn) ||
 	    nla_put_u32(skb, TCA_FQ_CODEL_QUANTUM,
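For reference, a quick userspace check of the srtt -> interval conversion
(constants and helpers are redefined locally so it builds outside the kernel
tree; assumes HZ=1000 and CODEL_SHIFT=10):

#include <stdio.h>

#define NSEC_PER_SEC	1000000000UL
#define NSEC_PER_USEC	1000UL
#define CODEL_SHIFT	10
#define HZ		1000

typedef unsigned int codel_time_t;	/* time in ns, shifted right by CODEL_SHIFT */

static codel_time_t tcp_srtt_to_codel(unsigned int srtt)
{
	/* srtt is in jiffies scaled by 8, hence the extra shift by 3 */
	return srtt * ((NSEC_PER_SEC >> (CODEL_SHIFT + 3)) / HZ);
}

static unsigned int codel_time_to_us(codel_time_t t)
{
	return ((unsigned long long)t << CODEL_SHIFT) / NSEC_PER_USEC;
}

int main(void)
{
	unsigned int srtt[] = { 8, 80, 800 };	/* ~1 ms, ~10 ms, ~100 ms smoothed RTT */
	int i;

	for (i = 0; i < 3; i++)
		printf("srtt=%u -> interval=%u codel units (~%u us)\n",
		       srtt[i], tcp_srtt_to_codel(srtt[i]),
		       codel_time_to_us(tcp_srtt_to_codel(srtt[i])));
	return 0;
}

A ~1 ms smoothed RTT maps to an interval of roughly 1 ms instead of the
100 ms default, which is the point for datacenter flows.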