From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qk0-x244.google.com (mail-qk0-x244.google.com [IPv6:2607:f8b0:400d:c09::244]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by lists.bufferbloat.net (Postfix) with ESMTPS id B61123B2A4 for ; Thu, 17 May 2018 13:52:18 -0400 (EDT) Received: by mail-qk0-x244.google.com with SMTP id h19-v6so4314810qkj.10 for ; Thu, 17 May 2018 10:52:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-transfer-encoding; bh=zjGpEMXQGut05xSde++KTkF7oHm87V79guNkMSaFfkQ=; b=rPIV7naIz7W5Ws5NcW0/LRxznBBYHZqGbAHeItKjl5aXcROUj4XZ9NFQkYUnwkwz07 vWRJwVPP8kp8iFTf3MvQEibncL4+9ToEB1s+AIMyv6++5QF2r2WJwvzBTw7gXtf/YKNZ aXPa1i8EL/+KygwkzBWwMkkzOCtGQsO5isFQF2oLkkN406NEU5DEeijOR9vg9pZfaQ/V mgvvhNpQR2RQTXjwnzYE6B1ObL/HfK+S+MrCEh3IfPy/geSknOPeJpyImpoek1msh5hX Ru6oRgsAuYIhq53O7PyBygS0RRF9s5/q6RZz8LmY9i9Ek8GTUC2KBpQGPmzTmEFcTzZ3 K+PA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:content-transfer-encoding; bh=zjGpEMXQGut05xSde++KTkF7oHm87V79guNkMSaFfkQ=; b=AlTd5pl5Ey5nH55rQFdpKY6UqdvNsMygnk0gpgMO6rpLw4J8pDAtBvI61b1FF8qVoI Ka5wlM3UgL1LJg7C7ymWf3rtj2mukLuLJNcF2428iOWjMi/JudgER2VW5/8KpU3F4Asp pfa6aXyZfxheGErmgjf89GYIvTJlU0qQxLleDv7zqCtcJLzb2A14LV3joAwelTesleuv H9sqfcUU3RoR48X5wKAHg2K9+HHroc9LbCkMoqLzteRsHYr562RHdwosYzJSm4+fOH0g aYjouELEIy8Fh8qDw/6jfOV/IMRkVzG+Fm7pbsHDhAy8vGaZYf8xrd4MDf7ZKjFuzD+h 1kPg== X-Gm-Message-State: ALKqPwcCq8Gv+G1jq/Ec3As4SqMW3Bq4PtwOV9GFF231pawT/nIlSS4k ZVMXTOkOhOpT47TiM4lbinJDphqYluN2FyXPjW0= X-Google-Smtp-Source: AB8JxZrT4Zgn4pRnxgdZ2uLXCYqFZh92AIGVQBuCZb54HbU0sAhacChiFM2D5NudWpCge1RCaF63XEhJtoWyNVT7YU4= X-Received: by 2002:a37:b4c5:: with SMTP id d188-v6mr5936384qkf.36.1526579538093; Thu, 17 May 2018 10:52:18 -0700 (PDT) MIME-Version: 1.0 Received: by 10.12.249.70 with HTTP; Thu, 17 May 2018 10:52:17 -0700 (PDT) In-Reply-To: <58bcf9c0-e4f0-691d-8d6a-40ff3629f500@gmail.com> References: <20180517121213.43559-1-edumazet@google.com> <20180517121213.43559-4-edumazet@google.com> <58bcf9c0-e4f0-691d-8d6a-40ff3629f500@gmail.com> From: Dave Taht Date: Thu, 17 May 2018 10:52:17 -0700 Message-ID: To: make-wifi-fast@lists.bufferbloat.net Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Subject: [Make-wifi-fast] Fwd: [PATCH net-next 3/4] tcp: add SACK compression X-BeenThere: make-wifi-fast@lists.bufferbloat.net X-Mailman-Version: 2.1.20 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 May 2018 17:52:18 -0000 ---------- Forwarded message ---------- From: Eric Dumazet Date: Thu, May 17, 2018 at 8:40 AM Subject: Re: [PATCH net-next 3/4] tcp: add SACK compression To: Neal Cardwell , Eric Dumazet Cc: David Miller , Netdev , Toke H=C3=B8iland-J=C3=B8rgensen , Yuchung Cheng , Soheil Hassas Yeganeh On 05/17/2018 08:14 AM, Neal Cardwell wrote: > On Thu, May 17, 2018 at 8:12 AM Eric Dumazet wrote: > >> When TCP receives an out-of-order packet, it immediately sends >> a SACK packet, generating network load but also forcing the >> receiver to send 1-MSS pathological packets, increasing its >> RTX queue length/depth, and thus processing time. > >> Wifi networks suffer from this aggressive behavior, but generally >> speaking, all these SACK packets add fuel to the fire when networks >> are under congestion. > >> This patch adds a high resolution timer and tp->compressed_ack counter. > >> Instead of sending a SACK, we program this timer with a small delay, >> based on SRTT and capped to 2.5 ms : delay =3D min ( 5 % of SRTT, 2.5 ms= ) > ... > > Very nice. Thanks for implementing this, Eric! I was wondering if the > constants here might be worth some discussion/elaboration. > >> @@ -5074,6 +5076,7 @@ static inline void tcp_data_snd_check(struct sock > *sk) >> static void __tcp_ack_snd_check(struct sock *sk, int ofo_possible) >> { >> struct tcp_sock *tp =3D tcp_sk(sk); >> + unsigned long delay; > >> /* More than one full frame received... */ >> if (((tp->rcv_nxt - tp->rcv_wup) > inet_csk(sk)->icsk_ack.rcv_m= ss > && >> @@ -5085,15 +5088,31 @@ static void __tcp_ack_snd_check(struct sock *sk, > int ofo_possible) >> (tp->rcv_nxt - tp->copied_seq < sk->sk_rcvlowat || >> __tcp_select_window(sk) >=3D tp->rcv_wnd)) || >> /* We ACK each frame or... */ >> - tcp_in_quickack_mode(sk) || >> - /* We have out of order data. */ >> - (ofo_possible && !RB_EMPTY_ROOT(&tp->out_of_order_queue))) { >> - /* Then ack it now */ >> + tcp_in_quickack_mode(sk)) { >> +send_now: >> tcp_send_ack(sk); >> - } else { >> - /* Else, send delayed ack. */ >> + return; >> + } >> + >> + if (!ofo_possible || RB_EMPTY_ROOT(&tp->out_of_order_queue)) { >> tcp_send_delayed_ack(sk); >> + return; >> } >> + >> + if (!tcp_is_sack(tp) || tp->compressed_ack >=3D 127) >> + goto send_now; >> + tp->compressed_ack++; > > Is there a particular motivation for the cap of 127? IMHO 127 ACKs is qui= te > a few to compress. Experience seems to show that it works well to have on= e > GRO ACK for ~64KBytes that triggers a single TSO skb of ~64KBytes. It mig= ht > be nice to try to match those dynamics in this SACK compression case, so = it > might be nice to cap the number of compressed ACKs at something like 44? > (0xffff / 1448 - 1). That way for high-speed paths we could try to keep > the ACK clock going with ACKs for ~64KBytes that trigger a single TSO skb > of ~64KBytes, no matter whether we are sending SACKs or cumulative ACKs. 127 was chosen because the field is u8, and since skb allocation for the AC= K can fail, we could have cases were the field goes above 127. Ultimately, I believe a followup patch would add a sysctl, so that we can fine-tune this, and eventually disable ACK compression if this sysctl is set to 0 > >> + >> + if (hrtimer_is_queued(&tp->compressed_ack_timer)) >> + return; >> + >> + /* compress ack timer : 5 % of srtt, but no more than 2.5 ms */ >> + >> + delay =3D min_t(unsigned long, 2500 * NSEC_PER_USEC, >> + tp->rcv_rtt_est.rtt_us * (NSEC_PER_USEC >> 3)/20); > > Any particular motivation for the 2.5ms here? It might be nice to match t= he > existing TSO autosizing dynamics and use 1ms here instead of having a > separate new constant of 2.5ms. Smaller time scales here should lead to > less burstiness and queue pressure from data packets in the network, and = we > know from experience that the CPU overhead of 1ms chunks is acceptable. This came from my tests on wifi really :) I also had the idea to make this threshold adjustable for wifi, like we did for sk_pacing_shift. (On wifi, we might want to increase the max delay between ACK) So maybe use 1ms delay, when sk_pacing_shift =3D=3D 10, but increase it if sk_pacing_shift has been lowered. --=20 Dave T=C3=A4ht CEO, TekLibre, LLC http://www.teklibre.com Tel: 1-669-226-2619