[Make-wifi-fast] Fwd: [PATCH net-next 3/4] tcp: add SACK compression

Dave Taht dave.taht at gmail.com
Thu May 17 13:52:17 EDT 2018


---------- Forwarded message ----------
From: Eric Dumazet <eric.dumazet at gmail.com>
Date: Thu, May 17, 2018 at 8:40 AM
Subject: Re: [PATCH net-next 3/4] tcp: add SACK compression
To: Neal Cardwell <ncardwell at google.com>, Eric Dumazet <edumazet at google.com>
Cc: David Miller <davem at davemloft.net>, Netdev
<netdev at vger.kernel.org>, Toke Høiland-Jørgensen <toke at toke.dk>,
Yuchung Cheng <ycheng at google.com>, Soheil Hassas Yeganeh
<soheil at google.com>




On 05/17/2018 08:14 AM, Neal Cardwell wrote:
> On Thu, May 17, 2018 at 8:12 AM Eric Dumazet <edumazet at google.com> wrote:
>
>> When TCP receives an out-of-order packet, it immediately sends
>> a SACK packet, generating network load but also forcing the
>> receiver to send 1-MSS pathological packets, increasing its
>> RTX queue length/depth, and thus processing time.
>
>> Wifi networks suffer from this aggressive behavior, but generally
>> speaking, all these SACK packets add fuel to the fire when networks
>> are under congestion.
>
>> This patch adds a high resolution timer and tp->compressed_ack counter.
>
>> Instead of sending a SACK, we program this timer with a small delay,
>> based on SRTT and capped to 2.5 ms : delay = min ( 5 % of SRTT, 2.5 ms)
> ...
>
> Very nice. Thanks for implementing this, Eric! I was wondering if the
> constants here might be worth some discussion/elaboration.
>
>> @@ -5074,6 +5076,7 @@ static inline void tcp_data_snd_check(struct sock *sk)
>>   static void __tcp_ack_snd_check(struct sock *sk, int ofo_possible)
>>   {
>>          struct tcp_sock *tp = tcp_sk(sk);
>> +       unsigned long delay;
>
>>              /* More than one full frame received... */
>>          if (((tp->rcv_nxt - tp->rcv_wup) > inet_csk(sk)->icsk_ack.rcv_mss &&
>> @@ -5085,15 +5088,31 @@ static void __tcp_ack_snd_check(struct sock *sk, int ofo_possible)
>>              (tp->rcv_nxt - tp->copied_seq < sk->sk_rcvlowat ||
>>               __tcp_select_window(sk) >= tp->rcv_wnd)) ||
>>              /* We ACK each frame or... */
>> -           tcp_in_quickack_mode(sk) ||
>> -           /* We have out of order data. */
>> -           (ofo_possible && !RB_EMPTY_ROOT(&tp->out_of_order_queue))) {
>> -               /* Then ack it now */
>> +           tcp_in_quickack_mode(sk)) {
>> +send_now:
>>                  tcp_send_ack(sk);
>> -       } else {
>> -               /* Else, send delayed ack. */
>> +               return;
>> +       }
>> +
>> +       if (!ofo_possible || RB_EMPTY_ROOT(&tp->out_of_order_queue)) {
>>                  tcp_send_delayed_ack(sk);
>> +               return;
>>          }
>> +
>> +       if (!tcp_is_sack(tp) || tp->compressed_ack >= 127)
>> +               goto send_now;
>> +       tp->compressed_ack++;
>
> Is there a particular motivation for the cap of 127? IMHO 127 ACKs is quite
> a few to compress. Experience seems to show that it works well to have one
> GRO ACK for ~64KBytes that triggers a single TSO skb of ~64KBytes. It might
> be nice to try to match those dynamics in this SACK compression case, so it
> might be nice to cap the number of compressed ACKs at something like 44?
> (0xffff / 1448 - 1).  That way for high-speed paths we could try to keep
> the ACK clock going with ACKs for ~64KBytes that trigger a single TSO skb
> of ~64KBytes, no matter whether we are sending SACKs or cumulative ACKs.

127 was chosen because the field is a u8, and since skb allocation for the ACK
can fail, we could have cases where the field goes above 127.

Ultimately, I believe a follow-up patch would add a sysctl, so that we can
fine-tune this, and eventually disable ACK compression when the sysctl is set to 0.

>
>> +
>> +       if (hrtimer_is_queued(&tp->compressed_ack_timer))
>> +               return;
>> +
>> +       /* compress ack timer : 5 % of srtt, but no more than 2.5 ms */
>> +
>> +       delay = min_t(unsigned long, 2500 * NSEC_PER_USEC,
>> +                     tp->rcv_rtt_est.rtt_us * (NSEC_PER_USEC >> 3)/20);
>
> Any particular motivation for the 2.5ms here? It might be nice to match the
> existing TSO autosizing dynamics and use 1ms here instead of having a
> separate new constant of 2.5ms. Smaller time scales here should lead to
> less burstiness and queue pressure from data packets in the network, and we
> know from experience that the CPU overhead of 1ms chunks is acceptable.

This came from my tests on wifi really :)

I also had the idea to make this threshold adjustable for wifi, like
we did for sk_pacing_shift.

(On wifi, we might want to increase the max delay between ACKs.)

So maybe use a 1 ms delay when sk_pacing_shift == 10, but increase it if
sk_pacing_shift has been lowered.





-- 

Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619

