From: Luigi Rizzo
Date: Fri, 1 Oct 2021 18:22:01 +0200
To: Dave Taht
Cc: "rpm@lists.bufferbloat.net", Ben Greear, Karl Auerbach, Bob McMahon, bloat
Subject: Re: [Rpm] [Bloat] Relentless congestion control for testing purposes
List-Id: revolutions per minute - a new metric for measuring responsiveness

On Wed, Sep 29, 2021 at 1:17 AM Dave Taht wrote:
>
> In today's rpm meeting I didn't quite manage to make a complicated
> point. This long-ago proposal
> of matt mathis's has often intrigued (inspired? frightened?) me:
>
> https://datatracker.ietf.org/doc/html/draft-mathis-iccrg-relentless-tcp-00
>
> where he proposed that a tcp variant have no response at all to loss
> or markings, merely
> replacing lost segments as they are requested, continually ramping up
> until the network
> basically explodes.

For a similar purpose, I use the following patch that ignores holes,
thus mostly defeating congestion control. The nice thing is that you
can just count sent and received bytes in the application to estimate
losses.

commit 9c429c9644f2fd22d5fe2b6f2d4df6fb2a8962b2
Author: Luigi Rizzo
Date:   Fri Oct 1 09:10:44 2021 -0700

    test: module parameter to ignore holes in TCP

    Sometimes, for testing, it is useful to let the TCP receiver ignore
    drops (and defeat congestion control) and accept all packets as if
    they were in sequence. This will show whether a connection is sender
    or receiver throttled.

    This patch implements the above with 3 /sys/module/tcp_input/parameters:

    - lossy_local_port, lossy_remote_port
      if non-zero, indicate that sockets matching one of these ports
      will be set to ignore drops. (This socket flag could be set with
      a setsockopt(), but would require changes in the caller);

    - drop_freq
      if non-zero, one every drop_freq packets will be artificially
      dropped on the receive side.

    Example

    echo 2345 > /sys/module/tcp_input/parameters/lossy_local_port
    echo 10 > /sys/module/tcp_input/parameters/drop_freq # drop one in 10
    ifconfig lo mtu 600
    nc -l 2345 > /tmp/a &
    MSG="this is a test this is a test this is a test count is"
    (for ((i=0;i<10000;i++)); do echo "${MSG} ${i}; $i"; done) | nc 127.0.0.1 2345

    The output file will have missing lines

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 48d8a363319e..3bb8888a56af 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -225,7 +225,8 @@ struct tcp_sock {
 	u8 compressed_ack;
 	u8 dup_ack_counter:2,
 		tlp_retrans:1,	/* TLP is a retransmission */
-		unused:5;
+		ignore_holes:1,	/* ignore holes on rx. test only */
+		unused:4;
 	u32 chrono_start;	/* Start time in jiffies of a TCP chrono */
 	u32 chrono_stat[3];	/* Time in jiffies for chrono_stat stats */
 	u8 chrono_type:2,	/* current chronograph type */
@@ -250,6 +251,7 @@ struct tcp_sock {
 	u32 tlp_high_seq;	/* snd_nxt at the time of TLP */
 
 	u32 tcp_tx_delay;	/* delay (in usec) added to TX packets */
+	u32 tcp_test_drops;	/* artificial packet drops. test only */
 	u64 tcp_wstamp_ns;	/* departure time for next sent data packet */
 	u64 tcp_clock_cache; /* cache last tcp_clock_ns() (see tcp_mstamp_refresh()) */
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 414c179c28e0..fd770a176a68 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2344,6 +2344,9 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
 		last = skb_peek_tail(&sk->sk_receive_queue);
 		skb_queue_walk(&sk->sk_receive_queue, skb) {
 			last = skb;
+			/* XXX if we allow holes, update copied_seq */
+			if (tp->ignore_holes && before(*seq, TCP_SKB_CB(skb)->seq))
+				*seq = TCP_SKB_CB(skb)->seq;
 			/* Now that we have two receive queues this
 			 * shouldn't happen.
 			 */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 246ab7b5e857..a5161ce78171 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -81,6 +81,14 @@
 #include
 #include
 
+#include
+static int drop_freq; /* drop one pkt every this many */
+module_param(drop_freq, int, 0644);
+static int lossy_local_port = 2345; /* drop pkts, ignore holes on this port */
+module_param(lossy_local_port, int, 0644);
+static int lossy_remote_port = 0; /* drop pkts, ignore holes on this port */
+module_param(lossy_remote_port, int, 0644);
+
 int sysctl_tcp_max_orphans __read_mostly = NR_FILE;
 
 #define FLAG_DATA 0x01 /* Incoming frame contained data. */
@@ -5806,6 +5814,16 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb)
 
 	tp->rx_opt.saw_tstamp = 0;
 
+	if (tp->ignore_holes) {
+		const u32 seq = TCP_SKB_CB(skb)->seq;
+		if (drop_freq && tp->tcp_test_drops++ >= drop_freq) {
+			tp->tcp_test_drops = 0;
+			goto discard; /* artificial drop */
+		}
+		if (after(seq, tp->rcv_nxt)) /* Pretend this is in order */
+			tcp_rcv_nxt_update(tp, seq);
+	}
+
 	/* pred_flags is 0xS?10 << 16 + snd_wnd
 	 * if header_prediction is to be made
 	 * 'S' will always be tp->tcp_header_len >> 2
@@ -5986,12 +6004,25 @@ void tcp_init_transfer(struct sock *sk, int bpf_op, struct sk_buff *skb)
 	tcp_init_buffer_space(sk);
 }
 
+/* Testing. Conditionally set tp->ignore_holes. Should be a setsockopt */
+static void tcp_update_ignore_holes(struct sock *sk)
+{
+	if (READ_ONCE(lossy_local_port) == ntohs(inet_sk(sk)->inet_sport) ||
+	    READ_ONCE(lossy_remote_port) == ntohs(inet_sk(sk)->inet_dport)) {
+		pr_info("XXX ignore holes for ports local %d remote %d\n",
+			ntohs(inet_sk(sk)->inet_sport),
+			ntohs(inet_sk(sk)->inet_dport));
+		tcp_sk(sk)->ignore_holes = 1;
+	}
+}
+
 void tcp_finish_connect(struct sock *sk, struct sk_buff *skb)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct inet_connection_sock *icsk = inet_csk(sk);
 
 	tcp_set_state(sk, TCP_ESTABLISHED);
+	tcp_update_ignore_holes(sk);
 	icsk->icsk_ack.lrcvtime = tcp_jiffies32;
 
 	if (skb) {
@@ -6476,6 +6507,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 		}
 		smp_mb();
 		tcp_set_state(sk, TCP_ESTABLISHED);
+		tcp_update_ignore_holes(sk);
 		sk->sk_state_change(sk);
 
 		/* Note, that this wakeup is only for marginal crossed SYN case.
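
The reply suggests estimating losses from application-level byte counts once the receiver ignores holes. A minimal sketch of that arithmetic in C, assuming both ends report the total payload bytes for the same connection; `estimate_loss()` is a hypothetical helper, not part of the patch. Because the patched receiver advances rcv_nxt past each hole instead of waiting for a retransmission, the bytes that never reach the application approximate the bytes dropped, so the loss fraction is (sent - received) / sent.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not in the patch): fraction of payload bytes lost
 * on a connection whose receiver ignores holes. Skipped bytes are never
 * delivered to the application, so they show up as a shortfall in
 * bytes_received relative to bytes_sent. */
double estimate_loss(uint64_t bytes_sent, uint64_t bytes_received)
{
	/* The receiver cannot deliver more than the sender wrote. */
	assert(bytes_received <= bytes_sent);
	if (bytes_sent == 0)
		return 0.0; /* nothing sent, nothing lost */
	return (double)(bytes_sent - bytes_received) / (double)bytes_sent;
}
```

For the shell example in the patch, the two counts could come from `wc -c` on the generated input (saved to a file first) and on /tmp/a.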
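
To see exactly which segments the drop_freq check in tcp_rcv_established() discards, its counter can be modelled in user space. This is a sketch that assumes every segment on a matching socket passes through the check once; `struct drop_model`, `model_should_drop()` and `count_drops()` are hypothetical names, while the condition itself is copied from the patch. With the post-increment and `>=` comparison as written, the counter climbs from 0 to drop_freq before the next segment is discarded, so in this model one segment in every drop_freq + 1 is dropped, and drop_freq = 0 disables the artificial drops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the per-socket state in the patch. */
struct drop_model {
	uint32_t test_drops;	/* mirrors tp->tcp_test_drops (a u32) */
};

/* Same condition as the patch:
 *   if (drop_freq && tp->tcp_test_drops++ >= drop_freq) { ...; goto discard; }
 * The cast reproduces the usual arithmetic conversion the kernel's
 * u32-versus-int comparison undergoes. */
bool model_should_drop(struct drop_model *m, int drop_freq)
{
	if (drop_freq && m->test_drops++ >= (uint32_t)drop_freq) {
		m->test_drops = 0;
		return true;	/* artificial drop */
	}
	return false;
}

/* Count how many of n_segments consecutive segments would be discarded. */
unsigned count_drops(unsigned n_segments, int drop_freq)
{
	struct drop_model m = { .test_drops = 0 };
	unsigned drops = 0;

	for (unsigned i = 0; i < n_segments; i++) {
		if (model_should_drop(&m, drop_freq))
			drops++;
		/* Between drops the counter only climbs from 0 to drop_freq. */
		assert(drop_freq <= 0 || m.test_drops <= (uint32_t)drop_freq);
	}
	return drops;
}
```

Because the counter lives in struct tcp_sock, each connection keeps its own cycle.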