* [Bloat] Relentless congestion control for testing purposes
From: Dave Taht @ 2021-09-28 23:17 UTC (permalink / raw)
To: rpm; +Cc: Ben Greear, Bob McMahon, bloat, Karl Auerbach

In today's rpm meeting I didn't quite manage to make a complicated
point. This long-ago proposal of matt mathis's has often intrigued
(inspired? frightened?) me:

https://datatracker.ietf.org/doc/html/draft-mathis-iccrg-relentless-tcp-00

where he proposed that a tcp variant have no response at all to loss or
markings, merely replacing lost segments as they are requested,
continually ramping up until the network basically explodes.

In the context of *testing* bidirectional network behaviors in
particular, seeing tcp tested in more labs, rather than just unicast
udp, has long been on my mind. Also, I have a long-held desire to more
quickly and easily determine the correct behavior (or existence) of a
particular aqm in a given implementation, so the predictability of such
an approach has appeal.

Having a well-defined mis-behavior of being "relentless" then lets
other variables along the path, such as the client's particular tcp
acking methods, ack filtering at the bottleneck, request/grant delays
on a wifi AP/client setup, etc., be more visible.

This particular approach might actually find multiple bottlenecks,
until the ack channel itself gets backlogged. That too is
interesting... and a test using this method to probe for available
bandwidth would complete quickly and produce a more reliable result.

Policer and ddos defense designers would find such behavior useful to
test against, also.

I return now to scraping americium out of my private collection of
smoke detectors with my teeth so as to repower my boat with a small
nuclear reactor.

(the above statement is a joke, but I'm kind of serious about
everything else. I think. Do any commercial test tools have such a
tcp?)
--
Fixing Starlink's Latencies: https://www.youtube.com/watch?v=c9gLo6Xrwgw
Dave Täht CEO, TekLibre, LLC

^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Bloat] Relentless congestion control for testing purposes
From: Jonathan Morton @ 2021-09-29 1:37 UTC (permalink / raw)
To: Dave Taht; +Cc: rpm, Ben Greear, Karl Auerbach, Bob McMahon, bloat

> On 29 Sep, 2021, at 2:17 am, Dave Taht <dave.taht@gmail.com> wrote:
>
> In today's rpm meeting I didn't quite manage to make a complicated
> point. This long-ago proposal of matt mathis's has often intrigued
> (inspired? frightened?) me:
>
> https://datatracker.ietf.org/doc/html/draft-mathis-iccrg-relentless-tcp-00
>
> where he proposed that a tcp variant have no response at all to loss
> or markings, merely replacing lost segments as they are requested,
> continually ramping up until the network basically explodes.

I think "no response at all" is overstating it. Right in the abstract,
it is described as removing the lost segments from the cwnd; i.e. only
acked segments result in new segments being transmitted (modulo the
2-segment minimum). In this sense, Relentless TCP is an AIAD algorithm
much like DCTCP, to be classified distinctly from Reno (AIMD) and
Scalable TCP (MIMD). From the draft:

  Relentless congestion control is a simple modification that can be
  applied to almost any AIMD style congestion control: instead of
  applying a multiplicative reduction to cwnd after a loss, cwnd is
  reduced by the number of lost segments. It can be modeled as a strict
  implementation of van Jacobson's Packet Conservation Principle.
  During recovery, new segments are injected into the network in exact
  accordance with the segments that are reported to have been delivered
  to the receiver by the returning ACKs.

Obviously, an AIAD congestion control would not coexist nicely with
AIMD-based traffic. We know this directly from experience with DCTCP.
It cannot therefore be recommended for general use on the Internet.
This is acknowledged extensively in Mathis' draft.

> In the context of *testing* bidirectional network behaviors in
> particular, seeing tcp tested in more labs, rather than just unicast
> udp, has long been on my mind.

Yes, as a tool specifically for testing with, and distributed with
copious warnings against attempting to use it more generally, this
might be interesting.

 - Jonathan Morton
* Re: [Bloat] Relentless congestion control for testing purposes
From: Luigi Rizzo @ 2021-10-01 16:22 UTC (permalink / raw)
To: Dave Taht; +Cc: rpm, Ben Greear, Karl Auerbach, Bob McMahon, bloat

On Wed, Sep 29, 2021 at 1:17 AM Dave Taht <dave.taht@gmail.com> wrote:
>
> In today's rpm meeting I didn't quite manage to make a complicated
> point. This long-ago proposal of matt mathis's has often intrigued
> (inspired? frightened?) me:
>
> https://datatracker.ietf.org/doc/html/draft-mathis-iccrg-relentless-tcp-00
>
> where he proposed that a tcp variant have no response at all to loss
> or markings, merely replacing lost segments as they are requested,
> continually ramping up until the network basically explodes.

For a similar purpose, I use the following patch that ignores holes,
thus mostly defeating congestion control.

The nice thing is that you can just count sent and received bytes
in the application to estimate losses.

commit 9c429c9644f2fd22d5fe2b6f2d4df6fb2a8962b2
Author: Luigi Rizzo <lrizzo@google.com>
Date:   Fri Oct 1 09:10:44 2021 -0700

    test: module parameter to ignore holes in TCP

    Sometimes, for testing, it is useful to let the TCP receiver ignore
    drops (and defeat congestion control) and accept all packets as if
    they were in sequence. This will show whether a connection is sender
    or receiver throttled.

    This patch implements the above with 3 /sys/module/tcp_input/parameters:
    - lossy_local_port, lossy_remote_port
      if non zero, indicate that sockets matching one of these ports
      will be set to ignore drops. (This socket flag could be set with a
      setsockopt(), but that would require changes in the caller);
    - drop_freq
      if non zero, one in every drop_freq packets will be artificially
      dropped on the receive side.

    Example:
        echo 2345 > /sys/module/tcp_input/parameters/lossy_local_port
        echo 10 > /sys/module/tcp_input/parameters/drop_freq  # drop one in 10

        ifconfig lo mtu 600
        nc -l 2345 > /tmp/a &
        MSG="this is a test this is a test this is a test count is"
        (for ((i=0;i<10000;i++)); do echo "${MSG} ${i}; $i"; done) | nc 127.0.0.1 2345

    The output file will have missing lines.

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 48d8a363319e..3bb8888a56af 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -225,7 +225,8 @@ struct tcp_sock {
 	u8	compressed_ack;
 	u8	dup_ack_counter:2,
 		tlp_retrans:1,	/* TLP is a retransmission */
-		unused:5;
+		ignore_holes:1,	/* ignore holes on rx. test only */
+		unused:4;
 	u32	chrono_start;	/* Start time in jiffies of a TCP chrono */
 	u32	chrono_stat[3];	/* Time in jiffies for chrono_stat stats */
 	u8	chrono_type:2,	/* current chronograph type */
@@ -250,6 +251,7 @@ struct tcp_sock {
 	u32	tlp_high_seq;	/* snd_nxt at the time of TLP */
 
 	u32	tcp_tx_delay;	/* delay (in usec) added to TX packets */
+	u32	tcp_test_drops;	/* artificial packet drops. test only */
 	u64	tcp_wstamp_ns;	/* departure time for next sent data packet */
 	u64	tcp_clock_cache; /* cache last tcp_clock_ns() (see tcp_mstamp_refresh()) */

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 414c179c28e0..fd770a176a68 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2344,6 +2344,9 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
 	last = skb_peek_tail(&sk->sk_receive_queue);
 	skb_queue_walk(&sk->sk_receive_queue, skb) {
 		last = skb;
+		/* XXX if we allow holes, update copied_seq */
+		if (tp->ignore_holes && before(*seq, TCP_SKB_CB(skb)->seq))
+			*seq = TCP_SKB_CB(skb)->seq;
 		/* Now that we have two receive queues this
 		 * shouldn't happen.
 		 */

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 246ab7b5e857..a5161ce78171 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -81,6 +81,14 @@
 #include <net/busy_poll.h>
 #include <net/mptcp.h>
 
+#include <linux/module.h>
+static int drop_freq;	/* drop one pkt in every this many */
+module_param(drop_freq, int, 0644);
+static int lossy_local_port = 2345;	/* drop pkts, ignore holes on this port */
+module_param(lossy_local_port, int, 0644);
+static int lossy_remote_port = 0;	/* drop pkts, ignore holes on this port */
+module_param(lossy_remote_port, int, 0644);
+
 int sysctl_tcp_max_orphans __read_mostly = NR_FILE;
 
 #define FLAG_DATA		0x01 /* Incoming frame contained data. */
@@ -5806,6 +5814,16 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb)
 
 	tp->rx_opt.saw_tstamp = 0;
 
+	if (tp->ignore_holes) {
+		const u32 seq = TCP_SKB_CB(skb)->seq;
+		if (drop_freq && tp->tcp_test_drops++ >= drop_freq) {
+			tp->tcp_test_drops = 0;
+			goto discard;	/* artificial drop */
+		}
+		if (after(seq, tp->rcv_nxt))	/* Pretend this is in order */
+			tcp_rcv_nxt_update(tp, seq);
+	}
+
 	/* pred_flags is 0xS?10 << 16 + snd_wnd
 	 * if header_prediction is to be made
 	 * 'S' will always be tp->tcp_header_len >> 2
@@ -5986,12 +6004,25 @@ void tcp_init_transfer(struct sock *sk, int bpf_op, struct sk_buff *skb)
 	tcp_init_buffer_space(sk);
 }
 
+/* Testing. Conditionally set tp->ignore_holes. Should be a setsockopt */
+static void tcp_update_ignore_holes(struct sock *sk)
+{
+	if (READ_ONCE(lossy_local_port) == ntohs(inet_sk(sk)->inet_sport) ||
+	    READ_ONCE(lossy_remote_port) == ntohs(inet_sk(sk)->inet_dport)) {
+		pr_info("XXX ignore holes for ports local %d remote %d\n",
+			ntohs(inet_sk(sk)->inet_sport),
+			ntohs(inet_sk(sk)->inet_dport));
+		tcp_sk(sk)->ignore_holes = 1;
+	}
+}
+
 void tcp_finish_connect(struct sock *sk, struct sk_buff *skb)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct inet_connection_sock *icsk = inet_csk(sk);
 
 	tcp_set_state(sk, TCP_ESTABLISHED);
+	tcp_update_ignore_holes(sk);
 	icsk->icsk_ack.lrcvtime = tcp_jiffies32;
 
 	if (skb) {
@@ -6476,6 +6507,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 	}
 	smp_mb();
 	tcp_set_state(sk, TCP_ESTABLISHED);
+	tcp_update_ignore_holes(sk);
 	sk->sk_state_change(sk);
 
 	/* Note, that this wakeup is only for marginal crossed SYN case.
* Re: [Bloat] Relentless congestion control for testing purposes
From: Bob McMahon @ 2021-10-01 16:32 UTC (permalink / raw)
To: Luigi Rizzo; +Cc: Dave Taht, rpm, Ben Greear, Karl Auerbach, bloat

hmm, this looks interesting to a test & measurement guy. Can it be done
with a setsockopt? I might want to add this as an iperf2 option,
particularly if it's broadly available.

Thanks,
Bob

On Fri, Oct 1, 2021 at 9:22 AM Luigi Rizzo <rizzo@iet.unipi.it> wrote:
> [...]
* Re: [Bloat] Relentless congestion control for testing purposes
From: Luigi Rizzo @ 2021-10-01 18:02 UTC (permalink / raw)
To: Bob McMahon; +Cc: Luigi Rizzo, Dave Taht, rpm, Ben Greear, Karl Auerbach, bloat

On Fri, Oct 1, 2021 at 6:33 PM Bob McMahon <bob.mcmahon@broadcom.com> wrote:
>
> hmm, this looks interesting to a test & measurement guy. Can it be
> done with a setsockopt? I might want to add this as an iperf2 option,
> particularly if it's broadly available,

I would be happy to submit it as one or two upstream patches -- perhaps
one to implement the basic "ignore_holes" + setsockopt(), and another
mechanism (if there isn't one already) to override default sockopts on
certain sockets.

I do think we need more readily available testing tools.

cheers
luigi