[Bloat] Little's Law mea culpa, but not invalidating my main point

Wed Jul 14 21:27:04 EDT 2021

From: Bob McMahon via Bloat <bloat at lists.bufferbloat.net>
> Date: Wed,2021-07-14 at 11:38 AM
> One challenge I faced with iperf 2 was around flow control's effects on
> latency. I find if iperf 2 rate limits on writes then the end/end
> latencies, RTT look good because the pipe is basically empty, while rate
> limiting reads to the same value fills the window and drives the RTT up.
> One might conclude, from a network perspective, the write side is
> better.  But in reality, the write rate limiting is just pushing the
> delay into the application's logic, i.e. the relevant bytes may not be
> in the pipe but they aren't at the receiver either, they're stuck
> somewhere in the "tx application space."
>
> It wasn't obvious to me how to address this. We added burst measurements
> (burst xfer time, and bursts/sec) which, I think, helps.
...
>>> I find the assumption that congestion occurs "in network" as not always
>>> true. Taking OWD measurements with read side rate limiting suggests that
>>> equally important to mitigating bufferbloat driven latency using congestion
>>> signals is to make sure apps read "fast enough" whatever that means. I
>>> rarely hear about how important it is for apps to prioritize reads over
>>> open sockets. Not sure why that's overlooked and bufferbloat gets all the
>>> attention. I'm probably missing something.

Hi Bob,

You're right that the sender generally also has to avoid sending
more than the receiver can handle to avoid delays in a message-
reply cycle on the same TCP flow.

In general, I think of failures here as application faults rather
than network faults.  While important for user experience, it's
something that an app developer can solve.  That's importantly
different from network buffering.

It's also somewhat possible to avoid getting excessively backed up
in the network because of your own traffic.  Here bbr usually does
a decent job of keeping the queues decently low.  (And you'll maybe
find that some of the bufferbloat measurement efforts are relying
on the self-congestion you get out of cubic, so if you switch them
to bbr you might not get a good answer on how big the network buffers
are.)

In general, anything along these lines has to give backpressure to
the sender somehow.  What I'm guessing you saw when you did receiver-
side rate limiting was that the backpressure had to fill bytes all
the way back to a full receive kernel buffer (making a 0 rwnd for
TCP) and a full send kernel buffer before the send writes start
failing (I think with ENOBUFS iirc?), and that's the first hint the
sender has that it can't send more data right now.  The assumption
that the receiver can receive as fast as the sender can send is so
common that it often goes unstated.

(If you love to suffer, you can maybe get the backpressure to start
earlier, and with maybe a lower impact to your app-level RTT, if
you try hard enough from the receive side with TCP_WINDOW_CLAMP:
https://man7.org/linux/man-pages/man7/tcp.7.html#:~:text=tcp_window_clamp
But you'll still be living with a full send buffer ahead of the
message-response.)

But usually the right thing to do if you want receiver-driven rate
control is to send back some kind of explicit "slow down, it's too
fast for me" feedback at the app layer that will make the sender send
slower.  For instance most ABR players will shift down their bitrate
if they're failing to render video fast enough just as well as if the
network isn't feeding the video segments fast enough, like if they're
CPU-bound from something else churning on the machine.  (RTP-based
video players are supposed to send feedback with this same kind of
"slow down" capability, and sometimes they do.)

But what you can't fix from the endpoints no matter how hard you
try is the buffers in the network that get filled by other people's
traffic.

Getting other people's traffic to avoid breaking my latency when
we're sharing a bottleneck requires deploying something in the network
and it's not something I can fix myself except inside my own network.

While the app-specific fixes would make for very fine blog posts or
stack overflow questions that could help someone who managed to search
the right terms, there's a lot of different approaches for different
apps that can solve it more or less, and anyone who tries hard enough
will land on something that works well enough for them, and you don't
need a whole movement to get people to make it so their own app works
ok for them and their users.  The problems can be subtle and maybe
there will be some late and frustrating nights involved, but anyone
who gets it reproducible and keeps digging will solve it eventually.

But getting stuff deployed in networks to stop people's traffic
breaking each other's latency is harder, especially when it's a
major challenge for people to even grasp the problem and understand
its causes.  The only possible paths to getting a solution widely
deployed (assuming you have one that works) start with things like
"start an advocacy movement" or "get a controlling interest in Cisco".

Best,
Jake