[Bloat] TCP congestion detection - random thoughts

G B georgeb at gmail.com
Sun Jun 21 13:53:47 EDT 2015


There should also be a way to track the "ack backlog".  By that I mean: if
you can see that the packets being ACKed were sent 10 seconds ago, and
consistently so, you should be able to determine that you are likely
(10 seconds - real RTT - processing delay) deep in buffers somewhere.  If
you back off on the number of packets in flight and the ack backlog doesn't
change much, then the congestion is probably not related to your specific
flow; it is likely due to aggregate congestion somewhere in the path.
Could be a congested peering point, a POP, a busy distant end, whatever.
But if backing off DOES significantly reduce the ack backlog (acks now
arrive for packets sent only 5 seconds ago rather than 10), then you have a
notion that your flow is a significant contributor to the total backlog.
Exactly what one would do with that information is the question, I guess.
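As a rough illustration (not from any real TCP stack; the class name, the
fixed base-RTT input, and ignoring processing delay are all assumptions of
mine), the ack-backlog idea could be sketched like this:

```python
import time

class AckBacklogEstimator:
    """Estimate time a segment spent queued, from its send timestamp."""

    def __init__(self, base_rtt):
        self.base_rtt = base_rtt   # best known "real" RTT, in seconds
        self.sent = {}             # seq -> send timestamp

    def on_send(self, seq, now=None):
        self.sent[seq] = time.monotonic() if now is None else now

    def on_ack(self, seq, now=None):
        """Return estimated buffering delay for this segment, or None."""
        sent_at = self.sent.pop(seq, None)
        if sent_at is None:
            return None
        now = time.monotonic() if now is None else now
        # Delay beyond the base RTT is attributed to buffers somewhere
        # in the path (processing delay is ignored for simplicity).
        return max(0.0, (now - sent_at) - self.base_rtt)
```

With a 50 ms base RTT and an ACK arriving 10.05 seconds after the send,
this would report roughly 10 seconds of backlog, matching the example
above.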

Is the backlog consistent across all flows or just one?  If it is
consistent across all flows, then the source of buffering is very close to
you.  If it is wildly different, it is likely somewhere in the path of that
particular flow.  Looking at the document linked concerning CDG, I see they
take that into account: if I back off but the RTT doesn't decrease, then my
flow is not a significant contributor to the delay.  The problem with the
algorithm, to my mind, is that finding the size of "the queue" for any
particular flow is practically impossible, because each flow will have its
own specific amount of buffering along the path.  Add things like
asymmetric routing, where the reply path might not be the same as the send
path (a multihomed transit provider, or an end node sending reply traffic
over a different peer than the one the traffic in the other direction
arrives on), or (worse) ECMP being done across peers on a packet-by-packet
rather than flow-based basis, and it becomes impossible to really profile
the path.
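To make the "consistent across flows?" question concrete, here is a
hypothetical sketch that compares per-flow backlog estimates.  The
relative-spread threshold and the labels are arbitrary assumptions of
mine, not from CDG or any deployed algorithm:

```python
from statistics import mean, pstdev

def classify_backlog(per_flow_backlogs, rel_spread_threshold=0.25):
    """per_flow_backlogs: estimated queueing delay (seconds), one per flow."""
    if len(per_flow_backlogs) < 2:
        return "unknown"
    m = mean(per_flow_backlogs)
    if m == 0:
        return "no-backlog"
    spread = pstdev(per_flow_backlogs) / m
    # Similar backlog on every flow suggests a shared bottleneck near us;
    # wildly different backlogs suggest congestion specific to some path.
    return "near-local" if spread < rel_spread_threshold else "path-specific"
```

For example, flows measuring about 10 seconds of backlog each would
classify as "near-local", while backlogs of 0.2, 9.5, and 4 seconds would
classify as "path-specific".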

So if I were designing such an algorithm, I would try to determine:  Is the
delay consistent across all flows?  Is the delay consistent even within a
single flow?  When I reduce my rate, does the backlog drop?  Exactly what I
would do with that information would require more thought.
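The back-off probe in the last question might look something like this;
the 20% improvement threshold and the verdict labels are illustrative
assumptions, not part of any real algorithm:

```python
def backoff_probe_verdict(backlog_before, backlog_after, min_drop=0.20):
    """Compare ack backlog (seconds) before and after reducing our rate."""
    if backlog_before <= 0:
        return "no-congestion"
    drop = (backlog_before - backlog_after) / backlog_before
    if drop >= min_drop:
        return "we-are-contributing"   # keep the reduced rate
    return "aggregate-congestion"      # backing off gains us little
```

In the 10-second example above, a backlog that falls to 5 seconds after
backing off would yield "we-are-contributing", while one stuck near 10
seconds would yield "aggregate-congestion".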



On Sun, Jun 21, 2015 at 9:19 AM, Benjamin Cronce <bcronce at gmail.com> wrote:

> Just a random Sunday morning thought that has probably already been
> thought of before, but I currently can't think of hearing it before.
>
> My understanding of most TCP congestion control algorithms is that they
> primarily watch for drops, but drops are indicated by the receiving party
> via ACKs.  The issue with this is that TCP keeps pushing more data into
> the window until a drop is signaled, even if the received rate is not
> increasing.  What if the sending TCP also monitored the rate received and
> backed off cramming more segments into the window if the received rate
> does not increase?
>
> Two things measure this: the RTT, which is part of TCP statistics
> already, and the rate at which bytes are ACKed.  If you double the number
> of segments being sent but, within a time frame relative to the RTT, do
> not see a meaningful increase in the rate at which bytes are being ACKed,
> you may want to back off.
>
> It just seems to me that if you have a 50ms RTT and 10 seconds of
> bufferbloat, TCP is cramming data down the path with no care in the world
> about how quickly data is actually getting ACKed, it's just waiting for the
> first segment to get dropped, which would never happen in an infinitely
> buffered network.
>
> TCP should be able to keep state that tracks the minimum RTT and maximum
> ACK rate. Between these two, it should not be able to go over the max path
> rate except when attempting to probe for a new max or min. Min RTT is
> probably a good target because path latency should be relatively static,
> however path free-bandwidth is not static. The desirable number of segments
> in flight would need to change but would be bounded by the max.
>
> Of course Nagle-type algorithms can mess with this, because when ACKs
> occur is no longer based entirely on when a segment is received but also
> on some additional amount of time.  If you assume that Nagle will
> coalesce N segments into a single ACK, then you need to add to the RTT
> the amount of time, at the current PPS, until you expect another ACK,
> assuming N segments will be coalesced.  This would be even more important
> for low-latency, low-bandwidth paths.  Coalescing information could be
> assumed, negotiated, or inferred; negotiated would be best.
>
> Anyway, just some random Sunday thoughts.
>
> _______________________________________________
> Bloat mailing list
> Bloat at lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat
>
>

