[Bloat] Best practices for paced TCP on Linux?

Neil Davies neil.davies at pnsol.com
Sat Apr 7 11:08:20 EDT 2012


Fred

That is the general idea - the issue is that the dynamic arrival rate, as the "round trip window size" doubles, just dramatically exceeds the available buffering at some intermediate point - it is self-inflicted (intra-stream) congestion, with the effect of dramatically increasing the quality attenuation (delay and loss) for streams flowing through that point.
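
As a rough illustration (all figures below are my own assumptions, not measurements from this thread): with slow start doubling the window each round trip, only a handful of RTTs separate a harmless burst from one that overruns a modest bottleneck buffer.

    /* Back-of-envelope sketch of slow-start burst growth against a fixed
     * bottleneck buffer.  All figures are illustrative assumptions. */
    #include <stdio.h>

    int main(void)
    {
        const int mss         = 1460;  /* bytes per segment (assumed) */
        const int buffer_pkts = 64;    /* assumed buffering at the bottleneck */
        int cwnd_pkts         = 10;    /* assumed initial window */

        for (int rtt = 0; cwnd_pkts <= 8 * buffer_pkts; rtt++) {
            printf("RTT %2d: cwnd %4d pkts (%7d bytes)%s\n",
                   rtt, cwnd_pkts, cwnd_pkts * mss,
                   cwnd_pkts > buffer_pkts ? "  <-- exceeds bottleneck buffer" : "");
            cwnd_pkts *= 2;            /* slow start doubles per round trip */
        }
        return 0;
    }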

The packet train may also be an issue, especially if there is h/w assist for TCP (which might well be the case here, as the interface was a 10G one - comments, Steinar?) - we have observed an interesting phenomenon in access networks where packet trains arrive (8+ packets back to back at 10G) for service down a low-speed (2M) link - this leads to the effective transport delay being highly non-stationary, with all that implies for the other flows on that link.
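
To put a number on that (my own arithmetic, assuming 1500-byte packets): an 8-packet burst that arrives essentially back to back at 10G takes tens of milliseconds to drain down a 2M link, and every other flow's packet that lands behind the train inherits that wait.

    /* Sketch: serialisation delay a back-to-back 10G burst imposes once it
     * has to drain down a 2 Mb/s access link.  Figures are assumptions. */
    #include <stdio.h>

    int main(void)
    {
        const double pkt_bytes  = 1500.0;  /* assumed packet size */
        const int    burst_pkts = 8;       /* "8+ packets back to back" */
        const double link_bps   = 2e6;     /* 2M access link */

        double burst_bits = burst_pkts * pkt_bytes * 8.0;
        double drain_ms   = burst_bits / link_bps * 1000.0;

        /* 96,000 bits at 2 Mb/s is roughly 48 ms - a delay spike seen by
         * every other flow queued behind the train. */
        printf("%d x %.0fB burst = %.0f bits, ~%.1f ms to drain at 2 Mb/s\n",
               burst_pkts, pkt_bytes, burst_bits, drain_ms);
        return 0;
    }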

Neil

On 7 Apr 2012, at 15:17, Fred Baker wrote:

> 
> On Apr 7, 2012, at 4:54 AM, Neil Davies wrote:
> 
>> The answer was rather simple - calculate the amount of buffering needed to achieve
>> say 99% of the "theoretical" throughput (this took some measurement as to exactly what 
>> that was) and limit the sender to that.
> 
> So what I think I hear you saying is that we need some form of ioctl interface in the sockets library that will allow the sender to state the rate it associates with the data (eg, the video codec rate), and let TCP calculate
> 
>                           f(rate in bits per second, pmtu)
>     cwnd_limit = ceiling (--------------------------------)  + C
>                                g(rtt in microseconds)
> 
> Where C is a fudge factor, probably a single digit number, and f and g are appropriate conversion functions.
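
[One possible reading of f and g here - my interpretation, not Fred's: f converts the target rate into bytes per round trip and g expresses the RTT in seconds, so the limit is essentially the bandwidth-delay product in segments plus the fudge factor. A minimal sketch:

    /* Sketch of the cwnd limit above: bandwidth-delay product in segments
     * plus a small fudge factor C.  This reading of f and g is my assumption. */
    #include <math.h>
    #include <stdio.h>

    static unsigned cwnd_limit(double rate_bps, unsigned pmtu_bytes,
                               double rtt_us, unsigned fudge_c)
    {
        double bytes_per_rtt = (rate_bps / 8.0) * (rtt_us / 1e6);
        return (unsigned)ceil(bytes_per_rtt / pmtu_bytes) + fudge_c;
    }

    int main(void)
    {
        /* e.g. a 5 Mb/s codec, 1500-byte PMTU, 40 ms RTT, C = 2 (all assumed) */
        printf("cwnd_limit = %u segments\n", cwnd_limit(5e6, 1500, 40000.0, 2));
        return 0;
    }
]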
> 
> I suspect there may also be value in considering Jain's "Packet Trains" paper. Something you can observe in a simple trace is that the doubling behavior in slow start has the effect of bunching a TCP session's data together. If I have two 5 Mbps data exchanges sharing a 10 Mbps pipe, it's not unusual to observe one of the sessions dominating the pipe for a while and then the other, for a long time. One of the benefits of per-flow WFQ in the network is that it consciously breaks that up - it forces the TCPs to interleave packets instead of bursts, which means that a downstream device on a more limited bandwidth sees packets arrive at what it considers a more rational rate. It might be nice if, in its initial burst, TCP consciously broke the initial window into 2, or 3, or 4, or ten individual packet trains - spaced those packets some number of milliseconds apart, so that their acknowledgements were similarly spaced and the resulting packet trains in subsequent RTTs were relatively small.
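
[On current Linux that kind of spacing can be approximated from user space by capping the socket's pacing rate. This is a forward-looking sketch rather than something available at the time of this thread - SO_MAX_PACING_RATE arrived in later kernels - and the rate chosen here is arbitrary:

    /* Sketch: cap a TCP socket's pacing rate so its bursts are spread out on
     * the wire.  Requires a kernel with SO_MAX_PACING_RATE, which appeared
     * after this thread; the rate value passed in is an arbitrary example. */
    #include <stdio.h>
    #include <sys/socket.h>

    int set_pacing_rate(int fd, unsigned int bytes_per_sec)
    {
    #ifdef SO_MAX_PACING_RATE
        if (setsockopt(fd, SOL_SOCKET, SO_MAX_PACING_RATE,
                       &bytes_per_sec, sizeof(bytes_per_sec)) < 0) {
            perror("setsockopt(SO_MAX_PACING_RATE)");
            return -1;
        }
        return 0;
    #else
        (void)fd; (void)bytes_per_sec;
        fprintf(stderr, "kernel headers lack SO_MAX_PACING_RATE\n");
        return -1;
    #endif
    }

Pairing that with the fq qdisc (or relying on the in-kernel TCP pacing that came later) gives roughly the inter-train spacing Fred describes, without changing the application's write pattern.]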



