I've discovered something; perhaps you guys can explain it or shed some light.
It isn't specifically to do with bufferbloat, but it is to do with TCP tuning.
Attached are two pictures of my upload to a New York speed test server with 1 stream.
It doesn't make any difference whether it is 1 stream or 8 streams; the picture and behaviour remain the same.
I am 200 ms from New York, so it qualifies as a fairly long (but not very fat) pipe.
The nice smooth one is with Linux tcp_rmem set to '4096 32768 65535' (on the server).
The ugly bumpy one is with Linux tcp_rmem set to '4096 65535 67108864' (on the server).
It actually doesn't matter what that last huge number is; once it goes much above 65k (e.g. 128k, 256k or beyond), things get bumpy and ugly on the upload speed.
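For reference, I'm switching between the two on the server like this:

  # smooth: cap the receive buffer (and hence the advertised window) at 64KB
  sysctl -w net.ipv4.tcp_rmem='4096 32768 65535'

  # bumpy: let autotuning grow the buffer up to 64MB
  sysctl -w net.ipv4.tcp_rmem='4096 65535 67108864'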
Now, as I understand this setting, it controls the receive buffer from which Linux derives the advertised TCP receive window: the three values are min/default/max in bytes, and the last number sets the maximum size autotuning can grow it to (for one TCP stream).
Users with very fast upload speeds do not see an ugly bumpy upload graph; theirs is smooth and sustained.
But the majority of users (like me), with uploads of less than 5 to 10 Mbit/s, frequently see the ugly graph.
The second tcp_rmem setting is how I have been running the speed test servers.
Up to now I thought this was just down to the speed test measuring far from the interface: perhaps the browser was buffering a lot and not feeding back progress. But now I realise the bumpy graph is actually being influenced by the server's receive window.
I guess my question is this: why does ALLOWING a large receive window appear to encourage problems with upload smoothness?
This implies that the receive window should be set on a connection-by-connection basis: small for slow connections, large for high-speed, long-distance connections.
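(Something like this, I suppose; an untested sketch using iproute2's per-route window clamp, with placeholder addresses:

  # clamp the window we advertise to a known-slow client subnet,
  # leaving autotuning alone for everyone else
  ip route add 192.0.2.0/24 via 203.0.113.1 window 65535

Or the server application could call setsockopt(SO_RCVBUF) per socket, which disables receive-buffer autotuning for that connection, but that means changing the application.)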
In addition, if I cap it to 65k for reasons of smoothness, the bandwidth-delay product will keep the maximum speed per upload stream quite low, so a symmetric or gigabit connection is going to need a ton of parallel streams to see full speed.
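To put numbers on that, at my 200 ms RTT a 65535-byte window caps a single stream at roughly

  65535 bytes x 8 / 0.200 s ≈ 2.6 Mbit/s

so filling a gigabit path at this distance would need on the order of 380 parallel streams, while the 64MB maximum would in principle allow one stream to reach about 2.7 Gbit/s.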
Most puzzling is why anything special would be required on the Client --> Server side of the equation,
while nothing much appears wrong with the Server --> Client side, whether speeds are very low (GPRS) or very high (gigabit).
Note also that I am not yet sure that smoothness == better throughput. I have noticed upload speeds for some people often coming in 10 or 20% under their claimed sync rate, but I have no logs showing that the bumpy graph reflects inefficiency. Maybe.
help!