[Bloat] Best practices for paced TCP on Linux?

Dave Taht dave.taht at gmail.com
Sat Apr 7 10:48:02 EDT 2012


On Sat, Apr 7, 2012 at 4:54 AM, Neil Davies <neil.davies at pnsol.com> wrote:
> Hi
>
> Yep - you might well be right. I first fell across this sort of thing helping the guys
> with the ATLAS experiment on the LHC several years ago.
>
> The issue, as best as we could capture it - we hit "commercial confidence"
> walls inside the network and equipment suppliers - was the following.
>
> The issue was that with each "window round trip cycle" the volume of data
> was doubling - they had opened the window size up to the level where, between
> the two critical cycles, the increase in the number of packets in flight was several
> hundred - this caused massive burst loss at an intermediate point on the network.
>
> The answer was rather simple - calculate the amount of buffering needed to achieve,
> say, 99% of the "theoretical" throughput (this took some measurement to pin down
> exactly what that was) and limit the sender to that.
>
> This eliminated the massive burst (the window had closed) and the system would
> approach the true maximum throughput and then stay there.
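
(As a rough, back-of-the-envelope illustration of the sizing Neil
describes - the numbers here are hypothetical, not the ATLAS figures:
a 1 Gbit/s path with a 100 ms RTT has a bandwidth-delay product of
10^9 * 0.1 / 8 = ~12.5 MB, so one blunt way to keep a linux sender
from ever putting much more than that in flight is to cap the
per-socket send buffer:

  # min / default / max send buffer, in bytes; max set to roughly the BDP
  sysctl -w net.ipv4.tcp_wmem="4096 16384 12500000"

Whatever mechanism they actually used, the point is the same: bound
the sender so its bursts never exceed what the path can absorb.)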

Since you did that, the world has gone wifi, which is a much flakier
medium than ethernet. Thus the probability of a packet loss event - or
a string of them - has gone way up. The same goes for re-ordering.

Steinar has shipped me, and I've also taken, a couple of captures of
the behavior they are seeing at this event.

http://www.gathering.org/tg12/en/


It's a pretty cool set of demonstrations and tutorials built around
the demo scene in Norway.

In summary, TCP is a really lousy way to ship live video around in
the wifi age.

With seconds or tens of seconds of buffering on the client side, it
might work better, but the captures from here (170+ ms away) are
currently showing TCP streams dropping into slow start every couple of
seconds, even though on the sending side they have tens of gigabits of
bandwidth. They implemented packet pacing in VLC late last night,
which appears to be helping the local users some...
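
(VLC's pacing is internal to the application, but purely as a sketch
of the same idea at the network layer - the interface name and numbers
here are made up, not what they deployed - a token bucket filter can
smooth a sender's output:

  # pace outbound traffic to ~6 Mbit/s with a small burst allowance
  tc qdisc add dev eth0 root tbf rate 6mbit burst 10kb latency 50ms

Application-level pacing is still preferable, since it spaces the
packets out before they ever pile up in a queue.)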

Now, interestingly (I've been fiddling with this stuff all night), the
HD UDP feed I'm getting is VASTLY preferable.

If anybody would like me to reflect the ~5 Mbit/s live, VLC-compatible
video stream I'm getting from this event to them - over UDP, over
IPv6 - to A/B the differences, please let me know your IPv6 address
(and install an IPv6-compatible VLC).
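
(On the receiving side, something like

  vlc udp://@:1234

should pick the stream up - the port here is just a placeholder,
we'd agree on the real one.)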

let me know.

One of the interesting experiments I did last night was to re-mark the
incoming UDP stream as CS5 and ship it the rest of the way around the
world (to New Zealand). Somewhat unsurprisingly, the CS5 marking did
not survive. Usefully, the CS5 marking inside my lab made running it
over wifi much more tolerable, as the queue lengths for the other
queues would remain short. I'd like to increase the size of that data
set.
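
(For anyone who wants to try the same thing, a remarking rule of
roughly this shape does the job - the port and chain placement are
placeholders, not necessarily exactly what I ran:

  # re-mark the incoming udp video stream (placeholder port 1234) as CS5
  ip6tables -t mangle -A PREROUTING -p udp --dport 1234 \
            -j DSCP --set-dscp-class CS5

CS5 is DSCP 40, which mac80211 maps to the VI queue over wifi.)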

It was also nice to exercise the wifi VI queue via IPv6 - that
functionality was broken for IPv6 under Linux before v3.3.

Another experiment I'm trying is to convince routed multicast to work.
I haven't seen that work in half a decade.

There were another couple of interesting statistics, including the
number of IPv6 users in the audience, that Steinar has shared with me,
but I suppose it's up to him to say more, and he's trying to hold a
very big show together.


>
> This, given the nature of use of these transfers, was a practical suggestion - they were
> going to use these systems for years analysing the LHC collisions at remote sites.
>
> Sometimes the right thing to do is to *not* push the system into its unpredictable
> region of operation.




root at europa:/sys/kernel/debug/ieee80211/phy1/ath9k# cat xmit
Num-Tx-Queues: 10  tx-queues-setup: 0x10f poll-work-seen: 301383
                            BE         BK        VI        VO

MPDUs Queued:           379719       4843     24683  10579741
MPDUs Completed:        379495       4843     24677  10576203
MPDUs XRetried:            224          0         6      3538
Aggregates:            3762753    2118222     64515         0
AMPDUs Queued HW:      4935800     513972   3578907         0
AMPDUs Queued SW:     16920425   11800714    678764         0
AMPDUs Completed:     21840409   12314328   4251697         0
AMPDUs Retried:         716857     525837    445387         0
AMPDUs XRetried:         15816        358      5974         0

root at europa:/sys/kernel/debug/ieee80211/phy1/ath9k# tc -s qdisc show dev sw10
qdisc mq 1: root
 Sent 35510975683 bytes 36846390 pkt (dropped 1292, overlimits 4047
requeues 122464)
 backlog 0b 0p requeues 122464
qdisc sfq 10: parent 1:1 limit 200p quantum 3028b depth 24 headdrop
divisor 16384 perturb 600sec
 ewma 3 min 4500b max 18000b probability 0.2 ecn
 prob_mark 9 prob_mark_head 1440 prob_drop 9
 forced_mark 19 forced_mark_head 1395 forced_drop 15
 Sent 7674030933 bytes 8098730 pkt (dropped 169, overlimits 2887
requeues 120177)
 rate 624bit 1pps backlog 0b 0p requeues 120177
qdisc sfq 20: parent 1:2 limit 200p quantum 3028b depth 24 headdrop
divisor 16384 perturb 600sec
 ewma 3 min 4500b max 18000b probability 0.2 ecn
 prob_mark 0 prob_mark_head 0 prob_drop 0
 forced_mark 0 forced_mark_head 0 forced_drop 0
 Sent 5213430326 bytes 4290003 pkt (dropped 86, overlimits 0 requeues 509)
 rate 0bit 0pps backlog 0b 0p requeues 509
qdisc sfq 30: parent 1:3 limit 200p quantum 3028b depth 24 headdrop
divisor 16384 perturb 600sec
 ewma 3 min 4500b max 18000b probability 0.2 ecn
 prob_mark 0 prob_mark_head 118 prob_drop 118
 forced_mark 0 forced_mark_head 272 forced_drop 652
 Sent 22167659923 bytes 17551859 pkt (dropped 920, overlimits 1160
requeues 1587)
 rate 4123Kbit 379pps backlog 0b 0p requeues 1587
qdisc sfq 40: parent 1:4 limit 200p quantum 3028b depth 24 headdrop
divisor 16384 perturb 600sec
 ewma 3 min 4500b max 18000b probability 0.2 ecn
 prob_mark 0 prob_mark_head 0 prob_drop 0
 forced_mark 0 forced_mark_head 0 forced_drop 0
 Sent 455854501 bytes 6905798 pkt (dropped 117, overlimits 0 requeues 191)
 rate 0bit 0pps backlog 0b 0p requeues 191


