From: Dave Taht
To: Neil Davies
Cc: bloat@lists.bufferbloat.net
Date: Sat, 7 Apr 2012 07:48:02 -0700
Subject: Re: [Bloat] Best practices for paced TCP on Linux?

On Sat, Apr 7, 2012 at 4:54 AM, Neil Davies wrote:
> Hi
>
> Yep - you might well be right. I first fell across this sort of thing
> helping the guys with the ATLAS experiment on the LHC several years
> ago.
>
> The issue, as best as we could capture it - we hit "commercial
> confidence" walls inside network and manufacturer suppliers - was the
> following.
>
> With each "window round trip cycle" the volume of data was doubling -
> they had opened the window size up to the level where, between the
> two critical cycles, the increase in the number of packets in flight
> was several hundred - and this caused massive burst loss at an
> intermediate point on the network.
>
> The answer was rather simple - calculate the amount of buffering
> needed to achieve, say, 99% of the "theoretical" throughput (this
> took some measurement as to exactly what that was) and limit the
> sender to that.
>
> This eliminated the massive burst (the window had closed) and the
> system would approach the true maximum throughput and then stay
> there.

Since you did that, the world went wifi, which is a much flakier
medium than ethernet. Thus the probability of a packet loss event - or
a string of them - has gone way up. Same goes for re-ordering.

Steinar has shipped me captures - and I've taken a couple myself - of
the behavior they are seeing at this event:

http://www.gathering.org/tg12/en/

It's a pretty cool set of demonstrations and tutorials built around
the demo scene in Norway.

In summary, tcp is a really lousy way to ship live video around in the
wifi age.
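(An aside on Neil's fix: one blunt way to impose that kind of
sender-side limit on Linux, once you've measured what window gets you
~99% of the achievable rate, is to clamp the congestion window metric
on the route. The numbers here are purely illustrative, not measured -
a 100 Mbit/s path at 170 ms gives a BDP of roughly 100e6/8 * 0.17 ~=
2.1 MB, or about 1400 full-size packets:

  # clamp cwnd on the route to the destination; "lock" forbids tcp
  # from ever growing the window past it
  ip route change 192.0.2.0/24 via 203.0.113.1 dev eth0 cwnd lock 1400

That caps the burst the same way Neil describes: the sender simply
never opens up past the point where it would overrun the intermediate
buffer.)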
With seconds, or tens of seconds, of buffering on the client side it
might work better, but the captures from here (170+ ms away) are
currently showing tcp streams dropping into slow start every couple of
seconds, even though the sending side has tens of gigabits of
bandwidth available. They implemented packet pacing in vlc late last
night, which appears to be helping the local users some...

Now, interestingly (I've been fiddling with this stuff all night), the
hd udp feed I'm getting is VASTLY to be preferred. If anybody would
like me to ship them 5 Mbit/s of live, vlc-compatible video from this
event - reflected from the stream I'm getting, over udp, over ipv6 -
to a/b the differences, please send me your ipv6 address (and install
an ipv6-compatible vlc).

One of the interesting experiments I did last night was to re-mark the
incoming udp stream as CS5 and ship it the rest of the way around the
world (to new zealand). Somewhat unsurprisingly, the CS5 marking did
not survive. Usefully, though, the CS5 marking inside my lab made
running it over wifi much more tolerable, as the queue lengths for the
other queues would remain short. I'd like to increase the size of that
data set; a sketch of the marking rule is below. It was also nice to
exercise the wifi VI queue via ipv6 - that functionality was broken
for ipv6 under linux before v3.3.

Another experiment I'm trying is to convince routed multicast to work.
I haven't seen that work in half a decade.

There were another couple of interesting statistics, including the
number of ipv6 users in the audience, that Steinar has shared with me,
but I suppose it's up to him to say more, and he's busy trying to hold
a very big show together.

> This, given the nature of use of these transfers, was a practical
> suggestion - they were going to use these systems for years
> analysing the LHC collisions at remote sites.
>
> Sometimes the right thing to do is to *not* push the system into its
> unpredictable region of operation.
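Re the CS5 remarking above: a minimal version of the rule, for anyone
who wants to reproduce the experiment, would look something like the
below - port 1234 is a placeholder for wherever your stream actually
lands, and ip6tables takes the same syntax for the ipv6 case. CS5
(dscp 40) maps to 802.11e user priority 5, which is what steers frames
into the wifi VI queue:

  # re-mark the inbound udp video stream as CS5 before it is
  # routed/bridged onward
  iptables -t mangle -A PREROUTING -p udp --dport 1234 \
      -j DSCP --set-dscp-class CS5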
root@europa:/sys/kernel/debug/ieee80211/phy1/ath9k# cat xmit
Num-Tx-Queues: 10
tx-queues-setup: 0x10f
poll-work-seen: 301383

                        BE        BK        VI        VO
MPDUs Queued:       379719      4843     24683  10579741
MPDUs Completed:    379495      4843     24677  10576203
MPDUs XRetried:        224         0         6      3538
Aggregates:        3762753   2118222     64515         0
AMPDUs Queued HW:  4935800    513972   3578907         0
AMPDUs Queued SW: 16920425  11800714    678764         0
AMPDUs Completed: 21840409  12314328   4251697         0
AMPDUs Retried:     716857    525837    445387         0
AMPDUs XRetried:     15816       358      5974         0

root@europa:/sys/kernel/debug/ieee80211/phy1/ath9k# tc -s qdisc show dev sw10
qdisc mq 1: root
 Sent 35510975683 bytes 36846390 pkt (dropped 1292, overlimits 4047 requeues 122464)
 backlog 0b 0p requeues 122464
qdisc sfq 10: parent 1:1 limit 200p quantum 3028b depth 24 headdrop divisor 16384 perturb 600sec
 ewma 3 min 4500b max 18000b probability 0.2 ecn
 prob_mark 9 prob_mark_head 1440 prob_drop 9
 forced_mark 19 forced_mark_head 1395 forced_drop 15
 Sent 7674030933 bytes 8098730 pkt (dropped 169, overlimits 2887 requeues 120177)
 rate 624bit 1pps backlog 0b 0p requeues 120177
qdisc sfq 20: parent 1:2 limit 200p quantum 3028b depth 24 headdrop divisor 16384 perturb 600sec
 ewma 3 min 4500b max 18000b probability 0.2 ecn
 prob_mark 0 prob_mark_head 0 prob_drop 0
 forced_mark 0 forced_mark_head 0 forced_drop 0
 Sent 5213430326 bytes 4290003 pkt (dropped 86, overlimits 0 requeues 509)
 rate 0bit 0pps backlog 0b 0p requeues 509
qdisc sfq 30: parent 1:3 limit 200p quantum 3028b depth 24 headdrop divisor 16384 perturb 600sec
 ewma 3 min 4500b max 18000b probability 0.2 ecn
 prob_mark 0 prob_mark_head 118 prob_drop 118
 forced_mark 0 forced_mark_head 272 forced_drop 652
 Sent 22167659923 bytes 17551859 pkt (dropped 920, overlimits 1160 requeues 1587)
 rate 4123Kbit 379pps backlog 0b 0p requeues 1587
qdisc sfq 40: parent 1:4 limit 200p quantum 3028b depth 24 headdrop divisor 16384 perturb 600sec
 ewma 3 min 4500b max 18000b probability 0.2 ecn
 prob_mark 0 prob_mark_head 0 prob_drop 0
 forced_mark 0 forced_mark_head 0 forced_drop 0
 Sent 455854501 bytes 6905798 pkt (dropped 117, overlimits 0 requeues 191)
 rate 0bit 0pps backlog 0b 0p requeues 191
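(For the curious: the sfq instances in that dump were created more or
less like this. The parameters match what tc echoes back above, except
redflowlimit, which is needed to switch on sfq's RED/ecn machinery but
is not reported in the stats, so treat that value as a guess:

  tc qdisc add dev sw10 parent 1:1 handle 10: sfq limit 200 depth 24 \
      headdrop divisor 16384 quantum 3028 perturb 600 \
      redflowlimit 100000 min 4500 max 18000 probability 0.2 ecn

One of these hangs off each of the four mq hardware queues, parents
1:1 through 1:4, with handles 10: through 40:.)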