General list for discussing Bufferbloat
* [Bloat] Best practices for paced TCP on Linux?
@ 2012-04-06 21:37 Steinar H. Gunderson
  2012-04-06 21:49 ` Dave Taht
                   ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Steinar H. Gunderson @ 2012-04-06 21:37 UTC (permalink / raw)
  To: bloat

Hi,

This is only loosely related to bloat, so bear with me if it's not 100% on-topic;
I guess this list is about the best place on the Internet to get a reasonable
answer anyway :-)

Long story short, I have a Linux box (running 3.2.0 or so) with a 10Gbit/sec
interface, streaming a large amount of video streams to external users,
at 1Mbit/sec, 3Mbit/sec or 5Mbit/sec (different values). Unfortunately, even
though there is no congestion in _our_ network (we have 190 Gbit/sec free!),
some users are complaining that they can't keep the stream up.

My guess is that this is because at 10Gbit/sec, we are crazy bursty, and
somewhere along the line, there will be devices doing down conversion without
enough buffers (for instance, I've seen drop behavior on Cisco 2960-S in a
very real ISP network on 10->1 Gbit/sec down conversion, and I doubt it's the
worst offender here).

Is there anything I can do about this on my end? I looked around for paced
TCP implementations, but couldn't find anything current. Can I somehow shape
each TCP stream to 10Mbit/sec or so each with a combination of SFQ and TBF?
(SFQRED?)

I'm not very well versed in tc, so anything practical would be very much
appreciated. Bonus points if we won't have to patch the kernel.

/* Steinar */
-- 
Homepage: http://www.sesse.net/

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-06 21:37 [Bloat] Best practices for paced TCP on Linux? Steinar H. Gunderson
@ 2012-04-06 21:49 ` Dave Taht
  2012-04-06 22:21   ` Steinar H. Gunderson
  2012-04-07 11:54 ` Neil Davies
  2012-05-12 20:08 ` Steinar H. Gunderson
  2 siblings, 1 reply; 38+ messages in thread
From: Dave Taht @ 2012-04-06 21:49 UTC (permalink / raw)
  To: Steinar H. Gunderson; +Cc: bloat

On Fri, Apr 6, 2012 at 2:37 PM, Steinar H. Gunderson
<sgunderson@bigfoot.com> wrote:
> Hi,
>
> This is only related to bloat, so bear with me if it's not 100% on-topic;
> I guess the list is about the best place on the Internet to get a reasonble
> answer for this anyway :-)

This is actually both very relevant and very hard to simulate. Having
someone willing to try this in the real world would be excellent.

> Long story short, I have a Linux box (running 3.2.0 or so) with a 10Gbit/sec
> interface, streaming a large amount of video streams to external users,
> at 1Mbit/sec, 3Mbit/sec or 5Mbit/sec (different values). Unfortunately, even
> though there is no congestion in _our_ network (we have 190 Gbit/sec free!),
> some users are complaining that they can't keep the stream up.

Interesting. While there can be many causes, I do tend to think that
10GigE is putting incredible pressure on downstream devices, pressure that
is very hard to observe unless you are downstream...

>
> My guess is that this is because at 10Gbit/sec, we are crazy bursty, and
> somewhere along the line, there will be devices doing down conversion without
> enough buffers (for instance, I've seen drop behavior on Cisco 2960-S in a
> very real ISP network on 10->1 Gbit/sec down conversion, and I doubt it's the
> worst offender here).
>
> Is there anything I can do about this on my end? I looked around for paced
> TCP implementations, but couldn't find anything current. Can I somehow shape
> each TCP stream to 10Mbit/sec or so each with a combination of SFQ and TBF?
> (SFQRED?)

It would be best to get some packet captures of the streams from users who
are complaining about the issue.

Most 10GigE network cards can use multiple hardware queues, which will do
some FQ for you.

SFQ or QFQ will FQ the output streams. SFQRED or QFQ + something will be
able to do some level of queue management, but not 'shaping' or 'packet pacing'.

However, in your environment you will need the beefed-up SFQ that is in 3.3,
and BQL. If you are not saturating that 10GigE card, you can turn off TSO/GSO
as well.
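
A minimal sketch of that combination (the interface name and the SFQ
parameters here are illustrative, not tuned recommendations):

# let the qdisc see real packet sizes instead of 64K TSO super-packets
ethtool -K eth0 tso off gso off
# replace the default root qdisc with SFQ so competing flows get interleaved
tc qdisc replace dev eth0 root sfq perturb 600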

You can reduce your TCP send windows. There is a horde of other things you
can try, but it would be best to start off with a few measurements, both
on your side and at a customer's.


> I'm not very well versed in tc, so anything practical would be very much

I have a debloat script (with about 5 out-of-tree versions) that tries its best
to do the right thing on home gear and takes a bit of the arcana out
of the mix. I have not got around to trying out a couple of algos yet.

> appreciated. Bonus points if we won't have to patch the kernel.

3.3.1 is out and in my environment is stable. YMMV.

> /* Steinar */
> --
> Homepage: http://www.sesse.net/
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat



-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://www.bufferbloat.net

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-06 21:49 ` Dave Taht
@ 2012-04-06 22:21   ` Steinar H. Gunderson
  2012-04-07 15:08     ` Eric Dumazet
  2012-04-14  0:35     ` Rick Jones
  0 siblings, 2 replies; 38+ messages in thread
From: Steinar H. Gunderson @ 2012-04-06 22:21 UTC (permalink / raw)
  To: Dave Taht; +Cc: bloat

On Fri, Apr 06, 2012 at 02:49:38PM -0700, Dave Taht wrote:
> It would be best to get some packet captures from user's streams that are
> complaining of the issue.

I'll try asking some users.

> Most 10Gige network cards have the ability to have multiple hardware queues
> that will do some FQ for you.
> 
> SFQ or QFQ will FQ the output streams. SFQRED or QFQ + something will be
> able to do some level of queue management but not 'shaping', or 'packet pacing'
> 
> However in your environment you will need the beefed up SFQ that is in 3.3.
> and BQL. If you are not saturating that 10GigE card, you can turn off TSO/GSO
> as well.

We're not anywhere near saturating our 10GigE card, and even if we did, we
could add at least one more 10GigE card.

> You can reduce your tcp sent windows.

Do you know how?

> I have a debloat script (with about 5 out of tree versions) that tries it's best
> to do the right thing on home gear and takes a bit of the arcaneness out
> of the mix. I have not got around to trying out a couple algos

But debloat is just like... way too much. And probably not very tuned for our
use case :-)

I'll be perfectly happy just doing _something_; I don't need a perfect
solution. We have one more night of streaming, and then the event is over. :-)

/* Steinar */
-- 
Homepage: http://www.sesse.net/

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-06 21:37 [Bloat] Best practices for paced TCP on Linux? Steinar H. Gunderson
  2012-04-06 21:49 ` Dave Taht
@ 2012-04-07 11:54 ` Neil Davies
  2012-04-07 14:17   ` Fred Baker
  2012-04-07 14:48   ` Dave Taht
  2012-05-12 20:08 ` Steinar H. Gunderson
  2 siblings, 2 replies; 38+ messages in thread
From: Neil Davies @ 2012-04-07 11:54 UTC (permalink / raw)
  To: Steinar H. Gunderson; +Cc: bloat

Hi

Yep - you might well be right. I first fell across this sort of thing helping the guys
with the ATLAS experiment on the LHC several years ago.

The issue, as best we could capture it - we hit "commercial confidence"
walls inside the network and equipment suppliers - was the following.

The issue was that with each "window round trip cycle" the volume of data
was doubling - they had opened the window size up to the level where, between
the two critical cycles, the increase in the number of packets in flight was several
hundred - this caused massive burst loss at an intermediate point on the network.

The answer was rather simple - calculate the amount of buffering needed to achieve
say 99% of the "theoretical" throughput (this took some measurement as to exactly what 
that was) and limit the sender to that.

This eliminated the massive burst (the window had closed) and the system would
approach the true maximum throughput and then stay there.

This, given the nature of use of these transfers, was a practical suggestion - they were
going to use these systems for years analysing the LHC collisions at remote sites.

Sometimes the right thing to do is to *not* push the system into its unpredictable
region of operation.

Neil


On 6 Apr 2012, at 22:37, Steinar H. Gunderson wrote:

> Hi,
> 
> This is only related to bloat, so bear with me if it's not 100% on-topic;
> I guess the list is about the best place on the Internet to get a reasonble
> answer for this anyway :-)
> 
> Long story short, I have a Linux box (running 3.2.0 or so) with a 10Gbit/sec
> interface, streaming a large amount of video streams to external users,
> at 1Mbit/sec, 3Mbit/sec or 5Mbit/sec (different values). Unfortunately, even
> though there is no congestion in _our_ network (we have 190 Gbit/sec free!),
> some users are complaining that they can't keep the stream up.
> 
> My guess is that this is because at 10Gbit/sec, we are crazy bursty, and
> somewhere along the line, there will be devices doing down conversion without
> enough buffers (for instance, I've seen drop behavior on Cisco 2960-S in a
> very real ISP network on 10->1 Gbit/sec down conversion, and I doubt it's the
> worst offender here).
> 
> Is there anything I can do about this on my end? I looked around for paced
> TCP implementations, but couldn't find anything current. Can I somehow shape
> each TCP stream to 10Mbit/sec or so each with a combination of SFQ and TBF?
> (SFQRED?)
> 
> I'm not very well versed in tc, so anything practical would be very much
> appreciated. Bonus points if we won't have to patch the kernel.
> 
> /* Steinar */
> -- 
> Homepage: http://www.sesse.net/
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 11:54 ` Neil Davies
@ 2012-04-07 14:17   ` Fred Baker
  2012-04-07 15:08     ` Neil Davies
  2012-04-14  0:44     ` Rick Jones
  2012-04-07 14:48   ` Dave Taht
  1 sibling, 2 replies; 38+ messages in thread
From: Fred Baker @ 2012-04-07 14:17 UTC (permalink / raw)
  To: Neil Davies; +Cc: bloat


On Apr 7, 2012, at 4:54 AM, Neil Davies wrote:

> The answer was rather simple - calculate the amount of buffering needed to achieve
> say 99% of the "theoretical" throughput (this took some measurement as to exactly what 
> that was) and limit the sender to that.

So what I think I hear you saying is that we need some form of ioctl interface in the sockets library that will allow the sender to state the rate it associates with the data (eg, the video codec rate), and let TCP calculate

                           f(rate in bits per second, pmtu)
     cwnd_limit = ceiling (--------------------------------)  + C
                                g(rtt in microseconds)

Where C is a fudge factor, probably a single digit number, and f and g are appropriate conversion functions.
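
As a rough worked example (numbers purely illustrative): a 5 Mbps stream with a 1500-byte PMTU and a 100 ms RTT needs about (5,000,000 bit/s * 0.1 s) / (1500 * 8 bit) ≈ 42 packets in flight, so cwnd_limit would be on the order of 42 + C segments.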

I suspect there may also be value in considering Jain's "Packet Trains" paper. Something you can observe in a simple trace is that the doubling behavior in slow start has the effect of bunching a TCP session's data together. If I have two 5 Mbps data exchanges sharing a 10 Mbps pipe, it's not unusual to observe one of the sessions dominating the pipe for a while and then the other one, for a long time. One of the benefits of per-flow WFQ in the network is that it consciously breaks that up - it forces the TCPs to interleave packets instead of bursts, which means that a downstream device on a more limited bandwidth sees packets arrive at what it considers a more rational rate. It might be nice if, in its initial burst, TCP consciously broke the initial window into 2, or 3, or 4, or ten individual packet trains - spaced those packets some number of milliseconds apart, so that their acknowledgements were similarly spaced, and the resulting packet trains in subsequent RTTs were relatively small.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 11:54 ` Neil Davies
  2012-04-07 14:17   ` Fred Baker
@ 2012-04-07 14:48   ` Dave Taht
  1 sibling, 0 replies; 38+ messages in thread
From: Dave Taht @ 2012-04-07 14:48 UTC (permalink / raw)
  To: Neil Davies; +Cc: bloat

On Sat, Apr 7, 2012 at 4:54 AM, Neil Davies <neil.davies@pnsol.com> wrote:
> Hi
>
> Yep - you might well be right. I first fell across this sort of thing helping the guys
> with the ATLAS experiment on the LHC several years ago.
>
> The issue, as best as we could capture it - we hit "commercial confidence"
> walls inside network and manufacturer suppliers, was the the following.
>
> The issue was that with each "window round trip cycle"  the volume of data
> was doubling  - they had opened the window size up to the level where, between
> the two critical cycles, the increase in the number of packets in flight were several
> hundred - this caused massive burst loss at an intermediate point on the network.
>
> The answer was rather simple - calculate the amount of buffering needed to achieve
> say 99% of the "theoretical" throughput (this took some measurement as to exactly what
> that was) and limit the sender to that.
>
> This eliminated the massive burst (the window had closed) and the system would
> approach the true maximum throughput and then stay there.

Since you did that, the world went wifi, which is a much more flaky
medium than ethernet. Thus the probability of a packet loss event - or a
string of them - has gone way up. Same goes for re-ordering.

Steinar has shipped me, and I've taken, a couple of captures of the
behavior they are seeing at this event.

http://www.gathering.org/tg12/en/


It's a pretty cool set of demonstrations and tutorials built around
the demo scene in Norway.

In summary, tcp is a really lousy way to ship live video around in the
wifi age.

With seconds or 10s of seconds of buffering on the client side, it
might work better, but the captures from here (170+ms away) are
currently showing tcp streams dropping into slow start every couple of
seconds, even though on the sending side they have 10s of gigs of
bandwidth.  They implemented packet pacing in vlc late last night
which appears to be helping the local users some...

Now interestingly (I've been fiddling with this stuff all night), the
HD UDP feed I'm getting is VASTLY preferable.

If anybody would like me to ship them 5Mbit of live, vlc-compatible video
reflected from this event, from the stream I'm getting, over udp, over ipv6,
to a/b the differences, please let me know your ipv6 address (and install an
ipv6-compatible vlc).

let me know.

One of the interesting experiments I did last night was to re-mark the
incoming udp stream to be CS5 and ship it the rest of the way around
the world (to new zealand). Somewhat unsurprisingly the CS5 marking
did not survive. Usefully, the CS5 marking inside my lab made running
it over wifi much more tolerable, as the queue lengths for other
queues would remain short. I'd like to increase the size of that data
set.

It was also nice to exercise the wifi VI queue via ipv6 - that
functionality was broken in ipv6 under linux, before v3.3.

Another experiment I'm trying is to convince routed multicast to work.
I haven't seen that work in half a decade.

There were another couple of interesting statistics, including the number
of ipv6 users in the audience, that Steinar has shared with me, but I
suppose it's up to him to talk further, and he's trying to hold a very
big show together.


>
> This, given the nature of use of these transfer, was a practical suggestion - they were
> going to use these systems for years analysing the LHC collisions at remote sites.
>
> Sometimes the right thing to do is to *not* push the system into its unpredictable
> region of operation.




root@europa:/sys/kernel/debug/ieee80211/phy1/ath9k# cat xmit
Num-Tx-Queues: 10  tx-queues-setup: 0x10f poll-work-seen: 301383
                            BE         BK        VI        VO

MPDUs Queued:           379719       4843     24683  10579741
MPDUs Completed:        379495       4843     24677  10576203
MPDUs XRetried:            224          0         6      3538
Aggregates:            3762753    2118222     64515         0
AMPDUs Queued HW:      4935800     513972   3578907         0
AMPDUs Queued SW:     16920425   11800714    678764         0
AMPDUs Completed:     21840409   12314328   4251697         0
AMPDUs Retried:         716857     525837    445387         0
AMPDUs XRetried:         15816        358      5974         0

root@europa:/sys/kernel/debug/ieee80211/phy1/ath9k# tc -s qdisc show dev sw10
qdisc mq 1: root
 Sent 35510975683 bytes 36846390 pkt (dropped 1292, overlimits 4047
requeues 122464)
 backlog 0b 0p requeues 122464
qdisc sfq 10: parent 1:1 limit 200p quantum 3028b depth 24 headdrop
divisor 16384 perturb 600sec
 ewma 3 min 4500b max 18000b probability 0.2 ecn
 prob_mark 9 prob_mark_head 1440 prob_drop 9
 forced_mark 19 forced_mark_head 1395 forced_drop 15
 Sent 7674030933 bytes 8098730 pkt (dropped 169, overlimits 2887
requeues 120177)
 rate 624bit 1pps backlog 0b 0p requeues 120177
qdisc sfq 20: parent 1:2 limit 200p quantum 3028b depth 24 headdrop
divisor 16384 perturb 600sec
 ewma 3 min 4500b max 18000b probability 0.2 ecn
 prob_mark 0 prob_mark_head 0 prob_drop 0
 forced_mark 0 forced_mark_head 0 forced_drop 0
 Sent 5213430326 bytes 4290003 pkt (dropped 86, overlimits 0 requeues 509)
 rate 0bit 0pps backlog 0b 0p requeues 509
qdisc sfq 30: parent 1:3 limit 200p quantum 3028b depth 24 headdrop
divisor 16384 perturb 600sec
 ewma 3 min 4500b max 18000b probability 0.2 ecn
 prob_mark 0 prob_mark_head 118 prob_drop 118
 forced_mark 0 forced_mark_head 272 forced_drop 652
 Sent 22167659923 bytes 17551859 pkt (dropped 920, overlimits 1160
requeues 1587)
 rate 4123Kbit 379pps backlog 0b 0p requeues 1587
qdisc sfq 40: parent 1:4 limit 200p quantum 3028b depth 24 headdrop
divisor 16384 perturb 600sec
 ewma 3 min 4500b max 18000b probability 0.2 ecn
 prob_mark 0 prob_mark_head 0 prob_drop 0
 forced_mark 0 forced_mark_head 0 forced_drop 0
 Sent 455854501 bytes 6905798 pkt (dropped 117, overlimits 0 requeues 191)
 rate 0bit 0pps backlog 0b 0p requeues 191

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 14:17   ` Fred Baker
@ 2012-04-07 15:08     ` Neil Davies
  2012-04-07 15:16       ` Steinar H. Gunderson
  2012-04-14  0:44     ` Rick Jones
  1 sibling, 1 reply; 38+ messages in thread
From: Neil Davies @ 2012-04-07 15:08 UTC (permalink / raw)
  To: Fred Baker; +Cc: bloat

Fred

That is the general idea - the issue is that the dynamic arrival rate, as the "round trip window size" doubles, just dramatically exceeds the available buffering at some intermediate point - it is self-inflicted (intra-stream) congestion, with the effect of dramatically increasing the quality attenuation (delay and loss) for streams flowing through that point.

The packet train may also be an issue, especially if there is h/w assist for TCP (which might well be the case here, as the interface was a 10G one - comments, Steinar?) - we have observed an interesting phenomenon in access networks where packet trains arrive (8+ packets back to back at 10G) for service down a low speed (2M) link - this leads to the effective transport delay being highly non-stationary - with all that implies for the other flows on that link.
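
(As a rough illustration of the scale: eight 1500-byte packets arrive in about 10 microseconds at 10G but take roughly 48 ms to serialise at 2M, so each such train momentarily adds tens of milliseconds of queueing delay for everything else sharing that link.)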

Neil

On 7 Apr 2012, at 15:17, Fred Baker wrote:

> 
> On Apr 7, 2012, at 4:54 AM, Neil Davies wrote:
> 
>> The answer was rather simple - calculate the amount of buffering needed to achieve
>> say 99% of the "theoretical" throughput (this took some measurement as to exactly what 
>> that was) and limit the sender to that.
> 
> So what I think I hear you saying is that we need some form of ioctl interface in the sockets library that will allow the sender to state the rate it associates with the data (eg, the video codec rate), and let TCP calculate
> 
>                           f(rate in bits per second, pmtu)
>     cwnd_limit = ceiling (--------------------------------)  + C
>                                g(rtt in microseconds)
> 
> Where C is a fudge factor, probably a single digit number, and f and g are appropriate conversion functions.
> 
> I suspect there may also be value in considering Jain's "Packet Trains" paper. Something you can observe in a simple trace is that the doubling behavior in slow start has the effect of bunching a TCP session's data together. If I have two 5 MBPS data exchanges sharing a 10 MBPS pipe, it's not unusual to observe one of the sessions dominating the pipe for a while and then the other one, for a long time. One of the benefits of per-flow WFQ in the network is that it consciously breaks that up - it forces the TCPs to interleave packets instead of bursts, which means that a downstream device on a more limited bandwidth sees packets arrive at what it considers a more rational rate. It might be nice if In its initial burst, TCP consciously broke the initial window into 2, or 3, or 4, or ten, individual packet trains - spaced those packets some number of milliseconds apart, so that their acknowledgements were similarly spaced, and the resulting packet trains in subsequent RTTs were relatively small.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-06 22:21   ` Steinar H. Gunderson
@ 2012-04-07 15:08     ` Eric Dumazet
  2012-04-07 15:25       ` Dave Taht
  2012-04-14  0:35     ` Rick Jones
  1 sibling, 1 reply; 38+ messages in thread
From: Eric Dumazet @ 2012-04-07 15:08 UTC (permalink / raw)
  To: Steinar H. Gunderson; +Cc: bloat

On Saturday, 07 April 2012 at 00:21 +0200, Steinar H. Gunderson wrote:

> I'll be perfectly happy just doing _something_; I don't need a perfect
> solution. We have one more night of streaming, and then the event is over. :-)
> 
> /* Steinar */

OK, it's probably too late :)

Here is a script I would use/adapt somehow...
(You probably need 1024 subclasses, this script uses 256 slots only)

ETH=eth7
NUMGROUPS=256
RATE="rate 5Mbit"
ALLOT="allot 4000"

MASK=$(($NUMGROUPS-1))
HMASK=$(printf %02x $MASK)
LIST=`seq 0 $MASK`

TC="tc"
export PATH=/sbin:/usr/sbin:$PATH

# Not sure 10Gbit can be reached without TSO...
#ethtool -K $ETH tso off

$TC qdisc del dev $ETH root 2>/dev/null


$TC qdisc add dev $ETH root handle 1: est 0.5sec 4sec cbq avpkt 1200 rate 10Gbit \
	bandwidth 10Gbit
$TC class add dev $ETH parent 1: classid 1:1 \
	est 0.5sec 4sec cbq allot 8000 mpu 64 \
	rate 10Gbit prio 1 avpkt 1200 bounded

$TC filter add dev $ETH parent 1: protocol ip  prio 10 u32

$TC filter add dev $ETH parent 1: protocol ip prio 10 handle 8: u32 divisor $NUMGROUPS

for i in $LIST
do
  slot=$(printf %02x $(($i+16)))
  hexa=$(printf %02x $i)
  $TC class add dev $ETH parent 1:1 classid 1:$slot \
               est 0.5sec 4sec cbq $ALLOT mpu 64 $RATE prio 2 avpkt 1200 bounded
  $TC qdisc add dev $ETH parent 1:$slot handle $slot: est 0.5sec 4sec sfq limit 64
  $TC filter add dev $ETH parent 1: protocol ip prio 100 u32 ht 8:$hexa: \
	match ip dport 0x$hexa 0x$HMASK  flowid 1:$slot

done


$TC filter add dev $ETH parent 1: protocol ip prio 100 u32 \
		ht 800:: \
		match ip src 172.24.24.252 \
		match ip protocol 6 0xff \
		hashkey mask 0x$HMASK at 20 \
		link 8:

$TC filter add dev $ETH parent 1: protocol ip prio 100 u32 \
	match ip protocol 0 0x00 flowid 1:1
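
# A quick sanity check after adapting a script like this (same $ETH assumed)
# is to watch whether traffic actually spreads across the per-slot classes:
# tc -s class show dev $ETH
# tc -s qdisc show dev $ETH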



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 15:08     ` Neil Davies
@ 2012-04-07 15:16       ` Steinar H. Gunderson
  0 siblings, 0 replies; 38+ messages in thread
From: Steinar H. Gunderson @ 2012-04-07 15:16 UTC (permalink / raw)
  To: Neil Davies; +Cc: bloat

On Sat, Apr 07, 2012 at 04:08:20PM +0100, Neil Davies wrote:
> That is the general idea - the issue is that the dynamic arrival rate as
> "round trip window size" double just dramatically exceeds the available
> buffering at some intermediate point  - it is self inflicted (intra stream)
> congestion with the effect of dramatically increasing the quality
> attenuation (delay and loss) for streams flowing through that point.

We've been tuning our stream to have more consistent frame sizes (adjusting
the VBV settings); that seems to have alleviated the problems somewhat.

> The packet train may also be an issue, especially if there is h/w assist
> for TCP (which might well be the case here, as the interface was  a 10G
> one, comments Steinar?) - we have observed an interesting phenomena in
> access networks where packet trains arrive (8+ packets back to pack at 10G)
> for service down a low speed (2M) link - this leads to the effective
> transport delay being highly non-stationary - with all that implies for the
> other flows on that link.

The card is a standard Intel 10GigE card, and we've turned off segmentation
offload to avoid this precise issue.

I've also tried hacking VLC to pace out the packets a bit more, but it didn't
really seem to give the effect I had hoped, especially as things sometimes
would glitch even without any packets actually being lost (which would
contradict my “10GigE is too bursty” guess). The VBV tuning was a lot more
effective.

/* Steinar */
-- 
Homepage: http://www.sesse.net/

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 15:08     ` Eric Dumazet
@ 2012-04-07 15:25       ` Dave Taht
  2012-04-07 15:35         ` Steinar H. Gunderson
  2012-04-07 20:27         ` Neil Davies
  0 siblings, 2 replies; 38+ messages in thread
From: Dave Taht @ 2012-04-07 15:25 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: bloat

The test HD tcp stream is up at

http://cesur.tg12.gathering.org:9094/

on both ipv6 and ipv4. They are streaming to anywhere up to 1000 users,
and there is an astounding amount of ipv6 present - 73% of the room
has an ipv6 address.

I took some captures from California last night; they were
interesting. I think a few more captures would also be interesting.

One indicated throttling at the ISP at t+60 seconds; the others showed
stuff dropping out for large periods of time. (170ms rtt here!)

I'd like to look into what percentage of the failures I observed
happened on the wifi hop vs the ethernet gateway, but many changes were
made since then, and I'm low on sleep. (what do geeks do on a friday
night?)

I don't know if they are still trying sfqred or qfq in production -
they worked! - but had little effect (as is to be kind of expected
with the instantaneous queue length being so short and bandwidth so
high on their first and nearest hops....)
On Sat, Apr 7, 2012 at 8:08 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le samedi 07 avril 2012 à 00:21 +0200, Steinar H. Gunderson a écrit :
>
>> I'll be perfectly happy just doing _something_; I don't need a perfect
>> solution. We have one more night of streaming, and then the event is over. :-)


-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://www.bufferbloat.net

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 15:25       ` Dave Taht
@ 2012-04-07 15:35         ` Steinar H. Gunderson
  2012-04-07 15:48           ` Dave Taht
                             ` (2 more replies)
  2012-04-07 20:27         ` Neil Davies
  1 sibling, 3 replies; 38+ messages in thread
From: Steinar H. Gunderson @ 2012-04-07 15:35 UTC (permalink / raw)
  To: Dave Taht; +Cc: bloat

On Sat, Apr 07, 2012 at 08:25:18AM -0700, Dave Taht wrote:
> The test HD tcp stream is up at
> 
> http://cesur.tg12.gathering.org:9094/

That's the SD stream. http://stream.tg12.gathering.org/ has HD etc.

> I'd like to look into what percentage of the failures I observed
> happened on the wifi hop vs the ethernet gateway
> since then many changes where made, and I'm low on sleep. (what do
> geeks do on a friday night?)

FWIW, most of the users complaining don't have wifi in the mix at all.

> I don't know if they are still trying sfqred or qfq in production -
> they worked! - but had little effect (as is to be kind of expected
> with the instantaneous queue length being so short and bandwidth so
> high on their first and nearest hops....)

The one on cesur.tg12 has sfqred + my hacked VLC to do TCP pacing.

The one on stream.tg12 has the oddest “shaping” in a while; the 10GigE is
terminated in a Cisco 4948E, which then has an 8x1GigE trunk out. We hope this
will smooth out the worst bursts a bit.

/* Steinar */
-- 
Homepage: http://www.sesse.net/

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 15:35         ` Steinar H. Gunderson
@ 2012-04-07 15:48           ` Dave Taht
  2012-04-07 15:52           ` Dave Taht
  2012-04-07 17:10           ` Jonathan Morton
  2 siblings, 0 replies; 38+ messages in thread
From: Dave Taht @ 2012-04-07 15:48 UTC (permalink / raw)
  To: Steinar H. Gunderson; +Cc: bloat

Incidentally, I need to correct something I said earlier, in that I
noted that CS5 marking over ipv6 didn't survive the trip to NZ.

It turned out that I'd changed the port number and wasn't marking that
correctly.
That reduces my data set for that statistic to 0. (it WAS 3AM, forgive me)

I note that doing this marking internally at the event for wifi would
be a bad thing due to the characteristics of the VI queue, but perhaps
marking such for e2e externally on the tcp stream might help (and the
survival rate of that marking, particularly for ipv6 and 6in4/6to4,
would be interesting too).

for the port I was using...

ip6tables -t mangle -I POSTROUTING -p udp -m multiport --ports 1234 \
  -j DSCP --set-dscp-class CS5

would have been correct

for the tcp streams in play here:

ip6tables -t mangle -I POSTROUTING -o whatever_the_right_interface_is \
  -p tcp -m multiport --ports 80,9094 -j DSCP --set-dscp-class CS5
iptables -t mangle -I POSTROUTING -o whatever_the_right_interface_is \
  -p tcp -m multiport --ports 80,9094 -j DSCP --set-dscp-class CS5
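
Assuming rules like those are loaded, the per-rule packet counters from
"ip6tables -t mangle -L POSTROUTING -v -n" (and the iptables equivalent)
will show whether traffic on the marked ports is actually being matched.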


On Sat, Apr 7, 2012 at 8:35 AM, Steinar H. Gunderson
<sgunderson@bigfoot.com> wrote:
> On Sat, Apr 07, 2012 at 08:25:18AM -0700, Dave Taht wrote:
>> The test HD tcp stream is up at
>>
>> http://cesur.tg12.gathering.org:9094/
>
> That's the SD stream. http://stream.tg12.gathering.org/ has HD etc.
>
>> I'd like to look into what percentage of the failures I observed
>> happened on the wifi hop vs the ethernet gateway
>> since then many changes where made, and I'm low on sleep. (what do
>> geeks do on a friday night?)
>
> FWIW, most of the users complaining don't have wifi in the mix at all.
>
>> I don't know if they are still trying sfqred or qfq in production -
>> they worked! - but had little effect (as is to be kind of expected
>> with the instantaneous queue length being so short and bandwidth so
>> high on their first and nearest hops....)
>
> The one on cesur.tg12 has sfqred + my hacked VLC to do TCP pacing.
>
> The one on stream.tg12 has the oddest “shaping” in a while; the 10GigE is
> terminated in a Cisco 4948E which then has a 8x1GigE trunk out. We hope this
> will smooth out the worst bursts a bit.
>
> /* Steinar */
> --
> Homepage: http://www.sesse.net/



-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://www.bufferbloat.net

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 15:35         ` Steinar H. Gunderson
  2012-04-07 15:48           ` Dave Taht
@ 2012-04-07 15:52           ` Dave Taht
  2012-04-07 17:10           ` Jonathan Morton
  2 siblings, 0 replies; 38+ messages in thread
From: Dave Taht @ 2012-04-07 15:52 UTC (permalink / raw)
  To: Steinar H. Gunderson; +Cc: bloat

On Sat, Apr 7, 2012 at 8:35 AM, Steinar H. Gunderson
<sgunderson@bigfoot.com> wrote:
> On Sat, Apr 07, 2012 at 08:25:18AM -0700, Dave Taht wrote:
>> The test HD tcp stream is up at
>>
>> http://cesur.tg12.gathering.org:9094/
>
> That's the SD stream. http://stream.tg12.gathering.org/ has HD etc.
>
>> I'd like to look into what percentage of the failures I observed
>> happened on the wifi hop vs the ethernet gateway
>> since then many changes where made, and I'm low on sleep. (what do
>> geeks do on a friday night?)
>
> FWIW, most of the users complaining don't have wifi in the mix at all.
>
>> I don't know if they are still trying sfqred or qfq in production -
>> they worked! - but had little effect (as is to be kind of expected
>> with the instantaneous queue length being so short and bandwidth so
>> high on their first and nearest hops....)
>
> The one on cesur.tg12 has sfqred + my hacked VLC to do TCP pacing.

kernel 3.3?

please send:

tc -s qdisc show dev the_device

cd /sys/class/net/the_device/queues/tx-0/byte_queue_limits

cat inflight
cat limit

> The one on stream.tg12 has the oddest “shaping” in a while; the 10GigE is
> terminated in a Cisco 4948E which then has a 8x1GigE trunk out. We hope this
> will smooth out the worst bursts a bit.
>
> /* Steinar */
> --
> Homepage: http://www.sesse.net/



-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://www.bufferbloat.net

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 15:35         ` Steinar H. Gunderson
  2012-04-07 15:48           ` Dave Taht
  2012-04-07 15:52           ` Dave Taht
@ 2012-04-07 17:10           ` Jonathan Morton
  2012-04-07 17:18             ` Dave Taht
  2012-04-07 18:10             ` Steinar H. Gunderson
  2 siblings, 2 replies; 38+ messages in thread
From: Jonathan Morton @ 2012-04-07 17:10 UTC (permalink / raw)
  To: Steinar H. Gunderson; +Cc: bloat


On 7 Apr, 2012, at 6:35 pm, Steinar H. Gunderson wrote:

> On Sat, Apr 07, 2012 at 08:25:18AM -0700, Dave Taht wrote:
>> The test HD tcp stream is up at
>> 
>> http://cesur.tg12.gathering.org:9094/
> 
> That's the SD stream. http://stream.tg12.gathering.org/ has HD etc.

I've got this running over here:

12Mbps ADSL2+ in Finland (not at all far from Norway), via a standard modem-router rather than my usual Linux system.  The LAN segment is entirely wired full-duplex Ethernet.  The receiver is a Core i7 Windows PC running the latest stable VLC, so it's not short of ability to play what it receives.

And it's dropping more frames than it's playing.

 - Jonathan Morton


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 17:10           ` Jonathan Morton
@ 2012-04-07 17:18             ` Dave Taht
  2012-04-07 17:44               ` Jonathan Morton
  2012-04-07 18:10             ` Steinar H. Gunderson
  1 sibling, 1 reply; 38+ messages in thread
From: Dave Taht @ 2012-04-07 17:18 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: bloat

On Sat, Apr 7, 2012 at 10:10 AM, Jonathan Morton <chromatix99@gmail.com> wrote:
>
> On 7 Apr, 2012, at 6:35 pm, Steinar H. Gunderson wrote:
>
>> On Sat, Apr 07, 2012 at 08:25:18AM -0700, Dave Taht wrote:
>>> The test HD tcp stream is up at
>>>
>>> http://cesur.tg12.gathering.org:9094/
>>
>> That's the SD stream. http://stream.tg12.gathering.org/ has HD etc.
>
> I've got this running over here:
>
> 12Mbps ADSL2+ in Finland (not at all far from Norway), via a standard modem-router rather than my usual Linux system.  The LAN segment is entirely wired full-duplex Ethernet.  The receiver is a Core i7 Windows PC running the latest stable VLC, so it's not short of ability to play what it receives.
>
> And it's dropping more frames than it's playing.

The difference between the udp stream and the tcp stream is amazing.
Perhaps Steinar can send you a udp stream for comparison.

I just took a capture of both the tcp stream and the udp stream
simultaneously for about 5 minutes. The tcp stream was nearly useless,
the udp one excellent...

the captures are up at:

http://huchra.bufferbloat.net/~d/captures/gathering/

The tcp stream was interestingly periodic...




>
>  - Jonathan Morton
>



-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://www.bufferbloat.net

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 17:18             ` Dave Taht
@ 2012-04-07 17:44               ` Jonathan Morton
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Morton @ 2012-04-07 17:44 UTC (permalink / raw)
  To: Dave Taht; +Cc: bloat


On 7 Apr, 2012, at 8:18 pm, Dave Taht wrote:

> the captures are up at:
> 
> http://huchra.bufferbloat.net/~d/captures/gathering/
> 
> The tcp stream was interestingly periodic...

...and indicates a congestion window growing to a half megabyte.  Ouch.
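
(For scale: half a megabyte in flight is about 4 Mbit, i.e. roughly a third of a second of standing queue if it all piles up behind a 12 Mbps ADSL link.)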

I used the TCP Optimiser tool from speedguide.net to turn off window scaling on the Windows box.  This forcibly limits the window size to 64K.  It seems to be behaving a lot better after doing that.

 - Jonathan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 17:10           ` Jonathan Morton
  2012-04-07 17:18             ` Dave Taht
@ 2012-04-07 18:10             ` Steinar H. Gunderson
  2012-04-07 18:27               ` Jonathan Morton
  2012-04-07 18:50               ` Dave Taht
  1 sibling, 2 replies; 38+ messages in thread
From: Steinar H. Gunderson @ 2012-04-07 18:10 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: bloat

On Sat, Apr 07, 2012 at 08:10:50PM +0300, Jonathan Morton wrote:
> 12Mbps ADSL2+ in Finland (not at all far from Norway), via a standard
> modem-router rather than my usual Linux system.  The LAN segment is
> entirely wired full-duplex Ethernet.  The receiver is a Core i7 Windows PC
> running the latest stable VLC, so it's not short of ability to play what it
> receives.
> 
> And it's dropping more frames than it's playing.

Have you tried the one at cesur.tg12? It's pacing things out a bit more.

/* Steinar */
-- 
Homepage: http://www.sesse.net/

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 18:10             ` Steinar H. Gunderson
@ 2012-04-07 18:27               ` Jonathan Morton
  2012-04-07 18:56                 ` Dave Taht
  2012-04-07 18:50               ` Dave Taht
  1 sibling, 1 reply; 38+ messages in thread
From: Jonathan Morton @ 2012-04-07 18:27 UTC (permalink / raw)
  To: Steinar H. Gunderson; +Cc: bloat


On 7 Apr, 2012, at 9:10 pm, Steinar H. Gunderson wrote:

> Have you tried the one at cesur.tg12? It's pacing things out a bit more.

After reducing my TCP *receive* window to 64KB, the 720p50 stream at the main site is playing just fine - or at least it was, until the presenters just came on for the "Grand Finale".  That was a very abrupt and well-synchronised change in performance, so there has to be something different there.

I appreciate, however, that a 64KB window requires a maximum of 100ms round-trip latency at 5Mbps, so that's not a solution for New Zealand.
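
(The arithmetic: 64 KB is 524288 bits, and 524288 bits / 5 Mbps ≈ 105 ms, which is the longest round trip over which a 64KB window can still keep a 5Mbps stream fed.)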

 - Jonathan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 18:10             ` Steinar H. Gunderson
  2012-04-07 18:27               ` Jonathan Morton
@ 2012-04-07 18:50               ` Dave Taht
  2012-04-07 18:54                 ` Steinar H. Gunderson
  2012-04-07 21:49                 ` Fred Baker
  1 sibling, 2 replies; 38+ messages in thread
From: Dave Taht @ 2012-04-07 18:50 UTC (permalink / raw)
  To: Steinar H. Gunderson; +Cc: bloat

Steinar: Judging from these results it sounds like you should try
limiting the send window.

I don't know what the right number is, or what the current calculation is
for what percentage of this is used for window-related buffering (anyone?),
but reducing the defaults by a lot for your tcp-based video servers seems
like a good idea.

To do that, edit /etc/sysctl.conf and add or change the line:

net.core.wmem_max = 256000 # or less. or more. 5Mbit stream...fairly
short rtts... hmm.. no brain cells. no sleep. ?

This is another parameter that may apply; the last value is the one to change.

net.ipv4.tcp_wmem = 4096 65536 2097152

These of course change it globally. You commit the changes with
sysctl -p /etc/sysctl.conf

You can also tweak SO_SNDBUF in vlc via setsockopt

http://www.kernel.org/doc/man-pages/online/pages/man7/tcp.7.html

64-256k seems about right but the math is eluding me this morning.
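
(Roughly: the send buffer only needs to cover bandwidth x delay, and a 5Mbit
stream over a 50-170 ms path is only about 31-106 KB of data in flight;
anything much beyond that mostly just becomes queue.)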

You can turn off window scaling entirely, but that's not the right answer

 net.ipv4.tcp_window_scaling = 0

The periodicity bothers me; it would be bad if that was synchronised. To
look at that problem would require some synchronous captures from
multiple locations.

On Sat, Apr 7, 2012 at 11:10 AM, Steinar H. Gunderson <sgunderson@bigfoot.com> wrote:
> On Sat, Apr 07, 2012 at 08:10:50PM +0300, Jonathan Morton wrote:
>> 12Mbps ADSL2+ in Finland (not at all far from Norway), via a standard
>> modem-router rather than my usual Linux system.  The LAN segment is
>> entirely wired full-duplex Ethernet.  The receiver is a Core i7 Windows PC
>> running the latest stable VLC, so it's not short of ability to play what it
>> receives.
>>
>> And it's dropping more frames than it's playing.
>
> Have you tried the one at cesur.tg12? It's pacing things out a bit more.
>
> /* Steinar */
> --
> Homepage: http://www.sesse.net/



-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://www.bufferbloat.net

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 18:50               ` Dave Taht
@ 2012-04-07 18:54                 ` Steinar H. Gunderson
  2012-04-07 19:01                   ` Steinar H. Gunderson
  2012-04-07 19:02                   ` Dave Taht
  2012-04-07 21:49                 ` Fred Baker
  1 sibling, 2 replies; 38+ messages in thread
From: Steinar H. Gunderson @ 2012-04-07 18:54 UTC (permalink / raw)
  To: Dave Taht; +Cc: bloat

On Sat, Apr 07, 2012 at 11:50:24AM -0700, Dave Taht wrote:
> net.core.wmem_max = 256000 # or less. or more. 5Mbit stream...fairly
> short rtts... hmm.. no brain cells. no sleep. ?
> 
> This is another parameter that may apply, the last value should be changed.
> 
> net.ipv4.tcp_wmem = 4096 65536 2097152

I did these on one of the mirrors (well, I did 500000 instead of 256000).
You can try

  http://pannekake.samfundet.no:3013/ (SD)
  http://pannekake.samfundet.no:3015/ (HD)

I didn't restart VLC; I hope I don't have to. =)

/* Steinar */
-- 
Homepage: http://www.sesse.net/

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 18:27               ` Jonathan Morton
@ 2012-04-07 18:56                 ` Dave Taht
  0 siblings, 0 replies; 38+ messages in thread
From: Dave Taht @ 2012-04-07 18:56 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: bloat

On Sat, Apr 7, 2012 at 11:27 AM, Jonathan Morton <chromatix99@gmail.com> wrote:
>
> On 7 Apr, 2012, at 9:10 pm, Steinar H. Gunderson wrote:
>
>> Have you tried the one at cesur.tg12? It's pacing things out a bit more.
>
> After reducing my TCP *receive* window to 64KB, the 720p50 stream at the main site is playing just fine - or at least it was, until the presenters just came on for the "Grand Finale".  That was a very abrupt and well-synchronised change in performance, so there has to be something different there.

there was some great female singer/techno folk on earlier, I enjoyed
it, even if I didn't speak the language.

>
> I appreciate, however, that a 64KB window requires a maximum of 100ms round-trip latency at 5Mbps, so that's not a solution for New Zealond.

They are mostly dealing with far shorter rtts than that - a few ms in
the hall (wifi, but with bufferbloat I imagine - I'd have liked to
have got some captures ) a few dozen in the area  -new zealand isn't
their target market, playing with that and california was a way to see
where the problem might be coming from.

So it sounds like a less than 64k window is enough to cover much of
europe at these speeds.

>
>  - Jonathan



-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://www.bufferbloat.net

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 18:54                 ` Steinar H. Gunderson
@ 2012-04-07 19:01                   ` Steinar H. Gunderson
  2012-04-07 19:08                     ` Jonathan Morton
                                       ` (2 more replies)
  2012-04-07 19:02                   ` Dave Taht
  1 sibling, 3 replies; 38+ messages in thread
From: Steinar H. Gunderson @ 2012-04-07 19:01 UTC (permalink / raw)
  To: Dave Taht; +Cc: bloat

On Sat, Apr 07, 2012 at 08:54:56PM +0200, Steinar H. Gunderson wrote:
> I did these on one of the irrors (well, I did 500000 instead of 256000).
> You can try
> 
>   http://pannekake.samfundet.no:3013/ (SD)
>   http://pannekake.samfundet.no:3015/ (HD)
> 
> I didn't restart VLC; I hope I don't have to. =)

I got reports from people in Norway that this instantly stopped the problems
on the HD stream, so incredibly enough, it may have worked.

I don't understand these mechanisms. Why would a smaller send window help?
Less burstiness?

/* Steinar */
-- 
Homepage: http://www.sesse.net/

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 18:54                 ` Steinar H. Gunderson
  2012-04-07 19:01                   ` Steinar H. Gunderson
@ 2012-04-07 19:02                   ` Dave Taht
  1 sibling, 0 replies; 38+ messages in thread
From: Dave Taht @ 2012-04-07 19:02 UTC (permalink / raw)
  To: Steinar H. Gunderson; +Cc: bloat

On Sat, Apr 7, 2012 at 11:54 AM, Steinar H. Gunderson
<sgunderson@bigfoot.com> wrote:
> On Sat, Apr 07, 2012 at 11:50:24AM -0700, Dave Taht wrote:
>> net.core.wmem_max = 256000 # or less. or more. 5Mbit stream...fairly
>> short rtts... hmm.. no brain cells. no sleep. ?
>>
>> This is another parameter that may apply, the last value should be changed.
>>
>> net.ipv4.tcp_wmem = 4096 65536 2097152
>
> I did these on one of the irrors (well, I did 500000 instead of 256000).
> You can try
>
>  http://pannekake.samfundet.no:3013/ (SD)
>  http://pannekake.samfundet.no:3015/ (HD)
>
> I didn't restart VLC; I hope I don't have to. =)

Arguably way too big, even to get stuff to California.
Tried it, same behavior.

And regrettably I think (but am not the expert) that
this is an inheritable property of the socket, so you would
need to restart vlc (or just add another mirror on another port after
changing it, or rebuild vlc to have the right setsockopt, or perhaps
it has it on the command line).



>
> /* Steinar */
> --
> Homepage: http://www.sesse.net/



-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://www.bufferbloat.net

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 19:01                   ` Steinar H. Gunderson
@ 2012-04-07 19:08                     ` Jonathan Morton
  2012-04-07 19:38                     ` Dave Taht
  2012-04-07 21:13                     ` Jesper Dangaard Brouer
  2 siblings, 0 replies; 38+ messages in thread
From: Jonathan Morton @ 2012-04-07 19:08 UTC (permalink / raw)
  To: Steinar H. Gunderson; +Cc: bloat


On 7 Apr, 2012, at 10:01 pm, Steinar H. Gunderson wrote:

> I got reports from people in Norway that this instantly stopped the problems
> on the HD stream, so incredibly enough, it may have worked.
> 
> I don't understand these mechanisms. Why would a smaller send window help?
> Less burstiness?

That's the short answer, yes.  It limits the amount of data that can be "in the network" at any one time, which is where it needs to find buffers whenever the link speed narrows.  If it needs to find buffers but they aren't big enough to hold all the data, packets are lost.

Another side effect is that it increases the "resonant frequency" of the connection, so the receiver can communicate about lost packets and actually receive the retransmissions before its own buffer runs out.

Note also that once the client buffer is satisfied, it only needs to get data smoothly, almost one frame at a time, whereas if it has just had to deal with a major loss event, the buffer will be empty and needs a lot of data to fill it.  This latter effect is why, once the problem has occurred, it keeps on occurring.  The periodicity probably is synchronised - to the video keyframes!

 - Jonathan


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 19:01                   ` Steinar H. Gunderson
  2012-04-07 19:08                     ` Jonathan Morton
@ 2012-04-07 19:38                     ` Dave Taht
  2012-04-07 20:16                       ` Steinar H. Gunderson
  2012-04-14  0:37                       ` Rick Jones
  2012-04-07 21:13                     ` Jesper Dangaard Brouer
  2 siblings, 2 replies; 38+ messages in thread
From: Dave Taht @ 2012-04-07 19:38 UTC (permalink / raw)
  To: Steinar H. Gunderson; +Cc: bloat

On Sat, Apr 7, 2012 at 12:01 PM, Steinar H. Gunderson
<sgunderson@bigfoot.com> wrote:
> On Sat, Apr 07, 2012 at 08:54:56PM +0200, Steinar H. Gunderson wrote:
>> I did these on one of the irrors (well, I did 500000 instead of 256000).
>> You can try
>>
>>   http://pannekake.samfundet.no:3013/ (SD)
>>   http://pannekake.samfundet.no:3015/ (HD)
>>
>> I didn't restart VLC; I hope I don't have to. =)
>
> I got reports from people in Norway that this instantly stopped the problems
> on the HD stream, so incredibly enough, it may have worked.
>
> I don't understand these mechanisms. Why would a smaller send window help?
> Less burstiness?

Awesome. I still think it's way too big, but there's some divisor in
here (1/4?) that I don't remember.

As for an explanation...

Welcome to bufferbloat, the global plague that is sweeping the world!

Up until you hit the available bandwidth on a path, life is golden,
and response time to lost packets approximately equals the overall
latency in the path, say, 10ms for around town. Your video player has
at least a few 100ms worth of its own buffering, so it doesn't even
notice anything but a truly massive outage. TCP just recovers,
transparently, underneath.

But:

You pass that operating point, all the buffers in the path fill, your
delays go way up, then you finally lose a packet (see rfc970) and tcp
cannot recover with all the data in flight rapidly enough
(particularly in the case of streaming video), all the lost data needs
to be resent, and you get a collapse and a tcp reset. And then the
process starts again.

The more bandwidth you are using up, the easier (and faster ) it is to trigger.

tcp's response time vs buffering-induced latency is quadratic. If you
have 10x more buffering in the path than needed, it takes 100x longer
to recover (so your recovery time went from ~10ms to ~1000ms).

We're seeing buffering all over the internet in multiple types of
devices well in excess of that.

Your buffers don't always fill, as some people have sufficient
bandwidth and correctly operating gear.

Most importantly very few people try to send sustained bursts of tcp
data longer than a few seconds, which is why this is normally so hard
to see. You were streaming 5Mbit which is far more than just about
anybody....

Second most importantly, it's a problem that is hard to trigger at sub-ms
rtts (like what you would get on local ethernet during a test), but
gets rapidly easier as your rtts exceed 1ms and the randomness of the
internet kicks in.

For WAY more detail on bufferbloat the cacm articles are canonical.

http://cacm.acm.org/magazines/2012/1/144810-bufferbloat/fulltext

See chart 4B in particular for a graphical explanation of how window
size and rtts are interrelated.

Your current setting now is overbuffered, but far less massively so.
With the short rtts the quadratic-isms kick in, but limiting the send
window size is still sub-optimal.

I'm very sorry your show hit this problem so hard, and that it took
so long to figure out.

It will make a great case study, and I would love it if a few of the
amazing graphics and sound talents you had at "the gathering" could
lend their vision to explaining what bufferbloat is!

jg's videos are helpful but they don't rock out to techno! nor do they
feature led-lit juggling balls (wow!), nor spectral snakes that
dissolve into visual madness.

I am glad I had a chance to watch the udp stream and sad to know that
so few others were able to enjoy the show.

Perhaps using an rtp based streaming method, particularly over ipv6,
will give you a better result next year.



-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://www.bufferbloat.net

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 19:38                     ` Dave Taht
@ 2012-04-07 20:16                       ` Steinar H. Gunderson
  2012-04-14  0:37                       ` Rick Jones
  1 sibling, 0 replies; 38+ messages in thread
From: Steinar H. Gunderson @ 2012-04-07 20:16 UTC (permalink / raw)
  To: Dave Taht; +Cc: bloat

On Sat, Apr 07, 2012 at 12:38:04PM -0700, Dave Taht wrote:
> As for an explanation...
> 
> Welcome to bufferbloat, the global plague that is sweeping the world!

Aaa. I already follow the bufferbloat discussion (although most of it is over
my head; even though my degree is in communications technology, I went down
the DSP path and not networking), but I really didn't expect bufferbloat to
actually be the direct cause of all of this. :-)

> It will make a great case study, and I would love it of a few of the
> amazing graphics and sound talents you had at  "the gathering" - could
> lend their vision to explaining what bufferbloat is!
> 
> jg's videos are helpful but they don't rock out to techno! nor do they
> feature led-lit juggling balls (wow!), nor spectral snakes that
> dissolve into visual madness.

Since you don't speak Norwegian, I think you missed the most important part
about the juggling balls -- they contain Arduinos with gravity sensors etc.
The spectral snakes sound very much like Struct by Outracks, which was indeed
shown (it got 3rd place in the democompo last year).

> Perhaps using an rtp based streaming method, particularly over ipv6,
> will give you a better result next year.

RTP/RTSP would be interesting, although I have no real experience with it.
How does it work wrt. yielding to TCP? How well does it traverse
NATs? (We do full stack and have done for years, but unfortunately most of
our external audience has no IPv6 yet.)

The intro compo is starting soon, and then the demo compo (right now,
freestyle video).

/* Steinar */
-- 
Homepage: http://www.sesse.net/

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 15:25       ` Dave Taht
  2012-04-07 15:35         ` Steinar H. Gunderson
@ 2012-04-07 20:27         ` Neil Davies
  1 sibling, 0 replies; 38+ messages in thread
From: Neil Davies @ 2012-04-07 20:27 UTC (permalink / raw)
  To: Dave Taht; +Cc: bloat

Fred

Thinking on this some more, the 'g' (the function of RTT) is something like 'min' - on the assumption that the path doesn't change during the transfer - which represents the immutable part of the end-to-end delay, the 'in-flight' buffering.

Neil

On 7 Apr 2012, at 16:25, Dave Taht wrote:

> The test HD tcp stream is up at
> 
> http://cesur.tg12.gathering.org:9094/
> 
> on both ipv6 and ipv4. They are streaming anywhere up to 1000 users,
> and there is an astounding amount of ipv6 present - 73% of the room
> has an ipv6 address.
> 
> I took some captures from california last night, they were
> interesting. I think a few more captures would also be interesting.
> 
> One indicated throttling at the isp at t+60 seconds, the others showed
> stuff dropping out for large periods of time. (170ms rtt here!)
> 
> I'd like to look into what percentage of the failures I observed
> happened on the wifi hop vs the ethernet gateway, but since then many
> changes were made, and I'm low on sleep. (What do geeks do on a
> Friday night?)
> 
> I don't know if they are still trying sfqred or qfq in production -
> they worked! - but had little effect (as is to be kind of expected
> with the instantaneous queue length being so short and bandwidth so
> high on their first and nearest hops....)
> On Sat, Apr 7, 2012 at 8:08 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> Le samedi 07 avril 2012 à 00:21 +0200, Steinar H. Gunderson a écrit :
>> 
>>> I'll be perfectly happy just doing _something_; I don't need a perfect
>>> solution. We have one more night of streaming, and then the event is over. :-)
> 
> 
> -- 
> Dave Täht
> SKYPE: davetaht
> US Tel: 1-239-829-5608
> http://www.bufferbloat.net
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 19:01                   ` Steinar H. Gunderson
  2012-04-07 19:08                     ` Jonathan Morton
  2012-04-07 19:38                     ` Dave Taht
@ 2012-04-07 21:13                     ` Jesper Dangaard Brouer
  2012-04-07 21:31                       ` Steinar H. Gunderson
  2 siblings, 1 reply; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2012-04-07 21:13 UTC (permalink / raw)
  To: Steinar H. Gunderson, Dave Taht; +Cc: bloat

[-- Attachment #1: Type: text/plain, Size: 1849 bytes --]

Hi Steinar,

The stream from http://pannekake.samfundet.no:3013 is fairly stable
compared to e.g. http://cesur.tg12.gathering.org:9094/, but it is clear
that sometimes excessive buffering does occur.

Try looking at the Recv-Q size, e.g. using the command "netstat -tan".
Sometimes you will see it grow; see the output below, where
it reached 400 Kbytes.  This is data available to the application (sitting
in the kernel), but not yet processed/read by the VLC application.

[hawk@t520 norsk-streaming]$ netstat -tan
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address          Foreign Address     State
[...cut...]
tcp   400312      0 192.168.42.180:59826   129.241.93.35:3013  ESTABLISHED

While writing this email, I saw it jump up to 949690 bytes, and the
signal quality went down.
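
(Purely as a sketch, assuming you have the connected TCP socket fd inside
the player on Linux: the SIOCINQ ioctl reports the same number from within
the application, i.e. bytes received by the kernel but not yet read.)

#include <sys/ioctl.h>
#include <linux/sockios.h>      /* SIOCINQ (== FIONREAD on Linux) */

/* Bytes sitting unread in the kernel receive queue for a connected
 * TCP socket -- the value netstat shows in the Recv-Q column.
 * Returns -1 on error. */
static int recvq_bytes(int fd)
{
        int queued = 0;
        if (ioctl(fd, SIOCINQ, &queued) < 0)
                return -1;
        return queued;
}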

--Jesper Brouer



-----Original Message-----
From: bloat-bounces@lists.bufferbloat.net on behalf of Steinar H. Gunderson
Sent: Sat 4/7/2012 21:01
To: Dave Taht
Cc: bloat@lists.bufferbloat.net
Subject: Re: [Bloat] Best practices for paced TCP on Linux?
 
On Sat, Apr 07, 2012 at 08:54:56PM +0200, Steinar H. Gunderson wrote:
> I did these on one of the mirrors (well, I did 500000 instead of 256000).
> You can try
> 
>   http://pannekake.samfundet.no:3013/ (SD)
>   http://pannekake.samfundet.no:3015/ (HD)
> 
> I didn't restart VLC; I hope I don't have to. =)

I got reports from people in Norway that this instantly stopped the problems
on the HD stream, so incredibly enough, it may have worked.

I don't understand these mechanisms. Why would a smaller send window help?
Less burstiness?

/* Steinar */
-- 
Homepage: http://www.sesse.net/
_______________________________________________
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 21:13                     ` Jesper Dangaard Brouer
@ 2012-04-07 21:31                       ` Steinar H. Gunderson
  0 siblings, 0 replies; 38+ messages in thread
From: Steinar H. Gunderson @ 2012-04-07 21:31 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: bloat

On Sat, Apr 07, 2012 at 11:13:59PM +0200, Jesper Dangaard Brouer wrote:
> The stream from http://pannekake.samfundet.no:3013 is fairly stable
> compared to e.g. http://cesur.tg12.gathering.org:9094/, but it is clear
> that sometimes excessive buffering does occur.

Note that right now I think you're comparing 3Mbit/sec to 5Mbit/sec.

In any case, the main interesting part is over now, and it seems Dave's wmem
trick was what pushed it into usable territory for our Norwegian viewers. So
thanks a bunch -- our viewers seem to be very happy. =)

/* Steinar */
-- 
Homepage: http://www.sesse.net/

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 18:50               ` Dave Taht
  2012-04-07 18:54                 ` Steinar H. Gunderson
@ 2012-04-07 21:49                 ` Fred Baker
  2012-04-07 22:36                   ` Dave Taht
  1 sibling, 1 reply; 38+ messages in thread
From: Fred Baker @ 2012-04-07 21:49 UTC (permalink / raw)
  To: Dave Taht, Steinar H. Gunderson; +Cc: bloat


On Apr 7, 2012, at 11:50 AM, Dave Taht wrote:

> http://www.kernel.org/doc/man-pages/online/pages/man7/tcp.7.html
> 
> 64-256k seems about right but the math is eluding me this morning.

For a 5 MBPS data stream, Path MTU = 1460, 100 ms RTT, you're looking at

                       rate in bps    rtt in microseconds
cwnd_limit = ceiling ( ----------- * ------------------- ) 
                          8*pmtu         1e6

                        5e6          100 e 3
           = ceiling ( ------ * ------------------- ) 
                       8*1460          1e6

           = ceiling ( 428 * 0.100 ) 

           = 43

you probably want to bump that by one or two to account for 43*40 bytes of IP and TCP headers.

43*1460 is 62780 bytes per RTT, which is frightfully close to 65K bytes per RTT, 524,280 bits per RTT, or 5,242,800 bits per second with the stated RTT. Hmmm.

Speaking strictly for myself, I would throw in one caveat, which is that a variable bit rate codec that averages 5 MBPS sometimes sends faster, and there may be good reason to allow it to. I think I'd recalculate for 6 MBPS on average, and carefully insert the RTT I cared about into the calculation. Doing that also accounts for the Mathis formula, which is far more complex and requires a lot more assumptions, but will come up with a number below 6 MBPS for a .1% loss rate.
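
(For the archive, the same arithmetic as a tiny self-contained C program;
the 5 MBPS rate, 100 ms RTT and 1460-byte MTU are just the example numbers
above, not measurements.)

#include <math.h>
#include <stdio.h>

int main(void)
{
        const double rate_bps = 5e6;    /* nominal stream rate, bits/sec */
        const double rtt_us   = 100e3;  /* assumed round-trip time, usec */
        const double pmtu     = 1460;   /* TCP payload bytes per segment */

        /* segments in flight needed to sustain rate_bps across one RTT */
        double cwnd = ceil(rate_bps / (8.0 * pmtu) * (rtt_us / 1e6));

        /* bump by a couple of segments for IP/TCP header overhead and
         * codec rate variation, as discussed above */
        double cwnd_limit = cwnd + 2;

        printf("cwnd       = %.0f segments\n", cwnd);          /* 43 */
        printf("cwnd_limit = %.0f segments (%.0f bytes/RTT)\n",
               cwnd_limit, cwnd_limit * pmtu);
        return 0;
}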

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 21:49                 ` Fred Baker
@ 2012-04-07 22:36                   ` Dave Taht
  2012-04-07 23:59                     ` Fred Baker
  0 siblings, 1 reply; 38+ messages in thread
From: Dave Taht @ 2012-04-07 22:36 UTC (permalink / raw)
  To: bloat

On Sat, Apr 7, 2012 at 2:49 PM, Fred Baker <fred@cisco.com> wrote:
>
> On Apr 7, 2012, at 11:50 AM, Dave Taht wrote:
>
>> http://www.kernel.org/doc/man-pages/online/pages/man7/tcp.7.html
>>
>> 64-256k seems about right but the math is eluding me this morning.
>
> For a 5 MBPS data stream, Path MTU = 1460, 100 ms RTT, you're looking at
>
>                       rate in bps    rtt in microseconds
> cwnd_limit = ceiling ( ----------- * ------------------- )
>                          8*pmtu         1e6
>
>                        5e6          100 e 3
>           = ceiling ( ------ * ------------------- )
>                       8*1460          1e6
>
>           = ceiling ( 428 * 0.100 )
>
>           = 43
>
> you probably want to bump that by one or two to account for 43*40 bytes of IP and TCP headers.
>
> 43*1460 is 62780 bytes per RTT, which is frightfully close to 65K bytes per RTT, 524,280 bits per RTT, or 5,242,800 bits per second with the stated RTT. Hmmm.
>
> Speaking strictly for myself, I would throw in one caveat, which is that a variable bit rate codec that averages 5 MBPS sometimes sends faster, and there may be good reason to allow it to. I think I'd recalculate for 6 MBPS on average, and carefully insert the RTT I cared about into the calculation. Doing that also accounts for the Mathis formula, which is far more complex and requires a lot more assumptions, but will come up with a number below 6 MBPS for a .1% loss rate.

In my case I'm 196ms away and running this for the past hour or so

vlc -6 http://pannekake.samfundet.no:3015/

seems to show it never really getting out of slow start.

Regrettably my favorite graph crashes xplot (grr)... so I can't see
whether the canonical bloat pattern is there or not.

Could be a problem in my lab (but the UDP stream is OK); am checking now...

Captures are up here:
http://huchra.bufferbloat.net/~d/captures/gathering/pannekake_3015.cap

-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://www.bufferbloat.net

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 22:36                   ` Dave Taht
@ 2012-04-07 23:59                     ` Fred Baker
  0 siblings, 0 replies; 38+ messages in thread
From: Fred Baker @ 2012-04-07 23:59 UTC (permalink / raw)
  To: Dave Taht; +Cc: bloat


On Apr 7, 2012, at 3:36 PM, Dave Taht wrote:

> In my case I'm 196ms away and running this for the past hour or so
> 
> vlc -6 http://pannekake.samfundet.no:3015/
> 
> seems to show it never really getting out of slow start.

Which raises the question of whether the propagation delay is 196 ms or you're simply not seeing a signal from the network...

But your average cwnd should be 83.904109589041096, which probably means it's 84 or better. What is it actually?
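
(That 83.90... is just 5e6 bit/s * 0.196 s / (8 * 1460 bytes).  To see the
real number, a minimal Linux-only sketch -- assuming access to the sending
socket fd on the server -- of reading what the stack reports via TCP_INFO:)

#include <stdio.h>
#include <string.h>
#include <netinet/in.h>
#include <netinet/tcp.h>        /* struct tcp_info, TCP_INFO */
#include <sys/socket.h>

/* Print the congestion window (segments) and smoothed RTT (usec) that
 * the Linux stack reports for a connected TCP socket. */
static void print_cwnd(int fd)
{
        struct tcp_info ti;
        socklen_t len = sizeof(ti);

        memset(&ti, 0, sizeof(ti));
        if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) == 0)
                printf("snd_cwnd=%u segments  rtt=%u usec\n",
                       ti.tcpi_snd_cwnd, ti.tcpi_rtt);
}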

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-06 22:21   ` Steinar H. Gunderson
  2012-04-07 15:08     ` Eric Dumazet
@ 2012-04-14  0:35     ` Rick Jones
  2012-04-14 21:06       ` Roger Jørgensen
  1 sibling, 1 reply; 38+ messages in thread
From: Rick Jones @ 2012-04-14  0:35 UTC (permalink / raw)
  To: Steinar H. Gunderson; +Cc: bloat

On 04/06/2012 03:21 PM, Steinar H. Gunderson wrote:
> On Fri, Apr 06, 2012 at 02:49:38PM -0700, Dave Taht wrote:
>> However in your environment you will need the beefed up SFQ that is in 3.3.
>> and BQL. If you are not saturating that 10GigE card, you can turn off TSO/GSO
>> as well.
>
> We're not anywhere near saturating our 10GigE card, and even if we did, we
> could add at least one 10GigE card more.

TSO/GSO isn't so much about saturating the 10 GbE NIC as it is avoiding 
saturating the CPU(s) driving the 10 GbE NIC.  That is, they save trips 
down the protocol stack, saving CPU cycles.  So, if you are not 
saturating one or more of the CPUs in the system, disabling TSO/GSO 
should not affect your ability to drive bits out the NIC.

rick jones

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 19:38                     ` Dave Taht
  2012-04-07 20:16                       ` Steinar H. Gunderson
@ 2012-04-14  0:37                       ` Rick Jones
  1 sibling, 0 replies; 38+ messages in thread
From: Rick Jones @ 2012-04-14  0:37 UTC (permalink / raw)
  To: Dave Taht; +Cc: bloat

And bufferbloat in the intermediate devices would be one thing, but the
Linux stack's penchant for growing the socket buffers/windows well beyond
the bandwidth-delay product certainly does not help.

rick jones

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-07 14:17   ` Fred Baker
  2012-04-07 15:08     ` Neil Davies
@ 2012-04-14  0:44     ` Rick Jones
  1 sibling, 0 replies; 38+ messages in thread
From: Rick Jones @ 2012-04-14  0:44 UTC (permalink / raw)
  To: Fred Baker; +Cc: bloat

On 04/07/2012 07:17 AM, Fred Baker wrote:
>
> On Apr 7, 2012, at 4:54 AM, Neil Davies wrote:
>
>> The answer was rather simple - calculate the amount of buffering needed to achieve
>> say 99% of the "theoretical" throughput (this took some measurement as to exactly what
>> that was) and limit the sender to that.
>
> So what I think I hear you saying is that we need some form of ioctl
> interface in the sockets library that will allow the sender to state
> the rate it associates with the data (eg, the video codec rate), and
> let TCP calculate
>
>                             f(rate in bits per second, pmtu)
>       cwnd_limit = ceiling (--------------------------------)  + C
>                                  g(rtt in microseconds)
>
> Where C is a fudge factor, probably a single digit number, and f and
> g  are appropriate conversion functions.

Since cwnd will never be more than SO_SNDBUF, apart from complications 
getting the rtt, I think one can probably do something close to that 
from user space with setsockopt(SO_SNDBUF).  I'm ass-u-me-ing that 
getsockopt(TCP_MAXSEG) will track PMTU, but I'm not sure if one can get 
the RTT portably - I think a getsockopt(TCP_INFO) under Linux will get 
the RTT, but don't know about other stacks. (Looks like Linux will also 
return a pmtu value).
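
Something along these lines is what I have in mind -- purely a sketch under
the assumptions above (TCP_INFO is Linux-only for the RTT, and note that
Linux doubles whatever value you pass to SO_SNDBUF for its own bookkeeping):

#include <string.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Clamp a TCP sender's buffer to roughly what a given stream rate needs,
 * per Fred's formula.  rate_bps is the nominal codec rate; MSS and RTT
 * are taken from the socket itself.  Returns -1 on failure. */
static int clamp_sndbuf(int fd, double rate_bps)
{
        struct tcp_info ti;
        int mss, sndbuf;
        socklen_t len;

        len = sizeof(mss);
        if (getsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, &len) < 0)
                return -1;

        memset(&ti, 0, sizeof(ti));
        len = sizeof(ti);
        if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) < 0)
                return -1;              /* non-Linux stacks land here */

        /* segments per RTT needed for rate_bps, plus a little slack */
        double cwnd = rate_bps / (8.0 * mss) * (ti.tcpi_rtt / 1e6) + 2;

        sndbuf = (int)(cwnd * mss);
        return setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf));
}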

rick jones

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-14  0:35     ` Rick Jones
@ 2012-04-14 21:06       ` Roger Jørgensen
  2012-04-16 17:05         ` Rick Jones
  0 siblings, 1 reply; 38+ messages in thread
From: Roger Jørgensen @ 2012-04-14 21:06 UTC (permalink / raw)
  To: Rick Jones; +Cc: bloat

On Sat, Apr 14, 2012 at 2:35 AM, Rick Jones <rick.jones2@hp.com> wrote:
> On 04/06/2012 03:21 PM, Steinar H. Gunderson wrote:
>> On Fri, Apr 06, 2012 at 02:49:38PM -0700, Dave Taht wrote:
>>> However in your environment you will need the beefed up SFQ that is in
>>> 3.3.
>>> and BQL. If you are not saturating that 10GigE card, you can turn off
>>> TSO/GSO
>>> as well.
>> We're not anywhere near saturating our 10GigE card, and even if we did, we
>> could add at least one 10GigE card more.
>
> TSO/GSO isn't so much about saturating the 10 GbE NIC as it is avoiding
> saturating the CPU(s) driving the 10 GbE NIC.  That is, they save trips down
> the protocol stack, saving CPU cycles.  So, if you are not saturating one or
> more of the CPUs in the system, disabling TSO/GSO should not affect your
> ability to drive bits out the NIC.

What will happen in a virtual-only environment when all the VMs have
more than one 10Gbps interface and you push close to 10Gbps through
each VM, like heavy iperf between lots of the VMs?
Unless the platform does something about it, that should start to
saturate some of the CPU cores in the entire platform.


... kinda makes me want to test it out, since I've got a 10Gbps-only
environment (Cisco UCS+Nexus5K) with VMware and there is not much
production traffic there yet...



-- 

Roger Jorgensen           |
rogerj@gmail.com          | - IPv6 is The Key!
http://www.jorgensen.no   | roger@jorgensen.no

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-14 21:06       ` Roger Jørgensen
@ 2012-04-16 17:05         ` Rick Jones
  0 siblings, 0 replies; 38+ messages in thread
From: Rick Jones @ 2012-04-16 17:05 UTC (permalink / raw)
  To: Roger Jørgensen; +Cc: bloat

On 04/14/2012 02:06 PM, Roger Jørgensen wrote:
> On Sat, Apr 14, 2012 at 2:35 AM, Rick Jones<rick.jones2@hp.com>  wrote:
>> On 04/06/2012 03:21 PM, Steinar H. Gunderson wrote:
>>> On Fri, Apr 06, 2012 at 02:49:38PM -0700, Dave Taht wrote:
>>>> However in your environment you will need the beefed up SFQ that is in
>>>> 3.3.
>>>> and BQL. If you are not saturating that 10GigE card, you can turn off
>>>> TSO/GSO
>>>> as well.
>>> We're not anywhere near saturating our 10GigE card, and even if we did, we
>>> could add at least one 10GigE card more.
>>
>> TSO/GSO isn't so much about saturating the 10 GbE NIC as it is avoiding
>> saturating the CPU(s) driving the 10 GbE NIC.  That is, they save trips down
>> the protocol stack, saving CPU cycles.  So, if you are not saturating one or
>> more of the CPUs in the system, disabling TSO/GSO should not affect your
>> ability to drive bits out the NIC.
>
> What will happen in a virtual-only environment when all the VMs have
> more than one 10Gbps interface and you push close to 10Gbps through
> each VM, like heavy iperf between lots of the VMs?

I don't know, I run netperf :)

> Unless the platform does something about it, that should start to
> saturate some of the CPU cores in the entire platform.

If the VMs are all on the same system, or there are enough 10 GbEs, yes, 
that would probably start to saturate the CPUs, perhaps even with TSO on 
(if the VMs have that on their emulated interfaces).  Probably lots of 
time spent moving data around.  If, though, all the VMs are talking out 
the one 10 Gbps pipe, even with bloat the TCP connections (I'm assuming 
TCP) will get backed off and the CPUs won't be at saturation.

But the ease with which things like TSO/GSO and LRO/GRO can hide a 
multitude of path-length sins is why I prefer to use aggregate, 
burst-mode TCP_RR to measure scalability - lots and lots of trips up and 
down the protocol stack.

(I should probably switch the example to TCP_RR - 
http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#Using-_002d_002denable_002ddemo 
)

rick jones

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Bloat] Best practices for paced TCP on Linux?
  2012-04-06 21:37 [Bloat] Best practices for paced TCP on Linux? Steinar H. Gunderson
  2012-04-06 21:49 ` Dave Taht
  2012-04-07 11:54 ` Neil Davies
@ 2012-05-12 20:08 ` Steinar H. Gunderson
  2 siblings, 0 replies; 38+ messages in thread
From: Steinar H. Gunderson @ 2012-05-12 20:08 UTC (permalink / raw)
  To: bloat

On Fri, Apr 06, 2012 at 11:37:25PM +0200, Steinar H. Gunderson wrote:
> Long story short, I have a Linux box (running 3.2.0 or so) with a 10Gbit/sec
> interface, streaming a large amount of video streams to external users,
> at 1Mbit/sec, 3Mbit/sec or 5Mbit/sec (different values). Unfortunately, even
> though there is no congestion in _our_ network (we have 190 Gbit/sec free!),
> some users are complaining that they can't keep the stream up.

I made a summary blog post about this case; the series isn't done yet
(since the problem isn't fully solved yet! :-) ), and there's not too much
new stuff for people on this list, but the packet videos are pretty nice,
so it might be interesting nevertheless:

  http://blog.sesse.net/blog/tech/TG/2012-05-12-22-03_tcp_optimization_for_video_streaming

/* Steinar */
-- 
Homepage: http://www.sesse.net/

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2012-05-12 20:09 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-06 21:37 [Bloat] Best practices for paced TCP on Linux? Steinar H. Gunderson
2012-04-06 21:49 ` Dave Taht
2012-04-06 22:21   ` Steinar H. Gunderson
2012-04-07 15:08     ` Eric Dumazet
2012-04-07 15:25       ` Dave Taht
2012-04-07 15:35         ` Steinar H. Gunderson
2012-04-07 15:48           ` Dave Taht
2012-04-07 15:52           ` Dave Taht
2012-04-07 17:10           ` Jonathan Morton
2012-04-07 17:18             ` Dave Taht
2012-04-07 17:44               ` Jonathan Morton
2012-04-07 18:10             ` Steinar H. Gunderson
2012-04-07 18:27               ` Jonathan Morton
2012-04-07 18:56                 ` Dave Taht
2012-04-07 18:50               ` Dave Taht
2012-04-07 18:54                 ` Steinar H. Gunderson
2012-04-07 19:01                   ` Steinar H. Gunderson
2012-04-07 19:08                     ` Jonathan Morton
2012-04-07 19:38                     ` Dave Taht
2012-04-07 20:16                       ` Steinar H. Gunderson
2012-04-14  0:37                       ` Rick Jones
2012-04-07 21:13                     ` Jesper Dangaard Brouer
2012-04-07 21:31                       ` Steinar H. Gunderson
2012-04-07 19:02                   ` Dave Taht
2012-04-07 21:49                 ` Fred Baker
2012-04-07 22:36                   ` Dave Taht
2012-04-07 23:59                     ` Fred Baker
2012-04-07 20:27         ` Neil Davies
2012-04-14  0:35     ` Rick Jones
2012-04-14 21:06       ` Roger Jørgensen
2012-04-16 17:05         ` Rick Jones
2012-04-07 11:54 ` Neil Davies
2012-04-07 14:17   ` Fred Baker
2012-04-07 15:08     ` Neil Davies
2012-04-07 15:16       ` Steinar H. Gunderson
2012-04-14  0:44     ` Rick Jones
2012-04-07 14:48   ` Dave Taht
2012-05-12 20:08 ` Steinar H. Gunderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox