From: Benjamin Cronce
To: cake@lists.bufferbloat.net
Date: Fri, 12 Jun 2015 14:09:44 -0500
Subject: [Cake] lower bounds for latency

> [Cake] lower bounds for latency
>
> David Lang  david at lang.hm
> Fri Jun 5 20:08:54 PDT 2015
>
> On Fri, 5 Jun 2015, Dave Taht wrote:
>
> > On Fri, Jun 5, 2015 at 6:02 PM, David Lang wrote:
> >> On Fri, 5 Jun 2015, Dave Taht wrote:
> >>
> >>> bob's been up to good stuff lately..
> >>>
> >>> http://bobbriscoe.net/projects/latency/sub-mss-w.pdf
> >>
> >> one thing that looks wrong to me. He talks about how TCP implementations
> >> cannot operate at less than two packets per RTT. It's not clear what he
> >> means. Does he mean two packets in flight per RTT? or two packets' worth
> >> of buffering per RTT?
> >
> > Flight.
>
> In that case I think his analysis of the effects of AQM is incorrect,
> because an AQM only limits the buffer size on one device, not the number of
> packets in flight.

I think what he was getting at is that if you have a low provisioned bandwidth
and a really low ping, TCP effectively has a lower bound on how slowly it will
send. Say you have a 1ms ping to a CDN in your ISP's network, but your ISP
only provisioned you 1Mb/s of bandwidth. Since all current TCP implementations
require at least 2 segments in flight, this puts a floor on TCP's sending
rate: (2 * 1500 bytes * 8)/1ms = 24Mb/s. Even though you have a 1Mb/s
connection, TCP will flood it at 24Mb/s and will not back off, even with
packet loss signalling it to back off.

The example in the paper is a lot more realistic than my extreme one: several
streams with a 6ms ping, putting a lower bound of (2 * 1500 bytes * 8)/6ms =
4Mb/s per stream.
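A back-of-the-envelope sketch of that rate floor, in Python (the function name
and defaults are mine, just restating the 2*MSS/RTT arithmetic above):

def tcp_rate_floor_bps(rtt_s, mss_bytes=1500, min_segments=2):
    """Lowest rate a TCP that must keep min_segments in flight can run at:
    rate = min_segments * MSS / RTT."""
    return min_segments * mss_bytes * 8 / rtt_s

print(tcp_rate_floor_bps(0.001) / 1e6)  # 1ms RTT -> 24.0 Mb/s, vs a 1Mb/s link
print(tcp_rate_floor_bps(0.006) / 1e6)  # 6ms RTT ->  4.0 Mb/s per stream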
>
> He uses the example of 12 connections totaling 40Mb with a 6ms RTT. What if
> the systems are in the same rack and have <1ms RTT? According to him TCP
> just won't work.

If the systems are in the same rack, the choke points are probably somewhere
other than the backplane of the switch, like the CPU, the hard drive, or
possibly even the port. In datacenters, it's relatively easy to throw more
bandwidth at the problem.

>
> > I also disagree with this statement:
> >
> > "It is always wrong to send smaller packets more
> > often, because the constraint may be packet pro-
> > cessing, not bits."
> >
> > because I believe (but would have to go look at some data to make
> > sure) that we're good with packet sizes down into the 300 byte range
> > on most hardware, and thus could (and SHOULD) also reduce the MSS in
> > these cases to keep the "signal strength" up at sustainable levels.
> >
> > I do recall that bittorrent used to reduce the MSS and were asked to
> > stop (a decade ago), when they got as far down as 600 bytes or so.
> >
> > but the rest I quite liked.
>
> I think it depends on the speed of the link. At the very low end (cheap home
> routers) and the very high end (multiple links >10Gb/s) the available
> processing per packet may be a limit. But in the huge middle ground between
> these extremes, you have quite a bit of CPU time per packet. The 3800 at
> 100Mb is an example of running into this limit, but at 10Mb it has no
> problem. The WRT1200 with its dual-core GHz+ CPU has a lot of processor
> available for a bit more money. From there up until the large datacenter/ISP
> cores with multiple 10Gb ports to manage, you have plenty of CPU.
>
> The other issue is the length of the 'dead air' between packets. The current
> standards have this being a fixed amount of time. Combined with the 'wasted'
> packet header data, this means the same amount of data uses less total time
> if it's sent in fewer, larger packets rather than in many smaller ones. When
> you are pushing the limits of the wire, this can make a difference.
>
> This is why wifi tries to combine multiple packets into one transmission.
>
> <rant>
> We just need to break people of the mindset that it makes sense to hold off
> on transmitting something "just in case" something more comes along that
> could be combined with it.
>
> Instead they need to start off transmitting the first one that comes along
> without any wait, and then, when the next chance to transmit arrives, check
> the queue to see if there's something else that can be combined with what
> you are ready to transmit, and if so, send it at the same time. Rsyslog
> implements exactly this algorithm in how it batches the log messages it's
> processing: the first message gets the minimum delay, and later messages
> only get as much delay as is required to keep things moving. It does mean
> that the process hits continuous processing sooner (where there is no delay
> between finishing work on one batch and starting work on the next), but that
> 'always busy' point is not peak throughput: since batches are more efficient
> to process, throughput keeps climbing while latency climbs at a much lower
> rate (eventually you do hit a solid wall where larger batches don't help, so
> you just fall behind until traffic slows).
> </rant>
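That batching discipline is easy to sketch. Below is a minimal Python model of
the algorithm as described above, not rsyslog's actual code (batch_worker and
send_batch are made-up names): take the first item with no added delay, then
drain whatever else has queued up in the meantime and send it all as one batch.

import queue

def batch_worker(q, send_batch):
    while True:
        # First item gets minimum delay: block only until something arrives.
        batch = [q.get()]
        # Later items only wait as long as the current batch took to form:
        # drain whatever is already queued, without sleeping "just in case".
        while True:
            try:
                batch.append(q.get_nowait())
            except queue.Empty:
                break
        send_batch(batch)  # bigger batches amortize per-send overhead

Under light load every message goes out alone and immediately; under heavy
load batches grow on their own, which is exactly the latency/throughput
trade-off described above.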
> >>
> >> Two packets in flight per RTT would make sense as a minimum, but two
> >> packets' worth of buffering on N devices in the path doesn't.
> >>
> >> Using the example of a 6ms RTT: depending on the equipment involved, this
> >> could have from one to several devices handling the packets between the
> >> source and the destination. Saying that each device in the path must have
> >> two packets' worth of buffering doesn't make sense. At a given line speed
> >> and data rate, you will have X packets in flight. The number of devices
> >> between the source and the destination will not change X.
> >
> > Flight.
> >
> >> If the requirement is that there are always at least two packets in
> >> flight in a RTT, it doesn't then follow that both packets are going to be
> >> in the buffer of the same device at the same time. I spoke with a vendor
> >> promising 7ms Los Angeles to Las Vegas. For the vast majority of that 7ms
> >> the packets are not in the buffers of the routers, but exist only as
> >> light in the fiber (I guess you could view the fiber as acting as a
> >> buffer in such conditions).
> >>
> >> Where is the disconnect between my understanding and what Bob is talking
> >> about?
> >
> > Flight, not buffering. Redefining the goal of an AQM to keep packets
> > in flight rather than achieve a fixed queuing delay is what this is
> > about, and improving TCPs to also keep packets in flight with
> > sub-packet windows is part of his answer.
> >
> > I like getting away from a target constant for delay (why 5ms when 5us
> > is doable) and this is an interesting way to think about it from both
> > ends.
>
> I agree. The idea is not to maintain a fixed buffer delay; we're trying for
> the minimum amount of unnecessary buffer delay. The 'target' numbers are
> just the point where we say the delay is so bad that the traffic must be
> slowed.
>
> > And I was nattering about how I didn't like delayed acks just a few
> > hours ago.
>
> What we need is a TCP stack that can combine ACKs that arrive separately
> over time and only send one.
>
> David Lang
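Since TCP's ACKs are cumulative, "combining" them is mostly a matter of
remembering only the highest ACK owed and sending it once. A toy Python sketch
of that idea (AckCoalescer and its method names are invented for illustration;
this is not any real stack's API):

class AckCoalescer:
    def __init__(self, send_ack):
        self.send_ack = send_ack  # callback that actually emits an ACK
        self.pending = None       # highest cumulative ACK not yet sent

    def on_segment(self, cumulative_seq):
        # A later in-order segment supersedes earlier ones: one cumulative
        # ACK acknowledges everything up to the highest sequence seen.
        if self.pending is None or cumulative_seq > self.pending:
            self.pending = cumulative_seq

    def flush(self):
        # Called when the stack gets a chance to transmit (timer expiry,
        # or piggybacked on outgoing data): send one ACK, not N.
        if self.pending is not None:
            self.send_ack(self.pending)
            self.pending = None

If segments ending at 1000, 2000, and 3000 all arrive before flush() runs, the
receiver sends a single ACK for 3000 instead of three separate ones.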