From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 26 Nov 2012 14:58:58 -0500 (EST)
From: dpreed@reed.com
To: "Michael Richardson"
Cc: cerowrt-users@lists.bufferbloat.net, cerowrt-devel@lists.bufferbloat.net
Subject: Re: [Cerowrt-devel] [Cerowrt-users] QOS settings vs speedboost and random bandwidth
In-Reply-To: <9615.1353953507@obiwan.sandelman.ca>
References: <20121125232034.GF24680@merlins.org> <31933.1353939756@obiwan.sandelman.ca> <1353942251.571510886@apps.rackspace.com> <13866.1353944313@obiwan.sandelman.ca> <1353947863.437620265@apps.rackspace.com> <9615.1353953507@obiwan.sandelman.ca>
Message-ID: <1353959938.625616504@apps.rackspace.com>
List-Id: Development issues regarding the cerowrt test router project

Hi Michael -

Specifically, what my code did was this:

It observed the IPv4 headers of *large* TCP/IP datagrams going upstream, so that it could construct "no-op", "content-free" datagrams that would certainly pass muster through all the filters and be routed exactly the same as the TCP/IP datagrams that were carrying large flows. It would remember only the most recent one.

Every K bytes of upstream traffic (K chosen so that the overhead [= minimal TCP/IP datagram size divided by K] is a tiny percentage), it would construct a NO-OP TCP/IP datagram that appears to be part of that flow (same source/dest addr/port info and, just for grins, a duplicate sequence number and no content bytes at all), set its TTL to make it time out very close to the "other side" of the CMTS, and queue it normally.
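
To make the choice of K concrete, here is a minimal Python sketch (illustrative only, not the original code). It assumes a 40-byte minimal TCP/IP datagram (20-byte IPv4 header plus 20-byte TCP header, no options); `probe_interval_bytes`, `ProbeScheduler`, and the overhead budget are hypothetical names and values.

```python
import math

# Assumption: a "minimal TCP/IP datagram" is 20 bytes of IPv4 header
# plus 20 bytes of TCP header, with no options and no payload.
MIN_TCP_IP_DATAGRAM = 40


def probe_interval_bytes(max_overhead: float,
                         probe_size: int = MIN_TCP_IP_DATAGRAM) -> int:
    """Smallest K (upstream bytes per probe) with probe_size / K <= max_overhead."""
    if not 0 < max_overhead < 1:
        raise ValueError("max_overhead must be a fraction between 0 and 1")
    return math.ceil(probe_size / max_overhead)


class ProbeScheduler:
    """Counts upstream bytes and reports when the next NO-OP probe is due."""

    def __init__(self, k_bytes: int) -> None:
        self.k_bytes = k_bytes
        self.bytes_since_probe = 0

    def on_upstream_datagram(self, length: int) -> bool:
        """Account for one upstream datagram; return True if a probe is due."""
        self.bytes_since_probe += length
        if self.bytes_since_probe >= self.k_bytes:
            self.bytes_since_probe = 0
            return True
        return False
```

With a 1% overhead budget this works out to one 40-byte probe per roughly 4000 bytes of upstream traffic.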

The TTL expiration causes an ICMP (Time Exceeded) packet to be sent back. My code intercepts that packet based on its contents, and removes it as "handled" before it gets processed by the TCP/IP state machines.

The time between the queueing of the TCP/IP NO-OP and the return of the ICMP packet is a direct measure of the queueing delays through the cable modem and CMTS. When this grows by around "1 full datagram" (that is, by roughly one full datagram's transmission time) from its minimum, the upload queue is becoming congested, and it's time to stop sending content for a bit. As soon as content is being held on the egress link into the cable modem from the router, we send another NO-OP with the short TTL, and as soon as its ICMP comes back, you know the queue in the CMTS is drained, so you can resume sending into an empty CMTS, at a lower rate (you've just gotten a good estimate of the rate that you should reduce to, if you've been keeping track of how many bytes are flowing over the egress link).
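
A minimal Python sketch of this congestion test follows (illustrative only, not the original code). It assumes the "1 full datagram" threshold means the time needed to transmit one full-size datagram at the current uplink rate estimate; `QueueDelayMonitor`, the 1500-byte datagram size, and the field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueueDelayMonitor:
    """Tracks probe round-trip times and flags growing upstream queueing."""

    uplink_rate_bps: float           # current estimate of the upstream rate
    full_datagram_bytes: int = 1500  # assumed size of "1 full datagram"
    min_rtt: Optional[float] = None  # smallest probe round trip seen (seconds)

    def threshold(self) -> float:
        """Time to transmit one full datagram at the estimated uplink rate."""
        return self.full_datagram_bytes * 8 / self.uplink_rate_bps

    def on_probe_rtt(self, rtt: float) -> bool:
        """Record one probe round trip; return True if sending should pause."""
        if self.min_rtt is None or rtt < self.min_rtt:
            self.min_rtt = rtt
        # Congested once delay exceeds the minimum by about one datagram time.
        return rtt - self.min_rtt >= self.threshold()
```

Here `rtt` is the interval between queueing a NO-OP and intercepting its ICMP reply. At an estimated 1 Mbit/s uplink the threshold is 12 ms, so a probe that returns 20 ms later than the best case seen so far would signal a pause.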

Symmetrically, you can periodically (less frequently) experiment with a possible rate *increase* by sending a small NO-OP packet immediately followed by a large/maximal-sized NO-OP datagram, and using the "packet pair" concept to determine the bottleneck rate by measuring the time between ICMP responses. The time between the ICMP responses, taken together with the size of the large datagram, is an estimator of the achievable peak rate through the upstream path.
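
The packet-pair arithmetic can be sketched in Python as below (illustrative only, not the original code). It assumes the gap between the two ICMP replies approximates the time the bottleneck link took to serialize the second, large datagram; `packet_pair_rate_bps` and its parameters are hypothetical names.

```python
def packet_pair_rate_bps(large_datagram_bytes: int,
                         t_first_reply: float,
                         t_second_reply: float) -> float:
    """Estimate the bottleneck rate (bits/s) from one small+large probe pair.

    t_first_reply and t_second_reply are the arrival times, in seconds, of
    the ICMP replies to the small and the large NO-OP datagram respectively.
    """
    gap = t_second_reply - t_first_reply
    if gap <= 0:
        raise ValueError("second reply must arrive after the first")
    # The large datagram queued directly behind the small one, so the gap
    # is roughly its serialization time on the slowest upstream link.
    return large_datagram_bytes * 8 / gap
```

For example, a 1500-byte datagram whose reply trails the small probe's reply by 12 ms suggests a bottleneck of about 1 Mbit/s. A single pair is noisy, so in practice several samples would likely need to be combined.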

This assumes that the downstream (incoming) path is uncongested. But you can elaborate this scheme further.

The goal of the "tcptraceroute" method is to get a "loopback" that follows the same path as an existing TCP connection, in order to get timing right.

If options exist to get intermediate timestamps on a route, you can also use similar techniques under TCP with the "NO-OP" datagram technique.

-----Original Message-----
From: "Michael Richardson" <mcr@sandelman.ca>
Sent: Monday, November 26, 2012 1:11pm
To: dpreed@reed.com
Cc: cerowrt-users@lists.bufferbloat.net, cerowrt-devel@lists.bufferbloat.net
Subject: Re: [Cerowrt-devel] [Cerowrt-users] QOS settings vs speedboost and random bandwidth

>>>>> "dpreed" == dpreed <dpreed@reed.com> writes:

dpreed> But I've thought about coding it again for cerowrt. Where
dpreed> to modularly slot it in seems to be worth thinking about.
dpreed> Perhaps in two key pieces: an iptables/xfilter module and a
dpreed> routing/traffic control module - with some direct
dpreed> interaction between the two using some appropriate
dpreed> intermodule bus/link/coordination link.

So an uplink bitrate value with an easy to reach sysctl that
userspace can toggle? It would be an enhancement to existing tc/qos code.

dpreed> I'd be happy to think about defining the pieces, but I
dpreed> really don't have time to code it, given all the other stuff
dpreed> I've done. I wonder if by putting it in these modules, one
dpreed> can use existing kernel APIs.

How precise timing do you think we need?

As I understand what you are saying, by periodically sending a few ICMP
messages (does it help if they are back to back?) and looking when they
are returned, one can calculate the uplink bandwidth?

Or are you saying that we are measuring the point in uplink usage where
the latency begins to peak?

--
] He who is tired of Weird Al is tired of life! | firewalls [
] Michael Richardson, Sandelman Software Works, Ottawa, ON |net architect[
] mcr@sandelman.ottawa.on.ca http://www.sandelman.ottawa.on.ca/ |device driver[
Kyoto Plus: watch the video <http://www.youtube.com/watch?v=kzx1ycLXQSE>
then sign the petition.