From: dpreed@reed.com
To: "Dave Taht"
Cc: "cerowrt-devel@lists.bufferbloat.net"
Date: Mon, 26 Jan 2015 19:12:15 -0500 (EST)
Subject: Re: [Cerowrt-devel] Recording RF management info _and_ associated traffic?
Message-ID: <1422317535.474322223@apps.rackspace.com>

Well, we may all want to agree to disagree. I don't buy the argument that hash tables are slow compared to TCAMs. Even allowing for cache misses, a hash table lookup is still O(1): on average you examine about one memory location per lookup, which is the whole point of a hash table. The constant factor is the speed of memory, which is not terribly slow by any means.

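To make the average-case claim concrete, here is a minimal sketch, assuming an open-addressing table with linear probing and uniformly random 48-bit keys. The class and function names are illustrative, and the sketch counts slots examined per lookup, not nanoseconds or cache misses:

```python
import random


class LinearProbeTable:
    """Open-addressing hash table with linear probing (illustrative only)."""

    def __init__(self, capacity):
        self.slots = [None] * capacity  # each slot holds (key, value) or None

    def _home(self, key):
        return hash(key) % len(self.slots)

    def insert(self, key, value):
        # Assumes the table never fills up completely.
        i = self._home(key)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % len(self.slots)
        self.slots[i] = (key, value)

    def lookup(self, key):
        """Return (value, probes), where probes counts the slots examined."""
        i = self._home(key)
        probes = 1
        while self.slots[i] is not None:
            if self.slots[i][0] == key:
                return self.slots[i][1], probes
            i = (i + 1) % len(self.slots)
            probes += 1
        return None, probes


def average_probes(n_keys, capacity, seed=1):
    """Average slots examined per successful lookup of random 48-bit keys."""
    rng = random.Random(seed)
    keys = rng.sample(range(1 << 48), n_keys)
    table = LinearProbeTable(capacity)
    for k in keys:
        table.insert(k, True)
    return sum(table.lookup(k)[1] for k in keys) / n_keys


print(average_probes(1024, 2048))  # table half full
```

The textbook analysis of linear probing predicts about ½(1 + 1/(1 − α)) probes per successful lookup at load factor α, roughly 1.5 when the table is half full. Consecutive probes also land in adjacent slots, which tends to keep them in the same cache line, though actual cache behaviour depends on the hardware and needs measuring.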

To get into this deeper would require actual measurements, of which I am a great fan. But your handwaves are pretty unquantitative, Dave, so at best they are similar to mine. I'm very measurement-focused, being partly a hardware architecture guy.

David - my comment about HP doing layer 3 switching in TCAMs was just meant to point out that there's nothing magic about layer 2. I was not suggesting that they don't use proprietary binary blobs, because they do. But so do the TCAM programs in layer 2 devices.

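For anyone who hasn't looked inside one: a TCAM entry is a value plus a mask, masked-out bits are "don't care", and the highest-priority matching entry wins. The sketch below is a software model only, with hypothetical names; it scans entries in order where real hardware compares them all in parallel. Loading IPv4 routes longest-prefix-first turns the same structure into a layer 3 longest-prefix matcher, which illustrates why layer 2 lookups are not special here:

```python
from ipaddress import IPv4Address, IPv4Network


class SoftTCAM:
    """Software model of a TCAM: ordered (value, mask, result) entries.

    Hardware compares every entry in parallel; this model scans in priority
    order, which gives the same answer but none of the speed.
    """

    def __init__(self):
        self.entries = []  # index 0 is the highest-priority entry

    def add(self, value, mask, result):
        self.entries.append((value, mask, result))

    def match(self, key):
        for value, mask, result in self.entries:
            if (key & mask) == (value & mask):  # mask bit 0 means "don't care"
                return result
        return None


def tcam_from_routes(routes):
    """Load prefix -> next hop routes, longest prefix first (hypothetical helper)."""
    tcam = SoftTCAM()
    nets = sorted(((IPv4Network(p), hop) for p, hop in routes.items()),
                  key=lambda item: item[0].prefixlen, reverse=True)
    for net, hop in nets:
        tcam.add(int(net.network_address), int(net.netmask), hop)
    return tcam


routes = {"0.0.0.0/0": "upstream", "10.0.0.0/8": "core", "10.1.2.0/24": "lab"}
tcam = tcam_from_routes(routes)
print(tcam.match(int(IPv4Address("10.1.2.7"))))   # lab
print(tcam.match(int(IPv4Address("10.9.9.9"))))   # core
print(tcam.match(int(IPv4Address("192.0.2.1"))))  # upstream
```

The same match logic serves a MAC table if every entry uses an all-ones 48-bit mask; only the priority ordering is doing the longest-prefix work.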

Dave - you are conflating the routing algorithm with its implementation technique when you focus on "prefix matching" as being hard to do. It's not hard to invent a performant prefix-matching algorithm built on a hash table. A simple way is to treat the address being looked up as several addresses (its shorter prefixes), look each one up separately by its hash, and take the longest prefix that hits. It's still O(1) if you do that, just with a larger constant factor: one probe per possible prefix length, a fixed bound of 33 for IPv4 (/0 through /32). I assume you don't actually think it is optimal to do linear searches on the routing table, as hosts sometimes do. Linear search is not necessary.

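A minimal sketch of the scheme above, assuming IPv4 and using Python dicts as the per-length hash tables (the class name is hypothetical): mask the destination to each prefix length in use, longest first, and stop at the first hit. A route change is a single dict insert or delete, and the number of probes is bounded by the number of distinct prefix lengths present, at most 33. The binary search over prefix lengths that Dave mentions can cut the probes further, but it needs extra "marker" entries to stay correct, which this sketch leaves out.

```python
from ipaddress import IPv4Address, IPv4Network


class HashLPM:
    """Longest-prefix match with one hash table (dict) per prefix length."""

    def __init__(self):
        self.tables = {}  # prefix length -> {network address as int: next hop}

    def add(self, prefix, next_hop):
        net = IPv4Network(prefix)
        self.tables.setdefault(net.prefixlen, {})[int(net.network_address)] = next_hop

    def remove(self, prefix):
        net = IPv4Network(prefix)
        self.tables.get(net.prefixlen, {}).pop(int(net.network_address), None)

    def lookup(self, address):
        """Return (next_hop, probes): try /32 first, then ever shorter prefixes."""
        addr = int(IPv4Address(address))
        probes = 0
        for length in range(32, -1, -1):
            table = self.tables.get(length)
            if not table:
                continue  # no routes of this length: skip without probing
            probes += 1
            mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
            hop = table.get(addr & mask)
            if hop is not None:
                return hop, probes
        return None, probes


fib = HashLPM()
fib.add("0.0.0.0/0", "upstream")
fib.add("10.0.0.0/8", "core")
fib.add("10.1.2.0/24", "lab")
print(fib.lookup("10.1.2.7"))   # ('lab', 1)
print(fib.lookup("192.0.2.1"))  # ('upstream', 3)
fib.remove("10.1.2.0/24")       # a route change is one dict delete
print(fib.lookup("10.1.2.7"))   # ('core', 1)
```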

There is literally nothing magical about looking up 48-bit random Ethernet addresses in a LAN.

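As an illustration, a minimal sketch of a learning bridge's forwarding table: one hash lookup keyed by the 48-bit destination MAC, learning from source addresses, and flooding when the destination is unknown. The 4096-entry default mirrors the limit of inexpensive switches mentioned in the quoted discussion; evicting the least recently seen entry when the table is full is an assumption for illustration, not how any particular switch behaves.

```python
from collections import OrderedDict

FLOOD = "flood"  # sentinel: destination unknown, send out every other port


class MacTable:
    """Learning-bridge forwarding table keyed by 48-bit MAC (hypothetical sketch)."""

    def __init__(self, max_entries=4096):
        self.max_entries = max_entries
        self.entries = OrderedDict()  # MAC as int -> port, least recently seen first

    @staticmethod
    def parse(mac):
        return int(mac.replace(":", ""), 16)  # "00:11:22:33:44:55" -> 48-bit int

    def learn(self, src_mac, port):
        """Record which port a source address was last seen on."""
        key = self.parse(src_mac)
        if key in self.entries:
            self.entries.move_to_end(key)  # refresh an existing station
        elif len(self.entries) >= self.max_entries:
            self.entries.popitem(last=False)  # table full: evict the stalest entry
        self.entries[key] = port

    def forward(self, dst_mac):
        """One hash lookup: the known port, or FLOOD if never learned."""
        return self.entries.get(self.parse(dst_mac), FLOOD)


bridge = MacTable()
bridge.learn("00:11:22:33:44:55", 1)
print(bridge.forward("00:11:22:33:44:55"))  # 1
bridge.learn("00:11:22:33:44:55", 3)        # station moved to another port/AP
print(bridge.forward("00:11:22:33:44:55"))  # 3
print(bridge.forward("66:77:88:99:aa:bb"))  # flood
```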

As far as NAT'ing is concerned, that is done by the gateways. It's possible in principle to create a distributed NAT face to an enterprise. If you do so, then roaming within the enterprise just amounts to telling the NAT face about the new internal IP address that corresponds to the old one: replacing one address translation with another.

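A minimal sketch of that update, with hypothetical names and documentation addresses: the NAT face maps each public endpoint to a host's current internal endpoint, and a move inside the enterprise replaces exactly one translation. How the NAT face is distributed, and how it learns about the move, are not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    ip: str
    port: int


class NatFace:
    """Hypothetical enterprise-wide NAT: public endpoint <-> internal endpoint."""

    def __init__(self):
        self.inbound = {}   # public Endpoint -> internal Endpoint
        self.outbound = {}  # internal Endpoint -> public Endpoint

    def add(self, public, internal):
        self.inbound[public] = internal
        self.outbound[internal] = public

    def roam(self, old_internal, new_internal):
        """The host moved inside the enterprise: swap one translation for another."""
        public = self.outbound.pop(old_internal)
        self.inbound[public] = new_internal
        self.outbound[new_internal] = public

    def translate_in(self, public):
        return self.inbound.get(public)


nat = NatFace()
public = Endpoint("203.0.113.10", 40000)
nat.add(public, Endpoint("10.1.0.5", 5000))
nat.roam(Endpoint("10.1.0.5", 5000), Endpoint("10.2.0.9", 5000))
print(nat.translate_in(public))  # Endpoint(ip='10.2.0.9', port=5000)
```

Traffic arriving for the public endpoint is now delivered to the new internal address, and the remote side sees no change.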

This is how phones roam, by the way. They update their location via the HLR (Home Location Register) as they roam.

On Sunday, January 25, 2015 10:45pm, "Dave Taht" <dave.taht@gmail.com> said:

> On Sun, Jan 25, 2015 at 7:17 PM, <dpreed@reed.com> wrote:
> > Looking up an address in a routing table is o(1) if the routing table is a
> > hash table. That's much more efficient than a TCAM. My simple example just
> > requires a delete/insert at each node's route lookup table.
>
> Regrettably it is not O(1) once you take into account the cpu cache hierarchy,
> or the potential collisions you will have once you shrink the hash to
> something reasonable.
>
> Also I think you are ignoring the problem of covering routes. Say I have to
> get something to a.b.c.z/32. I do a lookup of that and find nothing. I then
> look to find a.b.c.z/31 and find nothing, then /30, then /29, /28, until I find
> a hit for the next hop. Now you can of course do a binary search for likely
> subprefixes, but in any case, the search is not O(1).
>
> In terms of cache efficient data structures, a straight hash is not the way
> to go, of late I have been trying to wrap my head around the hat-trie as
> possibly being useful in these circumstances.
>
> Now, if you think about limiting the domain of the problem to something
> greater than the typical mac table, but less than the whole internet,
> it starts looking more reasonable to have a 1x1 ratio of destination
> IPs to hash table entries for lookups, but updates have to probe/change
> large segments of the table in order to deal with covering prefixes.
>
> > My point was about collections of WLANs bridged together. Look at what
> > happens (at the packet/radio layer) when a new node joins a bridged set of
> > WLANs using STP. It is not exactly simple to rebuild the Ethernet layer's
> > bridge routing tables in a complex network. And the limit of 4096 entries
> > in many inexpensive switches is not a trivial limit.
>
> Agreed. But see http://en.wikipedia.org/wiki/Virtual_Extensible_LAN
>
> >
> >
> >
> > Routers used to be memory-starved (a small number of KB of RAM was the
> > norm). Perhaps the thinking then (back before 2000) has not been revised,
> > even though the hardware is a lot more capacious.
>
> The profit margins have not been revised.
>
> I would not mind, incidentally, expanding the scope of the fqswitch project to
> try to build something that would scale up at l3 farther than we've ever seen
> before, however funding for needed gear like:
>
> http://www.eetimes.com/document.asp?doc_id=1321334
>
> and time, and fpga expertise, is lacking. I am currently distracted by
> evaluating a very cool new cpu architecture ( see
> http://www.millcomputing.com/wiki/Memory )
> and even as nifty as that is I foresee a need for a lot of dedicated packet
> processing logic and memories to get into the 40GBit+ range.
> >
> >
> > Remember, the Ethernet layer in WLANs is implemented by microcontrollers,
> > typically not very capable ones, plus TCAMs which are pretty limited in
> > their flexibility.
>
> I do tend to think that the next era of SDN enabled hardware will eventually
> lead to more innovation in both the control and data plane - however it
> seems we are still in a "me-too" phase
> of development of openvswitch (btw: there is a new software switch for
> linux called rocker we should look at, and make sure runs fq_codel), and
> a long way from flexibly programmable switch hardware in general.
>
> http://openvswitch.org/pipermail/dev/2014-September/045084.html
> >
> >
> >
> > While it is tempting to use the "pre-packaged, proprietary" Ethernet switch
> > functionality, routing gets you out of the binary blobs, and lets you be a
> > lot smarter and more scalable. Given that it does NOT cost more to do
> > routing at the IP layer, building complex Ethernet bridging is not obviously
> > a win.
>
> SDN is certainly a way out of this mess. Eventually. But I fear we are making
> all the same mistakes over again, and making slower hardware, where in the
> end, it needs to be faster, to win.
>
> >
> >
> > BTW, TCAMs are used in IP layer switching, too, and also are used in packet
> > filtering. Maybe not in cheap consumer switches, but lots of Gigabit
> > switches implement IP layer switching and filtering. At HP, their switches
> > routinely did all their IP layer switching entirely in TCAMs.
>
> Yep. I really wish big, fat TCAMs were standard equipment.
>
> >
> >
> > On Sunday, January 25, 2015 9:58pm, "Dave Taht" <dave.taht@gmail.com> said:
> >
> >> On Sun, Jan 25, 2015 at 6:43 PM, David Lang <david@lang.hm> wrote:
> >> > On Sun, 25 Jan 2015, Dave Taht wrote:
> >> >
> >> >> To your roaming point, yes this is certainly one place where migrating
> >> >> bridged vms across machines breaks down, and yet more and more vm
> >> >> layers are doing it. I would certainly prefer routing in this case.
> >> >
> >> >
> >> > What's the difference between "roaming" and moving a VM from one place
> >> > in the network to another?
> >>
> >> I think most people think of "roaming" as moving fairly rapidly from one
> >> piece of edge connectivity to another, and moving a vm is a great deal
> >> more permanent operation.
> >>
> >> > As far as layer 2 vs layer 3 goes. If you try to operate at layer 3, you
> >> > are going to have quite a bit of smarts in the endpoint. Even if it's only
> >> > connected via a single link. If you think about it, even if your network
> >> > routing tables list every machine in our environment individually, you
> >> > still have a problem of what gateway the endpoint uses. It would have to
> >> > change every time it moved. Since DHCP doesn't update frequently enough to
> >> > be transparent, you would need to have each endpoint running a routing
> >> > protocol.
> >>
> >> Hmm? I don't ever use a dhcp-supplied default gateway, I depend on the
> >> routing protocol to supply that. In terms of each vm running a routing
> >> protocol, well, no, I would rely on the underlying bare metal OS to be
> >> doing that, supplying the FIB tables to the overlying vms, if they need
> >> it, but otherwise the vms just see a "default" route and don't bother
> >> with it. They do need to inform the bare metal OS (better term for this
> >> please? hypervisor?) of what IPs they own.
> >>
> >> static default gateways are evil. and easily disabled. in linux you
> >> merely comment out the "routers" in /etc/dhcp/dhclient.conf, in openwrt,
> >> set "defaultroute 0" for the interface fetching dhcp.
> >>
> >> When a box migrates, it tells the hypervisor its addresses, and then that
> >> box propagates out the route change to elsewhere.
> >>
> >> >
> >> > This can work for individual hobbyists, but not when you need to support
> >> > random devices (how would you configure an iPhone to support this?)
> >>
> >> Carefully. :)
> >>
> >> I do note that this stuff does (or at least did) work on some of the open
> >> source variants of android. I would rather like it if android added ipv6
> >> tethering soon, and made it possible to mesh together multiple phones.
> >>
> >> >
> >> >
> >> > Letting the layer 2 equipment deal with the traffic within the building
> >> > and invoking layer 3 to go outside the building (or to a different
> >> > security domain) makes a lot of sense. Even if that means that layer 2
> >> > within a building looks very similar to what layer 3 used to look like
> >> > around a city.
> >>
> >> Be careful what you wish for.
> >>
> >> >
> >> >
> >> > back to the topic of wifi, I'm not aware of any APs that participate in
> >> > the switch protocols at this level. I also don't know of any reasonably
> >> > priced switches that can do anything smarter than plain spanning tree
> >> > when connected through multiple paths (I'd love to learn otherwise)
> >> >
> >> > David Lang
> >>
> >>
> >>
> >> --
> >> Dave Täht
> >>
> >> http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks