<font face="tahoma" size="2"><p style="margin:0;padding:0;font-family: tahoma; font-size: 10pt; word-wrap: break-word;">Well, we may all have to agree to disagree. I don't buy the argument that hash tables are slow compared to TCAMs. Even when cache misses happen, a hash table lookup is still O(1): on average you touch about one memory location, which is the whole point of a hash table. The constant factor is the speed of memory, which is not terribly slow by any means.</p>
<p style="margin:0;padding:0;font-family: tahoma; font-size: 10pt; word-wrap: break-word;"> </p>
<p style="margin:0;padding:0;font-family: tahoma; font-size: 10pt; word-wrap: break-word;">To get into this deeper would require actual measurements, of which I am a great fan. But your handwaves are pretty unquantitative, Dave, so at best they are similar to mine. I'm very measurement focused, being part hardware architecture guy.</p>
<p style="margin:0;padding:0;font-family: tahoma; font-size: 10pt; word-wrap: break-word;"> </p>
<p style="margin:0;padding:0;font-family: tahoma; font-size: 10pt; word-wrap: break-word;">David - my comment about HP doing layer 3 switching in TCAMs was only there to point out that there's nothing magic about layer 2. I was not suggesting that they don't use proprietary binary blobs, because they do. But the TCAM programs in layer 2 devices are proprietary blobs too.</p>
<p style="margin:0;padding:0;font-family: tahoma; font-size: 10pt; word-wrap: break-word;"> </p>
<p style="margin:0;padding:0;font-family: tahoma; font-size: 10pt; word-wrap: break-word;">Dave - when you focus on "prefix matching" as being hard to do, you are conflating the routing algorithm with its implementation technique. It's not hard to invent a performant prefix-matching algorithm built on a hash table. A simple way is to treat the address being looked up as several addresses, one for each shorter prefix of it, and then look each one up separately by its hash. It's still O(1) if you do that, because the number of prefix lengths is bounded by the address width (at most 33 for IPv4, /0 through /32); it's just a larger constant factor. I assume you don't actually think it is optimal to do linear searches on the routing table like hosts sometimes do. Linear search is not necessary.</p>
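<p style="margin:0;padding:0;font-family: tahoma; font-size: 10pt; word-wrap: break-word;">To make that concrete, here is a minimal Python sketch of the idea, not a description of any real router. It assumes IPv4 only and keeps one exact-match hash table per prefix length. The class name HashedRouteTable, its method names, and the next-hop strings are all made up for illustration.</p>

```python
import ipaddress


class HashedRouteTable:
    """Hypothetical longest-prefix match: one exact-match dict per prefix length."""

    def __init__(self):
        # tables[plen] maps a network address (as an int) to a next hop.
        self.tables = {}

    def insert(self, cidr, next_hop):
        # ip_network() raises ValueError if host bits are set in the prefix.
        net = ipaddress.ip_network(cidr)
        self.tables.setdefault(net.prefixlen, {})[int(net.network_address)] = next_hop

    def delete(self, cidr):
        net = ipaddress.ip_network(cidr)
        self.tables.get(net.prefixlen, {}).pop(int(net.network_address), None)

    def lookup(self, addr):
        a = int(ipaddress.IPv4Address(addr))
        # At most 33 probes (/32 down to /0), each one a hash lookup,
        # so the cost is bounded by a constant regardless of table size.
        for plen in range(32, -1, -1):
            table = self.tables.get(plen)
            if not table:
                continue  # no routes of this length, skip the probe
            block = 2 ** (32 - plen)   # number of addresses covered by a /plen
            key = a - (a % block)      # clear the host bits
            hop = table.get(key)
            if hop is not None:
                return hop
        return None


rt = HashedRouteTable()
rt.insert("0.0.0.0/0", "default-gw")
rt.insert("10.0.0.0/8", "core")
rt.insert("10.1.2.0/24", "edge")
print(rt.lookup("10.1.2.3"))   # most specific match wins
rt.delete("10.1.2.0/24")       # a route change is one delete (or insert)
print(rt.lookup("10.1.2.3"))   # now falls back to the covering /8
```

<p style="margin:0;padding:0;font-family: tahoma; font-size: 10pt; word-wrap: break-word;">Whether this beats a trie or a TCAM in practice depends on memory latency and how many prefix lengths are actually populated, which is a matter for measurement rather than this sketch.</p>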
<p style="margin:0;padding:0;font-family: tahoma; font-size: 10pt; word-wrap: break-word;"> </p>
<p style="margin:0;padding:0;font-family: tahoma; font-size: 10pt; word-wrap: break-word;">There is literally nothing magical about looking up 48-bit random Ethernet addresses in a LAN.</p>
<p style="margin:0;padding:0;font-family: tahoma; font-size: 10pt; word-wrap: break-word;"> </p>
<p style="margin:0;padding:0;font-family: tahoma; font-size: 10pt; word-wrap: break-word;">As far as NAT'ing is concerned, that is done by the gateways. It's possible in principle to create a distributed NAT face to an enterprise. If you do so, roaming within the enterprise just amounts to telling the NAT face which new internal IP address corresponds to the old one: replacing one address translation with another.</p>
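<p style="margin:0;padding:0;font-family: tahoma; font-size: 10pt; word-wrap: break-word;">As a rough illustration of how small that update is, here is a toy Python model. It assumes each internal address has exactly one external mapping, and it ignores ports, timeouts, and distribution across gateways. The NatFace class and its method names are hypothetical, not any real NAT implementation.</p>

```python
class NatFace:
    """Toy NAT face: a stable external address mapped to the current internal one."""

    def __init__(self):
        self.ext_to_int = {}  # external address -> current internal address
        self.int_to_ext = {}  # reverse index, so a roam needs no table scan

    def bind(self, external, internal):
        self.ext_to_int[external] = internal
        self.int_to_ext[internal] = external

    def roam(self, old_internal, new_internal):
        # Raises KeyError if the old internal address was never bound.
        external = self.int_to_ext.pop(old_internal)
        self.int_to_ext[new_internal] = external
        self.ext_to_int[external] = new_internal
        return external

    def inbound(self, external):
        # Where should traffic arriving for this external address go now?
        return self.ext_to_int.get(external)


nat = NatFace()
nat.bind("203.0.113.7", "10.1.0.5")
nat.roam("10.1.0.5", "10.2.0.9")   # host moved to another internal subnet
print(nat.inbound("203.0.113.7"))  # the external address is unchanged
```

<p style="margin:0;padding:0;font-family: tahoma; font-size: 10pt; word-wrap: break-word;">The outside world keeps talking to the same external address; only one table entry changes when the host moves.</p>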
<p style="margin:0;padding:0;font-family: tahoma; font-size: 10pt; word-wrap: break-word;"> </p>
<p style="margin:0;padding:0;font-family: tahoma; font-size: 10pt; word-wrap: break-word;">This is how phones roam, by the way. They update their location via an HLR (Home Location Register) as they move.</p>
<p style="margin:0;padding:0;font-family: tahoma; font-size: 10pt; word-wrap: break-word;"> </p>
<p style="margin:0;padding:0;font-family: tahoma; font-size: 10pt; word-wrap: break-word;"><br /><br />On Sunday, January 25, 2015 10:45pm, "Dave Taht" <dave.taht@gmail.com> said:<br /><br /></p>
<div id="SafeStyles1422316592">
<p style="margin:0;padding:0;font-family: tahoma; font-size: 10pt; word-wrap: break-word;">> On Sun, Jan 25, 2015 at 7:17 PM, <dpreed@reed.com> wrote:<br />> > Looking up an address in a routing table is o(1) if the routing table is a<br />> > hash table. That's much more efficient than a TCAM. My simple example just<br />> > requires a delete/insert at each node's route lookup table.<br />> <br />> Regrettably it is not O(1) once you take into account the cpu cache hierarchy,<br />> or the potential collisions you will have once you shrink the hash to<br />> something reasonable.<br />> <br />> Also I think you are ignoring the problem of covering routes. Say I have to<br />> get something to a.b.c.z/32. I do a lookup of that and find nothing. I then<br />> look to find a.b.c.z/31 and find nothing, then /30, then /29, /28, until I find<br />> a hit for the next hop. Now you can of course do a binary search for likely<br />> subprefixes, but in any case, the search is not O(1).<br />> <br />> In terms of cache efficient data structures, a straight hash is not the way<br />> to go, of late I have been trying to wrap my head around the hat-trie as<br />> possibly being useful in these circumstances.<br />> <br />> Now, if you think about limiting the domain of the problem to something<br />> greater than the typical mac table, but less than the whole internet,<br />> it starts looking more reasonable to have a 1x1 ratio of destination<br />> IPs to hash table entries for lookups, but updates have to probe/change<br />> large segments of the table in order to deal with covering prefixes.<br />> <br />> > My point was about collections of WLAN's bridged together. Look at what<br />> > happens (at the packet/radio layer) when a new node joins a bridged set of<br />> > WLANs using STP. It is not exactly simple to rebuild the Ethernet layer's<br />> > bridge routing tables in a complex network. 
And the limit of 4096 entries<br />> > in many inexpensive switches is not a trivial limit.<br />> <br />> Agreed. But see http://en.wikipedia.org/wiki/Virtual_Extensible_LAN<br />> <br />> ><br />> ><br />> ><br />> > Routers used to be memory-starved (a small number of KB of RAM was the<br />> > norm). Perhaps the thinking then (back before 2000) has not been revised,<br />> > even though the hardware is a lot more capacious.<br />> <br />> The profit margins have not been revised.<br />> <br />> I would not mind, incidentally, expanding the scope of the fqswitch project to<br />> try to build something that would scale up at l3 farther than we've ever seen<br />> before, however funding for needed gear like:<br />> <br />> http://www.eetimes.com/document.asp?doc_id=1321334<br />> <br />> and time, and fpga expertise, is lacking. I am currently distracted by<br />> evaluating<br />> a very cool new cpu architecture ( see<br />> http://www.millcomputing.com/wiki/Memory )<br />> and even as nifty as that is I foresee a need for a lot of dedicated packet<br />> processing logic and memories to get into the 40GBit+ range.<br />> ><br />> ><br />> > Remember, the Ethernet layer in WLANs is implemented by microcontrollers,<br />> > typically not very capable ones, plus TCAMs which are pretty limited in<br />> > their flexibility.<br />> <br />> I do tend to think that the next era of SDN enabled hardware will eventually<br />> lead to more innovation in both the control and data plane - however it<br />> seems we are still in a "me-too" phase<br />> of development of openvswitch (btw: there is a new software switch for<br />> linux called rocker we should look at, and make sure runs fq_codel), and<br />> a long way from flexibly programmable switch hardware in general.<br />> <br />> http://openvswitch.org/pipermail/dev/2014-September/045084.html<br />> ><br />> ><br />> ><br />> > While it is tempting to use the "pre-packaged, proprietary" Ethernet switch<br />> > 
functionality, routing gets you out of the binary blobs, and lets you be a<br />> > lot smarter and more scalable. Given that it does NOT cost more to do<br />> > routing at the IP layer, building complex Ethernet bridging is not obviously<br />> > a win.<br />> <br />> SDN is certainly a way out of this mess. Eventually. But I fear we are making<br />> all the same mistakes over again, and making slower hardware, where in the<br />> end, it needs to be faster, to win.<br />> <br />> ><br />> ><br />> > BTW, TCAMs are used in IP layer switching, too, and also are used in packet<br />> > filtering. Maybe not in cheap consumer switches, but lots of Gigabit<br />> > switches implement IP layer switching and filtering. At HP, their switches<br />> > routinely did all their IP layer switching entirely in TCAMs.<br />> <br />> Yep. I really wish big, fat TCAMS were standard equipment.<br />> <br />> ><br />> ><br />> > On Sunday, January 25, 2015 9:58pm, "Dave Taht" <dave.taht@gmail.com><br />> said:<br />> ><br />> >> On Sun, Jan 25, 2015 at 6:43 PM, David Lang <david@lang.hm> wrote:<br />> >> > On Sun, 25 Jan 2015, Dave Taht wrote:<br />> >> ><br />> >> >> To your roaming point, yes this is certainly one place where<br />> migrating<br />> >> >> bridged vms across machines breaks down, and yet more and more<br />> vm<br />> >> >> layers are doing it. I would certainly prefer routing in this<br />> case.<br />> >> ><br />> >> ><br />> >> > What's the difference between "roaming" and moving a VM from one<br />> place<br />> >> > in<br />> >> > the network to another?<br />> >><br />> >> I think most people think of "roaming" as moving fairly rapidly from one<br />> >> piece of edge connectivity to another, and moving a vm is a great deal<br />> >> more<br />> >> permanent operation.<br />> >><br />> >> > As far as layer 2 vs layer 3 goes. If you try to operate at layer 3,<br />> you<br />> >> > are<br />> >> > going to have quite a bit of smarts in the endpoint. 
Even if it's<br />> only<br />> >> > connected via a single link. If you think about it, even if your<br />> network<br />> >> > routing tables list every machine in our environment individually,<br />> you<br />> >> > still<br />> >> > have a problem of what gateway the endpoint uses. It would have to<br />> >> > change<br />> >> > every time it moved. Since DHCP doesn't update frequently enough to<br />> be<br />> >> > transparent, you would need to have each endpoint running a routing<br />> >> > protocol.<br />> >><br />> >> Hmm? I don't ever use a dhcp-supplied default gateway, I depend on the<br />> >> routing<br />> >> protocol to supply that. In terms of each vm running a routing protocol,<br />> >> well, no, I would rely on the underlying bare metal OS to be doing<br />> >> that, supplying<br />> >> the FIB tables to the overlying vms, if they need it, but otherwise the<br />> >> vms<br />> >> just see a "default" route and don't bother with it. They do need to<br />> >> inform the<br />> >> bare metal OS (better term for this please? hypervisor?) of what IPs<br />> they<br />> >> own.<br />> >><br />> >> static default gateways are evil. and easily disabled. in linux you<br />> >> merely comment<br />> >> out the "routers" in /etc/dhcp/dhclient.conf, in openwrt, set<br />> >> "defaultroute 0" for the<br />> >> interface fetching dhcp.<br />> >><br />> >> When a box migrates, it tells the hypervisor its addresses, and then<br />> that<br />> >> box<br />> >> propagates out the route change to elsewhere.<br />> >><br />> >> ><br />> >> > This can work for individual hobbyists, but not when you need to<br />> support<br />> >> > random devices (how would you configure an iPhone to support this?)<br />> >><br />> >> Carefully. :)<br />> >><br />> >> I do note that this stuff does (or at least did) work on some of the<br />> open<br />> >> source variants of android. 
I would rather like it if android added ipv6<br />> >> tethering soon, and made it possible to mesh together multiple phones.<br />> >><br />> >> ><br />> >> ><br />> >> > Letting the layer 2 equipment deal with the traffic within the<br />> building<br />> >> > and<br />> >> > invoking layer 3 to go outside the building (or to a different<br />> security<br />> >> > domain) makes a lot of sense. Even if that means that layer 2 within<br />> a<br />> >> > building looks very similar to what layer 3 used to look like around<br />> a<br />> >> > city.<br />> >><br />> >> Be careful what you wish for.<br />> >><br />> >> ><br />> >> ><br />> >> > back to the topic of wifi, I'm not aware of any APs that participate<br />> in<br />> >> > the<br />> >> > switch protocols at this level. I also don't know of any reasonably<br />> >> > priced<br />> >> > switches that can do anything smarter than plain spanning tree when<br />> >> > connected through multiple paths (I'd love to learn otherwise)<br />> >> ><br />> >> > David Lang<br />> >><br />> >><br />> >><br />> >> --<br />> >> Dave Täht<br />> >><br />> >> http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks<br />> >><br />> <br />> <br />> <br />> --<br />> Dave Täht<br />> <br />> http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks<br />> </p>
</div></font>