From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dave.taht@gmail.com>
Received: from mail-ob0-x22a.google.com (mail-ob0-x22a.google.com
	[IPv6:2607:f8b0:4003:c01::22a])
	(using TLSv1 with cipher RC4-SHA (128/128 bits))
	(Client CN "smtp.gmail.com",
	Issuer "Google Internet Authority G2" (verified OK))
	by huchra.bufferbloat.net (Postfix) with ESMTPS id 1ED3821F1D4
	for <cerowrt-devel@lists.bufferbloat.net>;
	Sun, 25 Jan 2015 18:19:06 -0800 (PST)
Received: by mail-ob0-f170.google.com with SMTP id wp4so5708563obc.1
	for <cerowrt-devel@lists.bufferbloat.net>;
	Sun, 25 Jan 2015 18:19:06 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type:content-transfer-encoding;
	bh=z643ODXG5RlTfn3Q5K+R3MB+zxEzwcQiovJ+M23Jmm0=;
	b=j0P5/eK+KKGphYhNXj7xmTZdXyDKMW7ZWEuCh/TSNBB8zW7ByXNFv1tS6ThfYoE29v
	il84Rt2Nij9kJ2wfqOzAduAfPy9/5JaFnQUc+9OCIcwEOhFW9tyFbTpW7ewlJXgKjJh3
	IL/3CuuNB/NW0eOEtY6jDhg3vakytpgbc+FVI9EhFmpOt41xhh+Xl+L3MTtV/MVz7VSw
	6kGOmXgV9UBbTewzuL0kdFZNnd3JX2z+Ll4/5wXU5d7AywWLiqYGhFOjEp+WstClFias
	IinB/oskro2QVYpZYYsNeSb4OCJ76dzP8kuzph6/mrWN9uYJVqzOK6i55s+69vYbM8c/
	5EhA==
MIME-Version: 1.0
X-Received: by 10.202.204.142 with SMTP id c136mr10810824oig.81.1422238745885; 
	Sun, 25 Jan 2015 18:19:05 -0800 (PST)
Received: by 10.202.51.66 with HTTP; Sun, 25 Jan 2015 18:19:05 -0800 (PST)
In-Reply-To: <1422237076.005718796@apps.rackspace.com>
References: <54B5D28A.3010906@gmail.com>
	<7B1EA8F0-FCB6-4A37-950F-2558FC751DE8@gmail.com>
	<54C038D0.1000305@gmail.com>
	<alpine.DEB.2.02.1501211553090.21864@nftneq.ynat.uz>
	<54C0BD22.3000608@gmail.com>
	<alpine.DEB.2.02.1501220110170.19609@nftneq.ynat.uz>
	<54C13F47.1010203@gmail.com>
	<1422111577.328132080@apps.rackspace.com>
	<alpine.DEB.2.02.1501242029320.19609@nftneq.ynat.uz>
	<1422217048.025611275@apps.rackspace.com>
	<alpine.DEB.2.02.1501251538031.19609@nftneq.ynat.uz>
	<1422237076.005718796@apps.rackspace.com>
Date: Sun, 25 Jan 2015 18:19:05 -0800
Message-ID: <CAA93jw4DYgbv0oFwOfJmDfnOfAz6VYAdv9BcgS51sNg-rEopCA@mail.gmail.com>
From: Dave Taht <dave.taht@gmail.com>
To: David Reed <dpreed@reed.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: Alexander Duyck <alexander.duyck@gmail.com>,
	"cerowrt-devel@lists.bufferbloat.net" <cerowrt-devel@lists.bufferbloat.net>
Subject: Re: [Cerowrt-devel] Recording RF management info _and_ associated
	traffic?
X-BeenThere: cerowrt-devel@lists.bufferbloat.net
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: Development issues regarding the cerowrt test router project
	<cerowrt-devel.lists.bufferbloat.net>
List-Unsubscribe: <https://lists.bufferbloat.net/options/cerowrt-devel>,
	<mailto:cerowrt-devel-request@lists.bufferbloat.net?subject=unsubscribe>
List-Archive: <https://lists.bufferbloat.net/pipermail/cerowrt-devel>
List-Post: <mailto:cerowrt-devel@lists.bufferbloat.net>
List-Help: <mailto:cerowrt-devel-request@lists.bufferbloat.net?subject=help>
List-Subscribe: <https://lists.bufferbloat.net/listinfo/cerowrt-devel>,
	<mailto:cerowrt-devel-request@lists.bufferbloat.net?subject=subscribe>
X-List-Received-Date: Mon, 26 Jan 2015 02:19:35 -0000

Two notes:

1)

Switches all have a very fast (T)CAM-based lookup for MAC addresses
and VLAN tags. The typical size for these is around 4096 entries per
VLAN, although the next generation VXLAN standard will push this to a
lot more bits.
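
To make the exact-match point concrete, here is a minimal sketch in Python of a learning table keyed on (VLAN, MAC). It is a software stand-in, not how a (T)CAM is built; the class name, the per-VLAN cap, and the "return None means flood" convention are assumptions for illustration only.

```python
from typing import Dict, Optional, Tuple

# Assumption: the "typical" per-VLAN size mentioned above.
MAX_ENTRIES_PER_VLAN = 4096


class MacTable:
    """Toy exact-match forwarding table keyed on (vlan, mac)."""

    def __init__(self, max_per_vlan: int = MAX_ENTRIES_PER_VLAN) -> None:
        self.max_per_vlan = max_per_vlan
        self.entries: Dict[Tuple[int, str], int] = {}
        self.count: Dict[int, int] = {}  # entries currently held per VLAN

    def learn(self, vlan: int, mac: str, port: int) -> bool:
        """Record that mac was seen on port; False if the VLAN's table is full."""
        key = (vlan, mac.lower())
        if key not in self.entries:
            if self.count.get(vlan, 0) >= self.max_per_vlan:
                return False  # a real switch would evict an entry or just flood
            self.count[vlan] = self.count.get(vlan, 0) + 1
        self.entries[key] = port
        return True

    def lookup(self, vlan: int, mac: str) -> Optional[int]:
        """One exact-match probe; None means unknown destination (flood)."""
        return self.entries.get((vlan, mac.lower()))
```

The lookup is a single probe on a fixed-width key, which is the property that makes it cheap to do in hardware.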

Routing, on the other hand, requires a lot more storage that is
difficult to search in constant time, and worse, requires that a layer
three device retain tables for IPv4, IPv6, and "other". Furthermore it
requires that every device that needs it participate in the routing
protocol - of which there are dozens - whereas spanning tree has only a
few variants and improvements. I don't know the extent to which

2) I am no fan of the various things I see being built on top of VXLAN
(see CONGA) - but it is a prevailing trend. I am a partial advocate of
moving all the routing support to the servers, and letting the
switches remain pretty dumb. There has been a lot of good work in this
area in Linux of late, as Alexander has successfully cut the cost of a
routing lookup that falls through to default from several hundred ns
to something like 16ns on the high end Intel chips. I look forward to
testing that on the next round of cerowrt.
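
For readers unfamiliar with why a fall-through to the default route is the expensive case, here is a naive longest-prefix-match sketch in Python. It is deliberately simple and is not how the Linux FIB is implemented; the Fib class and the next-hop names are made up for illustration.

```python
import ipaddress
from typing import Dict, Optional


class Fib:
    """Naive IPv4 longest-prefix match: probe /32, /31, ... down to /0."""

    def __init__(self) -> None:
        self.routes: Dict[ipaddress.IPv4Network, str] = {}

    def add(self, prefix: str, next_hop: str) -> None:
        self.routes[ipaddress.IPv4Network(prefix)] = next_hop

    def lookup(self, addr: str) -> Optional[str]:
        ip = int(ipaddress.IPv4Address(addr))
        for plen in range(32, -1, -1):
            # Keep only the top plen bits of the address.
            mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
            net = ipaddress.IPv4Network((ip & mask, plen))
            hop = self.routes.get(net)
            if hop is not None:
                return hop  # first hit is the most specific match
        return None  # no route at all, not even a default


fib = Fib()
fib.add("0.0.0.0/0", "upstream")   # default route
fib.add("10.0.0.0/8", "core")
fib.add("10.1.2.0/24", "rack")
```

In this sketch an address that only matches 0.0.0.0/0 costs 33 probes, where a switch's exact-match MAC lookup costs one. Real implementations use tries and other structures to cut that down.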

This is still a great deal slower than a switch can find the right MAC
address (well, depending on how you measure it). And it still needs a
commonly agreed upon routing protocol to fill the FIB tables. Most
routing protocols do not fail over very quickly either, with typical
timeouts measured in tens of seconds. On my very long todo list would
be one day trying to get Babel to fail over or otherwise switch ideal
routes in under 40ms in a 10gigE environment - and even that is too
slow, and going faster would require changing the Babel protocol,
which has a minimum time representation of 10ms. It would be an
interesting research project for someone to attempt high speed routing
in a data center virtual machine environment, instead of bridging.
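
A rough back-of-the-envelope on why 40ms is "too slow" at 10gigE, as a small Python sketch. The assumptions are mine: a fully loaded link, 1500-byte packets, and no accounting for Ethernet framing overhead.

```python
LINK_BPS = 10_000_000_000   # 10 Gbit/s
PACKET_BYTES = 1500         # assumption: full-size packets, framing ignored


def traffic_during_outage(outage_ms: int, link_bps: int = LINK_BPS) -> tuple:
    """Return (bytes, whole packets) a saturated link carries in outage_ms."""
    bits = link_bps * outage_ms // 1000
    nbytes = bits // 8
    return nbytes, nbytes // PACKET_BYTES


for ms in (10, 40, 10_000):
    nbytes, npkts = traffic_during_outage(ms)
    print(f"{ms:>6} ms -> {nbytes:>14,} bytes, about {npkts:,} packets")
```

Under those assumptions a 40ms failover leaves roughly 50 MB (about 33,000 packets) with nowhere to go, and even a single 10ms tick is about 12.5 MB.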

To your roaming point, yes this is certainly one place where migrating
bridged VMs across machines breaks down, and yet more and more VM
layers are doing it. I would certainly prefer routing in this case.

On Sun, Jan 25, 2015 at 5:51 PM,  <dpreed@reed.com> wrote:
> If you are using Ethernet bridging, your Ethernet switches are doing exactly
> this at the Ethernet layer... they have large tables of MAC addresses that
> are known throughout the network, and for each MAC address in the
> Enterprise, they have the next hop destination.
>
>
>
> So IP routing tables, one IP address per destination in the Enterprise,
> would occupy no more space than do the Ethernet routing tables....  so any
> argument about space efficiency is mooted.
>
>
>
> This is why bridging is no better than routing - you have to solve the same
> problem at one layer or the other. The Ethernet layer's "solution" is
> actually very suboptimal, especially when roaming is going on.
>
>
>
> On Sunday, January 25, 2015 6:57pm, "David Lang" <david@lang.hm> said:
>
>> On Sun, 25 Jan 2015, dpreed@reed.com wrote:
>>
>> > Disagree. See below.
>> >
>> >
>> > On Saturday, January 24, 2015 11:35pm, "David Lang" <david@lang.hm> said:
>> >
>> >
>> >
>> >> On Sat, 24 Jan 2015, dpreed@reed.com wrote:
>> >> > A side comment, meant to discourage continuing to bridge rather than route.
>> >> >
>> >> > There's no reason that the AP's cannot have different IP addresses, but a
>> >> > common ESSID. Roaming between them would be like roaming among mesh subnets.
>> >> > Assuming you are securing your APs' air interfaces using encryption over the
>> >> > air, you are already re-authenticating as you move from AP to AP. So using
>> >> > routing rather than bridging is a good idea for all the reasons that routing
>> >> > rather than bridging is better for mesh.
>> >>
>> >> The problem with doing this is that all existing TCP connections will break when
>> >> you move from one AP to another and while some apps will quickly notice this and
>> >> establish new connections, there are many apps that will not and this will cause
>> >> noticeable disruption to the user.
>> >>
>> >> Bridging allows the connections to remain intact. The wifi stack re-negotiates
>> >> the encryption, but the encapsulated IP packets don't change.
>> >
>> >
>> > There is no reason why one cannot set up an enterprise network to support
>> > roaming, yet maintaining the property that IP addresses don't change while
>> > roaming from AP to AP. Here's a simple concept, that amounts to moving what
>> > would be in the Ethernet bridging tables up to the IP layer.
>> >
>> > All addresses in the enterprise are assigned from a common prefix (XXX/16 in
>> > IPv4, perhaps). Routing in each access point is used to decide whether to
>> > send the packet on its LAN, or to reflect it to another LAN. A node's
>> > preferred location would be updated by the endpoint itself, sending its
>> > current location to its current access point (via ARP or some other protocol).
>> > The access point that hears of a new node that it can reach tells all the
>> > other access points that the node is attached to it. Delivery of a packet to
>> > a node is done by the access point that receives the packet by looking up the
>> > destination IP address in its local table, and sending it to the access point
>> > that currently has the destination IP address.
>> >
>> > This is far better than "bridging" at the Ethernet level from a functionality
>> > point of view - it is using routing, not bridging. Bridging at the Ethernet
>> > level uses Ethernet's STP feature, which doesn't work very well in collections
>> > of wireless LAN's (it is slow to recalculate when something moves, because it
>> > was designed for unplug/plug of actual cables, and moving the host from one
>> > physical location to another).
>> >
>> > IMO, Ethernet sometimes aspires to solve problems that are already well-solved
>> > in the Internet protocols. (for example the 802.11s mess which tries to do a
>> > mesh entirely in the Ethernet layer, and fails pretty miserably).
>> >
>> > Of course that's only my opinion, but I think it applies to overuse of
>> > bridging at the Ethernet layer when there are better approaches at the next
>> > layer up.
>>
>> Unless you are going to have your routing tables handle every address in your
>> network separately (and fix all the software that depends on broadcasts) you are
>> going to have trouble trying to do this at the IP layer.
>>
>> The 'modern Enterprise' datacenter has lots of large machines that get sliced
>> into multiple virtual machines. For redundancy purposes you want to have the
>> machines used for a particular job to be spread across as many of these machines
>> as possible, spread around your datacenter.
>>
>> Switches in this environment are becoming layer 2 routers. They are connected
>> together with multiple links providing redundant paths around the network. This
>> isn't being done with Spanning Tree because Spanning Tree only allows one path
>> to exist at once, and that is inefficient and creates bottlenecks. As a result,
>> they are now keeping all these links live at the same time and using least cost
>> paths to route the layer 2 traffic across the switches.
>>
>> It's fair to argue that this is abuse of layer 2, but the difficulties in having
>> to change the software operating at higher layers vs the fact that making these
>> changes at the layer 2 level is completely transparent to the higher layers make
>> it so that using this layer 2 capability is pragmatically a far better choice.
>>
>> The Computer Scientist will cringe at the 'hacks' that this introduces, but
>> there is far more progress made when new capabilities can be added in a way
>> that's transparent to other layers of the stack than when it requires major
>> changes to how things work.
>>
>> The software layer is the worst to try and force fundamental changes to. You
>> would be horrified to learn how old some of the software is that's running major
>> jobs at large companies. Even if the software is in continuous development, the
>> age of the core software frequently shows.
>>
>> David Lang
>>
>
>


-- 
Dave Täht

http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks