From: Dave Taht
Date: Sun, 12 Jun 2016 21:34:06 -0700
To: Benjamin Cronce
Cc: Jesper Louis Andersen, bloat
Subject: Re: [Bloat] Questions About Switch Buffering

On Sun, Jun 12, 2016 at 6:07 PM, Benjamin Cronce wrote:
> I'm not arguing that an AQM isn't useful, just that you can't have your
> cake and eat it too. I wouldn't spend much time going down this path
> without first talking to someone with a strong background in ASICs for
> network switches and asking if it's even feasible. Everything (very
> little) I know about ASICs and buffers says that buffering is a very
> hard problem that eats up a lot of transistors, and more transistors
> mean slower ASICs. It's almost always a trade-off between performance
> and buffer size. CoDel and especially Cake sound like they would not be
> ASIC friendly.

For much of two years I shopped around trying to get a basic proof of
concept done in an FPGA, going so far as to help fund onenetswitch's
softswitch, and trying to line up a university or other lab to tackle
the logic (given my predilection for wanting things to be open source).
Several shops were very enthusiastic for a while... then went dark.

Codel would be very ASIC friendly - all you need to do is prepend a
timestamp to the packet on input and transport it across the internal
switch fabric. Getting the timestamp on ingress is essentially free, and
making the drop or mark decision at egress is also cheap. Looping, as
codel can do on overload, not so much - but that isn't really necessary
either.
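To make "cheap" concrete, here's roughly what that per-packet egress
decision looks like in C - a stripped-down sketch of the codel control
law (it leaves out the count-decay refinements the real Linux
implementation has), with invented names and a nanosecond clock assumed:

#include <stdint.h>
#include <stdbool.h>
#include <math.h>

/* Hypothetical per-queue codel state; the names are illustrative, not
 * from any real ASIC toolflow. All times are in nanoseconds. */
struct codel_state {
    uint64_t first_above_time; /* when sojourn time first exceeded target */
    uint64_t drop_next;        /* next scheduled drop while dropping */
    uint32_t count;            /* drops since entering the dropping state */
    bool     dropping;
};

#define CODEL_TARGET   5000000ULL   /* 5 ms */
#define CODEL_INTERVAL 100000000ULL /* 100 ms */

/* Control law: space drops interval/sqrt(count) apart. In gates the
 * sqrt becomes a small lookup table, not an FPU. */
static uint64_t control_law(uint64_t t, uint32_t count)
{
    return t + (uint64_t)(CODEL_INTERVAL / sqrt((double)count));
}

/* Called once per packet at egress. enqueue_ts is the timestamp the
 * ingress stage prepended to the packet; now is the egress clock.
 * Returns true if the packet should be dropped (or ECN-marked). */
bool codel_should_drop(struct codel_state *q, uint64_t enqueue_ts,
                       uint64_t now)
{
    uint64_t sojourn = now - enqueue_ts;

    if (sojourn < CODEL_TARGET) {
        /* Queue is draining fast enough; stand down. */
        q->first_above_time = 0;
        q->dropping = false;
        return false;
    }
    if (q->first_above_time == 0) {
        /* First packet over target: arm the interval timer. */
        q->first_above_time = now + CODEL_INTERVAL;
        return false;
    }
    if (!q->dropping) {
        if (now < q->first_above_time)
            return false;  /* not over target for a full interval yet */
        q->dropping = true;
        q->count = 1;
        q->drop_next = control_law(now, q->count);
        return true;
    }
    if (now >= q->drop_next) {
        q->count++;
        q->drop_next = control_law(q->drop_next, q->count);
        return true;
    }
    return false;
}

Everything there is a compare and an add per packet; the only awkward
piece is the inverse square root, and that reduces to a small lookup
table in silicon.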
As for the "fq" portion - a proof of concept for DRR already exists in
the netfpga.org project's verilog:

https://github.com/NetFPGA/netfpga/wiki/DRRNetFPGA

The catch is that the header inspection required to create the flow hash
only completes near the end of the packet, so it gets bypassed by
cut-through switching (when the output port is empty), and you need at
least one packet already queued up at the destination port to be able to
slide the next one into the right virtual queue.
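The DRR side itself is small enough to sketch in C as well - a toy model
of the scheduler in the NetFPGA work above (and of the scheduling half
of fq_codel), assuming the flow hash arrives from the ingress parse. All
the names and sizes here are invented for illustration:

#include <stdint.h>
#include <stddef.h>

#define NQUEUES 1024  /* virtual queues; power of two so masking works */
#define QUANTUM 1514  /* bytes of credit per round; about one full frame */

struct pkt {
    struct pkt *next;
    uint32_t    len;              /* bytes on the wire */
};

struct vqueue {
    struct pkt    *head, *tail;   /* per-flow FIFO */
    struct vqueue *rr_next;       /* link in the list of backlogged queues */
    int32_t        deficit;       /* DRR deficit counter, in bytes */
    int            backlogged;
};

static struct vqueue  queues[NQUEUES];
static struct vqueue *rr_head, *rr_tail;  /* FIFO of backlogged queues */

static void rr_push(struct vqueue *q)
{
    q->rr_next = NULL;
    if (rr_tail) rr_tail->rr_next = q; else rr_head = q;
    rr_tail = q;
}

static void rr_pop(void)
{
    rr_head = rr_head->rr_next;
    if (!rr_head) rr_tail = NULL;
}

/* Ingress: pick the virtual queue from a hash over the flow's 5-tuple.
 * In hardware that hash is exactly the parse that fights with
 * cut-through above; here it just arrives as an argument. */
void enqueue(uint32_t flow_hash, struct pkt *p)
{
    struct vqueue *q = &queues[flow_hash & (NQUEUES - 1)];

    p->next = NULL;
    if (q->tail) q->tail->next = p; else q->head = p;
    q->tail = p;

    if (!q->backlogged) {         /* flow just became active */
        q->backlogged = 1;
        q->deficit = QUANTUM;     /* one quantum of credit to start */
        rr_push(q);
    }
}

/* Egress: classic DRR. The queue at the head of the list sends while it
 * has credit; when the credit runs out it rotates to the back and earns
 * another quantum. */
struct pkt *dequeue(void)
{
    struct vqueue *q;

    while ((q = rr_head) != NULL) {  /* head queue always holds a packet */
        struct pkt *p = q->head;

        if (q->deficit >= (int32_t)p->len) {
            q->deficit -= p->len;
            q->head = p->next;
            if (!q->head) {          /* flow drained: leave the list */
                q->tail = NULL;
                rr_pop();
                q->backlogged = 0;
            }
            return p;                /* hand to the output port */
        }
        /* Out of credit: rotate to the back with one more quantum. */
        rr_pop();
        q->deficit += QUANTUM;
        rr_push(q);
    }
    return NULL;                     /* nothing backlogged */
}

With QUANTUM at one MTU the rotation happens at most once per queue per
round, so the work stays O(1) per packet - the genuinely hard part in
silicon is the per-flow state and the hash, not the arithmetic.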
The stumbling blocks were:

A) Most shops were only interested in producing the next 100Gbps or
faster chip, rather than trying to address 10GigE and lower. They were
also intimidated by the big players in the market - and even the big
players are intimidated; broadcom just exited the wifi biz (selling off
that business to cypress), as one example.

B) A revolutionary smart "gbit" switch chip - built on technology that
could just forward packets dumbly at 10gbit - was a lot of NRE for chips
that sell for well under a few dollars. Everybody already has a gbit
switch chip and ethernet chip that's long since paid for...

C) The building blocks for packet processing in hardware are hard to
license together except for a very few players, and what's out there in
open source is basically limited to opencores' and netfpga's work.

I have no doubt that eventually someone will produce implementations of
the codel, fq_codel, and/or cake algorithms in gates, tho I lean more
towards something qfq-derived than drr-derived. The best I could
estimate was that a DRR version would need at least a 3-packet
"pipeline" to hash and route, tho I thought some interesting things
could be done by also having multiple CAMs to route the different flows
and handle cache misses better.

In the interim, I kind of expect stuff derived from QCA's or cavium's
specialized co-processors to gain some of these ideas. There are also
hugely parallel network processors around the corner based on arm
architectures in cavium's and amd's roadmaps. Cisco has some insanely
parallel processors in their designs.

More than fixing the native DC switch market (where everybody generally
overprovisions anyway) or improving consumer switches, I'd felt that ISP
edge devices (dslams/cable) were where someone would innovate with a
hardware assist to meet their business models
( http://jvimal.github.io/senic/ ), and maybe we'd see some assistance
for inbound rate limiting also arrive in consumer hardware with
offloads - a more limited version of senic, perhaps.

But it takes a long time to develop hardware, chip designers are scarce,
and without clear market demand... I dunno, who knows? Perhaps what the
OCP folk are doing will start feeding back into actual chip designs one
day. It was a nice diversion for me to play with the chisel language and
the risc-v and mill cpus, at least.

Anybody up for repurposing some machine learning chips? These look like
fun:

https://www.engadget.com/2016/04/28/movidius-fathom-neural-compute-stick/

Oh, yeah, mellanox's latest programmable ethernet devices look promising
also:

http://www.mellanox.com/page/programmable_network_adapters

The ironic thing is that the biggest problems at 10GigE+ are on input,
not output. In fact, on much hardware, even at lower rates, we tend to
drop on input more than enough to balance out the potential bufferbloat
problems there. Computing the timestamp and hash up front and having
parallel memory channels is sort of happening on multiple newer chips on
the rx path...

> On Sun, Jun 12, 2016 at 5:01 PM, Jesper Louis Andersen wrote:
>>
>> This *is* commonly a problem. Look up "TCP incast".
>>
>> The scenario is exactly as you describe. A distributed database makes
>> queries over the same switch to K other nodes in order to verify the
>> integrity of the answer. Data is served from memory and thus access
>> times are roughly the same on all the K nodes. If the data response
>> is sizable, then the switch output port is overwhelmed with traffic,
>> and it drops packets. TCP's congestion algorithm gets into play.
>>
>> It is almost like resonance in engineering. At the wrong "frequency",
>> the bridge/switch will resonate and make everything go haywire.
>>
>> On Sun, Jun 12, 2016 at 11:24 PM, Steinar H. Gunderson wrote:
>>>
>>> On Sun, Jun 12, 2016 at 01:25:17PM -0500, Benjamin Cronce wrote:
>>> > Internal networks rarely have bandwidth issues and congestion only
>>> > happens when you don't have enough bandwidth.
>>>
>>> I don't think this is true. You might not have an aggregate
>>> bandwidth issue, but given the burstiness of TCP and the typical
>>> switch buffer depth (64 frames is a typical number), it's very, very
>>> easy to lose packets in your switch even on a relatively quiet
>>> network with no downconversion. (Witness the rise of DCTCP, made
>>> especially for internal traffic on this kind of network.)
>>>
>>> /* Steinar */
>>> --
>>> Homepage: https://www.sesse.net/
>>
>> --
>> J.

-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org