[Bloat] RED against bufferbloat

Dave Taht dave.taht at gmail.com
Thu Feb 26 13:07:51 EST 2015


On Thu, Feb 26, 2015 at 9:04 AM, Dave Taht <dave.taht at gmail.com> wrote:
> On Thu, Feb 26, 2015 at 7:18 AM, MUSCARIELLO Luca IMT/OLN
> <luca.muscariello at orange.com> wrote:
>> On 02/26/2015 03:18 PM, Mikael Abrahamsson wrote:
>>
>> On Thu, 26 Feb 2015, MUSCARIELLO Luca IMT/OLN wrote:
>>
>> Done with the vendor itself, under the related NDA etc. It takes longer to
>> set up the agreement than to code the system. The problem is that this
>> process is not ok. An ISP cannot maintain someone else's product if it is
>> closed.
>>
>>
>> Do you have a requirements document that makes sense to the people
>> programming these ASICs for vendors? When I try to explain what needs to be
>> done I usually run into very frustrating discussions.
>>
>>
>> I think there are people on this list who should be able to answer this
>> question better than me.
>>
>> AFAIK the process is complex because even the vendors use network
>> processors they don't build, and traffic management is developed in the
>> chip by the chipco. Especially for the segment we are considering here.
>> In the end the dequeue process is always managed by someone else, and the
>> mechanisms and their implementations are opaque.
>> You can do testing on the equipment and do some reverse engineering. What a
>> waste of time...
>>
>> This is why single queue AQM is preferred by vendors, because it does not
>> affect current product lines and the enqueue is easier to code. FQ requires
>> recoding the dequeue or shadowing the hardware dequeue.
>
> OK, I need to dispel a few misconceptions.
>
> First, everyone saying that fq_codel can't be done in hardware is
> *wrong*. See my last point far below - I know I write over-long
> emails...
>
> YES, at the lowest level of the hardware, where packets turn into
> light or fuzzy electrons, or need to be tied up in an aggregate (as
> in cable) and shipped onto the wire, you can't do it. BUT, as BQL has
> shown, you CAN stick in just enough buffering on the final single
> queue to *keep the device busy* - which on fiber might be a few us,
> on cable modems 2ms - and then do smarter things above that portion
> of the device. I am perfectly willing to lose those 2ms when you can
> cut off hundreds of milliseconds elsewhere.
>
> Everybody saying that it can't be done doesn't understand how BQL
> works. Demonstrably. On two+ dozen devices. Already. For 3 years now.

Or hasn't taken apart free.fr's Revolution (v6) product. *Shipping*,
with DRR for classification (similar to sqm's simple.qos model) +
fq_codel to manage the three queues - since *August 2011*.

It has, in particular, a really tightly written DSL driver which makes
everything else "just work".

Other DSL firmware makers have hopefully taken note, as the only way
their market will survive against the onslaught of fiber is to make
better devices. More people should clue their DSL makers in about it,
though. The recent paper from Alcatel-Lucent was really terrible and
missed most of the points.

> On the higher end than CPE, the people that keep whinging about
> having thousands of queues are not thinking about it correctly. This
> is merely a change in memory access pattern, not anything physical at
> all, and the overall reduction in queue lengths leads to much better
> cache behavior. They should, um, at least try this stuff on a
> modern Intel processor - and just go deploy that. 10GigE was totally
> feasible on day one; ongoing work is getting up to 40GigE (but
> running into trouble elsewhere on the rx path, which Jesper can speak
> to in painful detail).
>
> Again, if you do it right, on any architecture, all the smarts happen
> long before you hit the wire. You are not - quite - dropping from the
> head of the queue - but then, we never have been - and I really don't
> get why people don't grok this.
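
An aside, to make "the smarts happen before the wire" concrete: below
is a rough sketch, in C, of codel's dequeue-side control law. The
names and simplifications are mine (the real thing lives in
include/net/codel.h and, among other things, carries the drop count
across drop states). The point is that the decision is made at
dequeue, from the head of the queue, and nothing underneath it - the
BQL'd driver ring, the final hardware FIFO - has to change.

/* Sketch of codel's control law, simplified. Times in microseconds. */

#include <stdint.h>
#include <math.h>

#define TARGET_US    5000    /* 5 ms acceptable standing queue */
#define INTERVAL_US  100000  /* 100 ms observation window */

struct pkt {
    uint64_t enqueue_time_us;
    /* payload etc. */
};

struct codel {
    int      dropping;        /* in drop state? */
    uint32_t count;           /* drops in this drop state */
    uint64_t drop_next_us;    /* when we may drop again */
    uint64_t first_above_us;  /* when the queue may be declared bad */
};

/* the control law: drop interval shrinks as 1/sqrt(count) */
static uint64_t control_law(const struct codel *c, uint64_t t)
{
    return t + (uint64_t)(INTERVAL_US / sqrt((double)c->count));
}

/* called at dequeue; returns 1 if the head packet should be dropped */
static int codel_should_drop(struct codel *c, const struct pkt *p,
                             uint64_t now)
{
    uint64_t sojourn = now - p->enqueue_time_us;

    if (sojourn < TARGET_US) {         /* good queue: stand down */
        c->first_above_us = 0;
        c->dropping = 0;
        return 0;
    }
    if (!c->first_above_us) {          /* start the interval clock */
        c->first_above_us = now + INTERVAL_US;
        return 0;
    }
    if (!c->dropping && now >= c->first_above_us) {
        c->dropping = 1;               /* bad queue: start dropping */
        c->count = 1;
        c->drop_next_us = control_law(c, now);
        return 1;
    }
    if (c->dropping && now >= c->drop_next_us) {
        c->count++;                    /* still bad: drop harder */
        c->drop_next_us = control_law(c, now);
        return 1;
    }
    return 0;
}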
>
> EVERY benchmark I publish shows the intrinsic latency in the
> system hovering around Xms due to the context switch overhead of
> the processor and OS - and although I don't mind shaving that figure,

Clarification: "The context switch overhead of processor and OS being
covered up by BQL's slight added buffering to keep the device busy".

I should really publish more 100Mbit and gbit benchmarks to show that
actual overhead. It is not a lot.
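
For those who have never read the BQL code: the entire trick is a
pair of byte counters in the transmit path. Here is a boiled-down
sketch in C - the real thing is lib/dynamic-queue-limits.c, which
also grows and shrinks the limit automatically; I use a fixed limit
here just to show the accounting:

#include <stdint.h>
#include <stdbool.h>

/* How big does the limit need to be? rate * slack. At 1 Gbit/s, 1 ms
 * of "keep the device busy" slack is only ~125 KB; at 100 Mbit/s,
 * ~12.5 KB. Every byte beyond that belongs in the smart queues above,
 * not in the driver ring. */

struct bql {
    uint32_t inflight_bytes;  /* handed to the NIC, not yet completed */
    uint32_t limit_bytes;     /* just enough to keep the device busy */
};

static bool bql_may_queue(const struct bql *q, uint32_t len)
{
    return q->inflight_bytes + len <= q->limit_bytes;
}

static void bql_sent(struct bql *q, uint32_t len)       /* on xmit */
{
    q->inflight_bytes += len;
}

static void bql_completed(struct bql *q, uint32_t len)  /* on tx irq */
{
    q->inflight_bytes -= len;
    /* dropping below the limit wakes the layer above, so fq_codel
     * (or whatever is stacked there) can refill the ring */
}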

> compared to the gains of seconds elsewhere that come from using
> these algorithms - I find getting those seconds down more satisfying.
> (Admittedly I have spent tons of time trying to shave off a few
> hundred us at that level too, as have Jonathan and many others in the
> Linux community.)

The overhead of other packet processing now dominates the runtime. In
addition to the xmit_more work last quarter, a whole bunch of *lovely*
patches recently arrived for reducing overhead in the Linux FIB lookup
path. They are *GREAT*.

https://www.netdev01.org/docs/duyck-fib-trie.pdf

I don't have any data on how much the FIB lookup used to cost compared
to fq_codel at these speeds, but I think it was *at least* 7 times
more expensive than running fq_codel.

And Dave Miller's keynote at netdev01 a few weeks back was all about
smashing latencies all through the Linux networking stack, and in
particular, integrating offloads well into the kernel.

https://www.netdev01.org/docs/miller-Ottawa2015-Keynote.pdf

SDN is on the rise. In fact there was so much demonstrable progress
and interest shown at that conference on the issues that I care about,
in subsystems and products that are shipping or will ship widely, that
I felt like we were going to win the war against network latency much
sooner than I had ever dreamed.

https://www.netdev01.org/downloads

See the rocker switch, Hemminger's preso on DPDK, the hardware
acceleration talk - really great stuff. I am sorry I was too sick to
attend.

>
> I also don't give a hoot about core routers, mostly the edge. I *do
> care deeply* about FQ/AQMing interconnects between providers,
> particularly in the event of a disaster like an earthquake or
> tsunami, when suddenly 2/3 of the interconnects get drowned. What
> would happen if an earthquake the same size as the one that hit Japan
> hit California? It worries me. I live 500 meters from the
> intersection of two major fault lines.
>
> Secondly, I need to clarify a statement above:
>
> "This is why single queue AQM is preferred by vendors *participating
> publicly on the aqm mailing list*, because it does not affect current
> product lines, and the enqueue is easier to code. "
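
To spell out why "the enqueue is easier to code": a single-queue AQM
can make its entire drop decision at enqueue time, in front of a
completely untouched hardware dequeue. A toy, PIE-flavored sketch (my
names and constants; real PIE runs its probability controller on a
timer rather than inline):

#include <stdint.h>
#include <stdlib.h>

struct sq_aqm {
    uint32_t qlen_bytes;      /* current backlog */
    uint32_t link_rate_Bps;   /* drain rate, e.g. the ISP's set rate */
    double   drop_prob;
};

static int aqm_enqueue_ok(struct sq_aqm *q, uint32_t len)
{
    /* estimated queueing delay = backlog / drain rate */
    double delay = (double)q->qlen_bytes / q->link_rate_Bps;

    /* crude proportional control toward a 15 ms delay target */
    if (delay > 0.015)
        q->drop_prob += 0.01;
    else
        q->drop_prob -= 0.01;
    if (q->drop_prob < 0.0) q->drop_prob = 0.0;
    if (q->drop_prob > 1.0) q->drop_prob = 1.0;

    if ((double)rand() / RAND_MAX < q->drop_prob)
        return 0;             /* drop here; the dequeue stays dumb */
    q->qlen_bytes += len;     /* decremented by the (dumb) dequeue */
    return 1;
}

FQ, by contrast, has to pick *which* queue to serve next - inherently
a dequeue-side decision, hence the "recode the dequeue or shadow the
hardware dequeue" complaint above. But as the DRR point below shows,
that dequeue is tiny.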
>
> When fq_codel landed, the next morning I said to myself, "ok, is it
> time to walk down Sand Hill Road? We need this in switch chips, and
> given the two monopolistic vendors left, it is ripe for disruption".
>
> After wrestling with myself for a few weeks I decided it would be
> simpler and easier if I tried to persuade the chipmakers making
> packet co-processing engines (like Octeon, Intel, Tilera) that this
> algorithm and htb-like rate control would be a valuable addition to
> their products. Note - in *none* of their cases did it have to reduce
> to gates. They have a specialized cpu co-processor that struggles to
> work at line rate (with features like NAT offloads, etc), with
> specialized firmware that they write and hold proprietary, and there
> were *no* hardware mods needed. Their struggle at line rate was not
> the point; I wanted something that could work at an ISP's set rate,
> which is very easy to do...
>
> I talked to all these chipmakers (and a few more startup-like ones,
> particularly in the high speed trading market).
>
> They told me there was no demand. So I went and talked to their customers...
>
> and I am certain that more than one of the companies I have talked to
> in the last 3 years is actually doing FQ now, and certain that codel
> is also implemented - but I cannot reveal which ones, and for all I
> know the ones that are not talking to me (anymore) are off doing it.
> And at least one of the companies doing it in their packet
> co-processor was getting it wrong, until I straightened 'em out - and
> for all I know they didn't listen.
>
> I figured whichever vendor shipped products first would have a market
> advantage, and then everybody else would pile on, and that if I
> focused on creating demand for the algorithm (as I did all over the
> world; with ubnt in particular I went to the trouble of backporting
> it to their EdgeRouter personally), demand would be created for
> better firmware from the chipcos, and products would arrive.
>
> And they have. Every 802.11ac router now has some sort of "better"
> QoS system in it (and of course, OpenWrt and derivatives). There is a
> ton of stuff in the pipeline.
>
> The StreamBoost folk were pretty effective in spreading their meme,
> but I am mad at them about quite a few things in their implementation
> and test regimes, so I'll save what they have done wrong for another
> day, when I have more venom stored up and have acquired stuff I can
> say publicly about their implementation via a bit more inspection of
> their GPL drops and testing of the related products.
>
> ...
>
> "FQ requires to recode the dequeue or to shadow the hardware dequeue."
>
> Well this statement is not correct.
>
> *Lastly*: IF you want to reduce things to gates, rather than use a
> packet co-processor:
>
> 1) DRR in hardware is entirely doable. How do I know this? Because
> it was done for the netfpga.org project *7* years ago. Here are the
> project, paper, and *verilog*:
> https://github.com/NetFPGA/netfpga/wiki/DRRNetFPGA
>
> It is a single define to synthesize a configurable number of queues,
> and it worked on top of the GigE Xilinx Virtex-II Pro FPGA, which is
> so low-end now that I am not even sure it is still made.
> http://netfpga.org/2014/#/systems/4netfpga-1g/details/
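
For anyone who has not internalized just how small DRR is: the entire
scheduler is a deficit counter per queue, a quantum, and one loop. A
sketch of the same idea in C (the NetFPGA project above is this in
verilog; fq_codel's own scheduler is a close cousin):

#include <stdint.h>

#define NQUEUES 3
#define QUANTUM 1514          /* one MTU of credit per round */

struct pkt;
struct queue {
    uint32_t deficit;
    /* FIFO of packets lives elsewhere */
};

/* queue primitives, assumed provided elsewhere */
extern struct pkt *q_head(struct queue *q);   /* peek head, or NULL */
extern struct pkt *q_pop(struct queue *q);
extern uint32_t    pkt_len(const struct pkt *p);

static struct pkt *drr_dequeue(struct queue qs[NQUEUES])
{
    static int i;
    for (;;) {            /* assumes at least one queue is backlogged */
        struct queue *q = &qs[i];
        struct pkt *p = q_head(q);

        if (!p) {                         /* empty: forfeit credit */
            q->deficit = 0;
            i = (i + 1) % NQUEUES;
        } else if (q->deficit < pkt_len(p)) {
            q->deficit += QUANTUM;        /* out of credit: top up */
            i = (i + 1) % NQUEUES;
        } else {
            q->deficit -= pkt_len(p);     /* spend credit, send */
            return q_pop(q);
        }
    }
}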
>
> They never got around to writing a five-tuple packet
> inspector/hasher, but that is straightforward (see the sketch below).
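
Since "straightforward" deserves proof, here is roughly all the
hasher has to do, sketched in C. Names are mine; fq_codel proper uses
a Jenkins hash keyed with a random per-instance seed so flows cannot
be aimed at a chosen bucket, and a hardware version would want the
same property:

#include <stdint.h>
#include <stddef.h>
#include <string.h>

struct five_tuple {
    uint32_t saddr, daddr;    /* IPv4 for brevity */
    uint16_t sport, dport;
    uint8_t  proto;
};

static uint32_t fnv1a(const uint8_t *p, size_t n)  /* any decent mix */
{
    uint32_t h = 2166136261u;
    while (n--) { h ^= *p++; h *= 16777619u; }
    return h;
}

static uint32_t flow_to_queue(const struct five_tuple *t, uint32_t seed,
                              uint32_t nqueues)
{
    uint8_t key[13];         /* copy fields out to dodge struct padding */
    memcpy(key,      &t->saddr, 4);
    memcpy(key + 4,  &t->daddr, 4);
    memcpy(key + 8,  &t->sport, 2);
    memcpy(key + 10, &t->dport, 2);
    key[12] = t->proto;
    return (fnv1a(key, sizeof(key)) ^ seed) % nqueues;
}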
>
> 2) Rate management in hardware is entirely doable, also. Here are the
> project, paper, and verilog: https://github.com/gengyl08/SENIC
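
And the core of htb-like rate management is just a token bucket per
class; a sketch follows. (HTB proper adds a hierarchy of these with
borrowing between classes, which is where the real complexity lives.)

#include <stdint.h>
#include <stdbool.h>

struct tbf {
    uint64_t rate_Bps;     /* the ISP's set rate, in bytes/sec */
    uint64_t burst_bytes;  /* bucket depth */
    uint64_t tokens;       /* current credit, in bytes */
    uint64_t last_us;      /* last refill timestamp */
};

static bool tbf_may_send(struct tbf *t, uint64_t now_us, uint32_t len)
{
    /* refill: credit accrues at the configured rate */
    t->tokens += (now_us - t->last_us) * t->rate_Bps / 1000000;
    if (t->tokens > t->burst_bytes)
        t->tokens = t->burst_bytes;
    t->last_us = now_us;

    if (t->tokens < len)
        return false;      /* over rate: hold the packet, set a timer */
    t->tokens -= len;
    return true;
}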
>
> 3) I long ago figured out how to make something fq_codel-like work
> (in theory) in hardware with enough parallelization (and a bit of
> BQL). The sticking points were a complete re-org of the ethernet
> device and device driver, and a whole lot of Xilinx IP I wanted to
> dump, and I am really too busy to do any of the work, but:
>
> Since I am fed up with the debate, I am backing this kickstarter
> project. I have had several discussions with the people doing it -
> they are using all the same hardware I chose for my mental design -
> and I urge others here to do so.
>
> https://www.kickstarter.com/projects/onetswitch/onetswitch-open-source-hardware-for-networking
>
> I am not entirely broke for a change, and plan to throw in 1k or so.
> We need to find 25k from other people for them to make their initial
> target.
>
> That board also meets all the needs for fixing wifi. They already
> have more complex products shipping that might be more right for the
> job, as working out the number of gates needed is something that
> requires actual work and simulation.
>
> But I did like this:
>
> https://github.com/MeshSr/wiki/wiki/ONetSwitch45
>
> I will write a bit more about this (as negotiations continue) in a
> less long, more specialized mail in the coming weeks, and perhaps, as
> so often happens around here (I am thinking of renaming this the
> "bufferbloat/stone soup project"), some EEs will show up eager to do
> something truly new and amazing as a summer project. If you have any
> spare students, well, go to town.
>
> I really, really like chisel in particular
> (https://chisel.eecs.berkeley.edu/), and the openrisc folk could use
> a better ethernet device.
>
>> My experience is not based on providing a requirements document - well,
>> we tried that first - but on joint coding with the chipco, because you
>> need to see a lot of chip internals.
>>
>> _______________________________________________
>> Bloat mailing list
>> Bloat at lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/bloat
>>



-- 
Dave Täht
Let's make wifi fast, less jittery and reliable again!

https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb


