Date: Thu, 26 Feb 2015 09:04:52 -0800
From: Dave Taht
To: "MUSCARIELLO Luca IMT/OLN"
Cc: "bloat@lists.bufferbloat.net"
Subject: Re: [Bloat] RED against bufferbloat

On Thu, Feb 26, 2015 at 7:18 AM, MUSCARIELLO Luca IMT/OLN wrote:
> On 02/26/2015 03:18 PM, Mikael Abrahamsson wrote:
>> On Thu, 26 Feb 2015, MUSCARIELLO Luca IMT/OLN wrote:
>>> Done with the vendor itself with related NDA etc. It takes longer to
>>> set the agreement than to code the system. The problem is that this
>>> process is not ok. An ISP cannot maintain someone else's product if
>>> it is closed.
>>
>> Do you have a requirement document that makes sense to the people
>> programming these ASICs for vendors? When I try to explain what needs
>> to be done I usually run into very frustrating discussions.
>
> I think there are people on this list that should be able to answer
> this question better than me.
>
> AFAIK the process is complex because even vendors use network
> processors they don't build, and traffic management is developed by the
> chipco in the chip, especially for the segment we are considering here.
> In the end the dequeue process is always managed by someone else, and
> the mechanisms and their implementations are opaque. You can do testing
> on the equipment and do some reverse engineering. What a waste of
> time...
>
> This is why single queue AQM is preferred by vendors, because it does
> not affect current product lines and the enqueue is easier to code. FQ
> requires to recode the dequeue or to shadow the hardware dequeue.

OK, I need to dispel a few misconceptions.

First, everyone saying that fq_codel can't be done in hardware is
*wrong*. See my last point far below; I know I write over-long emails...

YES, at the lowest level of the hardware, where packets turn into light
or fuzzy electrons, or need to be tied up in an aggregate (as in cable)
and shipped onto the wire, you can't do it.

BUT, as BQL has shown, you CAN stick in just enough buffering on the
final single queue to *keep the device busy* - which on fiber might be a
few microseconds, on cable modems 2 ms - and then do smarter things above
that portion of the device. I am perfectly willing to lose those 2 ms
when you can cut off hundreds of milliseconds elsewhere.
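To make the BQL point concrete, here is a toy C model of the idea. The
names are made up for illustration - the real driver hooks are
netdev_tx_sent_queue() and netdev_tx_completed_queue(), and the kernel's
limit adaptation is subtler than this - but the shape is the same: a byte
limit on the hardware FIFO that adapts so the device never starves, while
all the real queueing stays in the smart layer above it.

/* Minimal model of the BQL idea: keep just enough bytes queued in the
 * device's final FIFO to avoid starving the wire, and push everything
 * else up to the smart (fq_codel) layer above.  Toy code; names are
 * hypothetical, not the actual kernel API. */
#include <stdbool.h>
#include <stdio.h>

struct bql {
    unsigned int inflight; /* bytes handed to hardware, not yet completed */
    unsigned int limit;    /* current byte limit for the hardware FIFO    */
};

/* May the upper (FQ/AQM) layer hand another packet to the hardware? */
static bool bql_may_send(const struct bql *q, unsigned int pkt_bytes)
{
    return q->inflight + pkt_bytes <= q->limit;
}

static void bql_sent(struct bql *q, unsigned int pkt_bytes)
{
    q->inflight += pkt_bytes;
}

/* TX-completion interrupt: credit back completed bytes and adapt the
 * limit - grow it if the FIFO ran dry (starvation), decay it slowly
 * otherwise so queueing stays in the smart layer above. */
static void bql_completed(struct bql *q, unsigned int done_bytes, bool ran_dry)
{
    q->inflight -= done_bytes;
    if (ran_dry)
        q->limit += done_bytes;       /* too shallow: grow  */
    else if (q->limit > done_bytes)
        q->limit -= done_bytes / 2;   /* deep enough: decay */
}

int main(void)
{
    struct bql q = { .inflight = 0, .limit = 3000 };

    if (bql_may_send(&q, 1500)) bql_sent(&q, 1500);
    if (bql_may_send(&q, 1500)) bql_sent(&q, 1500);
    printf("inflight=%u limit=%u\n", q.inflight, q.limit);

    bql_completed(&q, 1500, false);
    printf("inflight=%u limit=%u\n", q.inflight, q.limit);
    return 0;
}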
Everybody saying that it can't be done doesn't understand how BQL works.
Demonstrably. On two+ dozen devices. Already. For three years now.

At the higher end, beyond CPE, the people who keep whinging about having
thousands of queues are not thinking about it correctly. This is merely a
change in memory access pattern, not anything physical at all, and the
overall reduction in queue lengths leads to much better cache behavior.
They should, um, at least try this stuff on a modern Intel processor -
and just go deploy that. 10GigE was totally feasible on day one; ongoing
work is getting up to 40GigE (but running into trouble elsewhere on the
rx path, which Jesper can talk about in painful detail).

Again, if you do it right, on any architecture, all the smarts happen
long before you hit the wire. You are not - quite - dropping from the
head of the queue, but then we never have, and I really don't get why
people don't grok this.

EVERY benchmark I ever publish shows the intrinsic latency in the system
hovering around X ms due to the context-switch overhead of the processor
and OS - and although I don't mind shaving that figure, compared to the
gains of seconds elsewhere that can come from using these algorithms, I
find getting those down more satisfying. (Admittedly I have spent tons of
time trying to shave off a few hundred microseconds at that level too, as
have Jonathon and many others in the Linux community.)

I also don't give a hoot about core routers, mostly the edge. I *do care
deeply* about FQ/AQMing the interconnects between providers, particularly
in the event of a disaster like an earthquake or tsunami, when suddenly
two-thirds of the interconnects get drowned. What will happen if an
earthquake the size of the one that hit Japan hits California? It worries
me. I live 500 meters from the intersection of two major fault lines.

Secondly, I need to clarify a statement above: "This is why single queue
AQM is preferred by vendors *participating publicly on the aqm mailing
list*, because it does not affect current product lines, and the enqueue
is easier to code."

When fq_codel landed, the next morning I said to myself, "OK, is it time
to walk down Sand Hill Road? We need this in switch chips, and given the
two monopolistic vendors left, it is ripe for disruption." After
wrestling with myself for a few weeks I decided it would be simpler and
easier to persuade the chipmakers making packet co-processing engines
(like Octeon, Intel, Tilera) that this algorithm and HTB-like rate
control would be a valuable addition to their products.
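The "HTB-like rate control" half is no magic either: at a single ISP-set
rate it reduces to a plain token bucket. A toy sketch, with made-up names
(real HTB layers a class hierarchy and borrowing on top of this):

/* Toy token-bucket shaper - the core of rate control at an ISP-set
 * rate.  Illustrative only. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct tbf {
    uint64_t rate_Bps;    /* target rate, bytes per second      */
    uint64_t burst;       /* bucket depth, bytes                */
    uint64_t tokens;      /* current credit, bytes              */
    uint64_t last_ns;     /* last refill timestamp, nanoseconds */
};

static void tbf_refill(struct tbf *t, uint64_t now_ns)
{
    uint64_t credit = (now_ns - t->last_ns) * t->rate_Bps / 1000000000ull;
    uint64_t filled = t->tokens + credit;

    t->tokens = filled > t->burst ? t->burst : filled;
    t->last_ns = now_ns;
}

/* true: packet may be dequeued now; false: wait for more tokens */
static bool tbf_admit(struct tbf *t, uint64_t now_ns, uint64_t pkt_bytes)
{
    tbf_refill(t, now_ns);
    if (t->tokens < pkt_bytes)
        return false;
    t->tokens -= pkt_bytes;
    return true;
}

int main(void)
{
    /* shape to 100 Mbit/s (12.5 MB/s) with a 10 kB burst */
    struct tbf t = { .rate_Bps = 12500000, .burst = 10000,
                     .tokens = 10000, .last_ns = 0 };
    uint64_t now = 0;

    for (int i = 0; i < 10; i++) {
        printf("t=%6lluns pkt %d: %s\n", (unsigned long long)now, i,
               tbf_admit(&t, now, 1500) ? "sent" : "deferred");
        now += 10000; /* 10 us between attempts */
    }
    return 0;
}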
Note - in *none* of their cases did it have to reduce to gates. They have
a specialized CPU co-processor that struggles to work at line rate (with
features like NAT offloads, etc.), with specialized firmware that they
write and hold proprietary, and there were *no* hardware mods needed.
Their struggle at line rate was not the point; I wanted something that
could work at an ISP's set rate, which is very easy to do...

I talked to all these chipmakers (and a few more startup-like ones,
particularly in the high-speed trading market). They told me there was no
demand. So I went and talked to their customers... and I am certain that
more than one of the companies I have talked to in the last three years
is actually doing FQ now, and certain that codel is also implemented -
but I cannot reveal which ones, and for all I know the ones that are not
talking to me (anymore) are off doing it. And at least one of the
companies doing it in their packet co-processor was getting it wrong,
until I straightened 'em out, and for all I know they didn't listen.

I figured whichever vendor shipped products first would have a market
advantage, and then everybody else would pile on; and that if I focused
on creating demand for the algorithm (as I did all over the world - with
ubnt in particular I went to the trouble of backporting it to their
edgerouter personally), demand would be created for better firmware from
the chipcos, and products would arrive. And they have. Every 802.11ac
router now has some sort of "better" QoS system in it (and of course,
openwrt and derivatives). There is a ton of stuff in the pipeline.

The Streamboost folk were pretty effective in spreading their meme, but I
am mad at them about quite a few things in their implementation and test
regimes, so I'll save what they have done wrong for another day, when I
have more venom stored up and have acquired stuff I can say publicly
about their implementation via a bit more inspection of their GPL drops
and testing of the related products.

...

"FQ requires to recode the dequeue or to shadow the hardware dequeue."

Well, this statement is not correct.

*Lastly*: IF you want to reduce things to gates, rather than use a packet
co-processor:

1) DRR in hardware is entirely doable. How do I know this? Because it was
done for the netfpga.org project *7* years ago. Here is the project,
paper, and *verilog*:

https://github.com/NetFPGA/netfpga/wiki/DRRNetFPGA

It is a single define to synthesize a configurable number of queues, and
it worked on top of the GigE Xilinx Virtex-II Pro FPGA, which is so
low-end now that I am not even sure it is still made.

http://netfpga.org/2014/#/systems/4netfpga-1g/details/

They never got around to writing a five-tuple packet inspector/hasher,
but that is straightforward - see the toy sketch just below.
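Since "straightforward" deserves backing up, here is a toy C version of
both pieces: the five-tuple hash that picks a queue, and the DRR dequeue
loop. All names are made up; the NetFPGA version is of course verilog,
and a real implementation keeps an active-queue list rather than the
linear scan used here to keep the sketch short.

#include <stdint.h>
#include <stdio.h>

#define NQUEUES 64   /* the "single define": number of DRR queues */
#define QUANTUM 1514 /* bytes of credit granted per round         */

struct pkt {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t  proto;
    uint32_t len;
    struct pkt *next;
};

struct drr_q { struct pkt *head, *tail; int deficit; };

static struct drr_q queues[NQUEUES];

/* The five-tuple hasher: any decent mixing function works; FNV-1a here. */
static unsigned classify(const struct pkt *p)
{
    uint32_t h = 2166136261u;
    uint8_t five_tuple[13] = {
        p->saddr >> 24, p->saddr >> 16, p->saddr >> 8, p->saddr,
        p->daddr >> 24, p->daddr >> 16, p->daddr >> 8, p->daddr,
        p->sport >> 8,  p->sport, p->dport >> 8, p->dport, p->proto
    };
    for (int i = 0; i < 13; i++) { h ^= five_tuple[i]; h *= 16777619u; }
    return h % NQUEUES;
}

static void enqueue(struct pkt *p)
{
    struct drr_q *q = &queues[classify(p)];
    p->next = NULL;
    if (q->tail) q->tail->next = p;
    else q->head = p;
    q->tail = p;
}

/* One DRR scan: each backlogged queue gets a quantum of credit and
 * sends whatever fits; a drained queue forfeits its leftover credit. */
static void drr_round(void (*xmit)(struct pkt *))
{
    for (int i = 0; i < NQUEUES; i++) {
        struct drr_q *q = &queues[i];
        if (!q->head)
            continue;
        q->deficit += QUANTUM;
        while (q->head && (int)q->head->len <= q->deficit) {
            struct pkt *p = q->head;
            q->head = p->next;
            if (!q->head) q->tail = NULL;
            q->deficit -= p->len;
            xmit(p);
        }
        if (!q->head)
            q->deficit = 0;
    }
}

static void xmit_print(struct pkt *p)
{
    printf("sent %u-byte packet (flow queue %u)\n", p->len, classify(p));
}

int main(void)
{
    struct pkt a = { .saddr = 0x0a000001, .daddr = 0x0a000002,
                     .sport = 5000, .dport = 80, .proto = 6, .len = 1500 };
    struct pkt b = { .saddr = 0x0a000003, .daddr = 0x0a000002,
                     .sport = 5001, .dport = 443, .proto = 6, .len = 64 };
    enqueue(&a);
    enqueue(&b);
    drr_round(xmit_print);
    return 0;
}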
2) Rate management in hardware is entirely doable, also. Here is the
project, paper, and verilog:

https://github.com/gengyl08/SENIC

3) I long ago figured out how to make something fq_codel-like work (in
theory) in hardware with enough parallelization (and a bit of BQL). The
sticking points were a complete re-org of the ethernet device and device
driver, and a whole lot of Xilinx IP I wanted to dump, and I am really
too busy to do any of the work, but:

Since I am fed up with the debate, I am backing this kickstarter project.
I have had several discussions with the people doing it - they are using
all the same hardware I chose for my mental design - and I urge others
here to do so too.

https://www.kickstarter.com/projects/onetswitch/onetswitch-open-source-hardware-for-networking

I am not entirely broke for a change, and plan to throw in 1k or so. We
need to find 25k from other people for them to make their initial
targets. That board meets all the needs for fixing wifi also. They
already have more complex products shipping that might be more right for
the job, since working out the number of gates needed is something that
needs actual work and simulation. But I did like this:

https://github.com/MeshSr/wiki/wiki/ONetSwitch45

I will write a bit more about this (as negotiations continue) in a less
long, more specialized mail in the coming weeks, and perhaps, as so often
happens around here (I am thinking of renaming this the
"bufferbloat/stone soup project"), some EEs will show up eager to do
something truly new and amazing as a summer project. If you have any
spare students, well, go to town.

I really, really like chisel in particular -
https://chisel.eecs.berkeley.edu/ - and the openrisc folk could use a
better ethernet device.

> My experience is not based on providing a requirement document - well,
> we tried that first - but on joint coding with the chipco, because you
> need to see a lot of chip internals.
>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat

-- 
Dave Täht

Let's make wifi fast, less jittery and reliable again!

https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb