[Cerowrt-devel] hardware hacking on fq_codel in FPGA form at 10GigE

Dave Taht dave.taht at gmail.com
Thu Dec 20 09:18:23 EST 2012


On Thu, Dec 20, 2012 at 8:53 AM,  <dpreed at reed.com> wrote:
> I have lately been using (for my very wideband software defined radio
> amateur radio transceiver project) the brand new, very nice device called
> the Zynq 7000 series of Platform FPGAs from Xilinx.  It's a complete system
> on a chip, with a dual core ARM Cortex A9 and an enormous amount of
> programmable logic that has "cache coherent access" to the memory system.
>
> The chip is fabricated in a 28 nm process, has a "zillion" SelectIO programmable
> pins, but more importantly has a bunch of hard logic I/O paths.
>
> Since it comes packaged "cheap" (a full 6 inch square eval board with 512 MB
> DRAM and full standard "PC type" interconnects - GigE, VGA, HDMI, USB - as a
> $299 board with a free FPGA tool chain) and runs Linux out of the box, I
> highly recommend the board called Zedboard (just google that).

Their documentation is first rate!

http://www.zedboard.org/sites/default/files/documentations/ZedBoard_HW_UG_v1_6.pdf

> Easy to attach 10 GigE hardware if you can do simple PCB design and
> soldering.

The 10GigE thought was mostly because that's what's driving most of
the decision making behind doing these huge network offloads on the
host processor. That becomes problematic when the hardware migrates out
of the data center or talks to stuff outside of the data center, and/or
when the software concepts end up in devices that need to run well
at 100Mbit and below.

So thoroughly solving the fq/codel problem in hardware at that level
will head off the next worldwide problem when these devices become
more affordable. And (IMHO) make them work better.

> You can be up and running with a development system for under $500 in a
> weekend, building FPGA acceleration, or if you want to add hardware that
> connects to the zillions of I/O pins to the PLL and memory system, that
> might take a week or more, depending on your hardware design and hacking
> skills. I've connected "eval boards" of various sorts using a breakout board
> you can buy from Xilinx quicker than that.
>
> In some ways, this is the Raspberry Pi of high speed digital logic hacking.

You should give them that quote.

And I'll order one. Heck, maybe two. Real find! Thanks!

How far along is your SDR project?

>
> -----Original Message-----
> From: "Hal Murray" <hmurray at megapathdsl.net>
> Sent: Thursday, December 20, 2012 5:32am
> To: "Dave Taht" <dave.taht at gmail.com>
> Cc: "bloat-devel" <bloat-devel at lists.bufferbloat.net>,
> codel at lists.bufferbloat.net, "Hal Murray" <hmurray at megapathdsl.net>,
> cerowrt-devel at lists.bufferbloat.net
> Subject: Re: [Cerowrt-devel] hardware hacking on fq_codel in FPGA form at
> 10GigE
>
>
> dave.taht at gmail.com said:
>>> If I was going to do something like that, I'd build a small/simple CPU
>>> and do the work in microcode.
>
>> There are two ppc 440 cpus already onboard the 10GigE device, I think.
>> It's a REALLY NICE fpga.
>
>> I'd also looked at the octeon and the latest arm chipset from TI which I
>> can't remember the codename for at the moment...
>
> I wasn't thinking of a traditional general purpose CPU but rather something
> special for this problem.
>
>
>
>>> How many lines of assembler code would it take?
>
>> I could do a dump of the current code into any given assembly language.
>> It's not a lot, but there are a lot of out of band functions.
>
> I didn't mean lines of traditional assembly code. If we want to pursue this,
> pick a chunk of C code (not too big) and break it into "lines" where
> everything on a line can be executed at the same time. I'll try to sketch a
> "CPU" and write the microcode.
>
>
>> The enqueue and dequeue algorithms are entirely decoupled, with the
>> exception of this error handling phase (out of queue space). One thought
>> would be to track packet count on enqueue (this is more "sfq"-like than
>> fq_codel-like), which still has a tiny lock...
>
> Stuff that can be reasonably done in the driver should probably be done
> there if it saves a lot of work for the microcode. Avoiding
> out-of-queue-space might be a good example.
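
To make the "decoupled except for out of queue space" point concrete, here
is a toy sketch. It is not the fq_codel code: there is no CoDel logic and no
DRR quantum, and the class name, the limits, and the choice to drop from the
head of the longest queue are all assumptions made for illustration. The one
piece of state that enqueue and dequeue both touch is the shared packet
count, which is where the "tiny lock" would sit.

```python
from collections import deque


class ToyFlowQueues:
    """Hypothetical per-flow queues sharing one packet limit (illustration only)."""

    def __init__(self, flows=4, limit=8):
        self.queues = [deque() for _ in range(flows)]
        self.limit = limit      # total packets allowed across all flows
        self.count = 0          # shared state: the only enqueue/dequeue coupling
        self.next_flow = 0      # round-robin cursor, used by dequeue only
        self.dropped = 0

    def enqueue(self, flow, pkt):
        self.queues[flow].append(pkt)
        self.count += 1
        if self.count > self.limit:
            # Error path: out of queue space. Assumption for this sketch:
            # drop from the head of the longest queue.
            fattest = max(self.queues, key=len)
            fattest.popleft()
            self.count -= 1
            self.dropped += 1

    def dequeue(self):
        # Plain round-robin over the flows, skipping empty queues.
        for _ in range(len(self.queues)):
            q = self.queues[self.next_flow]
            self.next_flow = (self.next_flow + 1) % len(self.queues)
            if q:
                self.count -= 1
                return q.popleft()
        return None
```

If the driver kept the total below the limit before handing packets over, the
overflow branch in enqueue would never run and the two paths would share
nothing but the counter.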
>
>
>
>> Well there are a few things that would benefit from moving directly into
>> hardware - the 5 tuple hash, for example.
>
> I'm probably missing the big picture. Are you building a router or a server?
>
> A server has socket control blocks. Can the hash be precomputed and stored
> there? That doesn't help with UDP sendto, but I think it would work with
> TCP.
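
A rough sketch of the 5 tuple hash and of the "precompute it per socket"
idea, under stated assumptions: IPv4 only, FNV-1a swapped in purely because
it is short (it is not claimed to be the hash fq_codel uses), a table of 1024
flows, and a made-up SocketControlBlock class standing in for whatever the
real stack keeps per connection. The perturb value is a hypothetical seed
mixed into the hash.

```python
import socket
import struct

FLOWS = 1024            # number of flow queues; an assumption for this sketch
FNV_OFFSET = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV_PRIME = 0x01000193   # standard 32-bit FNV prime


def fnv1a_32(data: bytes, seed: int = 0) -> int:
    """32-bit FNV-1a over data, with an optional seed XORed into the basis."""
    h = (FNV_OFFSET ^ seed) & 0xFFFFFFFF
    for b in data:
        h ^= b
        h = (h * FNV_PRIME) & 0xFFFFFFFF
    return h


def flow_index(src_ip, dst_ip, src_port, dst_port, proto, perturb=0):
    """Map an IPv4 5 tuple to a flow queue index in [0, FLOWS)."""
    key = struct.pack(
        "!4s4sHHB",
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        src_port,
        dst_port,
        proto,
    )
    return fnv1a_32(key, perturb) % FLOWS


class SocketControlBlock:
    """Hypothetical per-socket state that caches the flow index once."""

    def __init__(self, src_ip, dst_ip, src_port, dst_port, proto, perturb=0):
        # Computed at connect time; every later packet reuses self.flow.
        self.flow = flow_index(src_ip, dst_ip, src_port, dst_port, proto, perturb)
```

For a connected TCP socket the tuple never changes, so the cached value stays
valid for the life of the connection. With UDP sendto the destination can
differ per call, so the hash would have to be recomputed for each packet.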
>
>
> If you are building a router, does the routing as well as fq-ing have to fit
> in the FPGA?
>
>
> --
> These are my opinions. I hate spam.



-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html


