[Cerowrt-devel] Fastpass: A Centralized "Zero-Queue" Datacenter Network

Sat Jul 19 13:31:17 EDT 2014

On Sat, Jul 19, 2014 at 9:41 AM,  <Valdis.Kletnieks at vt.edu> wrote:
> On Fri, 18 Jul 2014 17:23:24 -0700, Dave Taht said:
>> In particular, I'd *really love* to rip most of the network stack out
>> of the kernel and into userspace. And I really like the idea of
>> writable hardware that can talk to virtual memory from userspace (the
>> zynq can)

I enjoyed the first comment over here:

https://news.ycombinator.com/item?id=8056001

which I stumbled across in my every-morning Google search for "bufferbloat".

I really hate the fragmentation of the conversation that has happened since
netnews got overwhelmed by spam.

> To misquote Lost Boys, "One thing about living in a microkernel I never could
> stomach, all the damn context switches."

Ha. That movie was filmed in my "hometown" (Santa Cruz), and all the
local extras in it look like that, all the time. Love that movie. Been
on those railroad tracks.

> Or were you thinking going entirely the opposite direction and offloading to
> ever-smarter network cards, sort of a combo of segment offload and zero-copy,
> on steroids no less?

Offloads are a PITA with our current architectures. Certainly I'd like
to make it easier to prototype hardware and distribute loads to
alternate CPUs, whether they be centralized or distributed.

Things like netmap and Click are interesting, as is all the work on openswitch
and SDN-related technologies.

http://info.iet.unipi.it/~luigi/netmap/

Not for the ultimate speed of it... but if you can take something that
is very hard to build and experiment with in the kernel, and move it to
where you can play with it in userspace, having the VM protections
there makes iterating on ideas much easier.

You can prototype stuff in really high-level languages (like Python)
and prove stuff out, and that's a good thing. Certainly the performance
won't be there, but if you can clearly identify a core
performance-enhancing thing, you can move it to C or an ASIC later.

I think, incidentally, HAD micro-kernels been successful, hardware
support for them would have evolved, and it would be a far better, more
reliable computing world. I note that at the time I was working on
things like Mach (early 90s), I didn't feel this way! Moving virtual
memory management to userspace incurred such a substantial overhead as
to obviate the advantages. There was plenty of other stuff that was
pretty useful to move to userspace, too (Plan 9 did it better than
Mach), but it all got lost in how hard and slow it was at the time to
abstract so much out of the kernel.

(there are good ideas in every bad paper)

I have since revised my opinion, given how much hassle kernel
programming is in general and how finicky and crash-prone it is.
One of my biggest regrets about the "evolution" of computer design over
the last 20 years is that most hardware offloads can only be used from
kernel space. That has made those improvements difficult to design and
code for, and often downright useless, as they can't easily be used on
small amounts of data without excessive context switching.

Secondly, it has led to a division of labor where EEs, in love with the
billions of transistors at their disposal, burn time writing things
that userspace apps can't use. So I'm VERY bugged about that, and was
overjoyed at the prospect in the Zynq of being able to "write
hardware" and have it talk through a virtual memory port, so that if
you could identify a thing that could be done better in hardware, you
could get at it via VM with no context switches, which is particularly
valuable on a multi-core CPU architecture.
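
To make the "get at it via VM" idea concrete, here is a minimal sketch
in Python of what the userspace side could look like. Everything
specific in it is an assumption for illustration: the register offsets
and names are hypothetical, and an ordinary temporary file stands in
for the hardware, since no device path or register layout is given
here. The point is only that once the region is mapped, reads and
writes are plain memory accesses rather than one system call per
operation.

```python
import mmap
import os
import struct
import tempfile

# All of these are hypothetical; a real device defines its own layout.
REGION_SIZE = mmap.PAGESIZE   # size of the mapped register window
CTRL_OFFSET = 0x00            # made-up "control" register
DATA_OFFSET = 0x04            # made-up "data" register


def open_region(path):
    """Map REGION_SIZE bytes of `path` read/write into this process."""
    fd = os.open(path, os.O_RDWR)
    try:
        return mmap.mmap(fd, REGION_SIZE, access=mmap.ACCESS_WRITE)
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor is closed


def write_reg(region, offset, value):
    """Store a 32-bit little-endian value at `offset` in the mapping."""
    struct.pack_into("<I", region, offset, value)


def read_reg(region, offset):
    """Load a 32-bit little-endian value from `offset` in the mapping."""
    return struct.unpack_from("<I", region, offset)[0]


def demo():
    # Stand-in for the device: an ordinary zero-filled temporary file.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"\x00" * REGION_SIZE)
        path = f.name
    try:
        region = open_region(path)
        write_reg(region, CTRL_OFFSET, 1)
        write_reg(region, DATA_OFFSET, 0xDEADBEEF)
        result = (read_reg(region, CTRL_OFFSET), read_reg(region, DATA_OFFSET))
        region.close()
        return result
    finally:
        os.unlink(path)
```

Calling demo() writes the two made-up registers and reads them back
through the same mapping. Python adds its own interpreter overhead, so
this only shows the shape of the access pattern, not its speed.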

To me the availability of the virtual memory port on the Zynq is the
greatest innovation I've seen in FPGA design in a decade, and it
may one day re-unify the outlook of the EEs and userspace programmers
to do genuinely useful stuff. There are zillions of useful things that
can be done better from userspace with just a little extra hardware
support. Two things that have come up of late are CAM (content
addressable memory) comparisons and echo cancelling, both of which are
easy to do efficiently in hardware, with performance a conventional
von Neumann architecture can't match. Other things on my mind are
packet scheduling (as per the SENIC paper I posted earlier), and much,
much more.

And it would be great if the EEs and CS folk started going to the same
parties again.

Last regret of the day: Back in the 80s, the LISP machine was a 36-bit
tagged architecture. I loved it. I have been mildly bugged that we
didn't use the top 4 bits on 64-bit architectures for tags. It would
make things like garbage collection so much easier...
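
A minimal sketch of the tagging idea, modeling 64-bit words as Python
integers. The tag names and values are made up for illustration and are
not taken from any LISP machine. It shows why tags help a garbage
collector: it can tell a pointer from a plain integer by looking at the
top 4 bits of the word, with no side tables.

```python
WORD_BITS = 64
TAG_BITS = 4
TAG_SHIFT = WORD_BITS - TAG_BITS        # tag lives in bits 60..63
TAG_MASK = (1 << TAG_BITS) - 1
PAYLOAD_MASK = (1 << TAG_SHIFT) - 1     # low 60 bits: address or value

# Hypothetical tag assignments, chosen only for this example.
TAG_FIXNUM = 0x0   # payload is an immediate integer
TAG_CONS = 0x1     # payload is the address of a cons cell
TAG_SYMBOL = 0x2   # payload is the address of a symbol


def make_word(tag, payload):
    """Pack a 4-bit tag and a 60-bit payload into one 64-bit word."""
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag does not fit in 4 bits")
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("payload does not fit in 60 bits")
    return (tag << TAG_SHIFT) | payload


def tag_of(word):
    """Extract the top 4 bits."""
    return (word >> TAG_SHIFT) & TAG_MASK


def payload_of(word):
    """Extract the low 60 bits."""
    return word & PAYLOAD_MASK


def is_pointer(word):
    """In this made-up scheme, every tag except FIXNUM marks a pointer."""
    return tag_of(word) != TAG_FIXNUM


def addresses_to_trace(words):
    """What a collector would follow: payloads of pointer-tagged words."""
    return [payload_of(w) for w in words if is_pointer(w)]
```

For example, given the words make_word(TAG_FIXNUM, 42),
make_word(TAG_CONS, 0x1000) and make_word(TAG_SYMBOL, 0x2000),
addresses_to_trace returns only the two addresses and skips the
integer.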

-- 
Dave Täht

NSFW: https://w2.eff.org/Censorship/Internet_censorship_bills/russell_0296_indecent.article