From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <eric.dumazet@gmail.com>
Received: from mail-gy0-f171.google.com (mail-gy0-f171.google.com
	[209.85.160.171]) (using TLSv1 with cipher RC4-SHA (128/128 bits))
	(Client CN "smtp.gmail.com",
	Issuer "Google Internet Authority" (verified OK))
	by huchra.bufferbloat.net (Postfix) with ESMTPS id 122ED20027E
	for <bloat@lists.bufferbloat.net>; Tue, 13 Mar 2012 12:16:20 -0700 (PDT)
Received: by ghbz17 with SMTP id z17so1462952ghb.16
	for <bloat@lists.bufferbloat.net>; Tue, 13 Mar 2012 12:16:19 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=subject:from:to:cc:in-reply-to:references:content-type:date
	:message-id:mime-version:x-mailer:content-transfer-encoding;
	bh=17yp+rLseIj4NPVvj1Y8vz1vFU11vJTWQsrdCwlX6UU=;
	b=ToUAg/PfThFN6GvosurQJQPkOkiRzeDaR7OwKMDK2Wukadzaj98Ap5J0GjoAcCspDF
	XB1Q+AezYg8NKrpr0pK6aVmHlY1Msp5O4sUBIhTDRsbIpxnNi6UTPJmwbSEeQGG9Nb0O
	wcG9pQ+H8hySUS8AkMKNiSGboHhymf6wUbwF0ie8mCd0OrdVVYw5nYDNx4GJewYRKOpp
	l/HXRAYyr812ptKQmNveUujKMTnLsZj+PVQOArhUifGntWIvCr9kNQ00wKffjA71NdL6
	y/+8Qrf1WiWdfK3wYz9VQ04qnfRtL0wIkeu5JGxf/O/wOMK/NljjTBzTda0/+dB7+hWi
	D/aw==
Received: by 10.182.159.41 with SMTP id wz9mr12962033obb.69.1331666178793;
	Tue, 13 Mar 2012 12:16:18 -0700 (PDT)
Received: from ?IPv6:2620:0:1000:147c:224:d7ff:fee3:2a94?
	([2620:0:1000:147c:224:d7ff:fee3:2a94])
	by mx.google.com with ESMTPS id h9sm2113059obr.20.2012.03.13.12.16.16
	(version=SSLv3 cipher=OTHER); Tue, 13 Mar 2012 12:16:17 -0700 (PDT)
From: Eric Dumazet <eric.dumazet@gmail.com>
To: Luigi Rizzo <rizzo@iet.unipi.it>
In-Reply-To: <20120313190838.GA78827@onelab2.iet.unipi.it>
References: <CAA93jw7H3RAwjppjtOhTUdtHnNPMypP3g+yKRKn2cijEgyQ3Cg@mail.gmail.com>
	<5ce67de28ae4c472a088f0a37e643092@localhost>
	<20120313090357.2c9ac0db@nehalam.linuxnetplumber.net>
	<20120313190838.GA78827@onelab2.iet.unipi.it>
Content-Type: text/plain; charset="UTF-8"
Date: Tue, 13 Mar 2012 12:16:15 -0700
Message-ID: <1331666175.4449.43.camel@edumazet-glaptop>
Mime-Version: 1.0
X-Mailer: Evolution 2.28.3 
Content-Transfer-Encoding: 7bit
Cc: Stephen Hemminger <shemminger@vyatta.com>,
	bloat <bloat@lists.bufferbloat.net>
Subject: Re: [Bloat] high speed networking from userspace
X-BeenThere: bloat@lists.bufferbloat.net
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: General list for discussing Bufferbloat <bloat.lists.bufferbloat.net>
List-Unsubscribe: <https://lists.bufferbloat.net/options/bloat>,
	<mailto:bloat-request@lists.bufferbloat.net?subject=unsubscribe>
List-Archive: <https://lists.bufferbloat.net/pipermail/bloat>
List-Post: <mailto:bloat@lists.bufferbloat.net>
List-Help: <mailto:bloat-request@lists.bufferbloat.net?subject=help>
List-Subscribe: <https://lists.bufferbloat.net/listinfo/bloat>,
	<mailto:bloat-request@lists.bufferbloat.net?subject=subscribe>
X-List-Received-Date: Tue, 13 Mar 2012 19:16:20 -0000

On Tue, 2012-03-13 at 20:08 +0100, Luigi Rizzo wrote:

> The firewall is actually one place where an efficient I/O mechanism
> is really useful. Netmap (or the netfilter API or netgraph in
> FreeBSD, if they were not built on top of skbufs/mbufs) give you
> an ideal place to efficiently drop rogue traffic, and reinject the
> interesting one in the stack for further processing.
> 
> Also the concepts used in netmap (and in VJ's netchannel)
> are not confined to userspace networking.
> Even in the kernel one can and probably should:
> 
> - get rid of skbufs/mbufs even in the kernel (replacing
>   them with cheaper containers or data copies)
> - do more packet coalescing (software RSC is an example), to amortize
>   certain costs over larger batches;
> - move work away from the interrupt/polling threads and closer 
>   to the user thread (for better cache locality and load management)
>   
> The real gain of these mechanisms, I think, is having the option
> to avoid costly operations when you don't need them.
> 
> That's the message I would like to convey.
> 
> Of course everything would be more convincing if I came up
> with a full skbuf-less in-kernel stack and not just the
> bottom layer+libpcap :)
> 
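Luigi's point about coalescing packets to amortize per-packet costs can be shown with a toy sketch. It assumes a hypothetical `Segment` record (flow key, TCP sequence number, payload) and is not the Linux GRO code or any NIC's RSC logic; it only illustrates merging in-order segments of the same flow so that upper layers handle one larger unit instead of several small ones.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical flow key: (src_ip, src_port, dst_ip, dst_port).
FlowKey = Tuple[str, int, str, int]


@dataclass
class Segment:
    flow: FlowKey
    seq: int        # TCP sequence number of the first payload byte
    payload: bytes


def coalesce(batch: List[Segment], max_len: int = 65536) -> List[Segment]:
    """Merge runs of adjacent, in-order segments belonging to the same flow.

    Only neighbours in the batch are merged, and a merged segment never
    exceeds max_len bytes. Per-packet work done afterwards (header parsing,
    socket lookup, wakeups) is then paid once per merged segment.
    """
    out: List[Segment] = []
    for seg in batch:
        last = out[-1] if out else None
        if (last is not None
                and last.flow == seg.flow
                # Sequence numbers are 32-bit and wrap around.
                and (last.seq + len(last.payload)) % 2**32 == seg.seq
                and len(last.payload) + len(seg.payload) <= max_len):
            last.payload += seg.payload
        else:
            # Copy so merging never mutates the caller's segments.
            out.append(Segment(seg.flow, seg.seq, seg.payload))
    return out
```

For example, two contiguous 100-byte segments of one flow followed by a segment of another flow come out as two segments, the first carrying 200 bytes.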

OK, but what about the process scheduler, and the ability to queue
packets somewhere if your low-priority application is stalled because
some high-priority work arrives? And what happens if your TCP receive
windows are 16 MB per flow?

Pre-allocating huge ring buffers is not an option if you handle
thousands of flows.
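To put rough numbers on that: if each flow got a buffer pre-allocated for a full 16 MB receive window, memory would grow linearly with the flow count. A back-of-the-envelope sketch, assuming 16 MiB per flow and taking 1,000 and 10,000 flows as illustrative counts:

```python
WINDOW_BYTES = 16 * 2**20  # assumed: 16 MiB receive window per flow


def prealloc_gib(flows: int, window_bytes: int = WINDOW_BYTES) -> float:
    """GiB needed if every flow's full window were pre-allocated up front."""
    return flows * window_bytes / 2**30


for flows in (1_000, 10_000):
    print(f"{flows:>6} flows -> {prealloc_gib(flows):.1f} GiB")
```

That works out to 15.625 GiB for 1,000 flows and 156.25 GiB for 10,000, whereas allocating on demand only pays for data actually queued.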

We could avoid memory allocators everywhere and go back to 30-year-old
designs and MS-DOS. But that's not the path modern systems have taken.

Netchannels have the multiplex/demux problem, and solving it needs some
hardware support. Once you have decent hardware support and xx-core
machines, you can scale as needed with traditional stacks, as long as
you fully understand cache issues and memory locality.
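The demux problem boils down to steering every packet to the queue (and core) that owns its flow. Multiqueue NICs do this in hardware with RSS; the sketch below swaps in a BLAKE2 hash of the flow tuple instead of the real Toeplitz hash, purely to show that all packets of one flow land on the same queue while different flows spread across queues. The tuple layout, addresses and queue count are assumptions for illustration.

```python
import hashlib
from typing import Tuple

# Assumed flow tuple: (src_ip, src_port, dst_ip, dst_port, proto).
FlowKey = Tuple[str, int, str, int, str]


def rx_queue(flow: FlowKey, num_queues: int) -> int:
    """Map a flow to an RX queue; a stand-in for NIC RSS, not Toeplitz.

    hashlib is used instead of the built-in hash() because hash() on
    strings is randomized per process, which would break stable steering.
    """
    key = "|".join(map(str, flow)).encode()
    digest = hashlib.blake2b(key, digest_size=4).digest()
    return int.from_bytes(digest, "big") % num_queues
```

Each queue can then be serviced by its own core, which is what lets a traditional stack scale without sharing per-flow state across CPUs.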

Most problems arise when you want to use NICs with one queue and one
CPU. That just doesn't make sense in 2012, does it?

Sure, all sorts of tricks can be used to implement a full stack in
userland and be fast. Just make sure the NIC can still be shared
efficiently between that application and others.

We had the opposite (a web server implemented in the kernel), and it
was probably an interesting idea in its time, but in the long term, as
you can see, nobody uses it anymore.