From: Jonathan Morton
Date: Wed, 8 Aug 2018 16:04:34 +0300
To: Dave Taht
Cc: Cake List
Subject: Re: [Cake] ebpf policing?
List-Id: Cake - FQ_codel the next generation

> On 7 Aug, 2018, at 3:12 am, Dave Taht wrote:
>
>> Writing a modern policer in ebpf is feasible. it's got nsec
>> timestamping, counters, threads, lots of potential parallelism. Loops
>> that have to be carefully bound. The worst possible config api. A nice
>> statistics export system.
>>
>> Classic policers use token buckets, but anyone up for codel,
>> time_per_byte, framing awareness and deficits, and a little aqm?
>
> :crickets: :)
>
> I started taking a stab at writing a straight tc filter for this,
> trying to learn enough about how the tc filter subsystem works
> today - like, can you scribble on a packet? Can you keep local state?

I actually thought about it myself, but while it's undoubtedly *possible* to write a policer in eBPF, I don't see an overwhelmingly good reason to actually do so.  It still makes the most sense to do it in C, for maximum performance, until the semantics are proved useful enough to bake in hardware.

There does still seem to be an awful lot of boilerplate in a netfilter action, though.  Makes it harder to tease out what is actually going on.

> What's this rcu thing do?

Read-Copy-Update is a scheme for efficient, lock-free concurrent access, where most of the accesses are reads while changes are comparatively rare.  Basically there's a pointer to the real data, and the pointer can be switched atomically after a complete new set of data is constructed, then the old data can be deleted once all the read-only references to it are released.  It's a natural fit for configuration data, but would be a bad choice for realtime statistics - better to use atomic_add et al for that.

It strikes me that the filter environment may differ from the qdisc environment in one crucial matter: concurrency.  A qdisc is always called with a lock held, so concurrency with respect to itself is not a factor, but maximum throughput is limited.  If that is *not* true of a filter action, then greater throughput should be feasible but the programming techniques required will be more subtle.  Does anyone know for certain which it is?

 - Jonathan Morton
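
For illustration only, the pointer-swap idea in the RCU paragraph above can be sketched in userspace C11.  This is not kernel code and not a full RCU: it shows the atomic pointer switch for configuration and a plain atomic add for statistics, and it leaves out the hard part, which is detecting when every reader has released the old copy.  All names here (policer_cfg, live_cfg, bytes_seen, cfg_get, cfg_replace, account_packet) are hypothetical and not taken from any existing policer.

```c
#include <assert.h>      /* static_assert */
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* The scheme is only lock-free if pointer-sized atomics are. */
static_assert(ATOMIC_POINTER_LOCK_FREE == 2,
              "atomic pointers must be lock-free");

/* Hypothetical configuration: read for every packet, changed rarely. */
struct policer_cfg {
    uint64_t rate_bps;
    uint32_t burst_bytes;
};

/* The "pointer to the real data". */
static _Atomic(struct policer_cfg *) live_cfg;

/* A hot statistic: updated with a plain atomic add, no pointer swap. */
static atomic_uint_fast64_t bytes_seen;

/* Reader side: a single atomic load, no lock taken.
 * The kernel counterpart would be rcu_dereference() inside
 * rcu_read_lock() / rcu_read_unlock(). */
static const struct policer_cfg *cfg_get(void)
{
    return atomic_load_explicit(&live_cfg, memory_order_acquire);
}

/* Per-packet accounting with an atomic add. */
static void account_packet(uint32_t len)
{
    atomic_fetch_add_explicit(&bytes_seen, len, memory_order_relaxed);
}

/* Writer side: build a complete new copy, then switch the pointer
 * atomically.  The previous copy is handed back through old_out; it may
 * only be freed once no reader can still hold it.  In the kernel that
 * wait is synchronize_rcu() or kfree_rcu(); this sketch does not
 * implement it.  Returns 0 on success, -1 if allocation fails. */
static int cfg_replace(uint64_t rate_bps, uint32_t burst_bytes,
                       struct policer_cfg **old_out)
{
    struct policer_cfg *fresh = malloc(sizeof(*fresh));

    if (fresh == NULL)
        return -1;
    fresh->rate_bps = rate_bps;
    fresh->burst_bytes = burst_bytes;
    *old_out = atomic_exchange_explicit(&live_cfg, fresh,
                                        memory_order_acq_rel);
    return 0;
}
```

In this sketch a control path would call cfg_replace() on the rare occasions the settings change, while the per-packet path calls cfg_get() and account_packet().  Readers never see a half-written configuration because they only ever observe the old pointer or the new one.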