From: Dave Taht
Date: Sat, 22 Oct 2022 16:04:32 -0700
To: Thorsten Glaser, libreqos@lists.bufferbloat.net
Subject: Re: [LibreQoS] qdisc_watchdog_schedule_range_ns granularity

Dear Thorsten:

I'm rather interested in your work, because I don't understand how well
cake's shaper interacts with all the other loads; the same goes for htb.
We're hacking on this, running at speeds (10 Gbit/s) that are hard to
measure at, with tools that don't exist yet and kernel dependencies we
don't understand.

https://github.com/rchac/LibreQoS

How are you measuring, below? eBPF?
On Sat, Oct 22, 2022 at 3:33 PM Thorsten Glaser wrote:
>
> On Wed, 12 Oct 2022, Eric Dumazet wrote:
>
> > net/sched/sch_fq.c is not using the skb tstamp which could very well
> > be in the past,
>
> For the enqueue timestamp I now save u64 ktime_get_ns() in the per-skb
> extra data, so I have a reliable one.
>
> > I don't know how you measure this latency, but net/sched/sch_fq.c has
>
> I've now added some measurements. I use the qdisc watchdog in two
> scenarios:
>
> • extralatency, and the enqueue timestamp is later than now
>   (ktime_get_ns())
> • independent of extralatency, if the time from the previous packet
>   (size * ns_per_byte) has not yet elapsed
>
> The latter was not so problematic: even if I'm called "too late" by the
> qdisc watchdog, I measure the time for the _next_ packet from the time
> of the previous packet, not from now, unless there was no previous
> packet. These are also enqueued with a 1 ms tolerance.
>
> The former is problematic, as it relies on being called somewhat
> precisely once "now" has reached the enqueue timestamp of the head of
> the FIFO.
>
> In my measurement, I report detailed information whenever my dequeue
> function is called (which of the two watchdog causes it was, if any,
> and the difference between the time I was supposed to be called and
> now), but I also count the calls into five buckets:
>
> • called too early (perhaps by the regular softirq instead of
>   the watchdog)
> • called within 50 us of the requested time
> • called within 1 ms of the requested time
> • called within 4 ms of the requested time (< HZ)
> • called at 4 ms or more (>= HZ)
>
> This categorisation is "quick", i.e. first-match.
>
> In an experimental run of about 66 seconds of transmission time,
> running iperf at 400 kbps on a 512 kbps queue, I get:
>
> • 2257 calls too early
> • 444 calls within 50 us
> • 1702 calls within 1 ms
> • 2460 calls within 4 ms
> • 62 calls at/above 4 ms
>
> Note that every "too early" call would also result in another
> qdisc_watchdog_schedule_range_ns() call.
>
> I used three separate machines on two separate test networks for this
> (sender, qdisc, recipient); the test network is otherwise idle save
> for background traffic (ARP, neighbour discovery, maybe Avahi).
>
> > Under high cpu pressure, it is possible the softirq is delayed,
> > because ksoftirqd might compete with user threads.
>
> The utilisation of all three CPUs assigned to the VM is minimal,
> around 1%, throughout the run, so I didn't try that.
>
> ─────────────────────────────────────────────────────────────────────
>
> See commit 2a61f1ea843dc767d291074eee9b2f1b8d3992a7 in
> git@github.com:tarent/sch_jens.git branch master for the
> added code and the schedule calls.
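
[Editorial note: for readers following along without checking out the
sch_jens tree, a minimal sketch of the pattern described above. The
struct and function names (jens_skb_cb, jens_sched_data, the bucket
fields) are illustrative assumptions, not the actual sch_jens code;
only ktime_get_ns(), qdisc_skb_cb()/qdisc_cb_private_validate() and
qdisc_watchdog_schedule_range_ns() are the standard kernel APIs. The
bucket counters are kept as u64 here; the numbers further down in the
thread suggest the real counters are narrower and wrap.]

/* Sketch only -- not the actual sch_jens code. Shows the pattern under
 * discussion: stamp each skb at enqueue, arm the qdisc watchdog with a
 * 1 ms tolerance, and on dequeue classify how late the callback really
 * was (first-match buckets).
 */
#include <linux/ktime.h>
#include <linux/skbuff.h>
#include <net/pkt_sched.h>

struct jens_skb_cb {			/* hypothetical per-skb extra data */
	u64 enqueue_ns;
};

static inline struct jens_skb_cb *jens_cb(const struct sk_buff *skb)
{
	qdisc_cb_private_validate(skb, sizeof(struct jens_skb_cb));
	return (struct jens_skb_cb *)qdisc_skb_cb(skb)->data;
}

struct jens_sched_data {		/* hypothetical qdisc private data */
	struct qdisc_watchdog watchdog;
	u64 next_deadline_ns;		/* when we asked to be woken up */
	/* u64 so the counters cannot wrap the way a u16 would */
	u64 bkt_early, bkt_50us, bkt_1ms, bkt_4ms, bkt_slow;
};

static void jens_stamp_enqueue(struct sk_buff *skb)
{
	jens_cb(skb)->enqueue_ns = ktime_get_ns();
}

static void jens_arm_watchdog(struct jens_sched_data *q, u64 deadline_ns)
{
	q->next_deadline_ns = deadline_ns;
	/* second argument is the absolute expiry, third the tolerance */
	qdisc_watchdog_schedule_range_ns(&q->watchdog, deadline_ns,
					 NSEC_PER_MSEC);
}

static void jens_account_wakeup(struct jens_sched_data *q, u64 now_ns)
{
	s64 late = (s64)(now_ns - q->next_deadline_ns);

	if (late < 0)				/* softirq ran before the timer */
		q->bkt_early++;
	else if (late <= 50 * NSEC_PER_USEC)
		q->bkt_50us++;
	else if (late <= NSEC_PER_MSEC)
		q->bkt_1ms++;
	else if (late <= 4 * NSEC_PER_MSEC)	/* 4 ms = one tick if HZ=250 */
		q->bkt_4ms++;
	else
		q->bkt_slow++;
}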
>
> ─────────────────────────────────────────────────────────────────────
>
> Another run: 60 s of 15 Mbit/s traffic on a 20 Mbit/s queue.
>
> Before:
> $ cat /proc/softirqs
>                     CPU0       CPU1       CPU2
>           HI:          3          1          0
>        TIMER:    1880381    1871169    2737200
>       NET_TX:     498587          4         13
>       NET_RX:   16918041  110157910      90553
>        BLOCK:      90302      94916     141117
>     IRQ_POLL:          0          0          0
>      TASKLET:      23122       5419        252
>        SCHED:    2707165    2799111    3584005
>      HRTIMER:          2         45          2
>          RCU:    2225711    2200451    2082194
>
> After:
> $ cat /proc/softirqs
>                     CPU0       CPU1       CPU2
>           HI:          3          1          0
>        TIMER:    1880690    1871863    2737354
>       NET_TX:     516224          4         15
>       NET_RX:   17109410  110178118      90563
>        BLOCK:      90314      94918     141118
>     IRQ_POLL:          0          0          0
>      TASKLET:      31992       9778       1270
>        SCHED:    2708137    2800826    3585279
>      HRTIMER:          2         45          2
>          RCU:    2225787    2200548    2082297
>
> So 17639 more NET_TX softirqs (virtually all on CPU0, interestingly
> enough, even though irqbalance is running).
>
> The numbers:
>
> • 17264 early (or 82800, or 148336… the counter is a u16, so it wraps)
> • 11101 within 50 us
> • 68129 within 1 ms
> • 1291 within 4 ms
> • 103 at/above 4 ms
>
> I'm "only" seeing delays of 5+ ms, though, not the 15–20 ms my
> coworker reported.
>
> That being said, they reconfigure the qdisc many times per second to
> simulate changing environment conditions (i.e. a changing rate).
> Perhaps the sheer number of tc change commands has some impact?
> Let's retry, changing the rate 50 times a second…
>
> CPU usage is about 60% on each core, though my shell script likely
> uses more than the real solution would; top alone uses 20% of one CPU…
>
> I see 40 ms latency on one packet.
>
> • 871 (possibly plus n*65536) early
> • 2366 within 50 us
> • 16041 within 1 ms
> • 5186 within 4 ms
> • 1462 at/above 4 ms
>
> This definitely shifts things.
>
> With reniced ksoftirqd/*:
>
> • 60798 (possibly plus n*65536) early
> • 2902 within 50 us
> • 21532 within 1 ms
> • 1735 within 4 ms
> • 1495 at/above 4 ms
>
> So, it… helps, but not by much.
>
> Does the qdisc locking during reconfiguration impact things?
>
> One more thing I noticed is that [rcu_sched] uses a lot of CPU in the
> latter runs. Is that also related to the number of tc(8) processes
> created?
>
> With tc -b - and writing to a ksh coprocess, both rcu_sched and the
> overall system load go down slightly.
>
> • 43564 (possibly plus n*65536) early
> • 4908 within 50 us
> • 41450 within 1 ms
> • 5406 within 4 ms
> • 899 at/above 4 ms
>
> This also helps, but I still saw a packet with 42 ms queue delay…
>
> Should I look into writing my own reconfiguration channel that does
> not use tc, to adjust the runtime parameters frequently? (Probably
> only the rate; handover is infrequent enough that one could just use
> tc, and extralatency should be stable across one run.)
>
> What technology would I use for that? Something like relayfs, but the
> other way round. For the rate, I would basically only have to
> atomically swap one 64-bit quantity, or would I need locking? (The
> harder part would be finding the address of that quantity…)
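
[Editorial note: on the question of swapping the 64-bit rate without
locking, a sketch of one possible shape, assuming a hypothetical qdisc
struct with a single rate field; none of these names are the real
sch_jens ones, and the delivery channel (netlink, relay, debugfs, …) is
left open, as in the question. On 64-bit kernels an aligned u64
store/load through WRITE_ONCE()/READ_ONCE() cannot be torn, so a single
writer needs no extra locking for one field; a 32-bit kernel would want
atomic64_t, or the qdisc tree lock that tc's change path takes anyway,
which may itself be part of the reconfiguration cost observed above.]

/* Sketch only: a lock-free 64-bit rate update for a hypothetical qdisc.
 * Struct and function names are illustrative, not actual sch_jens code.
 */
#include <linux/compiler.h>
#include <linux/math64.h>
#include <linux/time64.h>
#include <linux/types.h>

struct jens_rate {
	u64 bytes_per_sec;	/* sampled on every dequeue */
};

/* Writer side: called from whatever out-of-band channel replaces tc.
 * An aligned u64 store via WRITE_ONCE() is single-copy atomic on 64-bit
 * architectures, so with one writer no further locking is needed for
 * this single field (32-bit kernels would need atomic64_t instead).
 */
static void jens_set_rate(struct jens_rate *r, u64 bytes_per_sec)
{
	WRITE_ONCE(r->bytes_per_sec, bytes_per_sec);
}

/* Reader side: sample the rate once per packet so one consistent value
 * is used for the whole pacing computation of that packet.
 */
static u64 jens_pacing_delay_ns(const struct jens_rate *r, unsigned int len)
{
	u64 rate = READ_ONCE(r->bytes_per_sec);

	if (!rate)
		return 0;
	return div64_u64((u64)len * NSEC_PER_SEC, rate);
}

Finding the address of that quantity from userspace remains the hard
part, as noted in the mail; the sketch deliberately says nothing about
how the writer locates the qdisc instance.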
>
> bye,
> //mirabilos
> --
> Infrastructure expert • tarent solutions GmbH
> Am Dickobskreuz 10, D-53121 Bonn • http://www.tarent.de/
> Telephone +49 228 54881-393 • Fax: +49 228 54881-235
> HRB AG Bonn 5168 • USt-ID (VAT): DE122264941
> Managing directors: Dr. Stefan Barth, Kai Ebenrett, Boris Esser, Alexander Steeg

--
This song goes out to all the folk that thought Stadia would work:
https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
Dave Täht CEO, TekLibre, LLC