From: Dave Taht
Date: Thu, 7 Oct 2021 08:44:30 -0700
To: Jonathan Morton
Cc: Christoph Paasch, Rpm
Subject: [Rpm] apple's fq_"codel" implementation
List-Id: revolutions per minute - a new metric for measuring responsiveness

On Thu, Oct 7, 2021 at 3:29 AM Jonathan Morton wrote:
>
> > On 7 Oct, 2021, at 3:11 am, Christoph Paasch wrote:
> >
> >>> On 7 Oct, 2021, at 12:22 am, Dave Taht via Rpm wrote:
> >>> There are additional cases where, perhaps, the fq component works, and the aqm doesn't.
> >>
> >> Such as Apple's version of FQ-Codel? The source code is public, so we might as well talk about it.
> >
> > Let's not just talk about it, but actually read it ;-)

Since enough people have now actually read the code, and there are two
students performing experiments, we can have this conversation.

> >> There are two deviations I know about in the AQM portion of that. First is that they do the marking and/or dropping at the tail of the queue, not the head. Second is that the marking/dropping frequency is fixed, instead of increasing during a continuous period of congestion as real Codel does.
> >
> > We don't drop/mark locally generated traffic (which is the use-case we care about).

"We", who? :)

It's unclear as to what happens in the case of virtualization.
It's unclear what happens with UDP flows.
It's unclear what happens with tunneled flows (userspace vpn).
It's unclear what happens with sockets, rather than the Apple APIs.

What I observed - exercising sockets (using 16 netperf, 4 irtt, osx as
the target) - was a sharp spike in the "drop_overload" statistic, and
TCP RSTs in the captures, and that inspired me to inspect the code to
see what was hit, and to be a mite :deleted: at what I thought were two
essential components of the codel aqm not being there. At the time I
had WAY more other sources of error in my network setup than I cared
for, and got pulled into something else before being able to quell my
uncertainties here.

> > We signal flow-control straight back to the TCP-stack at which point the queue
> > is entirely drained before TCP starts transmitting again.

This is rather bursty. The 1/count reduction in the drop scheduler (or
in this case the "pushback scheduler") should gradually reduce the
needed local buffering in the queue to 5ms (or in the case of apple,
10ms), and compensate for the natural variability of wifi and lte
better. I'd have to go read the code again to remember what the
drop_overlimit behavior was. I had thought that dropping cnt-1 rather
than "entirely" made more sense. Anyway there were many, many other
variables in play - a queue size of 300, 2000, or more, the presence
of offloads, no BQL, testing how usb-c-ethernet worked -

> > So, drop-frequency really doesn't matter because there is no drop.

It "should" be cutting the cwnd until the queue also is under control.
Without doing that, it will just fill up immediately again, with the
wrong rtt estimate.

>
> Hmm, that would be more reasonable behaviour for a machine that never has to forward anything - but that is not at all obvious from the source code I found. I think I'll need to run tests to see what actually happens in practice.

Please!!! I did feel it was potentially a big bug, with some easy
fixes, needing only more eyeballs and time to diagnose, or at least
describe.

>
> - Jonathan Morton

-- 
Fixing Starlink's Latencies: https://www.youtube.com/watch?v=c9gLo6Xrwgw
Dave Täht CEO, TekLibre, LLC
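
For readers following the thread, the difference between a fixed marking/dropping frequency and the schedule that "real Codel" uses can be sketched roughly as below. This is only an illustration, not Apple's code and not a full Codel implementation. It assumes the control law as stated in RFC 8289 (the next drop is scheduled interval / sqrt(count) after the previous one, which is the "count" based reduction mentioned above) and the RFC's default interval of 100 ms. The function names codel_drop_times and fixed_drop_times are hypothetical.

```python
import math

# Assumption: RFC 8289's default interval of 100 ms.
INTERVAL_MS = 100.0


def codel_drop_times(n_drops, interval_ms=INTERVAL_MS):
    """Times (ms since entering the dropping state) of successive drops.

    Follows the RFC 8289 control law: after the drop numbered `count`,
    the next drop is scheduled interval / sqrt(count) later, so drops
    come closer together while congestion persists.
    """
    times = []
    t = 0.0
    for count in range(1, n_drops + 1):
        times.append(t)
        t += interval_ms / math.sqrt(count)
    return times


def fixed_drop_times(n_drops, interval_ms=INTERVAL_MS):
    """Hypothetical fixed-frequency schedule: one drop per interval."""
    return [i * interval_ms for i in range(n_drops)]


def gaps(times):
    """Spacing between consecutive drops."""
    return [b - a for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    print("codel gaps (ms):", [round(g, 1) for g in gaps(codel_drop_times(6))])
    print("fixed gaps (ms):", gaps(fixed_drop_times(6)))
```

With the fixed schedule the pressure on the sender never increases, whereas the Codel schedule keeps shortening the gap between signals until the queue comes back under the target. How either schedule interacts with the flow-control pushback discussed above is not modelled here.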