From: Eric Dumazet
To: Stuart Cheshire, Bob McMahon
Cc: starlink@lists.bufferbloat.net, Valdis Klētnieks, Make-Wifi-fast, "David P. Reed", Cake List, codel, cerowrt-devel, bloat, Steve Crocker, Vint Cerf
Subject: Re: [Codel] [Bloat] [Make-wifi-fast] TCP_NOTSENT_LOWAT applied to e2e TCP msg latency
Date: Mon, 25 Oct 2021 21:24:00 -0700
Message-ID: <0e29e225-9f55-4392-640a-2d27c4c26116@gmail.com>
References: <1625188609.32718319@apps.rackspace.com> <989de0c1-e06c-cda9-ebe6-1f33df8a4c24@candelatech.com> <1625773080.94974089@apps.rackspace.com> <1625859083.09751240@apps.rackspace.com> <257851.1632110422@turing-police> <1632680642.869711321@apps.rackspace.com>
List-Id: CoDel AQM discussions

On 10/25/21 8:11 PM, Stuart Cheshire via Bloat wrote:
> On 21 Oct 2021, at 17:51, Bob McMahon via Make-wifi-fast wrote:
>
>> Hi All,
>>
>> Sorry for the spam. I'm trying to support a meaningful TCP message latency w/iperf 2 from the sender side w/o requiring e2e clock synchronization. I thought I'd try to use the TCP_NOTSENT_LOWAT event to help with this. It seems that this event goes off when the bytes are in flight vs have reached the destination network stack.
>> If that's the case, then the iperf 2 client (sender) may be able to produce the message latency by adding the drain time (write start to TCP_NOTSENT_LOWAT) and the sampled RTT.
>>
>> Does this seem reasonable?
>
> I’m not 100% sure what you’re asking, but I will try to help.
>
> When you set TCP_NOTSENT_LOWAT, the TCP implementation won’t report your endpoint as writable (e.g., via kqueue or epoll) until less than that threshold of data remains unsent. It won’t stop you writing more bytes if you want to, up to the socket send buffer size, but it won’t *ask* you for more data until the TCP_NOTSENT_LOWAT threshold is reached.

When I implemented TCP_NOTSENT_LOWAT back in 2013 [1], I made sure that sendmsg() would actually stop feeding more bytes into the TCP transmit queue if the current amount of unsent bytes was above the threshold.

So it looks like Apple's implementation is different, based on your description?

[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=c9bee3b7fdecb0c1d070c7b54113b3bdfb9a3d36

netperf does not use epoll(), but rather a loop over sendmsg().

One of the points of TCP_NOTSENT_LOWAT for Google was to be able to considerably increase the max number of bytes in transmit queues (3rd column of /proc/sys/net/ipv4/tcp_wmem) by 10x, allowing autotune to increase BDP for big RTT flows, without increasing memory needs for flows with small RTT.

> In other words, the TCP implementation attempts to keep BDP bytes in flight + TCP_NOTSENT_LOWAT bytes buffered and ready to go. The BDP of bytes in flight is necessary to fill the network pipe and get good throughput. The TCP_NOTSENT_LOWAT of bytes buffered and ready to go is provided to give the source software some advance notice that the TCP implementation will soon be looking for more bytes to send, so that the buffer doesn’t run dry, thereby lowering throughput. (The old SO_SNDBUF option conflates both “bytes in flight” and “bytes buffered and ready to go” into the same number.)
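
To put rough numbers on "BDP bytes in flight + TCP_NOTSENT_LOWAT bytes buffered", here is a minimal Python sketch. The helper names (bdp_bytes, target_queued_bytes, parse_tcp_wmem) and the example figures (100 Mbit/s, 50 ms, a 16384-byte threshold, the sample tcp_wmem line) are assumptions chosen for illustration, not values taken from this thread.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_ms):
    """Bandwidth-delay product in bytes: the amount needed in flight to fill the pipe."""
    return bandwidth_bits_per_s * rtt_ms // 8000  # bits -> bytes, ms -> s


def target_queued_bytes(bandwidth_bits_per_s, rtt_ms, notsent_lowat):
    """Approximate send-side occupancy: BDP in flight plus the unsent low-water mark."""
    return bdp_bytes(bandwidth_bits_per_s, rtt_ms) + notsent_lowat


def parse_tcp_wmem(text):
    """Split the three columns of /proc/sys/net/ipv4/tcp_wmem into (min, default, max)."""
    minimum, default, maximum = (int(field) for field in text.split())
    return minimum, default, maximum


if __name__ == "__main__":
    # Hypothetical path: 100 Mbit/s with a 50 ms RTT and a 16 KiB threshold.
    print("BDP bytes:", bdp_bytes(100_000_000, 50))
    print("BDP + lowat:", target_queued_bytes(100_000_000, 50, 16384))
    # Hypothetical file contents; on Linux, read the real file instead.
    print("tcp_wmem max:", parse_tcp_wmem("4096 16384 4194304")[2])
```

If the BDP alone exceeds the third tcp_wmem column, autotuning cannot grow the send queue far enough to keep the pipe full, which is why raising that maximum matters for big-RTT flows.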
>
> If you wait for the TCP_NOTSENT_LOWAT notification, write a chunk of n bytes of data, and then wait for the next TCP_NOTSENT_LOWAT notification, that will tell you roughly how long it took n bytes to depart the machine. You won’t know why, though. The bytes could depart the machine in response to acks indicating that the same number of bytes have been accepted at the receiver. But the bytes can also depart the machine because CWND is growing. Of course, both of those things are usually happening at the same time.
>
> How to use TCP_NOTSENT_LOWAT is explained in this video:
>
>
>
> Later in the same video is a two-minute “before and after” demo (time offset 42:00 to time offset 44:00) illustrating the dramatic difference this makes for screen sharing responsiveness.
>
>
>
> Stuart Cheshire
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat
>
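
The measurement loop described above (wait for the notification, write n bytes, wait for the next notification), combined with Bob's proposed "drain time plus sampled RTT" estimate, could be sketched in Python roughly as follows. This is an illustration under assumptions, not iperf 2 code: the function names are hypothetical, the fallback option value 25 is the Linux one, the socket is assumed to be a connected TCP socket, and the RTT sample is assumed to come from elsewhere (for example TCP_INFO).

```python
import select
import socket
import time

# Recent Python versions export the constant; 25 is the Linux value (assumption
# for platforms where the socket module does not define it).
TCP_NOTSENT_LOWAT = getattr(socket, "TCP_NOTSENT_LOWAT", 25)


def set_notsent_lowat(sock, threshold):
    """Report the socket writable only once fewer than `threshold` unsent bytes remain."""
    sock.setsockopt(socket.IPPROTO_TCP, TCP_NOTSENT_LOWAT, threshold)


def wait_writable(sock, timeout):
    """Block until the socket is reported writable or the timeout expires."""
    _, writable, _ = select.select([], [sock], [], timeout)
    return bool(writable)


def measure_drain_time(sock, payload, timeout=5.0):
    """Write `payload` after one writable event; return seconds until the next one."""
    if not wait_writable(sock, timeout):
        raise TimeoutError("socket never became writable")
    start = time.monotonic()
    sock.sendall(payload)
    if not wait_writable(sock, timeout):
        raise TimeoutError("unsent data did not drain below the threshold")
    return time.monotonic() - start


def estimate_message_latency(drain_s, rtt_s):
    """Bob's proposed sender-side estimate: drain time plus a sampled RTT."""
    return drain_s + rtt_s
```

A caller would run set_notsent_lowat(sock, 16384) once after connecting, then call measure_drain_time(sock, message) per message and add its RTT sample. On Linux a blocking sendall() may itself wait while unsent bytes are above the threshold, so the measured interval includes that wait; and as noted above, the drain time alone does not say whether bytes left because of acks or because CWND grew.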