* [Bloat] SO_SNDBUF and SO_RCVBUF
@ 2015-04-22 19:10 Hal Murray
2015-04-22 19:26 ` Rick Jones
2015-04-22 19:28 ` Dave Taht
0 siblings, 2 replies; 19+ messages in thread
From: Hal Murray @ 2015-04-22 19:10 UTC (permalink / raw)
To: bloat; +Cc: Hal Murray
> As I understand it (I thought) SO_SNDBUF and SO_RCVBUF are socket buffers
> for the application layer, they do not change the TCP window size either
> send or receive. Which is perhaps why they aren't used much. They don't do
> much good in iperf that's for sure! Might be wrong, but I agree with the
> premise - auto-tuning should work.
I sure expect them to do the obvious thing.
man 7 socket says:
SO_SNDBUF
Sets or gets the maximum socket send buffer in bytes.
It doesn't actually say that it turns into the TCP window size.
On Linux, there is a factor of 2 for overhead: the kernel doubles the value
passed to setsockopt(), and getsockopt() returns the doubled value.
man tcp says:
TCP uses the extra space for administrative purposes and internal
kernel structures, and the /proc file values reflect the larger
sizes compared to the actual TCP windows.
So it looks like the number you feed it turns into the window size.
A few quick tests with netperf confirm that it is doing something close to
what I expect but I haven't fired up tcpdump to verify that the window size
is what I asked for. netperf does print out values that are 2x what I asked
for.
Yuck. (That's Yuck at Linux, not netperf.)
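For illustration, a minimal Python sketch of the round trip described above, assuming a Linux host (Python's `socket` module wraps the same `setsockopt()`/`getsockopt()` calls). It requests a send buffer and reads back what the kernel recorded; for a request below `net.core.wmem_max`, Linux is expected to report twice the requested value. It says nothing about the on-the-wire TCP window, which would still need tcpdump to confirm.

```python
import socket
import sys

ON_LINUX = sys.platform.startswith("linux")


def sndbuf_round_trip(requested: int) -> int:
    """Set SO_SNDBUF on a fresh TCP socket and return what getsockopt() reports."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
        # Linux stores (and reports back) double the request to leave room
        # for bookkeeping overhead; other stacks may report the request as-is.
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


if __name__ == "__main__":
    for req in (16384, 65536):
        print(f"requested {req} bytes, kernel reports {sndbuf_round_trip(req)}")
```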
--
These are my opinions. I hate spam.
* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
2015-04-22 19:10 [Bloat] SO_SNDBUF and SO_RCVBUF Hal Murray
@ 2015-04-22 19:26 ` Rick Jones
2015-04-22 19:28 ` Dave Taht
1 sibling, 0 replies; 19+ messages in thread
From: Rick Jones @ 2015-04-22 19:26 UTC (permalink / raw)
To: Hal Murray, bloat
> So it looks like the number you feed it turns into the window size.
>
> A few quick tests with netperf confirm that it is doing something close to
> what I expect but I haven't fired up tcpdump to verify that the window size
> is what I asked for. netperf does print out values that are 2x what I asked
> for.
It will do that until you start asking for (2x?) more than
net.core.[rw]mem_max. At that point they will be clipped.
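The clipping can be modelled in a few lines. This is a sketch of the Linux behaviour under the assumption of a clip-then-double order, not netperf or kernel code: the request is first clipped to `net.core.wmem_max` (or `rmem_max` for SO_RCVBUF) and only then doubled, so the reported value tops out at twice the sysctl. The `floor` parameter is a hypothetical stand-in for the kernel's small minimum buffer size, whose exact value varies by build, and 212992 is only an example `wmem_max`.

```python
def linux_buf_after_set(requested: int, mem_max: int, floor: int = 0) -> int:
    """Model the value Linux stores for a SO_SNDBUF/SO_RCVBUF request.

    The request is clipped to net.core.[rw]mem_max first, then doubled.
    `floor` stands in for the kernel's minimum buffer size (assumed).
    """
    clipped = min(requested, mem_max)
    return max(clipped * 2, floor)


def read_mem_max(name: str = "wmem_max") -> int:
    """Read net.core.wmem_max or net.core.rmem_max from /proc (Linux only)."""
    with open(f"/proc/sys/net/core/{name}") as f:
        return int(f.read().strip())


if __name__ == "__main__":
    example_wmem_max = 212992  # example value; read_mem_max() gives the live one
    for req in (65536, 4 * 1024 * 1024):
        print(req, "->", linux_buf_after_set(req, example_wmem_max))
```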
> Yuck. (That's Yuck at Linux, not netperf.)
No worries :)
That bit of behaviour frustrated me and my BSD-stack upbringing for
years, along with the auto-tuning. Finally I ended up adding support
for reporting three different socket buffer values via the "omni" output
selectors:
1) the size requested by the user via the command line
2) the size initially after the data socket was created (and any
setsockopts triggered by the command line)
3) the size at the end of the test
The classic netperf tests have always reported 2. If you use the omni
tests directly (-t omni) they will report 3. You can always use the
test-specific -o, -O or -k option to emit each one specifically. For both
SO_[SND|RCV]BUF, locally and remotely:
raj@tardy:~/netperf2_trunk$ netperf -- -O \? | grep SIZE
LSS_SIZE_REQ
LSS_SIZE
LSS_SIZE_END
LSR_SIZE_REQ
LSR_SIZE
LSR_SIZE_END
RSS_SIZE_REQ
RSS_SIZE
RSS_SIZE_END
RSR_SIZE_REQ
RSR_SIZE
RSR_SIZE_END
...
happy benchmarking,
rick jones
* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
2015-04-22 19:10 [Bloat] SO_SNDBUF and SO_RCVBUF Hal Murray
2015-04-22 19:26 ` Rick Jones
@ 2015-04-22 19:28 ` Dave Taht
2015-04-22 21:02 ` Eric Dumazet
1 sibling, 1 reply; 19+ messages in thread
From: Dave Taht @ 2015-04-22 19:28 UTC (permalink / raw)
To: Hal Murray; +Cc: bloat
SO_SNDLOWAT or something similar to it with a name I cannot recall,
can be useful.
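On Linux specifically, SO_SNDLOWAT exists as a constant but, per the socket(7) man page, it cannot be changed: `setsockopt()` fails with ENOPROTOOPT. A small Python sketch, assuming a Linux host, that shows this:

```python
import errno
import socket
import sys

ON_LINUX = sys.platform.startswith("linux")


def try_set_sndlowat(value: int) -> int:
    """Try to change SO_SNDLOWAT; return 0 on success, else the errno."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDLOWAT, value)
        except OSError as e:
            return e.errno
        return 0


if __name__ == "__main__":
    err = try_set_sndlowat(65536)
    print("accepted" if err == 0 else f"rejected: {errno.errorcode.get(err, err)}")
```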
--
Dave Täht
Open Networking needs **Open Source Hardware**
https://plus.google.com/u/0/+EricRaymond/posts/JqxCe2pFr67
* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
2015-04-22 19:28 ` Dave Taht
@ 2015-04-22 21:02 ` Eric Dumazet
2015-04-22 21:05 ` Rick Jones
2015-04-22 21:07 ` Steinar H. Gunderson
0 siblings, 2 replies; 19+ messages in thread
From: Eric Dumazet @ 2015-04-22 21:02 UTC (permalink / raw)
To: Dave Taht; +Cc: Hal Murray, bloat
Yeah, the really nice thing is TCP_NOTSENT_LOWAT, added in Linux 3.12.
On Wed, 2015-04-22 at 12:28 -0700, Dave Taht wrote:
> SO_SNDLOWAT or something similar to it with a name I cannot recall,
> can be useful.
* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
2015-04-22 21:02 ` Eric Dumazet
@ 2015-04-22 21:05 ` Rick Jones
2015-04-22 21:46 ` Eric Dumazet
2015-04-24 4:37 ` Dave Taht
2015-04-22 21:07 ` Steinar H. Gunderson
1 sibling, 2 replies; 19+ messages in thread
From: Rick Jones @ 2015-04-22 21:05 UTC (permalink / raw)
To: Eric Dumazet, Dave Taht; +Cc: Hal Murray, bloat
On 04/22/2015 02:02 PM, Eric Dumazet wrote:
> Yeah, the real nice thing is TCP_NOTSENT_LOWAT added in linux-3.12
Don't go telling Dave about that, he wants me to put too much into
netperf as it is!-)
rick
* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
2015-04-22 21:02 ` Eric Dumazet
2015-04-22 21:05 ` Rick Jones
@ 2015-04-22 21:07 ` Steinar H. Gunderson
2015-04-22 21:42 ` Eric Dumazet
1 sibling, 1 reply; 19+ messages in thread
From: Steinar H. Gunderson @ 2015-04-22 21:07 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Hal Murray, bloat
On Wed, Apr 22, 2015 at 02:02:32PM -0700, Eric Dumazet wrote:
> Yeah, the real nice thing is TCP_NOTSENT_LOWAT added in linux-3.12
But this is only for when your data could change underway, right?
Like, not relevant for sending one big file, but might be relevant for e.g.
VNC (or someone mentioned the use case of HTTP/2, where a high-priority
request might come in, which you don't want buried behind a megabyte of
stuff in the send queue).
/* Steinar */
--
Homepage: http://www.sesse.net/
* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
2015-04-22 21:07 ` Steinar H. Gunderson
@ 2015-04-22 21:42 ` Eric Dumazet
2015-04-22 21:47 ` Dave Taht
2015-04-22 22:11 ` Steinar H. Gunderson
0 siblings, 2 replies; 19+ messages in thread
From: Eric Dumazet @ 2015-04-22 21:42 UTC (permalink / raw)
To: Steinar H. Gunderson; +Cc: Hal Murray, bloat
On Wed, 2015-04-22 at 23:07 +0200, Steinar H. Gunderson wrote:
> On Wed, Apr 22, 2015 at 02:02:32PM -0700, Eric Dumazet wrote:
> > Yeah, the real nice thing is TCP_NOTSENT_LOWAT added in linux-3.12
>
> But this is only for when your data could change underway, right?
> Like, not relevant for sending one big file, but might be relevant for e.g.
> VNC (or someone mentioned the usecase of HTTP/2, where a high-priority
> request might come in, which you don't want buried behind a megabyte of
> stuff in the send queue).
Sorry, I do not understand you.
The nice thing about TCP_NOTSENT_LOWAT is that you no longer have to
care about choosing the 'right SO_SNDBUF'.
It is still the congestion control (CC) module's responsibility to
choose/set cwnd, but you no longer set an artificial cap on cwnd.
You control the amount of 'unsent data' per socket.
If you set a low limit, the application might have to issue more send()
calls and get more EPOLLOUT events.
This also means that if you get an abort / EOF, you no longer have a
huge unsent queue that the TCP API does not let you cancel.
https://insouciant.org/tech/prioritization-only-works-when-theres-pending-data-to-prioritize/
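A minimal Python sketch of this pattern, assuming Linux 3.12 or later: set TCP_NOTSENT_LOWAT on the socket, then write from a non-blocking loop that waits for writability (Python's `selectors` module uses epoll on Linux, so each wait corresponds to an EPOLLOUT wakeup). The fallback constant 25 is the value from Linux's `<linux/tcp.h>`, used only if this Python build does not export `socket.TCP_NOTSENT_LOWAT`; the `demo()` helper and its sizes are illustrative.

```python
import selectors
import socket
import sys
import threading

ON_LINUX = sys.platform.startswith("linux")
# 25 is TCP_NOTSENT_LOWAT in <linux/tcp.h>; newer Pythons export the name.
TCP_NOTSENT_LOWAT = getattr(socket, "TCP_NOTSENT_LOWAT", 25)


def set_notsent_lowat(sock: socket.socket, nbytes: int) -> int:
    """Cap the unsent bytes queued in the kernel; return the value read back."""
    sock.setsockopt(socket.IPPROTO_TCP, TCP_NOTSENT_LOWAT, nbytes)
    return sock.getsockopt(socket.IPPROTO_TCP, TCP_NOTSENT_LOWAT)


def send_all(sock: socket.socket, data: bytes) -> int:
    """Write `data` from a non-blocking loop; return how often we waited."""
    sock.setblocking(False)
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_WRITE)
    view = memoryview(data)
    waits = 0
    try:
        while view:
            try:
                view = view[sock.send(view):]
            except BlockingIOError:
                # With a low threshold the socket only becomes writable again
                # once the unsent backlog drains below it.
                sel.select(timeout=5)
                waits += 1
    finally:
        sel.close()
    return waits


def demo(total: int = 4 * 1024 * 1024, lowat: int = 16384):
    """Stream `total` bytes over loopback with TCP_NOTSENT_LOWAT = `lowat`."""
    received = 0
    with socket.create_server(("127.0.0.1", 0)) as srv:
        cli = socket.create_connection(srv.getsockname())
        conn, _ = srv.accept()

        def drain() -> None:
            nonlocal received
            while chunk := conn.recv(65536):
                received += len(chunk)

        reader = threading.Thread(target=drain)
        reader.start()
        readback = set_notsent_lowat(cli, lowat)
        waits = send_all(cli, b"x" * total)
        cli.shutdown(socket.SHUT_WR)  # queued data is still delivered before FIN
        reader.join()
        cli.close()
        conn.close()
    return readback, waits, received


if __name__ == "__main__":
    if ON_LINUX:
        print(demo())
```

Lowering `lowat` would be expected to raise `waits`, i.e. more wakeups for the same amount of data.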
* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
2015-04-22 21:05 ` Rick Jones
@ 2015-04-22 21:46 ` Eric Dumazet
2015-04-22 22:20 ` Simon Barber
2015-04-24 4:37 ` Dave Taht
1 sibling, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2015-04-22 21:46 UTC (permalink / raw)
To: Rick Jones; +Cc: Hal Murray, bloat
On Wed, 2015-04-22 at 14:05 -0700, Rick Jones wrote:
> On 04/22/2015 02:02 PM, Eric Dumazet wrote:
> > Yeah, the real nice thing is TCP_NOTSENT_LOWAT added in linux-3.12
>
> Don't go telling Dave about that, he wants me to put too much into
> netperf as it is!-)
Note that one can also set a sysctl, in case the netperf author does not
want to change his baby ;)
echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat
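How the sysctl and the per-socket option interact can be sketched as follows, assuming Linux semantics: a socket that has set TCP_NOTSENT_LOWAT to a non-zero value uses its own limit, and every other socket falls back to `net.ipv4.tcp_notsent_lowat`, whose default of UINT_MAX (4294967295) effectively means no limit. Reading the sysctl needs no privileges; writing it, as with the `echo` above, needs root.

```python
SYSCTL = "/proc/sys/net/ipv4/tcp_notsent_lowat"


def effective_notsent_lowat(socket_value: int, sysctl_value: int) -> int:
    """Per-socket value wins; 0 means 'not set on this socket'."""
    return socket_value if socket_value else sysctl_value


def read_sysctl(path: str = SYSCTL) -> int:
    """Read the system-wide default (Linux 3.12+ only)."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    try:
        system_wide = read_sysctl()
    except OSError:
        system_wide = None  # not Linux, or an older kernel
    print("tcp_notsent_lowat:", system_wide)
    if system_wide is not None:
        print("socket without the option:", effective_notsent_lowat(0, system_wide))
```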
* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
2015-04-22 21:42 ` Eric Dumazet
@ 2015-04-22 21:47 ` Dave Taht
2015-04-22 22:11 ` Steinar H. Gunderson
1 sibling, 0 replies; 19+ messages in thread
From: Dave Taht @ 2015-04-22 21:47 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Hal Murray, bloat
On Wed, Apr 22, 2015 at 2:42 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2015-04-22 at 23:07 +0200, Steinar H. Gunderson wrote:
>> On Wed, Apr 22, 2015 at 02:02:32PM -0700, Eric Dumazet wrote:
>> > Yeah, the real nice thing is TCP_NOTSENT_LOWAT added in linux-3.12
>>
>> But this is only for when your data could change underway, right?
>> Like, not relevant for sending one big file, but might be relevant for e.g.
>> VNC (or someone mentioned the usecase of HTTP/2, where a high-priority
>> request might come in, which you don't want buried behind a megabyte of
>> stuff in the send queue).
>
> Sorry, I do not understand you.
>
> The nice thing about TCP_NOTSENT_LOWAT is that you no longer have to
> care about choosing the 'right SO_SNDBUF'
Stuart Cheshire has a very nice YouTube video due out soon on this option.
He demonstrates the enormous difference it made in a screen-sharing
application...
... and there are many libs and toolkits (like X11, userspace TCP VPNs,
etc.) that could use it. It should be going into every TCP app that might
congest AND can do more intelligent things when congested.
It looks useful in web browsers also.
I have no idea to what extent this socket option has been picked up by
the marketplace/open source world.
* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
2015-04-22 21:42 ` Eric Dumazet
2015-04-22 21:47 ` Dave Taht
@ 2015-04-22 22:11 ` Steinar H. Gunderson
1 sibling, 0 replies; 19+ messages in thread
From: Steinar H. Gunderson @ 2015-04-22 22:11 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Hal Murray, bloat
On Wed, Apr 22, 2015 at 02:42:59PM -0700, Eric Dumazet wrote:
> Sorry, I do not understand you.
>
> The nice thing about TCP_NOTSENT_LOWAT is that you no longer have to
> care about choosing the 'right SO_SNDBUF'
>
> It is still CC responsibility to choose/set cwnd, but you hadn't set an
> artificial cap on cwnd.
>
> You control the amount of 'unsent data' per socket.
>
> If you set a low limit, application might have to issue more send()
> calls and get more EPOLLOUT events.
>
> This also means that if you get an abort / eof, you no longer have a
> huge unsent queue that TCP API does not allow to cancel.
>
> https://insouciant.org/tech/prioritization-only-works-when-theres-pending-data-to-prioritize/
So this URL is basically what I said; you need to have data to prioritize
between for this to be useful. If you just want to send a simple file
(and aborts in HTTP/1.1 basically don't exist), it doesn't really
matter if you have a huge backlog or not.
So I'm sure it's useful for HTTP/2 or SPDY, but that's already pretty
advanced functionality.
/* Steinar */
* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
2015-04-22 21:46 ` Eric Dumazet
@ 2015-04-22 22:20 ` Simon Barber
2015-04-22 23:08 ` Eric Dumazet
0 siblings, 1 reply; 19+ messages in thread
From: Simon Barber @ 2015-04-22 22:20 UTC (permalink / raw)
To: Eric Dumazet, Rick Jones; +Cc: Hal Murray, bloat
Wouldn't the LOWAT setting be much easier for applications to use if it were
set in estimated time (i.e., the time it will take to deliver the data) rather
than in bytes?
Simon
On April 22, 2015 2:47:34 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2015-04-22 at 14:05 -0700, Rick Jones wrote:
> > On 04/22/2015 02:02 PM, Eric Dumazet wrote:
> > > Yeah, the real nice thing is TCP_NOTSENT_LOWAT added in linux-3.12
> >
> > Don't go telling Dave about that, he wants me to put too much into
> > netperf as it is!-)
>
> Note that one can also set a sysctl, in case netperf author does not
> want to change its baby ;)
>
> echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat
* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
2015-04-22 22:20 ` Simon Barber
@ 2015-04-22 23:08 ` Eric Dumazet
0 siblings, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2015-04-22 23:08 UTC (permalink / raw)
To: Simon Barber; +Cc: Hal Murray, bloat
On Wed, 2015-04-22 at 15:20 -0700, Simon Barber wrote:
> Wouldn't the LOWAT setting be much easier for applications to use if it was
> set in estimated time (ie time it will take to deliver the data) rather
> than bytes?
Sure, but you have all the info to infer one from the other.
Note also that the TCP stack has an immediate notion of bytes, while
adding time delays would impose a possibly expensive high-precision
time acquisition.
# git show c9bee3b7fdecb0c1d070c7b54113b3bdfb9a3d36 | diffstat -p1 -w70
Documentation/networking/ip-sysctl.txt | 13 +++++++++++++
include/linux/tcp.h | 1 +
include/net/sock.h | 19 +++++++++++++------
include/net/tcp.h | 14 ++++++++++++++
include/uapi/linux/tcp.h | 1 +
net/ipv4/sysctl_net_ipv4.c | 7 +++++++
net/ipv4/tcp.c | 7 +++++++
net/ipv4/tcp_ipv4.c | 1 +
net/ipv4/tcp_output.c | 3 +++
net/ipv6/tcp_ipv6.c | 1 +
10 files changed, 61 insertions(+), 6 deletions(-)
A time-based implementation would be way more complex/expensive.
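Inferring one unit from the other is simple arithmetic once the application has a rate estimate: bytes = rate in bits/s × delay / 8. A sketch, where the delivery rate is an assumed input the application would have to measure itself and the 100 Mbit/s figure is hypothetical (integer microseconds keep the arithmetic exact):

```python
def lowat_bytes_for_delay(rate_bits_per_s: int, delay_us: int) -> int:
    """Unsent bytes that take `delay_us` microseconds to drain at the given rate."""
    return rate_bits_per_s * delay_us // (8 * 1_000_000)


def delay_us_for_lowat(rate_bits_per_s: int, lowat_bytes: int) -> int:
    """Microseconds needed to drain `lowat_bytes` at the given rate (rounded down)."""
    return lowat_bytes * 8 * 1_000_000 // rate_bits_per_s


if __name__ == "__main__":
    # Hypothetical 100 Mbit/s path: how many bytes is 10 ms of backlog?
    print(lowat_bytes_for_delay(100_000_000, 10_000))
    # And how long does a 131072-byte limit take to drain at that rate?
    print(delay_us_for_lowat(100_000_000, 131072))
```

At 100 Mbit/s, 10 ms of backlog is 125,000 bytes and 131072 bytes is about 10.5 ms. The catch is that the rate estimate changes over the life of a connection, so a time-based target would have to be recomputed.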
* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
2015-04-22 21:05 ` Rick Jones
2015-04-22 21:46 ` Eric Dumazet
@ 2015-04-24 4:37 ` Dave Taht
2015-04-24 4:40 ` Dave Taht
2015-04-24 5:23 ` Eric Dumazet
1 sibling, 2 replies; 19+ messages in thread
From: Dave Taht @ 2015-04-24 4:37 UTC (permalink / raw)
To: Rick Jones; +Cc: Hal Murray, bloat
On Wed, Apr 22, 2015 at 2:05 PM, Rick Jones <rick.jones2@hp.com> wrote:
> On 04/22/2015 02:02 PM, Eric Dumazet wrote:
>>
>> Yeah, the real nice thing is TCP_NOTSENT_LOWAT added in linux-3.12
I will argue that that would give a much better estimate as to how
much data was really outstanding on the wire.
>
> Don't go telling Dave about that, he wants me to put too much into netperf
> as it is!-)
Please release 2.7 soonest so we can get it into OpenWrt Chaos Calmer.
Then we can discuss new features, like the UDPLITE stuff and this. :)
That said, I would like to try a comparison of rrul results
taken with and without the tcp_notsent_lowat option in netperf when I
get back from vacation next week. My guess is that it won't affect the
stats we currently get much (it might hurt, as netperf will run more
often), but it MIGHT reduce the tail burst problem we see fairly often at
the conclusion of the test.
>
> rick
>
* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
2015-04-24 4:37 ` Dave Taht
@ 2015-04-24 4:40 ` Dave Taht
2015-04-24 13:50 ` Eric Dumazet
2015-04-24 5:23 ` Eric Dumazet
1 sibling, 1 reply; 19+ messages in thread
From: Dave Taht @ 2015-04-24 4:40 UTC (permalink / raw)
To: Rick Jones; +Cc: Hal Murray, bloat
and of course, after writing the previous email, I went and read the
original commit for this option. Yeah, that is a huge increase in
context switches...
https://lwn.net/Articles/560082/
... but totally worth it for many apps that can do something else
while their connection congests, and totally awesome for TCP VPNs,
X11, screen sharers, etc....
* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
2015-04-24 4:37 ` Dave Taht
2015-04-24 4:40 ` Dave Taht
@ 2015-04-24 5:23 ` Eric Dumazet
1 sibling, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2015-04-24 5:23 UTC (permalink / raw)
To: Dave Taht; +Cc: Hal Murray, bloat
On Thu, 2015-04-23 at 21:37 -0700, Dave Taht wrote:
> On Wed, Apr 22, 2015 at 2:05 PM, Rick Jones <rick.jones2@hp.com> wrote:
> > On 04/22/2015 02:02 PM, Eric Dumazet wrote:
> >>
> >> Yeah, the real nice thing is TCP_NOTSENT_LOWAT added in linux-3.12
>
> I will argue that that would give a much better estimate as to how
> much data was really outstanding on the wire.
This would be the responsibility of the congestion control (CC) module.
Each CC has its own ways to be controlled: Vegas and CUBIC have different
knobs.
TCP_NOTSENT_LOWAT controls the number of unsent bytes. This is generic.
You do not want to add a 'knob' that would lock all CCs into a given
behavior; that is already there with pacing.
If you know the RTT, then with pacing you can also limit XXX bytes on
the wire.
* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
2015-04-24 4:40 ` Dave Taht
@ 2015-04-24 13:50 ` Eric Dumazet
2015-04-24 14:34 ` Dave Taht
2015-04-24 16:31 ` Rick Jones
0 siblings, 2 replies; 19+ messages in thread
From: Eric Dumazet @ 2015-04-24 13:50 UTC (permalink / raw)
To: Dave Taht; +Cc: Hal Murray, bloat
On Thu, 2015-04-23 at 21:40 -0700, Dave Taht wrote:
> and of course, after writing the previous email, I go reading the
> original commit for this option. Yea, that is a huge increase in
> context switches...
>
> https://lwn.net/Articles/560082/
>
> ... but totally worth it for many apps that can do something else
> while their connection congests, and totally awesome for tcp vpns,
> x11, screen sharers, etc....
It all depends on how many bytes are pushed by the application per
sendmsg().
To keep the amount of unsent bytes low, the application should not issue
a large write, but it still can if it needs to for whatever reason.
"netperf -t TCP_STREAM" uses a default size of 16384 bytes per sendmsg.
So obviously, if a wakeup is needed per sendmsg(), the number of context
switches is exactly bandwidth_in_bytes_per_second / 16384.
Normally, without this TCP_NOTSENT_LOWAT option, the number of wakeups is
more like bandwidth_in_bytes_per_second / SO_SNDBUF, because the kernel
wakes up the blocked task when send buffer occupancy falls to about 50%.
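The two estimates above, written out as arithmetic. The 1 Gbit/s rate and the 4 MB SO_SNDBUF are assumed example values, and the formulas are the rough approximations from the message, not exact kernel accounting.

```python
def wakeups_per_second(bandwidth_bytes_per_s: float, bytes_per_wakeup: float) -> float:
    """Approximate sender wakeups per second if each one moves `bytes_per_wakeup`."""
    return bandwidth_bytes_per_s / bytes_per_wakeup


GBIT_BYTES = 125_000_000  # 1 Gbit/s expressed in bytes per second

# One wakeup per 16384-byte sendmsg(), as with a low TCP_NOTSENT_LOWAT.
with_lowat = wakeups_per_second(GBIT_BYTES, 16384)
# Roughly one wakeup per SO_SNDBUF worth of data without the option (4 MB assumed).
without_lowat = wakeups_per_second(GBIT_BYTES, 4 * 1024 * 1024)

print(f"{with_lowat:.0f} vs {without_lowat:.0f} wakeups per second")
```

That is roughly 7,600 versus 30 wakeups per second, a factor of 256, which is simply the ratio 4 MB / 16 KB.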
* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
2015-04-24 13:50 ` Eric Dumazet
@ 2015-04-24 14:34 ` Dave Taht
2015-04-24 16:31 ` Rick Jones
1 sibling, 0 replies; 19+ messages in thread
From: Dave Taht @ 2015-04-24 14:34 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Hal Murray, bloat
On Fri, Apr 24, 2015 at 6:50 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2015-04-23 at 21:40 -0700, Dave Taht wrote:
>> and of course, after writing the previous email, I go reading the
>> original commit for this option. Yea, that is a huge increase in
>> context switches...
>>
>> https://lwn.net/Articles/560082/
>>
>> ... but totally worth it for many apps that can do something else
>> while their connection congests, and totally awesome for tcp vpns,
>> x11, screen sharers, etc....
>
> It all depends on how many bytes are pushed by the application per
> sendmsg()
>
> To keep the amount of unsent bytes low, the application should not issue
> a large write, but it still can if it needs to for whatever reason.
>
> netperf -t TCP_STREAM" uses a default size of 16384 bytes per sendmsg.
>
> So obviously, if a wakeup is needed per sendmsg(), number of context
> switches is exactly bandwidth_in_bytes_per_second / 16384
>
> Normally, without this TCP_NOTSENT_LOWAT option, number of wakeups is
> more like bandwidth_in_bytes_per_second / SO_SNDBUF, because kernel
> wakes up the blocked task when output buffers size occupancy reached 50%
>
>
>
I think a "userspace janitors" project is needed, where we identify
everything that could benefit from TCP_NOTSENT_LOWAT[1], and go patch
it.
I did a little of this for using IPV6_TCLASS correctly in a ton of
applications and (for example) have some long-standing patches
submitted to rsync for selecting congestion control and setting
IP_TOS/IPV6_TCLASS (sigh - still not accepted).
Maybe GSoC? Getting, say, just one college class to go do it together
for a week or two, analyzing the results as they go, would make a
dent....
[1] I think userspace VPNs could use an internal fq+codel algorithm,
or perhaps the kernel socket read buffer could gain a socket option to
present one.
* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
2015-04-24 13:50 ` Eric Dumazet
2015-04-24 14:34 ` Dave Taht
@ 2015-04-24 16:31 ` Rick Jones
2015-04-24 18:41 ` Eric Dumazet
1 sibling, 1 reply; 19+ messages in thread
From: Rick Jones @ 2015-04-24 16:31 UTC (permalink / raw)
To: Eric Dumazet, Dave Taht; +Cc: Hal Murray, bloat
> netperf -t TCP_STREAM" uses a default size of 16384 bytes per sendmsg.
Under Linux at least, and only because that is the default initial value
for SO_SNDBUF for a TCP socket (via tcp_wmem).
More generally, the default send size used by netperf is the value of
SO_SNDBUF for the data socket immediately after its creation.
rick
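The rule just described (default send size = SO_SNDBUF of the freshly created data socket) is easy to reproduce. In this Python sketch the function names are hypothetical, not netperf's; on Linux the value is expected to match the middle field of `net.ipv4.tcp_wmem`, which is 16384 by default.

```python
import socket
import sys

ON_LINUX = sys.platform.startswith("linux")


def default_send_size() -> int:
    """SO_SNDBUF of a brand-new TCP socket, before any connect()."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


def tcp_wmem_initial(path: str = "/proc/sys/net/ipv4/tcp_wmem") -> int:
    """Middle ("default") field of net.ipv4.tcp_wmem (Linux only)."""
    with open(path) as f:
        return int(f.read().split()[1])


if __name__ == "__main__":
    print("send size a netperf-like tool would pick:", default_send_size())
```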
* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
2015-04-24 16:31 ` Rick Jones
@ 2015-04-24 18:41 ` Eric Dumazet
0 siblings, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2015-04-24 18:41 UTC (permalink / raw)
To: Rick Jones; +Cc: Hal Murray, bloat
On Fri, 2015-04-24 at 09:31 -0700, Rick Jones wrote:
> > netperf -t TCP_STREAM" uses a default size of 16384 bytes per sendmsg.
>
> Under Linux at least, and only because that is the default initial value
> for SO_SNDBUF for a TCP socket (via tcp_wmem).
>
> More generally, the default send size used by netperf is the value of
> SO_SNDBUF for the data socket immediately after its creation.
>
Yeah, this looks odd.
Note that right after a connect() or accept(), getsockopt(SO_SNDBUF)
might be very different from the 'default=16384'.
Otherwise, we could not even send the first 10 packets for IW10 from one
sendmsg(), or a single full packet on the loopback interface (MTU=65536).
Anyway, 16384 bytes as the default buffer size in netperf is fine.
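This caveat can be checked with a loopback connection. A sketch assuming Linux: when SO_SNDBUF has not been set explicitly, the kernel can grow the send buffer once the connection is established (so that an initial window's worth of segments fits), so the value read after `connect()` is typically larger than on the fresh socket.

```python
import socket
import sys
from typing import Tuple

ON_LINUX = sys.platform.startswith("linux")


def sndbuf_before_and_after_connect() -> Tuple[int, int]:
    """SO_SNDBUF on a fresh socket vs. right after connect() over loopback."""
    with socket.create_server(("127.0.0.1", 0)) as srv:
        cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            before = cli.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
            cli.connect(srv.getsockname())
            conn, _ = srv.accept()
            after = cli.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
            conn.close()
        finally:
            cli.close()
    return before, after


if __name__ == "__main__":
    print("before connect: %d, after connect: %d" % sndbuf_before_and_after_connect())
```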