[Bloat] SO_SNDBUF and SO_RCVBUF

General list for discussing Bufferbloat

* [Bloat] SO_SNDBUF and SO_RCVBUF
@ 2015-04-22 19:10 Hal Murray
  2015-04-22 19:26 ` Rick Jones
  2015-04-22 19:28 ` Dave Taht
  0 siblings, 2 replies; 19+ messages in thread
From: Hal Murray @ 2015-04-22 19:10 UTC (permalink / raw)
  To: bloat; +Cc: Hal Murray

> As I understand it (I thought) SO_SNDBUF and SO_RCVBUF are socket buffers
> for the application layer, they do not change the TCP window size either
> send or receive. Which is perhaps why they aren't used much. They don't do
> much good in iperf that's for sure! Might be wrong, but I agree with the
> premise - auto-tuning should work.

I sure expect them to do the obvious thing.

man 7 socket says:

       SO_SNDBUF
              Sets  or gets the maximum socket send buffer in bytes.

It doesn't actually say that turns into the TCP window size.

On Linux, there is a factor of 2 for overhead and whatever.

man tcp says:
      TCP uses the extra space for administrative purposes and inter-
       nal kernel structures, and the /proc file  values  reflect  the  larger
       sizes  compared  to the actual TCP windows.

So it looks like the number you feed it turns into the window size.

A few quick tests with netperf confirm that it is doing something close to 
what I expect but I haven't fired up tcpdump to verify that the window size 
is what I asked for.  netperf does print out values that are 2x what I asked 
for.
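
The doubling can be observed without netperf. A minimal sketch in Python,
assuming a Linux host; the helper name request_sndbuf is made up for
illustration, and the value reported depends on the kernel and on
net.core.wmem_max:

```python
import socket

def request_sndbuf(requested):
    """Ask for a send buffer of `requested` bytes; return what the kernel reports."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
        # On Linux the value read back is normally larger than the request.
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)

asked = 65536
print("asked for", asked, "bytes, kernel reports", request_sndbuf(asked))
```

This only shows the socket buffer accounting. It says nothing about the
window actually advertised on the wire, which still needs tcpdump.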

Yuck.  (That's Yuck at Linux, not netperf.)

-- 
These are my opinions.  I hate spam.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
  2015-04-22 19:10 [Bloat] SO_SNDBUF and SO_RCVBUF Hal Murray
@ 2015-04-22 19:26 ` Rick Jones
  2015-04-22 19:28 ` Dave Taht
  1 sibling, 0 replies; 19+ messages in thread
From: Rick Jones @ 2015-04-22 19:26 UTC (permalink / raw)
  To: Hal Murray, bloat

> So it looks like the number you feed it turns into the window size.
>
> A few quick tests with netperf confirm that it is doing something close to
> what I expect but I haven't fired up tcpdump to verify that the window size
> is what I asked for.  netperf does print out values that are 2x what I asked
> for.

It will do that until you start asking for (2x?) more than 
net.core.[rw]mem_max.  At that point they will be clipped.
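
A sketch of that arithmetic in Python, assuming the request is first
clipped to the sysctl limit and then doubled. The function name and the
floor value are placeholders for illustration; the real minimum differs
between kernel versions:

```python
def expected_sndbuf(requested, mem_max, floor=4608):
    """Estimate what getsockopt(SO_SNDBUF) reports after a setsockopt.

    requested: bytes passed to setsockopt
    mem_max:   value of net.core.wmem_max (or rmem_max for SO_RCVBUF)
    floor:     hypothetical stand-in for the kernel's minimum buffer size
    """
    clipped = min(requested, mem_max)   # clip to the sysctl limit
    return max(2 * clipped, floor)      # then double for kernel overhead

# With a limit of 212992 bytes (an example value, not read from the system):
print(expected_sndbuf(100000, 212992))    # below the limit: doubled
print(expected_sndbuf(1000000, 212992))   # above the limit: clipped, then doubled
```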

> Yuck.  (That's Yuck at Linux, not netperf.)

No worries :)

That bit of behaviour frustrated me and my BSD-stack upbringing for 
years, along with the auto-tuning.  Finally I ended up adding support 
for reporting three different socket buffer values via the "omni" output 
selectors:

1) the size requested by the user via the command line
2) the size initially after the data socket was created (and any 
setsockopts triggered by the command line)
3) the size at the end of the test

The classic netperf tests have always reported 2.  If you use the omni 
tests directly (-t omni) they will report 3.  You can always use the 
test-specific -o, -O or -k option to emit each specifically.  For both 
SO_[SND|RCV]BUF locally and remote:

raj@tardy:~/netperf2_trunk$ netperf -- -O \? | grep SIZE
LSS_SIZE_REQ
LSS_SIZE
LSS_SIZE_END
LSR_SIZE_REQ
LSR_SIZE
LSR_SIZE_END
RSS_SIZE_REQ
RSS_SIZE
RSS_SIZE_END
RSR_SIZE_REQ
RSR_SIZE
RSR_SIZE_END
...

happy benchmarking,

rick jones

* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
  2015-04-22 19:10 [Bloat] SO_SNDBUF and SO_RCVBUF Hal Murray
  2015-04-22 19:26 ` Rick Jones
@ 2015-04-22 19:28 ` Dave Taht
  2015-04-22 21:02   ` Eric Dumazet
  1 sibling, 1 reply; 19+ messages in thread
From: Dave Taht @ 2015-04-22 19:28 UTC (permalink / raw)
  To: Hal Murray; +Cc: bloat

SO_SNDLOWAT or something similar to it with a name I cannot recall,
can be useful.

-- 
Dave Täht
Open Networking needs **Open Source Hardware**

https://plus.google.com/u/0/+EricRaymond/posts/JqxCe2pFr67

* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
  2015-04-22 19:28 ` Dave Taht
@ 2015-04-22 21:02   ` Eric Dumazet
  2015-04-22 21:05     ` Rick Jones
  2015-04-22 21:07     ` Steinar H. Gunderson
  0 siblings, 2 replies; 19+ messages in thread
From: Eric Dumazet @ 2015-04-22 21:02 UTC (permalink / raw)
  To: Dave Taht; +Cc: Hal Murray, bloat

Yeah, the real nice thing is TCP_NOTSENT_LOWAT added in linux-3.12




* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
  2015-04-22 21:02   ` Eric Dumazet
@ 2015-04-22 21:05     ` Rick Jones
  2015-04-22 21:46       ` Eric Dumazet
  2015-04-24  4:37       ` Dave Taht
  2015-04-22 21:07     ` Steinar H. Gunderson
  1 sibling, 2 replies; 19+ messages in thread
From: Rick Jones @ 2015-04-22 21:05 UTC (permalink / raw)
  To: Eric Dumazet, Dave Taht; +Cc: Hal Murray, bloat

On 04/22/2015 02:02 PM, Eric Dumazet wrote:
> Yeah, the real nice thing is TCP_NOTSENT_LOWAT added in linux-3.12

Don't go telling Dave about that, he wants me to put too much into 
netperf as it is!-)

rick


* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
  2015-04-22 21:02   ` Eric Dumazet
  2015-04-22 21:05     ` Rick Jones
@ 2015-04-22 21:07     ` Steinar H. Gunderson
  2015-04-22 21:42       ` Eric Dumazet
  1 sibling, 1 reply; 19+ messages in thread
From: Steinar H. Gunderson @ 2015-04-22 21:07 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Hal Murray, bloat

On Wed, Apr 22, 2015 at 02:02:32PM -0700, Eric Dumazet wrote:
> Yeah, the real nice thing is TCP_NOTSENT_LOWAT added in linux-3.12

But this is only for when your data could change underway, right? 
Like, not relevant for sending one big file, but might be relevant for e.g.
VNC (or someone mentioned the usecase of HTTP/2, where a high-priority
request might come in, which you don't want buried behind a megabyte of
stuff in the send queue).

/* Steinar */
-- 
Homepage: http://www.sesse.net/

* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
  2015-04-22 21:07     ` Steinar H. Gunderson
@ 2015-04-22 21:42       ` Eric Dumazet
  2015-04-22 21:47         ` Dave Taht
  2015-04-22 22:11         ` Steinar H. Gunderson
  0 siblings, 2 replies; 19+ messages in thread
From: Eric Dumazet @ 2015-04-22 21:42 UTC (permalink / raw)
  To: Steinar H. Gunderson; +Cc: Hal Murray, bloat

On Wed, 2015-04-22 at 23:07 +0200, Steinar H. Gunderson wrote:
> On Wed, Apr 22, 2015 at 02:02:32PM -0700, Eric Dumazet wrote:
> > Yeah, the real nice thing is TCP_NOTSENT_LOWAT added in linux-3.12
> 
> But this is only for when your data could change underway, right? 
> Like, not relevant for sending one big file, but might be relevant for e.g.
> VNC (or someone mentioned the usecase of HTTP/2, where a high-priority
> request might come in, which you don't want buried behind a megabyte of
> stuff in the send queue).

Sorry, I do not understand you.

The nice thing about TCP_NOTSENT_LOWAT is that you no longer have to
care about choosing the 'right SO_SNDBUF'

It is still the CC's responsibility to choose/set cwnd, but you haven't
set an artificial cap on cwnd.

You control the amount of 'unsent data' per socket.

If you set a low limit, application might have to issue more send()
calls and get more EPOLLOUT events.

This also means that if you get an abort / eof, you no longer have a
huge unsent queue that TCP API does not allow to cancel.
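
A minimal sketch of setting the option per socket from Python, assuming
Linux 3.12 or later. Python exposes socket.TCP_NOTSENT_LOWAT on recent
versions; the fallback value 25 is the Linux constant and is an assumption
on any other platform. The helper name is invented for illustration:

```python
import socket

# Fallback 25 is the Linux value of TCP_NOTSENT_LOWAT (assumption elsewhere).
TCP_NOTSENT_LOWAT = getattr(socket, "TCP_NOTSENT_LOWAT", 25)

def set_notsent_lowat(sock, nbytes):
    """Try to cap the unsent bytes queued on sock.

    Returns the value read back from the kernel, or None if the option
    is not supported on this kernel or platform.
    """
    try:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_NOTSENT_LOWAT, nbytes)
        return sock.getsockopt(socket.IPPROTO_TCP, TCP_NOTSENT_LOWAT)
    except OSError:
        return None

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    print("TCP_NOTSENT_LOWAT is now:", set_notsent_lowat(s, 16384))
```

With a low limit like this, the socket is reported writable (EPOLLOUT) only
once the unsent backlog has fallen below the threshold, so the application
decides late what to send next.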

https://insouciant.org/tech/prioritization-only-works-when-theres-pending-data-to-prioritize/



* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
  2015-04-22 21:05     ` Rick Jones
@ 2015-04-22 21:46       ` Eric Dumazet
  2015-04-22 22:20         ` Simon Barber
  2015-04-24  4:37       ` Dave Taht
  1 sibling, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2015-04-22 21:46 UTC (permalink / raw)
  To: Rick Jones; +Cc: Hal Murray, bloat

On Wed, 2015-04-22 at 14:05 -0700, Rick Jones wrote:
> On 04/22/2015 02:02 PM, Eric Dumazet wrote:
> > Yeah, the real nice thing is TCP_NOTSENT_LOWAT added in linux-3.12
> 
> Don't go telling Dave about that, he wants me to put too much into 
> netperf as it is!-)

Note that one can also set a sysctl, in case netperf author does not
want to change its baby ;)

echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat
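
To check what the system-wide setting currently is, the same /proc file can
be read back. A small Python sketch; the helper name is made up, and it
simply returns None where the file does not exist (older kernels, non-Linux
systems):

```python
def read_notsent_lowat(path="/proc/sys/net/ipv4/tcp_notsent_lowat"):
    """Return the current sysctl value in bytes, or None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

print("net.ipv4.tcp_notsent_lowat =", read_notsent_lowat())
```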




* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
  2015-04-22 21:42       ` Eric Dumazet
@ 2015-04-22 21:47         ` Dave Taht
  2015-04-22 22:11         ` Steinar H. Gunderson
  1 sibling, 0 replies; 19+ messages in thread
From: Dave Taht @ 2015-04-22 21:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Hal Murray, bloat

On Wed, Apr 22, 2015 at 2:42 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2015-04-22 at 23:07 +0200, Steinar H. Gunderson wrote:
>> On Wed, Apr 22, 2015 at 02:02:32PM -0700, Eric Dumazet wrote:
>> > Yeah, the real nice thing is TCP_NOTSENT_LOWAT added in linux-3.12
>>
>> But this is only for when your data could change underway, right?
>> Like, not relevant for sending one big file, but might be relevant for e.g.
>> VNC (or someone mentioned the usecase of HTTP/2, where a high-priority
>> request might come in, which you don't want buried behind a megabyte of
>> stuff in the send queue).
>
> Sorry, I do not understand you.
>
> The nice thing about TCP_NOTSENT_LOWAT is that you no longer have to
> care about choosing the 'right SO_SNDBUF'

Stuart Cheshire has a very nice YouTube video due out soon on this option.
He demonstrates the enormous difference it made in a screen sharing
application...

... and there are many libs and toolkits (like X11, userspace TCP VPNs, etc.)
that could use it. It should be going into every TCP app that might congest
AND can do more intelligent things when congested.

It looks useful in web browsers also.

I have no idea to what extent this socket option has been picked up by
the marketplace/open source world.

-- 
Dave Täht
Open Networking needs **Open Source Hardware**

https://plus.google.com/u/0/+EricRaymond/posts/JqxCe2pFr67

* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
  2015-04-22 21:42       ` Eric Dumazet
  2015-04-22 21:47         ` Dave Taht
@ 2015-04-22 22:11         ` Steinar H. Gunderson
  1 sibling, 0 replies; 19+ messages in thread
From: Steinar H. Gunderson @ 2015-04-22 22:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Hal Murray, bloat

On Wed, Apr 22, 2015 at 02:42:59PM -0700, Eric Dumazet wrote:
> Sorry, I do not understand you.
> 
> The nice thing about TCP_NOTSENT_LOWAT is that you no longer have to
> care about choosing the 'right SO_SNDBUF'
> 
> It is still the CC's responsibility to choose/set cwnd, but you haven't
> set an artificial cap on cwnd.
> 
> You control the amount of 'unsent data' per socket.
> 
> If you set a low limit, application might have to issue more send()
> calls and get more EPOLLOUT events.
> 
> This also means that if you get an abort / eof, you no longer have a
> huge unsent queue that TCP API does not allow to cancel.
> 
> https://insouciant.org/tech/prioritization-only-works-when-theres-pending-data-to-prioritize/

So this URL is basically what I said; you need to have data to prioritize
between for this to be useful. If you just want to send a simple file
(and aborts in HTTP/1.1 basically don't really exist), it doesn't really
matter if you have a huge backlog or not.

So I'm sure it's useful for HTTP/2 or SPDY, but that's already pretty
advanced functionality.

/* Steinar */
-- 
Homepage: http://www.sesse.net/

* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
  2015-04-22 21:46       ` Eric Dumazet
@ 2015-04-22 22:20         ` Simon Barber
  2015-04-22 23:08           ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Simon Barber @ 2015-04-22 22:20 UTC (permalink / raw)
  To: Eric Dumazet, Rick Jones; +Cc: Hal Murray, bloat

Wouldn't the LOWAT setting be much easier for applications to use if it was 
set in estimated time (ie time it will take to deliver the data) rather 
than bytes?

Simon

* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
  2015-04-22 22:20         ` Simon Barber
@ 2015-04-22 23:08           ` Eric Dumazet
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2015-04-22 23:08 UTC (permalink / raw)
  To: Simon Barber; +Cc: Hal Murray, bloat

On Wed, 2015-04-22 at 15:20 -0700, Simon Barber wrote:
> Wouldn't the LOWAT setting be much easier for applications to use if it was 
> set in estimated time (ie time it will take to deliver the data) rather 
> than bytes?

Sure, but you have all the info to infer one from the other.

Note also TCP stack has immediate notion of bytes, while adding time
delays immediately impose a possibly expensive time acquisition in high
precision.
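
As an illustration of inferring one from the other: given an estimate of
the delivery rate, a time budget converts to a byte threshold with one
multiplication. A sketch in Python; the function name and the example
numbers are assumptions, and how the application obtains its rate estimate
is left open:

```python
def lowat_bytes_for_delay(rate_bytes_per_s, delay_ms):
    """Bytes that take roughly delay_ms to deliver at rate_bytes_per_s."""
    # Integer arithmetic keeps the result exact for whole-number inputs.
    return rate_bytes_per_s * delay_ms // 1000

# Example: about 100 Mbit/s (12,500,000 bytes/s) and a 10 ms budget.
print(lowat_bytes_for_delay(12_500_000, 10))
```

The result could then be passed as the byte value for TCP_NOTSENT_LOWAT and
recomputed whenever the rate estimate changes.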

# git show c9bee3b7fdecb0c1d070c7b54113b3bdfb9a3d36 | diffstat -p1 -w70
 Documentation/networking/ip-sysctl.txt |   13 +++++++++++++
 include/linux/tcp.h                    |    1 +
 include/net/sock.h                     |   19 +++++++++++++------
 include/net/tcp.h                      |   14 ++++++++++++++
 include/uapi/linux/tcp.h               |    1 +
 net/ipv4/sysctl_net_ipv4.c             |    7 +++++++
 net/ipv4/tcp.c                         |    7 +++++++
 net/ipv4/tcp_ipv4.c                    |    1 +
 net/ipv4/tcp_output.c                  |    3 +++
 net/ipv6/tcp_ipv6.c                    |    1 +
 10 files changed, 61 insertions(+), 6 deletions(-)

A time based implementation would be way more complex/expensive.



* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
  2015-04-22 21:05     ` Rick Jones
  2015-04-22 21:46       ` Eric Dumazet
@ 2015-04-24  4:37       ` Dave Taht
  2015-04-24  4:40         ` Dave Taht
  2015-04-24  5:23         ` Eric Dumazet
  1 sibling, 2 replies; 19+ messages in thread
From: Dave Taht @ 2015-04-24  4:37 UTC (permalink / raw)
  To: Rick Jones; +Cc: Hal Murray, bloat

On Wed, Apr 22, 2015 at 2:05 PM, Rick Jones <rick.jones2@hp.com> wrote:
> On 04/22/2015 02:02 PM, Eric Dumazet wrote:
>>
>> Yeah, the real nice thing is TCP_NOTSENT_LOWAT added in linux-3.12

I will argue that that would give a much better estimate as to how
much data was really outstanding on the wire.

>
> Don't go telling Dave about that, he wants me to put too much into netperf
> as it is!-)

Please release 2.7 soonest so we can get it into openwrt chaos calmer.

Then we can discuss new features, like the UDPLITE stuff and this. :)

That said, I would like to try a comparison test against rrul results
taken with and without the tcp_notsent_lowat option in netperf when I
get back from vacation next week. My guess is that it won't affect the
stats we get currently much (might hurt as netperf will run more
often), but MIGHT reduce the tail burst problem we see fairly often at
the conclusion of the test.

>
> rick
>

-- 
Dave Täht
Open Networking needs **Open Source Hardware**

https://plus.google.com/u/0/+EricRaymond/posts/JqxCe2pFr67

* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
  2015-04-24  4:37       ` Dave Taht
@ 2015-04-24  4:40         ` Dave Taht
  2015-04-24 13:50           ` Eric Dumazet
  2015-04-24  5:23         ` Eric Dumazet
  1 sibling, 1 reply; 19+ messages in thread
From: Dave Taht @ 2015-04-24  4:40 UTC (permalink / raw)
  To: Rick Jones; +Cc: Hal Murray, bloat

and of course, after writing the previous email, I go reading the
original commit for this option. Yea, that is a huge increase in
context switches...

https://lwn.net/Articles/560082/

... but totally worth it for many apps that can do something else
while their connection congests, and totally awesome for tcp vpns,
x11, screen sharers, etc....

* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
  2015-04-24  4:37       ` Dave Taht
  2015-04-24  4:40         ` Dave Taht
@ 2015-04-24  5:23         ` Eric Dumazet
  1 sibling, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2015-04-24  5:23 UTC (permalink / raw)
  To: Dave Taht; +Cc: Hal Murray, bloat

On Thu, 2015-04-23 at 21:37 -0700, Dave Taht wrote:
> On Wed, Apr 22, 2015 at 2:05 PM, Rick Jones <rick.jones2@hp.com> wrote:
> > On 04/22/2015 02:02 PM, Eric Dumazet wrote:
> >>
> >> Yeah, the real nice thing is TCP_NOTSENT_LOWAT added in linux-3.12
> 
> I will argue that that would give a much better estimate as to how
> much data was really outstanding on the wire.

This would be the responsibility of the CC.

Each CC has its own ways to be controlled. vegas & cubic have different
knobs.

TCP_NOTSENT_LOWAT controls the number of unsent bytes. This is generic.

You do not want to add a 'knob' that would lock all CC to a given
behavior : It is already there with pacing.

If you know the rtt, then with pacing you also can limit XXX bytes on
the wire.
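
The relation behind that last remark is rate = bytes / rtt. A sketch in
Python with invented names and example numbers; on Linux the resulting rate
could presumably be applied through the SO_MAX_PACING_RATE socket option,
which is not named in this thread and is mentioned here only as an
assumption:

```python
def pacing_rate_for_inflight(limit_bytes, rtt_ms):
    """Pacing rate (bytes/s) that keeps about limit_bytes on the wire per RTT."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    return limit_bytes * 1000 // rtt_ms

# Example: allow roughly 100,000 bytes in flight on a 50 ms path.
print(pacing_rate_for_inflight(100_000, 50))
```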

* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
  2015-04-24  4:40         ` Dave Taht
@ 2015-04-24 13:50           ` Eric Dumazet
  2015-04-24 14:34             ` Dave Taht
  2015-04-24 16:31             ` Rick Jones
  0 siblings, 2 replies; 19+ messages in thread
From: Eric Dumazet @ 2015-04-24 13:50 UTC (permalink / raw)
  To: Dave Taht; +Cc: Hal Murray, bloat

On Thu, 2015-04-23 at 21:40 -0700, Dave Taht wrote:
> and of course, after writing the previous email, I go reading the
> original commit for this option. Yea, that is a huge increase in
> context switches...
> 
> https://lwn.net/Articles/560082/
> 
> ... but totally worth it for many apps that can do something else
> while their connection congests, and totally awesome for tcp vpns,
> x11, screen sharers, etc....

It all depends on how many bytes are pushed by the application per
sendmsg()

To keep the amount of unsent bytes low, the application should not issue
a large write, but it still can if it needs to for whatever reason.

"netperf -t TCP_STREAM" uses a default size of 16384 bytes per sendmsg().

So obviously, if a wakeup is needed per sendmsg(), the number of context
switches is exactly bandwidth_in_bytes_per_second / 16384

Normally, without this TCP_NOTSENT_LOWAT option, the number of wakeups is
more like bandwidth_in_bytes_per_second / SO_SNDBUF, because the kernel
wakes up the blocked task when output buffer occupancy reaches 50%
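
To put numbers on the two estimates, a worked example in Python. The
1 Gbit/s rate and the 4 MiB send buffer are illustrative assumptions, not
measurements from this thread:

```python
def wakeups_per_second(bandwidth_bytes_per_s, bytes_per_wakeup):
    """Rough wakeup rate if the task is woken once per bytes_per_wakeup."""
    return bandwidth_bytes_per_s / bytes_per_wakeup

GBIT = 125_000_000  # 1 Gbit/s expressed in bytes per second

# One wakeup per 16384-byte sendmsg():
print(round(wakeups_per_second(GBIT, 16384)))            # 7629

# One wakeup per SO_SNDBUF worth of data, assuming a 4 MiB buffer:
print(round(wakeups_per_second(GBIT, 4 * 1024 * 1024)))  # 30
```

So under these assumptions the per-sendmsg() case costs a few hundred times
more wakeups, which is the increase in context switches discussed above.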




* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
  2015-04-24 13:50           ` Eric Dumazet
@ 2015-04-24 14:34             ` Dave Taht
  2015-04-24 16:31             ` Rick Jones
  1 sibling, 0 replies; 19+ messages in thread
From: Dave Taht @ 2015-04-24 14:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Hal Murray, bloat

On Fri, Apr 24, 2015 at 6:50 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2015-04-23 at 21:40 -0700, Dave Taht wrote:
>> and of course, after writing the previous email, I go reading the
>> original commit for this option. Yea, that is a huge increase in
>> context switches...
>>
>> https://lwn.net/Articles/560082/
>>
>> ... but totally worth it for many apps that can do something else
>> while their connection congests, and totally awesome for tcp vpns,
>> x11, screen sharers, etc....
>
> It all depends on how many bytes are pushed by the application per
> sendmsg()
>
> To keep the amount of unsent bytes low, the application should not issue
> a large write, but it still can if it needs to for whatever reason.
>
> "netperf -t TCP_STREAM" uses a default size of 16384 bytes per sendmsg.
>
> So obviously, if a wakeup is needed per sendmsg(), number of context
> switches is exactly bandwidth_in_bytes_per_second / 16384
>
> Normally, without this TCP_NOTSENT_LOWAT option, number of wakeups is
> more like bandwidth_in_bytes_per_second / SO_SNDBUF, because kernel
> wakes up the blocked task when output buffers size occupancy reached 50%
>
>
>

I think a "userspace janitors" project is needed, where we identify
everything that could benefit from TCP_NOTSENT_LOWAT[1], and go patch
it.

I did a little of this for using IPV6_TCLASS right on a ton of
applications and (for example) have some long standing patches
submitted to rsync for selecting congestion control and setting
IP_TOS/IPV6_TCLASS (sigh - still not accepted).

Maybe GSOC? Getting, say, just one college class to up and go do it,
for a week or two, together, analyzing the results as they go,
would make a dent....

[1] I think userspace vpns could use an internal fq+codel algorithm,
or perhaps the kernel socket read buffer could gain a socket option to
present one

-- 
Dave Täht
Open Networking needs **Open Source Hardware**

https://plus.google.com/u/0/+EricRaymond/posts/JqxCe2pFr67

* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
  2015-04-24 13:50           ` Eric Dumazet
  2015-04-24 14:34             ` Dave Taht
@ 2015-04-24 16:31             ` Rick Jones
  2015-04-24 18:41               ` Eric Dumazet
  1 sibling, 1 reply; 19+ messages in thread
From: Rick Jones @ 2015-04-24 16:31 UTC (permalink / raw)
  To: Eric Dumazet, Dave Taht; +Cc: Hal Murray, bloat

> "netperf -t TCP_STREAM" uses a default size of 16384 bytes per sendmsg.

Under Linux at least, and only because that is the default initial value 
for SO_SNDBUF for a TCP socket (via tcp_wmem).

More generally, the default send size used by netperf is the value of 
SO_SNDBUF for the data socket immediately after its creation.

rick

* Re: [Bloat] SO_SNDBUF and SO_RCVBUF
  2015-04-24 16:31             ` Rick Jones
@ 2015-04-24 18:41               ` Eric Dumazet
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2015-04-24 18:41 UTC (permalink / raw)
  To: Rick Jones; +Cc: Hal Murray, bloat

On Fri, 2015-04-24 at 09:31 -0700, Rick Jones wrote:
> > "netperf -t TCP_STREAM" uses a default size of 16384 bytes per sendmsg.
> 
> Under Linux at least, and only because that is the default initial value 
> for SO_SNDBUF for a TCP socket (via tcp_wmem).
> 
> More generally, the default send size used by netperf is the value of 
> SO_SNDBUF for the data socket immediately after its creation.
> 

Yeah, this looks odd.

Note that right after a connect() or accept(), getsockopt(SO_SNDBUF)
might be very different than the 'default=16384'

Otherwise, we could not even send the first 10 packets for IW10 from one
sendmsg(), or a single full packet on loopback interface (MTU=65536)
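
This is straightforward to check on loopback. A Python sketch, assuming a
host where binding to 127.0.0.1 is allowed; the function name is invented,
and the values printed depend on the kernel and its tcp_wmem settings:

```python
import socket

def sndbuf_before_and_after_connect():
    """Return SO_SNDBUF of a client socket before and after connect()."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", 0))   # any free port on loopback
        srv.listen(1)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
            before = cli.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
            # The handshake completes via the listen backlog; no accept() needed.
            cli.connect(srv.getsockname())
            after = cli.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return before, after

before, after = sndbuf_before_and_after_connect()
print("SO_SNDBUF before connect:", before, "after connect:", after)
```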

Anyway, 16384 bytes as default buffer size on netperf is fine.



end of thread, other threads:[~2015-04-24 18:41 UTC | newest]

Thread overview: 19+ messages
2015-04-22 19:10 [Bloat] SO_SNDBUF and SO_RCVBUF Hal Murray
2015-04-22 19:26 ` Rick Jones
2015-04-22 19:28 ` Dave Taht
2015-04-22 21:02   ` Eric Dumazet
2015-04-22 21:05     ` Rick Jones
2015-04-22 21:46       ` Eric Dumazet
2015-04-22 22:20         ` Simon Barber
2015-04-22 23:08           ` Eric Dumazet
2015-04-24  4:37       ` Dave Taht
2015-04-24  4:40         ` Dave Taht
2015-04-24 13:50           ` Eric Dumazet
2015-04-24 14:34             ` Dave Taht
2015-04-24 16:31             ` Rick Jones
2015-04-24 18:41               ` Eric Dumazet
2015-04-24  5:23         ` Eric Dumazet
2015-04-22 21:07     ` Steinar H. Gunderson
2015-04-22 21:42       ` Eric Dumazet
2015-04-22 21:47         ` Dave Taht
2015-04-22 22:11         ` Steinar H. Gunderson
