* [Codel] better tc support for bittorrent/diffserv
@ 2012-05-14 15:51 dave taht
2012-05-14 17:50 ` [Codel] [Bloat] " Juliusz Chroboczek
0 siblings, 1 reply; 11+ messages in thread
From: dave taht @ 2012-05-14 15:51 UTC
To: codel, bloat
I spent a little time this weekend thinking about how to make bittorrent
do more of the right thing on a fq_codel equipped edge gateway. I set up
a seeder (transmission-gtk) and watched it go. I was also going to try
hekate on the router itself but couldn't get it to cross build.
Notes:
0) No ecn on transmission, at least. :( Packets not marked as CS1,
either. What are these characteristics on other torrent protocol clients?
1) The port number is consistent, but user-settable. Felix tells me that
the port number varies wildly, however, on other implementations, and
that dpi is required. His current implementation of dpi for torrent is
95% accurate with no false positives but requires connection tracking.
2) Doing fair queuing on traffic that Transmission artificially rate-limits
below the actual egress rate makes bittorrent 'vanish'. I never noticed
it running at all, even though I got as far as seeding 32+ machines
downstream. I imagine things get worse with 1000 seeds running, but I
would like to set up a test... on a legal file that everybody wants to
have. (Suggestions for a good seed torrent welcome.)
3) There is a new Linux feature for containers that lets you run apps in
network containers and set packet priorities based on that. (Priorities
in this case are a netfilter-ism, which lets you mark packets based on
which control group a process is in.) This new feature holds great
promise for re-marking things like Skype at the source, for example, and
for controlling apps (like torrent) in new ways.
4) Syntactically tc is missing some ip6 functionality, notably an
orthogonal "tos" field to the ipv4 version.
* Re: [Codel] [Bloat] better tc support for bittorrent/diffserv
2012-05-14 15:51 [Codel] better tc support for bittorrent/diffserv dave taht
@ 2012-05-14 17:50 ` Juliusz Chroboczek
2012-05-14 18:03 ` Eric Dumazet
2012-05-14 18:14 ` Dave Taht
0 siblings, 2 replies; 11+ messages in thread
From: Juliusz Chroboczek @ 2012-05-14 17:50 UTC
To: dave taht; +Cc: codel, bloat
[Pro memoria, I'm the author of a few of the networking bits in
Transmission, notably the DHT, the µTP support and large parts of the
IPv6 code.]
> 0) No ecn on transmission, at least.
TCP flows will use ECN if your system is configured to use ECN. µTP
doesn't do ECN, and there's no easy way to implement it -- there's no
API to set ECN on UDP packets.
> Packets not marked as CS1,
Add
"peer-socket-tos": "lowcost",
to your settings.json.
> 1) The port number is consistent, but user-settable.
Only for µTP and incoming TCP. Outgoing TCP uses a random port.
It's a well-known flaw of the sockets API that you cannot use the same
local port for incoming and outgoing connections.
> 4) Syntactically tc is missing some ip6 functionality, notably an
> orthogonal "tos" field to the ipv4 version.
Use ip6tables -j MARK.
-- Juliusz
* Re: [Codel] [Bloat] better tc support for bittorrent/diffserv
2012-05-14 17:50 ` [Codel] [Bloat] " Juliusz Chroboczek
@ 2012-05-14 18:03 ` Eric Dumazet
2012-05-14 18:18 ` dave taht
2012-05-14 18:24 ` Juliusz Chroboczek
2012-05-14 18:14 ` Dave Taht
1 sibling, 2 replies; 11+ messages in thread
From: Eric Dumazet @ 2012-05-14 18:03 UTC
To: Juliusz Chroboczek; +Cc: codel, bloat
On Mon, 2012-05-14 at 19:50 +0200, Juliusz Chroboczek wrote:
> [Pro memoria, I'm the author of a few of the networking bits in
> Transmission, notably the DHT, the µTP support and large parts of the
> IPv6 code.]
>
> > 0) No ecn on transmission, at least.
>
> TCP flows will use ECN if your system is configured to use ECN. µTP
> doesn't do ECN, and there's no easy way to implement it -- there's no
> API to set ECN on UDP packets.
There is an API to set ECN on UDP packets.
setsockopt() IP_TOS, with val = 2
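A minimal sketch of the setsockopt() route, assuming Linux and an IPv4 UDP socket (such as the one µTP runs over). The value 2 is the ECT(0) codepoint in the two low-order ECN bits of the TOS byte; this sketch keeps whatever DSCP is already set in the upper six bits (CS1, say) rather than overwriting it. `udp_set_ect0` is a hypothetical helper name, not an existing API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#define TOS_ECN_MASK 0x03 /* low two bits of the TOS byte carry ECN */
#define TOS_ECN_ECT0 0x02 /* ECT(0), the "val = 2" above */

/* Hypothetical helper: mark every datagram sent on fd as ECN-capable
 * (ECT(0)) while preserving the DSCP bits already configured.
 * Returns 0 on success, -1 with errno set on failure. */
static int udp_set_ect0(int fd)
{
	int tos = 0;
	socklen_t len = sizeof(tos);

	/* Read the current TOS so the DSCP part survives. */
	if (getsockopt(fd, IPPROTO_IP, IP_TOS, &tos, &len) == -1)
		return -1;
	tos = (tos & ~TOS_ECN_MASK) | TOS_ECN_ECT0;
	return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
}
```

For example, after setting CS1 (TOS 0x20) on a UDP socket, calling the helper leaves the socket's TOS at 0x22. Marking packets ECT(0) only pays off if the protocol also feeds CE marks back to the sender and reacts to them, which is the negotiation question raised elsewhere in this thread.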
* Re: [Codel] [Bloat] better tc support for bittorrent/diffserv
2012-05-14 17:50 ` [Codel] [Bloat] " Juliusz Chroboczek
2012-05-14 18:03 ` Eric Dumazet
@ 2012-05-14 18:14 ` Dave Taht
2012-05-14 18:31 ` Juliusz Chroboczek
1 sibling, 1 reply; 11+ messages in thread
From: Dave Taht @ 2012-05-14 18:14 UTC
To: Juliusz Chroboczek; +Cc: codel, bloat
On Mon, May 14, 2012 at 10:50 AM, Juliusz Chroboczek <jch@pps.jussieu.fr> wrote:
>
>> 4) Syntactically tc is missing some ip6 functionality, notably an
>> orthogonal "tos" field to the ipv4 version.
>
> Use ip6tables -j MARK.
ip6tables...
Doesn't work on ingress presently, if you are using ifb (I can be wrong).
Using something to lightly shape the last hop (host) incoming packets, like
htb set to a very large fraction of the card's bandwidth and codel set to
a target of 500us delay on ingress, seems to work.
With egress and ingress rate limiting on, and fq_codel attached to htb, I
can get the delay for large numbers of flows in both directions down to
< 5ms in a saturated situation.
Best of all, if you have ECN enabled on that host and the packets are marked
3 (CE) on ingress in that scenario, you don't drop anything on that last hop...
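For reference, "marked 3" refers to the ECN field: RFC 3168 splits the old TOS byte into a six-bit DSCP in the upper bits and a two-bit ECN field in the lower bits, where 3 (binary 11) is CE, "congestion experienced". A small decoding sketch; the enum and function names are illustrative, not taken from any library.

```c
#include <assert.h>
#include <string.h>

/* ECN codepoints from RFC 3168, section 5. */
enum ecn_codepoint {
	ECN_NOT_ECT = 0, /* 00: transport is not ECN-capable */
	ECN_ECT1    = 1, /* 01: ECN-capable transport, ECT(1) */
	ECN_ECT0    = 2, /* 10: ECN-capable transport, ECT(0) */
	ECN_CE      = 3  /* 11: congestion experienced */
};

/* Illustrative helper: the ECN field is the low two bits of TOS. */
static enum ecn_codepoint tos_ecn(unsigned char tos)
{
	return (enum ecn_codepoint)(tos & 0x03);
}

/* Illustrative helper: the DSCP is the upper six bits of TOS. */
static unsigned char tos_dscp(unsigned char tos)
{
	return (unsigned char)(tos >> 2);
}

/* Human-readable name of an ECN codepoint. */
static const char *ecn_name(enum ecn_codepoint ecn)
{
	switch (ecn) {
	case ECN_NOT_ECT: return "Not-ECT";
	case ECN_ECT1:    return "ECT(1)";
	case ECN_ECT0:    return "ECT(0)";
	case ECN_CE:      return "CE";
	}
	return "?";
}
```

For example, a TOS byte of 0x22 decodes to DSCP 8 (CS1) with ECT(0), and 0x23 is the same packet after an ECN-enabled AQM has marked it CE instead of dropping it.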
>
> -- Juliusz
>
--
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://www.bufferbloat.net
* Re: [Codel] [Bloat] better tc support for bittorrent/diffserv
2012-05-14 18:03 ` Eric Dumazet
@ 2012-05-14 18:18 ` dave taht
2012-05-14 18:30 ` Juliusz Chroboczek
2012-05-14 18:24 ` Juliusz Chroboczek
1 sibling, 1 reply; 11+ messages in thread
From: dave taht @ 2012-05-14 18:18 UTC
To: Eric Dumazet; +Cc: Juliusz Chroboczek, codel, bloat
On 05/14/2012 11:03 AM, Eric Dumazet wrote:
> On Mon, 2012-05-14 at 19:50 +0200, Juliusz Chroboczek wrote:
>> [Pro memoria, I'm the author of a few of the networking bits in
>> Transmission, notably the DHT, the µTP support and large parts of the
>> IPv6 code.]
>>
>>> 0) No ecn on transmission, at least.
>> TCP flows will use ECN if your system is configured to use ECN. µTP
>> doesn't do ECN, and there's no easy way to implement it -- there's no
>> API to set ECN on UDP packets.
> There is an API to set ECN on UDP packets.
>
> setsockopt() IP_TOS, with val = 2
I think, but am not certain, jch is referring to the problem of
negotiating ecn over uTP.
I'd like it if it could: as a background protocol, uTP could then give us
some statistics on the actual success rate of getting ECN through, and if
it fails to work right, fall back to a more 'happy eyeballs'-style scenario.
* Re: [Codel] [Bloat] better tc support for bittorrent/diffserv
2012-05-14 18:03 ` Eric Dumazet
2012-05-14 18:18 ` dave taht
@ 2012-05-14 18:24 ` Juliusz Chroboczek
2012-05-14 18:31 ` Eric Dumazet
1 sibling, 1 reply; 11+ messages in thread
From: Juliusz Chroboczek @ 2012-05-14 18:24 UTC
To: Eric Dumazet; +Cc: codel, bloat
> There is an API to set ECN on UDP packets.
>
> setsockopt() IP_TOS, with val = 2
Interesting, I didn't realise that. Is it also possible to read the ECN
bits?
-- Juliusz
* Re: [Codel] [Bloat] better tc support for bittorrent/diffserv
2012-05-14 18:18 ` dave taht
@ 2012-05-14 18:30 ` Juliusz Chroboczek
2012-05-14 18:35 ` dave taht
0 siblings, 1 reply; 11+ messages in thread
From: Juliusz Chroboczek @ 2012-05-14 18:30 UTC
To: dave taht; +Cc: codel, bloat
> I think, but am not certain, jch is referring to the problem of
> negotiating ecn over uTP.
No, I was actually unaware that Linux allows one to set the ECN bits
from userspace.
OTOH, you're right that µTP doesn't currently support ECN. Not that it
matters much, ECN is useful for interactive connections (where it can
avoid a one-RTT delay due to a packet loss) but pretty useless for bulk
transfers[1].
On a related note, the µTP framing is pretty horrible IMHO, and LEDBAT
is better implemented within TCP. Unfortunately, this is not possible
under Windows (yeah, I got one of the µTorrent guys to contact Microsoft
on this subject).
On a note related to the related note, Transmission already has the
ability to make use of a non-default congestion controller for TCP:
http://www.pps.jussieu.fr/~jch/software/bittorrent/tcp-congestion-control.html
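On Linux, the usual per-socket way to pick a non-default congestion controller is the TCP_CONGESTION socket option described in tcp(7). A minimal sketch, assuming the algorithm is built in or loaded as a module and, for unprivileged processes, listed in net.ipv4.tcp_allowed_congestion_control. The helper names are hypothetical, and this is not necessarily how Transmission does it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: select a congestion controller (e.g. "cubic",
 * "westwood", or an out-of-tree "ledbat") for one TCP socket.
 * Returns 0 on success, -1 with errno set (ENOENT if the name is unknown,
 * EPERM if it is not in the allowed list for this process). */
static int tcp_set_cc(int fd, const char *algo)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, algo, strlen(algo));
}

/* Hypothetical helper: read back the controller in use on fd.
 * name should hold at least 16 bytes (the kernel's name limit). */
static int tcp_get_cc(int fd, char *name, socklen_t size)
{
	memset(name, 0, size);
	return getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name, &size);
}
```

`tcp_set_cc(fd, "reno")` should succeed on any Linux system, since Reno is always built into the kernel; the option can be set before connect(), so the chosen algorithm applies from the first segment.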
-- Juliusz
[1] Marek Malowidzki, Simulation-based Study of ECN Performance in RED
Networks, In Proc. SPECTS'03. 2003.
* Re: [Codel] [Bloat] better tc support for bittorrent/diffserv
2012-05-14 18:14 ` Dave Taht
@ 2012-05-14 18:31 ` Juliusz Chroboczek
0 siblings, 0 replies; 11+ messages in thread
From: Juliusz Chroboczek @ 2012-05-14 18:31 UTC
To: Dave Taht; +Cc: codel, bloat
> Doesn't work on ingress presently, if you are using ifb (I can be wrong).
?
-- Juliusz
* Re: [Codel] [Bloat] better tc support for bittorrent/diffserv
2012-05-14 18:24 ` Juliusz Chroboczek
@ 2012-05-14 18:31 ` Eric Dumazet
2012-05-14 18:42 ` Eric Dumazet
0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2012-05-14 18:31 UTC
To: Juliusz Chroboczek; +Cc: codel, bloat
On Mon, 2012-05-14 at 20:24 +0200, Juliusz Chroboczek wrote:
> > There is an API to set ECN on UDP packets.
> >
> > setsockopt() IP_TOS, with val = 2
>
> Interesting, I didn't realise that. Is it also possible to read the ECN
> bits?
Yep
man 7 ip
IP_RECVTOS (since Linux 2.2)
If enabled the IP_TOS ancillary message is passed with incoming packets.
It contains a byte which specifies the Type of Service/Precedence field
of the packet header. Expects a boolean integer flag.
* Re: [Codel] [Bloat] better tc support for bittorrent/diffserv
2012-05-14 18:30 ` Juliusz Chroboczek
@ 2012-05-14 18:35 ` dave taht
0 siblings, 0 replies; 11+ messages in thread
From: dave taht @ 2012-05-14 18:35 UTC
To: Juliusz Chroboczek; +Cc: codel, bloat
On 05/14/2012 11:30 AM, Juliusz Chroboczek wrote:
>> I think, but am not certain, jch is referring to the problem of
>> negotiating ecn over uTP.
> No, I was actually unaware that Linux allows one to set the ECN bits
> from userspace.
It's also possible to get them from userspace.
>
> OTOH, you're right that µTP doesn't currently support ECN. Not that it
> matters much, ECN is useful for interactive connections (where it can
> avoid a one-RTT delay due to a packet loss) but pretty useless for bulk
> transfers[1].
My scenario is that AQM isn't just for routers anymore.
>
> On a related note, the µTP framing is pretty horrible IMHO, and LEDBAT
> is better implemented within TCP. Unfortunately, this is not possible
> under Windows (yeah, I got one of the µTorrent guys to contact Microsoft
> on this subject).
>
> On a note related to the related note, Transmission already has the
> ability to make use of a non-default congestion controller for TCP:
>
> http://www.pps.jussieu.fr/~jch/software/bittorrent/tcp-congestion-control.html
TCP_LEDBAT has been in cerowrt for months. I haven't played with it much;
I was mostly comparing westwood and cubic.
http://perso.telecom-paristech.fr/~valenti/pmwiki/pmwiki.php?n=Main.LEDBAT
The latest netperf in svn can also exercise alternate congestion control
algorithms (as well as classification).
> -- Juliusz
>
> [1] Marek Malowidzki, Simulation-based Study of ECN Performance in RED
> Networks, In Proc. SPECTS'03. 2003.
* Re: [Codel] [Bloat] better tc support for bittorrent/diffserv
2012-05-14 18:31 ` Eric Dumazet
@ 2012-05-14 18:42 ` Eric Dumazet
0 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2012-05-14 18:42 UTC
To: Juliusz Chroboczek; +Cc: codel, bloat
On Mon, 2012-05-14 at 20:31 +0200, Eric Dumazet wrote:
> On Mon, 2012-05-14 at 20:24 +0200, Juliusz Chroboczek wrote:
> > > There is an API to set ECN on UDP packets.
> > >
> > > setsockopt() IP_TOS, with val = 2
> >
> > Interesting, I didn't realise that. Is it also possible to read the ECN
> > bits?
>
>
> Yep
>
> man 7 ip
>
> IP_RECVTOS (since Linux 2.2)
> If enabled the IP_TOS ancillary message is passed with incoming packets.
> It contains a byte which specifies the Type of Service/Precedence field
> of the packet header. Expects a boolean integer flag.
>
>
Here is a quick and dirty UDP echo server (it reflects the TOS and uses the
correct source address on a multi-homed host):
#define _GNU_SOURCE	/* make sure glibc exposes struct in_pktinfo and friends */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>	/* atoi() */
#include <unistd.h>

#define PORT 4040

/* Control buffer aligned for struct cmsghdr, as the CMSG_* macros require. */
union cbuf {
	char buf[512];
	struct cmsghdr align;
};

/* Find the IP_PKTINFO ancillary data (local address the datagram was sent to). */
static int pktinfo_get(struct msghdr *my_hdr, struct in_pktinfo *pktinfo)
{
	struct cmsghdr *cmsg;

	for (cmsg = CMSG_FIRSTHDR(my_hdr); cmsg; cmsg = CMSG_NXTHDR(my_hdr, cmsg)) {
		if (cmsg->cmsg_level == IPPROTO_IP && cmsg->cmsg_type == IP_PKTINFO) {
			memcpy(pktinfo, CMSG_DATA(cmsg), sizeof(*pktinfo));
			return 0;
		}
	}
	return -1;
}

/* Find the IP_TOS ancillary data (one byte: DSCP + ECN bits), enabled by IP_RECVTOS. */
static int tos_get(struct msghdr *my_hdr, unsigned char *tos)
{
	struct cmsghdr *cmsg;

	for (cmsg = CMSG_FIRSTHDR(my_hdr); cmsg; cmsg = CMSG_NXTHDR(my_hdr, cmsg)) {
		if (cmsg->cmsg_level == IPPROTO_IP && cmsg->cmsg_type == IP_TOS) {
			*tos = *(unsigned char *)CMSG_DATA(cmsg);
			return 0;
		}
	}
	return -1;
}

int main(int argc, char *argv[])
{
	struct sockaddr_in addr, rem_addr;
	struct msghdr msghdr;
	struct iovec vec[1];
	union cbuf rx_cbuf, tx_cbuf;
	char frame[4096];
	struct in_pktinfo pktinfo;
	unsigned char last_tos = 0;
	int fd, c, on = 1, count = 1000000;

	while ((c = getopt(argc, argv, "c:")) != -1) {
		if (c == 'c')
			count = atoi(optarg);
	}

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd == -1) {
		perror("socket");
		return 1;
	}
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(PORT);
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
		perror("bind");
		return 1;
	}
	/* Ask for the local destination address and the TOS byte of every datagram. */
	if (setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on)) == -1 ||
	    setsockopt(fd, IPPROTO_IP, IP_RECVTOS, &on, sizeof(on)) == -1) {
		perror("setsockopt");
		return 1;
	}

	while (1) {
		unsigned char tos;
		ssize_t res;

		memset(&msghdr, 0, sizeof(msghdr));
		msghdr.msg_control = rx_cbuf.buf;
		msghdr.msg_controllen = sizeof(rx_cbuf.buf);
		msghdr.msg_iov = vec;
		msghdr.msg_iovlen = 1;
		vec[0].iov_base = frame;
		vec[0].iov_len = sizeof(frame);
		msghdr.msg_name = &rem_addr;
		msghdr.msg_namelen = sizeof(rem_addr);

		res = recvmsg(fd, &msghdr, 0);
		if (res == -1) {
			perror("recvmsg");
			break;
		}

		if (tos_get(&msghdr, &tos) == 0) {
			/* IP_TOS option won't be used in sendmsg(), we must use setsockopt() instead */
			if (tos != last_tos &&
			    setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos)) == 0)
				last_tos = tos;
		}

		/* OK, just echo this frame back.
		 * Passing IP_PKTINFO to sendmsg() lets the kernel use the 'right'
		 * source address (the destination address of the incoming packet).
		 * Only IP_PKTINFO goes into the reply's control buffer, so the
		 * received IP_TOS cmsg is not handed back to the kernel.
		 */
		vec[0].iov_len = res;
		if (pktinfo_get(&msghdr, &pktinfo) == 0) {
			struct cmsghdr *cmsg;

			/* printf("Got IP_PKTINFO dst addr=%s\n", inet_ntoa(pktinfo.ipi_spec_dst)); */
			memset(&tx_cbuf, 0, sizeof(tx_cbuf));
			msghdr.msg_control = tx_cbuf.buf;
			msghdr.msg_controllen = CMSG_SPACE(sizeof(pktinfo));
			cmsg = CMSG_FIRSTHDR(&msghdr);
			cmsg->cmsg_level = IPPROTO_IP;
			cmsg->cmsg_type = IP_PKTINFO;
			cmsg->cmsg_len = CMSG_LEN(sizeof(pktinfo));
			memcpy(CMSG_DATA(cmsg), &pktinfo, sizeof(pktinfo));
		} else {
			msghdr.msg_control = NULL;
			msghdr.msg_controllen = 0;
		}
		msghdr.msg_flags = 0;
		if (sendmsg(fd, &msghdr, 0) == -1)
			perror("sendmsg");

		if (--count == 0)
			break;
	}
	close(fd);
	return 0;
}
Thread overview: 11+ messages
2012-05-14 15:51 [Codel] better tc support for bittorrent/diffserv dave taht
2012-05-14 17:50 ` [Codel] [Bloat] " Juliusz Chroboczek
2012-05-14 18:03 ` Eric Dumazet
2012-05-14 18:18 ` dave taht
2012-05-14 18:30 ` Juliusz Chroboczek
2012-05-14 18:35 ` dave taht
2012-05-14 18:24 ` Juliusz Chroboczek
2012-05-14 18:31 ` Eric Dumazet
2012-05-14 18:42 ` Eric Dumazet
2012-05-14 18:14 ` Dave Taht
2012-05-14 18:31 ` Juliusz Chroboczek