* [Cake] lockup with cake and veth
From: Pete Heist @ 2018-07-06 13:17 UTC (permalink / raw)
To: cake
I don’t know if we want to call this an issue, but...
I’m seeing a lockup with cake (and also sfq, but not either pfifo or fq_codel), when run over veth devices. Two network namespaces are created, one for client and one for server, each with one veth device. Netem is added as the root qdisc with a delay of 1ms, and a leaf qdisc may be added. Lockups occur on my box when the leaf qdisc is either cake or sfq, and I'm running flent’s tcp_ndown test with >= 4 download streams. Note that I happen to be running on a quad-core.
- If no leaf qdisc is added below netem, no lockup occurs.
- If either pfifo or fq_codel is added below netem, no lockup occurs.
- If either cake or sfq is the leaf, the lockup occurs.
The symptoms (lockup with >= 4 streams on a quad-core box), and the fact that it occurs with both cake and sfq, make me think that it may simply have to do with the code not being re-entrant, which may be the case for veth, and that this is just by design. Maybe it’s something that we should consider fixing but that wouldn’t be a show-stopper? That should be confirmed, though.
I’ll keep investigating, but am sharing the scripts I’m running meanwhile in case anyone else wants to look. See README.txt in the attached...
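For readers without the attachment, the setup described above can be approximated as follows. This is only a sketch, not the attached script: the namespace names, device names and addresses are assumptions, and the Python helper merely prints standard ip/tc commands (to be run as root). Note that, per the report above, choosing cake or sfq as the leaf may lock up the machine.

```python
def veth_netem_commands(leaf="cake", delay="1ms"):
    """Return shell commands for two namespaces joined by a veth pair,
    with netem as root qdisc and an optional leaf qdisc below it.

    All names (client/server, veth-c/veth-s, 10.0.0.x) are illustrative.
    """
    ends = [("client", "veth-c", "10.0.0.1/24"),
            ("server", "veth-s", "10.0.0.2/24")]
    cmds = [
        "ip netns add client",
        "ip netns add server",
        # Create the pair in the current namespace, then move each end.
        "ip link add veth-c type veth peer name veth-s",
        "ip link set veth-c netns client",
        "ip link set veth-s netns server",
    ]
    for ns, dev, addr in ends:
        run = f"ip netns exec {ns}"
        cmds.append(f"{run} ip addr add {addr} dev {dev}")
        cmds.append(f"{run} ip link set {dev} up")
        # netem as the root qdisc with the requested delay.
        cmds.append(f"{run} tc qdisc add dev {dev} root handle 1: netem delay {delay}")
        if leaf:
            # netem exposes a single class 1:1 that a leaf qdisc can attach to.
            cmds.append(f"{run} tc qdisc add dev {dev} parent 1:1 {leaf}")
    return cmds


if __name__ == "__main__":
    print("\n".join(veth_netem_commands(leaf="cake")))
```

Passing leaf=None gives the netem-only case, and leaf="fq_codel" or leaf="pfifo" gives the cases that did not lock up here.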
Pete
[-- Attachment #2: veth_cake.tgz --]
[-- Type: application/octet-stream, Size: 1415 bytes --]
* Re: [Cake] lockup with cake and veth
From: Toke Høiland-Jørgensen @ 2018-07-06 13:29 UTC (permalink / raw)
To: Pete Heist, cake
Pete Heist <pete@heistp.net> writes:
> I don’t know if we want to call this an issue, but...
>
> I’m seeing a lockup with cake (and also sfq, but not either pfifo or
> fq_codel), when run over veth devices. Two network namespaces are
> created, one for client and one for server, each with one veth device.
> Netem is added as the root qdisc with a delay of 1ms, and a leaf qdisc
> may be added. Lockups occur on my box when the leaf qdisc is either
> cake or sfq, and I'm running flent’s tcp_ndown test with >= 4 download
> streams. Note that I happen to be running on a quad-core.
>
> - If no leaf qdisc is added below netem, no lockup occurs.
> - If either pfifo or fq_codel is added below netem, no lockup occurs.
> - If either cake or sfq is the leaf, the lockup occurs.
>
> The symptoms (lockup with >= 4 streams on a quad-core box), and the
> fact that it occurs with both cake and sfq, make me think that it may
> simply have to do with the code not being re-entrant, which may be the
> case for veth, and this is just by design? maybe something that we
> should consider fixing but wouldn’t be a show-stopper? But that should
> be confirmed.
>
> I’ll keep investigating, but am sharing the scripts I’m running
> meanwhile in case anyone else wants to look. See README.txt in the
> attached...
Thanks for investigating! I'll take a look later. The fact that it
happens with sfq as well means it's probably not cake-specific, though,
so I don't think we should hold off on the upstream submission until
we've figured it out. Using leaf qdiscs with netem has been dodgy for a
while IIRC...
-Toke
* Re: [Cake] lockup with cake and veth
From: Georgios Amanakis @ 2018-07-06 13:34 UTC (permalink / raw)
To: Toke Høiland-Jørgensen; +Cc: Pete Heist, Cake List
Thank you both for the great work!
I will give it a try, too.
George
* Re: [Cake] lockup with cake and veth
From: Pete Heist @ 2018-07-06 13:37 UTC (permalink / raw)
To: Toke Høiland-Jørgensen; +Cc: cake
> On Jul 6, 2018, at 3:29 PM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>
> Thanks for investigating! I'll take a look later. The fact that it
> happens with sfq as well means it's probably not cake-specific, though,
> so I don't think we should hold off on the upstream submission until
> we've figured it out. Using leaf qdiscs with netem has been dodgy for a
> while IIRC...
Yeah, I’m seeing now that the lockup does _not_ occur when netem is not used and cake is the root.
I’m also satisfied for now that this shouldn’t hold us up. However, what I’ll also want to try next is netem on a separate veth device from cake. I believe Dave’s veth testing earlier was using three namespaces / veth devices, maybe for this reason(?)
Nice work, Jon, on the lockup bug also; it’s nice seeing this come together… :)
Pete
* Re: [Cake] lockup with cake and veth
From: Pete Heist @ 2018-07-06 19:24 UTC (permalink / raw)
To: Toke Høiland-Jørgensen; +Cc: cake
> On Jul 6, 2018, at 3:37 PM, Pete Heist <pete@heistp.net> wrote:
>
> I’m also satisfied for now that this shouldn’t hold us up. However, what I’ll also want to try next is netem on a separate veth device from cake. I believe Dave’s veth testing earlier was using three namespaces / veth devices, maybe for this reason(?)
So under the category of “unsurprising”, with netem as a root qdisc in a separate namespace/veth device from cake, there is no lockup. The attached script creates an environment with 6 namespaces (client, client qdisc, client delay, server delay, server qdisc, server) with veth devices / bridges between. It’s easy to switch the qdisc or netem params, if anyone needs such a thing.
Only, I wonder if high BDP links can be emulated with netem. Say I want 10Gbit with 100ms RTT. If I use “netem delay 50ms” in each direction, the RTT is correct, but throughput for iperf3 drops from 30Gbit to 450Mbit. CPU is a few percent, so that’s not the issue. Fiddling with the TCP window (iperf3 -w) just seems to make it worse. So I’m still figuring out netem, I guess...
Pete
[-- Attachment #2: veth_qm.tgz --]
[-- Type: application/octet-stream, Size: 1476 bytes --]
* Re: [Cake] lockup with cake and veth
From: Dave Taht @ 2018-07-06 19:40 UTC (permalink / raw)
To: Pete Heist; +Cc: Toke Høiland-Jørgensen, Cake List
You ran out of queue to store packets for that RTT.
Use netem limit 100000 or higher for that.
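A rough way to see why the queue runs out: netem has to hold every packet for the full delay, so the number of packets it must store scales with rate times one-way delay. The sketch below assumes 1500-byte packets and netem's default limit of 1000 packets; the function names are illustrative, and veth may carry larger aggregated segments, so the numbers are not expected to match the measured 450Mbit exactly.

```python
def netem_packets_needed(rate_bps, delay_ms, packet_bytes=1500):
    """Packets netem must hold to sustain rate_bps with a one-way delay."""
    bits_in_delay_line = rate_bps * delay_ms      # bits * ms
    bits_per_packet_ms = packet_bytes * 8 * 1000  # scaled by 1000 to match ms
    # Integer ceiling division, avoids floating-point rounding.
    return -(-bits_in_delay_line // bits_per_packet_ms)


def max_rate_bps(limit_packets, delay_ms, packet_bytes=1500):
    """Highest rate a given netem limit can sustain at that delay."""
    return limit_packets * packet_bytes * 8 * 1000 // delay_ms


if __name__ == "__main__":
    # 10Gbit with 50ms in each direction, as in the question.
    print(netem_packets_needed(10_000_000_000, 50))  # 41667
    # Ceiling imposed by the default limit of 1000 packets at 50ms.
    print(max_rate_bps(1000, 50))                    # 240000000
```

Under these assumptions about 42,000 packets must be queued per direction, so a limit of 100000 leaves comfortable headroom, while the default limit caps throughput at a few hundred Mbit.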
--
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
* Re: [Cake] lockup with cake and veth
From: Pete Heist @ 2018-07-06 22:12 UTC (permalink / raw)
To: Dave Taht; +Cc: Cake List
Ok, it ended up being partly that, and mainly needing to increase kernel params for max TCP window memory, otherwise there was a hard limit on what I’d actually end up getting for the TCP window.
Rates start getting finicky with these large TCP windows: go too large and the rates can collapse for some reason. But I have managed to run about 6Gbit with 100ms RTT through Cake this way without a problem, either with a single stream or the aggregate of up to four streams, or 20Gbit at 20ms RTT.
./qenv.sh on
./run_iperf.sh
./run_flent.sh
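As a back-of-the-envelope check on the window sizes involved, the bandwidth-delay product gives the number of bytes a single TCP stream must keep in flight. The sketch below computes it and prints sysctl settings one might try. Which kernel parameters were actually changed is not stated in this message, so the sysctl names chosen (net.core.rmem_max, net.core.wmem_max, net.ipv4.tcp_rmem, net.ipv4.tcp_wmem), the 2x headroom and the min/default values are all assumptions.

```python
def bdp_bytes(rate_bps, rtt_ms):
    """Bandwidth-delay product in bytes for a rate (bit/s) and RTT (ms)."""
    return rate_bps * rtt_ms // (8 * 1000)


def sysctl_commands(rate_bps, rtt_ms, headroom=2):
    """Illustrative sysctl lines sizing the max TCP buffers from the BDP.

    headroom and the first two tcp_rmem/tcp_wmem fields are assumptions,
    not values taken from the thread.
    """
    max_buf = headroom * bdp_bytes(rate_bps, rtt_ms)
    return [
        f"sysctl -w net.core.rmem_max={max_buf}",
        f"sysctl -w net.core.wmem_max={max_buf}",
        f'sysctl -w net.ipv4.tcp_rmem="4096 131072 {max_buf}"',
        f'sysctl -w net.ipv4.tcp_wmem="4096 16384 {max_buf}"',
    ]


if __name__ == "__main__":
    print(bdp_bytes(6_000_000_000, 100))   # 75000000 bytes for 6Gbit at 100ms
    print(bdp_bytes(20_000_000_000, 20))   # 50000000 bytes for 20Gbit at 20ms
    print("\n".join(sysctl_commands(6_000_000_000, 100)))
```

Given the rate collapse seen with very large windows, such values are best treated as a starting point to tune from rather than a recommendation.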
[-- Attachment #2: veth_qm.tgz --]
[-- Type: application/octet-stream, Size: 1611 bytes --]