* Re: [Cake] [RFC PATCH 4/5] q_netem: support delivering packets in delayed time slots
From: Pete Heist @ 2017-11-18 13:18 UTC
To: Dave Taht; +Cc: cake

> On Nov 17, 2017, at 11:55 PM, dave.taht@gmail.com wrote:
>
> Slotting is a crude approximation of the behaviors of shared media such
> as cable, wifi, and LTE, which gather up a bunch of packets within a
> varying delay window and deliver them, relative to that, nearly all at
> once.

Nice… One of the things I also notice in my LAN tests is that latencies for
different flows stay at more or less fixed (and different) positions relative
to the mean in flent results. Those positions, and the mean, can change with
each test run. Do you think this could result from the hashing to different
hardware queues (four in my case) changing between test runs? And is it worth
trying to simulate this effect, or not really?

Just for info, in my case (Intel i210) the hashing is documented starting on
page 254 of the specs:
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/i210-ethernet-controller-datasheet.pdf
(7.1.2.10.1 RSS Hash Function). For TCP/UDP it uses source and destination
addresses and ports. I suppose this could be smoothed over in testing by using
a spread of ports for the latency test.
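For reference, what a NIC hashes on and how many hardware queues it exposes
can be inspected with stock ethtool, roughly as below; eth0 is just a
placeholder for the interface under test:

  # Show hardware channel/queue counts (four combined queues on an i210).
  ethtool -l eth0
  # Show which header fields feed the RSS hash for TCP over IPv4.
  ethtool -n eth0 rx-flow-hash tcp4
  # Dump the RSS indirection table and hash key currently in use.
  ethtool -x eth0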
* Re: [Cake] [RFC PATCH 4/5] q_netem: support delivering packets in delayed time slots
From: Dave Taht @ 2017-11-18 19:02 UTC
To: Pete Heist; +Cc: Cake List

On Sat, Nov 18, 2017 at 5:18 AM, Pete Heist <peteheist@gmail.com> wrote:
>
> On Nov 17, 2017, at 11:55 PM, dave.taht@gmail.com wrote:
>
> Slotting is a crude approximation of the behaviors of shared media such
> as cable, wifi, and LTE, which gather up a bunch of packets within a
> varying delay window and deliver them, relative to that, nearly all at
> once.
>
> Nice…

Meh. It really is "crude", and I keep kicking around ways to somehow
emulate half (or less) duplex, variable rates around a mean, mcast, etc.

It IS very nice to have a rate limiter that actually behaves a bit more
like wifi, and I hope to also add the new ack filtering stuff to it.

> One of the things I also notice in my LAN tests is that latencies for
> different flows stay at more or less fixed (and different) positions
> relative to the mean in flent results. Those positions, and the mean, can
> change with each test run. Do you think this could result from the hashing
> to different hardware queues (four in my case) changing between test runs?

Yes, probably, if you are using bql. Is it sch_mq on top?

> And is it worth trying to simulate this effect, or not really?

Dunno. There are a couple of ways to turn it off.

> Just for info, in my case (Intel i210) the hashing is documented starting
> on page 254 of the specs:
> https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/i210-ethernet-controller-datasheet.pdf
> (7.1.2.10.1 RSS Hash Function). For TCP/UDP it uses source and destination
> addresses and ports. I suppose this could be smoothed over in testing by
> using a spread of ports for the latency test.

--
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
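A quick way to answer the "is it sch_mq on top?" question with only stock
tc and sysfs; a sketch, with eth0 again standing in for the interface (the
igb driver used by the i210 does implement BQL):

  # The root qdisc; an untouched multiqueue NIC typically reports "qdisc mq 0: root".
  tc qdisc show dev eth0
  # One tx-N directory per hardware transmit queue (tx-0 .. tx-3 on a 4-queue i210).
  ls /sys/class/net/eth0/queues/
  # BQL's current byte limit for the first transmit queue.
  cat /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit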
* Re: [Cake] [RFC PATCH 4/5] q_netem: support delivering packets in delayed time slots
From: Pete Heist @ 2017-11-19 18:48 UTC
To: Dave Taht; +Cc: Cake List

> On Nov 18, 2017, at 8:02 PM, Dave Taht <dave.taht@gmail.com> wrote:
>
> On Sat, Nov 18, 2017 at 5:18 AM, Pete Heist <peteheist@gmail.com> wrote:
>>
>> On Nov 17, 2017, at 11:55 PM, dave.taht@gmail.com wrote:
>>
>> Slotting is a crude approximation of the behaviors of shared media such
>> as cable, wifi, and LTE, which gather up a bunch of packets within a
>> varying delay window and deliver them, relative to that, nearly all at
>> once.
>>
>> Nice…
>
> Meh. It really is "crude", and I keep kicking around ways to somehow
> emulate half (or less) duplex, variable rates around a mean, mcast, etc.
>
> It IS very nice to have a rate limiter that actually behaves a bit more
> like wifi, and I hope to also add the new ack filtering stuff to it.

I guess there may never be a way to make this perfect, only to try to
reproduce the behavior that matters enough to make it usable for testing.

>> One of the things I also notice in my LAN tests is that latencies for
>> different flows stay at more or less fixed (and different) positions
>> relative to the mean in flent results. Those positions, and the mean, can
>> change with each test run. Do you think this could result from the hashing
>> to different hardware queues (four in my case) changing between test runs?
>
> Yes, probably, if you are using bql. Is it sch_mq on top?

Yep, with bql. I hadn't thought about mq before. What my qos setup scripts
are doing, though, is replacing the root qdisc (which I now see defaults to
mq) with a single cake instance. With bql, should I rather be leaving mq in
place and putting four cake instances underneath it?

>> And is it worth trying to simulate this effect, or not really?
>
> Dunno. There are a couple of ways to turn it off.

Fair enough...
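For concreteness, the two arrangements being compared would look roughly
like this; a sketch only, assuming a four-queue NIC, an interface named
eth0, a tc built with cake support, and the cake options quoted later in
the thread:

  # (a) replace the default mq root with a single cake instance
  tc qdisc replace dev eth0 root cake unlimited besteffort lan

  # (b) keep mq as the root and hang one cake instance off each hardware queue
  tc qdisc replace dev eth0 root handle 1: mq
  tc qdisc replace dev eth0 parent 1:1 cake unlimited besteffort lan
  tc qdisc replace dev eth0 parent 1:2 cake unlimited besteffort lan
  tc qdisc replace dev eth0 parent 1:3 cake unlimited besteffort lan
  tc qdisc replace dev eth0 parent 1:4 cake unlimited besteffort lan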
* Re: [Cake] [RFC PATCH 4/5] q_netem: support delivering packets in delayed time slots
From: Dave Taht @ 2017-11-19 20:41 UTC
To: Pete Heist; +Cc: Dave Taht, Cake List

Pete Heist <peteheist@gmail.com> writes:

>> On Nov 18, 2017, at 8:02 PM, Dave Taht <dave.taht@gmail.com> wrote:
>>
>> On Sat, Nov 18, 2017 at 5:18 AM, Pete Heist <peteheist@gmail.com> wrote:
>>>
>>> On Nov 17, 2017, at 11:55 PM, dave.taht@gmail.com wrote:
>>>
>>> Slotting is a crude approximation of the behaviors of shared media such
>>> as cable, wifi, and LTE, which gather up a bunch of packets within a
>>> varying delay window and deliver them, relative to that, nearly all at
>>> once.
>>>
>>> Nice…
>>
>> Meh. It really is "crude", and I keep kicking around ways to somehow
>> emulate half (or less) duplex, variable rates around a mean, mcast, etc.
>>
>> It IS very nice to have a rate limiter that actually behaves a bit more
>> like wifi, and I hope to also add the new ack filtering stuff to it.
>
> I guess there may never be a way to make this perfect, only to try to
> reproduce the behavior that matters enough to make it usable for testing.
>
>>> One of the things I also notice in my LAN tests is that latencies for
>>> different flows stay at more or less fixed (and different) positions
>>> relative to the mean in flent results. Those positions, and the mean, can
>>> change with each test run. Do you think this could result from the hashing
>>> to different hardware queues (four in my case) changing between test runs?
>>
>> Yes, probably, if you are using bql. Is it sch_mq on top?

Only if you want unlimited mode, and birthday problems, and don't have
cpu to burn.

> Yep, with bql. I hadn't thought about mq before. What my qos setup scripts
> are doing, though, is replacing the root qdisc (which I now see defaults to
> mq) with a single cake instance. With bql, should I rather be leaving mq in
> place and putting four cake instances underneath it?
>
>>> And is it worth trying to simulate this effect, or not really?
>>
>> Dunno. There are a couple of ways to turn it off.
>
> Fair enough...
* Re: [Cake] [RFC PATCH 4/5] q_netem: support delivering packets in delayed time slots
From: Pete Heist @ 2017-11-20 9:02 UTC
To: Dave Taht; +Cc: Dave Taht, Cake List

> On Nov 19, 2017, at 9:41 PM, Dave Taht <dave@taht.net> wrote:
>
>> Yep, with bql. I hadn't thought about mq before. What my qos setup scripts
>> are doing, though, is replacing the root qdisc (which I now see defaults to
>> mq) with a single cake instance. With bql, should I rather be leaving mq in
>> place and putting four cake instances underneath it?
>
> Only if you want unlimited mode, and birthday problems, and don't have
> cpu to burn.

Ok, if I compare the two, latency under load for rrul_be looks “better”
(~1-1.5ms instead of ~3ms) with a single “cake unlimited besteffort lan”
instance instead of four of them underneath mq.
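The comparison presumably came from a flent run along these lines; the exact
flags, hostname and test length here are illustrative rather than taken from
the thread:

  # rrul_be: realtime response under load, best-effort only, 60-second run
  flent rrul_be -l 60 -H netperf-server.example.net -t "single cake vs mq+cake" -o rrul_be.png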
* Re: [Cake] [RFC PATCH 4/5] q_netem: support delivering packets in delayed time slots
From: Dave Taht @ 2017-11-21 18:52 UTC
To: Pete Heist; +Cc: Dave Taht, Cake List

Pete Heist <peteheist@gmail.com> writes:

>> On Nov 19, 2017, at 9:41 PM, Dave Taht <dave@taht.net> wrote:
>>
>>> Yep, with bql. I hadn't thought about mq before. What my qos setup scripts
>>> are doing, though, is replacing the root qdisc (which I now see defaults to
>>> mq) with a single cake instance. With bql, should I rather be leaving mq in
>>> place and putting four cake instances underneath it?
>>
>> Only if you want unlimited mode, and birthday problems, and don't have
>> cpu to burn.
>
> Ok, if I compare the two, latency under load for rrul_be looks “better”
> (~1-1.5ms instead of ~3ms) with a single “cake unlimited besteffort lan”
> instance instead of four of them underneath mq.

Yep. Adding more unmanaged queues doesn't help. However, it should spread
the load across more cpus.
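To see that spreading in practice, per-queue statistics and interrupt
placement are visible without extra tooling; a sketch, again with eth0 as a
placeholder and with interrupt naming that varies by driver:

  # Per-child qdisc statistics when cake (or anything else) sits under mq.
  tc -s qdisc show dev eth0
  # Bytes currently in flight per transmit queue according to BQL.
  grep . /sys/class/net/eth0/queues/tx-*/byte_queue_limits/inflight
  # Which CPUs service each queue's interrupts.
  grep eth0 /proc/interrupts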
* [Cake] Cleaning up cake
From: Dave Taht @ 2017-11-20 2:56 UTC
To: cake

I just finished a brutal exercise in making the cobalt branch of cake
vastly more checkpatch compliant. (I haven't checked it in yet.)

What I am inclined to do next is move it to sch_cobalt.c, and reimport
(somehow) sch_cake.c from the stable commit that is currently shipping in
lede (?), and move that forward to also gain ack filtering. Then add a
q_cobalt equivalent to my iproute2 so we can build, test, and compare the
two versions, sanely.

I'd call this new branch "kobold". Or I'd find some way to fix head.
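For anyone wanting to repeat that exercise, checkpatch ships with the kernel
tree and can be pointed at a standalone source file; the invocation below
assumes a kernel checkout as the working directory and the cake source
copied alongside it:

  # Kernel style checker over an out-of-tree file such as the cake qdisc.
  ./scripts/checkpatch.pl --no-tree -f sch_cake.c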
* [Cake] [RFC PATCH 0/5] patches for iproute2-net-next for netem slotting and cake
From: Dave Taht @ 2017-11-17 21:19 UTC
To: cake

This has some work in progress towards finalizing the new netem 'slot'
feature (it still needs to be made backward compatible with old kernels)
and a forward port of the existing tc-adv cake code to the latest iproute2.

Dave Taht (5):
  Update to the actual net-next
  Add cake pkt_sched.h
  tc: support conversions to or from 64 bit nanosecond-based time
  q_netem: support delivering packets in delayed time slots
  netem: add documentation for the new slotting feature

 include/uapi/linux/pkt_sched.h | 68 ++++++++++++++++++++++++++++++++++++++++++
 man/man8/tc-netem.8            | 32 +++++++++++++++++++-
 tc/q_netem.c                   | 55 +++++++++++++++++++++++++++++++++-
 tc/tc_util.c                   | 60 +++++++++++++++++++++++++++++++++++++
 tc/tc_util.h                   |  3 ++
 5 files changed, 216 insertions(+), 2 deletions(-)

--
2.7.4
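A plausible way to try the series (the patch file names are assumptions; the
built tc binary ends up under tc/ in the iproute2 tree):

  # From an iproute2 net-next checkout, with the five patches saved as mbox files.
  git am 000*.patch
  make -j"$(nproc)"
  # Exercise the new slot keyword with the freshly built tc.
  ./tc/tc qdisc add dev eth0 root netem delay 200us slot 800us 10ms bytes 64k packets 42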
* [Cake] [RFC PATCH 4/5] q_netem: support delivering packets in delayed time slots
From: Dave Taht @ 2017-11-17 21:19 UTC
To: cake

Slotting is a crude approximation of the behaviors of shared media such
as cable, wifi, and LTE, which gather up a bunch of packets within a
varying delay window and deliver them, relative to that, nearly all at
once.

It works within the existing loss, duplication, jitter and delay
parameters of netem. Some amount of inherent latency must be specified,
regardless.

The new "slot" parameter specifies a minimum and maximum delay between
transmission attempts. The "bytes" and "packets" parameters can be used
to limit the amount of information transferred per slot.

Examples of use:

tc qdisc add dev eth0 root netem delay 200us \
	slot 800us 10ms bytes 64k packets 42

A more correct example, using stacked netem instances and a packet limit
to emulate a tail drop wifi queue with slots and variable packet delivery,
with a 200Mbit isochronous underlying rate, and 20ms path delay:

tc qdisc add dev eth0 root handle 1: netem delay 20ms rate 200mbit \
	limit 10000
tc qdisc add dev eth0 parent 1:1 handle 10:1 netem delay 200us \
	slot 800us 10ms bytes 64k packets 42 limit 512
---
 tc/q_netem.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 54 insertions(+), 1 deletion(-)

diff --git a/tc/q_netem.c b/tc/q_netem.c
index 82eb46f..1524788 100644
--- a/tc/q_netem.c
+++ b/tc/q_netem.c
@@ -40,7 +40,10 @@ static void explain(void)
 "                 [ loss gemodel PERCENT [R [1-H [1-K]]]\n" \
 "                 [ ecn ]\n" \
 "                 [ reorder PRECENT [CORRELATION] [ gap DISTANCE ]]\n" \
-"                 [ rate RATE [PACKETOVERHEAD] [CELLSIZE] [CELLOVERHEAD]]\n");
+"                 [ rate RATE [PACKETOVERHEAD] [CELLSIZE] [CELLOVERHEAD]]\n" \
+"                 [ slot MIN_DELAY MAX_DELAY [packets MAX_PACKETS]" \
+"                        [bytes MAX_BYTES]]\n" \
+		);
 }
 
 static void explain1(const char *arg)
@@ -178,6 +181,7 @@ static int netem_parse_opt(struct qdisc_util *qu, int argc, char **argv,
 	struct tc_netem_gimodel gimodel;
 	struct tc_netem_gemodel gemodel;
 	struct tc_netem_rate rate = {};
+	struct tc_netem_slot slot = {};
 	__s16 *dist_data = NULL;
 	__u16 loss_type = NETEM_LOSS_UNSPEC;
 	int present[__TCA_NETEM_MAX] = {};
@@ -421,6 +425,36 @@ static int netem_parse_opt(struct qdisc_util *qu, int argc, char **argv,
 					return -1;
 				}
 			}
+		} else if (matches(*argv, "slot") == 0) {
+			NEXT_ARG();
+			present[TCA_NETEM_SLOT] = 1;
+			if (get_time64(&slot.min_delay, *argv)) {
+				explain1("slot min_delay");
+				return -1;
+			}
+			if (NEXT_IS_NUMBER()) {
+				NEXT_ARG();
+				if (get_time64(&slot.max_delay, *argv)) {
+					explain1("slot min_delay max_delay");
+					return -1;
+				}
+			}
+			if (slot.max_delay < slot.min_delay)
+				slot.max_delay = slot.min_delay;
+		} else if (matches(*argv, "packets") == 0) {
+			NEXT_ARG();
+			if (get_s32(&slot.max_packets, *argv, 0)) {
+				explain1("slot packets");
+				return -1;
+			}
+		} else if (matches(*argv, "bytes") == 0) {
+			unsigned int max_bytes;
+			NEXT_ARG();
+			if (get_size(&max_bytes, *argv)) {
+				explain1("slot bytes");
+				return -1;
+			}
+			slot.max_bytes = (int) max_bytes;
 		} else if (strcmp(*argv, "help") == 0) {
 			explain();
 			return -1;
@@ -481,6 +515,10 @@ static int netem_parse_opt(struct qdisc_util *qu, int argc, char **argv,
 	    addattr_l(n, 1024, TCA_NETEM_CORRUPT, &corrupt, sizeof(corrupt)) < 0)
 		return -1;
 
+	if (present[TCA_NETEM_SLOT] &&
+	    addattr_l(n, 1024, TCA_NETEM_SLOT, &slot, sizeof(slot)) < 0)
+		return -1;
+
 	if (loss_type != NETEM_LOSS_UNSPEC) {
 		struct rtattr *start;
 
@@ -535,6 +573,7 @@ static int netem_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
 	int *ecn = NULL;
 	struct tc_netem_qopt qopt;
 	const struct tc_netem_rate *rate = NULL;
+	const struct tc_netem_slot *slot = NULL;
 	int len;
 	__u64 rate64 = 0;
 
@@ -595,6 +634,11 @@ static int netem_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
 				return -1;
 			rate64 = rta_getattr_u64(tb[TCA_NETEM_RATE64]);
 		}
+		if (tb[TCA_NETEM_SLOT]) {
+			if (RTA_PAYLOAD(tb[TCA_NETEM_SLOT]) < sizeof(*slot))
+				return -1;
+			slot = RTA_DATA(tb[TCA_NETEM_SLOT]);
+		}
 	}
 
 	fprintf(f, "limit %d", qopt.limit);
@@ -668,6 +712,15 @@ static int netem_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
 			fprintf(f, " celloverhead %d", rate->cell_overhead);
 	}
 
+	if (slot) {
+		fprintf(f, " slot %s", sprint_time64(slot->min_delay, b1));
+		fprintf(f, " %s", sprint_time64(slot->max_delay, b1));
+		if(slot->max_packets)
+			fprintf(f, " packets %d", slot->max_packets);
+		if(slot->max_bytes)
+			fprintf(f, " bytes %d", slot->max_bytes);
+	}
+
 	if (ecn)
 		fprintf(f, " ecn ");
 
-- 
2.7.4
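A related note for testing: a root netem only shapes egress, so to subject
inbound traffic to the same slotted emulation the usual ifb redirect can be
used. This is a generic sketch, not part of the patch; interface names and
the netem parameters (reused from the examples above) are placeholders:

  # Create an ifb device and bring it up.
  modprobe ifb numifbs=1
  ip link set dev ifb0 up
  # Steer all ingress traffic on eth0 through ifb0.
  tc qdisc add dev eth0 handle ffff: ingress
  tc filter add dev eth0 parent ffff: protocol all u32 match u32 0 0 \
	action mirred egress redirect dev ifb0
  # Apply the slotted netem emulation to what is now ifb0 egress.
  tc qdisc add dev ifb0 root netem delay 20ms rate 200mbit limit 10000 \
	slot 800us 10ms bytes 64k packets 42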