Date: Sat, 11 Apr 2015 19:33:59 -0700
Message-ID: <CAA93jw500bsKEBuCbp+p_=dXfG4LhHtmGN-Q-H=tpsOBZ-rqaw@mail.gmail.com>
From: Dave Taht <dave.taht@gmail.com>
To: cake@lists.bufferbloat.net
Subject: Re: [Cake] cake exploration

17) The ATM compensation in cake is entirely untested, and it is
unclear how best to handle PPPoE.
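
A minimal sketch of the cell arithmetic that any ATM compensation has to
get right: ATM carries 48 bytes of payload in each 53 byte cell, so every
packet is rounded up to a whole number of cells on the wire. The function
name and the `overhead` parameter are illustrative assumptions, not cake's
actual code; the right per-packet overhead (AAL5, PPPoE and so on) depends
on the link.

```python
ATM_CELL_BYTES = 53      # size of one ATM cell on the wire
ATM_PAYLOAD_BYTES = 48   # payload bytes carried by each cell


def atm_wire_bytes(packet_len, overhead=0):
    """Bytes occupied on an ATM link by a packet of packet_len bytes.

    overhead is a hypothetical per-packet encapsulation allowance;
    its correct value depends on how the link is configured.
    """
    total = packet_len + overhead
    # Round up to a whole number of cells using integer arithmetic.
    cells = (total + ATM_PAYLOAD_BYTES - 1) // ATM_PAYLOAD_BYTES
    return cells * ATM_CELL_BYTES
```

A 49 byte packet already needs two cells (106 bytes on the wire), which is
why small packets are hit hardest when the shaper ignores cell framing.
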

On Sat, Apr 11, 2015 at 12:12 PM, Dave Taht <dave.taht@gmail.com> wrote:
> 16) Better VPN handling
>
> 5 flows of encapsulated vpn traffic compete badly with 5 flows of
> (say) bittorrent traffic.
>
> We could give the ipsec form of vpn traffic a boost by explicitly
> recognising and classifying the AH and ESP headers into a different
> diffserv bin.
>
> Similarly, tinc and openvpn could use a port match.
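
The classification described above can be sketched as follows. ESP and AH
are IP protocols 50 and 51; 1194 and 655 are only the default ports for
openvpn and tinc, so a port match misses any deployment that changed them.
The function name and the returned labels are assumptions made for
illustration, and which diffserv bin each label maps to is left open.

```python
IPPROTO_TCP = 6
IPPROTO_UDP = 17
IPPROTO_ESP = 50   # IPsec Encapsulating Security Payload
IPPROTO_AH = 51    # IPsec Authentication Header

# Default ports only; real deployments frequently use other ports.
VPN_DEFAULT_PORTS = {1194: "openvpn", 655: "tinc"}


def classify_vpn(ip_proto, dst_port=None):
    """Return a label for recognised VPN traffic, or None otherwise."""
    if ip_proto in (IPPROTO_ESP, IPPROTO_AH):
        return "ipsec"
    if ip_proto in (IPPROTO_TCP, IPPROTO_UDP):
        return VPN_DEFAULT_PORTS.get(dst_port)
    return None
```
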
>
>
>
> On Sat, Apr 11, 2015 at 11:48 AM, Dave Taht <dave.taht@gmail.com> wrote:
>> 15) Needs to work so an ISP can create service classes for their customers
>>
>> DRR 1: cake bandwidth X
>> DRR 2: cake bandwidth Y
>> DRR 3:
>>
>> I have no idea whether this can work at all; last I tried it, DRR would
>> stall every time fq_codel had no packets to deliver.
>>
>> A related issue is that there needs to be a way to have a tc or
>> iptables filter map multiple addresses and prefixes to a single
>> distinct integer for placement into such queue systems, so that a
>> lookup of someone that has x.y.x.z, q.f.b.a/24, j:k:h/64 and
>> l:m:n/48 can return a single integer representing the customer, which
>> can then be fed into the correct sub-qdisc above.
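
A userspace sketch of that lookup, assuming a simple longest-prefix match
over a list; a real tc or iptables filter would need a proper trie or set
type. The table layout, function names and the documentation-range
addresses are all made up for illustration.

```python
import ipaddress


def build_customer_table(assignments):
    """assignments: iterable of (prefix string, customer id) pairs."""
    return [(ipaddress.ip_network(prefix), cid) for prefix, cid in assignments]


def customer_for(table, addr):
    """Return the customer id of the longest matching prefix, or None."""
    ip = ipaddress.ip_address(addr)
    best = None
    for net, cid in table:
        if ip.version == net.version and ip in net:
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, cid)
    return best[1] if best else None


# One customer (id 1) holding a host address, a /24, a /64 and a /48.
TABLE = build_customer_table([
    ("192.0.2.7/32", 1),
    ("198.51.100.0/24", 1),
    ("2001:db8:1:2::/64", 1),
    ("2001:db8:100::/48", 1),
    ("203.0.113.0/24", 2),
])
```

Every address owned by customer 1 resolves to the same integer, which is
what would select the per-customer sub-qdisc.
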
>>
>> I think, but am not sure, that is all my backlog!
>>
>> On Sat, Apr 11, 2015 at 11:47 AM, Dave Taht <dave.taht@gmail.com> wrote:
>>> 14) strict priority queues. Some CBR techniques, notably IPTV, want 0
>>> packet loss, but run at a rate determined by the provider to be below
>>> what the subscriber will use. Sharing that "fairly" will lead to loss
>>> of packets to those applications.
>>>
>>> I do not like strict priority queues. I would prefer, for example,
>>> that the CBR application be marked with ECN, and ignored, vs the
>>> high probability someone will abuse a strict priority queue.
>>>
>>>
>>>
>>> On Sat, Apr 11, 2015 at 11:45 AM, Dave Taht <dave.taht@gmail.com> wrote:
>>>> 12) Better starting interval and target for codel's maintenance vars in
>>>> relationship to existing flows
>>>>
>>>> Right now sch_fq, sch_pie give priority to flows in their first IW
>>>> phases. This makes them vulnerable to DDOS attacks with tons of new
>>>> flows.
>>>>
>>>> sch_fq_codel mitigates this somewhat by starting to hash flows into
>>>> the same buckets.
>>>>
>>>> sch_cake's more perfect hashing gives IW more of a boost.
>>>>
>>>> A thought was to do a combined ewma of all active flows and to hand
>>>> their current codel settings to new flows as they arrive, with less of
>>>> a boost.
>>>>
>>>> This MIGHT work better when you have short RTTs generally on local
>>>> networks. Other thoughts appreciated.
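
One way to picture the combined EWMA idea, as a rough sketch only: keep a
running average of the codel drop count seen on active flows and seed each
new flow with a fraction of it. The class name, the weight of 1/8 and the
seeding fraction are assumptions picked for illustration, not anything
taken from cake.

```python
class CountEwma:
    """Exponentially weighted moving average of per-flow codel counts."""

    def __init__(self, weight=0.125):
        self.weight = weight   # hypothetical smoothing factor
        self.avg = 0.0

    def update(self, count):
        """Fold one active flow's current count into the average."""
        self.avg += self.weight * (count - self.avg)
        return self.avg

    def initial_count(self, fraction=0.5):
        """Starting count for a new flow: a fraction of the average."""
        return int(self.avg * fraction)
```

A new flow seeded this way starts closer to the drop rate its neighbours
are already seeing instead of starting from zero.
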
>>>>
>>>> There is another related problem in the resumption portion of the
>>>> algorithm as the decay of the existing state variables is arbitrary
>>>> and way too long in some cases. I think I had solved this by coming up
>>>> with an estimate for the amount of decay needed other than count - 2,
>>>> doing a calculation from the last time a flow had packets to the next,
>>>> but can't remember how I did it! It is easy if you have a last time
>>>> per queue and use a normal sqrt with a divide... but my brain crashes
>>>> at the reciprocal cache math we have instead....
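
For reference, the "normal sqrt with a divide" form of codel's control law
schedules the next drop at t + interval / sqrt(count). The decay function
below is NOT the forgotten calculation from the message; it is one
hypothetical estimate, assuming a last-active time per queue, that walks
the count back down by one for each drop interval that fit into the idle
period.

```python
import math


def control_law(t, interval, count):
    """Time of the next drop under codel's control law."""
    return t + interval / math.sqrt(count)


def decayed_count(count, idle_time, interval):
    """Hypothetical decay: undo one count per drop interval spent idle."""
    while count > 1:
        step = interval / math.sqrt(count)
        if idle_time < step:
            break
        idle_time -= step
        count -= 1
    return count
```

With interval 100 and count 4, an idle period of 50 walks the count back to
3, and a long idle period walks it all the way back to 1.
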
>>>>
>>>> I am not allergic to a divide. I am not allergic to using a shift for
>>>> the target and calculating the interval only relative to bandwidth, as
>>>> mentioned elsewhere. At 64k worth of bandwidth we just end up with a
>>>> huge interval, no big deal. But plan to ride along with the two
>>>> separately for now.
>>>>
>>>> 13) It might be possible to write a faster, easier to read codel
>>>> by using a case statement on the 2 core variables in it. The current
>>>> code does not show the 3 way state machine as well as that could, and
>>>> for all I know there is something intelligent we could do with the 4th
>>>> state.
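
A sketch of what such a case statement might look like, written as a
lookup table. It assumes the "2 core variables" are the `dropping` flag and
the `ok_to_drop` result from dequeue; the message does not name them, and
the action labels are invented here. Three combinations do real work, and
the fourth simply delivers the packet.

```python
# (dropping, ok_to_drop) -> action; labels are illustrative only.
CODEL_ACTIONS = {
    (True, False): "leave_dropping",   # sojourn time fell below target
    (True, True): "drop_if_due",       # drop when now >= drop_next
    (False, True): "enter_dropping",   # above target for a full interval
    (False, False): "deliver",         # the 4th state: nothing to do
}


def codel_action(dropping, ok_to_drop):
    """Return the action for the current pair of state variables."""
    return CODEL_ACTIONS[(bool(dropping), bool(ok_to_drop))]
```
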
>>>>
>>>> On Sat, Apr 11, 2015 at 11:44 AM, Dave Taht <dave.taht@gmail.com> wrote:
>>>>> Stuff on my backlog of researchy stuff.
>>>>>
>>>>> 1) cake_drop_monitor - I wanted a way to throw drop AND mark
>>>>> notifications up to userspace, including the packet's time of entry
>>>>> and the time of drop, as well as the IP headers and next hop
>>>>> destination macaddr.
>>>>>
>>>>> There are many use cases for this:
>>>>>
>>>>> A) Testing the functionality of the algorithm and being able to
>>>>> collect and analyze drops as they happen.
>>>>>
>>>>> NET_DROP_MONITOR did not cut it but I have not looked at it in a year.
>>>>> It drives me crazy to be dropping packets all over the system and to
>>>>> not be able to track down where they happened.
>>>>>
>>>>> This is the primary reason why I had switched back to 64 bit timestamps, btw.
>>>>>
>>>>> B) Having the drop notifications might be useful in tuning or steering
>>>>> traffic to different routes.
>>>>>
>>>>> C) It is way easier to do a graph of the drop pattern with this info
>>>>> thrown to userspace.
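
To make the notification contents concrete, here is a hypothetical record
layout carrying the fields listed in 1): 64 bit entry and event timestamps,
a drop-or-mark flag, the next hop MAC and the raw IP header. No such
cake_drop_monitor format exists in the message; the layout and function
names are assumptions.

```python
import struct

# enqueue_ns, event_ns, marked flag, next hop MAC, IP header length
_FIXED = struct.Struct("!QQB6sH")


def pack_event(enqueue_ns, event_ns, marked, next_hop_mac, ip_header):
    """Serialise one drop or mark notification."""
    head = _FIXED.pack(enqueue_ns, event_ns, int(marked),
                       next_hop_mac, len(ip_header))
    return head + ip_header


def unpack_event(buf):
    """Parse a notification and derive the packet's sojourn time."""
    enq, ev, marked, mac, hlen = _FIXED.unpack_from(buf)
    header = buf[_FIXED.size:_FIXED.size + hlen]
    return {
        "enqueue_ns": enq,
        "event_ns": ev,
        "sojourn_ns": ev - enq,
        "marked": bool(marked),
        "next_hop_mac": mac,
        "ip_header": header,
    }
```

Graphing the drop pattern in userspace then reduces to plotting
`event_ns` against `sojourn_ns` per record.
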
>>>>>
>>>>> 2) Dearly wanted to actually be doing the timestamping AND hashing in
>>>>> the native skb struct on entry to the system itself, not the qdisc.
>>>>> Measuring the latency from ingress from the wire to egress would
>>>>> result in much better cpu overload behavior. I am totally aware of
>>>>> how much mainline linux would not take this option, but things have
>>>>> evolved over there, so leveraging the rxhash and skb->timestamp
>>>>> fields seems a possibility...
>>>>>
>>>>> I think this would let us get along better with netem also, but would
>>>>> have to go look again.
>>>>>
>>>>> Call that cake-rxhash. :)
>>>>>
>>>>> 3) In my benchmark of the latest cake3, ecn traffic was not as good as
>>>>> expected, but that might have been an anomaly of the test. Need to
>>>>> test ecn thoroughly this time, almost in preference to looking at drop
>>>>> behavior. Toke probably has ecn off by default right now. On, after
>>>>> this test run?
>>>>>
>>>>> 4) Testing higher rates and looking at cwnd for codel is important.
>>>>> The dropoff Toke noted in his paper is real. Also there is possibly
>>>>> some ideal ratio between number of flows and bandwidth that makes more
>>>>> sense than a fixed number of flows. Also I keep harping on the darn
>>>>> resumption algo... but need to test with lousier tcps like windows.
>>>>>
>>>>> 5) Byte Mode-ish handling
>>>>>
>>>>> Dropping a single 64 byte packet does little good. You will find in
>>>>> the 50 flow tests that a ton of traffic is acks, not being dropped,
>>>>> and pie does better in this case than does fq, as it shoots
>>>>> wildly at everything, but usually misses the fat packets, where DRR
>>>>> will merrily store up an entire MTU worth of useless acks when only
>>>>> one is needed.
>>>>>
>>>>> So just trying to drop more little packets might be helpful in some cases.
>>>>>
>>>>> 6) Ack thinning. I gave what is conventionally called "stretch acks" a
>>>>> new name, as stretch acks have a deserved reputation as sucking. Well,
>>>>> they don't suck anymore in linux, and what I was mostly thinking was
>>>>> to drop no more than 2 in a row...
>>>>>
>>>>> One thing this would help with is in packing wifi aggregates - which
>>>>> have hard limits on the number of packets in a TXOP (42), and a byte
>>>>> limit on wireless n of 64k. Sending 41 acks from one flow, when you
>>>>> could send the last 2, seems like a big win on packing a TXOP.
>>>>>
>>>>> (this is something eric proposed, and given the drop rates we now see
>>>>> from wifi and the wild and wooly internet I am inclined to agree that
>>>>> it is worth fiddling with)
>>>>>
>>>>> (I am not huge on it, though)
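
A small sketch of the "drop no more than 2 in a row" rule for one flow's
queued packets. The input is a list of booleans saying whether each queued
packet is a pure ack; the function name, the `max_run` parameter and the
choice to always keep the newest packet are assumptions for illustration.

```python
def thin_acks(is_pure_ack, max_run=2):
    """Return the indices of the packets to keep, in queue order.

    Pure acks are dropped, but never more than max_run in a row, and the
    last packet in the queue is always kept so the newest ack survives.
    """
    keep = []
    run = 0                      # consecutive acks dropped so far
    last = len(is_pure_ack) - 1
    for i, ack in enumerate(is_pure_ack):
        if ack and i != last and run < max_run:
            run += 1             # thin this ack out
        else:
            keep.append(i)
            run = 0
    return keep
```

Seven queued acks shrink to three, which leaves room in the TXOP for
packets from other flows.
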
>>>>>
>>>>> 7) Macaddr hashing on the nexthop instead of the 5tuple. When used on
>>>>> an internal, switched network, it would be better to try and maximize
>>>>> the port usage rather than the 5 tuple in some cases.
>>>>>
>>>>> I have never got around to writing a mac hash I liked, my goal
>>>>> originally was to write one that found a minimal perfect hash solution
>>>>> eventually as mac addrs tend to be pretty stable on a network and
>>>>> rarely change.
>>>>>
>>>>> Warning: minimal perfect hash attempts are a wet paint thing! I really
>>>>> want a FPGA solver for them.... don't go play with the code out there,
>>>>> you will lose days to it... you have been warned.
>>>>>
>>>>> http://cmph.sourceforge.net/concepts.html
>>>>>
>>>>> I would like there to be a generic mac hashing thing in tc, actually.
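
As a baseline for comparison, hashing the next hop MAC into a queue index
can be as simple as the sketch below. It uses CRC32 purely as a stand-in
hash and is not a minimal perfect hash, so two MACs can still collide; the
function name and the bucket count of 1024 are assumptions.

```python
import zlib


def mac_bucket(mac, buckets=1024):
    """Map a MAC address string to a queue index in [0, buckets)."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("expected a 48 bit MAC address")
    return zlib.crc32(raw) % buckets
```
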
>>>>>
>>>>> 8) Parallel FIB lookup
>>>>>
>>>>> If you assume that you have tons of queues routing packets from
>>>>> ingress to egress, on tons of cpus, you can actually do the FIB lookup
>>>>> in parallel also. There is some old stuff on virtualqueue
>>>>> and virtual clock fqing which makes for tighter
>>>>>
>>>>> 9) Need a codel *library* that works at the mac80211 layer. I think
>>>>> codel*.h suffices but am not sure. And for that matter, codel itself
>>>>> seems like it would need a calculated target and a few other things to
>>>>> work right on wifi.
>>>>>
>>>>> As for the hashing...
>>>>>
>>>>> Personally I do not think that the 8 way set associative hash is what
>>>>> wifi needs for cake, I tend to think we need to "pack" aggregates with
>>>>> as many different flows as possible, and randomize how we pack
>>>>> them... I think.... maybe....
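
A sketch of the packing idea: take one packet from each flow in turn until
the aggregate is full, so a 42 packet TXOP carries as many distinct flows
as possible. The randomisation mentioned above is left out to keep the
example deterministic; the function name and the list-of-lists input are
assumptions, and the byte limit is ignored.

```python
from collections import deque


def pack_aggregate(flows, max_packets=42):
    """Round-robin packets from each flow into one aggregate."""
    queues = deque(deque(f) for f in flows if f)
    aggregate = []
    while queues and len(aggregate) < max_packets:
        q = queues.popleft()
        aggregate.append(q.popleft())
        if q:                    # flow still has packets: back of the line
            queues.append(q)
    return aggregate
```
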
>>>>>
>>>>> 10) I really don't like BQL with multi-queued hardware queues. More
>>>>> backpressure is needed in that case than we get.
>>>>>
>>>>> 11) GRO peeling
>>>>>
>>>>> Offloads suck
>>>>>
>>>>> --
>>>>> Dave Täht
>>>>> Let's make wifi fast, less jittery and reliable again!
>>>>>
>>>>> https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb


-- 
Dave Täht
Let's make wifi fast, less jittery and reliable again!

https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb