From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mail-wi0-x231.google.com (mail-wi0-x231.google.com [IPv6:2a00:1450:400c:c05::231])
	(using TLSv1 with cipher RC4-SHA (128/128 bits))
	(Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK))
	by huchra.bufferbloat.net (Postfix) with ESMTPS id D039821F28B
	for ; Mon, 6 Oct 2014 03:07:41 -0700 (PDT)
Received: by mail-wi0-f177.google.com with SMTP id fb4so4015738wid.16
	for ; Mon, 06 Oct 2014 03:07:39 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:from:date:message-id:subject:to:content-type;
	bh=BIZDsrH6pkgFC6K3utHr3W+EWbCpA9BnGM4yUrWOmlc=;
	b=c9BzsIk+71uj89dvIWGprciLPVYYHEHL4xRaadypxb4B59r3XXP3JalnqT/NGRXEr7
	XiAPZWAJ771Mjs8L1W56L+Ufd1b9b6xc3+fCESh0lVkSZ2rM12HxAFtqQ/D9jsGzIB1U
	35o3vpcCT1yTNYpKSR9YiH7D+eI2YWyOq0bTDaRzk2V7HnQhg/g1r3yJHqX8XSGkMSNB
	S7l+XaX9Vp9CB86BpHvKf3TDs9Dbmp21hbm9VZjDiQsrL8mtvU+uMbmBGI1TqA/GpRTV
	uStXQ0/rPJgmnOkJD9vdDDw92HMj0qlkD6QxdLAV3xheLeTbqp+6jeEslSuW7XSF3mCF
	HXuw==
X-Received: by 10.180.11.196 with SMTP id s4mr16584313wib.16.1412590059262;
	Mon, 06 Oct 2014 03:07:39 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.216.16.7 with HTTP; Mon, 6 Oct 2014 03:07:19 -0700 (PDT)
From: Richard Edmands
Date: Mon, 6 Oct 2014 21:07:19 +1100
To: cerowrt-devel@lists.bufferbloat.net
Content-Type: multipart/alternative; boundary=001a11c23788b217260504be403f
Subject: [Cerowrt-devel] possible ingress shaping bug?
X-BeenThere: cerowrt-devel@lists.bufferbloat.net
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: Development issues regarding the cerowrt test router project
X-List-Received-Date: Mon, 06 Oct 2014 10:08:10 -0000

--001a11c23788b217260504be403f
Content-Type: text/plain; charset=UTF-8

Greetings Everyone!

As part of research into a related part of the kernel we ran across an oddity with nfct on ifb devices.

To give weight to the existence of the oddity, a Debian Jessie system (so a 3.16.3 based kernel) was set up with WAN and LAN ports.

A modified version of the Gentoo fq_codel script was used (WAN is eth0 and the associated IFB is ifb0). The script is included below.

The kernel had 2 printk's inserted (see the included change to cls_flow.c) to print out the IP addresses which were being returned to the flow classifier for the nfct-src and nfct-dst keys.

The nfct-src printk, attached to eth0, was heard from. It returned the right answers and some nonsense, as expected.

The nfct-dst printk, attached to ifb0, didn't return a thing, not even nonsense.

------------------------------------------------------------------------------------------------------

(Wild speculation)

Maybe the conntrack lookup only works correctly from the right device, so the packet on ifb0 is being looked up in conntrack as if it belonged to a connection on ifb0 instead of a connection on eth0?

---------------------------------------------------------------------------

#!/bin/sh

## Paths and definitions
tc=/sbin/tc
IPT=iptables

#The interface name of the internet link
ext=eth0

#Ignore this
ext_ingress=ifb0

# Set these as per your adsl sync rate, at 90% to start with, modify up and down until the tests work.
ext_up=40000kbit
ext_down=40000kbit

ethtool -K eth0 tso off gro off gso off
ethtool -K eth1 tso off gro off gso off

modprobe ifb
modprobe sch_fq_codel
modprobe act_mirred
modprobe cls_flow

# Clear old queuing disciplines (qdisc) on the interfaces
$tc qdisc del dev $ext root
$tc qdisc del dev $ext ingress
$tc qdisc del dev $ext_ingress root
$tc qdisc del dev $ext_ingress ingress

#########
# INGRESS
#########

# Create ingress on external interface
$tc qdisc add dev $ext handle ffff: ingress
ifconfig $ext_ingress up # if the interface is not up bad things happen

# Forward all ingress traffic to the IFB device
$tc filter add dev $ext parent ffff: protocol all u32 match u32 0 0 action mirred egress redirect dev $ext_ingress

# Create an EGRESS filter on the IFB device
#You can play with the r2q value, higher for faster links, lower for slower. FIXME? Totally need some maths and tables for the correct values in various situations. Is being too high safer than being too low?
$tc qdisc add dev $ext_ingress root handle 1: htb default 12 r2q 5

# Add root class HTB with rate limiting
$tc class add dev $ext_ingress parent 1: classid 1:1 htb rate $ext_down overhead 40 mtu 1492 mpu 53 linklayer atm
$tc class add dev $ext_ingress parent 1:1 classid 1:11 htb rate $ext_down prio 0 overhead 40 mtu 1492 mpu 53 linklayer atm
$tc class add dev $ext_ingress parent 1:1 classid 1:12 htb rate $ext_down prio 0 overhead 40 mtu 1492 mpu 53 linklayer atm

#This is where fq_codel is installed for downlink traffic
$tc qdisc add dev $ext_ingress parent 1:11 fq_codel noecn target 50ms interval 300ms quantum 512 flows 1024 limit 3000
$tc qdisc add dev $ext_ingress parent 1:12 fq_codel noecn target 50ms interval 300ms quantum 512 flows 1024 limit 3000

#FIXME? Really need to find/have a discussion and determine good values for this
$tc filter add dev $ext_ingress protocol ip parent 1: handle 11 flow hash keys nfct-dst divisor 1024 baseclass 1:11
$tc filter add dev $ext_ingress protocol ip parent 1: handle 12 flow hash keys nfct-dst divisor 1024 baseclass 1:12

# Yes I KNOW this does not work.
$tc filter add dev $ext_ingress protocol ip parent 1: prio 2 u32 match ip dst 172.30.200.2 flowid 1:11

#########
# EGRESS
#########

# Add FQ_CODEL to EGRESS on external interface
#You can play with the r2q value, higher for faster links, lower for slower. FIXME? Totally need some maths and tables for the correct values in various situations. Is being too high safer than being too low?
$tc qdisc add dev $ext root handle 1: htb default 12 r2q 1

# Add root class HTB with rate limiting
$tc class add dev $ext parent 1: classid 1:1 htb rate $ext_up overhead 40 mtu 1492 mpu 53 linklayer atm
$tc class add dev $ext parent 1:1 classid 1:11 htb rate 70kbit ceil $ext_up overhead 40 mtu 1492 mpu 53 linklayer atm
$tc class add dev $ext parent 1:1 classid 1:12 htb rate 30kbit ceil $ext_up overhead 40 mtu 1492 mpu 53 linklayer atm

#This is where fq_codel is installed for uplink traffic
$tc qdisc add dev $ext parent 1:11 handle 11: fq_codel noecn target 100ms interval 300ms quantum 300 flows 1024 limit 3000
$tc qdisc add dev $ext parent 1:12 handle 12: fq_codel noecn target 100ms interval 300ms quantum 300 flows 1024 limit 3000

$tc filter add dev $ext protocol ip parent 1: handle 11 flow hash keys nfct-src divisor 1024 baseclass 1:11
$tc filter add dev $ext protocol ip parent 1: handle 12 flow hash keys nfct-src divisor 1024 baseclass 1:12

$tc filter add dev $ext protocol ip parent 1: prio 2 u32 match ip src 172.30.200.2 flowid 1:11

----------------------------------------------------------------------------------------------------------------------------

--- linux-3.16.3.orig/net/sched/cls_flow.c	2014-09-18 03:22:16.000000000 +1000
+++ linux-3.16.3/net/sched/cls_flow.c	2014-10-04 13:01:56.000000000 +1000
@@ -143,6 +143,7 @@
 {
 	switch (skb->protocol) {
 	case htons(ETH_P_IP):
+printk(KERN_WARNING "nfct src %d \n", (ntohl(CTTUPLE(skb, src.u3.ip))) );
 		return ntohl(CTTUPLE(skb, src.u3.ip));
 	case htons(ETH_P_IPV6):
 		return ntohl(CTTUPLE(skb, src.u3.ip6[3]));
@@ -155,6 +156,7 @@
 {
 	switch (skb->protocol) {
 	case htons(ETH_P_IP):
+printk(KERN_WARNING "nfct dst %d \n", (ntohl(CTTUPLE(skb, dst.u3.ip))) );
 		return ntohl(CTTUPLE(skb, dst.u3.ip));
 	case htons(ETH_P_IPV6):
 		return ntohl(CTTUPLE(skb, dst.u3.ip6[3]));
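
A note for anyone reading the printk output from the patch above: the address is printed with %d, so any address at or above 128.0.0.0 shows up as a negative number in the log. Below is a minimal sketch of a shell helper that turns those values back into dotted quads. The name dec_to_ip is made up here for illustration; it is not part of the script or the kernel, and it assumes a shell whose arithmetic is wider than 32 bits (true of current dash and bash).

```shell
# Hypothetical helper: convert the decimal value printed by the
# "nfct src %d" / "nfct dst %d" printk lines into a dotted-quad address.
dec_to_ip() {
    n=$1
    # %d prints a signed 32-bit int; fold negative values back into 0..2^32-1
    if [ "$n" -lt 0 ]; then
        n=$((n + 4294967296))
    fi
    # The value is in host byte order (after ntohl), most significant octet first
    printf '%d.%d.%d.%d\n' \
        $(( (n >> 24) & 255 )) $(( (n >> 16) & 255 )) \
        $(( (n >> 8) & 255 )) $(( n & 255 ))
}
```

For example, dec_to_ip -1407268862 prints 172.30.200.2, which is the address used in the u32 filters of the script.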

--001a11c23788b217260504be403f--