Cake - FQ_codel the next generation
* [Cake] Fwd: [Codel] fq_codel_drop vs a udp flood
       [not found]                           ` <CAA93jw6Aj3Rcsm=Q=KZVrW_TGThVwu6pRAN3nNQ4tvSODY_zUg@mail.gmail.com>
@ 2016-05-06  4:35                             ` Dave Taht
  2016-05-06  4:44                               ` Jonathan Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Dave Taht @ 2016-05-06  4:35 UTC (permalink / raw)
  To: cake

this would be a pretty nifty feature for cake to have in this hostile universe.


---------- Forwarded message ----------
From: Dave Taht <dave.taht@gmail.com>
Date: Thu, May 5, 2016 at 11:33 AM
Subject: Re: [Codel] fq_codel_drop vs a udp flood
To: Jonathan Morton <chromatix99@gmail.com>
Cc: Roman Yeryomin <leroi.lists@gmail.com>, Eric Dumazet
<eric.dumazet@gmail.com>, make-wifi-fast@lists.bufferbloat.net,
"codel@lists.bufferbloat.net" <codel@lists.bufferbloat.net>, ath10k
<ath10k@lists.infradead.org>


On Thu, May 5, 2016 at 9:59 AM, Jonathan Morton <chromatix99@gmail.com> wrote:
>> Having same (low) speeds.
>> So it didn't help at all :(
>
> Although the new “emergency drop” code is now dropping batches of consecutive packets, Codel is also still dropping individual packets in between these batches, probably at a high rate.  Since all fragments of an original packet are required to reassemble it, but Codel doesn’t link related fragments when deciding to drop, each fragment lost in this way reduces throughput efficiency.  Only a fraction of the original packets can be reassembled correctly, but the surviving (yet useless) fragments still occupy link capacity.

I could see an AQM dropper testing to see if it is dropping a frag,
and then dropping any further fragments, also. We're looking at the IP
headers anyway in that section of the code, and the decision to drop
is (usually) rare, and fragments a PITA.

> This phenomenon is not Codel specific; I would also expect to see it on most other AQMs, and definitely on RED variants, including PIE.  Fortunately for real traffic, it normally arises only on artificial traffic such as iperf runs with large UDP packets.  Unfortunately for AQM advocates, iperf uses large UDP packets by default, and it is very easy to misinterpret the results unfavourably for AQM (as opposed to unfavourably for iperf).
>
> If you re-run the test with iperf set to a packet size compatible with the path MTU, you should see much better throughput numbers due to the elimination of fragmented packets.  A UDP payload size of 1280 bytes is a safe, conservative figure for a normal MTU in the vicinity of 1500.
>
>> Limit of 1024 packets and 1024 flows is not wise I think.
>>
>> (If all buckets are in use, each bucket has a virtual queue of 1 packet,
>> which is almost the same than having no queue at all)
>
> This, while theoretically important in extreme cases with very large numbers of flows, is not relevant to the specific test in question.
>
>  - Jonathan Morton
>
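To put rough numbers on the fragmentation argument quoted above, here is a minimal sketch. It assumes a 20-byte IPv4 header without options, an 8-byte UDP header, and that an AQM drops each fragment independently with some probability; the function names are illustrative only and come from no tool mentioned in this thread.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def fragment_count(udp_payload: int, mtu: int = 1500) -> int:
    """Number of IPv4 packets needed to carry one UDP datagram."""
    ip_payload = udp_payload + UDP_HEADER
    if ip_payload + IPV4_HEADER <= mtu:
        return 1  # fits in a single packet, no fragmentation
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(ip_payload / per_fragment)


def survival_probability(fragments: int, drop_prob: float) -> float:
    """Chance that all fragments of one datagram arrive, assuming
    independent per-packet drops (a simplifying assumption)."""
    return (1.0 - drop_prob) ** fragments


if __name__ == "__main__":
    for size in (1280, 8192, 65507):
        n = fragment_count(size)
        print(size, n, round(survival_probability(n, 0.1), 3))
```

Under these assumptions a 1280-byte payload travels as a single packet, while larger datagrams need several fragments and become increasingly unlikely to survive intact as the per-packet drop rate rises.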



--
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Cake] Fwd: [Codel] fq_codel_drop vs a udp flood
  2016-05-06  4:35                             ` [Cake] Fwd: [Codel] fq_codel_drop vs a udp flood Dave Taht
@ 2016-05-06  4:44                               ` Jonathan Morton
  2016-05-06  4:57                                 ` Dave Taht
  2016-05-06  8:49                                 ` moeller0
  0 siblings, 2 replies; 15+ messages in thread
From: Jonathan Morton @ 2016-05-06  4:44 UTC (permalink / raw)
  To: Dave Taht; +Cc: cake


> On 6 May, 2016, at 07:35, Dave Taht <dave.taht@gmail.com> wrote:
> 
> this would be a pretty nifty feature for cake to have in this hostile universe.

Yes, but difficult to implement since the trailing fragments lose the proto/port information, and thus get sorted into a different queue than the leading fragment.  We would essentially need to implement the same tracking mechanisms as for actual reassembly.

 - Jonathan Morton


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Cake] Fwd: [Codel] fq_codel_drop vs a udp flood
  2016-05-06  4:44                               ` Jonathan Morton
@ 2016-05-06  4:57                                 ` Dave Taht
  2016-05-06  8:49                                 ` moeller0
  1 sibling, 0 replies; 15+ messages in thread
From: Dave Taht @ 2016-05-06  4:57 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: cake

On Thu, May 5, 2016 at 9:44 PM, Jonathan Morton <chromatix99@gmail.com> wrote:
>
>> On 6 May, 2016, at 07:35, Dave Taht <dave.taht@gmail.com> wrote:
>>
>> this would be a pretty nifty feature for cake to have in this hostile universe.
>
> Yes, but difficult to implement since the trailing fragments lose the proto/port information, and thus get sorted into a different queue than the leading fragment.  We would essentially need to implement the same tracking mechanisms as for actual reassembly.

No. At least in the iperf3 case you end up with 3 trailing fragments
in their own queue for every first fragment in another queue. Nuking
everything once in drop mode with "more fragments" set or a non-zero
fragment offset field will do some good.

https://en.wikipedia.org/wiki/IPv4#Fragmentation_and_reassembly
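The test described here ("more fragments" set or a non-zero fragment offset) needs only the 16-bit flags/offset field at byte offset 6 of the IPv4 header. A minimal sketch, assuming the raw header is available as bytes; the function name is made up for illustration and is not taken from the cake or fq_codel sources:

```python
import struct

IP_MF = 0x2000      # "more fragments" flag
IP_OFFSET = 0x1FFF  # fragment offset mask (units of 8 bytes)


def is_fragment(ipv4_header: bytes) -> bool:
    """True if the packet is any fragment of a larger datagram:
    either MF is set (first or middle fragment) or the offset is
    non-zero (middle or last fragment)."""
    (flags_frag,) = struct.unpack_from("!H", ipv4_header, 6)
    return bool(flags_frag & (IP_MF | IP_OFFSET))
```

A packet with only the Don't Fragment bit set, or with the whole field zero, is not a fragment and would be handled by the normal drop logic.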

In the netperf case (which does 64k fragments), even better. And
against your typical fragmentation attack, dunno, but all in all it
strikes me as a measurable win.

>
>  - Jonathan Morton
>



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Cake] Fwd: [Codel] fq_codel_drop vs a udp flood
  2016-05-06  4:44                               ` Jonathan Morton
  2016-05-06  4:57                                 ` Dave Taht
@ 2016-05-06  8:49                                 ` moeller0
  2016-05-06  9:00                                   ` David Lang
  1 sibling, 1 reply; 15+ messages in thread
From: moeller0 @ 2016-05-06  8:49 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Dave Täht, cake

Hi Jonathan,

> On May 6, 2016, at 06:44 , Jonathan Morton <chromatix99@gmail.com> wrote:
> 
> 
>> On 6 May, 2016, at 07:35, Dave Taht <dave.taht@gmail.com> wrote:
>> 
>> this would be a pretty nifty feature for cake to have in this hostile universe.
> 
> Yes, but difficult to implement since the trailing fragments lose the proto/port information, and thus get sorted into a different queue than the leading fragment.  We would essentially need to implement the same tracking mechanisms as for actual reassembly.

	But the receiver needs to be able to re-segment the fragments so all required information needs to be there; what about looking at src and dst address and the MF flag in the header as well as the fragment offset and scrape proto/port from the leading fragment and “virtually” apply it to all following fragments, that way cake will do the right thing. All of this might be too costly in implementation and computation to be feasible…

Best Regards
	Sebastian

> 
> - Jonathan Morton
> 
> _______________________________________________
> Cake mailing list
> Cake@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cake


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Cake] Fwd: [Codel] fq_codel_drop vs a udp flood
  2016-05-06  8:49                                 ` moeller0
@ 2016-05-06  9:00                                   ` David Lang
  2016-05-06  9:36                                     ` moeller0
  2016-05-06 15:31                                     ` Stephen Hemminger
  0 siblings, 2 replies; 15+ messages in thread
From: David Lang @ 2016-05-06  9:00 UTC (permalink / raw)
  To: moeller0; +Cc: Jonathan Morton, cake

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1259 bytes --]

On Fri, 6 May 2016, moeller0 wrote:

> Hi Jonathan,
>
>> On May 6, 2016, at 06:44 , Jonathan Morton <chromatix99@gmail.com> wrote:
>> 
>> 
>>> On 6 May, 2016, at 07:35, Dave Taht <dave.taht@gmail.com> wrote:
>>> 
>>> this would be a pretty nifty feature for cake to have in this hostile universe.
>> 
>> Yes, but difficult to implement since the trailing fragments lose the proto/port information, and thus get sorted into a different queue than the leading fragment.  We would essentially need to implement the same tracking mechanisms as for actual reassembly.
>
> 	But the receiver needs to be able to re-segment the fragments so all required information needs to be there; what about looking at src and dst address and the MF flag in the header as well as the fragment offset and scrape proto/port from the leading fragment and “virtually” apply it to all following fragments, that way cake will do the right thing. All of this might be too costly in implementation and computation to be feasible…

wait a minute here. If the fragments are going to go over the network as 
separate packets, each fragment must include source/dest ip and source/dest 
port, otherwise the recipient isn't going to be able to figure out what to do 
with it.

David Lang

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Cake] Fwd: [Codel] fq_codel_drop vs a udp flood
  2016-05-06  9:00                                   ` David Lang
@ 2016-05-06  9:36                                     ` moeller0
  2016-05-06 15:31                                     ` Stephen Hemminger
  1 sibling, 0 replies; 15+ messages in thread
From: moeller0 @ 2016-05-06  9:36 UTC (permalink / raw)
  To: David Lang; +Cc: Jonathan Morton, cake


> On May 6, 2016, at 11:00 , David Lang <david@lang.hm> wrote:
> 
> On Fri, 6 May 2016, moeller0 wrote:
> 
>> Hi Jonathan,
>> 
>>> On May 6, 2016, at 06:44 , Jonathan Morton <chromatix99@gmail.com> wrote:
>>>> On 6 May, 2016, at 07:35, Dave Taht <dave.taht@gmail.com> wrote:
>>>> this would be a pretty nifty feature for cake to have in this hostile universe.
>>> Yes, but difficult to implement since the trailing fragments lose the proto/port information, and thus get sorted into a different queue than the leading fragment.  We would essentially need to implement the same tracking mechanisms as for actual reassembly.
>> 
>> 	But the receiver needs to be able to re-segment the fragments so all required information needs to be there; what about looking at src and dst address and the MF flag in the header as well as the fragment offset and scrape proto/port from the leading fragment and “virtually” apply it to all following fragments, that way cake will do the right thing. All of this might be too costly in implementation and computation to be feasible…
> 
> wait a minute here. If the fragments are going to go over the network as separate packets, each fragment must include source/dest ip and source/dest port, otherwise the recipient isn’t going to be able to figure out what to do with it.

	That is what I thought as well, but as I understand it now, fragmentation happens at the IP level independent of the “payload”, so fragmentation is the same for UDP/TCP/ICMP. According to https://en.wikipedia.org/wiki/IPv4#Fragmentation_and_reassembly all packets in a fragment group should have the same IP identification value, so matching fragmented packets should be even easier: just use the (SRCIP, DSTIP, PROTOCOL, IDENTIFICATION) quadruple (all values that live in the IP header), or use these values to find the matching port from the first fragment’s protocol header… For sanity checking one might even require all but the last packet to have the MF flag set and the fragment offsets to be monotonically increasing. But this requires at least looking at the MF flag to notice fragments at all… But I guess https://tools.ietf.org/html/rfc6864 says all of this more definitively…
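The quadruple and the sanity check described above can be sketched as follows. This assumes raw IPv4 headers as bytes and fragments presented in arrival order; all names are hypothetical and this is not how cake or the kernel reassembly code is written.

```python
import struct
from typing import NamedTuple, Sequence


class FragmentKey(NamedTuple):
    src: bytes
    dst: bytes
    protocol: int
    ident: int


def fragment_key(ipv4_header: bytes) -> FragmentKey:
    """(source, destination, protocol, identification), all taken
    from the IPv4 header itself."""
    (ident,) = struct.unpack_from("!H", ipv4_header, 4)
    return FragmentKey(ipv4_header[12:16], ipv4_header[16:20],
                       ipv4_header[9], ident)


def fragment_info(ipv4_header: bytes) -> tuple:
    """Return (MF flag, fragment offset in bytes)."""
    (flags_frag,) = struct.unpack_from("!H", ipv4_header, 6)
    return bool(flags_frag & 0x2000), (flags_frag & 0x1FFF) * 8


def looks_like_one_datagram(headers: Sequence[bytes]) -> bool:
    """Sanity check: same key everywhere, MF set on all but the last
    fragment, offsets strictly increasing."""
    if not headers:
        return False
    if len({fragment_key(h) for h in headers}) != 1:
        return False
    infos = [fragment_info(h) for h in headers]
    offsets = [off for _, off in infos]
    increasing = all(a < b for a, b in zip(offsets, offsets[1:]))
    mf_ok = all(mf for mf, _ in infos[:-1]) and not infos[-1][0]
    return increasing and mf_ok
```

Note that the check as written would reject fragments that arrive reordered, which real reassembly has to tolerate.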

Best Regards
	Sebastian

> 
> David Lang


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Cake] Fwd: [Codel] fq_codel_drop vs a udp flood
  2016-05-06  9:00                                   ` David Lang
  2016-05-06  9:36                                     ` moeller0
@ 2016-05-06 15:31                                     ` Stephen Hemminger
  2016-05-06 18:50                                       ` David Lang
  1 sibling, 1 reply; 15+ messages in thread
From: Stephen Hemminger @ 2016-05-06 15:31 UTC (permalink / raw)
  To: David Lang; +Cc: moeller0, cake

On Fri, 6 May 2016 02:00:02 -0700 (PDT)
David Lang <david@lang.hm> wrote:

> On Fri, 6 May 2016, moeller0 wrote:
> 
> > Hi Jonathan,
> >
> >> On May 6, 2016, at 06:44 , Jonathan Morton <chromatix99@gmail.com> wrote:
> >> 
> >> 
> >>> On 6 May, 2016, at 07:35, Dave Taht <dave.taht@gmail.com> wrote:
> >>> 
> >>> this would be a pretty nifty feature for cake to have in this hostile universe.
> >> 
> >> Yes, but difficult to implement since the trailing fragments lose the proto/port information, and thus get sorted into a different queue than the leading fragment.  We would essentially need to implement the same tracking mechanisms as for actual reassembly.
> >
> > 	But the receiver needs to be able to re-segment the fragments so all required information needs to be there; what about looking at src and dst address and the MF flag in the header as well as the fragment offset and scrape proto/port from the leading fragment and “virtually” apply it to all following fragments, that way cake will do the right thing. All of this might be too costly in implementation and computation to be feasible…
> 
> wait a minute here. If the fragments are going to go over the network as 
> separate packets, each fragment must include source/dest ip and source/dest 
> port, otherwise the recipient isn't going to be able to figure out what to do 
> with it.
> 
> David Lang

Fragments are reassembled by IP id, not src/dest port.
Only the first fragment has the L4 header with src/dest port,
all the rest are just data.

That is why most firewalls reassemble all packets (and then refragment as needed)
to allow matching on port values.

For several cases where flow information is necessary most code does:
  flowid = is_fragmented(ip) ? ip->id : hash(ip + tcp)
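A runnable sketch of that one-liner, under some assumptions of mine: the packet is raw IPv4 bytes carrying TCP or UDP, the fragment case hashes the IP ID together with the addresses and protocol (the line above uses the ID alone), and CRC32 stands in for whatever hash a real qdisc would use. The bucket count of 1024 echoes the flow count mentioned earlier in the thread.

```python
import struct
import zlib


def flow_id(packet: bytes, buckets: int = 1024) -> int:
    """Pick a flow bucket so that every fragment of one datagram
    lands in the same bucket, even without an L4 header."""
    ihl = (packet[0] & 0x0F) * 4                 # IP header length in bytes
    ident, flags_frag = struct.unpack_from("!HH", packet, 4)
    addrs_proto = packet[12:20] + packet[9:10]   # src, dst, protocol
    if flags_frag & 0x3FFF:                      # MF set or non-zero offset
        key = addrs_proto + struct.pack("!H", ident)
    else:
        key = addrs_proto + packet[ihl:ihl + 4]  # source and destination port
    return zlib.crc32(key) % buckets
```

The trade-off is that each fragmented datagram becomes its own "flow", separate from the unfragmented packets of the same connection, but its fragments at least stay together.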


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Cake] Fwd: [Codel] fq_codel_drop vs a udp flood
  2016-05-06 15:31                                     ` Stephen Hemminger
@ 2016-05-06 18:50                                       ` David Lang
  2016-05-06 18:53                                         ` Jonathan Morton
  2016-05-06 23:14                                         ` Benjamin Cronce
  0 siblings, 2 replies; 15+ messages in thread
From: David Lang @ 2016-05-06 18:50 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: moeller0, cake

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2035 bytes --]

On Fri, 6 May 2016, Stephen Hemminger wrote:

> On Fri, 6 May 2016 02:00:02 -0700 (PDT)
> David Lang <david@lang.hm> wrote:
>
>> On Fri, 6 May 2016, moeller0 wrote:
>>
>>> Hi Jonathan,
>>>
>>>> On May 6, 2016, at 06:44 , Jonathan Morton <chromatix99@gmail.com> wrote:
>>>>
>>>>
>>>>> On 6 May, 2016, at 07:35, Dave Taht <dave.taht@gmail.com> wrote:
>>>>>
>>>>> this would be a pretty nifty feature for cake to have in this hostile universe.
>>>>
>>>> Yes, but difficult to implement since the trailing fragments lose the proto/port information, and thus get sorted into a different queue than the leading fragment.  We would essentially need to implement the same tracking mechanisms as for actual reassembly.
>>>
>>> 	But the receiver needs to be able to re-segment the fragments so all required information needs to be there; what about looking at src and dst address and the MF flag in the header as well as the fragment offset and scrape proto/port from the leading fragment and “virtually” apply it to all following fragments, that way cake will do the right thing. All of this might be too costly in implementation and computation to be feasible…
>>
>> wait a minute here. If the fragments are going to go over the network as
>> separate packets, each fragment must include source/dest ip and source/dest
>> port, otherwise the recipient isn't going to be able to figure out what to do
>> with it.
>>
>> David Lang
>
> Fragments are reassembled by IP id, not src/dest port.
> Only the first fragment has the L4 header with src/dest port,
> all the rest are just data.
>
> That is why most firewalls reassemble all packets (and then refragment as needed)
> to allow matching on port values.

actually, many firewalls do not reassemble packets, they pass packets through 
without reassembly.

what IP id are you referring to? I don't remember any such field in the packet 
header.

David Lang

> For several cases where flow information is necessary most code does:
>  flowid = is_fragmented(ip) ? ip->id : hash(ip + tcp)
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Cake] Fwd: [Codel] fq_codel_drop vs a udp flood
  2016-05-06 18:50                                       ` David Lang
@ 2016-05-06 18:53                                         ` Jonathan Morton
  2016-05-06 19:14                                           ` David Lang
  2016-05-06 23:14                                         ` Benjamin Cronce
  1 sibling, 1 reply; 15+ messages in thread
From: Jonathan Morton @ 2016-05-06 18:53 UTC (permalink / raw)
  To: David Lang; +Cc: Stephen Hemminger, cake


> On 6 May, 2016, at 21:50, David Lang <david@lang.hm> wrote:
> 
> what IP id are you referring to? I don't remember any such field in the packet header.

It’s the third halfword.

 - Jonathan Morton


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Cake] Fwd: [Codel] fq_codel_drop vs a udp flood
  2016-05-06 18:53                                         ` Jonathan Morton
@ 2016-05-06 19:14                                           ` David Lang
  2016-05-06 19:33                                             ` Jonathan Morton
  0 siblings, 1 reply; 15+ messages in thread
From: David Lang @ 2016-05-06 19:14 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Stephen Hemminger, cake

[-- Attachment #1: Type: TEXT/PLAIN, Size: 825 bytes --]

On Fri, 6 May 2016, Jonathan Morton wrote:

>> On 6 May, 2016, at 21:50, David Lang <david@lang.hm> wrote:
>>
>> what IP id are you referring to? I don't remember any such field in the packet header.
>
> It’s the third halfword.

half a word is hardly enough to be unique across the Internet, anything that 
small would lead to lots of attacks that inserted garbage data into threads.

looking at the IP header, the minimum size header includes fragment offset, 
source and destination IP addresses, and I'd bet a lot of money that every 
fragment of TCP/UDP includes the port numbers as well because there is just not 
enough info in the 20 byte header to identify what it matches with.

and I don't see this field you are talking about.

http://www.erg.abdn.ac.uk/users/gorry/course/inet-pages/ip-packet.html

David Lang

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Cake] Fwd: [Codel] fq_codel_drop vs a udp flood
  2016-05-06 19:14                                           ` David Lang
@ 2016-05-06 19:33                                             ` Jonathan Morton
  2016-05-06 19:54                                               ` David Lang
  0 siblings, 1 reply; 15+ messages in thread
From: Jonathan Morton @ 2016-05-06 19:33 UTC (permalink / raw)
  To: David Lang; +Cc: Stephen Hemminger, cake


> On 6 May, 2016, at 22:14, David Lang <david@lang.hm> wrote:
> 
> On Fri, 6 May 2016, Jonathan Morton wrote:
> 
>>> On 6 May, 2016, at 21:50, David Lang <david@lang.hm> wrote:
>>> 
>>> what IP id are you referring to? I don't remember any such field in the packet header.
>> 
>> It’s the third halfword.
> 
> half a word is hardly enough to be unique across the Internet, anything that small would lead to lots of attacks that inserted garbage data into threads.

It doesn’t need to be globally unique.  It merely identifies, in conjunction with src/dst address pair (so 80 bits in total), a particular sequence of fragments to be reassembled into the original packet.  If the fourth halfword is zero (or has only the Don’t Fragment bit set), the IP ID field has no meaning.  Hence the entire second word can be considered fragmentation related.

I agree that it’s not a very robust mechanism; it breaks under extensive packet reordering at high packet rates (circumstances which are probably showing up in iperf tests against flow-isolating AQMs).  It would be better not to have fragmentation at the IP layer at all.  But it’s not as bad as you say; it does work for low packet rates, which is all it was intended for.

Here’s my preferred reference diagram:  https://nmap.org/book/tcpip-ref.html

 - Jonathan Morton


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Cake] Fwd: [Codel] fq_codel_drop vs a udp flood
  2016-05-06 19:33                                             ` Jonathan Morton
@ 2016-05-06 19:54                                               ` David Lang
  2016-05-06 19:58                                                 ` David Lang
  0 siblings, 1 reply; 15+ messages in thread
From: David Lang @ 2016-05-06 19:54 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Stephen Hemminger, cake

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2035 bytes --]

On Fri, 6 May 2016, Jonathan Morton wrote:

>> On 6 May, 2016, at 22:14, David Lang <david@lang.hm> wrote:
>>
>> On Fri, 6 May 2016, Jonathan Morton wrote:
>>
>>>> On 6 May, 2016, at 21:50, David Lang <david@lang.hm> wrote:
>>>>
>>>> what IP id are you referring to? I don't remember any such field in the packet header.
>>>
>>> It’s the third halfword.
>>
>> half a word is hardly enough to be unique across the Internet, anything that small would lead to lots of attacks that inserted garbage data into threads.
>
> It doesn’t need to be globally unique.  It merely identifies, in conjunction with src/dst address pair (so 80 bits in total), a particular sequence of fragments to be reassembled into the original packet.  If the fourth halfword is zero (or has only the Don’t Fragment bit set), the IP ID field has no meaning.  Hence the entire second word can be considered fragmentation related.
>
> I agree that it’s not a very robust mechanism; it breaks under extensive packet reordering at high packet rates (circumstances which are probably showing up in iperf tests against flow-isolating AQMs).  It would be better not to have fragmentation at the IP layer at all.  But it’s not as bad as you say; it does work for low packet rates, which is all it was intended for.
>
> Here’s my preferred reference diagram:  https://nmap.org/book/tcpip-ref.html

rfc-6864 shows that this field is not used the way you think it is in practice 
(if it was, nobody would have been able to exceed 6.4Mbps)

Given all the things that can cause fragmentation on virtually every packet 
(tunnels/vpns), and the fact that having this be unique would restrict all 
traffic between a given source and destination to 6.4Mbps, I am extremely 
doubtful that it is used the way that rfc-6864 suggests (after all it's a recent 
RFC, 2013)
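For reference, a figure of this size falls out of simple arithmetic if one assumes a 16-bit ID that must not repeat within a 2-minute maximum datagram lifetime and 1500-byte datagrams. Those inputs are my reading of where the number comes from, not something stated in this thread.

```python
ID_SPACE = 2 ** 16    # distinct values of the 16-bit IP ID field
MAX_LIFETIME_S = 120  # assumed maximum datagram lifetime, in seconds


def max_rate_bps(datagram_bytes: int) -> float:
    """Highest rate at which no ID is reused within the lifetime."""
    return ID_SPACE * datagram_bytes * 8 / MAX_LIFETIME_S


if __name__ == "__main__":
    print(max_rate_bps(1500))  # bits per second for 1500-byte datagrams
```

With these inputs the limit is about 6.55 million bits per second, in the same ballpark as the 6.4Mbps quoted above. The limit applies per source/destination/protocol triple and only matters if IDs really must stay unique for that long.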

I know that I've looked at packet dumps that have shown fragmented data and 
seen the port numbers in the fragment headers.

I'd bet that in practice firewalls/etc ignore the IP ID field.

David Lang

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Cake] Fwd: [Codel] fq_codel_drop vs a udp flood
  2016-05-06 19:54                                               ` David Lang
@ 2016-05-06 19:58                                                 ` David Lang
  0 siblings, 0 replies; 15+ messages in thread
From: David Lang @ 2016-05-06 19:58 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: cake

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3055 bytes --]

On Fri, 6 May 2016, David Lang wrote:

> On Fri, 6 May 2016, Jonathan Morton wrote:
>
>>> On 6 May, 2016, at 22:14, David Lang <david@lang.hm> wrote:
>>> 
>>> On Fri, 6 May 2016, Jonathan Morton wrote:
>>> 
>>>>> On 6 May, 2016, at 21:50, David Lang <david@lang.hm> wrote:
>>>>> 
>>>>> what IP id are you referring to? I don't remember any such field in the 
>>>>> packet header.
>>>> 
>>>> It’s the third halfword.
>>> 
>>> half a word is hardly enough to be unique across the Internet, anything 
>>> that small would lead to lots of attacks that inserted garbage data into 
>>> threads.
>> 
>> It doesn’t need to be globally unique.  It merely identifies, in 
>> conjunction with src/dst address pair (so 80 bits in total), a particular 
>> sequence of fragments to be reassembled into the original packet.  If the 
>> fourth halfword is zero (or has only the Don’t Fragment bit set), the IP ID 
>> field has no meaning.  Hence the entire second word can be considered 
>> fragmentation related.
>> 
>> I agree that it’s not a very robust mechanism; it breaks under extensive 
>> packet reordering at high packet rates (circumstances which are probably 
>> showing up in iperf tests against flow-isolating AQMs).  It would be better 
>> not to have fragmentation at the IP layer at all.  But it’s not as bad as 
>> you say; it does work for low packet rates, which is all it was intended 
>> for.
>> 
>> Here’s my preferred reference diagram: 
>> https://nmap.org/book/tcpip-ref.html
>
> rfc-6864 shows that this field is not used the way you think it is in 
> practice (if it was, nobody would have been able to exceed 6.4Mbps)
>
> Given all the things that can cause fragmentation on virtually every packet 
> (tunnels/vpns), and the fact that having this be unique would restrict all 
> traffic between a given source and destination to 6.4Mbps, I am extremely 
> doubtful that it is used the way that rfc-6864 suggests (after all it's a 
> recent RFC, 2013)
>
> I know that I've looked at packet dumps that have shown fragmented data and 
> seen the port numbers in the fragment headers.
>
> I'd bet that in practice firewalls/etc ignore the IP ID field.

from rfc-6864

    Many current devices support fragmentation that ignores the IPv4
    Don't Fragment (DF) bit.  Such devices already transit traffic from
    sources that reuse the ID.  If fragments of different datagrams
    reusing the same ID (within the source address/destination
    address/protocol tuple) arrive at the destination interleaved,
    fragmentation would fail and traffic would be dropped.  Either such
    interleaving is uncommon or traffic from such devices is not widely
    traversing these DF-ignoring devices, because significant occurrence
    of reassembly errors has not been reported.  DF-ignoring devices do
    not comply with existing standards, and it is not feasible to update
    the standards to allow them as compliant.

They ignore the possibility that the OS reassembly is doing something different 
than they are thinking.

David Lang


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Cake] Fwd: [Codel] fq_codel_drop vs a udp flood
  2016-05-06 18:50                                       ` David Lang
  2016-05-06 18:53                                         ` Jonathan Morton
@ 2016-05-06 23:14                                         ` Benjamin Cronce
  2016-05-07  2:09                                           ` David Lang
  1 sibling, 1 reply; 15+ messages in thread
From: Benjamin Cronce @ 2016-05-06 23:14 UTC (permalink / raw)
  To: David Lang; +Cc: Stephen Hemminger, cake

[-- Attachment #1: Type: text/plain, Size: 2923 bytes --]

The good ones do. You need to reassemble the packets if you want to enforce
proper stateful TCP. I wonder how those new network stacks that use RSS to
send packets directly to a specific core will handle fragments, since they
need all packets for a flow to get assigned to the same core, which means
L3/L4 must hash to the same value, and no L4 for later fragments. Unless
all fragmented packets get handled on a specific core, like ICMP.

On Fri, May 6, 2016 at 1:50 PM, David Lang <david@lang.hm> wrote:

> On Fri, 6 May 2016, Stephen Hemminger wrote:
>
> On Fri, 6 May 2016 02:00:02 -0700 (PDT)
>> David Lang <david@lang.hm> wrote:
>>
>> On Fri, 6 May 2016, moeller0 wrote:
>>>
>>> Hi Jonathan,
>>>>
>>>> On May 6, 2016, at 06:44 , Jonathan Morton <chromatix99@gmail.com>
>>>>> wrote:
>>>>>
>>>>>
>>>>> On 6 May, 2016, at 07:35, Dave Taht <dave.taht@gmail.com> wrote:
>>>>>>
>>>>>> this would be a pretty nifty feature for cake to have in this hostile
>>>>>> universe.
>>>>>>
>>>>>
>>>>> Yes, but difficult to implement since the trailing fragments lose the
>>>>> proto/port information, and thus get sorted into a different queue than the
>>>>> leading fragment.  We would essentially need to implement the same tracking
>>>>> mechanisms as for actual reassembly.
>>>>>
>>>>
>>>>         But the receiver needs to be able to re-segment the fragments
>>>> so all required information needs to be there; what about looking at src
>>>> and dst address and the MF flag in the header as well as the fragment
>>>> offset and scrape proto/port from the leading fragment and “virtually”
>>>> apply it to all following fragments, that way cake will do the right thing.
>>>> All of this might be too costly in implementation and computation to be
>>>> feasible…
>>>>
>>>
>>> wait a minute here. If the fragments are going to go over the network as
>>> separate packets, each fragment must include source/dest ip and
>>> source/dest
>>> port, otherwise the recipient isn't going to be able to figure out what
>>> to do
>>> with it.
>>>
>>> David Lang
>>>
>>
>> Fragments are reassembled by IP id, not src/dest port.
>> Only the first fragment has the L4 header with src/dest port,
>> all the rest are just data.
>>
>> That is why most firewalls reassemble all packets (and then refragment as
>> needed)
>> to allow matching on port values.
>>
>
> actually, many firewalls do not reassemble packets, they pass packets
> through without reassembly.
>
> what IP id are you referring to? I don't remember any such field in the
> packet header.
>
> David Lang
>
>
> For several cases where flow information is necessary most code does:
>>  flowid = is_fragmented(ip) ? ip->id : hash(ip + tcp)
>>
>>
> _______________________________________________
> Cake mailing list
> Cake@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cake
>
>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Cake] Fwd: [Codel] fq_codel_drop vs a udp flood
  2016-05-06 23:14                                         ` Benjamin Cronce
@ 2016-05-07  2:09                                           ` David Lang
  0 siblings, 0 replies; 15+ messages in thread
From: David Lang @ 2016-05-07  2:09 UTC (permalink / raw)
  To: Benjamin Cronce; +Cc: Stephen Hemminger, cake

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3138 bytes --]

On Fri, 6 May 2016, Benjamin Cronce wrote:

> The good ones do. You need to reassemble the packets if you want to enforce
> proper stateful TCP. I wonder how those new network stacks that use RSS to
> send packets directly to a specific core will handle fragments, since they
> need all packets for a flow to get assigned to the same core, which means
> L3/L4 must hash to the same value, and no L4 for later fragments. Unless
> all fragmented packets get handled on a specific core, like ICMP.

I remember a big fuss 10 or so years ago with a bunch of firewall 
vulnerabilities where people could get creative with fragments and bypass the 
firewall rules.


> On Fri, May 6, 2016 at 1:50 PM, David Lang <david@lang.hm> wrote:
>
>> On Fri, 6 May 2016, Stephen Hemminger wrote:
>>
>> On Fri, 6 May 2016 02:00:02 -0700 (PDT)
>>> David Lang <david@lang.hm> wrote:
>>>
>>> On Fri, 6 May 2016, moeller0 wrote:
>>>>
>>>> Hi Jonathan,
>>>>>
>>>>> On May 6, 2016, at 06:44 , Jonathan Morton <chromatix99@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On 6 May, 2016, at 07:35, Dave Taht <dave.taht@gmail.com> wrote:
>>>>>>>
>>>>>>> this would be a pretty nifty feature for cake to have in this hostile
>>>>>>> universe.
>>>>>>>
>>>>>>
>>>>>> Yes, but difficult to implement since the trailing fragments lose the
>>>>>> proto/port information, and thus get sorted into a different queue than the
>>>>>> leading fragment.  We would essentially need to implement the same tracking
>>>>>> mechanisms as for actual reassembly.
>>>>>>
>>>>>
>>>>>         But the receiver needs to be able to reassemble the fragments
>>>>> so all required information needs to be there; what about looking at src
>>>>> and dst address and the MF flag in the header as well as the fragment
>>>>> offset and scrape proto/port from the leading fragment and “virtually”
>>>>> apply it to all following fragments, that way cake will do the right thing.
>>>>> All of this might be too costly in implementation and computation to be
>>>>> feasible…
>>>>>
>>>>
>>>> wait a minute here. If the fragments are going to go over the network as
>>>> separate packets, each fragment must include source/dest ip and
>>>> source/dest
>>>> port, otherwise the recipient isn't going to be able to figure out what
>>>> to do
>>>> with it.
>>>>
>>>> David Lang
>>>>
>>>
>>> Fragments are reassembled by IP id, not src/dest port.
>>> Only the first fragment has the L4 header with src/dest port,
>>> all the rest are just data.
>>>
>>> That is why most firewalls reassemble all packets (and then refragment as
>>> needed)
>>> to allow matching on port values.
>>>
>>
>> actually, many firewalls do not reassemble packets, they pass packets
>> through without reassembly.
>>
>> what IP id are you referring to? I don't remember any such field in the
>> packet header.
>>
>> David Lang
>>
>>
>> For several cases where flow information is necessary most code does:
>>>  flowid = is_fragmented(ip) ? ip->id : hash(ip + tcp)
>>>
>>>
>
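
The flowid pseudocode quoted above could be fleshed out roughly as follows. This is only a sketch under stated assumptions: `struct pkt_key`, `fnv1a`, `is_fragmented` and `flow_id` are hypothetical names, the fields are assumed to be already parsed into host byte order, and FNV-1a stands in for whatever hash a real qdisc would use. Unlike the one-line pseudocode, the fragment case here hashes source, destination and protocol together with the IP id, since reassembly keys on that whole tuple rather than on the id alone.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of the fields a classifier extracts. */
struct pkt_key {
    uint32_t saddr, daddr;   /* IPv4 source and destination address */
    uint8_t  proto;          /* L4 protocol number */
    uint16_t ip_id;          /* IPv4 Identification field */
    uint16_t frag_off;       /* flags + fragment offset, host order */
    uint16_t sport, dport;   /* only meaningful when not fragmented */
};

/* FNV-1a over a byte range, continuing from hash state h. */
static uint32_t fnv1a(uint32_t h, const void *data, size_t len)
{
    const uint8_t *p = data;
    while (len--) {
        h ^= *p++;
        h *= 16777619u;
    }
    return h;
}

/* True if the MF flag is set or the fragment offset is non-zero. */
static int is_fragmented(const struct pkt_key *k)
{
    return (k->frag_off & 0x3FFFu) != 0;
}

/* Fragments key on the IP id; everything else keys on the ports. */
static uint32_t flow_id(const struct pkt_key *k)
{
    uint32_t h = 2166136261u;   /* FNV-1a offset basis */

    h = fnv1a(h, &k->saddr, sizeof k->saddr);
    h = fnv1a(h, &k->daddr, sizeof k->daddr);
    h = fnv1a(h, &k->proto, sizeof k->proto);
    if (is_fragmented(k)) {
        h = fnv1a(h, &k->ip_id, sizeof k->ip_id);
    } else {
        h = fnv1a(h, &k->sport, sizeof k->sport);
        h = fnv1a(h, &k->dport, sizeof k->dport);
    }
    return h;
}
```

With this keying, all fragments of one datagram share a flow id, so a dropper could act on them together. The trade-off is that fragmented datagrams land in a different flow than unfragmented packets of the same connection, and each new datagram gets a new flow id.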

Thread overview: 15+ messages
2016-05-06  4:35                             ` [Cake] Fwd: [Codel] fq_codel_drop vs a udp flood Dave Taht
2016-05-06  4:44                               ` Jonathan Morton
2016-05-06  4:57                                 ` Dave Taht
2016-05-06  8:49                                 ` moeller0
2016-05-06  9:00                                   ` David Lang
2016-05-06  9:36                                     ` moeller0
2016-05-06 15:31                                     ` Stephen Hemminger
2016-05-06 18:50                                       ` David Lang
2016-05-06 18:53                                         ` Jonathan Morton
2016-05-06 19:14                                           ` David Lang
2016-05-06 19:33                                             ` Jonathan Morton
2016-05-06 19:54                                               ` David Lang
2016-05-06 19:58                                                 ` David Lang
2016-05-06 23:14                                         ` Benjamin Cronce
2016-05-07  2:09                                           ` David Lang
