[Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

Dave Taht dave.taht at gmail.com
Mon Jan 14 14:50:18 EST 2013


On Mon, Jan 14, 2013 at 1:14 AM, Dave Taht <dave.taht at gmail.com> wrote:
> I am so buried as to only be able to do new builds of cero once a week.
>
> Can the bad behavior be duplicated on some other sort of single core
> processor, like x86? Or by merely booting up an x86 box in single
> processor mode?
>
> I'll try to get a new release out next sunday.

I lied. Crash bugs bother me a lot. A release of cerowrt with the BUG_ON removed
for TFO is now up at:

There are no other changes from Cerowrt-3.7.2-1.

Those playing with it should enable TFO in polipo as per this thread
and also fiddle with various settings for the gethostbyname option in
polipo.

I did not look into the presumably separate DNS lookup issue, nor the multicast
issue also mentioned on this thread.

A new, cleaned up version of the ar71xx unaligned access code arrived
in openwrt head (thx nbd!). It addresses some new stuff and leaves
out some stuff that is in the existing cerowrt patch set for unaligned
access, notably a bunch of ipv6 stuff that inspired the patch in the
first place.

I retain concerns re the checksum code on both versions.

There were multiple other (mostly ipv6 related) changes to openwrt over
the weekend ...

which made pulling that stuff forward into this quick snapshot release
of cero too risky, and I would prefer that the two differing unaligned
patches be merged cleanly and pushed up to openwrt.

So hopefully the TFO portion of this bug thread is resolved, and there
are 3 other bugs left to look at separately...
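
For anyone skimming the thread, here is a small userspace model of why the
removed check fires on uniprocessor builds. It is only a sketch: the
up_spin_is_locked() macro mirrors the spinlock_up.h line Felix quotes below,
while toy_spinlock, smp_spin_is_locked(), the two *_would_bug() helpers and
check_model() are names made up for illustration and are not kernel code.
The guarded variant follows the "NR_CPUS != 1 &&" style from the Hugh Dickins
commit Eric cites; it is not what was applied to cero (the BUG_ON was simply
removed).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a kernel spinlock, for illustration only. */
struct toy_spinlock { bool held; };

/* Models the !SMP definition quoted from include/linux/spinlock_up.h:
 * the argument is evaluated and discarded, the result is always 0. */
#define up_spin_is_locked(lock)  ((void)(lock), 0)

/* Models an SMP build, where the lock state is really inspected. */
#define smp_spin_is_locked(lock) ((lock)->held)

/* Shape of the removed check: BUG when the spinlock does not look held
 * and the socket is not owned by the user either. */
static bool naive_check_would_bug(int is_locked, bool owned_by_user)
{
    return !is_locked && !owned_by_user;
}

/* Same check, skipped entirely when only one CPU is configured. */
static bool guarded_check_would_bug(int nr_cpus, int is_locked,
                                    bool owned_by_user)
{
    return nr_cpus != 1 && !is_locked && !owned_by_user;
}

/* Walks through the cases discussed in the thread. */
void check_model(void)
{
    struct toy_spinlock held = { true };   /* caller really holds the lock */
    struct toy_spinlock unheld = { false }; /* genuine locking bug */

    /* UP: the lock is held, yet the naive check still fires. */
    assert(naive_check_would_bug(up_spin_is_locked(&held), false));

    /* SMP: the same situation passes the naive check. */
    assert(!naive_check_would_bug(smp_spin_is_locked(&held), false));

    /* UP with the NR_CPUS-style guard: no spurious BUG. */
    assert(!guarded_check_would_bug(1, up_spin_is_locked(&held), false));

    /* SMP with the guard: a real missing lock is still caught. */
    assert(guarded_check_would_bug(4, smp_spin_is_locked(&unheld), false));
}
```

Calling check_model() from a main() should run through all four cases
without tripping an assert. The trade-off with the guard is that a UP build
no longer checks anything at that spot, which is also true of removing the
BUG_ON outright.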

>
> On Sun, Jan 13, 2013 at 8:43 PM, Ketan Kulkarni <ketkulka at gmail.com> wrote:
>> Thanks Eric and Yuchung for taking care of the patch. I will test a few more
>> TFO cases as well once this patch is built in cero.
>>
>> Thanks,
>> Ketan
>>
>> On Jan 14, 2013 9:37 AM, "Eric Dumazet" <edumazet at google.com> wrote:
>>>
>>> Quite frankly I would just remove the BUG_ON()
>>>
>>> diff --git a/net/core/request_sock.c b/net/core/request_sock.c
>>> index c31d9e8..4425148 100644
>>> --- a/net/core/request_sock.c
>>> +++ b/net/core/request_sock.c
>>> @@ -186,8 +186,6 @@ void reqsk_fastopen_remove(struct sock *sk, struct request_sock *req,
>>>         struct fastopen_queue *fastopenq =
>>>             inet_csk(lsk)->icsk_accept_queue.fastopenq;
>>>
>>> -       BUG_ON(!spin_is_locked(&sk->sk_lock.slock) && !sock_owned_by_user(sk));
>>> -
>>>         tcp_sk(sk)->fastopen_rsk = NULL;
>>>         spin_lock_bh(&fastopenq->lock);
>>>         fastopenq->qlen--;
>>>
>>>
>>>
>>> On Sun, Jan 13, 2013 at 7:05 PM, Eric Dumazet <edumazet at google.com> wrote:
>>>>
>>>> Oh well yes, this doesn't quite work on !SMP.
>>>>
>>>> And this kind of bug is frequent....
>>>>
>>>> See the following example:
>>>>
>>>> commit b9980cdcf2524c5fe15d8cbae9c97b3ed6385563
>>>> Author: Hugh Dickins <hughd at google.com>
>>>> Date:   Wed Feb 8 17:13:40 2012 -0800
>>>>
>>>>     mm: fix UP THP spin_is_locked BUGs
>>>>
>>>>     Fix CONFIG_TRANSPARENT_HUGEPAGE=y CONFIG_SMP=n CONFIG_DEBUG_VM=y
>>>>     CONFIG_DEBUG_SPINLOCK=n kernel: spin_is_locked() is then always false,
>>>>     and so triggers some BUGs in Transparent HugePage codepaths.
>>>>
>>>>     asm-generic/bug.h mentions this problem, and provides a WARN_ON_SMP(x);
>>>>     but being too lazy to add VM_BUG_ON_SMP, BUG_ON_SMP, WARN_ON_SMP_ONCE,
>>>>     VM_WARN_ON_SMP_ONCE, just test NR_CPUS != 1 in the existing VM_BUG_ONs.
>>>>
>>>>     Signed-off-by: Hugh Dickins <hughd at google.com>
>>>>     Cc: Andrea Arcangeli <aarcange at redhat.com>
>>>>     Cc: <stable at vger.kernel.org>
>>>>     Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
>>>>     Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index b3ffc21..91d3efb 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -2083,7 +2083,7 @@ static void collect_mm_slot(struct mm_slot *mm_slot)
>>>>  {
>>>>         struct mm_struct *mm = mm_slot->mm;
>>>>
>>>> -       VM_BUG_ON(!spin_is_locked(&khugepaged_mm_lock));
>>>> +       VM_BUG_ON(NR_CPUS != 1 && !spin_is_locked(&khugepaged_mm_lock));
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, Jan 13, 2013 at 1:39 PM, Felix Fietkau <nbd at openwrt.org> wrote:
>>>>>
>>>>> On 2013-01-13 7:03 PM, Eric Dumazet wrote:
>>>>> > I suspect a bug in the spin_is_locked() implementation on your arch,
>>>>> > as the socket lock should be held at this point.
>>>>> I don't think this is an arch implementation bug, this probably happens
>>>>> on all !SMP systems. See this bit from include/linux/spinlock_up.h:
>>>>>
>>>>> #define arch_spin_is_locked(lock)   ((void)(lock), 0)
>>>>>
>>>>> - Felix
>>>>>
>>>>
>>>
>>
>>
>>
>
>
>
> --
> Dave Täht
>
> Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html



-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html


