* Re: [Make-wifi-fast] a bit of profiling on the archer
From: Toke Høiland-Jørgensen @ 2016-11-18 10:26 UTC
To: Jesper Dangaard Brouer; +Cc: Dave Taht, make-wifi-fast
Jesper Dangaard Brouer <brouer@redhat.com> writes:
> On Thu, 17 Nov 2016 20:14:49 -0800 Dave Taht <dave.taht@gmail.com> wrote:
>
>> I have not been profiling much on lower end platforms (it's hard, you
>> can crash a box pretty easily with the wrong options or sample rates).
>
> I'm happy to hear that perf does work on this lower-end HW, despite
> the caveat about sample rates.
>
> Does anyone know if a hardware-based PMU (Performance Monitoring Unit)
> exists for these kinds of devices?
>
>> While watching the ath10k peak at 150-200mbits, at 99% of cpu in
>> softirq, I spent a bit of time profiling various counters and
>> statistics.
>>
>> for this one (while downloading 12 flows at the same time via flent)
>>
>> perf record -F 99 -e cpu-clock -ag -- sleep 10
>>
>> perf report
>
> The perf report below is not well suited for email; could you instead
> provide the output of the command below:
>
> perf report --no-children --stdio --call-graph none
>
>>
>> [[31m 67.81%[[m 0.00% ksoftirqd/0 [kernel.kallsyms] [k]
>> run_ksoftirqd
>> |
>> ---run_ksoftirqd
>> |
>> |[[31m--67.61%-- [[m __do_softirq
>> | |
>> | |[[31m--66.80%-- [[m net_rx_action
>> | | |
>> | | |[[31m--41.07%-- [[m ag71xx_poll
>>
>> ...
>
> (Looks like you managed to copy-paste the terminal escape codes for
> colors)
>
>>
>> It appears we're spending 66% of the time in the *ethernet* portion of
>> the path.
>
> Be careful you don't fool yourself. In your output you have the
> "children" mode on, so everything being called "under" net_rx_action is
> summed up. It could be it goes all the way through to the wifi TX
> parts and that is part of the sum. Even the memory allocations get
> summed into this 66% number.
Yes, actually my guess would be that this is the case. When I was
profiling ath9k I saw this exact behaviour.
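To make the self-versus-children distinction concrete, here is a minimal Python sketch. The call stacks and sample counts are invented for illustration (they are not taken from the profile in this thread), and `overheads` and `other_napi_poll` are hypothetical names, not part of perf or the kernel. It only shows how a function with little time of its own can still carry a large "children" percentage.

```python
from collections import Counter

# Hypothetical sample counts per call stack, outermost caller first.
# These numbers are made up; only the shape of the stacks matters.
STACKS = {
    ("run_ksoftirqd", "__do_softirq", "net_rx_action", "ag71xx_poll"): 1,
    ("run_ksoftirqd", "__do_softirq", "net_rx_action", "ag71xx_poll",
     "ipt_do_table"): 3,
    ("run_ksoftirqd", "__do_softirq", "net_rx_action", "ag71xx_poll",
     "ieee80211_tx_dequeue"): 4,
    ("run_ksoftirqd", "__do_softirq", "net_rx_action", "other_napi_poll"): 2,
}


def overheads(stacks):
    """Return (self_pct, children_pct) dictionaries keyed by function name."""
    total = sum(stacks.values())
    self_cnt, incl_cnt = Counter(), Counter()
    for stack, n in stacks.items():
        # Self time is charged only to the innermost function.
        self_cnt[stack[-1]] += n
        # Children (inclusive) time is charged to every function on the stack.
        for fn in set(stack):
            incl_cnt[fn] += n
    self_pct = {fn: 100 * c / total for fn, c in self_cnt.items()}
    children_pct = {fn: 100 * c / total for fn, c in incl_cnt.items()}
    return self_pct, children_pct


self_pct, children_pct = overheads(STACKS)
for fn in ("net_rx_action", "ag71xx_poll", "ieee80211_tx_dequeue"):
    print(f"{fn:22s} self={self_pct.get(fn, 0.0):5.1f}% "
          f"children={children_pct[fn]:5.1f}%")
```

In this made-up data ag71xx_poll has a small self percentage but a large children percentage, because the wifi TX and netfilter work runs underneath it on the stack.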
-Toke
* Re: [Make-wifi-fast] a bit of profiling on the archer
From: Eric Dumazet @ 2016-11-18 14:13 UTC
To: Jesper Dangaard Brouer; +Cc: Dave Taht, make-wifi-fast
On Fri, 2016-11-18 at 08:55 +0100, Jesper Dangaard Brouer wrote:
> Be careful you don't fool yourself. In your output you have the
> "children" mode on, so everything being called "under" net_rx_action is
> summed up. It could be it goes all the way through to the wifi TX
> parts and that is part of the sum. Even the memory allocations get
> summed into this 66% number.
Yes, I really do not see how ag71xx would be to blame ;)
Although.... looking at
https://aachen.uni-dsl.de/svn/unidsl_firmware/backfire/trunk/backfire/target/linux/ar71xx/files/drivers/net/ag71xx/ag71xx_main.c
I do see a bug :
    if (rx_done < limit) {
        if (status & RX_STATUS_PR)
            goto more;

        status = ag71xx_rr(ag, AG71XX_REG_TX_STATUS);
        if (status & TX_STATUS_PS)
            goto more;

        DBG("%s: disable polling mode, rx=%d, tx=%d,limit=%d\n",
            dev->name, rx_done, tx_done, limit);

        napi_complete(napi);
Hint:
        napi_complete_done(napi, rx_done);

        /* enable interrupts */
        spin_lock_irqsave(&ag->lock, flags);
        ag71xx_int_enable(ag, AG71XX_INT_POLL);
        spin_unlock_irqrestore(&ag->lock, flags);
        return rx_done;
    }

more:
    DBG("%s: stay in polling mode, rx=%d, tx=%d, limit=%d\n",
        dev->name, rx_done, tx_done, limit);
    return rx_done;
This last statement should be : "return limit;"
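The rule behind that remark can be written down as a toy model. This is Python, not kernel code, and `poll_result` and its arguments are hypothetical names chosen for illustration. It encodes the NAPI convention that a poll function which wants to stay in polling mode reports the full budget, and only reports less than the budget when it has completed NAPI and re-enabled interrupts.

```python
def poll_result(rx_done, limit, rx_pending, tx_pending):
    """Toy model of the tail of a NAPI poll function.

    rx_done     packets processed in this poll call
    limit       the budget handed to the poll function
    rx_pending  True if the RX status still shows packets waiting
    tx_pending  True if the TX status still shows completions waiting

    Returns (return_value, completed), where completed says whether the
    driver left polling mode (napi_complete_done plus interrupt re-enable).
    """
    if rx_done < limit and not rx_pending and not tx_pending:
        # Everything was handled within the budget: leave polling mode
        # and report the work actually done.
        return rx_done, True
    # More work remains: stay in polling mode and report the whole
    # budget, which is what "return limit;" does in the driver.
    return limit, False


print(poll_result(10, 64, False, False))  # finished within budget
print(poll_result(10, 64, True, False))   # RX still pending
print(poll_result(64, 64, False, False))  # budget exhausted
```

The second case is the one at issue: the quoted driver code would report 10 there while still staying in polling mode.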
* Re: [Make-wifi-fast] a bit of profiling on the archer
From: Toke Høiland-Jørgensen @ 2016-11-18 14:30 UTC
To: Eric Dumazet; +Cc: Jesper Dangaard Brouer, make-wifi-fast
Eric Dumazet <eric.dumazet@gmail.com> writes:
> On Fri, 2016-11-18 at 08:55 +0100, Jesper Dangaard Brouer wrote:
>
>> Be careful you don't fool yourself. In your output you have the
>> "children" mode on, so everything being called "under" net_rx_action is
>> summed up. It could be it goes all the way through to the wifi TX
>> parts and that is part of the sum. Even the memory allocations get
>> summed into this 66% number.
>
> Yes, I really do not see how ag71xx would be to blame ;)
>
> Although.... looking at
>
> https://aachen.uni-dsl.de/svn/unidsl_firmware/backfire/trunk/backfire/target/linux/ar71xx/files/drivers/net/ag71xx/ag71xx_main.c
>
> I do see a bug :
>
>     if (rx_done < limit) {
>         if (status & RX_STATUS_PR)
>             goto more;
>
>         status = ag71xx_rr(ag, AG71XX_REG_TX_STATUS);
>         if (status & TX_STATUS_PS)
>             goto more;
>
>         DBG("%s: disable polling mode, rx=%d, tx=%d,limit=%d\n",
>             dev->name, rx_done, tx_done, limit);
>
>         napi_complete(napi);
> Hint:
>         napi_complete_done(napi, rx_done);
>
>         /* enable interrupts */
>         spin_lock_irqsave(&ag->lock, flags);
>         ag71xx_int_enable(ag, AG71XX_INT_POLL);
>         spin_unlock_irqrestore(&ag->lock, flags);
>         return rx_done;
>     }
>
> more:
>     DBG("%s: stay in polling mode, rx=%d, tx=%d, limit=%d\n",
>         dev->name, rx_done, tx_done, limit);
>     return rx_done;
>
> This last statement should be : "return limit;"
And it seems to be in current versions:
https://git.lede-project.org/?p=source.git;a=blob;f=target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c;h=566e9513d8b7c6ef101902ae7d281dcc1c233893;hb=HEAD#l1156
Wonder why these drivers are not upstreamed, though?
-Toke
* Re: [Make-wifi-fast] a bit of profiling on the archer
From: Dave Taht @ 2016-11-19 17:33 UTC
To: Jesper Dangaard Brouer; +Cc: make-wifi-fast
On Thu, Nov 17, 2016 at 11:55 PM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
> On Thu, 17 Nov 2016 20:14:49 -0800 Dave Taht <dave.taht@gmail.com> wrote:
>
>> I have not been profiling much on lower end platforms (it's hard, you
>> can crash a box pretty easily with the wrong options or sample rates).
>
> I'm happy to hear that perf does work on this lower-end HW, despite
> the caveat about sample rates.
>
> Does anyone know if a hardware-based PMU (Performance Monitoring Unit)
> exists for these kinds of devices?
>
>> While watching the ath10k peak at 150-200mbits, at 99% of cpu in
>> softirq, I spent a bit of time profiling various counters and
>> statistics.
>>
>> for this one (while downloading 12 flows at the same time via flent)
>>
>> perf record -F 99 -e cpu-clock -ag -- sleep 10
>> perf report
>
> The perf report below is not well suited for email; could you instead
> provide the output of the command below:
>
> perf report --no-children --stdio --call-graph none
Thanks! That is way more readable output. Appended at the end of this post.
>
>>
>> [[31m 67.81%[[m 0.00% ksoftirqd/0 [kernel.kallsyms] [k]
>> run_ksoftirqd
>> |
>> ---run_ksoftirqd
>> |
>> |[[31m--67.61%-- [[m __do_softirq
>> | |
>> | |[[31m--66.80%-- [[m net_rx_action
>> | | |
>> | | |[[31m--41.07%-- [[m ag71xx_poll
>>
>> ...
>
> (Looks like you managed to copy-paste the terminal escape codes for
> colors)
Not sure what terminal emulation perf is expecting. TERM is set to XTERM.
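If the colour codes have already ended up in a captured file, they can be stripped afterwards. The sketch below is Python and assumes the "[[31m" seen in the archived text was originally an ESC byte followed by "[31m" (an SGR colour sequence); `strip_ansi` is a hypothetical helper name, not something perf provides.

```python
import re

# SGR ("select graphic rendition") sequences: ESC [ <digits and ;> m
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def strip_ansi(text):
    """Remove terminal colour sequences, leaving the plain text."""
    return ANSI_SGR.sub("", text)


# A line shaped like the first row of the report quoted above, with the
# ESC byte written explicitly (an assumption about the original bytes).
colored = ("\x1b[31m 67.81%\x1b[m 0.00% ksoftirqd/0 "
           "[kernel.kallsyms] [k] run_ksoftirqd")
print(strip_ansi(colored))
```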
>>
>> It appears we're spending 66% of the time in the *ethernet* portion of
>> the path.
>
> Be careful you don't fool yourself. In your output you have the
> "children" mode on, so everything being called "under" net_rx_action is
> summed up. It could be it goes all the way through to the wifi TX
> parts and that is part of the sum. Even the memory allocations get
> summed into this 66% number.
>
>
>> I'm going to stop worrying so much about the performance of the new
>> wifi algorithms.
for at least 24 hours
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 990 of event 'cpu-clock'
# Event count (approx.): 9999999900
#
# Overhead Command Shared Object Symbol
# ........ ........... ...................... .........................................
#
4.34% ksoftirqd/0 [ebtables] [k] ebt_do_table
2.93% ksoftirqd/0 [kernel.kallsyms] [k] __netif_receive_skb_core
2.83% ksoftirqd/0 [ip_tables] [k] ipt_do_table
2.53% ksoftirqd/0 [kernel.kallsyms] [k] nf_iterate
2.42% ksoftirqd/0 [mac80211] [k] ieee80211_tx_dequeue
2.02% ksoftirqd/0 [kernel.kallsyms] [k] __local_bh_enable_ip
2.02% ksoftirqd/0 [mac80211] [k] __ieee80211_subif_start_xmit
1.82% ksoftirqd/0 [kernel.kallsyms] [k] __dev_queue_xmit
1.72% ksoftirqd/0 [nf_conntrack] [k] tcp_packet
1.52% ksoftirqd/0 [mac80211] [k] ieee80211_prepare_and_rx_handle
1.52% ksoftirqd/0 [mac80211] [k] ieee80211_rx_handlers
1.41% ksoftirqd/0 [kernel.kallsyms] [k] br_handle_frame
1.41% ksoftirqd/0 [kernel.kallsyms] [k] ip_finish_output2
1.41% ksoftirqd/0 [kernel.kallsyms] [k] ip_forward
1.41% ksoftirqd/0 [nf_conntrack] [k] __nf_conntrack_find_get
1.31% ksoftirqd/0 [kernel.kallsyms] [k] format_decode
1.31% ksoftirqd/0 [mac80211] [k] ieee80211_xmit_fast_finish
1.21% ksoftirqd/0 [mac80211] [k] ieee80211_tx_h_select_key
1.11% ksoftirqd/0 [kernel.kallsyms] [k] __copy_user_common
1.11% ksoftirqd/0 [mac80211] [k] ieee80211_queue_skb
1.01% ksoftirqd/0 [ath10k_core] [k] ath10k_htt_txrx_compl_task
1.01% ksoftirqd/0 [ath10k_core] [k] ath10k_mac_tx.isra.35
1.01% ksoftirqd/0 [kernel.kallsyms] [k] dev_hard_start_xmit
1.01% ksoftirqd/0 [kernel.kallsyms] [k] eth_type_trans
1.01% ksoftirqd/0 [kernel.kallsyms] [k] netif_skb_features
1.01% ksoftirqd/0 [nf_conntrack_rtcache] [k] nf_rtcache_in.part.2
0.91% ksoftirqd/0 [ath10k_pci] [k] ath10k_pci_hif_tx_sg
0.91% ksoftirqd/0 [kernel.kallsyms] [k] ip_rcv
0.91% ksoftirqd/0 [kernel.kallsyms] [k] memcmp
0.91% ksoftirqd/0 [mac80211] [k] ieee80211_rx_napi
0.81% ksoftirqd/0 [ath10k_core] [k] ath10k_dbg
0.81% ksoftirqd/0 [ath10k_core] [k] ath10k_htt_tx
0.81% ksoftirqd/0 [ath10k_pci] [k] ath10k_bus_pci_write32
0.81% ksoftirqd/0 [br_netfilter] [k] br_nf_pre_routing
0.81% ksoftirqd/0 [ebtable_broute] [k] ebt_broute
0.81% ksoftirqd/0 [kernel.kallsyms] [k] ag71xx_poll
0.81% ksoftirqd/0 [kernel.kallsyms] [k] ip_output
0.81% ksoftirqd/0 [kernel.kallsyms] [k] skb_get_hash_perturb
0.81% ksoftirqd/0 [mac80211] [k] sta_info_hash_lookup
0.81% ksoftirqd/0 [nf_conntrack_ipv4] [k] ipv4_helper
0.81% kworker/0:1 [kernel.kallsyms] [k] __delay
0.71% ksoftirqd/0 [kernel.kallsyms] [k] __bzero
0.71% ksoftirqd/0 [kernel.kallsyms] [k] __rmemcpy
0.71% ksoftirqd/0 [kernel.kallsyms] [k] _find_next_bit.part.0
0.71% ksoftirqd/0 [kernel.kallsyms] [k] ip_rcv_finish
0.71% ksoftirqd/0 [mac80211] [k] ieee80211_lookup_ra_sta
0.71% ksoftirqd/0 [nf_conntrack] [k] hash_conntrack_raw
0.71% ksoftirqd/0 [nf_conntrack] [k] nf_conntrack_in
0.71% ksoftirqd/0 [nf_conntrack] [k] tcp_error
0.61% ksoftirqd/0 [br_netfilter] [k] br_nf_post_routing
0.61% ksoftirqd/0 [cfg80211] [k] cfg80211_classify8021d
0.61% ksoftirqd/0 [iptable_raw] [k] iptable_raw_hook
0.61% ksoftirqd/0 [kernel.kallsyms] [k] br_handle_frame_finish
0.61% ksoftirqd/0 [kernel.kallsyms] [k] netdev_pick_tx
0.61% ksoftirqd/0 [kernel.kallsyms] [k] r4k_dma_cache_wback_inv
0.61% ksoftirqd/0 [kernel.kallsyms] [k] skb_network_protocol
0.61% ksoftirqd/0 [nf_conntrack_rtcache] [k] nf_ct_rtcache_find_usable
0.61% ksoftirqd/0 [nf_nat_ipv4] [k] nf_nat_ipv4_fn
0.51% ksoftirqd/0 [ath10k_pci] [k] ath10k_ce_send_nolock
0.51% ksoftirqd/0 [kernel.kallsyms] [k] __skb_flow_get_ports
0.51% ksoftirqd/0 [kernel.kallsyms] [k] atomic64_add
0.51% ksoftirqd/0 [kernel.kallsyms] [k] br_dev_queue_push_xmit
0.51% ksoftirqd/0 [kernel.kallsyms] [k] br_fdb_update
0.51% ksoftirqd/0 [kernel.kallsyms] [k] dst_release
0.51% ksoftirqd/0 [kernel.kallsyms] [k] idr_get_empty_slot
0.51% ksoftirqd/0 [kernel.kallsyms] [k] r4k_dma_cache_inv
0.51% ksoftirqd/0 [nf_conntrack] [k] nf_ct_deliver_cached_events
0.51% ksoftirqd/0 [nf_conntrack_ipv4] [k] ipv4_conntrack_in
0.51% ksoftirqd/0 [nf_nat] [k] nf_nat_packet
0.40% ksoftirqd/0 [ath10k_core] [k] ath10k_dbg_dump
0.40% ksoftirqd/0 [ath10k_core] [k] ath10k_mac_op_wake_tx_queue
0.40% ksoftirqd/0 [ath10k_pci] [k] ath10k_bus_pci_read32
0.40% ksoftirqd/0 [ath10k_pci] [k] ath10k_ce_completed_send_next_nolock
0.40% ksoftirqd/0 [br_netfilter] [k] ip_sabotage_in
0.40% ksoftirqd/0 [kernel.kallsyms] [k] __br_fdb_get
0.40% ksoftirqd/0 [kernel.kallsyms] [k] __delay
0.40% ksoftirqd/0 [kernel.kallsyms] [k] __skb_flow_dissect
0.40% ksoftirqd/0 [kernel.kallsyms] [k] __wake_up_sync_key
0.40% ksoftirqd/0 [kernel.kallsyms] [k] memcpy
0.40% ksoftirqd/0 [kernel.kallsyms] [k] sch_direct_xmit
0.40% ksoftirqd/0 [kernel.kallsyms] [k] skb_release_data
0.40% ksoftirqd/0 [kernel.kallsyms] [k] validate_xmit_skb.isra.30.part.31
0.40% ksoftirqd/0 [mac80211] [k] fq_flow_classify.constprop.17
0.40% ksoftirqd/0 [mac80211] [k] ieee80211_tx_status
0.40% ksoftirqd/0 [nf_conntrack] [k] __nf_ct_refresh_acct
0.40% ksoftirqd/0 [nf_conntrack_ipv4] [k] ipv4_get_l4proto
0.40% ksoftirqd/0 [nf_conntrack_rtcache] [k] nf_rtcache_forward
0.40% ksoftirqd/0 [nf_conntrack_rtcache] [k] nf_rtcache_forward4
0.40% ksoftirqd/0 [nf_nat_ipv4] [k] nf_nat_ipv4_in
0.40% ksoftirqd/0 [nf_nat_ipv4] [k] nf_nat_ipv4_out
0.30% ksoftirqd/0 [ath10k_core] [k] __ath10k_htt_rx_ring_fill_n
0.30% ksoftirqd/0 [ath10k_core] [k] ath10k_htt_rx_h_deliver
0.30% ksoftirqd/0 [ath10k_core] [k] ath10k_htt_rx_h_mpdu
0.30% ksoftirqd/0 [ath10k_core] [k] ath10k_mac_tx_push_txq
0.30% ksoftirqd/0 [ath10k_pci] [k] ath10k_ce_completed_send_next
0.30% ksoftirqd/0 [ath10k_pci] [k] ath10k_pci_write32
0.30% ksoftirqd/0 [br_netfilter] [k] br_nf_dev_xmit
0.30% ksoftirqd/0 [br_netfilter] [k] nf_bridge_encap_header_len
0.30% ksoftirqd/0 [cfg80211] [k] ieee80211_hdrlen
0.30% ksoftirqd/0 [kernel.kallsyms] [k] __free_page_frag
0.30% ksoftirqd/0 [kernel.kallsyms] [k] ag71xx_hard_start_xmit
0.30% ksoftirqd/0 [kernel.kallsyms] [k] br_dev_xmit
0.30% ksoftirqd/0 [kernel.kallsyms] [k] br_pass_frame_up
0.30% ksoftirqd/0 [kernel.kallsyms] [k] build_skb
0.30% ksoftirqd/0 [kernel.kallsyms] [k] dev_gro_receive
0.30% ksoftirqd/0 [kernel.kallsyms] [k] idr_alloc
0.30% ksoftirqd/0 [kernel.kallsyms] [k] inet_gro_receive
0.30% ksoftirqd/0 [kernel.kallsyms] [k] ip_finish_output
0.30% ksoftirqd/0 [kernel.kallsyms] [k] nf_hook_slow
0.30% ksoftirqd/0 [kernel.kallsyms] [k] tcp_gro_receive
0.30% ksoftirqd/0 [mac80211] [k] codel_dequeue_func
0.30% ksoftirqd/0 [mac80211] [k] fq_flow_dequeue
0.30% ksoftirqd/0 [mac80211] [k] ieee80211_drop_unencrypted.part.3
0.30% ksoftirqd/0 [mac80211] [k] ieee80211_get_bssid
0.30% ksoftirqd/0 [nf_conntrack_ipv4] [k] ipv4_confirm
0.30% ksoftirqd/0 [nf_defrag_ipv4] [k] ipv4_conntrack_defrag
0.20% ksoftirqd/0 [ath10k_core] [k] ath10k_txrx_tx_unref
0.20% ksoftirqd/0 [ath10k_pci] [k] ath10k_pci_htt_rx_cb
0.20% ksoftirqd/0 [br_netfilter] [k] br_nf_local_in
0.20% ksoftirqd/0 [ebtable_filter] [k] ebt_out_hook
0.20% ksoftirqd/0 [ebtable_nat] [k] ebt_nat_in
0.20% ksoftirqd/0 [ebtable_nat] [k] ebt_nat_out
0.20% ksoftirqd/0 [iptable_mangle] [k] iptable_mangle_hook
0.20% ksoftirqd/0 [kernel.kallsyms] [k] __dma_sync
0.20% ksoftirqd/0 [kernel.kallsyms] [k] __free_pages_ok
0.20% ksoftirqd/0 [kernel.kallsyms] [k] __netif_receive_skb
0.20% ksoftirqd/0 [kernel.kallsyms] [k] __slab_alloc.isra.13.constprop.17
0.20% ksoftirqd/0 [kernel.kallsyms] [k] __slab_free.isra.14
0.20% ksoftirqd/0 [kernel.kallsyms] [k] br_deliver
0.20% ksoftirqd/0 [kernel.kallsyms] [k] finish_task_switch
0.20% ksoftirqd/0 [kernel.kallsyms] [k] ipv4_dst_check
0.20% ksoftirqd/0 [kernel.kallsyms] [k] is_skb_forwardable
0.20% ksoftirqd/0 [kernel.kallsyms] [k] kmem_cache_free
0.20% ksoftirqd/0 [kernel.kallsyms] [k] ktime_get
0.20% ksoftirqd/0 [kernel.kallsyms] [k] net_rx_action
0.20% ksoftirqd/0 [kernel.kallsyms] [k] skb_release_head_state
0.20% ksoftirqd/0 [kernel.kallsyms] [k] vsnprintf
0.20% ksoftirqd/0 [mac80211] [k] ieee80211_deliver_skb
0.20% ksoftirqd/0 [mac80211] [k] ieee80211_select_queue
0.20% ksoftirqd/0 [mac80211] [k] ieee80211_skb_resize
0.20% ksoftirqd/0 [nf_conntrack_rtcache] [k] nf_rtcache_in4
0.10% hostapd [kernel.kallsyms] [k] __copy_user_common
0.10% hostapd [kernel.kallsyms] [k] core_sys_select
0.10% hostapd [kernel.kallsyms] [k] datagram_poll
0.10% hostapd [kernel.kallsyms] [k] do_select
0.10% hostapd [kernel.kallsyms] [k] finish_task_switch
0.10% hostapd [kernel.kallsyms] [k] handle_sys
0.10% hostapd [kernel.kallsyms] [k] hrtimer_try_to_cancel
0.10% hostapd [kernel.kallsyms] [k] netlink_recvmsg
0.10% hostapd [kernel.kallsyms] [k] poll_initwait
0.10% hostapd [kernel.kallsyms] [k] schedule_hrtimeout_range_clock
0.10% hostapd [kernel.kallsyms] [k] sock_poll
0.10% hostapd [kernel.kallsyms] [k] timespec_add_safe
0.10% hostapd libc.so [.] 0x0000ef74
0.10% hostapd libc.so [.] 0x000288e4
0.10% hostapd libc.so [.] 0x000722fc
0.10% hostapd wpad [.] 0x0001ab97
0.10% hostapd wpad [.] 0x0001b1a1
0.10% hostapd wpad [.] 0x0001cbd9
0.10% hostapd wpad [.] 0x0001cd0d
0.10% hostapd wpad [.] 0x0004c589
0.10% ksoftirqd/0 [ath10k_core] [k] ath10k_htc_rx_completion_handler
0.10% ksoftirqd/0 [ath10k_core] [k] ath10k_htt_rx_amsdu_allowed.isra.4
0.10% ksoftirqd/0 [ath10k_core] [k] ath10k_htt_tx_inc_pending
0.10% ksoftirqd/0 [ath10k_core] [k] ath10k_htt_tx_txq_update
0.10% ksoftirqd/0 [ath10k_core] [k] ath10k_mac_tx_frm_has_freq
0.10% ksoftirqd/0 [ath10k_core] [k] ath10k_mac_tx_h_get_txmode.isra.4
0.10% ksoftirqd/0 [ath10k_core] [k] ath10k_mac_tx_push_pending
0.10% ksoftirqd/0 [ath10k_core] [k] ath10k_process_rx
0.10% ksoftirqd/0 [ath10k_pci] [k] ath10k_ce_completed_recv_next_nolock
0.10% ksoftirqd/0 [ath9k] [k] ath_rx_tasklet
0.10% ksoftirqd/0 [ath9k_common] [k] ath9k_cmn_process_rssi
0.10% ksoftirqd/0 [ath9k_hw] [k] ath9k_hw_numtxpending
0.10% ksoftirqd/0 [cfg80211] [k] __ieee80211_data_to_8023
0.10% ksoftirqd/0 [cfg80211] [k] cfg80211_rx_mgmt
0.10% ksoftirqd/0 [kernel.kallsyms] [k] __build_skb
0.10% ksoftirqd/0 [kernel.kallsyms] [k] __do_softirq
0.10% ksoftirqd/0 [kernel.kallsyms] [k] ag71xx_tx_packets
0.10% ksoftirqd/0 [kernel.kallsyms] [k] c0_hpt_read
0.10% ksoftirqd/0 [kernel.kallsyms] [k] consume_skb
0.10% ksoftirqd/0 [kernel.kallsyms] [k] idr_mark_full
0.10% ksoftirqd/0 [kernel.kallsyms] [k] idr_remove
0.10% ksoftirqd/0 [kernel.kallsyms] [k] ip_forward_finish
0.10% ksoftirqd/0 [kernel.kallsyms] [k] kmem_cache_alloc
0.10% ksoftirqd/0 [kernel.kallsyms] [k] memmove
0.10% ksoftirqd/0 [kernel.kallsyms] [k] mod_timer
0.10% ksoftirqd/0 [kernel.kallsyms] [k] number.isra.12
0.10% ksoftirqd/0 [kernel.kallsyms] [k] passthru_features_check
0.10% ksoftirqd/0 [kernel.kallsyms] [k] skb_copy_bits
0.10% ksoftirqd/0 [kernel.kallsyms] [k] skb_pull
0.10% ksoftirqd/0 [kernel.kallsyms] [k] skb_push
0.10% ksoftirqd/0 [mac80211] [k] __ieee80211_beacon_add_tim
0.10% ksoftirqd/0 [mac80211] [k] codel_should_drop.isra.4.constprop.20
0.10% ksoftirqd/0 [mac80211] [k] ieee80211_report_used_skb
0.10% ksoftirqd/0 [mac80211] [k] ieee80211_subif_start_xmit
0.10% ksoftirqd/0 [mac80211] [k] remove_monitor_info
0.10% ksoftirqd/0 [nf_conntrack] [k] nf_ct_get_tuple
0.10% ksoftirqd/0 [nf_conntrack] [k] tcp_get_timeouts
0.10% ksoftirqd/0 [nf_conntrack_ipv4] [k] ipv4_pkt_to_tuple
0.10% ksoftirqd/0 [nf_nat] [k] nf_ct_nat_ext_add
#
# (For a higher level overview, try: perf report --sort comm,dso)
#
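A note on how coarse these numbers are. The header values are consistent with simple arithmetic on the record command (-F 99, sleep 10), which the Python sketch below reproduces. It assumes a single online CPU and that the per-sample period is 1e9 ns divided by the frequency and truncated; both are assumptions that happen to match the header, not something read from perf's source.

```python
FREQ_HZ = 99      # from: perf record -F 99
DURATION_S = 10   # from: -- sleep 10
CPUS = 1          # assumption: one online CPU (only ksoftirqd/0 shows up)

samples = FREQ_HZ * DURATION_S * CPUS
period_ns = 1_000_000_000 // FREQ_HZ   # assumed truncating division
event_count = samples * period_ns


def overhead(n_samples):
    """Overhead column value for a symbol hit by n_samples samples."""
    return f"{100 * n_samples / samples:.2f}%"


print("samples:", samples)
print("event count:", event_count)
for n in (1, 2, 5, 43):
    print(f"{n:3d} sample(s) -> {overhead(n)}")
```

Under those assumptions every 0.10% row is a single sample and the top row (4.34%) is 43 samples, so small differences between rows should not be over-interpreted.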
>
> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Principal Kernel Engineer at Red Hat
> Author of http://www.iptv-analyzer.org
> LinkedIn: http://www.linkedin.com/in/brouer
--
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org
* Re: [Make-wifi-fast] a bit of profiling on the archer
From: Jesper Dangaard Brouer @ 2016-11-21 16:49 UTC
To: Dave Taht; +Cc: make-wifi-fast, brouer
On Sat, 19 Nov 2016 09:33:51 -0800
Dave Taht <dave.taht@gmail.com> wrote:
> On Thu, Nov 17, 2016 at 11:55 PM, Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> >
> > On Thu, 17 Nov 2016 20:14:49 -0800 Dave Taht <dave.taht@gmail.com> wrote:
> >
> >> I have not been profiling much on lower end platforms (it's hard, you
> >> can crash a box pretty easily with the wrong options or sample rates).
> >
> > I'm happy to hear that perf does work on this lower-end HW, despite
> > the caveat about sample rates.
> >
> > Does anyone know if a hardware-based PMU (Performance Monitoring Unit)
> > exists for these kinds of devices?
> >
> >> While watching the ath10k peak at 150-200mbits, at 99% of cpu in
> >> softirq, I spent a bit of time profiling various counters and
> >> statistics.
> >>
> >> for this one (while downloading 12 flows at the same time via flent)
> >>
> >> perf record -F 99 -e cpu-clock -ag -- sleep 10
> >> perf report
> >
> > The perf report below is not well suited for email; could you instead
> > provide the output of the command below:
> >
> > perf report --no-children --stdio --call-graph none
>
> Thanks! That is way more readable output. Appended at the end of this post.
>
> >
> >>
> >> [[31m 67.81%[[m 0.00% ksoftirqd/0 [kernel.kallsyms] [k]
> >> run_ksoftirqd
> >> |
> >> ---run_ksoftirqd
> >> |
> >> |[[31m--67.61%-- [[m __do_softirq
> >> | |
> >> | |[[31m--66.80%-- [[m net_rx_action
> >> | | |
> >> | | |[[31m--41.07%-- [[m ag71xx_poll
> >>
> >> ...
> >
> > (Looks like you managed to copy-paste the terminal escape codes for
> > colors)
>
> Not sure what terminal emulation perf is expecting. TERM is set to XTERM.
>
> >>
> >> It appears we're spending 66% of the time in the *ethernet* portion of
> >> the path.
> >
> > Be careful you don't fool yourself. In your output you have the
> > "children" mode on, so everything being called "under" net_rx_action is
> > summed up. It could be it goes all the way through to the wifi TX
> > parts and that is part of the sum. Even the memory allocations get
> > summed into this 66% number.
> >
> >
> >> I'm going to stop worrying so much about the performance of the new
> >> wifi algorithms.
>
> for at least 24 hours
>
> # To display the perf.data header info, please use --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 990 of event 'cpu-clock'
> # Event count (approx.): 9999999900
> #
> # Overhead Command Shared Object Symbol
> # ........ ........... ...................... .........................................
> #
> 4.34% ksoftirqd/0 [ebtables] [k] ebt_do_table
Do you actually need ebtables? you should check....
> 2.93% ksoftirqd/0 [kernel.kallsyms] [k] __netif_receive_skb_core
> 2.83% ksoftirqd/0 [ip_tables] [k] ipt_do_table
One should be careful about simply blaming "ebt_do_table" and
"ipt_do_table", because their cost can easily just be a symptom of a
cache miss on packet data. On these low-end CPUs with small caches, one
can even see cache misses on the same packet at different places in the
netstack.
> 2.53% ksoftirqd/0 [kernel.kallsyms] [k] nf_iterate
Upstream made changes in this area.
26dfab721629 ("netfilter: merge nf_iterate() into nf_hook_slow()")
https://git.kernel.org/davem/net-next/c/26dfab721629
What kernel release is this?
> 2.42% ksoftirqd/0 [mac80211] [k] ieee80211_tx_dequeue
> 2.02% ksoftirqd/0 [kernel.kallsyms] [k] __local_bh_enable_ip
> 2.02% ksoftirqd/0 [mac80211] [k] __ieee80211_subif_start_xmit
> 1.82% ksoftirqd/0 [kernel.kallsyms] [k] __dev_queue_xmit
> 1.72% ksoftirqd/0 [nf_conntrack] [k] tcp_packet
Notice this "tcp_packet" call happens in "nf_conntrack"
> 1.52% ksoftirqd/0 [mac80211] [k] ieee80211_prepare_and_rx_handle
> 1.52% ksoftirqd/0 [mac80211] [k] ieee80211_rx_handlers
> 1.41% ksoftirqd/0 [kernel.kallsyms] [k] br_handle_frame
> 1.41% ksoftirqd/0 [kernel.kallsyms] [k] ip_finish_output2
> 1.41% ksoftirqd/0 [kernel.kallsyms] [k] ip_forward
> 1.41% ksoftirqd/0 [nf_conntrack] [k] __nf_conntrack_find_get
If you sum things up, then conntrack seems to take a lot of time.
I find that the FlameGraphs are the best way to get such an overview.
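Short of a FlameGraph, the per-module sum can be approximated from the flat report. The sketch below is Python; `overhead_by_dso` is a hypothetical helper, and the embedded rows are only a handful copied from the report in this thread, so the totals it prints are for that subset and not for the whole profile. perf's own footer in the report suggests "perf report --sort comm,dso" for a similar grouping.

```python
from collections import defaultdict

# A few rows copied from the report in this thread (a subset, not all of it).
REPORT = """\
4.34% ksoftirqd/0 [ebtables] [k] ebt_do_table
2.83% ksoftirqd/0 [ip_tables] [k] ipt_do_table
1.72% ksoftirqd/0 [nf_conntrack] [k] tcp_packet
1.41% ksoftirqd/0 [nf_conntrack] [k] __nf_conntrack_find_get
0.71% ksoftirqd/0 [nf_conntrack] [k] hash_conntrack_raw
0.71% ksoftirqd/0 [nf_conntrack] [k] nf_conntrack_in
0.71% ksoftirqd/0 [nf_conntrack] [k] tcp_error
2.42% ksoftirqd/0 [mac80211] [k] ieee80211_tx_dequeue
"""


def overhead_by_dso(report):
    """Sum the Overhead column per Shared Object (third field)."""
    totals = defaultdict(float)
    for line in report.splitlines():
        fields = line.split()
        # Skip comment lines, blank lines and anything not shaped like a row.
        if len(fields) < 5 or not fields[0].endswith("%"):
            continue
        totals[fields[2]] += float(fields[0].rstrip("%"))
    return dict(totals)


totals = overhead_by_dso(REPORT)
for dso, pct in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{pct:5.2f}% {dso}")
```

Feeding it the complete report would also pull in the nf_conntrack_ipv4 and nf_conntrack_rtcache rows as separate modules, which may or may not be what one wants to count as conntrack.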
> 1.31% ksoftirqd/0 [kernel.kallsyms] [k] format_decode
> 1.31% ksoftirqd/0 [mac80211] [k] ieee80211_xmit_fast_finish
> 1.21% ksoftirqd/0 [mac80211] [k] ieee80211_tx_h_select_key
> 1.11% ksoftirqd/0 [kernel.kallsyms] [k] __copy_user_common
> 1.11% ksoftirqd/0 [mac80211] [k] ieee80211_queue_skb
> 1.01% ksoftirqd/0 [ath10k_core] [k] ath10k_htt_txrx_compl_task
> 1.01% ksoftirqd/0 [ath10k_core] [k] ath10k_mac_tx.isra.35
> 1.01% ksoftirqd/0 [kernel.kallsyms] [k] dev_hard_start_xmit
> 1.01% ksoftirqd/0 [kernel.kallsyms] [k] eth_type_trans
> 1.01% ksoftirqd/0 [kernel.kallsyms] [k] netif_skb_features
> 1.01% ksoftirqd/0 [nf_conntrack_rtcache] [k] nf_rtcache_in.part.2
AFAIK nf_conntrack_rtcache is not upstream, which is too bad.
Felix Fietkau showed it improves performance a lot (slide 20):
http://people.netfilter.org/hawk/presentations/NetDev1.1_2016/links.html
> 0.91% ksoftirqd/0 [ath10k_pci] [k] ath10k_pci_hif_tx_sg
> 0.91% ksoftirqd/0 [kernel.kallsyms] [k] ip_rcv
> 0.91% ksoftirqd/0 [kernel.kallsyms] [k] memcmp
> 0.91% ksoftirqd/0 [mac80211] [k] ieee80211_rx_napi
> 0.81% ksoftirqd/0 [ath10k_core] [k] ath10k_dbg
> 0.81% ksoftirqd/0 [ath10k_core] [k] ath10k_htt_tx
> 0.81% ksoftirqd/0 [ath10k_pci] [k] ath10k_bus_pci_write32
> 0.81% ksoftirqd/0 [br_netfilter] [k] br_nf_pre_routing
> 0.81% ksoftirqd/0 [ebtable_broute] [k] ebt_broute
> 0.81% ksoftirqd/0 [kernel.kallsyms] [k] ag71xx_poll
> 0.81% ksoftirqd/0 [kernel.kallsyms] [k] ip_output
> 0.81% ksoftirqd/0 [kernel.kallsyms] [k] skb_get_hash_perturb
> 0.81% ksoftirqd/0 [mac80211] [k] sta_info_hash_lookup
> 0.81% ksoftirqd/0 [nf_conntrack_ipv4] [k] ipv4_helper
> 0.81% kworker/0:1 [kernel.kallsyms] [k] __delay
> 0.71% ksoftirqd/0 [kernel.kallsyms] [k] __bzero
> 0.71% ksoftirqd/0 [kernel.kallsyms] [k] __rmemcpy
> 0.71% ksoftirqd/0 [kernel.kallsyms] [k] _find_next_bit.part.0
> 0.71% ksoftirqd/0 [kernel.kallsyms] [k] ip_rcv_finish
> 0.71% ksoftirqd/0 [mac80211] [k] ieee80211_lookup_ra_sta
> 0.71% ksoftirqd/0 [nf_conntrack] [k] hash_conntrack_raw
> 0.71% ksoftirqd/0 [nf_conntrack] [k] nf_conntrack_in
> 0.71% ksoftirqd/0 [nf_conntrack] [k] tcp_error
> 0.61% ksoftirqd/0 [br_netfilter] [k] br_nf_post_routing
> 0.61% ksoftirqd/0 [cfg80211] [k] cfg80211_classify8021d
> 0.61% ksoftirqd/0 [iptable_raw] [k] iptable_raw_hook
> 0.61% ksoftirqd/0 [kernel.kallsyms] [k] br_handle_frame_finish
> 0.61% ksoftirqd/0 [kernel.kallsyms] [k] netdev_pick_tx
> 0.61% ksoftirqd/0 [kernel.kallsyms] [k] r4k_dma_cache_wback_inv
> 0.61% ksoftirqd/0 [kernel.kallsyms] [k] skb_network_protocol
> 0.61% ksoftirqd/0 [nf_conntrack_rtcache] [k] nf_ct_rtcache_find_usable
> 0.61% ksoftirqd/0 [nf_nat_ipv4] [k] nf_nat_ipv4_fn
> 0.51% ksoftirqd/0 [ath10k_pci] [k] ath10k_ce_send_nolock
> 0.51% ksoftirqd/0 [kernel.kallsyms] [k] __skb_flow_get_ports
> 0.51% ksoftirqd/0 [kernel.kallsyms] [k] atomic64_add
Hmm, a 64-bit add on a 32-bit platform?
> 0.51% ksoftirqd/0 [kernel.kallsyms] [k] br_dev_queue_push_xmit
> 0.51% ksoftirqd/0 [kernel.kallsyms] [k] br_fdb_update
> 0.51% ksoftirqd/0 [kernel.kallsyms] [k] dst_release
> 0.51% ksoftirqd/0 [kernel.kallsyms] [k] idr_get_empty_slot
> 0.51% ksoftirqd/0 [kernel.kallsyms] [k] r4k_dma_cache_inv
> 0.51% ksoftirqd/0 [nf_conntrack] [k] nf_ct_deliver_cached_events
> 0.51% ksoftirqd/0 [nf_conntrack_ipv4] [k] ipv4_conntrack_in
> 0.51% ksoftirqd/0 [nf_nat] [k] nf_nat_packet
> 0.40% ksoftirqd/0 [ath10k_core] [k] ath10k_dbg_dump
> 0.40% ksoftirqd/0 [ath10k_core] [k] ath10k_mac_op_wake_tx_queue
> 0.40% ksoftirqd/0 [ath10k_pci] [k] ath10k_bus_pci_read32
> 0.40% ksoftirqd/0 [ath10k_pci] [k] ath10k_ce_completed_send_next_nolock
> 0.40% ksoftirqd/0 [br_netfilter] [k] ip_sabotage_in
> 0.40% ksoftirqd/0 [kernel.kallsyms] [k] __br_fdb_get
> 0.40% ksoftirqd/0 [kernel.kallsyms] [k] __delay
> 0.40% ksoftirqd/0 [kernel.kallsyms] [k] __skb_flow_dissect
> 0.40% ksoftirqd/0 [kernel.kallsyms] [k] __wake_up_sync_key
> 0.40% ksoftirqd/0 [kernel.kallsyms] [k] memcpy
> 0.40% ksoftirqd/0 [kernel.kallsyms] [k] sch_direct_xmit
> 0.40% ksoftirqd/0 [kernel.kallsyms] [k] skb_release_data
> 0.40% ksoftirqd/0 [kernel.kallsyms] [k] validate_xmit_skb.isra.30.part.31
> 0.40% ksoftirqd/0 [mac80211] [k] fq_flow_classify.constprop.17
> 0.40% ksoftirqd/0 [mac80211] [k] ieee80211_tx_status
> 0.40% ksoftirqd/0 [nf_conntrack] [k] __nf_ct_refresh_acct
> 0.40% ksoftirqd/0 [nf_conntrack_ipv4] [k] ipv4_get_l4proto
> 0.40% ksoftirqd/0 [nf_conntrack_rtcache] [k] nf_rtcache_forward
> 0.40% ksoftirqd/0 [nf_conntrack_rtcache] [k] nf_rtcache_forward4
> 0.40% ksoftirqd/0 [nf_nat_ipv4] [k] nf_nat_ipv4_in
> 0.40% ksoftirqd/0 [nf_nat_ipv4] [k] nf_nat_ipv4_out
> 0.30% ksoftirqd/0 [ath10k_core] [k] __ath10k_htt_rx_ring_fill_n
> 0.30% ksoftirqd/0 [ath10k_core] [k] ath10k_htt_rx_h_deliver
> 0.30% ksoftirqd/0 [ath10k_core] [k] ath10k_htt_rx_h_mpdu
> 0.30% ksoftirqd/0 [ath10k_core] [k] ath10k_mac_tx_push_txq
> 0.30% ksoftirqd/0 [ath10k_pci] [k] ath10k_ce_completed_send_next
> 0.30% ksoftirqd/0 [ath10k_pci] [k] ath10k_pci_write32
> 0.30% ksoftirqd/0 [br_netfilter] [k] br_nf_dev_xmit
> 0.30% ksoftirqd/0 [br_netfilter] [k] nf_bridge_encap_header_len
> 0.30% ksoftirqd/0 [cfg80211] [k] ieee80211_hdrlen
> 0.30% ksoftirqd/0 [kernel.kallsyms] [k] __free_page_frag
> 0.30% ksoftirqd/0 [kernel.kallsyms] [k] ag71xx_hard_start_xmit
> 0.30% ksoftirqd/0 [kernel.kallsyms] [k] br_dev_xmit
> 0.30% ksoftirqd/0 [kernel.kallsyms] [k] br_pass_frame_up
> 0.30% ksoftirqd/0 [kernel.kallsyms] [k] build_skb
> 0.30% ksoftirqd/0 [kernel.kallsyms] [k] dev_gro_receive
> 0.30% ksoftirqd/0 [kernel.kallsyms] [k] idr_alloc
> 0.30% ksoftirqd/0 [kernel.kallsyms] [k] inet_gro_receive
> 0.30% ksoftirqd/0 [kernel.kallsyms] [k] ip_finish_output
> 0.30% ksoftirqd/0 [kernel.kallsyms] [k] nf_hook_slow
> 0.30% ksoftirqd/0 [kernel.kallsyms] [k] tcp_gro_receive
> 0.30% ksoftirqd/0 [mac80211] [k] codel_dequeue_func
> 0.30% ksoftirqd/0 [mac80211] [k] fq_flow_dequeue
> 0.30% ksoftirqd/0 [mac80211] [k] ieee80211_drop_unencrypted.part.3
> 0.30% ksoftirqd/0 [mac80211] [k] ieee80211_get_bssid
> 0.30% ksoftirqd/0 [nf_conntrack_ipv4] [k] ipv4_confirm
> 0.30% ksoftirqd/0 [nf_defrag_ipv4] [k] ipv4_conntrack_defrag
> 0.20% ksoftirqd/0 [ath10k_core] [k] ath10k_txrx_tx_unref
> 0.20% ksoftirqd/0 [ath10k_pci] [k] ath10k_pci_htt_rx_cb
> 0.20% ksoftirqd/0 [br_netfilter] [k] br_nf_local_in
> 0.20% ksoftirqd/0 [ebtable_filter] [k] ebt_out_hook
> 0.20% ksoftirqd/0 [ebtable_nat] [k] ebt_nat_in
> 0.20% ksoftirqd/0 [ebtable_nat] [k] ebt_nat_out
> 0.20% ksoftirqd/0 [iptable_mangle] [k] iptable_mangle_hook
> 0.20% ksoftirqd/0 [kernel.kallsyms] [k] __dma_sync
Interesting that __dma_sync does not take more than 0.2%.
> 0.20% ksoftirqd/0 [kernel.kallsyms] [k] __free_pages_ok
> 0.20% ksoftirqd/0 [kernel.kallsyms] [k] __netif_receive_skb
> 0.20% ksoftirqd/0 [kernel.kallsyms] [k] __slab_alloc.isra.13.constprop.17
> 0.20% ksoftirqd/0 [kernel.kallsyms] [k] __slab_free.isra.14
> 0.20% ksoftirqd/0 [kernel.kallsyms] [k] br_deliver
> 0.20% ksoftirqd/0 [kernel.kallsyms] [k] finish_task_switch
> 0.20% ksoftirqd/0 [kernel.kallsyms] [k] ipv4_dst_check
> 0.20% ksoftirqd/0 [kernel.kallsyms] [k] is_skb_forwardable
> 0.20% ksoftirqd/0 [kernel.kallsyms] [k] kmem_cache_free
> 0.20% ksoftirqd/0 [kernel.kallsyms] [k] ktime_get
> 0.20% ksoftirqd/0 [kernel.kallsyms] [k] net_rx_action
> 0.20% ksoftirqd/0 [kernel.kallsyms] [k] skb_release_head_state
> 0.20% ksoftirqd/0 [kernel.kallsyms] [k] vsnprintf
> 0.20% ksoftirqd/0 [mac80211] [k] ieee80211_deliver_skb
> 0.20% ksoftirqd/0 [mac80211] [k] ieee80211_select_queue
> 0.20% ksoftirqd/0 [mac80211] [k] ieee80211_skb_resize
> 0.20% ksoftirqd/0 [nf_conntrack_rtcache] [k] nf_rtcache_in4
> 0.10% hostapd [kernel.kallsyms] [k] __copy_user_common
> 0.10% hostapd [kernel.kallsyms] [k] core_sys_select
> 0.10% hostapd [kernel.kallsyms] [k] datagram_poll
> 0.10% hostapd [kernel.kallsyms] [k] do_select
> 0.10% hostapd [kernel.kallsyms] [k] finish_task_switch
> 0.10% hostapd [kernel.kallsyms] [k] handle_sys
> 0.10% hostapd [kernel.kallsyms] [k] hrtimer_try_to_cancel
> 0.10% hostapd [kernel.kallsyms] [k] netlink_recvmsg
> 0.10% hostapd [kernel.kallsyms] [k] poll_initwait
> 0.10% hostapd [kernel.kallsyms] [k] schedule_hrtimeout_range_clock
> 0.10% hostapd [kernel.kallsyms] [k] sock_poll
> 0.10% hostapd [kernel.kallsyms] [k] timespec_add_safe
> 0.10% hostapd libc.so [.] 0x0000ef74
> 0.10% hostapd libc.so [.] 0x000288e4
> 0.10% hostapd libc.so [.] 0x000722fc
> 0.10% hostapd wpad [.] 0x0001ab97
> 0.10% hostapd wpad [.] 0x0001b1a1
> 0.10% hostapd wpad [.] 0x0001cbd9
> 0.10% hostapd wpad [.] 0x0001cd0d
> 0.10% hostapd wpad [.] 0x0004c589
> 0.10% ksoftirqd/0 [ath10k_core] [k] ath10k_htc_rx_completion_handler
> 0.10% ksoftirqd/0 [ath10k_core] [k] ath10k_htt_rx_amsdu_allowed.isra.4
> 0.10% ksoftirqd/0 [ath10k_core] [k] ath10k_htt_tx_inc_pending
> 0.10% ksoftirqd/0 [ath10k_core] [k] ath10k_htt_tx_txq_update
> 0.10% ksoftirqd/0 [ath10k_core] [k] ath10k_mac_tx_frm_has_freq
> 0.10% ksoftirqd/0 [ath10k_core] [k] ath10k_mac_tx_h_get_txmode.isra.4
> 0.10% ksoftirqd/0 [ath10k_core] [k] ath10k_mac_tx_push_pending
> 0.10% ksoftirqd/0 [ath10k_core] [k] ath10k_process_rx
> 0.10% ksoftirqd/0 [ath10k_pci] [k] ath10k_ce_completed_recv_next_nolock
> 0.10% ksoftirqd/0 [ath9k] [k] ath_rx_tasklet
> 0.10% ksoftirqd/0 [ath9k_common] [k] ath9k_cmn_process_rssi
> 0.10% ksoftirqd/0 [ath9k_hw] [k] ath9k_hw_numtxpending
> 0.10% ksoftirqd/0 [cfg80211] [k] __ieee80211_data_to_8023
> 0.10% ksoftirqd/0 [cfg80211] [k] cfg80211_rx_mgmt
> 0.10% ksoftirqd/0 [kernel.kallsyms] [k] __build_skb
> 0.10% ksoftirqd/0 [kernel.kallsyms] [k] __do_softirq
> 0.10% ksoftirqd/0 [kernel.kallsyms] [k] ag71xx_tx_packets
> 0.10% ksoftirqd/0 [kernel.kallsyms] [k] c0_hpt_read
> 0.10% ksoftirqd/0 [kernel.kallsyms] [k] consume_skb
> 0.10% ksoftirqd/0 [kernel.kallsyms] [k] idr_mark_full
> 0.10% ksoftirqd/0 [kernel.kallsyms] [k] idr_remove
> 0.10% ksoftirqd/0 [kernel.kallsyms] [k] ip_forward_finish
> 0.10% ksoftirqd/0 [kernel.kallsyms] [k] kmem_cache_alloc
> 0.10% ksoftirqd/0 [kernel.kallsyms] [k] memmove
> 0.10% ksoftirqd/0 [kernel.kallsyms] [k] mod_timer
> 0.10% ksoftirqd/0 [kernel.kallsyms] [k] number.isra.12
> 0.10% ksoftirqd/0 [kernel.kallsyms] [k] passthru_features_check
> 0.10% ksoftirqd/0 [kernel.kallsyms] [k] skb_copy_bits
> 0.10% ksoftirqd/0 [kernel.kallsyms] [k] skb_pull
> 0.10% ksoftirqd/0 [kernel.kallsyms] [k] skb_push
> 0.10% ksoftirqd/0 [mac80211] [k] __ieee80211_beacon_add_tim
> 0.10% ksoftirqd/0 [mac80211] [k] codel_should_drop.isra.4.constprop.20
> 0.10% ksoftirqd/0 [mac80211] [k] ieee80211_report_used_skb
> 0.10% ksoftirqd/0 [mac80211] [k] ieee80211_subif_start_xmit
> 0.10% ksoftirqd/0 [mac80211] [k] remove_monitor_info
> 0.10% ksoftirqd/0 [nf_conntrack] [k] nf_ct_get_tuple
> 0.10% ksoftirqd/0 [nf_conntrack] [k] tcp_get_timeouts
> 0.10% ksoftirqd/0 [nf_conntrack_ipv4] [k] ipv4_pkt_to_tuple
> 0.10% ksoftirqd/0 [nf_nat] [k] nf_ct_nat_ext_add
No single function jumps out, but the number of function calls needed
to send packets is astonishing. I'm sure the icache is also
hurting.
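Since no single symbol dominates, it can help to sum the flat profile per
module instead of per function. Below is a minimal sketch, assuming the
input is the text printed by "perf report --no-children --stdio
--call-graph none" (self overhead only, one symbol per line) and that
command names contain no spaces. The function name overhead_by_dso and the
SAMPLE text are made up for illustration; the sample lines are copied from
the report above.

```python
import re
from collections import defaultdict

# A few lines copied from the report quoted above (the "> " prefix is optional).
SAMPLE = """\
> 0.30% ksoftirqd/0 [kernel.kallsyms] [k] dev_gro_receive
> 0.30% ksoftirqd/0 [mac80211] [k] codel_dequeue_func
> 0.30% ksoftirqd/0 [mac80211] [k] fq_flow_dequeue
> 0.20% ksoftirqd/0 [ath10k_core] [k] ath10k_txrx_tx_unref
> 0.20% ksoftirqd/0 [kernel.kallsyms] [k] __dma_sync
> 0.10% hostapd libc.so [.] 0x0000ef74
"""

# overhead%  command  shared-object  [k] or [.]  symbol
LINE_RE = re.compile(
    r"^\s*(?:>\s*)?(\d+(?:\.\d+)?)%\s+(\S+)\s+(\S+)\s+\[([.k])\]\s+(\S+)"
)


def overhead_by_dso(text):
    """Return {dso: (summed self overhead in percent, number of symbols)}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # headers, blank lines, comments in between
        pct, _comm, dso, _level, _symbol = match.groups()
        dso = dso.strip("[]")  # "[mac80211]" -> "mac80211"
        totals[dso] += float(pct)
        counts[dso] += 1
    return {dso: (round(totals[dso], 2), counts[dso]) for dso in totals}


if __name__ == "__main__":
    ranked = sorted(overhead_by_dso(SAMPLE).items(), key=lambda kv: -kv[1][0])
    for dso, (pct, nsyms) in ranked:
        print(f"{pct:6.2f}%  {nsyms:3d} symbols  {dso}")
```

Summing only makes sense with --no-children; in children mode the
percentages overlap and the two-column lines are skipped by the pattern.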
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
* Re: [Make-wifi-fast] a bit of profiling on the archer
2016-11-18 7:55 ` Jesper Dangaard Brouer
` (2 preceding siblings ...)
2016-11-19 17:33 ` Dave Taht
@ 2016-11-19 18:30 ` Dave Taht
2016-11-21 16:24 ` Jesper Dangaard Brouer
3 siblings, 1 reply; 10+ messages in thread
From: Dave Taht @ 2016-11-19 18:30 UTC (permalink / raw)
To: Jesper Dangaard Brouer; +Cc: make-wifi-fast
Jesper/eric:
If you have any satisfying network oneliners and stats to monitor via
perf along the lines of
http://www.brendangregg.com/perf.html#OneLiners
it would be helpful longer term.
The MIPS platforms have sprouted more events than they ever had before
(well, until last year, perf didn't work at all).
Of these, besides the cpu-cycles thing, the only stuff I've looked at
is the various wake_tx_queue related events, and I have a scar from
unaligned_accesses (but we're not doing any, so...):
branch-instructions OR branches [Hardware event]
branch-misses [Hardware event]
cpu-cycles OR cycles [Hardware event]
instructions [Hardware event]
alignment-faults [Software event]
bpf-output [Software event]
context-switches OR cs [Software event]
cpu-clock [Software event]
cpu-migrations OR migrations [Software event]
dummy [Software event]
emulation-faults [Software event]
major-faults [Software event]
minor-faults [Software event]
page-faults OR faults [Software event]
task-clock [Software event]
L1-dcache-load-misses [Hardware cache event]
L1-dcache-loads [Hardware cache event]
L1-dcache-store-misses [Hardware cache event]
L1-dcache-stores [Hardware cache event]
L1-icache-load-misses [Hardware cache event]
L1-icache-loads [Hardware cache event]
L1-icache-prefetches [Hardware cache event]
LLC-load-misses [Hardware cache event]
LLC-loads [Hardware cache event]
LLC-store-misses [Hardware cache event]
LLC-stores [Hardware cache event]
branch-load-misses [Hardware cache event]
branch-loads [Hardware cache event]
iTLB-load-misses [Hardware cache event]
iTLB-loads [Hardware cache event]
rNNN [Raw hardware event descriptor]
cpu/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor]
mem:<addr>[/len][:access] [Hardware breakpoint]
ath10k:ath10k_htt_pktlog [Tracepoint event]
ath10k:ath10k_htt_rx_desc [Tracepoint event]
ath10k:ath10k_htt_stats [Tracepoint event]
ath10k:ath10k_htt_tx [Tracepoint event]
ath10k:ath10k_log_dbg [Tracepoint event]
ath10k:ath10k_log_dbg_dump [Tracepoint event]
ath10k:ath10k_log_err [Tracepoint event]
ath10k:ath10k_log_info [Tracepoint event]
ath10k:ath10k_log_warn [Tracepoint event]
ath10k:ath10k_rx_hdr [Tracepoint event]
ath10k:ath10k_rx_payload [Tracepoint event]
ath10k:ath10k_tx_hdr [Tracepoint event]
ath10k:ath10k_tx_payload [Tracepoint event]
ath10k:ath10k_txrx_tx_unref [Tracepoint event]
ath10k:ath10k_wmi_cmd [Tracepoint event]
ath10k:ath10k_wmi_dbglog [Tracepoint event]
ath10k:ath10k_wmi_diag [Tracepoint event]
ath10k:ath10k_wmi_diag_container [Tracepoint event]
ath10k:ath10k_wmi_event [Tracepoint event]
ath:ath_log [Tracepoint event]
block:block_bio_backmerge [Tracepoint event]
block:block_bio_bounce [Tracepoint event]
block:block_bio_complete [Tracepoint event]
block:block_bio_frontmerge [Tracepoint event]
block:block_bio_queue [Tracepoint event]
block:block_bio_remap [Tracepoint event]
block:block_dirty_buffer [Tracepoint event]
block:block_getrq [Tracepoint event]
block:block_plug [Tracepoint event]
block:block_rq_abort [Tracepoint event]
block:block_rq_complete [Tracepoint event]
block:block_rq_insert [Tracepoint event]
block:block_rq_issue [Tracepoint event]
block:block_rq_remap [Tracepoint event]
block:block_rq_requeue [Tracepoint event]
block:block_sleeprq [Tracepoint event]
block:block_split [Tracepoint event]
block:block_touch_buffer [Tracepoint event]
block:block_unplug [Tracepoint event]
cfg80211:cfg80211_cac_event [Tracepoint event]
cfg80211:cfg80211_ch_switch_notify [Tracepoint event]
cfg80211:cfg80211_ch_switch_started_notify [Tracepoint event]
cfg80211:cfg80211_chandef_dfs_required [Tracepoint event]
cfg80211:cfg80211_cqm_pktloss_notify [Tracepoint event]
cfg80211:cfg80211_cqm_rssi_notify [Tracepoint event]
cfg80211:cfg80211_del_sta [Tracepoint event]
cfg80211:cfg80211_ft_event [Tracepoint event]
cfg80211:cfg80211_get_bss [Tracepoint event]
cfg80211:cfg80211_gtk_rekey_notify [Tracepoint event]
cfg80211:cfg80211_ibss_joined [Tracepoint event]
cfg80211:cfg80211_inform_bss_frame [Tracepoint event]
cfg80211:cfg80211_mgmt_tx_status [Tracepoint event]
cfg80211:cfg80211_michael_mic_failure [Tracepoint event]
cfg80211:cfg80211_new_sta [Tracepoint event]
cfg80211:cfg80211_notify_new_peer_candidate [Tracepoint event]
cfg80211:cfg80211_pmksa_candidate_notify [Tracepoint event]
cfg80211:cfg80211_probe_status [Tracepoint event]
cfg80211:cfg80211_radar_event [Tracepoint event]
cfg80211:cfg80211_ready_on_channel [Tracepoint event]
cfg80211:cfg80211_ready_on_channel_expired [Tracepoint event]
cfg80211:cfg80211_reg_can_beacon [Tracepoint event]
cfg80211:cfg80211_report_obss_beacon [Tracepoint event]
cfg80211:cfg80211_report_wowlan_wakeup [Tracepoint event]
cfg80211:cfg80211_return_bool [Tracepoint event]
cfg80211:cfg80211_return_bss [Tracepoint event]
cfg80211:cfg80211_return_u32 [Tracepoint event]
cfg80211:cfg80211_return_uint [Tracepoint event]
cfg80211:cfg80211_rx_mgmt [Tracepoint event]
cfg80211:cfg80211_rx_mlme_mgmt [Tracepoint event]
cfg80211:cfg80211_rx_spurious_frame [Tracepoint event]
cfg80211:cfg80211_rx_unexpected_4addr_frame [Tracepoint event]
cfg80211:cfg80211_rx_unprot_mlme_mgmt [Tracepoint event]
cfg80211:cfg80211_scan_done [Tracepoint event]
cfg80211:cfg80211_sched_scan_results [Tracepoint event]
cfg80211:cfg80211_sched_scan_stopped [Tracepoint event]
cfg80211:cfg80211_send_assoc_timeout [Tracepoint event]
cfg80211:cfg80211_send_auth_timeout [Tracepoint event]
cfg80211:cfg80211_send_rx_assoc [Tracepoint event]
cfg80211:cfg80211_send_rx_auth [Tracepoint event]
cfg80211:cfg80211_stop_iface [Tracepoint event]
cfg80211:cfg80211_tdls_oper_request [Tracepoint event]
cfg80211:cfg80211_tx_mlme_mgmt [Tracepoint event]
cfg80211:rdev_abort_scan [Tracepoint event]
cfg80211:rdev_add_key [Tracepoint event]
cfg80211:rdev_add_mpath [Tracepoint event]
cfg80211:rdev_add_nan_func [Tracepoint event]
cfg80211:rdev_add_station [Tracepoint event]
cfg80211:rdev_add_tx_ts [Tracepoint event]
cfg80211:rdev_add_virtual_intf [Tracepoint event]
cfg80211:rdev_assoc [Tracepoint event]
cfg80211:rdev_auth [Tracepoint event]
cfg80211:rdev_cancel_remain_on_channel [Tracepoint event]
cfg80211:rdev_change_beacon [Tracepoint event]
cfg80211:rdev_change_bss [Tracepoint event]
cfg80211:rdev_change_mpath [Tracepoint event]
cfg80211:rdev_change_station [Tracepoint event]
cfg80211:rdev_change_virtual_intf [Tracepoint event]
cfg80211:rdev_channel_switch [Tracepoint event]
cfg80211:rdev_connect [Tracepoint event]
cfg80211:rdev_crit_proto_start [Tracepoint event]
cfg80211:rdev_crit_proto_stop [Tracepoint event]
cfg80211:rdev_deauth [Tracepoint event]
cfg80211:rdev_del_key [Tracepoint event]
cfg80211:rdev_del_mpath [Tracepoint event]
cfg80211:rdev_del_nan_func [Tracepoint event]
cfg80211:rdev_del_pmksa [Tracepoint event]
cfg80211:rdev_del_station [Tracepoint event]
cfg80211:rdev_del_tx_ts [Tracepoint event]
cfg80211:rdev_del_virtual_intf [Tracepoint event]
cfg80211:rdev_disassoc [Tracepoint event]
cfg80211:rdev_disconnect [Tracepoint event]
cfg80211:rdev_dump_mpath [Tracepoint event]
cfg80211:rdev_dump_mpp [Tracepoint event]
cfg80211:rdev_dump_station [Tracepoint event]
cfg80211:rdev_dump_survey [Tracepoint event]
cfg80211:rdev_flush_pmksa [Tracepoint event]
cfg80211:rdev_get_antenna [Tracepoint event]
cfg80211:rdev_get_channel [Tracepoint event]
cfg80211:rdev_get_key [Tracepoint event]
cfg80211:rdev_get_mesh_config [Tracepoint event]
cfg80211:rdev_get_mpath [Tracepoint event]
cfg80211:rdev_get_mpp [Tracepoint event]
cfg80211:rdev_get_station [Tracepoint event]
cfg80211:rdev_get_tx_power [Tracepoint event]
cfg80211:rdev_join_ibss [Tracepoint event]
cfg80211:rdev_join_mesh [Tracepoint event]
cfg80211:rdev_join_ocb [Tracepoint event]
cfg80211:rdev_leave_ibss [Tracepoint event]
cfg80211:rdev_leave_mesh [Tracepoint event]
cfg80211:rdev_leave_ocb [Tracepoint event]
cfg80211:rdev_libertas_set_mesh_channel [Tracepoint event]
cfg80211:rdev_mgmt_frame_register [Tracepoint event]
cfg80211:rdev_mgmt_tx [Tracepoint event]
cfg80211:rdev_mgmt_tx_cancel_wait [Tracepoint event]
cfg80211:rdev_nan_change_conf [Tracepoint event]
cfg80211:rdev_probe_client [Tracepoint event]
cfg80211:rdev_remain_on_channel [Tracepoint event]
cfg80211:rdev_resume [Tracepoint event]
cfg80211:rdev_return_chandef [Tracepoint event]
cfg80211:rdev_return_int [Tracepoint event]
cfg80211:rdev_return_int_cookie [Tracepoint event]
cfg80211:rdev_return_int_int [Tracepoint event]
cfg80211:rdev_return_int_mesh_config [Tracepoint event]
cfg80211:rdev_return_int_mpath_info [Tracepoint event]
cfg80211:rdev_return_int_station_info [Tracepoint event]
cfg80211:rdev_return_int_survey_info [Tracepoint event]
cfg80211:rdev_return_int_tx_rx [Tracepoint event]
cfg80211:rdev_return_void [Tracepoint event]
cfg80211:rdev_return_void_tx_rx [Tracepoint event]
cfg80211:rdev_return_wdev [Tracepoint event]
cfg80211:rdev_rfkill_poll [Tracepoint event]
cfg80211:rdev_scan [Tracepoint event]
cfg80211:rdev_sched_scan_start [Tracepoint event]
cfg80211:rdev_sched_scan_stop [Tracepoint event]
cfg80211:rdev_set_antenna [Tracepoint event]
cfg80211:rdev_set_ap_chanwidth [Tracepoint event]
cfg80211:rdev_set_bitrate_mask [Tracepoint event]
cfg80211:rdev_set_coalesce [Tracepoint event]
cfg80211:rdev_set_cqm_rssi_config [Tracepoint event]
cfg80211:rdev_set_cqm_txe_config [Tracepoint event]
cfg80211:rdev_set_default_key [Tracepoint event]
cfg80211:rdev_set_default_mgmt_key [Tracepoint event]
cfg80211:rdev_set_mac_acl [Tracepoint event]
cfg80211:rdev_set_mcast_rate [Tracepoint event]
cfg80211:rdev_set_monitor_channel [Tracepoint event]
cfg80211:rdev_set_noack_map [Tracepoint event]
cfg80211:rdev_set_pmksa [Tracepoint event]
cfg80211:rdev_set_power_mgmt [Tracepoint event]
cfg80211:rdev_set_qos_map [Tracepoint event]
cfg80211:rdev_set_rekey_data [Tracepoint event]
cfg80211:rdev_set_tx_power [Tracepoint event]
cfg80211:rdev_set_txq_params [Tracepoint event]
cfg80211:rdev_set_wakeup [Tracepoint event]
cfg80211:rdev_set_wds_peer [Tracepoint event]
cfg80211:rdev_set_wiphy_params [Tracepoint event]
cfg80211:rdev_start_ap [Tracepoint event]
cfg80211:rdev_start_nan [Tracepoint event]
cfg80211:rdev_start_p2p_device [Tracepoint event]
cfg80211:rdev_start_radar_detection [Tracepoint event]
cfg80211:rdev_stop_ap [Tracepoint event]
cfg80211:rdev_stop_nan [Tracepoint event]
cfg80211:rdev_stop_p2p_device [Tracepoint event]
cfg80211:rdev_suspend [Tracepoint event]
cfg80211:rdev_tdls_cancel_channel_switch [Tracepoint event]
cfg80211:rdev_tdls_channel_switch [Tracepoint event]
cfg80211:rdev_tdls_mgmt [Tracepoint event]
cfg80211:rdev_tdls_oper [Tracepoint event]
cfg80211:rdev_testmode_cmd [Tracepoint event]
cfg80211:rdev_testmode_dump [Tracepoint event]
cfg80211:rdev_update_ft_ies [Tracepoint event]
cfg80211:rdev_update_mesh_config [Tracepoint event]
clk:clk_disable [Tracepoint event]
clk:clk_disable_complete [Tracepoint event]
clk:clk_enable [Tracepoint event]
clk:clk_enable_complete [Tracepoint event]
clk:clk_prepare [Tracepoint event]
clk:clk_prepare_complete [Tracepoint event]
clk:clk_set_parent [Tracepoint event]
clk:clk_set_parent_complete [Tracepoint event]
clk:clk_set_phase [Tracepoint event]
clk:clk_set_phase_complete [Tracepoint event]
clk:clk_set_rate [Tracepoint event]
clk:clk_set_rate_complete [Tracepoint event]
clk:clk_unprepare [Tracepoint event]
clk:clk_unprepare_complete [Tracepoint event]
fib:fib_table_lookup [Tracepoint event]
fib:fib_table_lookup_nh [Tracepoint event]
fib:fib_validate_source [Tracepoint event]
filelock:break_lease_block [Tracepoint event]
filelock:break_lease_noblock [Tracepoint event]
filelock:break_lease_unblock [Tracepoint event]
filelock:generic_add_lease [Tracepoint event]
filelock:generic_delete_lease [Tracepoint event]
filelock:time_out_leases [Tracepoint event]
filemap:mm_filemap_add_to_page_cache [Tracepoint event]
filemap:mm_filemap_delete_from_page_cache [Tracepoint event]
i2c:i2c_read [Tracepoint event]
i2c:i2c_reply [Tracepoint event]
i2c:i2c_result [Tracepoint event]
i2c:i2c_write [Tracepoint event]
i2c:smbus_read [Tracepoint event]
i2c:smbus_reply [Tracepoint event]
i2c:smbus_result [Tracepoint event]
i2c:smbus_write [Tracepoint event]
irq:irq_handler_entry [Tracepoint event]
irq:irq_handler_exit [Tracepoint event]
irq:softirq_entry [Tracepoint event]
irq:softirq_exit [Tracepoint event]
irq:softirq_raise [Tracepoint event]
kmem:kfree [Tracepoint event]
kmem:kmalloc [Tracepoint event]
kmem:kmalloc_node [Tracepoint event]
kmem:kmem_cache_alloc [Tracepoint event]
kmem:kmem_cache_alloc_node [Tracepoint event]
kmem:kmem_cache_free [Tracepoint event]
kmem:mm_page_alloc [Tracepoint event]
kmem:mm_page_alloc_extfrag [Tracepoint event]
kmem:mm_page_alloc_zone_locked [Tracepoint event]
kmem:mm_page_free [Tracepoint event]
kmem:mm_page_free_batched [Tracepoint event]
kmem:mm_page_pcpu_drain [Tracepoint event]
mac80211:api_beacon_loss [Tracepoint event]
mac80211:api_chswitch_done [Tracepoint event]
mac80211:api_connection_loss [Tracepoint event]
mac80211:api_cqm_beacon_loss_notify [Tracepoint event]
mac80211:api_cqm_rssi_notify [Tracepoint event]
mac80211:api_enable_rssi_reports [Tracepoint event]
mac80211:api_eosp [Tracepoint event]
mac80211:api_gtk_rekey_notify [Tracepoint event]
mac80211:api_radar_detected [Tracepoint event]
mac80211:api_ready_on_channel [Tracepoint event]
mac80211:api_remain_on_channel_expired [Tracepoint event]
mac80211:api_restart_hw [Tracepoint event]
mac80211:api_scan_completed [Tracepoint event]
mac80211:api_sched_scan_results [Tracepoint event]
mac80211:api_sched_scan_stopped [Tracepoint event]
mac80211:api_send_eosp_nullfunc [Tracepoint event]
mac80211:api_sta_block_awake [Tracepoint event]
mac80211:api_sta_set_buffered [Tracepoint event]
mac80211:api_start_tx_ba_cb [Tracepoint event]
mac80211:api_start_tx_ba_session [Tracepoint event]
mac80211:api_stop_tx_ba_cb [Tracepoint event]
mac80211:api_stop_tx_ba_session [Tracepoint event]
mac80211:drv_add_chanctx [Tracepoint event]
mac80211:drv_add_interface [Tracepoint event]
mac80211:drv_add_nan_func [Tracepoint event]
mac80211:drv_allow_buffered_frames [Tracepoint event]
mac80211:drv_ampdu_action [Tracepoint event]
mac80211:drv_assign_vif_chanctx [Tracepoint event]
mac80211:drv_bss_info_changed [Tracepoint event]
mac80211:drv_cancel_hw_scan [Tracepoint event]
mac80211:drv_cancel_remain_on_channel [Tracepoint event]
mac80211:drv_change_chanctx [Tracepoint event]
mac80211:drv_change_interface [Tracepoint event]
mac80211:drv_channel_switch [Tracepoint event]
mac80211:drv_channel_switch_beacon [Tracepoint event]
mac80211:drv_conf_tx [Tracepoint event]
mac80211:drv_config [Tracepoint event]
mac80211:drv_config_iface_filter [Tracepoint event]
mac80211:drv_configure_filter [Tracepoint event]
mac80211:drv_del_nan_func [Tracepoint event]
mac80211:drv_event_callback [Tracepoint event]
mac80211:drv_flush [Tracepoint event]
mac80211:drv_get_antenna [Tracepoint event]
mac80211:drv_get_et_sset_count [Tracepoint event]
mac80211:drv_get_et_stats [Tracepoint event]
mac80211:drv_get_et_strings [Tracepoint event]
mac80211:drv_get_expected_throughput [Tracepoint event]
mac80211:drv_get_key_seq [Tracepoint event]
mac80211:drv_get_ringparam [Tracepoint event]
mac80211:drv_get_stats [Tracepoint event]
mac80211:drv_get_survey [Tracepoint event]
mac80211:drv_get_tsf [Tracepoint event]
mac80211:drv_get_txpower [Tracepoint event]
mac80211:drv_hw_scan [Tracepoint event]
mac80211:drv_ipv6_addr_change [Tracepoint event]
mac80211:drv_join_ibss [Tracepoint event]
mac80211:drv_leave_ibss [Tracepoint event]
mac80211:drv_mgd_prepare_tx [Tracepoint event]
mac80211:drv_mgd_protect_tdls_discover [Tracepoint event]
mac80211:drv_nan_change_conf [Tracepoint event]
mac80211:drv_offchannel_tx_cancel_wait [Tracepoint event]
mac80211:drv_offset_tsf [Tracepoint event]
mac80211:drv_post_channel_switch [Tracepoint event]
mac80211:drv_pre_channel_switch [Tracepoint event]
mac80211:drv_prepare_multicast [Tracepoint event]
mac80211:drv_reconfig_complete [Tracepoint event]
mac80211:drv_release_buffered_frames [Tracepoint event]
mac80211:drv_remain_on_channel [Tracepoint event]
mac80211:drv_remove_chanctx [Tracepoint event]
mac80211:drv_remove_interface [Tracepoint event]
mac80211:drv_reset_tsf [Tracepoint event]
mac80211:drv_resume [Tracepoint event]
mac80211:drv_return_bool [Tracepoint event]
mac80211:drv_return_int [Tracepoint event]
mac80211:drv_return_u32 [Tracepoint event]
mac80211:drv_return_u64 [Tracepoint event]
mac80211:drv_return_void [Tracepoint event]
mac80211:drv_sched_scan_start [Tracepoint event]
mac80211:drv_sched_scan_stop [Tracepoint event]
mac80211:drv_set_antenna [Tracepoint event]
mac80211:drv_set_bitrate_mask [Tracepoint event]
mac80211:drv_set_coverage_class [Tracepoint event]
mac80211:drv_set_default_unicast_key [Tracepoint event]
mac80211:drv_set_frag_threshold [Tracepoint event]
mac80211:drv_set_key [Tracepoint event]
mac80211:drv_set_rekey_data [Tracepoint event]
mac80211:drv_set_ringparam [Tracepoint event]
mac80211:drv_set_rts_threshold [Tracepoint event]
mac80211:drv_set_tim [Tracepoint event]
mac80211:drv_set_tsf [Tracepoint event]
mac80211:drv_set_wakeup [Tracepoint event]
mac80211:drv_sta_add [Tracepoint event]
mac80211:drv_sta_notify [Tracepoint event]
mac80211:drv_sta_pre_rcu_remove [Tracepoint event]
mac80211:drv_sta_rate_tbl_update [Tracepoint event]
mac80211:drv_sta_rc_update [Tracepoint event]
mac80211:drv_sta_remove [Tracepoint event]
mac80211:drv_sta_state [Tracepoint event]
mac80211:drv_sta_statistics [Tracepoint event]
mac80211:drv_start [Tracepoint event]
mac80211:drv_start_ap [Tracepoint event]
mac80211:drv_start_nan [Tracepoint event]
mac80211:drv_stop [Tracepoint event]
mac80211:drv_stop_ap [Tracepoint event]
mac80211:drv_stop_nan [Tracepoint event]
mac80211:drv_suspend [Tracepoint event]
mac80211:drv_sw_scan_complete [Tracepoint event]
mac80211:drv_sw_scan_start [Tracepoint event]
mac80211:drv_switch_vif_chanctx [Tracepoint event]
mac80211:drv_sync_rx_queues [Tracepoint event]
mac80211:drv_tdls_cancel_channel_switch [Tracepoint event]
mac80211:drv_tdls_channel_switch [Tracepoint event]
mac80211:drv_tdls_recv_channel_switch [Tracepoint event]
mac80211:drv_tx_frames_pending [Tracepoint event]
mac80211:drv_tx_last_beacon [Tracepoint event]
mac80211:drv_unassign_vif_chanctx [Tracepoint event]
mac80211:drv_update_tkip_key [Tracepoint event]
mac80211:drv_wake_tx_queue [Tracepoint event]
mac80211:stop_queue [Tracepoint event]
mac80211:wake_queue [Tracepoint event]
module:module_free [Tracepoint event]
module:module_get [Tracepoint event]
module:module_load [Tracepoint event]
module:module_put [Tracepoint event]
module:module_request [Tracepoint event]
napi:napi_poll [Tracepoint event]
net:napi_gro_frags_entry [Tracepoint event]
net:napi_gro_receive_entry [Tracepoint event]
net:net_dev_queue [Tracepoint event]
net:net_dev_start_xmit [Tracepoint event]
net:net_dev_xmit [Tracepoint event]
net:netif_receive_skb [Tracepoint event]
net:netif_receive_skb_entry [Tracepoint event]
net:netif_rx [Tracepoint event]
net:netif_rx_entry [Tracepoint event]
net:netif_rx_ni_entry [Tracepoint event]
oom:oom_score_adj_update [Tracepoint event]
pagemap:mm_lru_activate [Tracepoint event]
pagemap:mm_lru_insertion [Tracepoint event]
power:clock_disable [Tracepoint event]
power:clock_enable [Tracepoint event]
power:clock_set_rate [Tracepoint event]
power:cpu_frequency [Tracepoint event]
power:cpu_idle [Tracepoint event]
power:dev_pm_qos_add_request [Tracepoint event]
power:dev_pm_qos_remove_request [Tracepoint event]
power:dev_pm_qos_update_request [Tracepoint event]
power:device_pm_callback_end [Tracepoint event]
power:device_pm_callback_start [Tracepoint event]
power:pm_qos_add_request [Tracepoint event]
power:pm_qos_remove_request [Tracepoint event]
power:pm_qos_update_flags [Tracepoint event]
power:pm_qos_update_request [Tracepoint event]
power:pm_qos_update_request_timeout [Tracepoint event]
power:pm_qos_update_target [Tracepoint event]
power:power_domain_target [Tracepoint event]
power:pstate_sample [Tracepoint event]
power:suspend_resume [Tracepoint event]
power:wakeup_source_activate [Tracepoint event]
power:wakeup_source_deactivate [Tracepoint event]
printk:console [Tracepoint event]
random:add_device_randomness [Tracepoint event]
random:add_disk_randomness [Tracepoint event]
random:add_input_randomness [Tracepoint event]
random:credit_entropy_bits [Tracepoint event]
random:debit_entropy [Tracepoint event]
random:extract_entropy [Tracepoint event]
random:extract_entropy_user [Tracepoint event]
random:get_random_bytes [Tracepoint event]
random:get_random_bytes_arch [Tracepoint event]
random:mix_pool_bytes [Tracepoint event]
random:mix_pool_bytes_nolock [Tracepoint event]
random:push_to_pool [Tracepoint event]
random:random_read [Tracepoint event]
random:urandom_read [Tracepoint event]
random:xfer_secondary_pool [Tracepoint event]
raw_syscalls:sys_enter [Tracepoint event]
raw_syscalls:sys_exit [Tracepoint event]
rcu:rcu_utilization [Tracepoint event]
sched:sched_kthread_stop [Tracepoint event]
sched:sched_kthread_stop_ret [Tracepoint event]
sched:sched_migrate_task [Tracepoint event]
sched:sched_move_numa [Tracepoint event]
sched:sched_pi_setprio [Tracepoint event]
sched:sched_process_exec [Tracepoint event]
sched:sched_process_exit [Tracepoint event]
sched:sched_process_fork [Tracepoint event]
sched:sched_process_free [Tracepoint event]
sched:sched_process_wait [Tracepoint event]
sched:sched_stat_blocked [Tracepoint event]
sched:sched_stat_iowait [Tracepoint event]
sched:sched_stat_runtime [Tracepoint event]
sched:sched_stat_sleep [Tracepoint event]
sched:sched_stat_wait [Tracepoint event]
sched:sched_stick_numa [Tracepoint event]
sched:sched_swap_numa [Tracepoint event]
sched:sched_switch [Tracepoint event]
sched:sched_wait_task [Tracepoint event]
sched:sched_wake_idle_without_ipi [Tracepoint event]
sched:sched_wakeup [Tracepoint event]
sched:sched_wakeup_new [Tracepoint event]
sched:sched_waking [Tracepoint event]
signal:signal_deliver [Tracepoint event]
signal:signal_generate [Tracepoint event]
skb:consume_skb [Tracepoint event]
skb:kfree_skb [Tracepoint event]
skb:skb_copy_datagram_iovec [Tracepoint event]
sock:sock_exceed_buf_limit [Tracepoint event]
sock:sock_rcvqueue_full [Tracepoint event]
spi:spi_master_busy [Tracepoint event]
spi:spi_master_idle [Tracepoint event]
spi:spi_message_done [Tracepoint event]
spi:spi_message_start [Tracepoint event]
spi:spi_message_submit [Tracepoint event]
spi:spi_transfer_start [Tracepoint event]
spi:spi_transfer_stop [Tracepoint event]
task:task_newtask [Tracepoint event]
task:task_rename [Tracepoint event]
timer:hrtimer_cancel [Tracepoint event]
timer:hrtimer_expire_entry [Tracepoint event]
timer:hrtimer_expire_exit [Tracepoint event]
timer:hrtimer_init [Tracepoint event]
timer:hrtimer_start [Tracepoint event]
timer:itimer_expire [Tracepoint event]
timer:itimer_state [Tracepoint event]
timer:timer_cancel [Tracepoint event]
timer:timer_expire_entry [Tracepoint event]
timer:timer_expire_exit [Tracepoint event]
timer:timer_init [Tracepoint event]
timer:timer_start [Tracepoint event]
udp:udp_fail_queue_rcv_skb [Tracepoint event]
vmscan:mm_shrink_slab_end [Tracepoint event]
vmscan:mm_shrink_slab_start [Tracepoint event]
vmscan:mm_vmscan_direct_reclaim_begin [Tracepoint event]
vmscan:mm_vmscan_direct_reclaim_end [Tracepoint event]
vmscan:mm_vmscan_kswapd_sleep [Tracepoint event]
vmscan:mm_vmscan_kswapd_wake [Tracepoint event]
vmscan:mm_vmscan_lru_isolate [Tracepoint event]
vmscan:mm_vmscan_lru_shrink_inactive [Tracepoint event]
vmscan:mm_vmscan_memcg_isolate [Tracepoint event]
vmscan:mm_vmscan_memcg_reclaim_begin [Tracepoint event]
vmscan:mm_vmscan_memcg_reclaim_end [Tracepoint event]
vmscan:mm_vmscan_memcg_softlimit_reclaim_begin [Tracepoint event]
vmscan:mm_vmscan_memcg_softlimit_reclaim_end [Tracepoint event]
vmscan:mm_vmscan_wakeup_kswapd [Tracepoint event]
vmscan:mm_vmscan_writepage [Tracepoint event]
workqueue:workqueue_activate_work [Tracepoint event]
workqueue:workqueue_execute_end [Tracepoint event]
workqueue:workqueue_execute_start [Tracepoint event]
workqueue:workqueue_queue_work [Tracepoint event]
writeback:balance_dirty_pages [Tracepoint event]
writeback:bdi_dirty_ratelimit [Tracepoint event]
writeback:global_dirty_state [Tracepoint event]
writeback:wbc_writepage [Tracepoint event]
writeback:writeback_bdi_register [Tracepoint event]
writeback:writeback_congestion_wait [Tracepoint event]
writeback:writeback_dirty_inode [Tracepoint event]
writeback:writeback_dirty_inode_enqueue [Tracepoint event]
writeback:writeback_dirty_inode_start [Tracepoint event]
writeback:writeback_dirty_page [Tracepoint event]
writeback:writeback_exec [Tracepoint event]
writeback:writeback_lazytime [Tracepoint event]
writeback:writeback_lazytime_iput [Tracepoint event]
writeback:writeback_mark_inode_dirty [Tracepoint event]
writeback:writeback_nowork [Tracepoint event]
writeback:writeback_pages_written [Tracepoint event]
writeback:writeback_queue [Tracepoint event]
writeback:writeback_queue_io [Tracepoint event]
writeback:writeback_sb_inodes_requeue [Tracepoint event]
writeback:writeback_single_inode [Tracepoint event]
writeback:writeback_single_inode_start [Tracepoint event]
writeback:writeback_start [Tracepoint event]
writeback:writeback_wait [Tracepoint event]
writeback:writeback_wait_iff_congested [Tracepoint event]
writeback:writeback_wake_background [Tracepoint event]
writeback:writeback_write_inode [Tracepoint event]
writeback:writeback_write_inode_start [Tracepoint event]
writeback:writeback_written [Tracepoint event]
* Re: [Make-wifi-fast] a bit of profiling on the archer
2016-11-19 18:30 ` Dave Taht
@ 2016-11-21 16:24 ` Jesper Dangaard Brouer
2016-11-21 16:36 ` Jonathan Morton
0 siblings, 1 reply; 10+ messages in thread
From: Jesper Dangaard Brouer @ 2016-11-21 16:24 UTC (permalink / raw)
To: Dave Taht; +Cc: make-wifi-fast, brouer
On Sat, 19 Nov 2016 10:30:10 -0800
Dave Taht <dave.taht@gmail.com> wrote:
> Jesper/eric:
>
> If you have any satisfying network oneliners and stats to monitor via
> perf along the lines of
>
> http://www.brendangregg.com/perf.html#OneLiners
>
> it would be helpful longer term.
Good point... I'll think about where we can easily publish such oneliners.
> The mips platforms have sprouted more events than it ever had before
> (well, until last year, perf didn't work at all):
>
> Of these besides the cpu-cycles thing, the only stuff I've looked at
> are various wake_tx_queue related events, and I have a scar involved
> in unaligned_accesses (but we're not doing any, soo)
>
> branch-instructions OR branches [Hardware event]
> branch-misses [Hardware event]
> cpu-cycles OR cycles [Hardware event]
> instructions [Hardware event]
I'm very happy to see there are HW events on this platform! :-)
Guess I should buy one of these ;-)
This is the Archer? What kind of CPU does it have?
> alignment-faults [Software event]
> bpf-output [Software event]
> context-switches OR cs [Software event]
> cpu-clock [Software event]
> cpu-migrations OR migrations [Software event]
> dummy [Software event]
> emulation-faults [Software event]
> major-faults [Software event]
> minor-faults [Software event]
> page-faults OR faults [Software event]
> task-clock [Software event]
> L1-dcache-load-misses [Hardware cache event]
> L1-dcache-loads [Hardware cache event]
> L1-dcache-store-misses [Hardware cache event]
> L1-dcache-stores [Hardware cache event]
> L1-icache-load-misses [Hardware cache event]
> L1-icache-loads [Hardware cache event]
> L1-icache-prefetches [Hardware cache event]
The icache HW events are actually quite interesting, and there are three
of them. On my Intel Skylake CPU I only have "L1-icache-load-misses".
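With both L1-icache-loads and L1-icache-load-misses available, a miss
ratio can be computed directly. Below is a rough sketch, assuming the
counters were collected with something like "perf stat -a -x, -e
L1-icache-load-misses,L1-icache-loads -o FILE -- sleep 10" and that FILE
has the usual CSV layout (value, unit, event name, ...). The numbers in
SAMPLE are invented for illustration and are not measurements from the
Archer; read_counters and miss_ratio are hypothetical helper names.

```python
import csv
import io

# Invented counter values in the CSV layout written by `perf stat -x,`:
# value, unit, event name, counter run time, percent of time running.
SAMPLE = """\
1200000,,L1-icache-load-misses,10000000000,100.00
48000000,,L1-icache-loads,10000000000,100.00
<not supported>,,L1-icache-prefetches,0,100.00
"""


def read_counters(text):
    """Map event name -> integer count, or None if perf could not count it."""
    counters = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 3 or row[0].startswith("#"):
            continue  # blank lines and comment headers
        value, event = row[0].strip(), row[2].strip()
        try:
            counters[event] = int(value)
        except ValueError:
            # e.g. "<not counted>" or "<not supported>"
            counters[event] = None
    return counters


def miss_ratio(counters, misses, accesses):
    """Return misses/accesses, or None if either counter is unusable."""
    m, a = counters.get(misses), counters.get(accesses)
    if m is None or not a:
        return None
    return m / a


if __name__ == "__main__":
    counters = read_counters(SAMPLE)
    ratio = miss_ratio(counters, "L1-icache-load-misses", "L1-icache-loads")
    if ratio is None:
        print("icache counters not available")
    else:
        print(f"L1 icache miss ratio: {ratio:.2%}")
```

The same helper works for the LLC-load-misses / LLC-loads pair, provided
the PMU really implements those events.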
> LLC-load-misses [Hardware cache event]
> LLC-loads [Hardware cache event]
> LLC-store-misses [Hardware cache event]
> LLC-stores [Hardware cache event]
LLC = Last Level Cache.
Do you have any info on the cache layout and cache sizes of this CPU?
The LLC events indicate it might have an L2 cache.
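One way to check the cache layout is to read it from sysfs. Below is a
minimal sketch, assuming the kernel exports cache info under
/sys/devices/system/cpu/cpu0/cache/index*/ (level, type, size). That is
the case on common x86 and ARM kernels, but it may well be missing on this
MIPS platform, in which case the list simply comes back empty. The
function name read_cache_layout is made up for illustration.

```python
from pathlib import Path


def _read_attr(index_dir, name):
    """Return the stripped contents of one sysfs attribute, or None."""
    attr = index_dir / name
    return attr.read_text().strip() if attr.is_file() else None


def read_cache_layout(cpu_dir="/sys/devices/system/cpu/cpu0"):
    """List the caches sysfs reports for one CPU (empty if none exported)."""
    cache_dir = Path(cpu_dir) / "cache"
    if not cache_dir.is_dir():
        return []
    layout = []
    for index_dir in sorted(cache_dir.glob("index*")):
        layout.append(
            {
                "level": _read_attr(index_dir, "level"),
                "type": _read_attr(index_dir, "type"),
                "size": _read_attr(index_dir, "size"),
            }
        )
    return layout


if __name__ == "__main__":
    entries = read_cache_layout()
    if not entries:
        print("no cache info exported in sysfs on this kernel")
    for entry in entries:
        print(f"L{entry['level']} {entry['type']}: {entry['size']}")
```

If more than one level shows up, the highest one is what the LLC-* events
would refer to.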
> branch-load-misses [Hardware cache event]
> branch-loads [Hardware cache event]
> iTLB-load-misses [Hardware cache event]
> iTLB-loads [Hardware cache event]
> rNNN [Raw hardware event descriptor]
> cpu/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor]
> mem:<addr>[/len][:access] [Hardware breakpoint]
> ath10k:ath10k_htt_pktlog [Tracepoint event]
> ath10k:ath10k_htt_rx_desc [Tracepoint event]
You can use tracepoints if you want to find/analyse a very specific
issue, but they are not so useful for general perf monitoring.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer