[Make-wifi-fast] a bit of profiling on the archer
Dave Taht
dave.taht at gmail.com
Sat Nov 19 12:33:51 EST 2016
On Thu, Nov 17, 2016 at 11:55 PM, Jesper Dangaard Brouer
<brouer at redhat.com> wrote:
>
> On Thu, 17 Nov 2016 20:14:49 -0800 Dave Taht <dave.taht at gmail.com> wrote:
>
>> I have not been profiling much on lower end platforms (it's hard, you
>> can crash a box pretty easily with the wrong options or sample rates).
>
> I'm happy to hear that perf does work on this lower end HW, although
> the disclaimer of sample rates.
>
> Does anyone know if hardware based PMU (Performance Monitor Units)
> exists for these kind of devices?
>
>> While watching the ath10k peak at 150-200mbits, at 99% of cpu in
>> softirq, I spent a bit of time profiling various counters and
>> statistics.
>>
>> for this one (while downloading 12 flows at the same time via flent)
>>
>> perf record -F 99 -e cpu-clock -ag -- sleep 10
>> perf report
>
> Below perf report is not well suite for email, could you instead
> provide output from command below:
>
> perf report --no-children --stdio --call-graph none
Thanks! That is way more readable output. Appended at the end of this post.
>
>>
>> [[31m 67.81%[[m 0.00% ksoftirqd/0 [kernel.kallsyms] [k]
>> run_ksoftirqd
>> |
>> ---run_ksoftirqd
>> |
>> |[[31m--67.61%-- [[m __do_softirq
>> | |
>> | |[[31m--66.80%-- [[m net_rx_action
>> | | |
>> | | |[[31m--41.07%-- [[m ag71xx_poll
>>
>> ...
>
> (Looks like you managed to copy-paste the terminal escape codes for
> colors)
Not sure what terminal emulation perf is expecting. TERM is set to XTERM.
>>
>> It appears we're spending 66% of the time in the *ethernet* portion of
>> the path.
>
> Be careful you don't fool yourself. In your output you have the
> "children" mode on, so everything being called "under" net_rx_action is
> summed up. It could be it goes all the way through to the wifi TX
> parts and that is part of the sum. Even the memory allocations gets
> summed into this 66% number.
>
>
>> I'm going to stop worrying so much about the performance of the new
>> wifi algorithms.
for at least 24 hours
# To display the perf.data header info, please use
--header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 990 of event 'cpu-clock'
# Event count (approx.): 9999999900
#
# Overhead Command Shared Object Symbol
# ........ ........... ......................
.........................................
#
4.34% ksoftirqd/0 [ebtables] [k] ebt_do_table
2.93% ksoftirqd/0 [kernel.kallsyms] [k]
__netif_receive_skb_core
2.83% ksoftirqd/0 [ip_tables] [k] ipt_do_table
2.53% ksoftirqd/0 [kernel.kallsyms] [k] nf_iterate
2.42% ksoftirqd/0 [mac80211] [k]
ieee80211_tx_dequeue
2.02% ksoftirqd/0 [kernel.kallsyms] [k]
__local_bh_enable_ip
2.02% ksoftirqd/0 [mac80211] [k]
__ieee80211_subif_start_xmit
1.82% ksoftirqd/0 [kernel.kallsyms] [k] __dev_queue_xmit
1.72% ksoftirqd/0 [nf_conntrack] [k] tcp_packet
1.52% ksoftirqd/0 [mac80211] [k]
ieee80211_prepare_and_rx_handle
1.52% ksoftirqd/0 [mac80211] [k]
ieee80211_rx_handlers
1.41% ksoftirqd/0 [kernel.kallsyms] [k] br_handle_frame
1.41% ksoftirqd/0 [kernel.kallsyms] [k] ip_finish_output2
1.41% ksoftirqd/0 [kernel.kallsyms] [k] ip_forward
1.41% ksoftirqd/0 [nf_conntrack] [k]
__nf_conntrack_find_get
1.31% ksoftirqd/0 [kernel.kallsyms] [k] format_decode
1.31% ksoftirqd/0 [mac80211] [k]
ieee80211_xmit_fast_finish
1.21% ksoftirqd/0 [mac80211] [k]
ieee80211_tx_h_select_key
1.11% ksoftirqd/0 [kernel.kallsyms] [k]
__copy_user_common
1.11% ksoftirqd/0 [mac80211] [k]
ieee80211_queue_skb
1.01% ksoftirqd/0 [ath10k_core] [k]
ath10k_htt_txrx_compl_task
1.01% ksoftirqd/0 [ath10k_core] [k]
ath10k_mac_tx.isra.35
1.01% ksoftirqd/0 [kernel.kallsyms] [k]
dev_hard_start_xmit
1.01% ksoftirqd/0 [kernel.kallsyms] [k] eth_type_trans
1.01% ksoftirqd/0 [kernel.kallsyms] [k]
netif_skb_features
1.01% ksoftirqd/0 [nf_conntrack_rtcache] [k]
nf_rtcache_in.part.2
0.91% ksoftirqd/0 [ath10k_pci] [k]
ath10k_pci_hif_tx_sg
0.91% ksoftirqd/0 [kernel.kallsyms] [k] ip_rcv
0.91% ksoftirqd/0 [kernel.kallsyms] [k] memcmp
0.91% ksoftirqd/0 [mac80211] [k] ieee80211_rx_napi
0.81% ksoftirqd/0 [ath10k_core] [k] ath10k_dbg
0.81% ksoftirqd/0 [ath10k_core] [k] ath10k_htt_tx
0.81% ksoftirqd/0 [ath10k_pci] [k]
ath10k_bus_pci_write32
0.81% ksoftirqd/0 [br_netfilter] [k] br_nf_pre_routing
0.81% ksoftirqd/0 [ebtable_broute] [k] ebt_broute
0.81% ksoftirqd/0 [kernel.kallsyms] [k] ag71xx_poll
0.81% ksoftirqd/0 [kernel.kallsyms] [k] ip_output
0.81% ksoftirqd/0 [kernel.kallsyms] [k]
skb_get_hash_perturb
0.81% ksoftirqd/0 [mac80211] [k]
sta_info_hash_lookup
0.81% ksoftirqd/0 [nf_conntrack_ipv4] [k] ipv4_helper
0.81% kworker/0:1 [kernel.kallsyms] [k] __delay
0.71% ksoftirqd/0 [kernel.kallsyms] [k] __bzero
0.71% ksoftirqd/0 [kernel.kallsyms] [k] __rmemcpy
0.71% ksoftirqd/0 [kernel.kallsyms] [k]
_find_next_bit.part.0
0.71% ksoftirqd/0 [kernel.kallsyms] [k] ip_rcv_finish
0.71% ksoftirqd/0 [mac80211] [k]
ieee80211_lookup_ra_sta
0.71% ksoftirqd/0 [nf_conntrack] [k]
hash_conntrack_raw
0.71% ksoftirqd/0 [nf_conntrack] [k] nf_conntrack_in
0.71% ksoftirqd/0 [nf_conntrack] [k] tcp_error
0.61% ksoftirqd/0 [br_netfilter] [k]
br_nf_post_routing
0.61% ksoftirqd/0 [cfg80211] [k]
cfg80211_classify8021d
0.61% ksoftirqd/0 [iptable_raw] [k] iptable_raw_hook
0.61% ksoftirqd/0 [kernel.kallsyms] [k]
br_handle_frame_finish
0.61% ksoftirqd/0 [kernel.kallsyms] [k] netdev_pick_tx
0.61% ksoftirqd/0 [kernel.kallsyms] [k]
r4k_dma_cache_wback_inv
0.61% ksoftirqd/0 [kernel.kallsyms] [k]
skb_network_protocol
0.61% ksoftirqd/0 [nf_conntrack_rtcache] [k]
nf_ct_rtcache_find_usable
0.61% ksoftirqd/0 [nf_nat_ipv4] [k] nf_nat_ipv4_fn
0.51% ksoftirqd/0 [ath10k_pci] [k]
ath10k_ce_send_nolock
0.51% ksoftirqd/0 [kernel.kallsyms] [k]
__skb_flow_get_ports
0.51% ksoftirqd/0 [kernel.kallsyms] [k] atomic64_add
0.51% ksoftirqd/0 [kernel.kallsyms] [k]
br_dev_queue_push_xmit
0.51% ksoftirqd/0 [kernel.kallsyms] [k] br_fdb_update
0.51% ksoftirqd/0 [kernel.kallsyms] [k] dst_release
0.51% ksoftirqd/0 [kernel.kallsyms] [k]
idr_get_empty_slot
0.51% ksoftirqd/0 [kernel.kallsyms] [k] r4k_dma_cache_inv
0.51% ksoftirqd/0 [nf_conntrack] [k]
nf_ct_deliver_cached_events
0.51% ksoftirqd/0 [nf_conntrack_ipv4] [k] ipv4_conntrack_in
0.51% ksoftirqd/0 [nf_nat] [k] nf_nat_packet
0.40% ksoftirqd/0 [ath10k_core] [k] ath10k_dbg_dump
0.40% ksoftirqd/0 [ath10k_core] [k]
ath10k_mac_op_wake_tx_queue
0.40% ksoftirqd/0 [ath10k_pci] [k]
ath10k_bus_pci_read32
0.40% ksoftirqd/0 [ath10k_pci] [k]
ath10k_ce_completed_send_next_nolock
0.40% ksoftirqd/0 [br_netfilter] [k] ip_sabotage_in
0.40% ksoftirqd/0 [kernel.kallsyms] [k] __br_fdb_get
0.40% ksoftirqd/0 [kernel.kallsyms] [k] __delay
0.40% ksoftirqd/0 [kernel.kallsyms] [k]
__skb_flow_dissect
0.40% ksoftirqd/0 [kernel.kallsyms] [k]
__wake_up_sync_key
0.40% ksoftirqd/0 [kernel.kallsyms] [k] memcpy
0.40% ksoftirqd/0 [kernel.kallsyms] [k] sch_direct_xmit
0.40% ksoftirqd/0 [kernel.kallsyms] [k] skb_release_data
0.40% ksoftirqd/0 [kernel.kallsyms] [k]
validate_xmit_skb.isra.30.part.31
0.40% ksoftirqd/0 [mac80211] [k]
fq_flow_classify.constprop.17
0.40% ksoftirqd/0 [mac80211] [k]
ieee80211_tx_status
0.40% ksoftirqd/0 [nf_conntrack] [k]
__nf_ct_refresh_acct
0.40% ksoftirqd/0 [nf_conntrack_ipv4] [k] ipv4_get_l4proto
0.40% ksoftirqd/0 [nf_conntrack_rtcache] [k]
nf_rtcache_forward
0.40% ksoftirqd/0 [nf_conntrack_rtcache] [k]
nf_rtcache_forward4
0.40% ksoftirqd/0 [nf_nat_ipv4] [k] nf_nat_ipv4_in
0.40% ksoftirqd/0 [nf_nat_ipv4] [k] nf_nat_ipv4_out
0.30% ksoftirqd/0 [ath10k_core] [k]
__ath10k_htt_rx_ring_fill_n
0.30% ksoftirqd/0 [ath10k_core] [k]
ath10k_htt_rx_h_deliver
0.30% ksoftirqd/0 [ath10k_core] [k]
ath10k_htt_rx_h_mpdu
0.30% ksoftirqd/0 [ath10k_core] [k]
ath10k_mac_tx_push_txq
0.30% ksoftirqd/0 [ath10k_pci] [k]
ath10k_ce_completed_send_next
0.30% ksoftirqd/0 [ath10k_pci] [k]
ath10k_pci_write32
0.30% ksoftirqd/0 [br_netfilter] [k] br_nf_dev_xmit
0.30% ksoftirqd/0 [br_netfilter] [k]
nf_bridge_encap_header_len
0.30% ksoftirqd/0 [cfg80211] [k] ieee80211_hdrlen
0.30% ksoftirqd/0 [kernel.kallsyms] [k] __free_page_frag
0.30% ksoftirqd/0 [kernel.kallsyms] [k]
ag71xx_hard_start_xmit
0.30% ksoftirqd/0 [kernel.kallsyms] [k] br_dev_xmit
0.30% ksoftirqd/0 [kernel.kallsyms] [k] br_pass_frame_up
0.30% ksoftirqd/0 [kernel.kallsyms] [k] build_skb
0.30% ksoftirqd/0 [kernel.kallsyms] [k] dev_gro_receive
0.30% ksoftirqd/0 [kernel.kallsyms] [k] idr_alloc
0.30% ksoftirqd/0 [kernel.kallsyms] [k] inet_gro_receive
0.30% ksoftirqd/0 [kernel.kallsyms] [k] ip_finish_output
0.30% ksoftirqd/0 [kernel.kallsyms] [k] nf_hook_slow
0.30% ksoftirqd/0 [kernel.kallsyms] [k] tcp_gro_receive
0.30% ksoftirqd/0 [mac80211] [k]
codel_dequeue_func
0.30% ksoftirqd/0 [mac80211] [k] fq_flow_dequeue
0.30% ksoftirqd/0 [mac80211] [k]
ieee80211_drop_unencrypted.part.3
0.30% ksoftirqd/0 [mac80211] [k]
ieee80211_get_bssid
0.30% ksoftirqd/0 [nf_conntrack_ipv4] [k] ipv4_confirm
0.30% ksoftirqd/0 [nf_defrag_ipv4] [k]
ipv4_conntrack_defrag
0.20% ksoftirqd/0 [ath10k_core] [k]
ath10k_txrx_tx_unref
0.20% ksoftirqd/0 [ath10k_pci] [k]
ath10k_pci_htt_rx_cb
0.20% ksoftirqd/0 [br_netfilter] [k] br_nf_local_in
0.20% ksoftirqd/0 [ebtable_filter] [k] ebt_out_hook
0.20% ksoftirqd/0 [ebtable_nat] [k] ebt_nat_in
0.20% ksoftirqd/0 [ebtable_nat] [k] ebt_nat_out
0.20% ksoftirqd/0 [iptable_mangle] [k]
iptable_mangle_hook
0.20% ksoftirqd/0 [kernel.kallsyms] [k] __dma_sync
0.20% ksoftirqd/0 [kernel.kallsyms] [k] __free_pages_ok
0.20% ksoftirqd/0 [kernel.kallsyms] [k]
__netif_receive_skb
0.20% ksoftirqd/0 [kernel.kallsyms] [k]
__slab_alloc.isra.13.constprop.17
0.20% ksoftirqd/0 [kernel.kallsyms] [k]
__slab_free.isra.14
0.20% ksoftirqd/0 [kernel.kallsyms] [k] br_deliver
0.20% ksoftirqd/0 [kernel.kallsyms] [k]
finish_task_switch
0.20% ksoftirqd/0 [kernel.kallsyms] [k] ipv4_dst_check
0.20% ksoftirqd/0 [kernel.kallsyms] [k]
is_skb_forwardable
0.20% ksoftirqd/0 [kernel.kallsyms] [k] kmem_cache_free
0.20% ksoftirqd/0 [kernel.kallsyms] [k] ktime_get
0.20% ksoftirqd/0 [kernel.kallsyms] [k] net_rx_action
0.20% ksoftirqd/0 [kernel.kallsyms] [k]
skb_release_head_state
0.20% ksoftirqd/0 [kernel.kallsyms] [k] vsnprintf
0.20% ksoftirqd/0 [mac80211] [k]
ieee80211_deliver_skb
0.20% ksoftirqd/0 [mac80211] [k]
ieee80211_select_queue
0.20% ksoftirqd/0 [mac80211] [k]
ieee80211_skb_resize
0.20% ksoftirqd/0 [nf_conntrack_rtcache] [k] nf_rtcache_in4
0.10% hostapd [kernel.kallsyms] [k]
__copy_user_common
0.10% hostapd [kernel.kallsyms] [k] core_sys_select
0.10% hostapd [kernel.kallsyms] [k] datagram_poll
0.10% hostapd [kernel.kallsyms] [k] do_select
0.10% hostapd [kernel.kallsyms] [k]
finish_task_switch
0.10% hostapd [kernel.kallsyms] [k] handle_sys
0.10% hostapd [kernel.kallsyms] [k]
hrtimer_try_to_cancel
0.10% hostapd [kernel.kallsyms] [k] netlink_recvmsg
0.10% hostapd [kernel.kallsyms] [k] poll_initwait
0.10% hostapd [kernel.kallsyms] [k]
schedule_hrtimeout_range_clock
0.10% hostapd [kernel.kallsyms] [k] sock_poll
0.10% hostapd [kernel.kallsyms] [k] timespec_add_safe
0.10% hostapd libc.so [.] 0x0000ef74
0.10% hostapd libc.so [.] 0x000288e4
0.10% hostapd libc.so [.] 0x000722fc
0.10% hostapd wpad [.] 0x0001ab97
0.10% hostapd wpad [.] 0x0001b1a1
0.10% hostapd wpad [.] 0x0001cbd9
0.10% hostapd wpad [.] 0x0001cd0d
0.10% hostapd wpad [.] 0x0004c589
0.10% ksoftirqd/0 [ath10k_core] [k]
ath10k_htc_rx_completion_handler
0.10% ksoftirqd/0 [ath10k_core] [k]
ath10k_htt_rx_amsdu_allowed.isra.4
0.10% ksoftirqd/0 [ath10k_core] [k]
ath10k_htt_tx_inc_pending
0.10% ksoftirqd/0 [ath10k_core] [k]
ath10k_htt_tx_txq_update
0.10% ksoftirqd/0 [ath10k_core] [k]
ath10k_mac_tx_frm_has_freq
0.10% ksoftirqd/0 [ath10k_core] [k]
ath10k_mac_tx_h_get_txmode.isra.4
0.10% ksoftirqd/0 [ath10k_core] [k]
ath10k_mac_tx_push_pending
0.10% ksoftirqd/0 [ath10k_core] [k] ath10k_process_rx
0.10% ksoftirqd/0 [ath10k_pci] [k]
ath10k_ce_completed_recv_next_nolock
0.10% ksoftirqd/0 [ath9k] [k] ath_rx_tasklet
0.10% ksoftirqd/0 [ath9k_common] [k]
ath9k_cmn_process_rssi
0.10% ksoftirqd/0 [ath9k_hw] [k]
ath9k_hw_numtxpending
0.10% ksoftirqd/0 [cfg80211] [k]
__ieee80211_data_to_8023
0.10% ksoftirqd/0 [cfg80211] [k] cfg80211_rx_mgmt
0.10% ksoftirqd/0 [kernel.kallsyms] [k] __build_skb
0.10% ksoftirqd/0 [kernel.kallsyms] [k] __do_softirq
0.10% ksoftirqd/0 [kernel.kallsyms] [k] ag71xx_tx_packets
0.10% ksoftirqd/0 [kernel.kallsyms] [k] c0_hpt_read
0.10% ksoftirqd/0 [kernel.kallsyms] [k] consume_skb
0.10% ksoftirqd/0 [kernel.kallsyms] [k] idr_mark_full
0.10% ksoftirqd/0 [kernel.kallsyms] [k] idr_remove
0.10% ksoftirqd/0 [kernel.kallsyms] [k] ip_forward_finish
0.10% ksoftirqd/0 [kernel.kallsyms] [k] kmem_cache_alloc
0.10% ksoftirqd/0 [kernel.kallsyms] [k] memmove
0.10% ksoftirqd/0 [kernel.kallsyms] [k] mod_timer
0.10% ksoftirqd/0 [kernel.kallsyms] [k] number.isra.12
0.10% ksoftirqd/0 [kernel.kallsyms] [k]
passthru_features_check
0.10% ksoftirqd/0 [kernel.kallsyms] [k] skb_copy_bits
0.10% ksoftirqd/0 [kernel.kallsyms] [k] skb_pull
0.10% ksoftirqd/0 [kernel.kallsyms] [k] skb_push
0.10% ksoftirqd/0 [mac80211] [k]
__ieee80211_beacon_add_tim
0.10% ksoftirqd/0 [mac80211] [k]
codel_should_drop.isra.4.constprop.20
0.10% ksoftirqd/0 [mac80211] [k]
ieee80211_report_used_skb
0.10% ksoftirqd/0 [mac80211] [k]
ieee80211_subif_start_xmit
0.10% ksoftirqd/0 [mac80211] [k]
remove_monitor_info
0.10% ksoftirqd/0 [nf_conntrack] [k] nf_ct_get_tuple
0.10% ksoftirqd/0 [nf_conntrack] [k] tcp_get_timeouts
0.10% ksoftirqd/0 [nf_conntrack_ipv4] [k] ipv4_pkt_to_tuple
0.10% ksoftirqd/0 [nf_nat] [k] nf_ct_nat_ext_add
#
# (For a higher level overview, try: perf report --sort comm,dso)
#
>
> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Principal Kernel Engineer at Red Hat
> Author of http://www.iptv-analyzer.org
> LinkedIn: http://www.linkedin.com/in/brouer
--
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org
More information about the Make-wifi-fast
mailing list