Review Board 1.7.16


Improve frame_queue handing in chan_iax2

Review Request #212 - Created March 28, 2009 and submitted

Russell Bryant
/trunk
Reviewers
asterisk-dev
Asterisk
While doing some profiling for <http://reviewboard.digium.com/r/205/>, one function that caught my eye was network_thread() in chan_iax2.c.  After the things that I was working on there, it was the next target for analysis and optimization.  I used oprofile's source annotation functionality and found that the loop traversing the frame queue in network_thread() was to blame for the excessive CPU cycle consumption.

The frame_queue in chan_iax2 previously held all frames that either were pending transmission or had been transmitted and are still pending acknowledgment.

In network_thread(), the previous code would go back through the main for loop after reading a single incoming frame or after being signaled because a frame had been queued up for initial transmission.  In each iteration of the loop, it traverses the entire frame queue looking for frames that need to be transmitted.  On a busy server, this could easily be quite a few entries.

This patch is actually quite simple.  The frame_queue has become only a list of frames pending acknowledgment.  Frames that need to be transmitted are queued up to a dedicated transmit thread via the taskprocessor API.

As a result, the code in network_thread() becomes much simpler, as its only job is to read incoming frames.

---------

In addition to the previously described changes, this patch now includes some additional changes to the frame_queue.  Instead of one big frame_queue, now there is a list per call number to further reduce wasted list traversals.  The biggest impact of this change is in socket_process().
I used a similar test that I used in R205 for this.  I sent 15000 IAX2 registrations evenly spaced out over 1 minute.  The server was configured using realtime with a MySQL backend via res_config_odbc.  IAX2 realtime caching was enabled.  I ran the test, used oprofile, and watched CPU usage.

The box has an Intel Core 2 Duo @ 2.33 GHz and 2 GB of RAM.  The noted CPU consumption is out of 200%, since it's a dual core machine.

----------------------------

Asterisk SVN-trunk-r184801 built by russell @ milo on a x86_64 running Linux on 2009-02-12 19:26:12 UTC

Average CPU usage by Asterisk: 10% to 15%

(results below 0.5% omitted)

samples  %        image name               app name                 symbol name
33690    39.8500  chan_iax2.so             chan_iax2.so             network_thread
13263    15.6881  chan_iax2.so             chan_iax2.so             socket_process
6972      8.2468  chan_iax2.so             chan_iax2.so             __attempt_transmit
6618      7.8281  chan_iax2.so             chan_iax2.so             pvt_destructor
6542      7.7382  chan_iax2.so             chan_iax2.so             __find_callno
2037      2.4095  asterisk                 asterisk                 internal_ao2_callback
620       0.7334  asterisk                 asterisk                 sched_time_cmp
602       0.7121  asterisk                 asterisk                 handle_statechange
563       0.6659  [vdso] (tgid:7462 range:0x7fffe31fe000-0x7fffe3200000) asterisk                 (no symbols)
555       0.6565  chan_iax2.so             chan_iax2.so             peer_cmp_cb
518       0.6127  asterisk                 asterisk                 internal_ao2_ref
...

----------------------------

(With only first set of changes)

Asterisk SVN-russell-iax2_transmit_q-r184836-/trunk built by root @ milo on a x86_64 running Linux on 2009-03-29 03:43:13 UTC

Average CPU usage by Asterisk: 5% to 10%

(most results below 0.5% omitted)

samples  %        image name               app name                 symbol name
9290     24.5969  chan_iax2.so             chan_iax2.so             socket_process
6553     17.3502  chan_iax2.so             chan_iax2.so             __find_callno
3630      9.6111  chan_iax2.so             chan_iax2.so             pvt_destructor
2392      6.3332  chan_iax2.so             chan_iax2.so             __attempt_transmit
1946      5.1524  asterisk                 asterisk                 internal_ao2_callback
582       1.5409  asterisk                 asterisk                 handle_statechange
557       1.4748  chan_iax2.so             chan_iax2.so             peer_cmp_cb
528       1.3980  asterisk                 asterisk                 internal_ao2_ref
523       1.3847  [vdso] (tgid:3088 range:0x7fff3c3fe000-0x7fff3c400000) asterisk                 (no symbols)
478       1.2656  asterisk                 asterisk                 sched_time_cmp
416       1.1014  res_config_odbc.so       res_config_odbc.so       realtime_odbc
348       0.9214  chan_iax2.so             chan_iax2.so             register_verify
339       0.8976  chan_iax2.so             chan_iax2.so             build_peer
320       0.8473  chan_iax2.so             chan_iax2.so             .plt
291       0.7705  asterisk                 asterisk                 __bt_defcmp
236       0.6249  asterisk                 asterisk                 _ast_heap_remove
233       0.6169  asterisk                 asterisk                 .plt
221       0.5851  asterisk                 asterisk                 __ao2_iterator_next
216       0.5719  asterisk                 asterisk                 tps_processing_function
204       0.5401  asterisk                 asterisk                 ast_get_channel_tech
193       0.5110  chan_iax2.so             chan_iax2.so             iax2_process_thread
190       0.5031  asterisk                 asterisk                 ast_hashtab_remove_this_object
...
56        0.1483  chan_iax2.so             chan_iax2.so             transmit_frame
...
36        0.0953  chan_iax2.so             chan_iax2.so             network_thread
...

----------------------------

(Now, with the frame_queue changed to an array of lists.)

Asterisk SVN-russell-iax2_transmit_q-r184837-/trunk built by root @ milo on a x86_64 running Linux on 2009-03-29 03:43:13 UTC

Average CPU usage by Asterisk: 3% to 6%

(results below 1% omitted)

samples  %        image name               app name                 symbol name
6413     27.6816  chan_iax2.so             chan_iax2.so             __find_callno
1929      8.3265  asterisk                 asterisk                 internal_ao2_callback
707       3.0518  chan_iax2.so             chan_iax2.so             socket_process
558       2.4086  asterisk                 asterisk                 internal_ao2_ref
550       2.3741  asterisk                 asterisk                 sched_time_cmp
524       2.2618  asterisk                 asterisk                 handle_statechange
522       2.2532  [vdso] (tgid:5835 range:0x7fff071fe000-0x7fff07200000) asterisk                 (no symbols)
516       2.2273  chan_iax2.so             chan_iax2.so             peer_cmp_cb
368       1.5885  res_config_odbc.so       res_config_odbc.so       realtime_odbc
319       1.3770  chan_iax2.so             chan_iax2.so             register_verify
310       1.3381  chan_iax2.so             chan_iax2.so             .plt
302       1.3036  chan_iax2.so             chan_iax2.so             build_peer
288       1.2431  asterisk                 asterisk                 _ast_heap_remove
273       1.1784  asterisk                 asterisk                 __bt_defcmp
268       1.1568  asterisk                 asterisk                 .plt
250       1.0791  asterisk                 asterisk                 __ao2_iterator_next
...
Review request changed
Updated (March 28, 2009, 7:04 p.m.)
  • I used a similar test that I used in R205 for this.  I sent 15000 IAX2 registrations evenly spaced out over 1 minute.  The server was configured using realtime with a MySQL backend via res_config_odbc.  IAX2 realtime caching was enabled.  I ran the test, used oprofile, and watched CPU usage.
    
    The box has an Intel Core 2 Duo @ 2.33 GHz and 2 GB of RAM.  The noted CPU consumption is out of 200%, since it's a dual core machine.
    
    ----------------------------
    
    Asterisk SVN-trunk-r184801 built by russell @ milo on a x86_64 running Linux on 2009-02-12 19:26:12 UTC
    
    Average CPU usage by Asterisk: 10 to 15%
    
    (results below 0.5% omitted)
    
    samples  %        image name               app name                 symbol name
    33690    39.8500  chan_iax2.so             chan_iax2.so             network_thread
    13263    15.6881  chan_iax2.so             chan_iax2.so             socket_process
    6972      8.2468  chan_iax2.so             chan_iax2.so             __attempt_transmit
    6618      7.8281  chan_iax2.so             chan_iax2.so             pvt_destructor
    6542      7.7382  chan_iax2.so             chan_iax2.so             __find_callno
    2037      2.4095  asterisk                 asterisk                 internal_ao2_callback
    620       0.7334  asterisk                 asterisk                 sched_time_cmp
    602       0.7121  asterisk                 asterisk                 handle_statechange
    563       0.6659  [vdso] (tgid:7462 range:0x7fffe31fe000-0x7fffe3200000) asterisk                 (no symbols)
    555       0.6565  chan_iax2.so             chan_iax2.so             peer_cmp_cb
    518       0.6127  asterisk                 asterisk                 internal_ao2_ref
    ...
    
    ----------------------------
    
    Asterisk SVN-russell-iax2_transmit_q-r184836-/trunk built by root @ milo on a x86_64 running Linux on 2009-03-29 03:43:13 UTC
    
    Average CPU usage by Asterisk: 5 to 10%
    
    (most results below 0.5% omitted)
    
    samples  %        image name               app name                 symbol name
    9290     24.5969  chan_iax2.so             chan_iax2.so             socket_process
    6553     17.3502  chan_iax2.so             chan_iax2.so             __find_callno
    3630      9.6111  chan_iax2.so             chan_iax2.so             pvt_destructor
    2392      6.3332  chan_iax2.so             chan_iax2.so             __attempt_transmit
    1946      5.1524  asterisk                 asterisk                 internal_ao2_callback
    582       1.5409  asterisk                 asterisk                 handle_statechange
    557       1.4748  chan_iax2.so             chan_iax2.so             peer_cmp_cb
    528       1.3980  asterisk                 asterisk                 internal_ao2_ref
    523       1.3847  [vdso] (tgid:3088 range:0x7fff3c3fe000-0x7fff3c400000) asterisk                 (no symbols)
    478       1.2656  asterisk                 asterisk                 sched_time_cmp
    416       1.1014  res_config_odbc.so       res_config_odbc.so       realtime_odbc
    348       0.9214  chan_iax2.so             chan_iax2.so             register_verify
    339       0.8976  chan_iax2.so             chan_iax2.so             build_peer
    320       0.8473  chan_iax2.so             chan_iax2.so             .plt
    291       0.7705  asterisk                 asterisk                 __bt_defcmp
    236       0.6249  asterisk                 asterisk                 _ast_heap_remove
    233       0.6169  asterisk                 asterisk                 .plt
    221       0.5851  asterisk                 asterisk                 __ao2_iterator_next
    216       0.5719  asterisk                 asterisk                 tps_processing_function
    204       0.5401  asterisk                 asterisk                 ast_get_channel_tech
    193       0.5110  chan_iax2.so             chan_iax2.so             iax2_process_thread
    190       0.5031  asterisk                 asterisk                 ast_hashtab_remove_this_object
    ...
    56        0.1483  chan_iax2.so             chan_iax2.so             transmit_frame
    ...
    36        0.0953  chan_iax2.so             chan_iax2.so             network_thread
    ...
    
    ----------------------------
    

    I used a similar test that I used in R205 for this.  I sent 15000 IAX2 registrations evenly spaced out over 1 minute.  The server was configured using realtime with a MySQL backend via res_config_odbc.  IAX2 realtime caching was enabled.  I ran the test, used oprofile, and watched CPU usage.
    
    The box has an Intel Core 2 Duo @ 2.33 GHz and 2 GB of RAM.  The noted CPU consumption is out of 200%, since it's a dual core machine.
    
    ----------------------------
    
    Asterisk SVN-trunk-r184801 built by russell @ milo on a x86_64 running Linux on 2009-02-12 19:26:12 UTC
    
    Average CPU usage by Asterisk: 10% to 15%
    
    (results below 0.5% omitted)
    
    samples  %        image name               app name                 symbol name
    33690    39.8500  chan_iax2.so             chan_iax2.so             network_thread
    13263    15.6881  chan_iax2.so             chan_iax2.so             socket_process
    6972      8.2468  chan_iax2.so             chan_iax2.so             __attempt_transmit
    6618      7.8281  chan_iax2.so             chan_iax2.so             pvt_destructor
    6542      7.7382  chan_iax2.so             chan_iax2.so             __find_callno
    2037      2.4095  asterisk                 asterisk                 internal_ao2_callback
    620       0.7334  asterisk                 asterisk                 sched_time_cmp
    602       0.7121  asterisk                 asterisk                 handle_statechange
    563       0.6659  [vdso] (tgid:7462 range:0x7fffe31fe000-0x7fffe3200000) asterisk                 (no symbols)
    555       0.6565  chan_iax2.so             chan_iax2.so             peer_cmp_cb
    518       0.6127  asterisk                 asterisk                 internal_ao2_ref
    ...
    
    ----------------------------
    
    (With only first set of changes)
    
    Asterisk SVN-russell-iax2_transmit_q-r184836-/trunk built by root @ milo on a x86_64 running Linux on 2009-03-29 03:43:13 UTC
    
    Average CPU usage by Asterisk: 5% to 10%
    
    (most results below 0.5% omitted)
    
    samples  %        image name               app name                 symbol name
    9290     24.5969  chan_iax2.so             chan_iax2.so             socket_process
    6553     17.3502  chan_iax2.so             chan_iax2.so             __find_callno
    3630      9.6111  chan_iax2.so             chan_iax2.so             pvt_destructor
    2392      6.3332  chan_iax2.so             chan_iax2.so             __attempt_transmit
    1946      5.1524  asterisk                 asterisk                 internal_ao2_callback
    582       1.5409  asterisk                 asterisk                 handle_statechange
    557       1.4748  chan_iax2.so             chan_iax2.so             peer_cmp_cb
    528       1.3980  asterisk                 asterisk                 internal_ao2_ref
    523       1.3847  [vdso] (tgid:3088 range:0x7fff3c3fe000-0x7fff3c400000) asterisk                 (no symbols)
    478       1.2656  asterisk                 asterisk                 sched_time_cmp
    416       1.1014  res_config_odbc.so       res_config_odbc.so       realtime_odbc
    348       0.9214  chan_iax2.so             chan_iax2.so             register_verify
    339       0.8976  chan_iax2.so             chan_iax2.so             build_peer
    320       0.8473  chan_iax2.so             chan_iax2.so             .plt
    291       0.7705  asterisk                 asterisk                 __bt_defcmp
    236       0.6249  asterisk                 asterisk                 _ast_heap_remove
    233       0.6169  asterisk                 asterisk                 .plt
    221       0.5851  asterisk                 asterisk                 __ao2_iterator_next
    216       0.5719  asterisk                 asterisk                 tps_processing_function
    204       0.5401  asterisk                 asterisk                 ast_get_channel_tech
    193       0.5110  chan_iax2.so             chan_iax2.so             iax2_process_thread
    190       0.5031  asterisk                 asterisk                 ast_hashtab_remove_this_object
    ...
    56        0.1483  chan_iax2.so             chan_iax2.so             transmit_frame
    ...
    36        0.0953  chan_iax2.so             chan_iax2.so             network_thread
    ...
    
    ----------------------------
    
    (Now, with the frame_queue changed to an array of lists.)
    
    Asterisk SVN-russell-iax2_transmit_q-r184837-/trunk built by root @ milo on a x86_64 running Linux on 2009-03-29 03:43:13 UTC
    
    Average CPU usage by Asterisk: 3% to 6%
    
    (results below 1% omitted)
    
    samples  %        image name               app name                 symbol name
    6413     27.6816  chan_iax2.so             chan_iax2.so             __find_callno
    1929      8.3265  asterisk                 asterisk                 internal_ao2_callback
    707       3.0518  chan_iax2.so             chan_iax2.so             socket_process
    558       2.4086  asterisk                 asterisk                 internal_ao2_ref
    550       2.3741  asterisk                 asterisk                 sched_time_cmp
    524       2.2618  asterisk                 asterisk                 handle_statechange
    522       2.2532  [vdso] (tgid:5835 range:0x7fff071fe000-0x7fff07200000) asterisk                 (no symbols)
    516       2.2273  chan_iax2.so             chan_iax2.so             peer_cmp_cb
    368       1.5885  res_config_odbc.so       res_config_odbc.so       realtime_odbc
    319       1.3770  chan_iax2.so             chan_iax2.so             register_verify
    310       1.3381  chan_iax2.so             chan_iax2.so             .plt
    302       1.3036  chan_iax2.so             chan_iax2.so             build_peer
    288       1.2431  asterisk                 asterisk                 _ast_heap_remove
    273       1.1784  asterisk                 asterisk                 __bt_defcmp
    268       1.1568  asterisk                 asterisk                 .plt
    250       1.0791  asterisk                 asterisk                 __ao2_iterator_next
    ...
    
  • While doing some profiling for <http://reviewboard.digium.com/r/205/>, one function that caught my eye was network_thread() in chan_iax2.c.  After the things that I was working on there, it was the next target for analysis and optimization.  I used oprofile's source annotation functionality and found that the loop traversing the frame queue in network_thread() was to blame for the excessive CPU cycle consumption.
    
    The frame_queue in chan_iax2 previously held all frames that either were pending transmission or had been transmitted and are still pending acknowledgment.
    
    In network_thread(), the previous code would go back through the main for loop after reading a single incoming frame or after being signaled because a frame had been queued up for initial transmission.  In each iteration of the loop, it traverses the entire frame queue looking for frames that need to be transmitted.  On a busy server, this could easily be quite a few entries.
    
    This patch is actually quite simple.  The frame_queue has become only a list of frames pending acknowledgment.  Frames that need to be transmitted are queued up to a dedicated transmit thread via the taskprocessor API.
    
    As a result, the code in network_thread() becomes much simpler, as its only job is to read incoming frames.

    While doing some profiling for <http://reviewboard.digium.com/r/205/>, one function that caught my eye was network_thread() in chan_iax2.c.  After the things that I was working on there, it was the next target for analysis and optimization.  I used oprofile's source annotation functionality and found that the loop traversing the frame queue in network_thread() was to blame for the excessive CPU cycle consumption.
    
    The frame_queue in chan_iax2 previously held all frames that either were pending transmission or had been transmitted and are still pending acknowledgment.
    
    In network_thread(), the previous code would go back through the main for loop after reading a single incoming frame or after being signaled because a frame had been queued up for initial transmission.  In each iteration of the loop, it traverses the entire frame queue looking for frames that need to be transmitted.  On a busy server, this could easily be quite a few entries.
    
    This patch is actually quite simple.  The frame_queue has become only a list of frames pending acknowledgment.  Frames that need to be transmitted are queued up to a dedicated transmit thread via the taskprocessor API.
    
    As a result, the code in network_thread() becomes much simpler, as its only job is to read incoming frames.
    
    ---------
    
    In addition to the previously described changes, this patch now includes some additional changes to the frame_queue.  Instead of one big frame_queue, now there is a list per call number to further reduce wasted list traversals.  The biggest impact of this change is in socket_process().
  • changed from Avoid wasteful frame queue traversals in chan_iax2 network_thread() to Improve frame_queue handing in chan_iax2
Added more changes to the patch to further improve frame queue handling beyond just stuff in network_thread().
Posted (March 29, 2009, 6:01 a.m.)

   

  
/trunk/channels/chan_iax2.c (Diff revision 2)
 
 
This is mabe a little bit off-topic

But looking at the perf stats  this loop seems to be next otpimization target.

I think if the pool of free call numbers could be organized as fifo this loop could be much shorter...


  1. You are absolutely right.
    
    I have some ideas, but we should work on it separate from this, since it's unrelated to the frame_queue stuff.
  2. I will note, though, that a FIFO isn't a sufficient replacement.  This code randomizes the choice of a call number from those that are currently available.  This is done for security reasons.  An efficient container of available call numbers is needed, for sure, but we should do it in a way that the next number chosen is sufficiently random.
Ship it!
Posted (March 31, 2009, 5:42 a.m.)
Rockin!

https://reviewboard.asterisk.org/ runs on a server provided by Digium, Inc. and uses bandwidth donated to the open source Asterisk community by API Digital Communications in Huntsville, AL USA.
Please report problems with this site to asteriskteam@digium.com.