Review Board 1.7.16

Improve event API cache performance

Review Request #205 - Created March 24, 2009 and submitted

Russell Bryant
I have learned a lot since I originally wrote event.c a couple of years ago.  Lately, I have been reflecting on some of the performance implications of some of the design decisions that I made.  Then, this past Friday, I did some load testing for unrelated reasons and found that the conversion of device state handling to use the event API in 1.6.1 introduced a significant performance hit for the test that I was doing.

Specifically, the test I was doing at the time was sending many thousands of IAX2 registrations at an Asterisk box.  Asterisk 1.6.0 could handle it just fine.  1.6.1 choked.  The reason was the performance of the event API cache with handling that many entries in the cache.  In 1.6.0, when the devicestate code needed to find out the state of an IAX2 device, it would call into chan_iax2, which could determine the state by looking up the peer in a hash table.  The performance gained by using the hash table was lost in 1.6.1, mostly due to the way the cache was implemented.

Here is a summary of the changes that have been made, in order of both invasiveness and performance impact, from smallest to largest.

1) Asterisk 1.6.1 introduces some additional logic to be able to handle distributed device state.  This functionality comes at a cost.  One relatively minor change in this patch is that the extra processing required for distributed device state is now completely bypassed if it's not needed.

2) One of the things that I noticed when profiling this code was that a _lot_ of time was spent doing string comparisons.  I changed the way strings are represented in an event to include a hash value at the front.  So, before doing a string comparison, we do an integer comparison on the hash.

3) Finally, the code that handles the event cache has been re-written.  I tried to do this in a such a way that it had minimal impact on the API.  I did have to change one API call, though - ast_event_queue_and_cache().  However, the way it works now is nicer, IMO.  Each type of event that can be cached (MWI, device state) has its own hash table and rules for hashing and comparing objects.  This by far made the biggest impact on performance.

To be clear, I am proposing that these changes be merged into 1.6.1, 1.6.2, and trunk.  I consider the performance issues here a regression.
The first set of tests I did were functionality tests.  

For device state, I set up some hints and verified that when I changed the state of the associated device, the hint would change state.  I ran this test with and without the distributed devstate processing code turned on.

For MWI, I set up a mailbox, left a message, and verified that a NOTIFY was sent by chan_sip to my phone assigned to that mailbox.

The second set of tests was performance.  The box I was using for this has an Intel Core 2 Duo CPU @ 2.33 GHz.  It has 2 GB of RAM.  When I mention the CPU usage percentage for the Asterisk process, it is out of 200%, since this is a dual core box.

The test I ran is that I ran a tool which sent 12000 IAX2 registrations to Asterisk, evenly spaced out over 1 minute (one every 5 ms).  IAX2 dynamic realtime was enabled, with caching turned on.

The oprofile results include all functions that were taking up 0.5% of Asterisk's CPU time or higher.


Version: Asterisk SVN-branch-1.6.0-r183915

Average CPU usage of asterisk: 3% to 5%

oprofile results:

samples  %        image name               app name                 symbol name
26088    44.3357  asterisk                 asterisk                 ast_sched_add_variable
8226     13.9798                network_thread
4731      8.0402  asterisk                 asterisk                 _ast_sched_del
4262      7.2431                __find_callno
3785      6.4325                socket_process
1842      3.1304                pvt_destructor
1652      2.8075  [vdso] (tgid:24198 range:0xb7f64000-0xb7f65000) asterisk                 (no symbols)
617       1.0486  asterisk                 asterisk                 cd_cb
523       0.8888  asterisk                 asterisk                 bt_broot
339       0.5761                build_peer
337       0.5727       realtime_odbc
329       0.5591                register_verify


Version: Asterisk SVN-branch-1.6.1-r183916

The CPU usage by Asterisk increased steadily throughout this test and peaked at about 140%.

oprofile results:

samples  %        image name               app name                 symbol name
317713   49.4110  asterisk                 asterisk                 match_ie_val
75913    11.8061  asterisk                 asterisk                 ast_event_queue_and_cache
58659     9.1227  asterisk                 asterisk                 schedule
51157     7.9560  asterisk                 asterisk                 ast_event_get_cached
44859     6.9765  asterisk                 asterisk                 ast_event_dump_cache
31661     4.9240                network_thread
15696     2.4411  asterisk                 asterisk                 .plt
8601      1.3376                socket_process
7524      1.1701                __find_callno
6401      0.9955                pvt_destructor
4282      0.6659                __attempt_transmit


Version: Asterisk SVN-russell-event_performance-r183985-/branches/1.6.1

Average CPU usage of asterisk: 3% to 5%

oprofile results:

samples  %        image name               app name                 symbol name
42952    44.9665  asterisk                 asterisk                 schedule
18687    19.5634                network_thread
6553      6.8603                __find_callno
6444      6.7462                socket_process
3882      4.0641                pvt_destructor
2261      2.3670  [vdso] (tgid:22953 range:0xb7f99000-0xb7f9a000) asterisk                 (no symbols)
1658      1.7358                __attempt_transmit
977       1.0228  asterisk                 asterisk                 __ao2_callback
682       0.7140  asterisk                 asterisk                 __ao2_ref
549       0.5747  asterisk                 asterisk                 bt_broot


Analysis of results:

As you can see, the current code in Asterisk 1.6.1 has some performance issues, and the clear culprit is the event code.  After this set of changes, the event code drops off of th profiling map, and we're back to what we would expect to be the biggest consumers of CPU time during this test.

Additional Commentary:

The scheduler is the next big offender in CPU consumption here.  Luckily, this code has been heavily modified in trunk and 1.6.2 (, so once these changes make their way around, it will be interesting to do this test again in 1.6.2.  After the scheduler, the IAX2 network_thread is the next biggest consumer of CPU.  When looking closer at that code, the offending piece of code is actually the handling of the IAX2 frame cache.  There are some very easy changes we could make to significantly reduce the CPU consumption there, but I'll save that for another day.  :-)
Review request changed
Updated (March 25, 2009, 10:32 a.m.)
Post latest version of the diff. runs on a server provided by Digium, Inc. and uses bandwidth donated to the open source Asterisk community by API Digital Communications in Huntsville, AL USA.
Please report problems with this site to