
Add a double secret probation mode for strictrtp to handle directmedia scenarios

Review Request #2364 - Created March 4, 2013 and submitted

Matt Jordan
AST-1124, AST-1125
file, mmichelson
Consider the following scenario:

* Phone 1 connected to Asterisk 1, where the configuration forces the media to go through Asterisk 1
* Asterisk 1 and Asterisk 2 connected, where the configuration will attempt directmedia for any session established between the two
* Phone 2 connected to Asterisk 2, where the configuration will attempt directmedia for any session established between the two
* Both Asterisk 1 and Asterisk 2 have strictrtp enabled (which is the default setting)

        d    d            dm  dm            dm  dm
Phone 1 <----> Asterisk 1 <----> Asterisk 2 <----> Phone 2

If Phone 1 calls Phone 2, the expected media path (once the re-INVITE flurry has finished) should look like:

Phone 1 <--> Asterisk 1 <--> Phone 2

The way in which this happens typically originates with Asterisk 2. Once it receives the 200 OK from Phone 2 and passes it on to Asterisk 1, it will send re-INVITEs to both Asterisk 1 and Phone 2 to negotiate itself out of the media path. While this is happening, however, it continues to pass RTP through from Phone 2 to Asterisk 1, and from Asterisk 1 to Phone 2.

Asterisk is usually fairly quick to re-INVITE itself out of the media path. Testing has shown, however, that phones (for a fair number of models) can take a bit longer to accept the re-INVITE and issue a response. The sequence will usually look something like this:

                    SIP                                         RTP (from Asterisk 2's perspective)
Asterisk 1       Asterisk 2          Phone 2                Asterisk 1      Asterisk 2       Phone 2
                                                                <---------->         <---------->
     INVITE med to Phone 2                                      <---------->         <---------->
   <-----------------  INVITE med to Asterisk 1                 <---------->         <---------->
                     ------------------>                        <---------->         <---------->
     200 OK                                                     <---------->         <---------->
   ------------------>                                          <----------          <----------
                     .                                          <----------          <----------
                     .                                          <----------          <----------
                     .    200 OK                                <----------          <----------
                      <-----------------                              (finally out of the path)

The problem here occurs at Asterisk 1: it receives the re-INVITE notifying it of an upcoming change in the RTP source, and it re-initializes the strictrtp settings accordingly. Unfortunately, because Phone 2 is slow on the uptake, RTP keeps flowing from Phone 2 through Asterisk 2 to Asterisk 1. Asterisk 2 does its duty and forwards things along, which results in Asterisk 1 re-locking the RTP source back onto Asterisk 2. Eventually, Phone 2 wakes up and re-directs its RTP to Asterisk 1 - but by then, strictrtp is closed, Asterisk 1 expects the RTP to come from Asterisk 2, and the RTP packets from Phone 2 are rejected.

The problem is, even though Asterisk 1 "knows" it's going to get a new RTP source, due to NAT, it can't know who the source is. It's valid in some scenarios for the new RTP source to look exactly like the old RTP source (if, for example, everything was going through a TURN server) - so it has to simply lock onto the entity sending it RTP once it has a source update. We also can't control how long it takes an endpoint to respond to the re-INVITE - packet loss, if nothing else, would reproduce this scenario.

The solution this patch provides is to implement a 'secret probation' mode for RTP packets received after strictrtp has closed and locked onto an RTP source. This patch does this by doing the following when we've closed on an RTP source:
 * If an RTP packet comes in from a new alternate source, we perform the usual probation mode checking on that RTP source. If we pass probation, we lock onto that as the new RTP source.
 * If an RTP packet comes in from the current source, we reset the counters on the alternate source's probation.

This has the effect of switching to a new, alternate source if the current source stops sending packets and we pass a probation mode check on the alternate source. We will never lock onto an alternate source if the current source keeps sending RTP, as we reset the probation mode counters on the alternate source. This prevents the one-way audio scenario as eventually, Asterisk 2 stops sending RTP and Asterisk 1 will switch over to Phone 2 as the RTP source. It avoids the vulnerability strictrtp was designed to prevent as, once locked on, the RTP source will never change so long as the source continues to send RTP.
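
The two rules above can be sketched roughly as follows. This is a minimal illustration, not the code in res_rtp_asterisk.c: the names (struct rtp_addr, struct strict_rtp, strict_rtp_packet) and the threshold of 4 in-sequence packets are assumptions made for the example, and "probation" is reduced to counting consecutive sequence numbers from the alternate source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a remote address (not the real Asterisk type). */
struct rtp_addr {
    uint32_t ip;
    uint16_t port;
};

/* Assumed probation threshold: in-sequence packets needed to pass. */
#define PROBATION_PACKETS 4

struct strict_rtp {
    struct rtp_addr locked;     /* source we are currently locked onto */
    struct rtp_addr candidate;  /* alternate source under probation */
    bool have_candidate;
    uint16_t last_seq;          /* last sequence number seen from candidate */
    int remaining;              /* in-sequence packets still required */
};

static bool addr_equal(const struct rtp_addr *a, const struct rtp_addr *b)
{
    return a->ip == b->ip && a->port == b->port;
}

/* Returns true if the packet is accepted, false if it is dropped. */
static bool strict_rtp_packet(struct strict_rtp *s,
                              const struct rtp_addr *from, uint16_t seq)
{
    if (addr_equal(from, &s->locked)) {
        /* Current source is still sending: reset the alternate's probation. */
        s->have_candidate = false;
        s->remaining = PROBATION_PACKETS;
        return true;
    }

    if (!s->have_candidate || !addr_equal(from, &s->candidate)) {
        /* New alternate source: start probation with this packet. */
        s->candidate = *from;
        s->have_candidate = true;
        s->last_seq = seq;
        s->remaining = PROBATION_PACKETS - 1;
        return false;
    }

    /* Same alternate source: count it only if it is the next in sequence. */
    if (seq == (uint16_t)(s->last_seq + 1)) {
        s->remaining--;
    } else {
        s->remaining = PROBATION_PACKETS - 1;
    }
    s->last_seq = seq;

    if (s->remaining > 0) {
        return false;
    }

    /* Probation passed: lock onto the alternate as the new RTP source. */
    s->locked = s->candidate;
    s->have_candidate = false;
    s->remaining = PROBATION_PACKETS;
    return true;
}
```

In this sketch, a single packet from the locked source is enough to throw away whatever progress the alternate source had made, which is what keeps an attacker from stealing the stream while the legitimate source is still sending.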

Two different scenarios were tested that resulted in this issue. Both involved multiple phones hanging off of multiple Asterisk instances with different combinations of directmedia/no directmedia. This particular problem occurred about 50% of the time, due to different models of phones responding faster/slower to the re-INVITEs.
Review request changed
Updated (March 4, 2013, 3:28 a.m.)
Cleaned up a few logging statements.
Ship it!
Posted (March 4, 2013, 10:05 a.m.)


branches/1.8/res/res_rtp_asterisk.c (Diff revision 2)
I completely forgot about the existence of this in res_rtp_asterisk. The alt_rtp_address was added as a means of dealing with a similar strictrtp problem, except in reverse. Double-secret probation is a more generic solution that would solve both the current problem and the one that the alt_rtp_address was meant to solve. I'd be all for removing the alt_rtp_address stuff from the code completely, since it's only used in a rather hack-y situation in chan_sip.

To give some background, the alt_rtp_address was introduced because we would receive a reinvite with a new RTP address in the SDP. Unfortunately, due to the timing of the arrival of the reinvite, we had to reply with a 491. The problem is that we would start receiving media from the address in the SDP of the reinvite we received. Since we were locked onto a different address due to strictrtp, we would drop media from the new address. The solution was to let the RTP layer know that it may start receiving media from a new alternate address instead of the one that it was locked on to. If the RTP layer started to receive media from that address, it would immediately make the alt address the new strict address.
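
For comparison with the probation approach, the alternate-address behavior described above amounts to something like the following. This is only a rough sketch: struct media_addr, struct alt_state and alt_rtp_packet are hypothetical names made up for the illustration and do not correspond to the actual res_rtp_asterisk.c code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a remote address (not the real Asterisk type). */
struct media_addr {
    uint32_t ip;
    uint16_t port;
};

struct alt_state {
    struct media_addr strict;  /* address strictrtp is locked onto */
    struct media_addr alt;     /* alternate address announced by signalling */
    bool have_alt;
};

static bool media_addr_equal(const struct media_addr *a,
                             const struct media_addr *b)
{
    return a->ip == b->ip && a->port == b->port;
}

/* Returns true if the packet is accepted, false if it is dropped. */
static bool alt_rtp_packet(struct alt_state *s, const struct media_addr *from)
{
    if (media_addr_equal(from, &s->strict)) {
        return true;
    }
    if (s->have_alt && media_addr_equal(from, &s->alt)) {
        /* Media arrived from the announced alternate: switch immediately. */
        s->strict = s->alt;
        s->have_alt = false;
        return true;
    }
    /* Any other source is rejected, as usual for strictrtp. */
    return false;
}
```

The difference is that this version only works when signalling has told the RTP layer the alternate address in advance, whereas probation re-trains on whatever source keeps sending once the old one goes quiet.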

With double secret probation code in place, it means that we could actually remove the alt_rtp_address code entirely since the RTP layer would eventually re-train on the proper new remote address.
  1. Since that will introduce an API change and doesn't *have* to be done, I'm going to leave it in place for 11 and remove it in trunk.