
SIP Re-invite Glare and 491 Madness...

Review Request #213 - Created March 31, 2009 and submitted

David Vossel
PART A:  491 never being deleted from scheduler because ACK ignored. 

This is a very odd situation that results in a dropped call.  I'll try to explain it as best I can.

A call goes through two Asterisk servers, A and B.  Both A and B attempt to bridge the call by issuing a re-Invite to each other at the exact same time.  When this happens, both respond with a 491 (Request Pending)... so we have "reinvite glare".  Some precautions have already been taken to recover from this situation, and this is where the issue gets really hairy...

Here's what happens:

            A --re-Invite--> B   
            A <--re-Invite-- B  
            A ---491-------> B
            A <----491------ B
            A -----ACK-----> B   ACK is ignored because it doesn't match B's pending invite seqno, 491 is never deleted from scheduler
            A <----ACK-----  B   ACK is ignored because it doesn't match A's pending invite seqno, 491 is never deleted from scheduler

When the ACK is received by 'A', it is an ACK in response to the 491 that 'A' sent, but 'A' has no record of the ACK's seqno because it doesn't match its pending invite seqno.  This is because the 491 was sent in response to a glare invite from 'B' while 'A' already had a pending invite out (in this case to 'B').  Since 'A' doesn't know about the ACK's seqno, the ACK is ignored, meaning the 491 is never deleted from the scheduler.  The same thing happens on 'B's side.  The problem is symmetric...

Now the big problem starts:

           A ----resends 491---> B
           A <----resends 491--- B
       nothing is processed, no ACKs are sent in response
           A ----resends 491---> B
           A <----resends 491--- B
       again nothing happens, no ACKs are sent in response.
The scheduler keeps resending the 491s for each side because they were never removed.  Within a few seconds the call is dropped because both sides hit the maximum number of retries for the 491 packet.

Solution:  During a pending invite, if we receive another invite, we send a 491 and hold on to that glare invite's seqno in the "glareinvite" variable for that sip_pvt struct.  When ACKs are received, we first check whether the ACK is in response to our pending invite; if not, we check whether it is in response to a glare invite.  In this case, it is in response to the glare invite and must be dealt with or the call is dropped.
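The ACK matching described above can be sketched roughly as follows. This is a minimal illustration, not the actual patch: `struct glare_pvt`, `on_invite()`, `on_ack()` and the `ack_kind` enum are hypothetical stand-ins. Only the `pendinginvite` / `glareinvite` idea comes from the description, and a seqno of 0 is assumed to mean "none".

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for sip_pvt (assumption). */
struct glare_pvt {
    uint32_t pendinginvite; /* seqno of the invite we are tracking, 0 if none */
    uint32_t glareinvite;   /* seqno of the invite we answered with 491, 0 if none */
};

enum ack_kind { ACK_UNKNOWN, ACK_FOR_PENDING, ACK_FOR_GLARE };

/* Incoming invite: if one is already pending, remember the glare
 * invite's seqno and report that a 491 should be sent. */
int on_invite(struct glare_pvt *p, uint32_t seqno)
{
    if (p->pendinginvite != 0) {
        p->glareinvite = seqno;
        return 491;
    }
    return 0;
}

/* Incoming ACK: check the pending invite first, then the glare invite. */
enum ack_kind on_ack(struct glare_pvt *p, uint32_t seqno)
{
    if (p->pendinginvite != 0 && seqno == p->pendinginvite) {
        p->pendinginvite = 0;
        return ACK_FOR_PENDING;
    }
    if (p->glareinvite != 0 && seqno == p->glareinvite) {
        /* The caller would stop retransmitting the 491 at this point. */
        p->glareinvite = 0;
        return ACK_FOR_GLARE;
    }
    return ACK_UNKNOWN; /* previously every glare ACK ended up here */
}
```

The point of the second branch is that an ACK for the 491 is now recognized, so the 491 can be taken off the scheduler instead of being retransmitted until the retry limit is hit.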

PART B:  Re-Invite never sent back out after timer expires. 

When the re-invite glare situation occurs, each side sends the 491 and cancels its current pending invite.  A timer is set for a short random amount of time, and then the re-Invite is sent back out, hopefully not at the same time.  We set the timer and execute a function to set the sip_pvt struct's SIP_NEEDREINVITE flag, but never call check_pendings() to send the reinvite back out.

Solution:  Call check_pendings() after setting the SIP_NEEDREINVITE flag, and add locking on the sip_pvt struct, since this code is called from the scheduler.
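A rough sketch of the shape of that fix, under stated assumptions: `struct reinvite_pvt`, `check_pendings_locked()` and `reinvite_timeout()` are hypothetical names, a plain pthread mutex stands in for whatever locking the real sip_pvt uses, and the boolean `need_reinvite` stands in for the SIP_NEEDREINVITE flag.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for sip_pvt (assumption). */
struct reinvite_pvt {
    pthread_mutex_t lock;
    bool need_reinvite;   /* stands in for SIP_NEEDREINVITE */
    bool invite_pending;  /* true while an invite is outstanding */
    int reinvites_sent;   /* counter used only for illustration */
};

/* Stand-in for check_pendings(); the caller must hold p->lock. */
void check_pendings_locked(struct reinvite_pvt *p)
{
    if (p->need_reinvite && !p->invite_pending) {
        p->need_reinvite = false;
        p->invite_pending = true;
        p->reinvites_sent++;  /* the real code would transmit the re-Invite */
    }
}

/* Scheduler callback fired when the glare back-off timer expires.
 * It runs outside the normal request path, hence the lock. */
int reinvite_timeout(const void *data)
{
    struct reinvite_pvt *p = (struct reinvite_pvt *)data;

    pthread_mutex_lock(&p->lock);
    p->need_reinvite = true;
    check_pendings_locked(p);  /* the call that was missing */
    pthread_mutex_unlock(&p->lock);
    return 0;                  /* assumed convention: 0 means do not reschedule */
}
```

Without the `check_pendings_locked()` call the flag is set but nothing ever acts on it, which matches the symptom described in Part B.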

... This made my brains hurt.  

Made calls, got the re-invite glare situation to occur, and watched a successful recovery in Wireshark.  Call stays up!
Review request changed
Updated (April 1, 2009, 5:40 a.m.)
Updated the scheduled wait time to meet RFC 3261.
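For reference, RFC 3261 section 14.1 specifies the back-off after a 491 as follows: a random 2.1 to 4 seconds if this side generated the dialog's Call-ID, otherwise a random 0 to 2 seconds, both in units of 10 ms. The sketch below shows one way to compute that. The function name `glare_wait_ms()` is hypothetical, and the random value is passed in by the caller (for example from `rand()`), so this is not necessarily how the patch does it.

```c
#include <stdbool.h>

/* Milliseconds to wait before retrying a re-Invite after a 491,
 * following RFC 3261 section 14.1. `rnd` is any random unsigned value
 * supplied by the caller. Hypothetical helper, not from the patch. */
int glare_wait_ms(bool owns_call_id, unsigned int rnd)
{
    if (owns_call_id) {
        /* 2.1 s .. 4.0 s in 10 ms units: 210 .. 400 */
        return (210 + (int)(rnd % 191)) * 10;
    }
    /* 0 s .. 2.0 s in 10 ms units: 0 .. 200 */
    return (int)(rnd % 201) * 10;
}
```

Because the two ranges do not overlap, the side that owns the Call-ID always retries later than the other side, which is what keeps the two re-Invites from colliding again.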
Ship it!
Posted (April 1, 2009, 7:05 a.m.)
Excellently done.