Peer-to-peer mode with DIST and L1 cache
Generalized model for converged applications
Associating the model with real applications
[1] Cost of ANY such events occurring on N different nodes instead of single node
[2] Impact in real applications
The following diagram represents the generalized model for an Infinispan cluster in peer-to-peer mode. The container and the application read and write data to sessions stored in several layers of cache structures, each with its own independent concurrency mechanism.
[Diagram: read/write flow through the layered caches — not reproduced here]
In this diagram the flow of a read is shown propagating through the stages in the cache, conditionally depending on whether we have the data locally somewhere. The writes are more straightforward, since everything must be write-through. Each node has container logic that can read, write and subscribe to events in the cache. Such asynchronous cache notifications are important if you need to subscribe to modifications of certain cache items so that they can be flagged out of date.
Generally SLC, SD and SL1 (the sizes of the corresponding caches) will be bounded independently based on the overall architecture and subject to LRU eviction rules to avoid running out of memory.
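As an illustration of such bounded, LRU-evicted caches, here is a minimal node-local sketch built on `java.util.LinkedHashMap`. The class name is made up and Infinispan's real eviction is configured rather than hand-rolled; this only shows the LRU-with-cap behavior each layer relies on.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative bounded cache: access-ordered map that evicts the
// least-recently-used entry once the cap is exceeded, mirroring how
// SLC, SD and SL1 can each be capped independently.
public class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public BoundedLruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true gives LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict when over capacity
    }
}
```

Because each layer gets its own instance and cap, the sizes stay independent of one another, as the model requires.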
The large majority of the memory in this case is in the DIST-assigned store with size SD. If nodes start failing, the remaining nodes must be able to grow their own DIST store in order to accommodate the backup data from the dead nodes. The SLC and SL1, on the other hand, don't need to grow; their purpose is to populate their limited slots with frequently accessed items that are not likely to be outdated on a remote node.
In this model the data is primarily stored in a remote data grid by partitioning it based on some key value. The communication with the grid is done only via the network with protocols such as Hot Rod, memcached or Thrift.
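A hedged sketch of the key-based partitioning idea — the class and method names are illustrative and not part of any grid protocol; real grids use consistent hashing rather than a plain modulus:

```java
// Illustrative key-to-node routing: the node that owns a session is
// derived deterministically from the key's hash, the way a data grid
// routes Hot Rod or memcached requests to the owning partition.
public class KeyPartitioner {
    // Math.floorMod guards against negative hashCode() values.
    public static int ownerNode(String key, int numNodes) {
        return Math.floorMod(key.hashCode(), numNodes);
    }
}
```

The essential property is that every node computes the same owner for the same key, so no central lookup is needed per request.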
In this model caching deserialized data would be very difficult if concurrent changes are allowed. Protocols for network lookup and transport in grid systems rarely allow delivery of asynchronous events to the clients (in this case the local node is the client for the grid). So we will never be eagerly notified that some session has changed, which would tell us whether we can use the cached deserialized local copy. An L1 cache for serialized chunks, however, may have a function to check whether the data is up to date, which still requires at least one network request by itself even without locking the data.
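One way to picture such an up-to-date check is a version comparison: keep a version number next to the locally cached serialized chunk and compare it against the grid's authoritative version before reusing the local copy. A minimal sketch under that assumption, with a plain map standing in for the local L1 and all names invented (in a real system the version lookup is the one network round trip mentioned above):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative versioned L1 cache of serialized chunks.
public class VersionedL1 {
    public static final class Entry {
        public final byte[] data;   // serialized session chunk
        public final long version;  // version it was fetched at
        public Entry(byte[] data, long version) {
            this.data = data;
            this.version = version;
        }
    }

    private final Map<String, Entry> local = new ConcurrentHashMap<>();

    public void cacheLocally(String key, Entry e) {
        local.put(key, e);
    }

    // Returns the local copy only if its version matches the grid's
    // authoritative version; otherwise the caller must re-fetch.
    public Entry getIfFresh(String key, long gridVersion) {
        Entry e = local.get(key);
        return (e != null && e.version == gridVersion) ? e : null;
    }
}
```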
Each grid protocol supports different operations and features that can be used to optimize the network access and support caching more efficiently.
Local cache invalidation is more difficult in this model as it is not guaranteed that the HA grid architecture allows notifications to keep the caches up to date with any remote changes.
This model shows the basic operations that can occur arbitrarily in converged applications. Containers must guarantee correct execution of these operations under any conditions in the cluster, or, if this is not possible, containers must identify and log the problem that prevents correct execution, giving the applications a chance to recover.
In this model we recognize several sources of asynchronous events that can occur spontaneously initiated by the applications or by remote SIP or HTTP entities:
[List of event types 1–8 referenced in the table below — not reproduced here]
Each of these events will need to read and write to a shared session object. If these events occur on the same machine then all that is needed is serializing the access to the shared session. However if these events occur on different server nodes then the access must be coordinated across the cluster.
To ensure correct behaviour it is best to employ some form of converged load balancing that lands related SIP and HTTP requests to the same node and also coordinates server-side operations such as timers or async API invocations related to the same session to be on the same node.
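On a single node, serializing the access described above can be as simple as a per-session lock. Here is a sketch under that assumption — all names are illustrative, and real containers would also have to evict unused locks:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Illustrative node-local serialization of events against a shared
// session: when the load balancer lands all related SIP and HTTP
// requests on one node, a per-session lock is all that is needed.
public class SessionSerializer {
    private final ConcurrentHashMap<String, ReentrantLock> locks =
            new ConcurrentHashMap<>();

    public <T> T withSession(String sessionId, Supplier<T> work) {
        ReentrantLock lock =
                locks.computeIfAbsent(sessionId, id -> new ReentrantLock());
        lock.lock();
        try {
            return work.get(); // all access to the shared session happens here
        } finally {
            lock.unlock();
        }
    }
}
```

The contrast with the cluster-wide case is the point: here coordination is a local `lock()`; across nodes it becomes a distributed operation.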
Because this kind of coordination cannot always be achieved easily, the application servers must be able to handle rare situations that require cluster coordination. Such coordination doesn't have to be efficient as long as it occurs rarely and without a massive spike in the consumed resources.
The cost is the following:
Type | Frequency/Popularity | Easy to avoid | Difficulty to coordinate
1 | High | No | Low |
2 | High | No | Low |
3 | High | No | Low |
4 | High | No | Low |
5 | Very low | Somewhat | High |
6 | Low | No | Medium |
7 | Medium | No | Medium |
8 | Very Low | Yes (always can be done with timer) | Medium |
If there is a stable SIP dialog-based affinity, such as Call-ID-based or tag-based stickiness in the load balancer, then the probability of all dialogs landing at the same node is (1/N)^(K-1), where K is the number of dialogs and N is the number of nodes (assuming each dialog is placed independently and uniformly). This probability is quite low for any reasonable N and K, and if we assume the worst case of 0, then all costs from [1] apply.
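As a quick sanity check: if each of the K dialogs is placed independently and uniformly over the N nodes, the first dialog lands somewhere, and each of the remaining K-1 must hit that same node with probability 1/N each, giving (1/N)^(K-1). A minimal worked illustration (class and method names are made up):

```java
// Probability that K independently, uniformly placed dialogs all land
// on the same one of N nodes: the first dialog fixes the node, and each
// of the remaining K-1 matches it with probability 1/N.
public class AffinityProbability {
    public static double allOnSameNode(int k, int n) {
        return Math.pow(1.0 / n, k - 1);
    }
}
```

Even for K = 3 dialogs on N = 10 nodes this is already 1%, which is why the worst case has to be assumed.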
This kind of affinity ensures that transactions of type 1 will always land on the same node. By natural SIP means, transactions of type 2 will also always be initiated on the same node. Although transactions of types 3 and 4 are similar, they belong to a different dialog; they will land together, but not necessarily on the same node as types 1 and 2.
SIP Dialog based affinity with Record-Route, Call-ID or tag hints
In this case types 1, 2, 3 and 4 are all in sync and will land on the same node. Thus costs are vastly reduced for applications that only use these types of operations.
This event is more difficult to handle because it comes from the outside with no hints that it needs to access the session shared with types 1, 2, 3 and 4. Only the application knows that, thus some application-specific technique must be employed. The most general solution that we can support is to tell the SIP client to dial a specific URI, which is not always practical.
Probably the best solution here would be to wrap any code that touches the shared object in an async API invocation, sacrificing efficiency and relying on the cluster's capability to do proper server-side load balancing.
Type 5 events are closely related to timers and async API calls, because in most cases it would be sufficient to simply execute the event logic in an async API task. If this task is assigned correctly to the node where it belongs then there will be no need for further synchronization.
One problem with this idea, however, is that you must guarantee that no shared state is touched outside the async API tasks, which is up to the application to ensure. The container has no way to verify this.
In HTTP it is common to have a specific URI or a cookie per user that is logged into the system, so a hint can be inserted or the browser can be redirected to the correct node with standard HTTP and HTML techniques.
Timers and async API calls are pieces of logic that can be executed at some later time, before or after the node they were originally scheduled on has died. In abstract terms these events are tasks that are assigned to a certain set of resources, such as sessions. After a node dies these tasks must be migrated to a new node with precisely the same rules used to fail over the SIP and HTTP sessions. Any failure to do so will require a cluster-wide coordination effort to synchronize the access to the shared data from [1].
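The failover rule above can be sketched as: route each task by the same key-based rule as the session it touches, and re-map both together when a node dies. The re-mapping policy below (fall over to the next live node) is purely illustrative; the point is only that tasks and sessions share one deterministic rule:

```java
// Illustrative failover routing: a task keyed by its session follows
// exactly the same hash-based owner rule as the session itself, so
// after a node dies both migrate to the same surviving node.
public class TaskFailover {
    public static int ownerAfterFailure(String sessionKey, int numNodes, int deadNode) {
        int owner = Math.floorMod(sessionKey.hashCode(), numNodes);
        if (owner == deadNode) {
            // made-up re-mapping: fall over to the next node in ring order
            owner = (owner + 1) % numNodes;
        }
        return owner;
    }
}
```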
All async API tasks can be implemented as timers with zero delay and no repetition, so the notion of async API tasks is redundant; it still plays an important role as a separate building block, however.
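The equivalence above can be shown directly with the JDK's scheduler: a zero-delay, one-shot `schedule` call behaves as an async API invocation. The wrapper class is made up; only `ScheduledExecutorService.schedule` is real JDK API:

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative equivalence: an "async API task" is just a timer with
// zero delay and no repetition.
public class AsyncAsTimer {
    public static void runAsync(ScheduledExecutorService timers, Runnable task) {
        timers.schedule(task, 0, TimeUnit.MILLISECONDS); // one-shot, fires at once
    }
}
```

This is why a container that already migrates timers correctly on failover gets async API task migration for free.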