HA strategies for Mobicents

Peer to Peer mode with DIST with L1 cache

Data-grid HA model

Generalized model for converged applications

Associating the model with real applications

[1] Cost of ANY such events occurring on N different nodes instead of single node

[2] Impact in real applications

SIP Dialog based affinity

Event type 5

HTTP requests

Server-side load balancing

Peer to Peer mode with DIST with L1 cache

The following diagram represents the generalized model for an Infinispan cluster in peer-to-peer mode. The container and the application read and write session data through several layers of cache structures, each with its own independent concurrency mechanisms.

Description: The application container and the applications need to access sessions, dialogs and other objects in the distributed store. Each container maintains a quick-lookup local cache of deserialized objects; in Sip Servlets this is implemented for sessions. If that lookup fails (the session needed by the app is not in the local cache), the container looks it up in Infinispan. Each Infinispan instance has a data store holding the items hashed to that specific node (the DIST-assigned serialized data). This is where the majority of the memory is consumed. However, the data in this store is distributed based on hash values, not cached based on LRU or other common caching rules. To help with that, Infinispan allows enabling an L1 cache, which applies an LRU policy to commonly accessed items that are missing from the node's DIST-assigned store.
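A minimal sketch of this layered read path follows. The class and method names are illustrative, and the Infinispan distributed store is simulated here with a plain map of serialized bytes; real code would hold an Infinispan Cache reference and a proper marshaller.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the layered read path: a local cache of deserialized sessions
// in front of the distributed store. The distributed store is simulated
// with a map of serialized bytes (with L1 enabled, Infinispan would also
// keep hot remote entries local).
public class LayeredSessionStore {
    private final Map<String, Object> localDeserialized = new ConcurrentHashMap<>();
    private final Map<String, byte[]> distStore; // stands in for the Infinispan cache

    public LayeredSessionStore(Map<String, byte[]> distStore) {
        this.distStore = distStore;
    }

    public Object getSession(String id) {
        // 1. Fast path: deserialized object already cached on this node.
        Object local = localDeserialized.get(id);
        if (local != null) return local;
        // 2. Miss: fetch the serialized form from the distributed store.
        byte[] serialized = distStore.get(id);
        if (serialized == null) return null;
        Object session = deserialize(serialized);
        localDeserialized.put(id, session);
        return session;
    }

    // Placeholder deserialization: real code would use a marshaller.
    private Object deserialize(byte[] data) {
        return new String(data);
    }
}
```

The key point is that the deserialized-object cache is per node and must be invalidated when a remote node changes the serialized data, which is what the notifications discussed below are for.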

In this diagram the flow of a read is shown propagating through the cache stages conditionally, depending on whether we have the data locally somewhere. The writes are much more obvious since everything must be write-through. Each node has container logic that can read, write and subscribe to events in the cache. Such async cache notifications are important if you need to be told when certain cache items are modified, so that local copies can be flagged out of date.

Generally SLC, SD and SL1 (the sizes of the corresponding caches) will be independently and finitely sized based on the overall architecture, and subject to LRU rules to avoid running out of memory.
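The LRU-bounded local caches (the SLC and SL1 budgets) can be sketched in a few lines with the standard library; this is an illustrative stand-in, not how Infinispan implements its eviction internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a finitely sized, LRU-evicting local cache. LinkedHashMap in
// access order evicts the least recently used entry once the configured
// capacity (the SLC / SL1 budget) is exceeded.
public class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public BoundedLruCache(int maxEntries) {
        super(16, 0.75f, true); // true = access order, required for LRU
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

With a capacity of 2, inserting a third entry evicts whichever of the first two was accessed least recently, which is exactly the "limited slots for frequently accessed items" behaviour described above.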

The large majority of the memory in this case is in the DIST-assigned store with size SD. If nodes start failing, the remaining nodes must be able to grow their own DIST store in order to accommodate the backup data from the dead nodes. The SLC and SL1, on the other hand, don't need to grow; their purpose is to fill their limited slots with frequently accessed items that are not likely to be outdated on a remote node.

Data-grid HA model

In this model the data is primarily stored in a remote data grid by partitioning it based on some key value. The communication with the grid is done only over the network with protocols such as Hot Rod, memcached or Thrift.

In this model caching deserialized data would be very difficult if concurrent changes are allowed. Protocols for network lookup and transport in grid systems rarely allow delivery of async events to the clients (in this case the Local Node is the client of the grid). So we will never be eagerly notified that some session has changed, which would let us know whether the cached deserialized local copy is still usable. An L1 cache of serialized chunks, however, may offer a way to check whether the data is up to date, which by itself still requires at least one network request, even without locking the data.
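The version-check idea can be sketched as follows. The grid is simulated here with a local map of versioned entries, and the class and method names are made up for illustration; in a real deployment the version lookup would still cost one network round trip (Hot Rod, for instance, exposes versioned reads).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of an L1 cache of serialized chunks that validates freshness with
// a version check against the grid before trusting the local copy.
public class VersionedL1Cache {
    static final class Entry {
        final long version;
        final byte[] data;
        Entry(long version, byte[] data) { this.version = version; this.data = data; }
    }

    private final Map<String, Entry> grid; // stands in for the remote grid
    private final Map<String, Entry> l1 = new ConcurrentHashMap<>();

    public VersionedL1Cache(Map<String, Entry> grid) { this.grid = grid; }

    public byte[] get(String key) {
        Long remoteVersion = versionOf(key);        // one network round trip
        if (remoteVersion == null) { l1.remove(key); return null; }
        Entry local = l1.get(key);
        if (local != null && local.version == remoteVersion) {
            return local.data;                      // current: no bulk fetch needed
        }
        Entry remote = grid.get(key);               // second round trip: full fetch
        l1.put(key, remote);
        return remote.data;
    }

    // Cheap version probe; in a real grid this transfers no payload.
    private Long versionOf(String key) {
        Entry e = grid.get(key);
        return e == null ? null : e.version;
    }
}
```

The trade-off shown: a hit on a current entry costs one small round trip instead of a full transfer, but the round trip can never be eliminated without push notifications from the grid.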

Each grid protocol supports different operations and features that can be used to optimize network access and support caching more efficiently.

Local cache invalidation is more difficult in this model, because the HA grid architecture is not guaranteed to deliver the notifications that would keep the caches up to date with remote changes.

Generalized model for converged applications

This model shows the basic operations that can occur arbitrarily in converged applications. Containers must guarantee correct execution of these operations under any conditions in the cluster; if this is not possible, containers must identify and log the problem that prevents correct execution, giving the applications a chance to recover.

In this model we recognize several sources of asynchronous events that can occur spontaneously initiated by the applications or by remote SIP or HTTP entities:

  1. Server transactions in a dialog initiated by external client
  2. Client transactions in a dialog initiated by external client
  3. Server transactions in a dialog initiated by the application
  4. Client transactions in a dialog initiated by the application
  5. Server transaction in a dialog initiated by external client that needs R/W access to pre-existing session
  6. HTTP request that needs R/W access to pre-existing session
  7. Timer Callback initiated by an application
  8. Async API invocation by an application that needs R/W access to pre-existing session

The difference between dialogs initiated by an external source (1, 2, 5) and by the application (3, 4) is that dialogs initiated by the application server can easily be stamped with hints in the From tag, Call-ID, Record-Route and Via headers. Such hints can later be used by load balancers, or by the application servers themselves, to route the request to the correct node and the correct application session. Such hints can also be used as a key to lock the resources likely to be touched by the request processing logic inside the application.
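A minimal sketch of hint-based routing, assuming the hint is embedded as a Call-ID suffix (the "random@nodeHint" form and the class name are illustrative, not an existing Mobicents API):

```java
// Sketch of hint-based routing: dialogs initiated by the application server
// embed the owning node's id in the Call-ID. A balancer honors the hint when
// present and falls back to stable hashing for externally initiated dialogs.
public class HintRouter {
    private final String[] nodes;

    public HintRouter(String... nodes) { this.nodes = nodes; }

    // Server-initiated dialogs: stamp the Call-ID with the local node id.
    public static String stampCallId(String random, String nodeHint) {
        return random + "@" + nodeHint;
    }

    public String route(String callId) {
        int at = callId.lastIndexOf('@');
        if (at >= 0) {
            String hint = callId.substring(at + 1);
            for (String n : nodes) {
                if (n.equals(hint)) return n; // honor the embedded hint
            }
        }
        // No usable hint: fall back to stable hashing on the whole Call-ID.
        return nodes[Math.floorMod(callId.hashCode(), nodes.length)];
    }
}
```

The same stamped value can double as the lock key mentioned above, since every request carrying the hint will contend on the same identifier.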

Each of these events will need to read and write a shared session object. If these events occur on the same machine, then all that is needed is to serialize access to the shared session. However, if these events occur on different server nodes, then the access must be coordinated across the cluster.
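The same-machine case can be sketched with per-session lock striping; the class name and API are hypothetical, shown only to illustrate the local serialization the paragraph describes:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of same-node serialization: all event types touching one session
// take the same per-session lock, so concurrent SIP, HTTP and timer events
// on this node execute their critical sections one at a time.
public class SessionLocks {
    private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    public void withSession(String sessionId, Runnable work) {
        ReentrantLock lock = locks.computeIfAbsent(sessionId, id -> new ReentrantLock());
        lock.lock();
        try {
            work.run(); // read/modify/write the shared session safely
        } finally {
            lock.unlock();
        }
    }
}
```

Cross-node coordination has no equally cheap equivalent, which is why the rest of the document tries to make related events land on one node in the first place.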

To ensure correct behaviour it is best to employ some form of converged load balancing that lands related SIP and HTTP requests to the same node and also coordinates server-side operations such as timers or async API invocations related to the same session to be on the same node.

Because this kind of coordination cannot be achieved easily, the application servers must be able to handle the rare situations that require cluster coordination. Such coordination doesn't have to be efficient, as long as it occurs rarely and without a massive spike in consumed resources.

Associating the model with real applications

[1] Cost of ANY such events occurring on N different nodes instead of single node

The cost is the following:

[2] Impact in real applications

Type  Frequency/Popularity  Easy to avoid                        Difficulty to coordinate
1     High                  No                                   Low
2     High                  No                                   Low
3     High                  No                                   Low
4     High                  No                                   Low
5     Very low              Somewhat                             High
6     Low                   No                                   Medium
7     Medium                No                                   Medium
8     Very low              Yes (always can be done with timer)  Medium

SIP Dialog based affinity

If there is a stable SIP dialog-based affinity, such as Call-ID-based or tag-based stickiness in the load balancer, then the probability of all dialogs landing at the same node is 1/N^(K-1) (assuming uniform placement), where K is the number of dialogs and N is the number of nodes. This probability is quite low for any reasonable N and K, and if we assume the worst case of 0, then all costs from [1] apply.
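A quick numeric check of this probability, under the uniform-placement assumption (each of K dialogs independently lands on one of N nodes, so all co-locating has probability N·(1/N)^K = 1/N^(K-1)):

```java
// Probability that K independently and uniformly placed dialogs all land
// on the same node out of N: N * (1/N)^K = 1/N^(K-1).
public class SameNodeProbability {
    public static double allOnSameNode(int nodes, int dialogs) {
        return Math.pow(nodes, -(dialogs - 1));
    }
}
```

For example, with N = 4 nodes and K = 3 dialogs the probability is 1/16, and it shrinks geometrically as either N or K grows, which is why treating it as effectively 0 is reasonable.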

This kind of affinity ensures that transactions of type 1 will always land on the same node. By natural SIP means, transactions of type 2 will also always be initiated on the same node. Although transactions of types 3 and 4 are similar, they have a different dialog; they will land together, but not necessarily on the same node as 1 and 2.

SIP Dialog based affinity with Record-Route, Call-ID or tag hints

In this case types 1, 2, 3 and 4 are all in sync and will land on the same node. Thus costs are vastly reduced for applications that only use these types of operations.

Event type 5

This event is more difficult to handle because it comes from the outside with no hints that it needs to access the session shared with 1, 2, 3 and 4. Only the application knows that, thus some application-specific technique must be employed. The most general solution that we can support is to tell the SIP client to dial a specific URI, which is not always practical.

Probably the best solution here would be to wrap any code that touches the shared object in an async API invocation, sacrificing efficiency and relying on the cluster's capability to do proper server-side load balancing.

Type 5 events are closely related to timers and async API calls, because in most cases it would be sufficient to simply execute the event logic in an async API task. If this task is assigned correctly to the node where it belongs, then there will be no need for further synchronization.

One problem with this idea, however, is that you must guarantee that no shared state is touched outside the async API tasks, which is up to the application to ensure. The container has no way to verify this.

HTTP requests

In HTTP it is common to have a specific URI or a cookie per logged-in user, so a hint can be inserted, or the browser can be redirected to the correct node with standard HTTP and HTML techniques.
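One common form of such a hint is a node suffix on the session id, in the style of the ".nodeX" jvmRoute convention used by mod_jk-style balancers; the sketch below is illustrative rather than a specific container's API:

```java
// Sketch of HTTP stickiness: the session id carries the owning node as a
// suffix, which the balancer strips off to pick the target node.
public class HttpStickiness {
    public static String appendRoute(String sessionId, String node) {
        return sessionId + "." + node;
    }

    // Returns the node hint, or null when the id carries none.
    public static String extractRoute(String sessionId) {
        int dot = sessionId.lastIndexOf('.');
        return dot < 0 ? null : sessionId.substring(dot + 1);
    }
}
```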

Server-side load balancing

Timers and async access APIs are pieces of logic that can be executed at some later time, before or after the node they were originally scheduled on has died. In abstract terms these events are tasks assigned to a certain set of resources, such as sessions. After a node dies, these tasks must be migrated to a new node using precisely the same rules used to fail over the SIP and HTTP sessions. Any failure to do so will require a cluster-wide coordination effort to synchronize access to the shared data from [1].
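The "same rules as session failover" requirement can be sketched as a shared placement function applied to both sessions and tasks. This naive modulo version is only for illustration; a production system would use consistent hashing so that a membership change moves as few assignments as possible.

```java
import java.util.List;

// Sketch of server-side task placement: a timer or async task bound to a
// session is always owned by the node selected by hashing the session key
// over the live node list. Removing a dead node from the list migrates its
// tasks by the same rule that fails over the sessions.
public class TaskPlacement {
    public static String ownerOf(String sessionKey, List<String> liveNodes) {
        int idx = Math.floorMod(sessionKey.hashCode(), liveNodes.size());
        return liveNodes.get(idx);
    }
}
```

Because sessions and their tasks are placed by the same function over the same membership view, they always migrate together, which is exactly what avoids the cluster-wide coordination cost.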

All async API tasks can be implemented as timers with zero delay and no repetition, so the notion of an async API task is redundant, but it still plays an important role as a separate building block.
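This reduction is straightforward with a standard scheduler; the class below is a sketch (the blocking variant exists only so the behaviour can be demonstrated, not as part of the intended API):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Sketch of the reduction above: an async API invocation is just a one-shot
// timer with zero delay, so one scheduling primitive covers both notions.
public class TaskScheduler {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // General primitive: a one-shot timer with the given delay.
    public ScheduledFuture<?> scheduleTimer(Runnable task, long delayMs) {
        return scheduler.schedule(task, delayMs, TimeUnit.MILLISECONDS);
    }

    // Async API invocation expressed as a zero-delay, non-repeating timer.
    public ScheduledFuture<?> runAsync(Runnable task) {
        return scheduleTimer(task, 0);
    }

    // Blocks until the task completes; here only for demonstration.
    public void runAsyncAndAwait(Runnable task) {
        try {
            runAsync(task).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public void shutdown() {
        scheduler.shutdown();
    }
}
```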