Peer-to-peer mode with DIST and L1 cache
Generalized model for converged applications
Associating the model with real applications
[1] Cost of ANY such events occurring on N different nodes instead of single node
[2] Impact in real applications
The following diagram represents the generalized model for an Infinispan cluster in peer-to-peer mode. The container and the application read and write data to sessions stored in several layers of cache structures, each with its own independent concurrency mechanism.
[Diagram: read/write flow through the layered caches — not reproduced here]
In this diagram the flow of a read is shown propagating through the stages in the cache, conditionally depending on whether we have the data locally somewhere. The writes are more straightforward, since everything must be write-through. Each node has container logic that can read, write and subscribe to events in the cache. Such asynchronous cache notifications are important if you need to subscribe to modifications of certain cache items so that they can be flagged out of date.
Generally SLC, SD and SL1 (the sizes of the corresponding caches) will be bounded independently based on the overall architecture and subject to LRU eviction rules to avoid running out of memory.
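As an illustration of such bounded, LRU-evicted caches, here is a minimal node-local sketch built on `java.util.LinkedHashMap`. The class name is made up and Infinispan's real eviction is configured rather than hand-rolled; this only shows the LRU-with-cap behavior each layer relies on.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative bounded cache: access-ordered map that evicts the
// least-recently-used entry once the cap is exceeded, mirroring how
// SLC, SD and SL1 can each be capped independently.
public class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public BoundedLruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true gives LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict when over capacity
    }
}
```

Because each layer gets its own instance and cap, the sizes stay independent of one another, as the model requires.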
The large majority of the memory in this case is in the DIST-assigned store with size SD. If nodes start failing, the remaining nodes must be able to grow their own DIST store in order to accommodate the backup data from the dead nodes. The SLC and SL1, on the other hand, don't need to grow; their purpose is to populate their limited slots with frequently accessed items that are not likely to be outdated on a remote node.
In this model the data is primarily stored in a remote data grid by partitioning it based on some key value. The communication with the grid is done only via the network with protocols such as Hot Rod, memcached or Thrift.
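A hedged sketch of the key-based partitioning idea — the class and method names are illustrative and not part of any grid protocol; real grids use consistent hashing rather than a plain modulus:

```java
// Illustrative key-to-node routing: the node that owns a session is
// derived deterministically from the key's hash, the way a data grid
// routes Hot Rod or memcached requests to the owning partition.
public class KeyPartitioner {
    // Math.floorMod guards against negative hashCode() values.
    public static int ownerNode(String key, int numNodes) {
        return Math.floorMod(key.hashCode(), numNodes);
    }
}
```

The essential property is that every node computes the same owner for the same key, so no central lookup is needed per request.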
In this model caching deserialized data would be very difficult if concurrent changes are allowed. Protocols for network lookup and transport in grid systems rarely allow delivery of asynchronous events to the clients (in this case the local node is the client for the grid). So we will never be eagerly notified that some session has changed, which would tell us whether we can use the cached deserialized local copy. An L1 cache for serialized chunks, however, may have a function to check whether the data is up to date, which still requires at least one network request by itself even without locking the data.
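One way to picture such an up-to-date check is a version comparison: keep a version number next to the locally cached serialized chunk and compare it against the grid's authoritative version before reusing the local copy. A minimal sketch under that assumption, with a plain map standing in for the local L1 and all names invented (in a real system the version lookup is the one network round trip mentioned above):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative versioned L1 cache of serialized chunks.
public class VersionedL1 {
    public static final class Entry {
        public final byte[] data;   // serialized session chunk
        public final long version;  // version it was fetched at
        public Entry(byte[] data, long version) {
            this.data = data;
            this.version = version;
        }
    }

    private final Map<String, Entry> local = new ConcurrentHashMap<>();

    public void cacheLocally(String key, Entry e) {
        local.put(key, e);
    }

    // Returns the local copy only if its version matches the grid's
    // authoritative version; otherwise the caller must re-fetch.
    public Entry getIfFresh(String key, long gridVersion) {
        Entry e = local.get(key);
        return (e != null && e.version == gridVersion) ? e : null;
    }
}
```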
Each grid protocol supports different operations and features that can be used to optimize the network access and support caching more efficiently.
Local cache invalidation is more difficult in this model as it is not guaranteed that the HA grid architecture allows notifications to keep the caches up to date with any remote changes.
This model shows the basic operations that can occur arbitrarily in converged applications. Containers must guarantee correct execution of these operations under any conditions in the cluster, or, if this is not possible, containers must identify and log the problem that prevents correct execution, giving the applications a chance to recover.
In this model we recognize several sources of asynchronous events that can occur spontaneously initiated by the applications or by remote SIP or HTTP entities:
[List of event types 1–8 referenced in the table below — not reproduced here]
Each of these events will need to read and write to a shared session object. If these events occur on the same machine then all that is needed is serializing the access to the shared session. However if these events occur on different server nodes then the access must be coordinated across the cluster.
To ensure correct behaviour it is best to employ some form of converged load balancing that lands related SIP and HTTP requests to the same node and also coordinates server-side operations such as timers or async API invocations related to the same session to be on the same node.
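On a single node, serializing the access described above can be as simple as a per-session lock. Here is a sketch under that assumption — all names are illustrative, and real containers would also have to evict unused locks:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Illustrative node-local serialization of events against a shared
// session: when the load balancer lands all related SIP and HTTP
// requests on one node, a per-session lock is all that is needed.
public class SessionSerializer {
    private final ConcurrentHashMap<String, ReentrantLock> locks =
            new ConcurrentHashMap<>();

    public <T> T withSession(String sessionId, Supplier<T> work) {
        ReentrantLock lock =
                locks.computeIfAbsent(sessionId, id -> new ReentrantLock());
        lock.lock();
        try {
            return work.get(); // all access to the shared session happens here
        } finally {
            lock.unlock();
        }
    }
}
```

The contrast with the cluster-wide case is the point: here coordination is a local `lock()`; across nodes it becomes a distributed operation.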
Because this kind of coordination cannot always be achieved easily, the application servers must be able to handle rare situations that require cluster coordination. Such coordination doesn't have to be efficient as long as it occurs rarely and without a massive spike in the consumed resources.
The cost is the following:
Type | Frequency/Popularity | Easy to avoid | Difficulty to coordinate
1 | High | No | Low |
2 | High | No | Low |
3 | High | No | Low |
4 | High | No | Low |
5 | Very low | Somewhat | High |
6 | Low | No | Medium |
7 | Medium | No | Medium |
8 | Very Low | Yes (always can be done with timer) | Medium |
If there is a stable SIP dialog-based affinity, such as Call-ID-based or tag-based stickiness in the load balancer, then the probability of all dialogs landing at the same node is (1/N)^(K-1), where K is the number of dialogs and N is the number of nodes (assuming each dialog is placed independently and uniformly). This probability is quite low for any reasonable N and K, and if we assume the worst case of 0, then all costs from [1] apply.
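As a quick sanity check: if each of the K dialogs is placed independently and uniformly over the N nodes, the first dialog lands somewhere, and each of the remaining K-1 must hit that same node with probability 1/N each, giving (1/N)^(K-1). A minimal worked illustration (class and method names are made up):

```java
// Probability that K independently, uniformly placed dialogs all land
// on the same one of N nodes: the first dialog fixes the node, and each
// of the remaining K-1 matches it with probability 1/N.
public class AffinityProbability {
    public static double allOnSameNode(int k, int n) {
        return Math.pow(1.0 / n, k - 1);
    }
}
```

Even for K = 3 dialogs on N = 10 nodes this is already 1%, which is why the worst case has to be assumed.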
This kind of affinity ensures that transactions of type 1 will always land on the same node. By natural SIP means, transactions of type 2 will also always be initiated on the same node. Although transactions of types 3 and 4 are similar, they belong to a different dialog; they will land together, but not necessarily on the same node as types 1 and 2.
SIP Dialog based affinity with Record-Route, Call-ID or tag hints
In this case types 1, 2, 3 and 4 are all in sync and will land on the same node. Thus costs are vastly reduced for applications that only use these types of operations.
This event is more difficult to handle because it comes from the outside with no hints that it needs to access the session shared with types 1, 2, 3 and 4. Only the application knows that, thus some application-specific technique must be employed. The most general solution that we can support is to tell the SIP client to dial a specific URI, which is not always practical.
Probably the best solution here would be to wrap any code that touches the shared object in an async API invocation, sacrificing efficiency and relying on the cluster's capability to do proper server-side load balancing.
Type 5 events are closely related to timers and async API calls, because in most cases it would be sufficient to simply execute the event logic in an async API task. If this task is assigned correctly to the node where it belongs then there will be no need for further synchronization.
One problem with this idea, however, is that you must guarantee that no shared state is touched outside the async API tasks, which is up to the application to ensure. The container has no way to verify this.
In HTTP it is common to have a specific URI or a cookie per user that is logged into the system, so a hint can be inserted or the browser can be redirected to the correct node with standard HTTP and HTML techniques.
Timers and async API calls are pieces of logic that can be executed at some later time, before or after the node they were originally scheduled on has died. In abstract terms these events are tasks that are assigned to a certain set of resources, such as sessions. After a node dies these tasks must be migrated to a new node with precisely the same rules used to fail over the SIP and HTTP sessions. Any failure to do so will require a cluster-wide coordination effort to synchronize the access to the shared data from [1].
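The failover rule above can be sketched as: route each task by the same key-based rule as the session it touches, and re-map both together when a node dies. The re-mapping policy below (fall over to the next live node) is purely illustrative; the point is only that tasks and sessions share one deterministic rule:

```java
// Illustrative failover routing: a task keyed by its session follows
// exactly the same hash-based owner rule as the session itself, so
// after a node dies both migrate to the same surviving node.
public class TaskFailover {
    public static int ownerAfterFailure(String sessionKey, int numNodes, int deadNode) {
        int owner = Math.floorMod(sessionKey.hashCode(), numNodes);
        if (owner == deadNode) {
            // made-up re-mapping: fall over to the next node in ring order
            owner = (owner + 1) % numNodes;
        }
        return owner;
    }
}
```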
All async API tasks can be implemented as timers with zero delay and no repetition, so the notion of async API tasks is redundant; it still plays an important role as a separate building block, however.
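The equivalence above can be shown directly with the JDK's scheduler: a zero-delay, one-shot `schedule` call behaves as an async API invocation. The wrapper class is made up; only `ScheduledExecutorService.schedule` is real JDK API:

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative equivalence: an "async API task" is just a timer with
// zero delay and no repetition.
public class AsyncAsTimer {
    public static void runAsync(ScheduledExecutorService timers, Runnable task) {
        timers.schedule(task, 0, TimeUnit.MILLISECONDS); // one-shot, fires at once
    }
}
```

This is why a container that already migrates timers correctly on failover gets async API task migration for free.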