Automatic Transactional Memory

Andriy Stepanchuk, tef@tentity.com, www.tentity.com

Abstract

In essence, parallel computing is composed of tasks executed in parallel, where a task is a unit of computation. Programmers need the ability to define, start/stop, and coordinate parallel tasks. Unfortunately, the tasks are not the building blocks of a program. A program is composed of functions/methods which are sequential, and, therefore, the program is intrinsically sequential also. This paper proposes a novel programming model with intrinsic parallelism.

Introduction

Given the increasing number of processing elements of computing devices, it becomes apparent the necessity of an intrinsically parallel computational model. Though functional languages are suitable for writing highly parallel programs, many real-life applications are naturally expressed by imperative languages. This paper introduces Automatic Transactional Memory (ATM) that proposes to enhance imperative languages with intrinsic parallelism. Its design is driven to support high performance as well as to enhance programming experience and productivity.

The intrinsic parallelism implies that all functions are always asynchronous; thus, functions become tasks for parallel computing. It would be easy to execute functions in parallel if they only operate local data. In imperative languages, though, functions have the ability to read and modify any memory location, or, in other words, functions have side-effects. Such globally accessible memory considerably complicates extracting parallelism due to data race.

ATM primarily addresses how data in memory are accessed. According to ATM, there is no global/static namespace should be available to create variables that store references to shared memory locations. Instead, ATM requires functions to query memory. Such memory queries represent side-effects of their enclosing functions and allow determining how functions must be executed with regard to each other based on intersections of their side-effects. Generally, functions that have intentions to modify the same data must be executed sequentially, otherwise functions can be executed in parallel.

ATM comprises a memory manager and a function scheduler.

Memory Management

Modern languages provide the ability to dynamically allocate memory to store a program state. Some languages are enhanced with garbage collectors which keep track of allocated objects in memory and automatically release their memory when the objects become unreachable. Modern languages also supply data structures (hash-tables, red-black-trees, etc.) to organize the allocated memory in collections. In essence, ATM is next generation of automatic memory management where memory allocation, garbage collection, and data structures are combined in a one service. The service provides a memory query API to create, read, update, and delete data in memory.

Using a language-independent notation and for the illustration purpose only, the following functions create two instances of of data structure Foo and modify attribute Y:

class Foo {

        int X, index;

        int Y;

}

void A() {

auto q = yield Foo where X = 3

Foo f = q.first();

if ( f is nil ) {

f = q.insert();

f.X = 3;

}

f.Y = 1;

}

void B() {

auto q = yield Foo where X = 3

Foo f = q.first();

if ( f is nil ) {

f = q.insert();

f.X = 3;

}

f.Y = 2;

}

void C() {

auto q = yield Foo where X = 5

Foo f = q.first();

        if ( f is nil ) {

f = q.insert();

f.X = 5;

}

f.Y = 1;

}

void main() {

A();

B();

C();

}

According to ATM, functions A, B, and C are executed asynchronously. The memory manager keeps track of active functions and their queries and can easily detect that functions A and B intend to modify the same data. The memory manager will delay execution of the query of function B until function A is complete or in reverse if function B started before A. Function C is safe for execution in parallel because its query does not intersect with the queries of functions A and B.

Transactions

Programs expressed in imperative languages are said to have a state. As functions are executed the state is altered. Since functions are asynchronous and executed in parallel, they can interfere with one another and lead to invalid results and an invalid state of a program. Therefore, in highly parallel computing, some function executions must be isolated.

According to ATM, functions query a program state instead of using global/static variables. The memory queries naturally represent a snapshot of a program state. Any changes to the snapshot are not visible to other parts of a program until the snapshot owning function is complete and the changes are committed by ATM. If the normal flow of the execution is interrupted, the changes are discarded by ATM. Transactions are aligned from the memory queries to the end of their enclosing functions.

An implementation of ATM can employ Multiversion Concurrency Control (MVCC) or any other suitable concurrency control method to isolate read-only transactions from read-write transactions. Read-write transactions that intend to modify the same data are scheduled sequentially by ATM. For that reason, ATM distinguishes the read-write memory queries from read-only ones. Using a language-independent notation and for the illustration purpose only, the following memory query illustrates a read-only memory query:  

auto q = select Foo where X = 5

, and the following memory query illustrates a read-write memory query:  

auto q = yield Foo where X = 5

Scheduling

Since all functions, according to ATM, are asynchronous, they must be scheduled for its execution. The scheduling comprises two phases. In the first phase a function is simply added to a queue of pending functions. After a function is extracted from the queue, it is executed until its memory query. In the second phase, a memory query is passed to the memory manager. The memory manager forms queues of pending functions which have memory queries with intersected side-effects. The second phase is optional for the read-only memory queries.

Conclusion

The object of Automatic Transactional Memory (ATM) is to define a method for writing automatically parallel programs. ATM makes the following contributions. First, it requires all functions to be asynchronous. Second, it requires functions to use the memory queries to access shared/distributed memory. Third, it schedules asynchronous function executions with regard to each other based on intersections of their memory queries and intentions of the memory queries for read-only or read-write access.

The main advantages of ATM are portability and ease of use. The advantages occur due to the declarative way of memory access via queries. Such high-level abstraction allows delegating efficient parallel execution of a program to compilers and runtime libraries which implement ATM.