Ada 202X Lightweight Parallelism and OpenMP
Tucker Taft
Ada Rap Group October 2019
based on earlier presentations from
Ada-Europe 2014 update,
Ada-Europe 2019 “20-20 view”
and others
Graphics by Raphaël and Tuck
Outline
Ada 202X High-Level Story
Adding Support for Parallel Programming
Concurrent programming
Multiple workers
Multiple computations
May need to synchronize across workers
Parallel programming
Many workers
One large computation
Synchronization only for work split/join
Ada 202X Parallel Programming Goals
A reminder why this is important…
The Right Turn in Single-Processor Performance (14 years ago)
Courtesy IEEE Computer, January 2011, page 33.
Safe Parallel Ada
Parallel Loops (202X)
parallel (2*Num_CPUs) -- specify max level of parallelism
for I in 1 .. 1_000 loop
   A(I) := B(I) + C(I);
end loop;

parallel for Elem of Arr loop
   Elem := Elem * 2;
end loop;
Parallel Block (202X)
parallel do
   handled_sequence_of_statements
{and
   handled_sequence_of_statements}
end do;
From the Ada 202X draft manual:
Each handled_sequence_of_statements represents a separate logical thread of control that proceeds independently and concurrently. The parallel_block_statement is complete once every one of the handled_sequence_of_statements has completed, either by reaching the end of its execution, or due to a transfer of control out of the construct by one of the handled_sequence_of_statements (see 5.1).
Map/Reduce Iterators (202X)
-- A reduction expression to calculate the sum of elements of an array
Result : Integer := [for Element of Arr => Element]'Reduce("+", 0);

-- A reduction expression to create an unbounded string
-- containing the alphabet
Alphabet : Unbounded_String
  := [for Letter in 'A' .. 'Z' => Letter]'Reduce("&", Null_Unbounded_String, "&");

-- A reduction expression to determine how many
-- people in a database are 30-something
ThirtySomethings : constant Natural
  := [for P of Personnel => (if Age(P) > 30 then 1 else 0)]'Reduce("+", 0);
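The two-argument form of 'Reduce above can be tried out in a small, complete program. This is only a sketch: Int_Array, Arr, and the values in it are made up for illustration, and it assumes a compiler with an Ada 2022 mode (for example GNAT with the -gnat2022 switch), since both the square-bracket aggregate and the reduction expression need it.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Reduce_Sketch is
   --  Hypothetical array type and data, not taken from the slides
   type Int_Array is array (Positive range <>) of Integer;

   Arr : constant Int_Array := [3, 1, 4, 1, 5];

   --  "Map" is the identity here; "reduce" is "+" starting from 0
   Sum : constant Integer :=
     [for Element of Arr => Element]'Reduce ("+", 0);

   --  "Map" squares each element before the "+" reduction
   Sum_Of_Squares : constant Integer :=
     [for Element of Arr => Element ** 2]'Reduce ("+", 0);
begin
   --  Checked only when assertions are enabled (for example -gnata)
   pragma Assert (Sum = 14);
   pragma Assert (Sum_Of_Squares = 52);
   Put_Line ("Sum =" & Integer'Image (Sum));
   Put_Line ("Sum of squares =" & Integer'Image (Sum_Of_Squares));
end Reduce_Sketch;
```

The bracketed iterator is the map step and 'Reduce is the reduce step, so no temporary array is needed for the squared values.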
Global contracts from SPARK (202X), used for data race detection
Global => in out all -- default for non-pure pkgs
Global => null -- default for pure packages
-- Explicitly identified globals with modes
Global => (in P1.A, P2.B,
           in out P1.C,
           out P1.D, P2.E)
-- Pkg data, access collection, task/protected/atomic
Global => in out private of P3 -- pkg P3 data
Global => synchronized in out all -- prot/atomic
Nonblocking contract, used for deadlock detection
-- apply to one subprogram
procedure Suspend_Until_True
  (S : in out Suspension_Object)
   with Nonblocking => False;
-- apply to an entire package
package Ada.Characters.Handling
   with Nonblocking => True is …
Ada 202X Syntactic Building Blocks for Parallelism
Ada 202X Building Blocks -- Iterators
Ada 202X Building Blocks -- Filters
when Name(Name'First) /= '_' loop
   Put_Line (Name & " => " & Value);
end loop;
Ada 202X Building Blocks -- “parallel”
for I in Arr'Range loop
   . . . -- possibly do other stuff
   Partial_Sum(Chunk) := @ + Arr(I); -- accumulator for each chunk
end loop;
return Partial_Sum'Reduce("+", 0.0); -- final reduction
Ada 202X uses “building block” approach
“parallel”
with chunks
Iterators
Filters
Quantified expression
Reduction expression
Aggregate
Loop body
Mapping Ada 202X to OpenMP & friends
Possible layering of Ada 202X light-weight parallelism support
Example to illustrate layering
with System.LWT.OpenMP; use System.LWT;
procedure Main is
   Control : OpenMP.Scheduler_Control :=
     OpenMP.Options (Option1 => …, Option2 => … );
   Arr : Flt_Array (1 .. 1_000_000) := …;
   Partial_Sums : Flt_Array (1 .. OpenMP.Num_Core * 2) :=
     (others => 0.0);
begin
   parallel (Chunk in Partial_Sums'Range)
   for I in Arr'Range loop
      Partial_Sums (Chunk) := @ + Arr (I) ** 2;
   end loop;
   Put_Line ("Total SoS = " & Partial_Sums'Reduce("+", 0.0)'Image);
end Main;
with System.LWT.OpenMP; use System.LWT;
with Ada.Parallelism;
procedure Main is
   Control : OpenMP.Scheduler :=
     OpenMP.Options (Option1 => …, Option2 => … );
   Arr : Flt_Array (1 .. 1_000_000) := …;
   Partial_Sums : Flt_Array (1 .. OpenMP.Num_Core * 2) :=
     (others => 0.0);
begin
   pragma Par_Loop (Partial_Sums'Length);
   for I in Arr'Range loop
      Partial_Sums (Ada.Parallelism.Chunk_Index) := @ + Arr (I) ** 2;
   end loop;
   Put_Line ("Total SoS = " & Partial_Sums'Reduce("+", 0.0)'Image);
end Main;
2) Translate to calls
with System.LWT.OpenMP; use System.LWT;
with Ada.Parallelism; use Ada.Parallelism;
procedure Main is
   …
   procedure Loop_Body
     (Low, High : Longest_Integer; Chunk_Index : Positive) is
   begin
      for I in Integer'Val (Low) .. Integer'Val (High) loop
         Partial_Sums (Chunk_Index) := @ + Arr (I) ** 2;
      end loop;
   end Loop_Body;
begin
   Par_Range_Loop (Integer'Pos (Arr'First), Integer'Pos (Arr'Last),
      Num_Chunks => Partial_Sums'Length, Loop_Body => Loop_Body'Access);
   Put_Line (...);
end Main;
3) Ada.Parallelism calls System.LWT
procedure Par_Range_Loop
  (Low, High : Longest_Integer; Num_Chunks : Positive;
   Loop_Body : access procedure
     (Low, High : Longest_Integer; Chunk_Index : Positive)) is
   Master : LWT.Thread_Master;
   ...
begin -- Par_Range_Loop
   -- Spawn first thread for first chunk
   LWT.Spawn_Thread (Master, new Thread_Data_Extension'
     (LWT.Root_Data with
      Low => Low,
      High => Low + (High-Low+1) / Num_Chunks - 1,
      Chunk_Index => 1));
   -- Wait for all chunks to complete
   LWT.Wait_For_Master (Master);
end Par_Range_Loop;
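The slide shows only the bounds of the first chunk. As a rough illustration of how the whole range might be split, here is a sequential stand-in that calls the loop body once per chunk instead of spawning light-weight threads. Seq_Range_Loop, the plain Integer bounds, the small test data, and the rule that the first few chunks absorb the remainder are all assumptions made for this sketch, not part of the proposed Ada.Parallelism package.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Par_Range_Loop_Sketch is
   type Flt_Array is array (Positive range <>) of Float;

   --  Small hypothetical data set: ten elements, four chunks
   Arr          : constant Flt_Array (1 .. 10) := (others => 2.0);
   Partial_Sums : Flt_Array (1 .. 4) := (others => 0.0);

   --  Same shape as the Loop_Body on the "Translate to calls" slide
   procedure Loop_Body (Low, High : Integer; Chunk_Index : Positive) is
   begin
      for I in Low .. High loop
         Partial_Sums (Chunk_Index) :=
           Partial_Sums (Chunk_Index) + Arr (I) ** 2;
      end loop;
   end Loop_Body;

   --  Sequential stand-in for Par_Range_Loop (hypothetical name)
   procedure Seq_Range_Loop
     (Low, High  : Integer;
      Num_Chunks : Positive;
      Loop_Body  : access procedure
        (Low, High : Integer; Chunk_Index : Positive))
   is
      Length : constant Natural := High - Low + 1;
      Base   : constant Natural := Length / Num_Chunks;
      Extra  : constant Natural := Length rem Num_Chunks;
      First  : Integer := Low;
      Last   : Integer;
   begin
      for Chunk in 1 .. Num_Chunks loop
         --  The first Extra chunks each take one additional iteration
         Last := First + Base - 1 + (if Chunk <= Extra then 1 else 0);
         Loop_Body (First, Last, Chunk);  --  a real version would spawn here
         First := Last + 1;
      end loop;
   end Seq_Range_Loop;

   Total : Float := 0.0;
begin
   Seq_Range_Loop
     (Arr'First, Arr'Last, Partial_Sums'Length, Loop_Body'Access);

   for S of Partial_Sums loop
      Total := Total + S;  --  final reduction over the per-chunk sums
   end loop;

   --  Ten elements of 2.0 squared; checked only with assertions enabled
   pragma Assert (Total = 40.0);
   Put_Line ("Total SoS =" & Float'Image (Total));
end Par_Range_Loop_Sketch;
```

With ten iterations and four chunks this gives chunk sizes 3, 3, 2, 2, so every index is visited exactly once and no two chunks write to the same Partial_Sums element.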
4) System.LWT dispatches to Scheduler
package System.LWT is
   type Thread_Scheduler is abstract tagged limited null record;
   …
private
   procedure Spawn_Thread (S : in out Thread_Scheduler;
      Master : in out Thread_Master; Data : Thread_Data_Ptr) is abstract;
   procedure Wait_For_Threads (S : in out Thread_Scheduler;
      Master : in out Thread_Master) is abstract;
   …
end System.LWT;
package body System.LWT is
   Scheduler : access Thread_Scheduler'Class := …; -- Installed thread scheduler
   procedure Spawn_Thread
     (Master : in out Thread_Master; Data : Thread_Data_Ptr) is
   begin
      Scheduler.Spawn_Thread (Master, Data); -- dispatch to installed scheduler
   end Spawn_Thread;
   ...
end System.LWT;
5) System.LWT.OpenMP handles calls
package System.LWT.OpenMP is
   type Scheduler is new LWT.Thread_Scheduler with record
      Option1 : …
      Option2 : …
   end record;
private
   overriding procedure Spawn_Thread
     (S : in out Scheduler; Master : in out Thread_Master; Data : Thread_Data_Ptr);
   overriding procedure Wait_For_Threads
     (S : in out Scheduler; Master : in out Thread_Master);
   …
end System.LWT.OpenMP;
package body System.LWT.OpenMP is
   … -- Implementation of Spawn_Thread, Wait_For_Threads, etc. in terms of OpenMP
end System.LWT.OpenMP;
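Steps 4 and 5 rely on ordinary Ada dispatching: the generic layer holds a class-wide pointer to whichever scheduler is installed, and each plug-in overrides the abstract operations. The stand-alone sketch below shows only that pattern. The local packages LWT and Sequential, the Install procedure, the Spawned counter, and the simplified Chunk_Index parameter are all hypothetical; nothing here calls OpenMP or the real System.LWT.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Scheduler_Dispatch_Sketch is

   --  Simplified stand-in for System.LWT
   package LWT is
      type Thread_Scheduler is abstract tagged limited null record;
      procedure Spawn_Thread
        (S : in out Thread_Scheduler; Chunk_Index : Positive) is abstract;

      type Scheduler_Ptr is access all Thread_Scheduler'Class;

      procedure Install (S : Scheduler_Ptr);           --  hypothetical
      procedure Spawn_Thread (Chunk_Index : Positive); --  scheduler-neutral
   end LWT;

   package body LWT is
      Installed : Scheduler_Ptr;  --  currently installed scheduler

      procedure Install (S : Scheduler_Ptr) is
      begin
         Installed := S;
      end Install;

      procedure Spawn_Thread (Chunk_Index : Positive) is
      begin
         --  Dispatches on the tag of the installed scheduler
         Installed.Spawn_Thread (Chunk_Index);
      end Spawn_Thread;
   end LWT;

   --  Stand-in for a plug-in such as System.LWT.OpenMP; it simply
   --  runs each "thread" immediately and counts the calls
   package Sequential is
      type Scheduler is new LWT.Thread_Scheduler with record
         Spawned : Natural := 0;
      end record;
      overriding procedure Spawn_Thread
        (S : in out Scheduler; Chunk_Index : Positive);
   end Sequential;

   package body Sequential is
      overriding procedure Spawn_Thread
        (S : in out Scheduler; Chunk_Index : Positive) is
      begin
         S.Spawned := S.Spawned + 1;
         Put_Line ("Sequential scheduler runs chunk"
                   & Positive'Image (Chunk_Index));
      end Spawn_Thread;
   end Sequential;

   Sched : aliased Sequential.Scheduler;
begin
   LWT.Install (Sched'Access);
   for Chunk_Index in 1 .. 3 loop
      LWT.Spawn_Thread (Chunk_Index);  --  caller never names the plug-in
   end loop;

   --  Checked only when assertions are enabled
   pragma Assert (Sched.Spawned = 3);
end Scheduler_Dispatch_Sketch;
```

Swapping in a different scheduler would mean installing another extension of Thread_Scheduler; the code that calls LWT.Spawn_Thread stays unchanged, which is the point of the layering.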
Proposed Ada 202X Conflict Checking