Computational Astrophysics
https://computationalastrouam.github.io/aco/
Practice information
01.16.LD.304 (AULAS DE PCs)
Get to know each other
Me: https://weiguangcui.github.io/
Your knowledge about computation language and high-performance computation
What do you like to learn?
The flexibility of the practice course
Date | Contents | Home work | Comments |
27/03/2025 | Setup, Basic programming, intro Mandelbrot | | Have a look and think about the final project |
28/03/2025 | Pointer, dynamic allocated array Intro Kepler | Mandelbrot | IO |
04/04/2025 | structure Handout Mandelbrot | | Project discussion |
11/04/2025 | Useful things: sorting Intro SPH | Kepler | More on openmp? |
25/04/2025 | Useful things: gdb Handout Kepler | | reading/writing? |
09/05/2025 | Perf Handout SPH | SPH | |
23/05/2025 | MPI | | |
30/05/2025 | Final presentation |
Linux system + C++ programming
Confirm the Linux training from Violeta.
Why Linux? It is for efficient work, especially programming and computation.
g++ – compiler
man – helper; man 3 for functions
Editors? nano/vim in terminal; IDE: VS, jetbrain->CLion, Pulsar, …
Linux Basic commands: https://weiguangcui.github.io/Computational_Astrophysics/static_files/linux_info.pdf
Practices:
https://computationalastrouam.github.io/aco/CodingTutorialC++/
A simple c++ program
#include <iostream>
#include <cmath>
#include <cstdint>
#include <cstddef>
#include <cstdlib> // Do we need all of these??
#include <cstring> //!! How about local header file??
int main() { //!! int argc and char ** argv[]
double *a;
int i;
std::cout << "Hello World" << std::endl;
std::cout << "Enter an integer: ";
std::cin >> i; //!! Keyboard input
return i; //!! where does it go?
}
Exec 1: try to write your own program to use argc and argv to give your program a parameter
https://computationalastrouam.github.io/aco/CodingTutorial/HelloWorld.c
More basics things on C++ language
Git version control
Confirm the learning from Violeta
Recommended website: https://git-scm.com/docs/gittutorial
Show the simple commands to use git.
Makefile – a better way to compile program
Why Makefile?
Compiling the source code files can be tiring, especially when you have to include several source files and type the compiling command every time you need to compile. Makefiles are the solution to simplify this task. Makefiles are special format files that help build and manage the projects automatically. For example:
gcc main.c hello.c factorial.c -o hello
Macros
similar to variables. Macros are defined in a Makefile as = pairs
MACROS = -me; PSROFF = groff -Tps; DITROFF = groff -Tdvi
CFLAGS = -O -systype bsd43; LIBS = "-lncurses -lm -lsdl"
Use "make -p" to see the default macros. But you can redefine the default macros at the command line.
Syntax and Dependencies
main.o: main.cpp functions.h
$(CC) -c main.cpp
Show the example and exercise 2: write your Makefile to compile the hello.c program
https://www.tutorialspoint.com/makefile/index.htm
https://www.cs.swarthmore.edu/~newhall/unixhelp/howto_makefiles.html
OpenMP: https://www.openmp.org
OpenMP concepts
OpenMP syntaxes and clauses – shared memory
omp_set_num_threads(omp_get_num_procs());
#pragma omp parallel
omp_get_num_threads();
ID = omp_get_thread_num();
how to use OpenMP to make this calculation faster:
1+2+3+....+10^5
How to compile the code and use it
gcc -fopenmp foo.c
export OMP_NUM_THREADS=4
./your_program
More resources: https://www.openmp.org/resources/tutorials-articles/
Clarifications
omp_get_num_procs() – always return the total number of cores of CPU in your machine, this is not controlled by OMP_NUM_THREADS environment
omp_get_max_threads() – gives the value that sets by OMP_NUM_THREADS environment
omp_set_num_threads(int ) – will reset the number of threads to the value passed to it and this value is going to be used in your program, it will ignore the OMP_NUM_THREADS environment.
—-------------------------------------
During the class, I used the `clock()` function in time.h. This returns CPU time*1! Therefore, the correct time is time = 1000.*(end - begin)/CLOCKS_PER_SEC/n_threads; Or you can use “sys/time.h” with the gettimeofday() function: see https://stackoverflow.com/a/361492; or simply time your_exec_file in terminal.
OpenMP: Exec 3: calculate PI with OpenMP up to k=10^6
the Gregory-Leibniz Series:
Another series which converges more quickly is the Nilakantha Series:
For lazy people: https://computationalastrouam.github.io/aco/CodingTutorial/pi.c
Mandelbrot
https://computationalastrouam.github.io/aco//static_files/Practices/mandelbrot.pdf
After you finished your code, also try to use different number of OpenMP threads to make a scaling test (confirm the results first), i.e., does the walltime of running the code goes down with the number of threads linearly?
Practice 2
Memory usage in C++ – overflow problems
Stack Memory: This is the memory where local variables get stored. It's automatically managed by the system: when a function is called, its variables get memory allocated on the stack, and when the function exits, that memory is automatically deallocated. The limit of the stack memory is determined by the operating system and the specific configuration of the system. Use `ulimit -s` in Linux to check its size. Stack overflow error, if a program exceeds the stack memory limit: int array[10];
Heap Memory: This is the memory that you can manually control. You can allocate memory on the heap using the new keyword, and it's your responsibility to deallocate it when you're done using it, with the delete keyword. The heap is larger than the stack, so it's used for large amounts of data, or for data that needs to persist beyond the lifetime of a single function call.
Overflow problems – see the example in practice: https://computationalastrouam.github.io/aco/CodingTutorialC++/
Pointer
If you have a variable `var` in your program, `&var` will give you its address in the memory.
#include <iostream>
int main()
{
int var = 5;
std::cout << "var: " << var << std::endl;
// Notice the use of & before var
std::cout << "address of var: " << &var << std::endl;
return 0;
}
Try to run the code multiple times, do you get the same address??
Pointers (pointer variables) are special variables that are used to store addresses rather than values.
int* pc; // does the position of asterisk (dereference operator) matter??
pc = &var;
std::cout << *pc; // what do you get here?
Pointer
Changing values:
#include <iostream>
int main()
{
int* pc; int c;
c = 22;
std::cout << "Address of c: " << &c << std::endl;
std::cout << "Value of c: " << c << std::endl << std::endl;
pc = &c;
std::cout << "Address of pointer pc: " << pc << std::endl;
std::cout << "Content of pointer pc: " << *pc << std::endl << std::endl;
c = 11;
std::cout << "Address of pointer pc: " << pc << std::endl;
std::cout << "Content of pointer pc: " << *pc << std::endl << std::endl;
*pc = 2;
std::cout << "Address of c: " << &c << std::endl;
std::cout << "Value of c: " << c << std::endl << std::endl;
pc = &c; // Do this two lines give the same result?
*pc = c;
}
// Can you make the pointer to point to the first address of your memory??
Pointer and array
&x[i] is equivalent to x+i and x[i] is equivalent to *(x+i). Try to see if double array is the same as integer array.
#include <iostream>
#include <iomanip>
int main() {
int x[4] = {1, 2, 3, 4};
for(int i = 0; i < 4; ++i) {
std::cout << "&x[" << i << "] = " << std::hex << std::showbase << &x[i] << ", " << (x+i) << std::endl;
std::cout << "x[" << i << "] = " << std::dec << x[i] << ", " << *(x+i) << std::endl;
}
return 0;
}
Pointer and function
In C/C++ programming, it is also possible to pass addresses as arguments to functions. It has an interesting effect which should be paid attention to.
See the example here: https://computationalastrouam.github.io/aco/CodingTutorialC++/pointer-update-parameter.cpp
Pointer and dynamic memory allocation
Why DMA?
An array is a collection of a fixed number of values. Once the size of an array is declared, you cannot change it. Declare a very large size array is not feasible! Array is saved in stack memory, which is only 8MB on a typical modern Linux.
To allocate memory dynamically, library functions are malloc(), calloc(), realloc() and free() are used. These functions are defined in the <stdlib.h> header file.
The malloc() function allocates memory and leaves the memory uninitialized,
whereas the calloc() function allocates memory and initializes all bits to zero. Use realloc() to realloc the array with a different size.
In C++ is very simple, using the new
See the example here: https://computationalastrouam.github.io/aco/CodingTutorialC++/pointer-dam.cpp
Pointer and dynamic memory allocation II
3D arrays:
See the example here: https://computationalastrouam.github.io/aco/CodingTutorialC++/pointer-dam3D.cpp
Try to modify the code to do a 2D array with the first column is the real part and second column as the image part of fourier output.
Some exercises
Calculate Standard Deviation of a random array (use functions from <random>), pay attention to the overflow problems.
Sort a 2D random array. Tip: look sort function in <algorithm>
Segmentation fault!
Example at: https://computationalastrouam.github.io/aco/CodingTutorialC++/segfault_examples.cpp
IO : <fstream>
File Stream Types:
File Open Modes:
File error handling:
Reading Techniques
Writing Techniques
Performance Tips
IO
It is very easy to write the array data into your hard drive in either text (easy to read, hard for multiple arrays) or binary format (fast to read and write with multiple data, but need to *know the data structure first* if it is a complex output data).
Example is at https://computationalastrouam.github.io/aco/CodingTutorialC++/IO.cpp
How can you read the binary file without knowing its array size? Tip: Use tellg() to get the position of the file pointer, which gives you the size of the file in bytes.
Exercise: (1) adding error handling to the example code; (2) writing multi-format data, including string, int, float, longlong and double formatted data, to one file in both ASCII and binary format; (3) try to read your data from the ASCII and binary. (4) compare the data reading and writing speed.
Practice 3
Handout Mandelbrot or Practices?
The suggested structure for a program project
Function
Lambda function
Lambda functions, also known as lambda expressions, are a feature in C++ that allow you to define a function directly in the place where you're going to use it. They are most commonly used when passing a simple function as an argument to another function.
A lambda function in C++ has the following syntax:
[capture](parameters) -> return_type { function_body }
int factor = 5;
auto multiply = [factor](int n) { return n * factor; };
std::cout << multiply(5); // Outputs: 25
Structure
A struct (or structure) is a collection of variables (can be of different types) under a single name.
struct Student {
std::string name;
int age;
float gpa;
}; // can you put (multiple) variable names here?
int main() {
Student s1; // define Student Variable; can you define an array/dynamically allocated memory array? here?
s1.name = "John Doe";
s1.age = 20;
s1.gpa = 3.5;
std::cout << "Name: " << s1.name << ", Age: " << s1.age << ", GPA: " << s1.gpa << std::endl;
// can struct being removed?? If so, how?
return 0;
}
Structs and Pointers
struct person
{
int age;
float weight;
};
int main()
{
struct person *personPtr, person1;
personPtr = &person1;
Person1.age = 10;
Person1.weight = 30;
// How do you assign/get the values with pointer? Can they (. and ->) be switched?
}
// Change the struct to union and try to cout multi member values.
Structs and Pointers
Dynamic memory allocation of structs and What is the size of structure?
struct Person
{
int age; // 4 bytes
float weight; // 4 bytes
};
int main()
{
Person *personPtr, person1;
std::cout << "Size of my structure = " << sizeof(Person) << std::endl; //Person or person1? Does it have the same size??
int n = 1000;
personPtr = new Person[n]; // try to do sizeof personPtr before and after the dynamical allocation
delete[] personPtr; // Don't forget to delete the allocated memory
return 0;
}
Structs and Functions
Structure with a Function:
struct Rectangle {
int length;
int width;
int area() {
return length * width;
}
};
int main() {
Rectangle rect;
rect.length = 5;
rect.width = 10;
std::cout << "Area of rectangle: " << rect.area() << std::endl;
return 0;
}
Structs and Functions
How to pass/return the structures to functions?? The same as other variables
struct person
{
int age; // 4 bytes
float weight; // 4 bytes
};
struct person your_func(struct person mystrct);
You can also pass the result through pointer.
Nested Structures:
struct Date {
int day;
int month;
int year;
};
struct Employee {
std::string name;
Date dateOfJoining;
};
int main() {
Employee emp;
emp.name = "John Doe";
emp.dateOfJoining.day = 1;
emp.dateOfJoining.month = 1;
emp.dateOfJoining.year = 2020;
std::cout << "Name: " << emp.name << ", Date of Joining: " << emp.dateOfJoining.day << "/" << emp.dateOfJoining.month << "/" << emp.dateOfJoining.year << std::endl;
return 0;
}
The advantage of structure
Example can be found at https://computationalastrouam.github.io/aco/CodingTutorialC++/structure.cpp
typedef
The typedef keyword is used to create an alias name for data types.
typedef int myint;
myint s=5;
// create struct with person1 variable
typedef struct Person {
char name[50];
int SNo;
float height;
} person; // What the difference between person and person1?
Or later with
typedef struct Person mystruct;
Class
class Student {
public:
std::string name;
int age;
float gpa;
};
int main() {
Student s1;
s1.name = "John Doe";
s1.age = 20;
s1.gpa = 3.5;
std::cout << "Name: " << s1.name << ", Age: " << s1.age << ", GPA: " << s1.gpa << std::endl;
return 0;
}
https://computationalastrouam.github.io/aco/CodingTutorialC++/class.cpp
Class: access specifiers
Access specifiers in C++ are used to define how the members (functions and variables) of a class can be accessed. There are three types of access specifiers:
class MyClass {
public:
int publicVar; // Can be accessed anywhere
private:
int privateVar; // Can only be accessed within MyClass
protected:
int protectedVar; // Can be accessed within MyClass and subclasses of MyClass
};
class mysubc : public MyClass { // mysubc is a subclass of MyClass, try to use private access instead
// mysubc inherits all public and protected members of MyClass
};
Class with Methods:
class Rectangle {
private:
int length;
int width;
public:
void setDimensions(int l, int w) {
length = l;
width = w;
}
int area() {
return length * width;
}
};
int main() {
Rectangle rect;
rect.setDimensions(5, 10);
std::cout << "Area of rectangle: " << rect.area() << std::endl;
return 0;
}
Mandelbrot
1. Openmp problems
only “#pragma omp parallel for” or no parallel at all.
http://www.inf.ufsc.br/~bosco.sobral/ensino/ine5645/OpenMP_Dynamic_Scheduling.pdf
2. Doesn’t bother to use multiple files… <see this suggestion>
I assume you all know .c++/.cpp and .h files and the project structure
.c++ files contain the implementation of the code, while .h files exist to provide interfaces that allow a file to access functions, global variables, and macros from other files. It is not forbidden to have your program in .h file, but have a look at this:
https://stackoverflow.com/questions/49017408/programming-with-just-header-files
3. Usage of main() function parameters
4. It is better to have your plot script included.
5. Return the number of iterations of the convergence test will allow you to plot the pic with color
6. Scaling test:
tstart = time(NULL);
// Some operation...
tend = time(NULL);
fprintf(stderr,"the operation took %lf sec. (wall-clock time)\n",difftime(tend,tstart));
//use std::chrono::high_resolution_clock::now() from <chrono> library for higher accuracy
Project discussion if time
Practice 4
Practice 4
Array, std:array, DAM array and vector
Feature | Array | std::array #include <array> | Dynamically Allocated Array | Vector #include <vector> |
Size | Fixed at compile time | Fixed at compile time | Determined at runtime | Dynamic, can be changed at runtime |
Memory Allocation | Automatic, out of scope delete | Automatic, out of scope delete | Manual, using new and delete[] | Automatic, managed by the vector, clear + shrink_to_fit |
Memory Location | Contiguous block of memory | Contiguous block of memory | Contiguous block of memory | Contiguous block of memory (as of C++11) |
Built-in Functions | No | Yes, such as fill, swap, size, empty, data, begin/end (index), front/back (value) | No | Besides these in array, it also has: push_back, pop_back, insert, erase, shrink_to_fit, rsize, clear, reserve, capacity. |
Memory Management | No overhead1 | Automatic memory management | Requires manual management | Automatic memory management |
Resizable | No | No | No, requires manual resizing | Yes, automatically resizes when capacity is exceeded |
Access | Direct access with [] | Direct access with [] or .at() | Direct access with [] | Direct access with [] or .at() |
Insertion/Deletion at the End | Not possible | Not possible | Not possible | Constant time with push_back and pop_back |
Insertion/Deletion at the Middle | Not possible | Not possible | Not possible | Linear time with insert and erase |
Details of built-in functions – try yourself
Practice differences between them
See the arrays.cpp file at :
https://computationalastrouam.github.io/aco/CodingTutorialC++/different_arrays.cpp
And the vector.cpp file for the usage of built-in function of the vector:
https://computationalastrouam.github.io/aco/CodingTutorialC++/vector.cpp
Exercise: try to use pointer to print out the elements of these different arrays.
Try to compare the speed of the code using the 4 different ways to create an array of 1^5 elements.
Try to play with the different vector built-in functions.
Sorting in C++ – std::sort
Defined in header <algorithm> : https://en.cppreference.com/w/cpp/algorithm/sort
Sorts the elements in the range [first, last) in non-descending order. The order of equal elements is not guaranteed to be preserved.
first, last – the range of elements to sort
std::sort(s.begin(), s.end());
Exercise: try to write this random list [3, 7, 1, 9, 2, 8, 4, 6, 7, 5] into array, std::array, DAM array and vector, sort them in 4 different functions. Test the speed of the 4 runs, are they the same? Try to play with it, for example, only sort part of the array.
Sorting in C++ – std::sort
void sort( RandomIt first, RandomIt last, Compare comp );
Comp - comparison function object (i.e. an object that satisfies the requirements of Compare) which returns true if the first argument is less than (i.e. is ordered before) the second.
The signature of the comparison function should be equivalent to the following:
bool cmp(const Type1& a, const Type2& b);
While the signature does not need to have const&, the function must not modify the objects passed to it and must be able to accept all values of type (possibly const) Type1 and Type2 regardless of value category (thus, Type1& is not allowed, nor is Type1 unless for Type1 a move is equivalent to a copy).
The types Type1 and Type2 must be such that an object of type RandomIt can be dereferenced and then implicitly converted to both of them.
int numbers[]={....};
std::sort (numbers, numbers+5, std::greater<int>()); //#include <functional> for std::greater
Try write your own comp, e.g.
bool mygreater (const T& x, const T& y) {return x>y;}
Exercise: try to sort the previous array in reversed trend, i.e. from bigger to smaller; try to write your own comp function to compare the results between >= and >. Can you use sort to sort two arrays? I.e. instead of compare the sorting array, the comp function compares other arrays.
Sorting in C++ – std::sort with index
How to get the index of the sorted array??
[3, 7, 1, 9, 2, 8, 4, 6, 7, 5]
[0,1,2,3,4,5,6,7,8,9] after sorting -> [2, 4, 0, 6, 9, 7, 1, 8, 5, 3]
Look at the example at :
https://computationalastrouam.github.io/aco/CodingTutorialC++/Sort_index.cpp
Sorting – indexx
Think about your method to sort a simple random array for example, [4,2,6,7,1,5]
Download a random integer array (binary format) here: https://computationalastrouam.github.io/aco/CodingTutorial/
And try your method.
We will go back to `indexx` after the project discussion
Note the details on how to use the indexx function!! Why -1?
indexx
The algorithm is explained here: http://www.foo.be/docs-free/Numerical_Recipe_In_C/c8-4.pdf
Kepler
https://computationalastrouam.github.io/aco//static_files/Practices/kepler.pdf
Use these physical setups:
#define Grav 6.6726E-11 // m^3/kg/s^2
#define Msun 1.989E30 // kg
#define Mearth 5.973E24 // kg
#define Dse 1.496E11 // m
Initial condition in position and velocity
Pos: [Dse, 0]; Vel: [0, sqrt(Grav*Msun/Dse)]. You can change this IC later
Search online for the leapfrog and Runge Kutta 4th order integrations.
Think about how to use OPENMP to parallel your code and how to make your code to compile only once but run with different setups.
Kepler problem
Don’t pass unused parameters to your functions.
Don’t let your function calculate something that never been used.
Global parameters should be only defined once in a header file.
Duplicated functions:
double dvxdt(double x, double y) { // Calculate the acceleration in the x-direction for Earth
double r = sqrt(x * x + y * y); // Calculate distance from the Sun
return -Grav * Msun * x / (r * r * r); // Return acceleration in x-direction
}
double dvydt(double x, double y) { // Calculate the acceleration in the y-direction for Earth
double r = sqrt(x * x + y * y); // Calculate distance from the Sun
return -Grav * Msun * y / (r * r * r); // Return acceleration in y-direction
}
Project discussion
One -to -one
Reminder, the score of the final presentation is largely depending on your effort to the project.
Kepler extension suggestion: (1) 2D “solar” system, parallel code, efficiency checks
(2) 3D “solar” system, serial code, stability checks
Practice 5 – performance of your program
Kepler – continue
Use argc and argv to give your code flexibility of change the number of integrations. With that, try to do a integration time step length of 1 day.
Kepler – parameter file
How to use argc/v to read in values in a parameter file?�https://computationalastrouam.github.io/aco/practice/
Other third party solutions:
INI parsing ✅ INIReader, inih: https://github.com/benhoyt/inih/tree/master
JSON parsing ✅ nlohmann/json: https://github.com/nlohmann/json
TOML parsing ✅ toml++ : https://marzer.github.io/tomlplusplus/
YAML parsing ✅ yaml-cpp : https://github.com/jbeder/yaml-cpp
Put these initial values (G, Msun etc.) into a parameter file and read them in.
Perf
Installing:
https://www.swift.org/documentation/server/guides/linux-perf.html
Usage: https://perf.wiki.kernel.org/index.php/Tutorial
You will need sudo if you have this error: Error: Access to performance monitoring and observability operations is limited.
g++ -g -fopenmp openmp-nested-for-loop.cpp -o openmp-nested-for-loop
perf stat ./openmp-nested-for-loop
perf record ./openmp-nested-for-loop
perf annotate; perf report
Another openmp handout
What can be parallelled, and whatnot? Know where to fast your program.
What cannot be parallelized with OpenMP:
OpenMP exercises
Try to implement the omp parallel <sections> in your Kepler code
Practice 6 – debug with GDB
GDB
Reference material: https://www.sourceware.org/gdb/documentation/
The purpose of a debugger such as gdb is to allow you to see what is going on “inside” another program while it executes—or what another program was doing at the moment it crashed. gdb can do four main kinds of things (plus other things in support of these) to help you catch bugs in the act:
GDB installsion
It should be installed along with gcc. If not, go to download the software and compile, install with
`./configure`
`make`
`make install`
in Linux.
GDB cheat sheet
GDB hand-on
See the practices at: https://computationalastrouam.github.io/aco/practice/
Practice 7 – MPI
What is MPI? Message Passing Interface
Why do we need MPI?
One computer is not power enough! + too costly to have many CPUs+memories in one node. Some terminologies:
HPC is built by many nodes. At which level of the HPC computers, can you use MPI/OpenMP?
The difference between OPENMP and MPI
Shared Memory Programming VS Distributed Memory Programming
You will need message passing.
Message Passing
A standard API for message passing communication and process information lookup, registration, grouping and creation of new message datatypes.
● Point to point comms: ([non]blocking, [a]synchronous)
● Collective comms: one-many, many-one, many-many
Code parallelization cannot be incremental
MPI Basics
● mpi.h include files for C/C++: error = MPI_Xxxxx(argument, ...);
● Notice that all routines have to return an error value
● MPI_ namespace reserved for MPI: MPI:: namespace for C++
● In C/C++ (in)out arguments passed as pointers
● Always start with MPI_Init() and end with MPI_Finalize() (or MPI_Abort()). Both calls need to be made by all processes.
Before go to the example, install openmpi in your computer:
sudo apt install libopenmpi-dev
Compile your program in this way: mpic++ mpi_example.cpp -o mpi_example, and the run command as: mpirun -np 4 ./mpi_example, where -np 4 specifies to run the program with 4 processes.
The MPI program
The code can also be downloaded from: https://computationalastrouam.github.io/aco/CodingTutorial/test_mpi.c
#include <mpi.h>
#include <stdio.h>
int main(int argc, char** argv) {
MPI_Init(NULL, NULL); //(Initialization)
// initialize MPI environment Called before any other MPI call. Only MPI_Initialized(int *flag) is allowed to call before it.
int world_size; // number of processes
MPI_Comm_size(MPI_COMM_WORLD, &world_size); // (Communicators)
// A communicator is an ordered set of processes, that remains constant from its creation until its destruction
// All MPI processes form the MPI_COMM_WORLD communicator which defined very beginning and is a handle (predefined constants) defined in the include files.
// MPI_Comm_size gives size of the communicator set – how many threads will be used.
int world_rank; // the rank of the process or the IDs of each thread
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
// Returns the rank of the process in the communicator; Its value is between 0 and (size-1)
char processor_name[MPI_MAX_PROCESSOR_NAME]; // gets the name of the processor/computer name
int name_len;
MPI_Get_processor_name(processor_name, &name_len); // what is value for name_len?
printf("Hello world from processor %s, rank %d out of %d processors\n",
processor_name, world_rank, world_size);
MPI_Finalize(); // finish MPI environment}
//If a process catches an error that cannot be corrected, a user can call MPI_Abort(MPI_Comm comm, int errorcode)
Exercise 1
https://computationalastrouam.github.io/aco/CodingTutorialC++/MPI/main1-HelloWorld.cpp
Compile the main1-HelloWorld.cpp program and run it with different cpus
Modify the code to only use odd number ID cpus to print out “Hello Upper World”, and even ID number CPUs to print out “Hello Lower World”
MPI (Primitive) Data types
Why the MPI has its own data types?
The purpose of the MPI datatypes is to tell the MPI functions what they're working with. So, for example, if you send an int from a little-endian machine to a big-endian one then MPI can do the byte order conversion for you. Another more common benefit is that MPI knows how big an MPI_DOUBLE is, so you don't have to have a bunch of sizeof statements everywhere.
!! the MPI datatypes are tags, not actual datatypes!!
You can also create your own data structure with:
MPI_Type_create_struct - Creates a structured data type.
** tip man 3 MPI_Type_create_struct to see the details
** you probably need to install openmpi-doc to use man
MPI datatype | C datatype |
MPI_CHAR | signed char |
MPI_SHORT | signed short int |
MPI_INT | signed int |
MPI_LONG | signed long int |
MPI_UNSIGNED_CHAR | unsigned char |
MPI_UNSIGNED_SHORT | unsigned short int |
MPI_UNSIGNED | unsigned int |
MPI_UNSIGNED_LONG | unsigned long int |
MPI_FLOAT | float |
MPI_DOUBLE | double |
MPI_LONG_DOUBLE | long double |
MPI_BYTE | 8 bits |
MPI_PACKED | packed sequence of bytes |
MPI_LONG_LONG_INT | long long int |
MPI Primitives: https://www.open-mpi.org/doc/current/
OpenMPI commands (man 1/ man)
OpenMPI general information (man 7)
MPI API (man 3)
Message passing
Communications can be between two processes (point to point [[non]blocking, [a]synchronous]) or between a group of processes (collective [one-many, many-one, many-many])
The purpose is the "exchange" of information, much like the mail system. Thus every message (document) has:
Point to Point Comms
MPI_Send/MPI_Recv(void *buf, int cnt, MPI_Datatype type, int dest/src, int tag, MPI_Comm comm)
● buf is the starting address of the array
● cnt is its length
● type is its MPI datatype
● comm is the communicator context
● int dest/source is the rank of the destination process in comm
● tag is an extra distinguishing number, like a note (MPI_ANY_TAG)
MPI_Ssend (Synchronous send) The sender notifies the receiver; after the matching receive is posted the receiver acks back and the sender sends the message.
MPI_Bsend (Buffered, asynchronous, send) The sender notifies the receiver and the message is either buffered on the sender side or the receiver side according to size until a matching receive forces a network transfer or a local copy respectively.
Depending on the MPI implementation, MPI_Send may behave as either MPI_Ssend or MPI_Bsend.
Exercise 1. P-2-P message passing
https://computationalastrouam.github.io/aco/CodingTutorialC++/MPI/main2-SendReceive.cpp
Try to send different types of variables from world_rank 0 to world_rank 1.
Tip:
if (myrank == 0) {
MPI_Send(..,,,..);
} else {
MPI_Recv(..,,,..);
}
Collective – one-many: Broadcast
MPI_Bcast(void *buf, int cnt, MPI_Datatype type, int root, MPI_Comm comm)
root has to be the same on all procs, can be nonzero
Exercise 2
https://computationalastrouam.github.io/aco/CodingTutorialC++/MPI/main3-Broadcast.cpp
Try to broadcast different variables
Try to see if MPI_Bcast inside rank 0 works or not
Collective – many-one: scatter
int MPI_Scatter(const void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm)
Collective – many-one: Gather
MPI_Gather(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, int root, MPI_Comm comm)
Make sure recvbuf is large enough on root where it matters, elsewhere it is ignored
Exercise 3
https://computationalastrouam.github.io/aco/CodingTutorialC++/MPI/main4-ScatterGather.cpp
Collective – many-one: Reduce
int MPI_Reduce(const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm)
Op: reduce operation (handle): MPI_MAX - Returns the maximum element; MPI_MIN - Returns the minimum element; MPI_SUM - Sums the elements; MPI_PROD - Multiplies all elements; MPI_LAND - Performs a logical and across the elements; MPI_LOR - Performs a logical or across the elements; MPI_BAND - Performs a bitwise and across the bits of the elements; MPI_BOR - Performs a bitwise or across the bits of the elements; MPI_MAXLOC - Returns the maximum value and the rank of the process that owns it; MPI_MINLOC - Returns the minimum value and the rank of the process that owns it.
Exercise 4
https://computationalastrouam.github.io/aco/CodingTutorialC++/MPI/main5-SumReduction.cpp
Try to reduce an array data
Collective – many-many:
MPI_Allgather(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, MPI_Comm comm)
● Make sure recvbuf is large enough on all procs
Collective – many-many:
MPI_Allreduce(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, MPI_Op op, MPI_Comm comm)
● Make sure recvbuf is large enough on all procs
Exercise 4
Write your own Allreduce and Allgather functions to play with it.
Practice 1
Description:
Write an MPI program where each process determines whether its rank is even or odd, and the root process (rank 0) collects all results and prints a summary.
Requirements:
Practice 2
Each process is given a random integer. Write an MPI program that finds the maximum number among all processes using reduction.
Requirements:
Bonus: Also find the rank of the process that had the maximum number using a custom reduction or MPI_Reduce with MPI_MAXLOC.
Practice 3
Description:
Given a 2D matrix (e.g., 8×8 integers), divide the rows among worker processes. Each worker computes the sum of its assigned rows. The master (rank 0) collects the row sums and computes the grand total.
Requirements:
Bonus: Handle cases where the number of rows isn't evenly divisible by the number of processes.
Practice 7 – SPH, project & questions
SPH handout
Try to defined the particle data as global, then you should be able to directly use it in your functions instead of passing it as a parameter. It is not advised to pass very large data array as a function parameter. Will that make your program running faster?
SPH handout
When you program is separated into different files, do you need to have “#include <stdlib.h>” in every file? Even the functions in stdlib.h are used in these files.
File_main file_1.h
#include <stdlib.h> #include <stdlib.h> // is this necessary?
#include <stddef.h>
#include <stdio.h>
#include "file_1.h"
int main(int argc, char **argv)
Projects
Work on you projects