3 of 98

The flexibility of the practice course

Date	Contents	Home work	Comments
27/03/2025	Setup, Basic programming, intro Mandelbrot		Have a look and think about the final project
28/03/2025	Pointer, dynamic allocated array Intro Kepler	Mandelbrot	IO
04/04/2025	structure Handout Mandelbrot		Project discussion
11/04/2025	Useful things: sorting Intro SPH	Kepler	More on openmp?
25/04/2025	Useful things: gdb Handout Kepler		reading/writing?
09/05/2025	Perf Handout SPH	SPH
23/05/2025	MPI
30/05/2025	Final presentation

4 of 98

Linux system + C++ programming

Confirm the Linux training from Violeta.

Why Linux? It is for efficient work, especially programming and computation.

g++ – compiler

man – helper; man 3 for functions

Editors? nano/vim in terminal; IDE: VS, jetbrain->CLion, Pulsar, …

Linux Basic commands: https://weiguangcui.github.io/Computational_Astrophysics/static_files/linux_info.pdf

Practices:

https://computationalastrouam.github.io/aco/CodingTutorialC++/

5 of 98

A simple c++ program

#include <iostream>

#include <cmath>

#include <cstdint>

#include <cstddef>

#include <cstdlib> // Do we need all of these??

#include <cstring> //!! How about local header file??

int main() { //!! int argc and char ** argv[]

double *a;

int i;

std::cout << "Hello World" << std::endl;

std::cout << "Enter an integer: ";

std::cin >> i; //!! Keyboard input

return i; //!! where does it go?

}

Exec 1: try to write your own program to use argc and argv to give your program a parameter

https://computationalastrouam.github.io/aco/CodingTutorial/HelloWorld.c

6 of 98

More basics things on C++ language

C++ – introduction

C++ operators and Syntax

C++ file writing and reading

C++ – data structure and class

C++ – function and pointer

7 of 98

Git version control

Confirm the learning from Violeta

Recommended website: https://git-scm.com/docs/gittutorial

Show the simple commands to use git.

8 of 98

Makefile – a better way to compile program

Why Makefile?

Compiling the source code files can be tiring, especially when you have to include several source files and type the compiling command every time you need to compile. Makefiles are the solution to simplify this task. Makefiles are special format files that help build and manage the projects automatically. For example:

gcc main.c hello.c factorial.c -o hello

Macros

similar to variables. Macros are defined in a Makefile as = pairs

MACROS = -me; PSROFF = groff -Tps; DITROFF = groff -Tdvi

CFLAGS = -O -systype bsd43; LIBS = "-lncurses -lm -lsdl"

Use "make -p" to see the default macros. But you can redefine the default macros at the command line.

Syntax and Dependencies

main.o: main.cpp functions.h

$(CC) -c main.cpp

Show the example and exercise 2: write your Makefile to compile the hello.c program

https://www.tutorialspoint.com/makefile/index.htm

https://www.cs.swarthmore.edu/~newhall/unixhelp/howto_makefiles.html

9 of 98

OpenMP: https://www.openmp.org

OpenMP concepts

OpenMP syntaxes and clauses – shared memory

omp_set_num_threads(omp_get_num_procs());

#pragma omp parallel

omp_get_num_threads();

ID = omp_get_thread_num();

how to use OpenMP to make this calculation faster:

1+2+3+....+10^5

How to compile the code and use it

gcc -fopenmp foo.c

export OMP_NUM_THREADS=4

./your_program

More resources: https://www.openmp.org/resources/tutorials-articles/

10 of 98

Clarifications

omp_get_num_procs() – always return the total number of cores of CPU in your machine, this is not controlled by OMP_NUM_THREADS environment

omp_get_max_threads() – gives the value that sets by OMP_NUM_THREADS environment

omp_set_num_threads(int ) – will reset the number of threads to the value passed to it and this value is going to be used in your program, it will ignore the OMP_NUM_THREADS environment.

—-------------------------------------

During the class, I used the `clock()` function in time.h. This returns CPU time^*1! Therefore, the correct time is time = 1000.*(end - begin)/CLOCKS_PER_SEC/n_threads; Or you can use “sys/time.h” with the gettimeofday() function: see https://stackoverflow.com/a/361492; or simply time your_exec_file in terminal.

On multi-processor machines, a computer program can use two or more CPUs for processing using parallel processing scheduling. In such situations, the notion of total CPU time is used, which is the sum of CPU time consumed by all of the CPUs utilized by the computer program.

11 of 98

OpenMP: Exec 3: calculate PI with OpenMP up to k=10^6

the Gregory-Leibniz Series:

Another series which converges more quickly is the Nilakantha Series:

For lazy people: https://computationalastrouam.github.io/aco/CodingTutorial/pi.c

12 of 98

Mandelbrot

https://computationalastrouam.github.io/aco//static_files/Practices/mandelbrot.pdf

After you finished your code, also try to use different number of OpenMP threads to make a scaling test (confirm the results first), i.e., does the walltime of running the code goes down with the number of threads linearly?

14 of 98

Memory usage in C++ – overflow problems

Stack Memory: This is the memory where local variables get stored. It's automatically managed by the system: when a function is called, its variables get memory allocated on the stack, and when the function exits, that memory is automatically deallocated. The limit of the stack memory is determined by the operating system and the specific configuration of the system. Use `ulimit -s` in Linux to check its size. Stack overflow error, if a program exceeds the stack memory limit: int array[10];

Heap Memory: This is the memory that you can manually control. You can allocate memory on the heap using the new keyword, and it's your responsibility to deallocate it when you're done using it, with the delete keyword. The heap is larger than the stack, so it's used for large amounts of data, or for data that needs to persist beyond the lifetime of a single function call.

Overflow problems – see the example in practice: https://computationalastrouam.github.io/aco/CodingTutorialC++/

15 of 98

Pointer

If you have a variable `var` in your program, `&var` will give you its address in the memory.

#include <iostream>

int main()

{

int var = 5;

std::cout << "var: " << var << std::endl;

// Notice the use of & before var

std::cout << "address of var: " << &var << std::endl;

return 0;

}

Try to run the code multiple times, do you get the same address??

Pointers (pointer variables) are special variables that are used to store addresses rather than values.

int* pc; // does the position of asterisk (dereference operator) matter??

pc = &var;

std::cout << *pc; // what do you get here?

16 of 98

Pointer

Changing values:

#include <iostream>

int main()

{

int* pc; int c;

c = 22;

std::cout << "Address of c: " << &c << std::endl;

std::cout << "Value of c: " << c << std::endl << std::endl;

pc = &c;

std::cout << "Address of pointer pc: " << pc << std::endl;

std::cout << "Content of pointer pc: " << *pc << std::endl << std::endl;

c = 11;

std::cout << "Address of pointer pc: " << pc << std::endl;

std::cout << "Content of pointer pc: " << *pc << std::endl << std::endl;

*pc = 2;

std::cout << "Address of c: " << &c << std::endl;

std::cout << "Value of c: " << c << std::endl << std::endl;

pc = &c; // Do this two lines give the same result?

*pc = c;

}

// Can you make the pointer to point to the first address of your memory??

17 of 98

Pointer and array

&x[i] is equivalent to x+i and x[i] is equivalent to *(x+i). Try to see if double array is the same as integer array.

#include <iostream>

#include <iomanip>

int main() {

int x[4] = {1, 2, 3, 4};

for(int i = 0; i < 4; ++i) {

std::cout << "&x[" << i << "] = " << std::hex << std::showbase << &x[i] << ", " << (x+i) << std::endl;

std::cout << "x[" << i << "] = " << std::dec << x[i] << ", " << *(x+i) << std::endl;

}

return 0;

}

18 of 98

Pointer and function

In C/C++ programming, it is also possible to pass addresses as arguments to functions. It has an interesting effect which should be paid attention to.

See the example here: https://computationalastrouam.github.io/aco/CodingTutorialC++/pointer-update-parameter.cpp

19 of 98

Pointer and dynamic memory allocation

Why DMA?

An array is a collection of a fixed number of values. Once the size of an array is declared, you cannot change it. Declare a very large size array is not feasible! Array is saved in stack memory, which is only 8MB on a typical modern Linux.

To allocate memory dynamically, library functions are malloc(), calloc(), realloc() and free() are used. These functions are defined in the <stdlib.h> header file.

The malloc() function allocates memory and leaves the memory uninitialized,

whereas the calloc() function allocates memory and initializes all bits to zero. Use realloc() to realloc the array with a different size.

In C++ is very simple, using the new

See the example here: https://computationalastrouam.github.io/aco/CodingTutorialC++/pointer-dam.cpp

20 of 98

Pointer and dynamic memory allocation II

3D arrays:

See the example here: https://computationalastrouam.github.io/aco/CodingTutorialC++/pointer-dam3D.cpp

Try to modify the code to do a 2D array with the first column is the real part and second column as the image part of fourier output.

21 of 98

Some exercises

Calculate Standard Deviation of a random array (use functions from <random>), pay attention to the overflow problems.

Sort a 2D random array. Tip: look sort function in <algorithm>

22 of 98

Segmentation fault!

Accessing Memory Out of Bounds
Dereferencing Null Pointers
Double Free or Memory Corruption
Stack Overflow
Invalid Memory Access
Buffer Overflow
Memory Management Issues
Race Conditions in Multithreading
Invalid Function Pointers
System Resource Limits

Example at: https://computationalastrouam.github.io/aco/CodingTutorialC++/segfault_examples.cpp

23 of 98

IO : <fstream>

File Stream Types:

Use std::ifstream for reading
Use std::ofstream for writing
Use std::fstream for both reading and writing
Use std::ios::binary for binary files

File Open Modes:

std::ios::in for reading
std::ios::out for writing
std::ios::app for appending
std::ios::binary for binary files
std::ios::trunc to truncate existing files

File error handling:

file.is_open() to check if the file opened successfully
Use file.eof() to check end of file
Use file.bad() for hardware errors
Use file.fail() for format errors

Reading Techniques

Line by line: std::getline()
Word by word: >> operator
Binary: read() method
Buffer reading: read() with buffer

Writing Techniques

Line by line: << operator
Binary: write() method
Buffer writing: write() with buffer
Use std::endl for newlines

Performance Tips

Use buffering for large files
Process files in chunks
Close files when done
Use binary mode for binary data

24 of 98

It is very easy to write the array data into your hard drive in either text (easy to read, hard for multiple arrays) or binary format (fast to read and write with multiple data, but need to *know the data structure first* if it is a complex output data).

Example is at https://computationalastrouam.github.io/aco/CodingTutorialC++/IO.cpp

How can you read the binary file without knowing its array size? Tip: Use tellg() to get the position of the file pointer, which gives you the size of the file in bytes.

Exercise: (1) adding error handling to the example code; (2) writing multi-format data, including string, int, float, longlong and double formatted data, to one file in both ASCII and binary format; (3) try to read your data from the ASCII and binary. (4) compare the data reading and writing speed.

26 of 98

Handout Mandelbrot or Practices?

27 of 98

The suggested structure for a program project

Header Files (.h/.hpp) : Declaration of functions, classes, and templates
Separated Source Files (.cpp) : Implementation of functions, not every function to be in an individual file, but should be clearly grouped by their aims/target/functions/…
Even in different folders if there are too many files
Use Makefile or CMakefiles
The usage of #ifdef, see the updated Makefile and overflow.cpp

28 of 98

Function

The main function has parameters: int main(int argc, char **argv): example
Function Template: Allows functions to work with multiple data types and Enables generic programming
Lambda Function: Allows you to define anonymous functions inline
Function Object: Objects that act like functions
Function Pointer and Callback: Pass functions as arguments
Function Overloading: Multiple functions with the same name but different parameters
Default Arguments: Allows you to specify default values for function parameters

Examples at here

29 of 98

Lambda function

Lambda functions, also known as lambda expressions, are a feature in C++ that allow you to define a function directly in the place where you're going to use it. They are most commonly used when passing a simple function as an argument to another function.

A lambda function in C++ has the following syntax:

[capture](parameters) -> return_type { function_body }

Capture: This is the capture clause, which determines what variables are accessible in the lambda function. It can be empty, or it can contain variables to capture from the surrounding scope.
Parameters: This is the parameter list, just like in a normal function.
Return_type: This is the return type of the function. This part is optional, and if omitted, the compiler will infer the return type based on the function body.
Function_body: This is the body of the function, where you write the code that will be executed when the function is called.

int factor = 5;

auto multiply = [factor](int n) { return n * factor; };

std::cout << multiply(5); // Outputs: 25

30 of 98

Structure

A struct (or structure) is a collection of variables (can be of different types) under a single name.

struct Student {

std::string name;

int age;

float gpa;

}; // can you put (multiple) variable names here?

int main() {

Student s1; // define Student Variable; can you define an array/dynamically allocated memory array? here?

s1.name = "John Doe";

s1.age = 20;

s1.gpa = 3.5;

std::cout << "Name: " << s1.name << ", Age: " << s1.age << ", GPA: " << s1.gpa << std::endl;

// can struct being removed?? If so, how?

return 0;

}

31 of 98

Structs and Pointers

The Dot(.) operator is used to normally access members of a structure.
The Arrow(->) operator exists to access the members of the structure using pointers.

struct person

{

int age;

float weight;

};

int main()

{

struct person *personPtr, person1;

personPtr = &person1;

Person1.age = 10;

Person1.weight = 30;

// How do you assign/get the values with pointer? Can they (. and ->) be switched?

}

// Change the struct to union and try to cout multi member values.

32 of 98

Structs and Pointers

Dynamic memory allocation of structs and What is the size of structure?

struct Person

{

int age; // 4 bytes

float weight; // 4 bytes

};

int main()

{

Person *personPtr, person1;

std::cout << "Size of my structure = " << sizeof(Person) << std::endl; //Person or person1? Does it have the same size??

int n = 1000;

personPtr = new Person[n]; // try to do sizeof personPtr before and after the dynamical allocation

delete[] personPtr; // Don't forget to delete the allocated memory

return 0;

}

33 of 98

Structs and Functions

Structure with a Function:

struct Rectangle {

int length;

int width;

int area() {

return length * width;

}

};

int main() {

Rectangle rect;

rect.length = 5;

rect.width = 10;

std::cout << "Area of rectangle: " << rect.area() << std::endl;

return 0;

}

34 of 98

Structs and Functions

How to pass/return the structures to functions?? The same as other variables

struct person

{

int age; // 4 bytes

float weight; // 4 bytes

};

struct person your_func(struct person mystrct);

You can also pass the result through pointer.

35 of 98

Nested Structures:

struct Date {

int day;

int month;

int year;

};

struct Employee {

std::string name;

Date dateOfJoining;

};

int main() {

Employee emp;

emp.name = "John Doe";

emp.dateOfJoining.day = 1;

emp.dateOfJoining.month = 1;

emp.dateOfJoining.year = 2020;

std::cout << "Name: " << emp.name << ", Date of Joining: " << emp.dateOfJoining.day << "/" << emp.dateOfJoining.month << "/" << emp.dateOfJoining.year << std::endl;

return 0;

}

36 of 98

The advantage of structure

Grouping of Data: Structures allow you to group variables of different data types together under a single name. This makes the program more organized, easier to understand and manage.
Code Reusability: Once a structure is defined, it can be used to create multiple variables (or objects) of that structure type. This promotes code reusability.
Efficient Memory Usage: Structures can lead to more efficient use of memory. When you have a group of variables of different types, it's more memory-efficient to use a structure than to allocate memory for each variable separately.
Ease of Passing to Functions: You can pass structures to functions more easily than individual variables. This can make your code cleaner and easier to read.
Implementation of Real World Entities: Structures can be used to represent real-world entities (like a Student, a Date, a Rectangle in the examples provided) more effectively. Each entity can have properties of different data types.
Flexibility: Structures in C++ can have member functions (like classes) which provide more flexibility. They can have constructors, destructors, and even overloaded operators

Example can be found at https://computationalastrouam.github.io/aco/CodingTutorialC++/structure.cpp

37 of 98

typedef

The typedef keyword is used to create an alias name for data types.

typedef int myint;

myint s=5;

// create struct with person1 variable

typedef struct Person {

char name[50];

int SNo;

float height;

} person; // What the difference between person and person1?

Or later with

typedef struct Person mystruct;

38 of 98

Class

class Student {

public:

std::string name;

int age;

float gpa;

};

int main() {

Student s1;

s1.name = "John Doe";

s1.age = 20;

s1.gpa = 3.5;

std::cout << "Name: " << s1.name << ", Age: " << s1.age << ", GPA: " << s1.gpa << std::endl;

return 0;

}

https://computationalastrouam.github.io/aco/CodingTutorialC++/class.cpp

39 of 98

Class: access specifiers

Access specifiers in C++ are used to define how the members (functions and variables) of a class can be accessed. There are three types of access specifiers:

public: Members declared as public are accessible from anywhere where the object of that class is visible.
private: Members declared as private are only accessible within the class itself, including its methods. They are not accessible outside the class or by any object of the class.
protected: Members declared as protected are accessible within the class and its subclasses, but not outside these.

class MyClass {

public:

int publicVar; // Can be accessed anywhere

private:

int privateVar; // Can only be accessed within MyClass

protected:

int protectedVar; // Can be accessed within MyClass and subclasses of MyClass

};

class mysubc : public MyClass { // mysubc is a subclass of MyClass, try to use private access instead

// mysubc inherits all public and protected members of MyClass

};

40 of 98

Class with Methods:

class Rectangle {

private:

int length;

int width;

public:

void setDimensions(int l, int w) {

length = l;

width = w;

}

int area() {

return length * width;

}

};

int main() {

Rectangle rect;

rect.setDimensions(5, 10);

std::cout << "Area of rectangle: " << rect.area() << std::endl;

return 0;

}

41 of 98

Mandelbrot

1. Openmp problems

only “#pragma omp parallel for” or no parallel at all.

http://www.inf.ufsc.br/~bosco.sobral/ensino/ine5645/OpenMP_Dynamic_Scheduling.pdf

2. Doesn’t bother to use multiple files… <see this suggestion>

I assume you all know .c++/.cpp and .h files and the project structure

.c++ files contain the implementation of the code, while .h files exist to provide interfaces that allow a file to access functions, global variables, and macros from other files. It is not forbidden to have your program in .h file, but have a look at this:

https://stackoverflow.com/questions/49017408/programming-with-just-header-files

3. Usage of main() function parameters

4. It is better to have your plot script included.

5. Return the number of iterations of the convergence test will allow you to plot the pic with color

6. Scaling test:

tstart = time(NULL);

// Some operation...

tend = time(NULL);

fprintf(stderr,"the operation took %lf sec. (wall-clock time)\n",difftime(tend,tstart));

//use std::chrono::high_resolution_clock::now() from <chrono> library for higher accuracy

42 of 98

Project discussion if time

44 of 98

Practice 4

Array, std::array, DAM array and std::vector, sorting with index in C++
Sorting with index
Intro Kepler problem
Project discussion

45 of 98

Array, std:array, DAM array and vector

Feature	Array	std::array #include <array>	Dynamically Allocated Array	Vector #include <vector>
Size	Fixed at compile time	Fixed at compile time	Determined at runtime	Dynamic, can be changed at runtime
Memory Allocation	Automatic, out of scope delete	Automatic, out of scope delete	Manual, using new and delete[]	Automatic, managed by the vector, clear + shrink_to_fit
Memory Location	Contiguous block of memory	Contiguous block of memory	Contiguous block of memory	Contiguous block of memory (as of C++11)
Built-in Functions	No	Yes, such as fill, swap, size, empty, data, begin/end (index), front/back (value)	No	Besides these in array, it also has: push_back, pop_back, insert, erase, shrink_to_fit, rsize, clear, reserve, capacity.
Memory Management	No overhead¹	Automatic memory management	Requires manual management	Automatic memory management
Resizable	No	No	No, requires manual resizing	Yes, automatically resizes when capacity is exceeded
Access	Direct access with []	Direct access with [] or .at()	Direct access with []	Direct access with [] or .at()
Insertion/Deletion at the End	Not possible	Not possible	Not possible	Constant time with push_back and pop_back
Insertion/Deletion at the Middle	Not possible	Not possible	Not possible	Linear time with insert and erase

it means that the process of allocating and deallocating memory does not impose additional computational cost.

46 of 98

Details of built-in functions – try yourself

push_back(element): Adds an element to the end of the vector.
pop_back(): Removes the last element from the vector.
size(): Returns the number of elements in the vector.
empty(): Returns true if the vector is empty, false otherwise.
clear(): Removes all elements from the vector.
insert(position, element): Inserts an element at the specified position in the vector.
erase(position): Removes the element at the specified position in the vector.
begin(): Returns an iterator pointing to the first element in the vector.

end(): Returns an iterator pointing to the position past the end of the vector.
front(): Returns a reference to the first element in the vector.
back(): Returns a reference to the last element in the vector.
at(index): Returns a reference to the element at the specified index in the vector.
operator[]: Accesses the element at the specified index in the vector.
reserve(size_type n): Requests that the vector capacity be at least enough to contain n elements.
capacity(): Returns the number of elements that the vector could at least hold without allocating a new storage area.
shrink_to_fit(): Requests the container to reduce its capacity to fit its size.

47 of 98

Practice differences between them

See the arrays.cpp file at :

https://computationalastrouam.github.io/aco/CodingTutorialC++/different_arrays.cpp

And the vector.cpp file for the usage of built-in function of the vector:

https://computationalastrouam.github.io/aco/CodingTutorialC++/vector.cpp

Exercise: try to use pointer to print out the elements of these different arrays.

Try to compare the speed of the code using the 4 different ways to create an array of 1^5 elements.

Try to play with the different vector built-in functions.

48 of 98

Sorting in C++ – std::sort

Defined in header <algorithm> : https://en.cppreference.com/w/cpp/algorithm/sort

void sort( RandomIt first, RandomIt last );

Sorts the elements in the range [first, last) in non-descending order. The order of equal elements is not guaranteed to be preserved.

first, last – the range of elements to sort

std::sort(s.begin(), s.end());

Exercise: try to write this random list [3, 7, 1, 9, 2, 8, 4, 6, 7, 5] into array, std::array, DAM array and vector, sort them in 4 different functions. Test the speed of the 4 runs, are they the same? Try to play with it, for example, only sort part of the array.

49 of 98

Sorting in C++ – std::sort

void sort( RandomIt first, RandomIt last, Compare comp );

Comp - comparison function object (i.e. an object that satisfies the requirements of Compare) which returns true if the first argument is less than (i.e. is ordered before) the second.

The signature of the comparison function should be equivalent to the following:

bool cmp(const Type1& a, const Type2& b);

While the signature does not need to have const&, the function must not modify the objects passed to it and must be able to accept all values of type (possibly const) Type1 and Type2 regardless of value category (thus, Type1& is not allowed, nor is Type1 unless for Type1 a move is equivalent to a copy).

The types Type1 and Type2 must be such that an object of type RandomIt can be dereferenced and then implicitly converted to both of them.

int numbers[]={....};

std::sort (numbers, numbers+5, std::greater<int>()); //#include <functional> for std::greater

Try write your own comp, e.g.

bool mygreater (const T& x, const T& y) {return x>y;}

Exercise: try to sort the previous array in reversed trend, i.e. from bigger to smaller; try to write your own comp function to compare the results between >= and >. Can you use sort to sort two arrays? I.e. instead of compare the sorting array, the comp function compares other arrays.

50 of 98

Sorting in C++ – std::sort with index

How to get the index of the sorted array??

[3, 7, 1, 9, 2, 8, 4, 6, 7, 5]

[0,1,2,3,4,5,6,7,8,9] after sorting -> [2, 4, 0, 6, 9, 7, 1, 8, 5, 3]

Look at the example at :

https://computationalastrouam.github.io/aco/CodingTutorialC++/Sort_index.cpp

51 of 98

Sorting – indexx

Think about your method to sort a simple random array for example, [4,2,6,7,1,5]

Download a random integer array (binary format) here: https://computationalastrouam.github.io/aco/CodingTutorial/

And try your method.

We will go back to `indexx` after the project discussion

Note the details on how to use the indexx function!! Why -1?

52 of 98

indexx

The algorithm is explained here: http://www.foo.be/docs-free/Numerical_Recipe_In_C/c8-4.pdf

53 of 98

Kepler

https://computationalastrouam.github.io/aco//static_files/Practices/kepler.pdf

Use these physical setups:

#define Grav 6.6726E-11 // m^3/kg/s^2

#define Msun 1.989E30 // kg

#define Mearth 5.973E24 // kg

#define Dse 1.496E11 // m

Initial condition in position and velocity

Pos: [Dse, 0]; Vel: [0, sqrt(Grav*Msun/Dse)]. You can change this IC later

Search online for the leapfrog and Runge Kutta 4th order integrations.

Think about how to use OPENMP to parallel your code and how to make your code to compile only once but run with different setups.

54 of 98

Kepler problem

Don’t pass unused parameters to your functions.

Don’t let your function calculate something that never been used.

Global parameters should be only defined once in a header file.

Duplicated functions:

double dvxdt(double x, double y) { // Calculate the acceleration in the x-direction for Earth

double r = sqrt(x * x + y * y); // Calculate distance from the Sun

return -Grav * Msun * x / (r * r * r); // Return acceleration in x-direction

}

double dvydt(double x, double y) { // Calculate the acceleration in the y-direction for Earth

double r = sqrt(x * x + y * y); // Calculate distance from the Sun

return -Grav * Msun * y / (r * r * r); // Return acceleration in y-direction

}

55 of 98

Project discussion

One -to -one

Reminder, the score of the final presentation is largely depending on your effort to the project.

Kepler extension suggestion: (1) 2D “solar” system, parallel code, efficiency checks

(2) 3D “solar” system, serial code, stability checks

56 of 98

Practice 5 – performance of your program

57 of 98

Kepler – continue

Use argc and argv to give your code flexibility of change the number of integrations. With that, try to do a integration time step length of 1 day.

58 of 98

Kepler – parameter file

How to use argc/v to read in values in a parameter file?�https://computationalastrouam.github.io/aco/practice/

59 of 98

Perf

Installing:

https://www.swift.org/documentation/server/guides/linux-perf.html

Usage: https://perf.wiki.kernel.org/index.php/Tutorial

You will need sudo if you have this error: Error: Access to performance monitoring and observability operations is limited.

g++ -g -fopenmp openmp-nested-for-loop.cpp -o openmp-nested-for-loop

perf stat ./openmp-nested-for-loop

perf record ./openmp-nested-for-loop

perf annotate; perf report

60 of 98

Another openmp handout

What can be parallelled, and whatnot? Know where to fast your program.

Examples on how to use perf.
Nested for loop and Parallel Independent tasks with `#pragma omp parallel sections`
Parallel output?

61 of 98

What cannot be parallelized with OpenMP:

Sequential tasks: Tasks that need to be executed in a specific order cannot be parallelized. If the result of one task depends on the result of a previous task, then these tasks cannot be executed in parallel.
Shared resource access: If multiple tasks need to access or modify a shared resource, this can lead to race conditions. OpenMP provides mechanisms to synchronize access to shared resources, but this can limit the effectiveness of parallelization.
Recursion: Recursive function calls are generally not suitable for parallelization with OpenMP. This is because each recursive call typically depends on the result of the previous call.
Complex dependencies: If tasks have complex dependencies on each other, it can be difficult to parallelize them effectively with OpenMP. This includes tasks that need to communicate with each other or share data in complex ways.

62 of 98

OpenMP exercises

Nested loop # parallel the loop which need longer calculation time
Omp sections
Parallel output

Try to implement the omp parallel <sections> in your Kepler code

63 of 98

Intro SPH

https://computationalastrouam.github.io/aco//static_files/Practices/sph.pdf

64 of 98

Practice 6 – debug with GDB

65 of 98

GDB

Reference material: https://www.sourceware.org/gdb/documentation/

The purpose of a debugger such as gdb is to allow you to see what is going on “inside” another program while it executes—or what another program was doing at the moment it crashed. gdb can do four main kinds of things (plus other things in support of these) to help you catch bugs in the act:

Start your program, specifying anything that might affect its behavior.
Make your program stop on specified conditions.
Examine what has happened, when your program has stopped.
Change things in your program, so you can experiment with correcting the effects of one bug and go on to learn about another.

66 of 98

GDB installsion

It should be installed along with gcc. If not, go to download the software and compile, install with

`./configure`

`make`

`make install`

in Linux.

67 of 98

GDB cheat sheet

Help

https://users.ece.utexas.edu/~adnan/gdb-refcard.pdf

https://darkdust.net/files/GDB%20Cheat%20Sheet.pdf

68 of 98

GDB hand-on

See the practices at: https://computationalastrouam.github.io/aco/practice/

69 of 98

Practice 7 – MPI

70 of 98

What is MPI? Message Passing Interface

Why do we need MPI?

One computer is not power enough! + too costly to have many CPUs+memories in one node. Some terminologies:

HPC cluster is the relatively tightly coupled collection of compute nodes
Compute node is an individual computer: CPUs coupled with their local memories¹, One computer node can have several sockets.
One Socket is a physical CPU, one physical CPU can have have multiple cores, and one CPU-core have more than one parallel thread.

HPC is built by many nodes. At which level of the HPC computers, can you use MPI/OpenMP?

Don’t misunderstand this with CPU’s integrated memory – cache

71 of 98

The difference between OPENMP and MPI

Shared Memory Programming VS Distributed Memory Programming

You will need message passing.

72 of 98

Message Passing

A standard API for message passing communication and process information lookup, registration, grouping and creation of new message datatypes.

● Point to point comms: ([non]blocking, [a]synchronous)

● Collective comms: one-many, many-one, many-many

Code parallelization cannot be incremental

73 of 98

MPI Basics

● mpi.h include files for C/C++: error = MPI_Xxxxx(argument, ...);

● Notice that all routines have to return an error value

● MPI_ namespace reserved for MPI: MPI:: namespace for C++

● In C/C++ (in)out arguments passed as pointers

● Always start with MPI_Init() and end with MPI_Finalize() (or MPI_Abort()). Both calls need to be made by all processes.

Before go to the example, install openmpi in your computer:

sudo apt install libopenmpi-dev

Compile your program in this way: mpic++ mpi_example.cpp -o mpi_example, and the run command as: mpirun -np 4 ./mpi_example, where -np 4 specifies to run the program with 4 processes.

74 of 98

The MPI program

The code can also be downloaded from: https://computationalastrouam.github.io/aco/CodingTutorial/test_mpi.c

#include <mpi.h>

#include <stdio.h>

int main(int argc, char** argv) {

MPI_Init(NULL, NULL); //(Initialization)

// initialize MPI environment Called before any other MPI call. Only MPI_Initialized(int *flag) is allowed to call before it.

int world_size; // number of processes

MPI_Comm_size(MPI_COMM_WORLD, &world_size); // (Communicators)

// A communicator is an ordered set of processes, that remains constant from its creation until its destruction

// All MPI processes form the MPI_COMM_WORLD communicator which defined very beginning and is a handle (predefined constants) defined in the include files.

// MPI_Comm_size gives size of the communicator set – how many threads will be used.

int world_rank; // the rank of the process or the IDs of each thread

MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

// Returns the rank of the process in the communicator; Its value is between 0 and (size-1)

char processor_name[MPI_MAX_PROCESSOR_NAME]; // gets the name of the processor/computer name

int name_len;

MPI_Get_processor_name(processor_name, &name_len); // what is value for name_len?

printf("Hello world from processor %s, rank %d out of %d processors\n",

processor_name, world_rank, world_size);

MPI_Finalize(); // finish MPI environment}

//If a process catches an error that cannot be corrected, a user can call MPI_Abort(MPI_Comm comm, int errorcode)

75 of 98

Exercise 1

https://computationalastrouam.github.io/aco/CodingTutorialC++/MPI/main1-HelloWorld.cpp

Compile the main1-HelloWorld.cpp program and run it with different cpus

Modify the code to only use odd number ID cpus to print out “Hello Upper World”, and even ID number CPUs to print out “Hello Lower World”

76 of 98

MPI (Primitive) Data types

Why the MPI has its own data types?

The purpose of the MPI datatypes is to tell the MPI functions what they're working with. So, for example, if you send an int from a little-endian machine to a big-endian one then MPI can do the byte order conversion for you. Another more common benefit is that MPI knows how big an MPI_DOUBLE is, so you don't have to have a bunch of sizeof statements everywhere.

!! the MPI datatypes are tags, not actual datatypes!!

You can also create your own data structure with:

MPI_Type_create_struct - Creates a structured data type.

** tip man 3 MPI_Type_create_struct to see the details

** you probably need to install openmpi-doc to use man

MPI datatype	C datatype
MPI_CHAR	signed char
MPI_SHORT	signed short int
MPI_INT	signed int
MPI_LONG	signed long int
MPI_UNSIGNED_CHAR	unsigned char
MPI_UNSIGNED_SHORT	unsigned short int
MPI_UNSIGNED	unsigned int
MPI_UNSIGNED_LONG	unsigned long int
MPI_FLOAT	float
MPI_DOUBLE	double
MPI_LONG_DOUBLE	long double
MPI_BYTE	8 bits
MPI_PACKED	packed sequence of bytes
MPI_LONG_LONG_INT	long long int

77 of 98

MPI Primitives: https://www.open-mpi.org/doc/current/

OpenMPI commands (man 1/ man)

17 commands

OpenMPI general information (man 7)

6 frameworks

MPI API (man 3)

About 150 functions

78 of 98

Message passing

Communications can be between two processes (point to point [[non]blocking, [a]synchronous]) or between a group of processes (collective [one-many, many-one, many-many])

The purpose is the "exchange" of information, much like the mail system. Thus every message (document) has:

A sender address (rank, like a business address) - match with - A destination address (rank, like another business address)
A message location (starting address, like the document's location) - match with - A destination location (that cabinet in the office of so-and-so)
A message datatype (what is being sent) and A message size (how big is it in datatype units) – match with – Compatible datatype and size combo in order to fit at the destination.
A message tag and Matching tag at the destination

79 of 98

Point to Point Comms

Blocking comms: Block until completed (send stuff on your own)
Non-blocking comms: Return without waiting for completion (give them to someone else)
Forms of Sends:

Synchronous: message gets sent only when it is known that someone is already waiting at the other end (think fax)
Buffered: message gets sent and if someone is waiting for it so be it; otherwise it gets saved in a temporary buffer until someone retrieves it. (think mail)
Ready: Like synchronous, only there is no ack that there is a matching receive at the other end, just a programmer's assumption! (Use it with extreme care)

MPI_Send/MPI_Recv(void *buf, int cnt, MPI_Datatype type, int dest/src, int tag, MPI_Comm comm)

● buf is the starting address of the array

● cnt is its length

● type is its MPI datatype

● comm is the communicator context

● int dest/source is the rank of the destination process in comm

● tag is an extra distinguishing number, like a note (MPI_ANY_TAG)

MPI_Ssend (Synchronous send) The sender notifies the receiver; after the matching receive is posted the receiver acks back and the sender sends the message.

MPI_Bsend (Buffered, asynchronous, send) The sender notifies the receiver and the message is either buffered on the sender side or the receiver side according to size until a matching receive forces a network transfer or a local copy respectively.

Depending on the MPI implementation, MPI_Send may behave as either MPI_Ssend or MPI_Bsend.

80 of 98

Exercise 1. P-2-P message passing

https://computationalastrouam.github.io/aco/CodingTutorialC++/MPI/main2-SendReceive.cpp

Try to send different types of variables from world_rank 0 to world_rank 1.

Tip:

if (myrank == 0) {

MPI_Send(..,,,..);

} else {

MPI_Recv(..,,,..);

}

81 of 98

Collective – one-many: Broadcast

MPI_Bcast(void *buf, int cnt, MPI_Datatype type, int root, MPI_Comm comm)

root has to be the same on all procs, can be nonzero

82 of 98

Exercise 2

https://computationalastrouam.github.io/aco/CodingTutorialC++/MPI/main3-Broadcast.cpp

Try to broadcast different variables

Try to see if MPI_Bcast inside rank 0 works or not

83 of 98

Collective – many-one: scatter

int MPI_Scatter(const void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm)

84 of 98

Collective – many-one: Gather

MPI_Gather(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, int root, MPI_Comm comm)

Make sure recvbuf is large enough on root where it matters, elsewhere it is ignored

85 of 98

Exercise 3

https://computationalastrouam.github.io/aco/CodingTutorialC++/MPI/main4-ScatterGather.cpp

Try to play with different number of CPUs to run the program
Try to see if only define data in rank 0

86 of 98

Collective – many-one: Reduce

int MPI_Reduce(const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm)

Op: reduce operation (handle): MPI_MAX - Returns the maximum element; MPI_MIN - Returns the minimum element; MPI_SUM - Sums the elements; MPI_PROD - Multiplies all elements; MPI_LAND - Performs a logical and across the elements; MPI_LOR - Performs a logical or across the elements; MPI_BAND - Performs a bitwise and across the bits of the elements; MPI_BOR - Performs a bitwise or across the bits of the elements; MPI_MAXLOC - Returns the maximum value and the rank of the process that owns it; MPI_MINLOC - Returns the minimum value and the rank of the process that owns it.

87 of 98

Exercise 4

https://computationalastrouam.github.io/aco/CodingTutorialC++/MPI/main5-SumReduction.cpp

Try to reduce an array data

88 of 98

Collective – many-many:

MPI_Allgather(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, MPI_Comm comm)

● Make sure recvbuf is large enough on all procs

89 of 98

Collective – many-many:

MPI_Allreduce(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, MPI_Op op, MPI_Comm comm)

● Make sure recvbuf is large enough on all procs

90 of 98

Exercise 4

Write your own Allreduce and Allgather functions to play with it.

91 of 98

Practice 1

Description:

Write an MPI program where each process determines whether its rank is even or odd, and the root process (rank 0) collects all results and prints a summary.

Requirements:

Each process determines if its rank is even or odd.
Use MPI_Gather to collect the results at the root.
Root prints how many even and odd processes there are

92 of 98

Practice 2

Each process is given a random integer. Write an MPI program that finds the maximum number among all processes using reduction.

Requirements:

Each process generates a random number.
Use MPI_Reduce with MPI_MAX to find the maximum.
The root process prints the result.

Bonus: Also find the rank of the process that had the maximum number using a custom reduction or MPI_Reduce with MPI_MAXLOC.

93 of 98

Practice 3

Description:

Given a 2D matrix (e.g., 8×8 integers), divide the rows among worker processes. Each worker computes the sum of its assigned rows. The master (rank 0) collects the row sums and computes the grand total.

Requirements:

Matrix is initialized in the root process (e.g., with random numbers or fixed values).
Distribute rows to other processes using MPI_Scatter.
Each process sums its own rows locally.
Use MPI_Gather to send row sums back to the root.
Root process computes the final total sum and prints it.

Bonus: Handle cases where the number of rows isn't evenly divisible by the number of processes.

1 of 98

2 of 98

3 of 98

4 of 98

5 of 98

6 of 98

7 of 98

8 of 98

9 of 98

10 of 98

11 of 98

12 of 98

13 of 98

14 of 98

15 of 98

16 of 98

17 of 98

18 of 98

19 of 98

20 of 98

21 of 98

22 of 98

23 of 98

24 of 98

25 of 98

26 of 98

27 of 98

28 of 98

29 of 98

30 of 98

31 of 98

32 of 98

33 of 98

34 of 98

35 of 98

36 of 98

37 of 98

38 of 98

39 of 98

40 of 98

41 of 98

42 of 98

43 of 98

44 of 98

45 of 98

46 of 98

47 of 98

48 of 98

49 of 98

50 of 98

51 of 98

52 of 98

53 of 98

54 of 98

55 of 98

56 of 98

57 of 98

58 of 98

59 of 98

60 of 98

61 of 98

62 of 98

63 of 98

64 of 98

65 of 98

66 of 98

67 of 98

68 of 98

69 of 98

70 of 98

71 of 98

72 of 98

73 of 98

74 of 98

75 of 98

76 of 98

77 of 98

78 of 98

79 of 98

80 of 98