1 of 74

Partners and sponsors

2 of 74

Calcul Québec

Your Digital Lab

3 of 74

Typical User Experience

Becoming aware

Creating an account

Accessing the account

Executing a task

Retrieving results

Facing issues

Configuring, moving files

4 of 74

Typical user experience

Becoming aware

Creating an account

Accessing a system

Executing a task

Retrieving results

Facing issues

Configuring, moving files

5 of 74

What is it for?

6 of 74

What is ARC?

Advanced Research Computing (ARC)

Definition:

Any computation that makes intensive use of computing resources or that is limited by the amount of resources available.

7 of 74

Why ARC?

Experiments are not always possible: cost, complexity, time

Digital simulation is sometimes the only option

Interstellar © Paramount 2014

8 of 74

Examples

  1. Complex problems
    1. Genome Assembly
    2. Molecular Dynamics
    3. Genomic Pipeline

  • A lot of data
    • Machine learning
    • Big Data analytics
    • Image processing

9 of 74

Typical user experience

Becoming aware

Creating an account

Accessing a system

Executing a task

Retrieving results

Facing issues

Configuring, moving files

10 of 74

11 of 74

WestGrid

Compute Ontario

Calcul Québec

Acenet

Compute Canada leads the acceleration of research and innovation by deploying state-of-the-art advanced research computing (ARC) systems, storage and software solutions.

12 of 74

What do we offer

Infrastructures

Expertise

(40+ employees)

Services

    • supercomputer
    • gpu
    • cloud
    • storage
    • 4 universities
    • Analysts
    • System administrators
    • Managers
    • Software consulting
    • Infrastructure consulting
    • Training

13 of 74

Our Data Centres

14 of 74

Comparison

                your computer         compute node          ARC cluster
cores           2 - 12                24 - 48               35,000
memory          4 - 32 GB             128 GB - 3 TB         142 TB
network         1 Gb/s                56 - 100 Gb/s         -
storage         1 TB HDD              960 GB SSD            10 PB Lustre
gpu             2560 cores, 8 GB      3584 cores, 16 GB     584 GPUs
accessibility   direct access         scheduler             -

15 of 74

A Supercomputer

16 of 74

ARC Cluster

Interactive node

(login)

Compute nodes

17 of 74

Creating an account - PI

  1. Create an account at https://ccdb.computecanada.ca/
  2. Check principal investigator
  3. Wait for confirmation email
  4. Sponsor collaborators and students

18 of 74

Creating an account - student / others

19 of 74

Be zen, keep your eyes open

Human validation takes time

Check your spam folder over the following 48 hours!

PI account

Student account

PI confirmation of student

3 emails

20 of 74

Typical user experience

Becoming aware

Creating an account

Accessing a system

Executing a task

Retrieving results

Facing issues

Configuring, moving files

21 of 74

Choosing a server

Which server to choose?

  1. Choose Cedar or Graham or Beluga
  2. Read the wiki to help your decision
  3. Choose the server your lab is using
  4. Still unsure? Contact us :

support@computecanada.ca

22 of 74

Connecting to a server

server.computecanada.ca

Interactive node

(login)

compute nodes

23 of 74

Connecting to a server

ssh username@server.computecanada.ca

Terminal
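
As a minimal sketch, the `server` placeholder above can be filled in with one of the cluster names from the "Choosing a server" slide. The user name `jdoe` is hypothetical, and the script only prints the commands instead of opening a connection.

```shell
#!/bin/bash
# Hypothetical user name; replace it with your own account name.
user="jdoe"

# Cluster names taken from the "Choosing a server" slide.
for cluster in cedar graham beluga; do
    # Print the command instead of running it (dry run).
    echo "ssh ${user}@${cluster}.computecanada.ca"
done
```

To actually connect, type one of the printed commands in your terminal and enter your password when prompted.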

24 of 74

Hands-on part 1

Connect to the following virtual cluster

For MobaXTerm :

https://docs.computecanada.ca/wiki/Connecting_with_MobaXTerm/

brain.calculquebec.cloud

25 of 74

Typical user experience

Becoming aware

Creating an account

Accessing a system

Executing a task

Retrieving results

Facing issues

Configuring, moving files

26 of 74

Configuration

Once connected, you have access to the following commands

  • pwd (print working directory)
  • ls (list files)
  • cd <dir> (change directory)
  • mkdir <dir> (make directory)
  • cp <file1> <file2> (copy file)
  • rm <file> (remove file)
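
The commands above can be tried together in a short session. This is only a sketch: the directory and file names (`project`, `notes.txt`, `backup.txt`) are made up for illustration, and everything happens in a temporary directory.

```shell
#!/bin/bash
# Work in a throwaway directory so nothing important is touched.
workdir="$(mktemp -d)"
cd "$workdir"
pwd                         # print the current directory

mkdir project               # make a directory
cd project                  # move into it
echo "hello" > notes.txt    # create a small file to play with
cp notes.txt backup.txt     # copy it
ls                          # lists backup.txt and notes.txt
rm notes.txt                # remove the original
ls                          # only backup.txt remains
```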

27 of 74

Transfer files

  • scp (secure copy)

  • graphical tools (WinSCP, Cyberduck, ...)
  • Globus

scp fichier.txt user@server.computecanada.ca:
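
A slightly longer sketch of the same idea follows. The user name `jdoe`, the host `cedar.computecanada.ca` as the chosen cluster, and the `data` and `project/` directories are assumptions for illustration. The small `run` helper is not part of scp; it only prints each command unless `DRY_RUN=0` is set.

```shell
#!/bin/bash
# Hypothetical names; replace them with your own.
user="jdoe"
host="cedar.computecanada.ca"

# run: print a command when DRY_RUN=1 (the default here), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Copy one file to your home directory on the cluster (note the trailing colon).
run scp fichier.txt "${user}@${host}:"

# Copy a whole directory with -r (recursive) into an existing directory project/.
run scp -r data "${user}@${host}:project/"
```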

28 of 74

Hands-on part 2

1. Download this file

2.1 Copy the file on the virtual cluster with scp

2.2 or download the file directly on the cluster

https://goo.gl/rSfkWw

scp cq-*.zip userXX@brain.calculquebec.cloud:

curl -L -o formation.zip https://goo.gl/rSfkWw

29 of 74

Loading software

Most of the software you will need is already installed on the cluster. It is made available as modules.

Modules allow multiple versions of the same software to be installed on the same system while handling conflicts between them.

30 of 74

Modules

The module system is like a switchboard.

31 of 74

Module command

module is the command to use to interact with software modules on the clusters.

  • module spider <module>
  • module avail
  • module list
  • module load <module>
  • module unload <module>
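
A typical sequence of these commands might look like the sketch below. It uses `gcc` and `boost`, the two modules loaded later in this workshop, and checks that the `module` command exists first, since it is normally only defined on the clusters.

```shell
#!/bin/bash
# Modules used later in this workshop (gcc and boost).
wanted="gcc boost"

# The module command only exists on the clusters, so check for it first.
if type module >/dev/null 2>&1; then
    module avail               # modules that can be loaded right now
    module spider boost        # search all versions of a module
    for m in $wanted; do
        module load "$m"       # make the software available in this session
    done
    module list                # show what is currently loaded
    module unload boost        # remove a module from the session
else
    echo "module command not found: run this on a cluster login node"
fi
```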

32 of 74

Typical user experience

Becoming aware

Creating an account

Accessing a system

Executing a task

Retrieving results

Facing issues

Configuring, moving files

33 of 74

Executing a job

Login node

Scheduler

Compute node

34 of 74

Job scheduling

Job execution is deferred by the job scheduler until the requested resources are available

=

Jobs have to be autonomous ("batch" mode)

35 of 74

Scheduling policy

The resource allocation committee

defines a policy to prioritize

user access to resources

As you compute, your priority decreases; once you stop computing, it increases again.

36 of 74

The resources

  • Cores
  • Nodes
  • GPUs
  • Time
  • Memory
  • Licenses

37 of 74

Resource limits

Every cluster has resource limits.

Cores per node      32 - 48
Memory per node     128 GB - 3 TB
Maximum walltime    12 hours - 30 days
GPUs per node       2 - 16
Network speed       1 Gbps - 100 Gbps

Refer to the wiki for the limits of each cluster.

38 of 74

Job types

39 of 74

Sequential jobs

Use a single core on a single node

Do NOT benefit from requesting more resources

40 of 74

Parallel jobs

Use multiple cores, multiple nodes at once

41 of 74

Data parallelism

Same task on multiple datasets

  • Filtering each pixel of an image
  • Processing 100 samples of different patients
  • Counting the frequency of words in thousands of documents
  • Atom movements without interaction in a magnetic field
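
The word-counting case above can be sketched on a small scale in plain bash: the same function is applied to each document, with one background process per file. The three sample documents are made up for illustration; on a cluster, the same pattern would usually be expressed as many independent jobs rather than background processes.

```shell
#!/bin/bash
# Create three small sample documents in a throwaway directory.
workdir="$(mktemp -d)"
printf 'the cat and the dog\n' > "$workdir/doc1.txt"
printf 'a dog and a bird\n'    > "$workdir/doc2.txt"
printf 'the bird sings\n'      > "$workdir/doc3.txt"

# count_words: write "count word" lines for one document.
count_words() {
    tr -s ' ' '\n' < "$1" | sort | uniq -c > "$1.counts"
}

# Same task on each dataset: start one background process per document...
for doc in "$workdir"/doc*.txt; do
    count_words "$doc" &
done
# ...then wait until all of them have finished.
wait

ls "$workdir"/*.counts
```

Because no document depends on the result of another, the tasks never need to communicate, which is what distinguishes data parallelism from task parallelism.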

42 of 74

Task parallelism

  • Single task executed by multiple cores/nodes
  • Algorithm divided into multiple subtasks
    • Subtasks can work on the same data (or not) and communicate to synchronize and exchange data
  • Implies a significant amount of communication between the processes to synchronize the execution of the work.

43 of 74

Submit file

44 of 74

Executing a job

You can only run simple tasks and programs on the interactive (login) part of the system. For everything else, you need to write a script, commonly known as a submit file.

45 of 74

Submit file

A submit file contains

  1. A header for the scheduler (the resources and options requested)
  2. The code to execute (bash script)

46 of 74

Submit file header

Interpreter

#!/bin/bash

Resources needed

#SBATCH --time=3:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G

Project

#SBATCH --account=def-fafor10

Other options

#SBATCH --gres=gpu:1

47 of 74

Submit file body

The body generally begins by loading modules. For this workshop:

    • module load gcc
    • module load boost
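
Putting the header and the body together, a complete submit file might look like the sketch below. The program `./my_program` and its input and output file names are hypothetical, and the account name is the one shown on the header slide, to be replaced with your own. The script only writes the file and checks its syntax; it does not submit anything.

```shell
#!/bin/bash
# Write a complete submit file to job.sh in a throwaway directory.
workdir="$(mktemp -d)"
cat > "$workdir/job.sh" <<'EOF'
#!/bin/bash
# Resources needed
#SBATCH --time=3:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
# Project (replace with your own account)
#SBATCH --account=def-fafor10

# Body: load the modules, then run the program.
module load gcc
module load boost
# Hypothetical program and file names.
./my_program input.dat > output.txt
EOF

# Check the syntax without running anything, then show the file.
bash -n "$workdir/job.sh" && cat "$workdir/job.sh"
```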

48 of 74

Essential commands

action               command               returns
submit               sbatch <script.sh>    jobid
display job queue    squeue [-u $USER]     job queue
cancel               scancel <jobid>       -
show job usage       sacct [-j <jobid>]    resources used by self

Reference : https://docs.computecanada.ca/wiki/Running_jobs
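
These commands are often chained in one session, as in the sketch below. The submit file name `job.sh` is hypothetical. The `--parsable` option of sbatch prints only the job ID, possibly followed by a semicolon and a cluster name, which makes it easy to reuse in the next commands.

```shell
#!/bin/bash
# Hypothetical submit file name.
script="job.sh"

# The Slurm commands only exist on the clusters, so check for sbatch first.
if command -v sbatch >/dev/null 2>&1; then
    # --parsable prints the job ID, possibly followed by ";cluster",
    # so strip anything after the first semicolon.
    jobid="$(sbatch --parsable "$script")"
    jobid="${jobid%%;*}"
    echo "Submitted job $jobid"

    squeue -u "$USER"      # your jobs in the queue
    sacct -j "$jobid"      # accounting information for this job
    # scancel "$jobid"     # uncomment to cancel the job
else
    echo "sbatch not found: run this on a cluster login node"
fi
```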

49 of 74

Hands-on

50 of 74

Typical user experience

Becoming aware

Creating an account

Accessing a system

Executing a task

Retrieving results

Facing issues

Configuring, moving files

51 of 74

Tips and typical mistakes

52 of 74

Tips and mistakes

  • Ask only for the resources you need, not more
    • Walltime
    • Quantity of nodes/cores/memory
  • Your code will not run significantly faster on a supercomputer
    • Unless it has been designed to take advantage of multiple cores and nodes.

53 of 74

Tips and mistakes

  • Look out for file formats

  • Text files from Windows vs. Mac/Linux: line endings are not always compatible (check and convert with dos2unix)
  • Never launch resource-hungry processes on the login nodes
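
The line-ending problem can be reproduced and fixed in a few lines. This is a sketch with a made-up sample file; it uses dos2unix when it is installed and otherwise falls back to `tr`, which is a stand-in and not the tool named on the slide.

```shell
#!/bin/bash
workdir="$(mktemp -d)"
# Simulate a text file saved on Windows: each line ends with CR+LF.
printf 'line one\r\nline two\r\n' > "$workdir/windows.txt"

# Count the lines that contain a carriage return (0 means Unix format).
cr="$(printf '\r')"
grep -c "$cr" "$workdir/windows.txt"

# Convert: dos2unix if it is installed, otherwise strip the CRs with tr.
if command -v dos2unix >/dev/null 2>&1; then
    dos2unix -n "$workdir/windows.txt" "$workdir/unix.txt"
else
    tr -d '\r' < "$workdir/windows.txt" > "$workdir/unix.txt"
fi
```

A submit file with Windows line endings is a common cause of confusing errors, so it is worth checking scripts that were edited on a Windows machine.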

54 of 74

Typical user experience

Becoming aware

Creating an account

Accessing a system

Executing a task

Retrieving results

Facing issues

Configuring, moving files

55 of 74

Retrieving results

To copy small results back to your machine:

  • scp <username>@<server>:/file/path .

For larger results, use

https://globus.computecanada.ca/

56 of 74

Training by Calcul Québec

57 of 74

Additional reading

Websites

Contact us : support@calculquebec.ca

Twitter: @CalculQ

58 of 74

Transfer tools (continued)

  • Superior performance to scp
  • Does not require a calibration
  • No command line required
  • Easy to use web interface
  • Email notification once transfer is done
  • Possibility to share data with other users

59 of 74

Globus at Compute Canada

  1. Go to : https://globus.computecanada.ca/
  2. Select “Sign Up with Globus” or connect with your account if you already have one.

60 of 74

File transfer

61 of 74

Choosing an endpoint

62 of 74

Connecting to an endpoint

The authentication is done on the endpoint

  • Globus does not know and won't know your password.
  • The authentication server only sends an acknowledgment to Globus that you are allowed to connect to the endpoint.

63 of 74

Creating a local endpoint

64 of 74

  1. Enter the name of your local endpoint : laptop, desktop, etc.

65 of 74

  • Click on Generate Setup Key

66 of 74

  • Copy the key
  • Click on the button representing your operating system to download Globus Connect Personal

  • Install the software and launch it.

67 of 74

  • Paste the key that you copied earlier in the setup dialog box.

68 of 74

  • Globus should now be in your task bar. Click on its icon, then on "Preferences"

  • Click on "Access"

69 of 74

  • Add the folder that will be accessible through your endpoint for transfers.

70 of 74

File transfer

  1. Go to the Globus web page and click on "start transfer"
  2. On the left, select your endpoint
  3. On the right, select computecanada[...]

71 of 74

File transfer

  1. Authenticate with your Compute Canada username/password.

72 of 74

File transfer

73 of 74

File transfer confirmation email

74 of 74

Sharing option
