Partners and sponsors
Calcul Québec
Your Digital Lab
Typical User Experience
Becoming aware
Creating an account
Accessing the account
Executing a task
Retrieving results
Facing issues
Configuring, moving files
What is it for?
What is ARC?
Advanced Research Computing (ARC)
Definition:
Any computation that makes intensive use of computing resources or that is limited by the amount of resources available.
Why ARC?
No experiment possible: cost, complexity, time
Digital simulation is sometimes the only option
Interstellar © Paramount 2014
Examples
WestGrid
Compute Ontario
Calcul Québec
Acenet
Compute Canada leads the acceleration of research and innovation by deploying state-of-the-art advanced research computing (ARC) systems, storage and software solutions.
What do we offer?
Infrastructures
Expertise
(40+ employees)
Services
Our Data Centres
Comparison
| your computer | compute node | ARC cluster |
cores | 2 - 12 | 24 - 48 | 35,000 |
memory | 4 - 32 GB | 128 GB - 3 TB | 142 TB |
network | 1 Gb/s | 56 - 100 Gb/s | - |
storage | 1 TB HDD | 960 GB SSD | 10 PB Lustre |
GPU | 2560 cores, 8 GB | 3584 cores, 16 GB | 584 GPUs |
accessibility | direct access | scheduler | - |
A Supercomputer
ARC Cluster
Interactive node
(login)
Compute nodes
Creating an account - PI
Creating an account - student / others
Be patient and keep your eyes open
Human validation takes time
Check your spam folder over the following 48 hours!
PI account
Student account
PI confirmation of student
3 emails
Choosing a server
Which server to choose?
support@computecanada.ca
Connecting to a server
server.computecanada.ca
Interactive node
(login)
compute nodes
Hands-on part 1
Connect to the following virtual cluster
For MobaXTerm:
https://docs.computecanada.ca/wiki/Connecting_with_MobaXTerm/
brain.calculquebec.cloud
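From a terminal (Linux, macOS, or a MobaXTerm session), the connection is usually opened with ssh. The sketch below assumes the placeholder username userXX used in this workshop; replace it with the username you were given. The helper function connect is a hypothetical name introduced only for illustration.

```shell
# Assumption: userXX is a placeholder; use the username assigned to you.
CLUSTER_USER="userXX"
CLUSTER_HOST="brain.calculquebec.cloud"
TARGET="${CLUSTER_USER}@${CLUSTER_HOST}"

# Hypothetical helper: opens an interactive session on the login node.
connect() {
    ssh "$TARGET"
}

echo "Run 'connect' to open a session on ${TARGET}"
```

Once the session is open, the prompt belongs to the interactive (login) node, not to a compute node.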
Configuration
Once connected, you have access to the following commands
Transfer files
scp fichier.txt user@server.computecanada.ca:
Hands-on part 2
1. Download this file:
https://goo.gl/rSfkWw
2.1 Copy the file to the virtual cluster with scp:
scp cq-*.zip userXX@brain.calculquebec.cloud:
2.2 or download the file directly on the cluster:
curl -L -o formation.zip https://goo.gl/rSfkWw
Loading software
Most of the software you will need is already installed on the cluster and is made available as modules.
Modules allow multiple versions of the same software to be installed on the same system while handling conflicts between them.
Modules
The module system is like a switchboard.
Module command
module is the command to use to interact with software modules on the clusters.
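A minimal sketch of common module subcommands follows. It assumes the cluster uses Lmod (which provides module spider), and python appears only as an example module name; the modules actually available depend on the cluster. The guard keeps the sketch harmless on a machine that has no module system.

```shell
# Only run the module commands where a module system is present.
if command -v module >/dev/null 2>&1; then
    module avail            # list modules that can be loaded right now
    module spider python    # search all modules for "python" (example name)
    module load python      # load the default version of the example module
    module list             # show the modules currently loaded
    HAVE_MODULES="yes"
else
    echo "module command not found: run this on a cluster login node"
    HAVE_MODULES="no"
fi
```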
Executing a job
Login node
Scheduler
Compute node
Job schedule
Job execution is deferred to the job scheduler and depends on the availability of resources.
Jobs therefore have to run autonomously, without user interaction (batch mode).
Scheduling policy
The resource allocation committee defines a policy to prioritize user access to resources.
As you compute, your priority decreases; once you stop computing, it increases again.
The resources
Resource limits
Every cluster has resource limits.
Cores per node | 32 - 48 |
Memory per node | 128 GB - 3 TB |
Maximum walltime | 12 hours - 30 days |
GPUs per node | 2 - 16 |
Network speed | 1 Gbps - 100 Gbps |
Refer to the wiki to know the limits for each cluster.
Job types
Sequential jobs
Use a single core on a single node
Do NOT benefit from requesting more resources
Parallel jobs
Use multiple cores, multiple nodes at once
Data parallelism
Same task on multiple datasets
Task parallelism
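Data parallelism can be illustrated on a small scale with plain bash: the same task is applied to several datasets at once. This is only a sketch with made-up data and a hypothetical process function; on a cluster, the same pattern would normally be expressed through the scheduler rather than through background processes.

```shell
# Create three tiny example datasets in a scratch directory.
WORKDIR=$(mktemp -d)
for i in 1 2 3; do
    printf 'sample data %s\n' "$i" > "$WORKDIR/set$i.txt"
done

# The "task": count the bytes of one dataset and store the result beside it.
process() {
    wc -c < "$1" | tr -d ' ' > "$1.out"
}

# Same task, multiple datasets: start one background process per file.
for f in "$WORKDIR"/set*.txt; do
    process "$f" &
done
wait    # block until every background task has finished

N_RESULTS=$(find "$WORKDIR" -name '*.out' | wc -l | tr -d ' ')
echo "Processed ${N_RESULTS} datasets"
```

Each dataset is handled independently, which is why this kind of work scales well across cores and nodes.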
Submit file
Executing a job
Only simple tasks and programs can be run on the interactive part of the system. For everything else, you need to write a script, commonly known as a submit file.
Submit file
A submit file contains
Submit file header
Interpreter | #!/bin/bash |
Resources needed | #SBATCH --time=3:00:00 #SBATCH --nodes=1 #SBATCH --ntasks-per-node=1 #SBATCH --cpus-per-task=1 #SBATCH --mem-per-cpu=1G |
Project | #SBATCH --account=def-fafor10 |
Other options | #SBATCH --gres=gpu:1 |
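Put together, the header fields above give a file like the one written by the sketch below, with one directive per line. The file name job.sh and the one-line body are assumptions made for illustration; the account name is the one shown in the table and must be replaced with your own.

```shell
# Write a minimal submit file into a scratch directory (job.sh is a hypothetical name).
SUBMIT_FILE="$(mktemp -d)/job.sh"
cat > "$SUBMIT_FILE" <<'EOF'
#!/bin/bash
#SBATCH --time=3:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
#SBATCH --account=def-fafor10

# Placeholder body: a real job would load its modules and run its program here.
echo "Running on $(hostname)"
EOF

# Check the shell syntax without running the job, then count the directives.
bash -n "$SUBMIT_FILE" && echo "syntax OK"
N_DIRECTIVES=$(grep -c '^#SBATCH' "$SUBMIT_FILE")
echo "Directives found: ${N_DIRECTIVES}"
```

The scheduler reads the #SBATCH lines as resource requests, while bash treats them as ordinary comments.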
Submit file body
Generally begins with module loading. For the workshop:
Essential commands
action | command | returns |
submit | sbatch <script.sh> | jobid |
display job queue | squeue [-u $USER] | job queue |
cancel | scancel <jobid> | - |
show job usage | sacct [-j <jobid>] | resources used by your jobs |
Reference: https://docs.computecanada.ca/wiki/Running_jobs
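The commands in the table are typically chained as in the sketch below. The file name job.sh is a placeholder, and the guard only makes the sketch safe to run on a machine where Slurm is not installed. The --parsable option makes sbatch print only the job ID (followed by ";cluster" on some systems), which is convenient for reuse.

```shell
SUBMIT_FILE="job.sh"    # hypothetical submit file name

if command -v sbatch >/dev/null 2>&1; then
    JOBID=$(sbatch --parsable "$SUBMIT_FILE")   # submit and capture the job ID
    JOBID=${JOBID%%;*}                          # drop a trailing ";cluster" if present
    squeue -u "$USER"                           # display your jobs in the queue
    sacct -j "$JOBID"                           # show accounting data for this job
    # scancel "$JOBID"                          # uncomment to cancel the job
    STATUS="submitted"
else
    echo "sbatch not found: run this on a cluster login node"
    STATUS="no-scheduler"
fi
```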
Hands-on
Tips and typical mistakes
Retrieving results
To copy small results back to your machine:
For larger results, use
https://globus.computecanada.ca/
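For small results, scp can be run from your own machine with the source and destination swapped relative to an upload. The sketch below assumes the workshop's placeholder account userXX and a hypothetical file named results.txt in the remote home directory; fetch_results is an illustrative helper, not an existing command.

```shell
# Assumptions: placeholder account and a hypothetical result file.
REMOTE="userXX@brain.calculquebec.cloud"
REMOTE_PATH="results.txt"

# Hypothetical helper: copy the remote file into the current local directory.
fetch_results() {
    scp "${REMOTE}:${REMOTE_PATH}" .
}

echo "Run 'fetch_results' on your own machine to copy ${REMOTE_PATH} from ${REMOTE}"
```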
Training by Calcul Québec
Additional reading
Websites
Contact us: support@calculquebec.ca
Twitter: @CalculQ
Transfer tools (continued)
Globus at Compute Canada
File transfer
Choosing an endpoint
Connecting to an endpoint
The authentication is done on the endpoint
Creating a local endpoint
File transfer
File transfer confirmation email
Sharing option