1 of 30

Welcome to Cluster 6: Introduction to Jupyter & HPC

Jaz Sakr & Elnaz A.

July 8, 2024

2 of 30

3 of 30

The HPC

The goals for today are to:

  • Join Slack
  • Take the questionnaire
  • Connect to the HPC in two ways:
    • Terminal
    • Jupyter GUI

Make sure you are NOT connected to Wi-Fi as a guest!

4 of 30

Please join the Slack channel: UCI COSMOS Cluster 6 2023

  • Click the invite link sent to your UCI email
    • ucinetID@uci.edu

  • Links to TA slides, protocols, and professor lectures (coming soon) are bookmarked in the general and help channels

5 of 30

UCINetIDs

  • Log in with Duo two-factor authentication on your phone
  • Access UCI email through Gmail

6 of 30

Help us get to know you a little better

Please fill out this questionnaire!

https://docs.google.com/forms/d/e/1FAIpQLSfdbfaegFY4h9dUlhkZjC6PaRP5EW13uQcY7zy3uOWzIUzdNQ/viewform?usp=sf_link

7 of 30

Connect to the HPC by GUI or CUI

GUI (Graphical User Interface): a digital interface in which the user interacts with graphical components such as icons, buttons, and menus.

CUI (Character User Interface): a text-based interface in which the user types commands that a command-line interpreter (the shell) runs to communicate with a computer program.

VS

8 of 30

The HPC (High-performance computing)

HPC systems use supercomputers and computer clusters to solve advanced computational problems.

HPC resources:

Basic laptop resources:

  • 10 cores
  • 16 GB memory
  • 250 GB storage

[Diagram labels: “Where the ‘data crunching’ happens” and “YOU ARE HERE”]

9 of 30

Both interfaces can do the same things on a computer.

[Figure: one folder viewed in the GUI and in the terminal, showing the same folder path and the same folder contents]
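As a small illustration (a sketch, not taken from the slides), the terminal commands pwd and ls print the same folder path and contents that a file browser shows. The folder and file names below are hypothetical placeholders created in a temporary directory.

```shell
#!/usr/bin/env bash
# Create a hypothetical example folder with two files in a temporary location
demo_dir="$(mktemp -d)/my_project"
mkdir -p "$demo_dir"
touch "$demo_dir/notes.txt" "$demo_dir/analysis.ipynb"

# Move into the folder (like double-clicking it in a file browser)
cd "$demo_dir"

# Print the folder path (what a file browser shows in its location bar)
pwd

# List the folder contents (what a file browser shows as icons)
ls
```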

10 of 30

What is Jupyter?

  • Open-source web application
  • Create and share Jupyter notebook documents that contain live code, plots, equations, and text
  • Interactively develop and present data science projects

11 of 30

  • Log in using your UCINetID and password
  • Partition/Reservation: Free
  • Account to Charge: your UCINetID
  • CPU cores: 2
  • Memory per CPU core: 4GB
  • Containerized Notebook Image: [Centos7] COSMOS Cluster 6
  • Resume last session if available: Checked by default

12 of 30

  • We use UCI’s JupyterHub with a pre-configured environment
    • List of installed packages
  • Uses the HPC, not your local computer, to run code and store data
  • Requires internet access (a UCI connection or VPN)

13 of 30

Select your UCINetID to access your home directory

The interface opens in a web browser

Navigating JupyterHub

14 of 30

Navigating JupyterHub

On the Launcher tab, create a Python notebook

Contents of my home directory (yours may be different)

15 of 30

Navigating Jupyter notebooks

New Untitled.ipynb

Empty circle & “Idle” = no code is running

Filled circle & “Busy” = code is running

Code cell

16 of 30

Navigating Jupyter notebooks

Right-click a file to rename, download, delete, etc.

Add a new cell with +

Click inside cell to edit

17 of 30

Cells are numbered by the order in which they are run

In-progress cells have a star: [*]

Navigating Jupyter notebooks

18 of 30

Log out at the end of your session so you don’t waste compute time ($!)

Navigating Jupyter notebooks

19 of 30

Logging on to the HPC using a terminal

Mac:

  1. Type “terminal” in the Spotlight search bar
  2. Select Terminal app

Windows 10:

  1. Download the free MobaXterm Home Edition (installer edition): https://download.mobatek.net/2212022060563542/MobaXterm_Installer_v22.1.zip
  2. Follow installation instructions
  3. Open application

20 of 30

Secure shell (SSH)

Mac:

  1. In Terminal, type ssh UCINetID@hpc3.rcic.uci.edu (replace UCINetID with your own)
  2. Enter your UCINetID password when prompted*
  3. Type 1 to receive a Duo Push for two-factor authentication

Windows 10:

  1. Click “Session” on top left in MobaXTerm window
  2. Click “SSH”
  3. Enter hpc3.rcic.uci.edu in the “Remote host” box, check “Specify username”, and enter your UCINetID in the box. Keep 22 as the Port. Click OK.
  4. Enter your UCINetID password when prompted*

*You won’t see your password as you type it! If you make a typo, press Backspace to fix it, or press Ctrl+C to cancel and try again.
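Once you are logged in, a few commands confirm where you are. This is a sketch: on HPC3 the output should show an HPC login node and your UCINetID, while running it on your own computer shows your own machine and username.

```shell
#!/usr/bin/env bash
# Which machine is this shell running on? (uname -n prints the hostname)
uname -n

# Which user am I? On HPC3 this should be your UCINetID
whoami

# Where am I? Right after an SSH login this is your home directory
pwd

# Your home directory, for comparison
echo ~
```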

21 of 30

General bioinformatics stuff

  • CPU vs. RAM vs. storage
    • Central processing unit (CPU)
      • Carries out the instructions of programs
      • Determines how fast programs run
      • Measured in clock speed (GHz)
    • Random access memory (RAM)
      • Short-term memory that holds data while it is being used
      • Determines how many programs (and how much data) the computer can handle at once
      • Measured in bytes (KB, MB, GB, TB)
    • Storage
      • Long-term storage for data not currently in use (kept when the computer is off)
      • Measured in bytes (KB, MB, GB, TB)

  • File sizes: KB, MB, GB, TB
    • Bioinformatics data take up a LOT of storage space
    • Scripts and processed data usually < 1 MB
    • < 1 GB is pretty small for 1 file
    • 1-10 GB is pretty big
    • > 100 GB is very big, likely multiple files or a whole sequencing run
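On a Linux machine such as the HPC, a few standard commands report these resources and file sizes. This is a sketch assuming GNU/Linux tools (as on HPC3); the placeholder file name is hypothetical. On a login node, the numbers describe that node, not the resources of a Jupyter session you requested.

```shell
#!/usr/bin/env bash
# --- CPU, RAM, and storage of the machine you are on (Linux) ---
# Number of CPU cores available to this shell
nproc

# Total RAM (free -h gives a friendlier summary where it is installed)
grep MemTotal /proc/meminfo

# Size and free space of the storage that holds your home directory
df -h ~

# --- Checking file sizes ---
# Make a throwaway 5 MB placeholder file (hypothetical name, filled with zeros)
work_dir="$(mktemp -d)"
head -c 5000000 /dev/zero > "$work_dir/example.txt"

# ls -lh: size in human-readable units (with -h, 1K = 1024 bytes)
ls -lh "$work_dir/example.txt"

# du -sh: total space used by a whole folder
du -sh "$work_dir"

# Exact size in bytes
wc -c < "$work_dir/example.txt"
```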

22 of 30

General bioinformatics stuff

  • File extensions
    • Scripts
      • Bash: .sh
      • Python: .py
      • Jupyter notebook: .ipynb
      • Snakemake: .smk
    • Data: .tsv, .csv, .txt
      • Sequencing data: .fastq, .bam, .sam, .bed
      • Zipped: .gz, .tgz
    • Markdown: .md (text file with formatting)
    • Data/configuration: .yaml
  • How to be a considerate HPC user
    • Zip big (>10 GB) data files when possible
    • Only request the memory you need
    • End Jupyter session when not actively using it
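Compressing a large text file with gzip, as suggested above, can be sketched like this. The file name and contents are hypothetical; repetitive tab-separated text like this usually compresses well.

```shell
#!/usr/bin/env bash
work_dir="$(mktemp -d)"
cd "$work_dir"

# Build a hypothetical tab-separated file with 100,000 rows
seq 1 100000 | awk '{ printf "gene_%d\t%d\n", $1 % 100, $1 }' > counts.tsv
ls -lh counts.tsv

# Compress it: gzip replaces counts.tsv with the smaller counts.tsv.gz
gzip counts.tsv
ls -lh counts.tsv.gz

# Peek at the first lines without decompressing the file on disk
gzip -dc counts.tsv.gz | head -n 3

# Decompress it again when you need the plain-text file
gunzip counts.tsv.gz
```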

Fun fact! YAML originally stood for “Yet Another Markup Language” and now stands for “YAML Ain’t Markup Language” (a recursive acronym), which emphasizes that YAML is for data, not documents.

23 of 30

Basic Command Syntax

  • Commands follow a basic syntax:

command_name [options] [required input]

  • Most commands have “manuals” that list all of their options
    • All available online! Just google “command_name manual” (or run man command_name on the HPC)
  • Example: The ls command (see screenshot) has over 50 options!
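The command_name [options] [input] pattern, sketched with ls in a temporary folder (the file names are hypothetical). For ls the input folder is optional: without one, it lists the current folder.

```shell
#!/usr/bin/env bash
work_dir="$(mktemp -d)"
touch "$work_dir/README.md" "$work_dir/.hidden_config"

# command_name [input]: list the contents of a specific folder
ls "$work_dir"

# command_name [options] [input]: -l = long format (permissions, owner, size, date),
# -a = also show hidden files, whose names start with a dot
ls -l -a "$work_dir"

# Short options can usually be combined; this is the same as ls -l -a
ls -la "$work_dir"

# Read the full manual with: man ls   (press q to quit)
# Many GNU tools, including ls on Linux, also print a summary with --help
ls --help | head -n 5
```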

24 of 30

Basic Linux abbreviations

[Figure: example directory tree containing cosmos-2022, studentA, studentB, PUBLIC, README.md, and ta-github]
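A sketch of the standard shell abbreviations (~, ., .., / and -), practiced on a rebuilt copy of the example tree in a temporary folder. Assumption: studentA, studentB, PUBLIC, ta-github, and README.md all sit directly inside cosmos-2022.

```shell
#!/usr/bin/env bash
# Rebuild the example tree in a temporary folder.
# Assumption: everything sits directly inside cosmos-2022.
base="$(mktemp -d)"
mkdir -p "$base/cosmos-2022/studentA" "$base/cosmos-2022/studentB" \
         "$base/cosmos-2022/PUBLIC" "$base/cosmos-2022/ta-github"
touch "$base/cosmos-2022/README.md"

cd "$base/cosmos-2022/studentA"

# .  = the current folder (here: studentA, which is empty)
ls .

# .. = the folder one level up (here: cosmos-2022)
ls ..

# Combine them: go up one level, then into a sibling folder
cd ../studentB
pwd

# ~ = your home directory
echo ~

# / = the root (top) of the whole filesystem
ls /

# - = the previous folder: cd - goes back to studentA and prints its path
cd -
```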

25 of 30

Basic Linux commands

cosmos2023

26 of 30

Basic Linux commands

27 of 30

Basic Linux commands
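A sketch of a few commonly used Linux commands, run in a throwaway folder. The folder and file names are hypothetical, and this is not necessarily the exact list from the slides.

```shell
#!/usr/bin/env bash
practice_dir="$(mktemp -d)"
cd "$practice_dir"

mkdir data results                         # make new folders
echo "hello HPC" > data/note.txt           # write a line of text into a new file
cat data/note.txt                          # print a file's contents
head -n 1 data/note.txt                    # print the first line(s) of a file
cp data/note.txt results/                  # copy a file into another folder
mv results/note.txt results/note_copy.txt  # move or rename a file
ls results                                 # list a folder's contents
rm results/note_copy.txt                   # delete a file (no trash can: it is gone for good)
rmdir results                              # delete an empty folder
```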

28 of 30

Clone Github repository

  • Log on to the HPC
  • Run the following command:

git clone https://github.com/mortazavilab/cosmos-cluster6.git

29 of 30

Useful links

30 of 30

Installing requirements for long-read processing

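# Download the Miniconda installer for Linux
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

# Run the installer and follow the prompts
bash Miniconda3-latest-Linux-x86_64.sh

# When it asks "Do you wish the installer to initialize Miniconda3
# by running conda init? [yes|no]", type yes

# Reload your shell settings so the conda command is available
source ~/.bashrc

# Install mamba (a faster replacement for conda) into the base environment
conda install mamba -n base -c conda-forge

# Set up mamba in your shell, then reload your shell settings again
mamba init

source ~/.bashrc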
