1 of 39

INTRODUCTION TO THE UNIX COMMAND LINE

RP – 22.08.15

2 of 39

SET-UP

3 of 39

Download the Sample Material

* Download the shell_data.zip file from the Bootcamp Homepage

* Extract the contents – a shell_data folder

* Move the folder to your home directory (e.g., /Users/dan or C:\Users\Dan)

4 of 39

1. INTRODUCTION TO SHELL

5 of 39

What Is a Shell?

A shell is a program that provides a command-line interface: you control the computer by typing commands at the keyboard instead of operating a graphical user interface (GUI) with a mouse and keyboard.

6 of 39

Why Should You Care?

  • Many bioinformatics tools can only be used through a command line interface
  • Automation makes your life easier
  • Your work becomes less error-prone
  • Your work becomes more reproducible
  • Access to powerful cloud computing clusters, e.g., the DCC (Duke Compute Cluster)

7 of 39

Recap: Intro to Shell

  • PS1='$ '
    • Sets the terminal prompt to $
  • pwd
    • Stands for ‘print working directory’; prints the path of the directory you are in
  • ls
    • Short for ‘list’; lists the contents of a directory
  • cd
    • Stands for ‘change directory’
  • man <command>
    • Stands for ‘manual’; opens the manual page for a command

Tab Completion:

Press <Tab> to fill the rest of the file/directory name
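
As a minimal sketch of the commands above, the snippet below builds a scratch folder and moves around in it. The folder name example_folder is made up for illustration; it is not part of the sample material.

```shell
# Hypothetical layout: a scratch shell_data folder with one made-up subfolder.
workdir=$(mktemp -d)
mkdir -p "$workdir/shell_data/example_folder"

cd "$workdir/shell_data"   # change directory into shell_data
pwd                        # print the full path of the working directory
ls                         # list its contents: example_folder
cd example_folder          # a relative path; at a prompt, type "cd ex" and press <Tab>
pwd
```

Run man ls at an interactive prompt to read the manual page for ls; press q to leave it.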

8 of 39

2. NAVIGATING FILES AND DIRECTORIES

9 of 39

File System

[Diagram: a directory tree in which a parent directory contains subdirectories and files]

10 of 39

Home vs Root

[Diagram: the Home directory tree (Application, Downloads, Desktop, with further subdirectories and files) and the Root directory tree (System, bin)]

12 of 39

Recap: Navigating Files and Directories

  • / - Root directory
  • ~ - Home directory
  • .. - Parent directory
  • . - Current directory
  • .hidden_file_or_folder - Names that start with . are hidden
    • ls -a to reveal hidden files/folders
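
The shortcuts above can be tried out in a scratch directory. This is only a sketch; the names project, data, visible.txt, and .hidden_notes are invented for the example.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/project/data"
touch "$workdir/project/visible.txt" "$workdir/project/.hidden_notes"

cd "$workdir/project/data"
cd ..                 # .. is the parent directory, so we are now in project
visible=$(ls)         # hidden names are left out
everything=$(ls -a)   # includes ., .., and .hidden_notes
echo "$visible"
echo "$everything"
cd ./data             # . is the current directory; same as "cd data"
cd /                  # / is the root directory
pwd
```

At an interactive prompt, cd ~ (or plain cd) returns you to your home directory.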

14 of 39

3. WORKING WITH FILES AND DIRECTORIES

15 of 39

Recap: Wildcards and Command History

WILDCARDS

  • * - Matches zero or more characters
  • ? - Matches exactly one character
  • [] - Matches any one of the characters listed inside the brackets
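
A small sketch of the three wildcards, using made-up file names in a scratch directory:

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch sample1.fastq sample2.fastq sample10.fastq notes.txt

echo *.fastq            # * matches any run of characters: all three .fastq files
echo sample?.fastq      # ? matches exactly one character: sample1.fastq sample2.fastq
echo sample[23].fastq   # [] matches one listed character: sample2.fastq

one_char=$(echo sample?.fastq)
bracket=$(echo sample[23].fastq)
set -- *.fastq          # store the matches as positional parameters to count them
star_count=$#
```

Note that sample10.fastq is matched by * but not by ?, because 10 is two characters.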

COMMAND HISTORY

  • Ctrl + C – cancel the current command
  • Ctrl + R – reverse-search the command history
  • history – list previous commands with their numbers
    • !<number> re-runs the command with that number
  • Ctrl + L – clear your screen
    • Same as clear

16 of 39

Recap: Examining Files and FASTQ Format

  • cat – print the contents of the file to the screen (stdout)
  • less – open the file in a read-only pager without printing it to the screen
    • Space – go forward one page
    • b – go backward one page
    • g – go to the beginning
    • G – go to the end
    • q – quit
    • /<string> – search forward for a string

  • head – print the first n lines of the file to the screen (stdout)
    • -n to specify the number of lines
    • Default = 10 lines
  • tail – print the last n lines of the file
    • Takes the same -n option and default as head
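
To see cat, head, and tail side by side, here is a sketch on a made-up 20-line file (numbers.txt is an invented name). less is left out because it is interactive.

```shell
workdir=$(mktemp -d)
cd "$workdir"
seq 1 20 > numbers.txt       # a made-up file containing the lines 1, 2, ..., 20

cat numbers.txt              # prints all 20 lines
head numbers.txt             # first 10 lines by default
head -n 3 numbers.txt        # first 3 lines
tail -n 2 numbers.txt        # last 2 lines

first_three=$(head -n 3 numbers.txt)
last_two=$(tail -n 2 numbers.txt)
```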

17 of 39

Recap: FASTQ Format

FORMAT

    • Line 1: begins with `@`, followed by information about the read
    • Line 2: the actual DNA sequence
    • Line 3: begins with `+`, optionally followed by the same information as line 1
    • Line 4: a string of characters representing the quality score of each base in line 2
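
Because every read takes four lines, a file's line count divided by four gives its number of reads. The sketch below shows this on two invented reads; the read names, sequences, and quality strings are placeholders, not real sequencing data.

```shell
workdir=$(mktemp -d)
cd "$workdir"
# Two invented reads, four lines each (not real sequencing data).
printf '%s\n' \
    '@read1 made-up description' \
    'ACGTACGT' \
    '+' \
    'IIIIIIII' \
    '@read2 made-up description' \
    'TTGGCCAA' \
    '+' \
    '########' > example.fastq

head -n 4 example.fastq              # the first complete record
lines=$(wc -l < example.fastq)       # total number of lines in the file
reads=$((lines / 4))                 # four lines per read
echo "$reads reads"
```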

QUALITY SCORE

 

18 of 39

Recap: Creating, Moving, Copying, and Removing

  • touch – create an empty file
  • mkdir – make a directory
  • cp – copy a file (cp -r to copy a directory)
  • mv – move a file or directory
    • Can also be used to rename a file or directory
  • rm – remove a file
    • rm -r – remove a directory and its contents (recursive)

  • ls -l – print the list of files and directories in long format, including permissions
  • chmod [+-][rwx] <file> – add (+) or remove (-) read (r), write (w), or execute (x) permission
    • Stands for ‘change mode’
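
One possible sequence that exercises each command above, run in a scratch directory so nothing important is touched. The file and folder names are invented.

```shell
workdir=$(mktemp -d)
cd "$workdir"

mkdir backup                         # make a directory
touch draft.txt                      # create an empty file
cp draft.txt backup/draft_copy.txt   # copy the file into backup
mv draft.txt final.txt               # rename draft.txt to final.txt
chmod -w backup/draft_copy.txt       # remove write permission from the copy
ls -l backup                         # long format shows the permissions
chmod +w backup/draft_copy.txt       # add write permission back
rm -r backup                         # remove the directory and its contents
```

Files removed with rm do not go to a trash folder, so check the path before pressing Enter.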

19 of 39

4. REDIRECTION

20 of 39

Recap: GREP, Redirection, and Pipe

  • grep <string> <file> – print the lines of the file that contain the string
    • -B <n> – also print n lines before each matching line
    • -A <n> – also print n lines after each matching line
  • > – redirect output that would normally be printed to stdout into a file, overwriting the file if it exists
  • wc -l – count the number of lines (wc stands for ‘word count’)
  • | – the pipe passes the output of the command before it as the input to the command after it
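
A sketch combining grep, redirection, and a pipe. The file reads.txt and its contents are invented stand-ins for sequence data.

```shell
workdir=$(mktemp -d)
cd "$workdir"
# Made-up input: a name line followed by a sequence line, three times.
printf '%s\n' 'read1' 'ACGT' 'read2' 'NNNN' 'read3' 'TTGA' > reads.txt

grep NNNN reads.txt                             # prints the matching line only
grep -B 1 -A 1 NNNN reads.txt > bad_reads.txt   # match plus one line before and after, saved to a file
wc -l bad_reads.txt                             # count the lines that were saved
match_count=$(grep NNNN reads.txt | wc -l)      # pipe the matches straight into wc -l
echo $match_count
```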

21 of 39

Recap: For Loops and Basename

  • $ – placed before a variable’s name to get the variable’s value
    • ${VAR_NAME} – avoids ambiguity when the variable name is followed immediately by other characters

for <variable> in <group to iterate over>
do
    <some command> $<variable>
done

  • basename <string> <suffix> – removes any leading directory path and the given suffix from the string
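
A concrete instance of the loop template, assuming two made-up files named sampleA.fastq and sampleB.fastq. It copies each file to a new name built with basename.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch sampleA.fastq sampleB.fastq

for filename in *.fastq
do
    name=$(basename "${filename}" .fastq)    # sampleA.fastq becomes sampleA
    # The braces matter here: $name_copy would look for a variable called name_copy.
    cp "${filename}" "${name}_copy.fastq"
    echo "${filename} copied to ${name}_copy.fastq"
done
```

The list matched by *.fastq is fixed before the loop starts, so the new copies are not looped over.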

24 of 39

5. WRITING SCRIPTS AND WORKING WITH DATA

25 of 39

Recap: Writing Scripts and Downloading Data

  • nano – a command-line text editor
    • vim is an alternative
  • bash <filename>.sh – run the shell script
  • chmod +x <filename>.sh – make the file executable
    • You can then run it with ./<filename>.sh
    • If the file is not in your working directory, give the path to the file instead
  • wget – ‘world wide web get’; downloads web pages or data from a web address
  • curl -O – ‘see URL’; downloads the data at a web address and saves it under its remote file name (without -O, curl prints the data to the screen)
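
A sketch of writing and running a script. The script name count_lines.sh is hypothetical, and a here-document stands in for typing the file in nano so the example can run unattended.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Write a two-line script (in class you would type this in nano instead).
cat > count_lines.sh << 'EOF'
#!/bin/bash
# Print the number of lines in the file given as the first argument.
wc -l < "$1"
EOF

printf 'a\nb\nc\n' > letters.txt           # made-up input file with 3 lines
result=$(bash count_lines.sh letters.txt)  # run the script with bash
echo $result
chmod +x count_lines.sh                    # make the script executable
./count_lines.sh letters.txt               # now it can be run directly
```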

27 of 39

6. PROJECT ORGANIZATION

28 of 39

Recap: Project Organization

  • Save a copy of your raw data
    • Remove write permission from the raw data
  • Organize the project into separate directories
    • Docs, data, code, and results directories, etc.
  • Keep a record of the commands you run
    • history | tail -n 7 > history.txt
    • Or better yet, run your commands from shell script files
  • Comment your scripts
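
One way to set up such a layout, as a sketch only. The project name my_project and the file raw_reads.fastq are invented, and the directory names are just one possible choice.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# One directory per kind of content.
mkdir -p my_project/docs my_project/data my_project/code my_project/results

touch my_project/data/raw_reads.fastq      # stand-in for a raw data file
chmod -w my_project/data/raw_reads.fastq   # protect the raw data from accidental edits
ls -l my_project/data                      # the w flags are gone from the permissions
ls my_project
```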

29 of 39

CONCLUSION: WHAT DID WE LEARN?

39 of 39

Contact

  • Ratchanon “RP” Pornmongkolsuk
  • 2nd Year UPGG
  • Alex Ochoa Lab
  • rp280@duke.edu