1 of 51

First lecture

2 of 51

What we'll cover

  • General course structure
  • What is programming?
  • Why use programming?
  • The Unix environment.

3 of 51

General course structure

  • First six weeks cover the fundamentals of programming
  • Second six weeks cover special topics
  • Feel free to bring in your programming challenges

4 of 51

Literature

  • "Practical Computing for Biologists" (Haddock and Dunn)
    • Great for beginners
    • Mostly geared towards text editing

  • O'Reilly Books
    • "Bioinformatics Programming Using Python"
      • In Python 3, but good methodology

  • http://greenteapress.com/thinkpython/thinkpython.pdf
    • Free!

5 of 51

Literature - The internet

  • Stack Overflow
    • The answer is there, but it might be snarky

  • Software Carpentry
    • Lots of great free lessons
    • They also host lectures - keep an eye out

6 of 51

What is a program?

How to get inside your computer

10 of 51

What is a program?

  • A series of commands for your computer
    • Computers are dumb
    • Computers read binary (0s and 1s)

  • So we write programs in a language that is more readable to humans
    • These are translated into binary by a compiler or interpreter that sits in between your script and the computer itself.

11 of 51

What is programming?

  • Writing one or more scripts and saving them in text files
    • The files must be accessible to the operating system

  • Creating software and scripts is the goal.
    • Your operating system itself is just a collection of programs that interoperate

  • Why programming?

12 of 51

Repeatability

  • A script can be a record of what happened
    • Especially important when things go wrong
    • Publishing scripts is cool - you want to be cool

  • Software builds on itself
    • Take advantage of this and be part of its evolution

  • E.g.:
    • Make a software pipeline to collect, catalogue, and align sequences in a repeatable and well-documented fashion. Now give it to somebody else so they can do it too.

13 of 51

Faster

  • The first and most central goal of computer science
    • People have been working on this for over 50 years. Take advantage of their work.

  • E.g.:
    • "I wrote down all this data, and now I need to divide every number by 4.28!"
    • "My NGS text file is too big to be opened by any text editor known to man!"

14 of 51

Automation

  • Do the same thing lots of times
    • Let's face it, some tasks are simply below you.
    • Nothing is below a computer, and it's way better at this than you are anyway.

  • E.g.:
    • "I collected two months of data on color, sex, body size, and gut content of five different species at 7 different field sites, but my advisor says only take sex and color from 2 species at 5 field sites. How do I put all this in one text file in under 2 seconds?"

15 of 51

Elements of Style

  • Which language to use?
  • Is your code readable by others?
  • Is your code readable by you?
  • How can you appropriately break up tasks?

20 of 51

Languages!

  • There are many, many computer programming languages.
  • Things to consider:
    • Speed versus readability
    • Documentation
  • What are people in your field using?
    • Stats - R
    • Dense computation - C & C++
    • Next-gen sequencing - Perl, Python & Unix
    • Unix is often used as "glue" in workflows

24 of 51

Why Python?

  • General concepts almost universal
  • Readable
  • Popular
  • Well-documented

28 of 51

Why Unix?

  • General concepts almost universal
  • Operating system written in C
  • Very fast
  • Used almost universally on servers, supercomputers and computing clusters
    • This is how most programmers manage and organize files

29 of 51

A taste of Unix

  • Commands are small programs
    • Type the name of a command and hit "enter"
    • Unix searches for the program's file and executes it.
  • Programs have preset options (arguments) which change their behavior
    • Find these in the manual pages
  • By default, they interact with files in the folder (directory) that you're in
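
A minimal sketch of how an option changes what a command does, using ls with and without -a. The scratch directory and the file names (notes.txt, .hidden_notes.txt) are made up for illustration, and touch (which creates empty files) is not covered on these slides.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Create one ordinary file and one hidden file (its name starts with a dot).
touch notes.txt .hidden_notes.txt

# Plain ls skips hidden files.
plain=$(ls)
echo "ls:"
echo "$plain"

# The -a option changes the behavior: hidden files are listed too.
with_hidden=$(ls -a)
echo "ls -a:"
echo "$with_hidden"
```

Both commands look at the directory you are in, because no other directory was named.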

30 of 51

A taste of Unix

  • Interact with Unix via a "shell"
    • The shell channels information between the user and the Unix programs through "standard streams"

  • Output printed to the screen is called standard output or "stdout"

  • Input to programs is standard input or "stdin"

  • Also, standard error or "stderr" - will be useful later
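
The three streams can be seen in a few lines. This is only a sketch: the file name no_such_file_here is deliberately made up so that ls complains, and the 2> redirect (which sends stderr somewhere else) is an extra that these slides do not introduce.

```shell
# stdout: echo writes its text to standard output, which normally goes to the screen.
echo "hello from stdout"

# stdin: wc -w counts the words it reads from standard input.
# Here the pipe feeds echo's stdout into wc's stdin.
word_count=$(echo "three short words" | wc -w)
echo "words read from stdin: $word_count"

# stderr: error messages travel on their own stream, separate from stdout.
# 2>/dev/null throws the error message away; "|| true" ignores the failure.
captured=$(ls no_such_file_here 2>/dev/null || true)
echo "stdout captured from the failed ls: '$captured'"
```

Because the error went to stderr rather than stdout, nothing ends up in the captured variable.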

33 of 51

File systems

  • Your computer contains a nested hierarchy of directories.
    • Keeping track of where you are in the file structure of your computer is an important component of programming.
    • The highest level is the root (denoted: /)

  • Program files are stored in several high-level directories that users don't usually go into
    • /usr/bin
    • /usr/lib
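
A quick way to look at the top of the hierarchy without changing anything. This sketch assumes a typical Linux or macOS layout in which /usr/bin and /usr/lib exist.

```shell
# Where am I right now?
here=$(pwd)
echo "current directory: $here"

# List what sits directly under the root directory (/).
ls /

# Show the two program directories themselves.
# -d lists a directory's own name rather than its contents.
ls -d /usr/bin /usr/lib
```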

35 of 51

A note on backups

  • Everyone should back up their computer regularly
  • We will discuss some commands today that can remove files
    • They can be strung together to remove your whole file system

38 of 51

File path

  • Every file has an address on your computer
    • This is the filepath
  • If you are going to do an operation on a file, you'll need its address
  • Bash keeps a list of directories where it automatically looks for program files
    • This is useful for calling programs
    • You can check which directories these are by typing "echo $PATH"
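
A small sketch of inspecting the search path. The tr command (used here to swap colons for line breaks) and command -v (which reports the file the shell would run) are additions that these slides do not cover.

```shell
# Show the directories the shell searches, in order, separated by colons.
echo "$PATH"

# Print one directory per line to make the list easier to read.
echo "$PATH" | tr ':' '\n'

# Ask the shell which file it would run for the command "ls".
ls_location=$(command -v ls)
echo "ls is found at: $ls_location"
```

The location printed on the last line should sit inside one of the directories listed above it.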

39 of 51

A few important paths

.             Here
..            One level up
~ or $HOME    Home
/             Root

.. and .       ----->  "relative paths"
~ or /usr/bin  ----->  "absolute paths"
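
To see the difference between the two kinds of path, here is a short sketch in a scratch directory. The directory names project and data are hypothetical.

```shell
# Build a small scratch hierarchy: project/data.
workdir=$(mktemp -d)
mkdir "$workdir/project"
mkdir "$workdir/project/data"

# Absolute path: starts at the root (/), so it works from anywhere.
cd "$workdir/project/data"
pwd

# Relative path: .. means "one level up" from wherever you are now.
cd ..
after_up=$(pwd)
echo "after cd ..: $after_up"

# . means "here", so this leaves you exactly where you already are.
cd .
still_here=$(pwd)
echo "after cd . : $still_here"
```

The same "cd .." typed in a different directory would land somewhere else, which is what makes it relative.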

40 of 51

Commands for Getting Around

1.) Common commands

2.) Working on files

3.) Stringing them together

41 of 51

nano

  • nano is a simple text editor found on most Unix systems
  • Type 'nano' to access it
  • This will open a text editor within your terminal
  • Saving, exiting and other file functions are controlled with ctrl + letter keys
  • If you create a document and write to it, saving it will add the document to the current directory

42 of 51

Commands for Getting Around

cd      Change directory
mkdir   Make directory
ls      List
rm      Remove
pwd     Print working directory
man     Manual

43 of 51

Commands for Getting Around

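cd
    cd : takes you home
    cd .. : takes you up one level (to the containing directory)

mkdir
    mkdir dirname : makes a new directory called dirname

ls
    ls -a : shows hidden files
    ls -l : shows files along with sizes and timestamps

rm
    rm filename : removes a file
    rm -r dirname : removes a directory and everything inside it, recursively
    rmdir dirname : removes a directory, but only if it is empty

**CAUTION**

With power comes danger! Removed files do not go to a trash folder.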
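
The commands above can be tried safely in a scratch directory. This is a sketch, not a recipe: the names sequences, a.fas, b.fas and empty_dir are made up, and touch (which creates empty files) is not one of the commands on these slides.

```shell
# Start in a throwaway directory so nothing real is at risk.
workdir=$(mktemp -d)
cd "$workdir"

# mkdir makes a new directory; cd moves into it; pwd confirms where you are.
mkdir sequences
cd sequences
inside=$(pwd)
echo "now in: $inside"

# Create two empty files so there is something to list.
touch a.fas b.fas
ls -l

# rm removes a single file. It is gone for good.
rm a.fas

# Go back up one level, then remove the directory and the file left inside it.
cd ..
rm -r sequences

# rmdir only works on a directory that is already empty.
mkdir empty_dir
rmdir empty_dir

# Nothing is left behind.
remaining=$(ls)
echo "left in scratch directory: '$remaining'"
```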

44 of 51

Getting Comfortable

tab        Auto complete
*          Wildcard
Up arrow   Last command
Ctrl + C   Escape process
Ctrl + L   Clear screen

45 of 51

Getting Comfortable

tab

Enter enough unique characters and press tab. This will complete the filepath or command.

*

Matches any string of characters in a filename. For example, *.txt matches every filename that ends in .txt.
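
A short sketch of the wildcard at work. The file names (tree1.nex, tree2.nex, notes.txt) are invented for the example, and touch is used only to create empty files.

```shell
# Set up three empty files in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"
touch tree1.nex tree2.nex notes.txt

# *.nex matches every name that ends in .nex
nex_files=$(ls *.nex)
echo "$nex_files"

# tree* matches every name that starts with "tree"
ls tree*

# The shell expands the pattern before the command runs, so this
# removes both .nex files at once. Check with ls first!
rm *.nex
left_over=$(ls)
echo "left over: $left_over"
```

Running ls with a pattern before running rm with the same pattern is a cheap way to see exactly what will be deleted.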

46 of 51

File operations

grep        Print lines that contain a matching plain text string
cat         Concatenate, stream to "standard out"
head/tail   Print the first or last lines in a file
|           Send the output of one command or program to another as input
wc          Word count
cp and mv   Copy and move

47 of 51

File operations

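grep
    grep word filename : prints every line of the file that contains word

cat
    cat file1 : prints the contents of file1

head/tail
    head -n1 file1 : prints the first line of file1
    tail -n4 file1 : prints the last four lines of file1

|
    ls -l | wc -l : counts the lines that ls -l prints

wc
    wc -l filename : counts the lines in the file
    wc -w filename : counts the words in the file
    wc filename : prints the line, word and byte counts together

cp and mv
    cp file folder : makes a copy of the file in the folder
    mv file folder : moves the file into the folder, leaving no copy behind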
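
Here is a sketch that runs each of these on a tiny file. The file animals.txt and its contents are made up, and printf with > is used only to create the file (redirection is explained on a later slide).

```shell
# Make a small five-line file in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"
printf 'frog green\nfrog brown\ntoad brown\nnewt red\nnewt black\n' > animals.txt

cat animals.txt          # print the whole file
head -n 1 animals.txt    # first line only
tail -n 2 animals.txt    # last two lines
grep frog animals.txt    # every line containing "frog"

# Pipe: grep's output becomes wc's input, so this counts the matching lines.
brown_count=$(grep brown animals.txt | wc -l)
echo "lines containing brown: $brown_count"

# cp leaves the original in place; mv does not.
mkdir copies moved
cp animals.txt copies
mv animals.txt moved
ls
```

After the last two commands the file exists in copies and in moved, but no longer in the directory where it started.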

48 of 51

File operations

Looking at the manual for every command we show you is worth your while. Typing 'man' followed by the command name (for example, 'man ls') will show its manual page.

Or just Google it!

49 of 51

Redirection

  • > versus >>
    • > overwrites the file on the right side with the output of whatever is on the left side
    • >> appends the output of whatever is on the left side to the file on the right side
  • Between the pipe and the redirect, you can write a one-line custom program for text editing
    • "Get all sequence names from a sequence file"
    • grep ">" file1.fas | cut -d ">" -f 2 >> seqs.txt
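
The difference between the two redirects, and the sequence-name one-liner, can be tried on made-up data. In this sketch log.txt and the two-sequence file1.fas are invented for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# > creates the file, or overwrites it if it already exists.
echo "first"  > log.txt
echo "second" > log.txt     # log.txt now holds only "second"

# >> adds to the end of the file instead.
echo "third" >> log.txt     # log.txt now holds "second" then "third"
cat log.txt

# A tiny FASTA file: header lines start with ">".
printf '>seq1\nACGT\n>seq2\nGGCC\n' > file1.fas

# Keep only the header lines, then cut away the leading ">".
grep ">" file1.fas | cut -d ">" -f 2 >> seqs.txt
cat seqs.txt
```

Note the quotes around ">" in the grep and cut commands: without them the shell would treat the symbol as a redirect.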

50 of 51

Tasks

  • Create a file and a directory. Put some words in the file. Copy the file into the directory. Now, go into the directory and delete the copy. Change back into the original directory and move the file into the directory. How is this different from copying?
  • Create a second file and move it into your directory. Count how many files are in the directory using a simple script.
  • Copy the first line of each of your files to a new file

51 of 51

Bonus task

  • Copy all the tree files to home
  • Remove all the tree files in home
  • Concatenate all the tree files in a file called trees.txt in home
  • How many trees are in this file?
  • The second tree is unrooted and has node labels. Make a new file called trees2.txt with just the second tree from each of the tree files