
Basic R

Huyha

Email: hagiahuy311@gmail.com

2025


Bioinformatic analysis for cancer genomics


Contents

  1. Overview and how to install R
  2. Basic R for bioinformatics
  3. Operators
  4. Data types and Data Structures
  5. Functions

  1. Overview and how to install R

What is R?

● A Programming/Statistical Language

● A powerful language for statistical computing and data analysis.

● Widely used in bioinformatics and life sciences

Why do we use R?

1. Comprehensive Statistical Tools

2. Rich Ecosystem of Bioinformatics Packages

3. Data Visualization

4. Handling High-Dimensional Data

5. Open Source and Active Community

6. Reproducibility

7. Free


https://roelverbelen.netlify.app/resources/r/packages/

Adapted from Mr. Duy's slides

Install R

https://cran.rstudio.com/

R and RStudio

https://posit.co/download/rstudio-desktop/

Other platforms for R

Visual Studio Code

https://code.visualstudio.com/download

Google Colab

https://colab.research.google.com

2. Basic R

Working directory

# First, have a look at the current working directory
getwd()

# Change to your desired directory
setwd("path/to/your/folder")  # placeholder: replace with a real path

# List the files in the directory
dir()

https://www.r-bloggers.com/2020/01/rstudio-projects-and-working-directories-a-beginners-guide/

Install and load packages

# Get the list of installed packages
installed.packages()

# Install a package (ggplot2 is used here as an example)
install.packages("ggplot2")

# Load a package
library(ggplot2)

# List everything currently attached in the R session, including loaded packages
search()

# Check where installed packages are located
.libPaths()

# Update installed packages
update.packages()

https://www.javatpoint.com/r-packages

Other ways to install packages

Bioconductor

GitHub
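A minimal sketch of how these two sources are commonly used. The package names ("limma" and "tidyverse/dplyr") and the helper name are illustrative choices, not taken from the slides. The calls are wrapped in a function so that nothing is downloaded until you decide to run it.

```r
# Hypothetical helper: defining it downloads nothing; calling it needs internet access.
install_from_other_sources <- function() {
  # Bioconductor packages are installed through the BiocManager package (from CRAN)
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install("limma")  # example Bioconductor package

  # GitHub packages can be installed with the remotes package (from CRAN)
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
  }
  remotes::install_github("tidyverse/dplyr")  # format: "owner/repository"
}

# Run this yourself when you are ready:
# install_from_other_sources()
```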

Search for and install these packages:

● tidyverse

● readr

● ggplot2

Help and manual

# Access the help file
?mean

# If unsure of the precise name,
# search the documentation across all installed packages
??mean

Package tutorial

Google

AI

Package documentation

Workflow in R tutorial

Loading and Saving CSV Files in R

# Standard use, small files (base R)

# Load a CSV file
data <- read.csv("gene_expression.csv", header = TRUE)
head(data)  # View the first few rows

# Save a CSV file
write.csv(data, "output.csv", row.names = FALSE)

header = TRUE: Treats the first row as column names.

sep = ",": (Default) Assumes comma-separated values.

# Using read.table() (more control)

# Load a CSV file, stating the delimiter explicitly
data <- read.table("gene_expression.csv", sep = ",", header = TRUE)

# Save a CSV file with write.table()
write.table(data, "output.csv", sep = ",", row.names = FALSE, quote = FALSE)

header = TRUE: Treats the first row as column names (read.table() defaults to header = FALSE).

sep = ",": Sets the delimiter to a comma. Unlike read.csv(), read.table() defaults to sep = "" (any whitespace).


#Standard use, small files

read.csv() / write.csv()

#Custom delimiters (e.g., tab-separated)

read.table() / write.table()

#Tidyverse compatibility, easy use

read_csv() / write_csv()
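The base R pairs above can be tried without any real data file. This is a small sketch that writes a made-up table to temporary files and reads it back; the expression values and the use of tempfile() are assumptions made for the example, not part of the slides.

```r
# Small made-up expression table (values are invented for illustration)
expr <- data.frame(gene    = c("TP53", "BRCA1", "EGFR"),
                   sample1 = c(5.2, 3.1, 8.7),
                   sample2 = c(4.8, 2.9, 9.1))

# Comma-separated: write.csv() / read.csv()
csv_file <- tempfile(fileext = ".csv")
write.csv(expr, csv_file, row.names = FALSE)
from_csv <- read.csv(csv_file, header = TRUE)

# Tab-separated: write.table() / read.table() with sep = "\t"
tsv_file <- tempfile(fileext = ".tsv")
write.table(expr, tsv_file, sep = "\t", row.names = FALSE, quote = FALSE)
from_tsv <- read.table(tsv_file, sep = "\t", header = TRUE)

print(from_csv)
print(all.equal(from_csv, from_tsv))  # compares the two tables read back
```

Only the sep argument changes between the two formats; header = TRUE has to be given to read.table() because it is not its default.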


Save and quit

An R workspace image contains all the information held in the R session at the time of exit and is saved as a .RData file.

# Save the current workspace
save.image(file = "mysession.RData")

# Exit R
q()

# Load the workspace again in a new session
load("mysession.RData")

Base R overview

https://www.geeksforgeeks.org/r-tutorial/

3. Operators

https://www.tutorialkart.com/r-tutorial/r-operators/#gsc.tab=0

Arithmetic Operators

# R arithmetic operators: example for single numbers
a <- 7.5
b <- 2
print(a + b)    #1 Addition
print(a - b)    #2 Subtraction
print(a * b)    #3 Multiplication
print(a / b)    #4 Division
print(a %% b)   #5 Remainder (modulo)
print(a %/% b)  #6 Integer quotient
print(a ^ b)    #7 Power

$ Rscript r_op_arithmetic.R
[1] 9.5
[1] 5.5
[1] 15
[1] 3.75
[1] 1.5
[1] 3
[1] 56.25

# R arithmetic operators: example for vectors (applied element by element)
a <- c(8, 9, 6)
b <- c(2, 4, 5)
print(a + b)    #1 Addition
print(a - b)    #2 Subtraction
print(a * b)    #3 Multiplication
print(a / b)    #4 Division
print(a %% b)   #5 Remainder (modulo)
print(a %/% b)  #6 Integer quotient
print(a ^ b)    #7 Power

$ Rscript r_op_arithmetic.R
[1] 10 13 11
[1] 6 5 1
[1] 16 36 30
[1] 4.00 2.25 1.20
[1] 0 1 1
[1] 4 2 1
[1]   64 6561 7776

Arithmetic Operators

Classwork


Relational Operators

# R relational operators: example for numbers
a <- 7.5
b <- 2
print(a > b)   #1 greater than
print(a < b)   #2 less than
print(a == b)  #3 equal to
print(a <= b)  #4 less than or equal to
print(a >= b)  #5 greater than or equal to
print(a != b)  #6 not equal to

$ Rscript r_op_relational.R
[1] TRUE
[1] FALSE
[1] FALSE
[1] FALSE
[1] TRUE
[1] TRUE


Logical Operators

# R logical operators: example for basic logical elements
a <- 0  # zero is treated as FALSE
b <- 2  # any non-zero number is treated as TRUE
print(a & b)   #1 logical AND, element-wise
print(a | b)   #2 logical OR, element-wise
print(!a)      #3 logical NOT, element-wise
print(a && b)  #4 logical AND for single values (short-circuits)
print(a || b)  #5 logical OR for single values (short-circuits)

$ Rscript r_op_logical.R
[1] FALSE
[1] TRUE
[1] TRUE
[1] FALSE
[1] TRUE
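With single numbers the two families of operators give the same answers, so the difference only shows up with vectors. The sketch below uses made-up vectors x and y (not from the slides) to show element-wise & and |, the single-value && inside if(), and all()/any() for reducing a whole vector to one value.

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

# & and | compare the two vectors element by element
print(x & y)
print(x | y)

# && and || are meant for single values, for example inside if()
n <- 5
if (is.numeric(n) && n > 3) {
  print("n is a number greater than 3")
}

# To reduce a whole logical vector to one value, use all() or any()
print(all(x))  # TRUE only if every element is TRUE
print(any(x))  # TRUE if at least one element is TRUE
```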

Assignment Operators

An R variable can be assigned a value using any of the following three operators:

  1. Equal operator =
  2. Leftward operator <-
  3. Rightward operator ->

# Assign a variable
x = 'hello'
print(x)
[1] "hello"

x <- 'learn r'
print(x)
[1] "learn r"

'r programming language' -> x; print(x)
[1] "r programming language"

Miscellaneous Operators

Operator   Description                                                              Usage
:          Creates a series of numbers from the left operand to the right operand   a:b
%in%       Tests whether an element (a) belongs to a vector (b)                     a %in% b
%*%        Matrix multiplication (here, a matrix times its transpose)               A %*% t(A)

a = 23:31
print(a)
[1] 23 24 25 26 27 28 29 30 31

a = c(25, 27, 76)
b = 27
print(b %in% a)
[1] TRUE

mat = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
print(mat)
print(t(mat))
pro = mat %*% t(mat)
print(pro)

Output:
     [,1] [,2] [,3]    # original matrix of order 2x3
[1,]    1    3    5
[2,]    2    4    6
     [,1] [,2]         # transposed matrix of order 3x2
[1,]    1    2
[2,]    3    4
[3,]    5    6
     [,1] [,2]         # product matrix of order 2x2
[1,]   35   44
[2,]   44   56

Special Values (Inf, NaN, NA, NULL)
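The original figure for this slide is not available in the text, so here is a short sketch of how the four special values typically arise in base R. The vector v is made up for the example.

```r
print(1 / 0)    # Inf: positive infinity, e.g. division of a positive number by zero
print(-1 / 0)   # -Inf
print(0 / 0)    # NaN: "not a number", an undefined numeric result

v <- c(4, NA, 10)             # NA: a missing value inside a vector
print(is.na(v))               # TRUE where the value is missing
print(mean(v))                # NA, because one value is missing
print(mean(v, na.rm = TRUE))  # missing value ignored, mean of 4 and 10

print(length(NULL))   # NULL is an empty object with length 0
print(is.null(NULL))
```

NA marks a missing value that still occupies a position in the vector, whereas NULL means there is no object at all.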

Classwork: Reproduce all of the operator examples above.

4. Data Types and Data Structures

Data types

> x <- TRUE
> print(class(x))
[1] "logical"

> x <- 67.54
> print(class(x))
[1] "numeric"

> x <- 63L
> print(class(x))
[1] "integer"

> x <- 6 + 4i
> print(class(x))
[1] "complex"

> x <- "hello"
> print(class(x))
[1] "character"

> x <- charToRaw("hello")
> print(class(x))
[1] "raw"

Data structures

Vectors and indexing

# We can use the c() function to combine values into a vector.
# By default the type will be double.
X <- c(61, 4, 21, 67, 89, 2)
X
[1] 61  4 21 67 89  2

# seq() creates a sequence of evenly spaced values.
# length.out defines the length of the vector.
Y <- seq(1, 10, length.out = 5)
Y
[1]  1.00  3.25  5.50  7.75 10.00

# Use ':' to create a vector of consecutive values.
Z <- 2:7
Z
[1] 2 3 4 5 6 7

# Index elements with square brackets (R indices start at 1)
vector <- seq(10, 100, by = 10)
vector[1]
[1] 10
vector[c(1, 3)]
[1] 10 30
vector[7:10]
[1]  70  80  90 100

Operators on vectors

# Numeric vector
numbers <- c(1, 2, 3, 4, 5)

# Character vector
names <- c("Alice", "Bob", "Charlie")

# Addition
result <- numbers + 2
print(result)
[1] 3 4 5 6 7

# Multiplication
result <- numbers * 2
print(result)
[1]  2  4  6  8 10

# Adding two vectors
vector1 <- c(1, 2, 3)
vector2 <- c(4, 5, 6)
result <- vector1 + vector2
print(result)
[1] 5 7 9

Vector

Accessing elements:

numbers <- c(1, 2, 3, 4, 5, 6, 7, 8)

# Access the second element
second_element <- numbers[2]
print(second_element)
[1] 2

# Replace the third element
numbers[3] <- -5
numbers
[1]  1  2 -5  4  5  6  7  8

# Remove the third element
numbers <- numbers[-3]
numbers
[1] 1 2 4 5 6 7 8

Logical subset vector:

# Get elements greater than 3 (logical subset)
greater_than_three <- numbers[numbers > 3]
print(greater_than_three)
[1] 4 5 6 7 8

Subset vector:

# Get a subset of the first three elements
subset_vector <- numbers[1:3]
print(subset_vector)
[1] 1 2 4

Vector naming

numbers <- c(1, 2, 3, 4, 5)

# Name the elements of the vector
names(numbers) <- c("First", "Second", "Third", "Fourth", "Fifth")
print(numbers)
 First Second  Third Fourth  Fifth
     1      2      3      4      5

# Get the type of the vector
vector_type <- typeof(numbers)
print(vector_type)
[1] "double"

# Combine vectors
vector1 <- c(1, 2, 3)
vector2 <- c(4, 5, 6)
combined_vector <- c(vector1, vector2)
print(combined_vector)
[1] 1 2 3 4 5 6

Factors

# Creating a factor from a character vector
colors <- c("red", "green", "blue", "red", "green")
color_factor <- factor(colors)
print(color_factor)
[1] red   green blue  red   green
Levels: blue green red

# Specifying the order of levels
ordered_factor <- factor(colors, levels = c("red", "green", "blue"))
print(ordered_factor)
[1] red   green blue  red   green
Levels: red green blue
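A short sketch of what a factor stores, reusing the same colors vector: the level labels, one integer code per element, and counts per level via table(). The calls are standard base R; the example itself is an addition to the slides.

```r
colors <- c("red", "green", "blue", "red", "green")
color_factor <- factor(colors, levels = c("red", "green", "blue"))

print(levels(color_factor))      # the categories, in the order we set
print(as.integer(color_factor))  # the integer code stored for each element
print(table(color_factor))       # how many times each level occurs
```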

The Recycling Rule

→ How R handles operations between vectors of unequal lengths.

→ R will "recycle" the shorter vector by repeating its elements until it matches the length of the longer vector.

# The shorter vector is recycled to match the length of the longer vector
short_vector <- c(1, 2)
long_vector <- c(10, 20, 30, 40)
result <- long_vector + short_vector
print(result)
[1] 11 22 31 42

https://www.gastonsanchez.com/R-coding-basics/vectors4.html#recycling
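One detail worth trying yourself: when the longer length is not a multiple of the shorter one, R still recycles but issues a warning. The vectors below are made up for this sketch, and suppressWarnings() is used only so the script runs quietly.

```r
# Lengths 4 and 2: 2 divides 4, so recycling is silent
even_case <- c(10, 20, 30, 40) * c(1, 2)
print(even_case)

# Lengths 3 and 2: the shorter vector is reused as 10, 20, 10.
# Without suppressWarnings(), R warns that the longer object length
# is not a multiple of the shorter object length.
uneven_case <- suppressWarnings(c(1, 2, 3) + c(10, 20))
print(uneven_case)
```

Such a warning usually means the two vectors were not meant to be combined, so it is worth checking their lengths with length() before ignoring it.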

Vector functions

# Sequences with seq()
> seq(from = 3, to = 27, by = 3)
[1]  3  6  9 12 15 18 21 24 27

# Repetition with rep()
> rep(x = 1, times = 4)
[1] 1 1 1 1
> rep(x = c(3, 62, 8.3), times = 3)
[1]  3.0 62.0  8.3  3.0 62.0  8.3  3.0 62.0  8.3

# Sorting with sort()
> sort(x = c(2.5, -1, -10, 3.44), decreasing = FALSE)
[1] -10.00  -1.00   2.50   3.44
> sort(x = c(2.5, -1, -10, 3.44), decreasing = TRUE)
[1]   3.44   2.50  -1.00 -10.00

# Finding a vector's length with length()
> length(x = c(3, 2, 8, 1))
[1] 4

Data frame

Definition:

A data frame is a table, a 2-dimensional structure in R in which each column holds a single type of data, while different columns can hold different types (numeric, character, factor, etc.).

Structure:

Similar to a spreadsheet or SQL table, with rows representing observations and columns representing variables.

Creating a data frame

# Create a data frame with three columns
df <- data.frame(ID = 1:4,
                 Name = c("Alice", "Bob", "Charlie", "Diana"),
                 Score = c(85, 92, 88, 76))
print(df)
  ID    Name Score
1  1   Alice    85
2  2     Bob    92
3  3 Charlie    88
4  4   Diana    76

Accessing data in a data frame

Using $ to access columns:

# Access the 'Name' column
student_names <- df$Name
print(student_names)
[1] "Alice"   "Bob"     "Charlie" "Diana"

Using indexing:

# Access the element in the 2nd row, 3rd column
element <- df[2, 3]
print(element)
[1] 92

Adding a new column:

# Add a new column 'Passed'
df$Passed <- df$Score > 80
print(df)
  ID    Name Score Passed
1  1   Alice    85   TRUE
2  2     Bob    92   TRUE
3  3 Charlie    88   TRUE
4  4   Diana    76  FALSE

Subsetting data frames:

# Subset a data frame with a condition
high_scores <- df[df$Score > 80, ]
print(high_scores)
  ID    Name Score Passed
1  1   Alice    85   TRUE
2  2     Bob    92   TRUE
3  3 Charlie    88   TRUE

Row binding:

# Combine data frames by adding rows.
# df already has a 'Passed' column, so the new row must supply one too.
df_new <- data.frame(ID = 5,
                     Name = "Eve", Score = 90, Passed = TRUE)
combined_df <- rbind(df, df_new)
print(combined_df)
  ID    Name Score Passed
1  1   Alice    85   TRUE
2  2     Bob    92   TRUE
3  3 Charlie    88   TRUE
4  4   Diana    76  FALSE
5  5     Eve    90   TRUE

Column binding:

# Combine data frames by adding columns
extra_info <- data.frame(Age = c(23, 25, 22, 21, 24))
full_df <- cbind(combined_df, extra_info)
print(full_df)
  ID    Name Score Passed Age
1  1   Alice    85   TRUE  23
2  2     Bob    92   TRUE  25
3  3 Charlie    88   TRUE  22
4  4   Diana    76  FALSE  21
5  5     Eve    90   TRUE  24

Viewing and inspecting data frames

# View the data in a spreadsheet-style viewer
View(df)

# Explore the structure of the data
str(df)
'data.frame':	4 obs. of  4 variables:
 $ ID    : int  1 2 3 4
 $ Name  : chr  "Alice" "Bob" "Charlie" "Diana"
 $ Score : num  85 92 88 76
 $ Passed: logi  TRUE TRUE TRUE FALSE

Summary statistics

# Get a summary of each column
summary(df)
       ID           Name               Score         Passed
 Min.   :1.00   Length:4           Min.   :76.00   Mode :logical
 1st Qu.:1.75   Class :character   1st Qu.:82.75   FALSE:1
 Median :2.50   Mode  :character   Median :86.50   TRUE :3
 Mean   :2.50                      Mean   :85.25
 3rd Qu.:3.25                      3rd Qu.:89.00
 Max.   :4.00                      Max.   :92.00

Subset rows based on conditions:

# Get rows where Score is greater than 80
high_scores <- df[df$Score > 80, ]
print(high_scores)
  ID    Name Score Passed
1  1   Alice    85   TRUE
2  2     Bob    92   TRUE
3  3 Charlie    88   TRUE

Select specific columns:

# Select only the 'Name' and 'Score' columns
name_score <- df[, c("Name", "Score")]
print(name_score)
     Name Score
1   Alice    85
2     Bob    92
3 Charlie    88
4   Diana    76

Adding and modifying columns

Add a new column:

# Add a column indicating if the score is above average
df$Above_Average <- df$Score > mean(df$Score)
print(df)
  ID    Name Score Passed Above_Average
1  1   Alice    85   TRUE         FALSE
2  2     Bob    92   TRUE          TRUE
3  3 Charlie    88   TRUE          TRUE
4  4   Diana    76  FALSE         FALSE

Modify an existing column:

# Adjust the score by adding 5 points for each student
df$Score <- df$Score + 5
print(df)
  ID    Name Score Passed Above_Average
1  1   Alice    90   TRUE         FALSE
2  2     Bob    97   TRUE          TRUE
3  3 Charlie    93   TRUE          TRUE
4  4   Diana    81  FALSE         FALSE

Sorting a data frame

Sort by a single column:

# Sort the data frame by 'Score' in descending order
df_sorted <- df[order(-df$Score), ]
print(df_sorted)
  ID    Name Score Passed Above_Average
2  2     Bob    97   TRUE          TRUE
3  3 Charlie    93   TRUE          TRUE
1  1   Alice    90   TRUE         FALSE
4  4   Diana    81  FALSE         FALSE

Sort by multiple columns:

# Sort by 'Passed' (descending) and then by 'Score' (ascending)
df_sorted_multi <- df[order(-df$Passed, df$Score), ]
print(df_sorted_multi)
  ID    Name Score Passed Above_Average
1  1   Alice    90   TRUE         FALSE
3  3 Charlie    93   TRUE          TRUE
2  2     Bob    97   TRUE          TRUE
4  4   Diana    81  FALSE         FALSE

Row binding:

# Bind new data frame rows to an existing one
new_students <- data.frame(ID = 5,
                           Name = "Eve",
                           Score = 89,
                           Passed = TRUE,
                           Above_Average = FALSE)
df_combined <- rbind(df, new_students)
print(df_combined)
  ID    Name Score Passed Above_Average
1  1   Alice    90   TRUE         FALSE
2  2     Bob    97   TRUE          TRUE
3  3 Charlie    93   TRUE          TRUE
4  4   Diana    81  FALSE         FALSE
5  5     Eve    89   TRUE         FALSE

Column binding:

# Add a new column for student age
ages <- data.frame(Age = c(23, 25, 22, 21, 24))
df_with_age <- cbind(df_combined, ages)
print(df_with_age)
  ID    Name Score Passed Above_Average Age
1  1   Alice    90   TRUE         FALSE  23
2  2     Bob    97   TRUE          TRUE  25
3  3 Charlie    93   TRUE          TRUE  22
4  4   Diana    81  FALSE         FALSE  21
5  5     Eve    89   TRUE         FALSE  24

Remove a column:

# Remove the 'Passed' column
df_no_passed <- df[, !(names(df) %in% "Passed")]
print(df_no_passed)
  ID    Name Score Above_Average
1  1   Alice    90         FALSE
2  2     Bob    97          TRUE
3  3 Charlie    93          TRUE
4  4   Diana    81         FALSE

Rename a column:

# Rename 'Score' to 'Final_Score'
names(df)[names(df) == "Score"] <- "Final_Score"
print(df)
  ID    Name Final_Score Passed Above_Average
1  1   Alice          90   TRUE         FALSE
2  2     Bob          97   TRUE          TRUE
3  3 Charlie          93   TRUE          TRUE
4  4   Diana          81  FALSE         FALSE


Merging data frames

# Merge two data frames by the 'ID' column
df_info <- data.frame(ID = 1:4, Gender = c("F", "M", "M", "F"))
df_merged <- merge(df, df_info, by = "ID")
print(df_merged)
  ID    Name Score Passed Above_Average Gender
1  1   Alice    90   TRUE         FALSE      F
2  2     Bob    97   TRUE          TRUE      M
3  3 Charlie    93   TRUE          TRUE      M
4  4   Diana    81  FALSE         FALSE      F
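In the example above every ID appears in both tables. The sketch below uses two small made-up tables whose IDs only partly overlap, to show how merge() behaves by default and with all.x = TRUE. The table contents are assumptions made for this illustration.

```r
students <- data.frame(ID = 1:4,
                       Name = c("Alice", "Bob", "Charlie", "Diana"))
gender <- data.frame(ID = c(1, 2, 4, 5),
                     Gender = c("F", "M", "F", "F"))

# Default: keeps only the IDs found in both tables (inner join)
inner <- merge(students, gender, by = "ID")
print(inner)

# all.x = TRUE: keeps every row of the first table (left join);
# students without a match get NA in the Gender column
left <- merge(students, gender, by = "ID", all.x = TRUE)
print(left)
```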

Other key functions for data frames

  • nrow(df): Number of rows.
  • ncol(df): Number of columns.
  • dim(df): Dimensions (rows, columns).
  • names(df): Column names.

Matrix and array

Matrix

Definition: A matrix is a two-dimensional (2D) data structure in R where all elements are of the same data type (numeric, character, or logical).

Structure: Consists of rows and columns.

Array

Definition: An array is a multi-dimensional data structure in R that can have more than two dimensions. All elements must be of the same type.

Structure: Arrays can be thought of as matrices extended to more dimensions.

# Matrix: create a 3x3 numeric matrix
mat <- matrix(1:9, nrow = 3, ncol = 3)
print(mat)

# Array: create a 3x3x2 array
arr <- array(1:18, dim = c(3, 3, 2))
print(arr)
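A brief sketch of how elements of the matrix and array above can be inspected and indexed. It relies on the fact that R fills both structures column by column; the particular indices chosen are just examples.

```r
mat <- matrix(1:9, nrow = 3, ncol = 3)  # filled column by column
print(dim(mat))   # number of rows, number of columns
print(mat[2, 3])  # row 2, column 3
print(mat[, 1])   # the whole first column

arr <- array(1:18, dim = c(3, 3, 2))
print(dim(arr))      # rows, columns, layers
print(arr[2, 3, 2])  # row 2, column 3 of the second 3x3 layer
```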

5. Functions

R functions

https://www.statmethods.net/management/functions.html

https://iqss.github.io/dss-workshops/R/Rintro/base-r-cheat-sheet.pdf

  • Useful built-in functions
  • Creating an R function
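As a minimal sketch of creating your own function: the name counts_per_million, its arguments, and the input numbers below are hypothetical, chosen only to show the function(arguments) { body } pattern with a default argument value.

```r
# Hypothetical example: scale raw counts so that they sum to one million
counts_per_million <- function(counts, scale = 1e6) {
  # The value of the last expression is what the function returns
  counts / sum(counts) * scale
}

cpm <- counts_per_million(c(10, 30, 60))
print(cpm)

# The default for 'scale' can be overridden when calling the function
percent <- counts_per_million(c(10, 30, 60), scale = 100)
print(percent)
```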

Useful built-in functions

Data Manipulation

● subset(): Extract subsets of data.

● merge(): Combine data frames by common columns or row names.

● apply(): Apply a function over the margins of an array or matrix.

● tapply(): Apply a function over subsets of a vector.

● reshape(): Reshape data between wide and long formats.

● cut(): Divide continuous variables into intervals.

● aggregate(): Compute summary statistics over subsets of data.

Statistical Analysis

● summary(): Provide a summary of an object.

● cor(): Calculate correlation between variables.

● lm(): Fit linear models.

● table(): Create a contingency table of counts.
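A few of the functions listed above, tried on a small made-up table. The data frame scores and the matrix m are assumptions made for this sketch; all calls are base R.

```r
scores <- data.frame(group = c("A", "A", "B", "B"),
                     score = c(85, 92, 88, 76))

# subset(): rows that satisfy a condition
print(subset(scores, score > 80))

# tapply(): apply a function to 'score' within each group
group_means <- tapply(scores$score, scores$group, mean)
print(group_means)

# aggregate(): the same summary, returned as a data frame
print(aggregate(score ~ group, data = scores, FUN = mean))

# apply(): margin 1 = over rows, margin 2 = over columns of a matrix
m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)
col_sums <- apply(m, 2, sum)
print(row_sums)
print(col_sums)

# cut() turns numbers into intervals; table() counts each interval
score_bins <- table(cut(scores$score, breaks = c(70, 80, 90, 100)))
print(score_bins)
```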


Data Cleaning

● na.omit(): Remove missing values from an object.

● is.na(): Identify missing values.

● duplicated(): Identify duplicate elements.

Data Visualization

● plot(): Generic X-Y plotting.

● hist(): Create a histogram.

● boxplot(): Create a boxplot.

● pairs(): Create a matrix of scatterplots.
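The data cleaning functions can be tried on a short vector. The values in x are made up for this sketch; the plotting functions are left out because they open a graphics device rather than print a result.

```r
x <- c(3, NA, 7, 3, NA, 10)

print(is.na(x))       # TRUE where a value is missing
print(sum(is.na(x)))  # number of missing values

clean <- as.vector(na.omit(x))  # drop the missing values
print(clean)

print(duplicated(clean))  # TRUE for a value that already appeared earlier
print(unique(clean))      # the same values with duplicates removed
```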


Utility Functions

● str(): Display the structure of an R object.

● paste(): Concatenate strings.

● seq(): Generate a sequence of numbers.

● rep(): Repeat elements of a vector.
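A quick sketch of the utility functions in use. The sample labels are invented for the example.

```r
# paste(): build labels by joining text and numbers
samples <- paste("sample", 1:3, sep = "_")
print(samples)

# seq(): evenly spaced numbers
steps <- seq(0, 1, by = 0.25)
print(steps)

# rep(): 'each' repeats every element before moving to the next one
conditions <- rep(c("control", "treated"), each = 2)
print(conditions)

# str(): compact description of an object
str(samples)
```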

6. Decision Making
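The figure for this slide did not survive in the text, so the following is only a sketch of the usual decision-making constructs in R: if / else if / else for a single value, and ifelse() for a whole vector. The thresholds and labels are made up for the example.

```r
score <- 76

# if / else if / else: chooses one branch for a single value
if (score >= 90) {
  grade <- "excellent"
} else if (score > 80) {
  grade <- "passed"
} else {
  grade <- "failed"
}
print(grade)

# ifelse(): the vectorised form, decides for every element at once
scores <- c(85, 92, 88, 76)
status <- ifelse(scores > 80, "passed", "failed")
print(status)
```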

7. Control Flow
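The figure for this slide is also missing from the text. As a sketch under that caveat, the basic loop constructs in R are for, while, and the next / break statements. The gene names are used purely as example labels.

```r
# for: repeat once for each element of a vector
genes <- c("TP53", "BRCA1", "EGFR")
for (g in genes) {
  print(paste("Processing", g))
}

# while: repeat as long as the condition is TRUE
total <- 0
i <- 1
while (i <= 5) {
  total <- total + i
  i <- i + 1
}
print(total)

# next skips to the following iteration; break leaves the loop
seen <- integer(0)
for (n in 1:6) {
  if (n %% 2 == 0) next  # skip even numbers
  if (n > 4) break       # stop once n exceeds 4
  seen <- c(seen, n)
}
print(seen)
```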

R cheat sheet

https://iqss.github.io/dss-workshops/R/Rintro/base-r-cheat-sheet.pdf

Summary in R tutorial

Thank you