1 of 24

R

April 6, 2023

Technical Team

Yan Luo, Arthur Ang

2 of 24

Exercise

Lesson

Assignment

Agenda

Download R

Data Types

ggplot2

ANOVA

3 of 24

Lesson

01

4 of 24

Download R

5 of 24

Hello World

Terminal: Rscript fileName.R
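As a minimal sketch, a script run this way might contain nothing more than the lines below. The file name hello.R is only an illustration, not something the slides prescribe.

```r
# hello.R (hypothetical file name, chosen for this example)
greeting <- "Hello, World!"
print(greeting)
```

Saved as hello.R, this would be run from the terminal with: Rscript hello.R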

6 of 24

Data Types

02

7 of 24

Data Types - Vector Objects

Vectors

Lists

Matrices

Arrays

Factors

Data Frames

Data Type    Example                                    Syntax
Logical      TRUE, FALSE                                v <- TRUE
Numeric      1, 3.14, -3/4                              v <- 1
Integer      -2L, 0L, 2L                                v <- -2L
Complex      3+2i                                       v <- 3+2i
Character    "ActSoc", 'b', '3.14'                      v <- "ActSoc"
Raw          "ActSoc" is stored as 41 63 74 53 6f 63    v <- charToRaw("ActSoc")

Use class(v) to check the data type of an object. typeof(v) returns its internal storage type, such as "logical", "integer", "double", "complex", "character", "raw", "list", "NULL", "closure" (function), or "special" and "builtin" (basic functions and operators).
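The difference between class() and typeof() is easiest to see side by side. The sketch below applies both to one value of each basic type; the variable names are illustrative.

```r
# One value of each basic type, plus a function.
values <- list(TRUE, 1, -2L, 3+2i, "ActSoc", charToRaw("ActSoc"), sin)

# class() reports the high-level type, typeof() the internal storage type.
classes <- sapply(values, class)
types <- sapply(values, typeof)

print(data.frame(class = classes, typeof = types))
```

Note that a plain number such as 1 has class "numeric" but is stored as "double", and that sin is reported as "function" by class() but "builtin" by typeof().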

8 of 24

Data Types - Vectors

# Create a vector.

apple <- c('red','green',"yellow")

print(apple)

# Get the class of the vector.

print(class(apple))
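A vector holds elements of a single type, so mixing types triggers coercion. The following sketch shows indexing and coercion; the name mixed is chosen for this example only.

```r
apple <- c("red", "green", "yellow")

# Indexing starts at 1 in R.
second <- apple[2]
n_colors <- length(apple)

# Mixing types coerces every element to the most general type (here character).
mixed <- c(1, "a", TRUE)
print(class(mixed))
print(mixed)
```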

9 of 24

Data Types - Lists

# Create a list.

list1 <- list(c(2,5,3),21.3,sin)

# Print the list.

print(list1)
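Unlike a vector, a list can hold elements of different types, including functions. A short sketch of how elements might be accessed follows; the element names (numbers, value, fn) are assumptions made for this example.

```r
# A list with named elements (names are illustrative).
list2 <- list(numbers = c(2, 5, 3), value = 21.3, fn = sin)

first_by_position <- list2[[1]]  # [[ ]] extracts the element itself
first_as_sublist <- list2[1]     # [ ] returns a list of length 1
result <- list2$fn(0)            # $ accesses by name; here it calls sin(0)

print(class(first_by_position))
print(class(first_as_sublist))
print(result)
```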

10 of 24

Data Types - Matrices

# Create a matrix.

M <- matrix(c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)

print(M)
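Because byrow = TRUE fills the matrix one row at a time, the first row is a, a, b and the second is c, b, a. A small sketch of inspecting and indexing the matrix, with illustrative variable names:

```r
M <- matrix(c("a", "a", "b", "c", "b", "a"), nrow = 2, ncol = 3, byrow = TRUE)

print(dim(M))        # number of rows, then number of columns

element <- M[2, 1]   # row 2, column 1
first_row <- M[1, ]  # leaving an index empty selects the whole row
Mt <- t(M)           # transpose: rows become columns

print(element)
print(first_row)
print(dim(Mt))
```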

11 of 24

Data Types - Arrays

# Create a vector with values ranging from 1 to 24 (a plain vector has no dim attribute).

thisarray <- 1:24

thisarray

# Create an array with more than one dimension.

multiarray <- array(thisarray, dim = c(4, 3, 2))

multiarray
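An array is filled column by column within each slice, then slice by slice. The sketch below checks the dimensions and picks out single elements to show that order; the variable names are illustrative.

```r
multiarray <- array(1:24, dim = c(4, 3, 2))

print(dim(multiarray))  # rows, columns, slices

# Index as [row, column, slice].
a <- multiarray[2, 3, 1]
b <- multiarray[1, 1, 2]
second_slice <- multiarray[, , 2]  # a 4 x 3 matrix

print(a)
print(b)
print(dim(second_slice))
```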

12 of 24

Data Types - Factors

# Create a vector.

apple_colors <- c('green','green','yellow','red','red','red','green')

# Create a factor object.

factor_apple <- factor(apple_colors)

# Print the factor.

print(factor_apple)

print(nlevels(factor_apple))
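A factor stores each distinct value once as a level and records the data as integer codes that point to those levels. A brief sketch of inspecting both, with illustrative variable names:

```r
apple_colors <- c("green", "green", "yellow", "red", "red", "red", "green")
factor_apple <- factor(apple_colors)

lv <- levels(factor_apple)         # distinct values, sorted alphabetically by default
codes <- as.integer(factor_apple)  # position of each observation's level
counts <- table(factor_apple)      # number of observations per level

print(lv)
print(codes)
print(counts)
```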

13 of 24

Data Types - Data Frames

# Create the data frame.

BMI <- data.frame(
  gender = c("Male", "Male", "Female"),
  height = c(152, 171.5, 165),
  weight = c(81, 93, 78),
  Age = c(42, 38, 26)
)

print(BMI)
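Columns of a data frame can be read with $ and new columns added by assignment. The sketch below adds a body mass index column. It assumes height is in centimetres and weight in kilograms, which the slide does not state; the column name bmi is also an assumption.

```r
BMI <- data.frame(
  gender = c("Male", "Male", "Female"),
  height = c(152, 171.5, 165),
  weight = c(81, 93, 78),
  Age = c(42, 38, 26)
)

# Assumed units: height in cm, weight in kg.
BMI$bmi <- BMI$weight / (BMI$height / 100)^2

# Select rows with a logical condition; keep all columns.
males <- BMI[BMI$gender == "Male", ]

str(BMI)
print(males)
```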

14 of 24

ggplot2

03

15 of 24

ggplot2

16 of 24

ANOVA

04

17 of 24

ANOVA

  • Analysis of Variance
    • A statistical test to determine whether 2 or more population means are different
  • Assumptions
    • Check that your observations are independent.
    • Sample sizes:
      • In case of small samples, test the normality of residuals:
        • If normality is assumed, test the homogeneity of the variances:
          • If variances are equal, use ANOVA
          • If variances are not equal, use the Welch ANOVA
        • If normality is not assumed, use the Kruskal-Wallis test
      • In case of large samples, normality is assumed, so test the homogeneity of the variances:
        • If variances are equal, use ANOVA
        • If variances are not equal, use the Welch ANOVA
  • Follow-Along: https://statsandr.com/blog/anova-in-r/
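The decision steps above can be sketched in base R. This is one possible reading, not the method of the linked post: it uses the built-in PlantGrowth data set, the Shapiro-Wilk test for normality, Bartlett's test for equal variances, and a 0.05 threshold, all of which are assumptions of this example. Independence of observations has to be judged from the study design and is not tested here.

```r
# PlantGrowth ships with R: plant weights in three groups (ctrl, trt1, trt2).
data(PlantGrowth)
alpha <- 0.05  # conventional threshold, an assumption of this sketch

fit <- aov(weight ~ group, data = PlantGrowth)

# Step 1: normality of the residuals (Shapiro-Wilk test).
normal <- shapiro.test(residuals(fit))$p.value > alpha

# Step 2: homogeneity of the variances (Bartlett's test).
equal_var <- bartlett.test(weight ~ group, data = PlantGrowth)$p.value > alpha

# Step 3: pick the test that matches the outcome of the checks.
if (!normal) {
  chosen <- "Kruskal-Wallis"
  result <- kruskal.test(weight ~ group, data = PlantGrowth)
} else if (equal_var) {
  chosen <- "ANOVA"
  result <- oneway.test(weight ~ group, data = PlantGrowth, var.equal = TRUE)
} else {
  chosen <- "Welch ANOVA"
  result <- oneway.test(weight ~ group, data = PlantGrowth, var.equal = FALSE)
}

print(chosen)
print(result)
```

A non-significant check does not prove that an assumption holds, so in practice these tests are usually read together with plots of the residuals.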

18 of 24

One-Way ANOVA Results

  • F-statistic: The F-statistic measures the ratio of the between-group variability to the within-group variability. A larger F-statistic indicates that there is more variability between the groups than within the groups, which suggests that there is a significant difference between at least two of the groups being compared.
  • Degrees of freedom: The degrees of freedom represent the number of independent pieces of information used to estimate the variance in the data. A one-way ANOVA reports two values: k - 1 for the between-group variability and N - k for the within-group variability, where k is the number of groups and N is the total number of observations.
  • p-value: The p-value is the probability of obtaining a result at least as extreme as the one observed, assuming that there is no difference between the group means. A small p-value (typically less than 0.05) is evidence against the null hypothesis of no difference and supports the conclusion that at least one group mean differs from the others.
  • Mean square: The mean square represents the sum of squares divided by the degrees of freedom. There are two mean squares in a one-way ANOVA: one for the between-group variability and one for the within-group variability.
  • Sum of squares: A sum of squares measures variability in the data. A one-way ANOVA splits the total sum of squares into two parts: one for the between-group variability and one for the within-group variability.
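To make the quantities above concrete, the sketch below computes each of them by hand and compares the result with R's own ANOVA table. It uses the built-in PlantGrowth data set, which is a choice made for this example; the variable names are illustrative.

```r
y <- PlantGrowth$weight
g <- PlantGrowth$group

k <- nlevels(g)  # number of groups
n <- length(y)   # total number of observations

grand_mean <- mean(y)
group_means <- tapply(y, g, mean)
group_sizes <- tapply(y, g, length)

# Sums of squares: between groups and within groups.
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within <- sum((y - ave(y, g))^2)  # ave() gives each observation its group mean

# Degrees of freedom.
df_between <- k - 1
df_within <- n - k

# Mean squares: sum of squares divided by its degrees of freedom.
ms_between <- ss_between / df_between
ms_within <- ss_within / df_within

# F-statistic and its p-value from the F distribution.
f_stat <- ms_between / ms_within
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

# The same table as produced by aov().
tab <- summary(aov(weight ~ group, data = PlantGrowth))[[1]]
print(tab)
print(c(F = f_stat, p = p_value))
```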

19 of 24

20 of 24

Post-Hoc Tests

  • A post-hoc test is used only after the ANOVA gives a statistically significant result, to determine which groups differ from each other.
  • Post-hoc tests are a family of statistical tests. The most common ones are:
    • Tukey HSD, used to compare all groups to each other (so all possible comparisons of 2 groups).
    • Dunnett, used to make comparisons with a reference group. For example, consider 2 treatment groups and one control group. If you only want to compare each of the 2 treatment groups with the control group, and you do not want to compare the 2 treatment groups to each other, Dunnett's test is preferred.
    • Bonferroni correction, used when there is a set of planned comparisons to make.
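Two of these procedures are available in base R and can be sketched as follows, again using the built-in PlantGrowth data set as an assumed example. Dunnett's test is not part of base R; add-on packages such as multcomp provide it, so it is not shown here. The Bonferroni example adjusts all pairwise comparisons, which is only one way the correction might be applied.

```r
fit <- aov(weight ~ group, data = PlantGrowth)

# Tukey HSD: all pairwise comparisons between groups.
tukey <- TukeyHSD(fit)
print(tukey)

# Pairwise t-tests with Bonferroni-adjusted p-values.
bonf <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                        p.adjust.method = "bonferroni")
print(bonf)
```

With three groups there are three pairwise comparisons, so the Tukey output has three rows, each with the estimated difference, a confidence interval, and an adjusted p-value.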

21 of 24

Homework

05

22 of 24

Homework

23 of 24

Resources

24 of 24

Email actrlsoc@gmail.com

Website nyuactsoc.com

Discord discord.gg/Nf8TqAVBfa

Instagram @nyuactsoc

Facebook @actrlsoc