1 of 36

Descriptive statistics

Tolga Tezcan, PhD

2 of 36

Learning outcomes

  1. Learn the differences between categorical (binary, nominal, ordinal) and continuous variables
  2. Learn how to run and interpret frequency tables
  3. Learn how to run and interpret descriptive tables
  4. Learn how to create bar graphs and histograms
  5. Refresh your knowledge of keyboard and mouse shortcuts and of using model code

3 of 36

What is a variable? (1)

A variable is any characteristic, number, or quantity that can be measured or counted.

In other words, any piece of information we know about our subjects (e.g., individuals) is a variable.

4 of 36

What is a variable? - Demographic (or control) variables

In survey research, every question asked of the respondents is a variable.

Examples: education, ethnicity, age, gender, income

Questions about respondents’ demographics are called demographic or control variables.

5 of 36

What is a variable? - Contextual variables

In survey research, every question asked of the respondents is a variable.

Examples: happiness, how safe they feel in their neighborhood, religiosity, environmental attitudes, friendship networks

Questions about respondents’ attitudes, beliefs, or behaviors are called contextual variables.

6 of 36

What is a variable? (2)

A view from RStudio

These are variables

A view of the [Variables in GSS] file

7 of 36

Types of variables

Categorical

Categorical variables take on values that are labels.

Values are NOT real numbers

When respondents are provided responses to choose from.

Do you like coffee?

(1) yes

(2) not much

(3) no

Continuous

Continuous variables take real-number values. Between any two values there are infinitely many possible values, and the values lie on a scale with equal distances between points (for example, the gap between 20 and 30 years equals the gap between 40 and 50 years).

Values are real numbers

When respondents are NOT provided options to choose from.

How long have you been drinking coffee? ____ years

8 of 36

Categorical variables

  • Categorical variable values are NOT real numbers.
  • When respondents are provided options to choose from.
  • There are three types of categorical variables: nominal, ordinal, and binary.

NOMINAL

Nominal variables have more than two responses to choose from, with no logical order among them.

Do you like coffee?

(1) yes / (2) no / (3) depends

Political party

(1) republican / (2) democrat / (3) independent

ORDINAL

Ordinal variables have responses that can be put in a logical and hierarchical order. The differences between the responses are unknown or inconsistent.

Rank ordered

Do you like coffee?

(1) yes / (2) not much / (3) no

Economic Status

(1) low / (2) medium / (3) high

BINARY

Binary variables list two distinct, mutually exclusive responses. True-or-false and yes-or-no questions are examples of binary variables.

Do you like coffee?

(1) yes / (2) no

Attitude

(1) agree / (2) disagree

9 of 36

Continuous variables

  • Continuous variable values are real numbers.
  • When respondents are NOT provided options to choose from.

    • Age: 20, 40, 48, 80
    • Income: $10,000, $30,000, $48,500
    • Education in years: 10, 15, 17, 20
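One way to see the difference between these types is to look at how R stores them. The sketch below is a minimal illustration that assumes made-up responses built from the example questions above. The vectors `party`, `econ`, `coffee`, and `age` are hypothetical, not GSS data. Nominal and binary variables are stored as factors, ordinal variables as ordered factors, and continuous variables as plain numeric vectors.

```r
# Hypothetical responses built from the example questions above (not GSS data)

# Nominal: more than two categories, no logical order
party <- factor(c("republican", "democrat", "independent", "democrat"))

# Ordinal: categories with a logical order; distances between them are unknown
econ <- factor(c("low", "high", "medium", "low"),
               levels = c("low", "medium", "high"),
               ordered = TRUE)

# Binary: exactly two mutually exclusive categories
coffee <- factor(c("yes", "no", "no", "yes"), levels = c("yes", "no"))

# Continuous: real numbers, so arithmetic such as the mean is meaningful
age <- c(20, 40, 48, 80)

nlevels(party)     # 3 categories
econ[1] < econ[2]  # TRUE: "low" comes before "high"
nlevels(coffee)    # 2 categories
mean(age)          # 47
```

Ordered factors are the only categorical type where comparisons such as `<` are meaningful. This mirrors the "rank ordered" property of ordinal variables.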


10 of 36

Determining variable type exercise - Instructions

Determining the type of variable is important because different analysis techniques are used depending on the variable type.

Some questions from different surveys will be shown in the following slides.

We will determine whether each one is:

  • Categorical
    • If so, binary, nominal, or ordinal

OR

  • Continuous

[Diagram: Variable type → Categorical (binary, nominal, ordinal) or Continuous]

11 of 36

Determining variable type exercise (2)

[Youth Participatory Politics Survey Project]

12 of 36

Determining variable type exercise (3)

[American Health Values Survey]

13 of 36

Determining variable type exercise (4)

[European Social Survey]

14 of 36

Determining variable type exercise (5)

[Latino National Survey]

15 of 36

Determining variable type exercise (6)

[National Surveys on Energy and the Environment]

16 of 36

Determining variable type exercise (7)

[Latino Second Generation Study]

17 of 36

Determining variable type exercise (8)

[National Survey on Drug Use and Health]

18 of 36

Determining variable type exercise (9)

[New Family Structures Study]

19 of 36

Determining variable type exercise (10)

[Police-Public Contact Survey]

20 of 36

Summary statistics

Summary statistics are used to obtain quick summaries of variables.

Frequencies (count and percentage)

Frequencies are used to create a frequency table for a single categorical variable.

The “Frequencies” (frq) code counts how many times each response of a variable appears and calculates the corresponding percentages.

Descriptives (mean and standard deviation)

Descriptives are used to create a descriptive table for a single continuous variable.

The “Descriptives” (descr) code calculates the mean and standard deviation.

21 of 36

Frequency table (for categorical variables) (1)

val | label         |  frq | raw.prc | valid.prc | cum.prc
1   | married       | 1462 |   41.25 |     41.43 |   41.43
2   | widowed       |  255 |    7.20 |      7.23 |   48.65
3   | divorced      |  608 |   17.16 |     17.23 |   65.88
4   | separated     |  103 |    2.91 |      2.92 |   68.80
5   | never married | 1101 |   31.07 |     31.20 |  100.00
NA  | NA            |   15 |    0.42 |        NA |      NA

frq(gss$marital, out = "v")

The respondents’ marital status variable shows that 41.43% of the respondents are married; 7.23% of the respondents are widowed; 17.23% of the respondents are divorced; 2.92% of the respondents are separated; 31.20% of the respondents are never married. (These are valid percentages, which exclude the 15 missing responses.)

[Interpretation templates]
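The base R sketch below shows where the three percentage columns come from. It rebuilds the table from its counts alone, typed in from the table above, and it does not call `frq()`.

- `raw.prc` divides by all 3,544 cases, including the 15 missing ones (so the NA row is 15 / 3,544 ≈ 0.42%).
- `valid.prc` divides by the 3,529 non-missing cases.
- `cum.prc` is the running total of the valid percentages. It is computed from cumulative counts rather than by adding rounded percentages.

```r
# Counts copied from the marital status frequency table above
counts <- c("married" = 1462, "widowed" = 255, "divorced" = 608,
            "separated" = 103, "never married" = 1101)
n_missing <- 15

n_valid <- sum(counts)           # 3529 non-missing responses
n_total <- n_valid + n_missing   # 3544 responses in total

marital_table <- data.frame(
  val       = seq_along(counts),
  label     = names(counts),
  frq       = unname(counts),
  raw.prc   = round(100 * unname(counts) / n_total, 2),          # share of all cases
  valid.prc = round(100 * unname(counts) / n_valid, 2),          # share of non-missing cases
  cum.prc   = round(100 * cumsum(unname(counts)) / n_valid, 2)   # running total of valid.prc
)
print(marital_table)
```

Printed, the rows match the `frq()` output above, apart from the NA row, which this sketch leaves out.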

22 of 36

Frequency table - Interpretation


[Variables in GSS]

23 of 36

Frequency table (for categorical variables) (2)

val | label  |  frq | raw.prc | valid.prc | cum.prc
1   | male   | 1627 |   45.91 |     46.17 |   46.17
2   | female | 1897 |   53.53 |     53.83 |  100.00
NA  | NA     |   20 |    0.56 |        NA |      NA

frq(gss$sex, out = "v")

The respondents’ sex variable shows that 46.17% of the respondents are male; 53.83% of the respondents are female.

[Interpretation templates]
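Base R's `table()` and `prop.table()` can reproduce the same split from raw responses. This is a sketch, not how `frq()` works internally. It rebuilds a response vector from the counts shown above: 1,627 male, 1,897 female, and 20 missing.

- `table()` drops `NA` by default, so `prop.table()` on it gives the valid percentages.
- Adding `useNA = "ifany"` keeps the missing responses and gives the raw percentages.

```r
# Rebuild a response vector from the counts in the table above
sex <- factor(c(rep("male", 1627), rep("female", 1897), rep(NA, 20)),
              levels = c("male", "female"))

# table() ignores NA by default, so these are valid percentages
valid_prc <- round(100 * prop.table(table(sex)), 2)

# useNA = "ifany" adds an <NA> row, so these are raw percentages
raw_prc <- round(100 * prop.table(table(sex, useNA = "ifany")), 2)

valid_prc
raw_prc
```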

24 of 36

What happens if we use a frequency table for continuous variables?

val | label | frq | raw.prc | valid.prc | cum.prc
18  | 18    |  22 |    0.62 |      0.66 |    0.66
19  | 19    |  29 |    0.82 |      0.87 |    1.53
20  | 20    |  48 |    1.35 |      1.44 |    2.97
21  | 21    |  46 |    1.30 |      1.38 |    4.35
22  | 22    |  46 |    1.30 |      1.38 |    5.73
23  | 23    |  53 |    1.50 |      1.59 |    7.31
24  | 24    |  45 |    1.27 |      1.35 |    8.66
25  | 25    |  45 |    1.27 |      1.35 |   10.01
26  | 26    |  58 |    1.64 |      1.74 |   11.75
27  | 27    |  46 |    1.30 |      1.38 |   13.13
28  | 28    |  57 |    1.61 |      1.71 |   14.84
29  | 29    |  61 |    1.72 |      1.83 |   16.67
30  | 30    |  60 |    1.69 |      1.80 |   18.47
31  | 31    |  68 |    1.92 |      2.04 |   20.50
32  | 32    |  76 |    2.14 |      2.28 |   22.78
33  | 33    |  69 |    1.95 |      2.07 |   24.85
34  | 34    |  61 |    1.72 |      1.83 |   26.68

(The table continues with one row for every remaining age value.)

25 of 36

Bar graph (for categorical variables)


plot_frq(gss$marital, type = "bar", geom.colors = "#336699")

A bar graph is a visual representation of a frequency table: each bar shows how many respondents gave that response.

It provides the same information.
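If `sjPlot` is not available, base R's `barplot()` draws a comparable chart. This is a rough equivalent, not a reproduction of the `plot_frq()` output: it has no value labels or percentages. The counts are typed in from the marital status frequency table, and the fill colour reuses the `geom.colors` value above.

```r
# Counts from the marital status frequency table (missing responses excluded)
counts <- c("married" = 1462, "widowed" = 255, "divorced" = 608,
            "separated" = 103, "never married" = 1101)

# One bar per response category; bar height = number of respondents.
# barplot() invisibly returns the x positions of the bar midpoints.
bar_mids <- barplot(counts,
                    col  = "#336699",
                    ylab = "Number of respondents",
                    main = "Marital status")
```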

26 of 36

Descriptive table (for continuous variables) (1)

descr(gss$age, out = "v", show = "short")

Variable |    N | Missings (%) |  Mean |    SD
dd       | 3336 |         5.87 | 49.18 | 17.97

The respondents’ age variable shows that the average age of the respondents is 49.18, with a standard deviation of 17.97.

[Interpretation templates]
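The numbers in the descriptive table can be reproduced with base R. Assuming the same 3,544 cases as in the frequency tables, the 5.87% for age is (3,544 − 3,336) / 3,544 = 208 / 3,544 ≈ 5.87%.

The sketch below uses a short hypothetical age vector: the four ages from the continuous-variable examples plus one missing answer, not GSS data. It also shows why `na.rm = TRUE` matters. Without it, `mean()` and `sd()` return `NA`.

```r
# Hypothetical ages: four answers and one missing response (not GSS data)
age <- c(20, 40, 48, 80, NA)

n_valid     <- sum(!is.na(age))                  # N: non-missing answers
missing_prc <- round(100 * mean(is.na(age)), 2)  # Missings (%)
age_mean    <- mean(age, na.rm = TRUE)           # Mean
age_sd      <- sd(age, na.rm = TRUE)             # SD (sample standard deviation)

mean(age)  # NA, because the missing value is not dropped

c(N = n_valid, missings = missing_prc, mean = age_mean, sd = round(age_sd, 2))
```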

27 of 36

Descriptive table - interpretation


[Variables in GSS]

28 of 36

Descriptive table (for continuous variables) (2)

descr(gss$educ, out = "v", show = "short")

Variable |    N | Missings (%) |  Mean |   SD
dd       | 3524 |         0.56 | 14.11 | 2.89

The respondents’ education in years variable shows that the average years of education that respondents have is 14.11, with a standard deviation of 2.89.

[Interpretation templates]

29 of 36

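What happens if we use a descriptive table for categorical variables?

Variable |    N | Missings (%) | Mean |   SD
dd       | 3529 |         0.42 | 2.75 | 1.72

The average score of marital status is 2.75? This number has no meaning: the codes 1 to 5 are labels for the categories, not real numbers.

descr(gss$marital, out = "v", show = "short")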
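You can check by hand why 2.75 is not meaningful. The sketch below rebuilds the numeric codes 1 to 5 from the marital status counts, with missing responses dropped, and averages them. This reproduces the 2.75 and 1.72 above.

It then renumbers the same categories using a hypothetical recoding that gives "never married" the code 1. The "average" changes, even though no respondent's marital status changed.

```r
# Marital status codes 1-5, repeated by their counts from the frequency table
counts <- c(1462, 255, 608, 103, 1101)
codes  <- rep(1:5, times = counts)

round(mean(codes), 2)  # 2.75: the "average" of arbitrary codes
round(sd(codes), 2)    # 1.72

# Hypothetical recoding: never married = 1, married = 2, widowed = 3, ...
recode_map <- c(2, 3, 4, 5, 1)
recoded    <- recode_map[codes]
round(mean(recoded), 2)  # 2.19: a different "average" for the same respondents
```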

30 of 36

Histogram (for continuous variables)

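plot_frq(gss$educ, type = "hist", show.mean = TRUE, show.mean.val = TRUE, normal.curve = TRUE, show.sd = TRUE, normal.curve.color = "red")

A histogram is a visual representation of the distribution of a continuous variable: it groups values into intervals (bins) and shows how many respondents fall into each.

With show.mean and show.sd, it also displays the mean and standard deviation reported in the descriptive table.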
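The sketch below shows the binning step on its own. It uses base R's `hist()` with `plot = FALSE`, so the bin counts can be inspected directly. The data are ten hypothetical years-of-education values, not GSS data, grouped into bins four years wide. It does not reproduce `plot_frq()`'s bin choice.

```r
# Hypothetical years of education for ten respondents (not GSS data)
educ <- c(10, 15, 17, 20, 12, 12, 14, 16, 16, 18)

# Bins of four years: [8, 12], (12, 16], (16, 20]
h <- hist(educ, breaks = seq(8, 20, by = 4), plot = FALSE)

h$breaks  # bin edges: 8 12 16 20
h$counts  # respondents per bin: 3 4 3

# The summaries that show.mean / show.sd add to the plot
mean(educ)  # 15
sd(educ)
```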

31 of 36

Keyboard and mouse shortcuts

During this class, you must use keyboard and mouse shortcuts exactly as outlined in the following slides.


32 of 36

Keyboard shortcuts

Action | Windows  | macOS
Copy   | Ctrl + C | Command (⌘) + C
Paste  | Ctrl + V | Command (⌘) + V
Undo   | Ctrl + Z | Command (⌘) + Z

33 of 36

Keyboard shortcuts - hand and finger positions

Place your little finger on “Ctrl” (Control) and your index or middle finger on the letter key (C, V, Z, etc.).

Do not use both hands. Your other hand should be on the mouse (or trackpad).

34 of 36

Mouse shortcuts

To replace an existing variable name with a new one, do not highlight it by dragging. DOUBLE-CLICK it with your mouse instead.

[Single line] To copy or run one line of code, do not drag to highlight the whole line.

TRIPLE-CLICK it with your mouse

(click three times quickly)

[Multiple lines] Highlight the lines by dragging with your mouse

35 of 36

How to work with codes? Model codes (from the R script file)

We NEVER type code, or the variable names inside it, by hand. Instead, we create a model code and a working code.

Imagine we need a frequency distribution for the sex variable.

This is a model code. It is in the R script file. We know that it works.

(1) Copy the model code. (2) Paste it into the “working space” of your R script file. (3) Add a blank line (press “Enter” on Windows or “Return” on macOS). (4) Paste the model code again.

The first line is the model code, and the second line is the working code that we will edit.

In the working code, double-click “marital” and paste “sex” over it. If our working code doesn't work, we compare it to the model code to troubleshoot.

36 of 36

How to work with codes? Model codes (Code templates page)

We NEVER type code, or the variable names inside it, by hand. Instead, we create a model code and a working code.

For code that is not in the lab R script file, use the [Code templates] page.

Imagine we need a descriptive statistics table for the educ variable.

This is a model code. It is in the code templates page. We know that it works.

(1) Copy the model code. (2) Paste it into the “working space” of your R script file. (3) Add a blank line (press “Enter” on Windows or “Return” on macOS). (4) Paste the model code again.

The first line is the model code, and the second line is the working code that we will edit.

In the working code, double-click “variable_here” and paste “educ” over it. If our working code doesn't work, we compare it to the model code to troubleshoot.