Descriptive statistics
Tolga Tezcan, PhD
Learning outcomes
What is a variable? (1)
A variable is any characteristic, number, or quantity that can be measured or counted.
It is any piece of information we know about our subjects (e.g., individuals).
What is a variable? - Demographic (or control) variables
Every question asked of respondents in a study is a variable.
education
ethnicity
age
gender
income
Questions about respondents’ demographics are called demographic or control variables.
What is a variable? - Contextual variables
Every question asked of respondents in a study is a variable.
happiness
how safe they feel in their neighborhood
religiosity
environmental attitudes
friendship networks
Questions about respondents’ attitudes, beliefs, or behaviors are called contextual variables.
What is a variable? (2)
A view from RStudio
These are variables
A view of the [Variables in GSS] file
Types of variables
Categorical
Categorical variables take on values that are labels.
Values are NOT real numbers
When respondents are provided responses to choose from.
Do you like coffee?
(1) yes
(2) not much
(3) no
Continuous
Continuous variables take real-number values: there are infinitely many possible values between any two values, and equal differences between values mean the same amount anywhere on the scale.
Values are real numbers
When respondents are NOT provided options to choose from.
How long have you been drinking coffee? ....years
Categorical variables
NOMINAL
Nominal variables have more than two responses to choose from, and the responses have no logical order.
Do you like coffee?
(1) yes / (2) no / (3) depends
Political party
(1) republican / (2) democrat / (3) independent
ORDINAL
Ordinal variables have responses that can be put in a logical and hierarchical order. The differences between the responses are unknown or inconsistent.
Rank ordered
Do you like coffee?
(1) yes / (2) not much / (3) no
Economic Status
(1) low / (2) medium / (3) high
BINARY
Binary variables list two distinct, mutually exclusive responses. True-or-false and yes-or-no questions are examples of binary variables.
Do you like coffee?
(1) yes / (2) no
Attitude
(1) agree / (2) disagree
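As a minimal sketch of how these types could be stored in R, the block below uses base R only. The vectors and their names are hypothetical illustrations, not variables from the GSS file.

```r
# Hypothetical responses; variable names are illustrative only.

# Binary: two mutually exclusive responses.
likes_coffee <- factor(c("yes", "no", "yes"), levels = c("yes", "no"))

# Nominal: more than two responses with no logical order.
party <- factor(c("democrat", "independent", "republican"))

# Ordinal: responses with a logical order (low < medium < high).
econ_status <- factor(c("low", "high", "medium"),
                      levels = c("low", "medium", "high"),
                      ordered = TRUE)

# Continuous: real numbers (years of drinking coffee).
years_drinking <- c(2.5, 10, 0.75)

nlevels(likes_coffee)            # number of response categories
is.ordered(party)                # FALSE: nominal responses are not ranked
is.ordered(econ_status)          # TRUE: ordinal responses are ranked
econ_status[1] < econ_status[2]  # TRUE: "low" ranks below "high"
is.numeric(years_drinking)       # TRUE: values are real numbers
```

Storing an ordinal variable as an ordered factor keeps the ranking without implying that the distance between responses is known.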
Continuous variables
Determining variable type exercise - Instructions
Determining the type of variable is important because different analysis techniques are used depending on the variable type.
Some questions from different surveys will be shown in the following slides.
We will determine whether each question is categorical (binary, nominal, or ordinal) OR continuous.
Determining variable type exercise (2)
[Youth Participatory Politics Survey Project]
Determining variable type exercise (3)
[American Health Values Survey]
Determining variable type exercise (4)
[European Social Survey]
Determining variable type exercise (5)
[Latino National Survey]
Determining variable type exercise (6)
[National Surveys on Energy and the Environment]
Determining variable type exercise (7)
[Latino Second Generation Study]
Determining variable type exercise (8)
[National Survey on Drug Use and Health]
Determining variable type exercise (9)
[New Family Structures Study]
Determining variable type exercise (10)
[Police-Public Contact Survey]
Summary statistics
Summary statistics are used to obtain quick summaries of variables.
Frequencies (count and percentage)
Frequencies is used to create frequency tables for a single categorical variable.
The “Frequencies” (frq) code counts up how many times a response of a variable appears and calculates the percentage.
Descriptives (mean and standard deviation)
Descriptives is used to create descriptive tables for a single continuous variable.
The “Descriptives” (descr) code is used to determine the mean and standard deviation.
Frequency table (for categorical variables) (1)
val | label | frq | raw.prc | valid.prc | cum.prc |
1 | married | 1462 | 41.25 | 41.43 | 41.43 |
2 | widowed | 255 | 7.20 | 7.23 | 48.65 |
3 | divorced | 608 | 17.16 | 17.23 | 65.88 |
4 | separated | 103 | 2.91 | 2.92 | 68.80 |
5 | never married | 1101 | 31.07 | 31.20 | 100.00 |
NA | NA | 15 | 0.42 | NA | NA |
frq(gss$marital, out = "v")
The respondents’ marital status variable shows that 41.43% of the respondents are married; 7.23% of the respondents are widowed; 17.23% of the respondents are divorced; 2.92% of the respondents are separated; 31.20% of the respondents are never married.
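To see where the percentage columns come from, the sketch below recomputes them in base R from the counts in the marital status table. It does not call frq; it only assumes that raw.prc uses all respondents as the denominator and valid.prc uses respondents with a non-missing answer, which is consistent with the numbers in the table.

```r
# Counts copied from the marital status frequency table.
counts <- c(married = 1462, widowed = 255, divorced = 608,
            separated = 103, `never married` = 1101)
n_missing <- 15  # the NA row

# raw.prc divides by all respondents, including missing answers.
raw_prc <- 100 * counts / (sum(counts) + n_missing)

# valid.prc divides only by respondents who gave an answer.
valid_prc <- 100 * counts / sum(counts)

# cum.prc is the running total of valid.prc.
cum_prc <- cumsum(valid_prc)

round(cbind(frq = counts, raw.prc = raw_prc,
            valid.prc = valid_prc, cum.prc = cum_prc), 2)
```

The interpretation uses valid.prc because it leaves out respondents who did not answer the question.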
[Interpretation templates]
Frequency table - Interpretation
[Variables in GSS]
Frequency table (for categorical variables) (2)
val | label | frq | raw.prc | valid.prc | cum.prc |
1 | male | 1627 | 45.91 | 46.17 | 46.17 |
2 | female | 1897 | 53.53 | 53.83 | 100.00 |
NA | NA | 20 | 0.56 | NA | NA |
frq(gss$sex, out = "v")
The respondents’ sex variable shows that 46.17% of the respondents are male; 53.83% of the respondents are female.
[Interpretation templates]
What happens if we use frequency for continuous variables?
val | label | frq | raw.prc | valid.prc | cum.prc |
18 | 18 | 22 | 0.62 | 0.66 | 0.66 |
19 | 19 | 29 | 0.82 | 0.87 | 1.53 |
20 | 20 | 48 | 1.35 | 1.44 | 2.97 |
21 | 21 | 46 | 1.30 | 1.38 | 4.35 |
22 | 22 | 46 | 1.30 | 1.38 | 5.73 |
23 | 23 | 53 | 1.50 | 1.59 | 7.31 |
24 | 24 | 45 | 1.27 | 1.35 | 8.66 |
25 | 25 | 45 | 1.27 | 1.35 | 10.01 |
26 | 26 | 58 | 1.64 | 1.74 | 11.75 |
27 | 27 | 46 | 1.30 | 1.38 | 13.13 |
28 | 28 | 57 | 1.61 | 1.71 | 14.84 |
29 | 29 | 61 | 1.72 | 1.83 | 16.67 |
30 | 30 | 60 | 1.69 | 1.80 | 18.47 |
31 | 31 | 68 | 1.92 | 2.04 | 20.50 |
32 | 32 | 76 | 2.14 | 2.28 | 22.78 |
33 | 33 | 69 | 1.95 | 2.07 | 24.85 |
34 | 34 | 61 | 1.72 | 1.83 | 26.68 |
Bar graph (for categorical variables)
plot_frq(gss$marital, type = "bar", geom.colors = "#336699")
A bar graph is a visual representation of a frequency table.
It provides the same information.
Descriptive table (for continuous variables) (1)
descr(gss$age, out = "v", show = "short")
Variable | N | Missings (%) | Mean | SD |
dd | 3336 | 5.87 | 49.18 | 17.97 |
The respondents’ age variable shows that the average age of the respondents is 49.18, with standard deviation 17.97.
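The columns of a descriptive table can be sketched in base R as follows. The ages below are hypothetical values made up for illustration (they are not GSS data), and the block does not call descr; it only shows what N, Missings (%), Mean, and SD measure.

```r
# Hypothetical ages for six respondents; NA marks a missing answer.
age <- c(23, 35, 47, NA, 62, 51)

n_valid <- sum(!is.na(age))            # N: respondents with an answer
missing_pct <- 100 * mean(is.na(age))  # Missings (%)
age_mean <- mean(age, na.rm = TRUE)    # Mean, ignoring the missing answer
age_sd <- sd(age, na.rm = TRUE)        # SD, ignoring the missing answer

round(c(N = n_valid, Missings = missing_pct, Mean = age_mean, SD = age_sd), 2)
```

Without na.rm = TRUE, mean() and sd() return NA whenever the variable has a missing answer.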
[Interpretation templates]
Descriptive table - interpretation
[Variables in GSS]
Descriptive table (for continuous variables) (2)
descr(gss$educ, out = "v", show = "short")
Variable | N | Missings (%) | Mean | SD |
dd | 3524 | 0.56 | 14.11 | 2.89 |
The respondents’ education in years variable shows that the average years of education that respondents have is 14.11, with standard deviation 2.89.
[Interpretation templates]
What happens if we use descriptive table for categorical variables?
Variable | N | Missings (%) | Mean | SD |
dd | 3529 | 0.42 | 2.75 | 1.72 |
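The average score of marital status is 2.75? The numbers 1 to 5 are only labels for the categories, so their mean has no meaningful interpretation.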
descr(gss$marital, out = "v", show = "short")
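The 2.75 can be traced back to the category codes. As a sketch in base R (not the descr code itself), the block below rebuilds one code per respondent from the counts in the marital status frequency table and averages them.

```r
# Counts for codes 1-5 (married, widowed, divorced, separated,
# never married), copied from the marital status frequency table.
counts <- c(1462, 255, 608, 103, 1101)

# Rebuild one numeric code per respondent (missing answers left out).
codes <- rep(1:5, times = counts)

round(mean(codes), 2)  # 2.75
round(sd(codes), 2)    # 1.72
```

The result only averages the labels 1 to 5. Renumbering the categories would change the mean without changing anything about the respondents.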
Histogram (for continuous variables)
plot_frq(gss$educ, type = "hist", show.mean = TRUE, show.mean.val = TRUE, normal.curve = TRUE, show.sd = TRUE, normal.curve.color = "red")
A histogram is a visual representation of a continuous variable's distribution.
With the mean and standard deviation displayed, it provides the same information as the descriptive table.
Keyboard and mouse shortcuts
During this class, you must use keyboard and mouse shortcuts exactly as outlined in the following slides.
Keyboard shortcuts
Action | Windows | macOS |
Copy | Ctrl + C | Command + C |
Paste | Ctrl + V | Command + V |
Undo | Ctrl + Z | Command + Z |
Keyboard shortcuts - hand and finger positions
Little finger is on “Ctrl” (control) and index or middle finger on letters (C, V, Z, etc.)
Do not use both hands. Your other hand should be on the mouse (or trackpad).
Mouse shortcuts
Do not highlight the existing variable name to replace it with a new variable. DOUBLE CLICK on it with your mouse.
[Single line] Do not highlight the whole line to copy or run the code. TRIPLE CLICK on it with your mouse (click three times really fast).
[Multiple lines] Highlight with your mouse.
How to work with codes? Model codes (from the R script file)
We NEVER type the codes or variables inside the codes. Instead, we create a model code and a working code.
Imagine we need a frequency distribution for the sex variable.
This is a model code. It is in the R script file. We know that it works.
(1) Copy the model code. (2) Paste it into the “working space” of your R script file. (3) Add a blank line (press “Enter” on Windows or “Return” on macOS). (4) Paste the model code again.
The first line is the model code, and the second line is the working code that we will edit.
In the working code, double-click “sex” and replace it with “marital.” If our working code doesn't work, we compare it to the model code to troubleshoot.
How to work with codes? Model codes (Code templates page)
We NEVER type the codes or variables inside the codes. Instead, we create a model code and a working code.
For different codes than those provided in the lab R script file, use the [Code templates] page.
Imagine we need a descriptive statistics table for the educ variable.
This is a model code. It is in the code templates page. We know that it works.
(1) Copy the model code. (2) Paste it into the “working space” of your R script file. (3) Add a blank line (press “Enter” on Windows or “Return” on macOS). (4) Paste the model code again.
The first line is the model code, and the second line is the working code that we will edit.
In the working code, double-click “variable_here” and replace it with “educ.” If our working code doesn't work, we compare it to the model code to troubleshoot.