1 of 25

R for Every Survey Analysis

http://bit.ly/nycr-every-survey-analysis

Max Richman max@geopoll.com @richmanmax

@GeoPoll

@DataKindDC

2 of 25

Outline

  1. Surveys in the developing world
  2. Shifting to FOSSS (free and open source statistical software) for everyone
  3. Key survey data manipulation techniques
    1. Variable recoding
    2. Statistical weighting
    3. Analysis & reporting
  4. Future areas of focus

3 of 25

Surveys in the developing world

4 of 25

  • Africa: 1B people
  • Among the highest GDP growth rates in the world
  • Mobile penetration over 80 per 100 people¹

¹According to TA Telcom 2013


5 of 25

Types of Data and Sources

Census - count everyone

Civil registration and vital statistics (CRVS) - count events (registration, births, deaths)

Surveys - ask subsets about various topics

World Bank (LSMS) - SPSS

UNICEF (MICS) - SPSS

USAID (DHS) - SAS, Stata, SPSS

Afrobarometer - SPSS

Governments - ?


6 of 25


7 of 25

Image: Wikipedia


8 of 25

Images: TCG


9 of 25

Phones and internet in Africa

Images: Pew 2015


10 of 25

©2015 Mobile Accord, Inc. All rights reserved


11 of 25


12 of 25


13 of 25

Shifting to FOSSS

Making Free and Open Source Statistical Software standard:

  • A whiff: You see an R script file in action
  • A dive: The steeper learning curve means someone takes extra time on a project to do it differently

Side-by-side books and sites help:

14 of 25

Credit: sodahead.com


15 of 25

  1. Base R (or Python) in nice IDE
  2. Keep it human readable (resist refactoring!)
  3. Drop COTS (commercial off-the-shelf software)


16 of 25

Variable Recoding

Creating the variables you need for your analysis

17 of 25


18 of 25

  1. I/O with foreign

library(foreign)

# Read a Stata .dta file into a data frame
data <- read.dta("my_stata_file.dta")

...

# Write the finished data frame back out in Stata format
write.dta(data_final, "my_new_stata_file.dta")

  2. Change variable headings

colnames(df_station_list)[colnames(df_station_list) == 'Var1'] <- 'StationName'

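  3. Compute and change variable values

# Assign a sentinel ID to respondents who did not watch
data$StationId[data$Station == "Did Not Watch"] <- 999999

# Label the rows that fall in an age range
data$AgeGroup[data$Age >= 15 & data$Age <= 24] <- "15 to 24"

# Copy those labels into a second variable for the same rows
data$AgeGroup2[data$Age >= 15 & data$Age <= 24] <- data$AgeGroup[data$Age >= 15 & data$Age <= 24]

# Multiply the weight column by the pop column
data_merge$PopWeight <- data_merge$weight * data_merge$pop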
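The three steps above can be strung together in one short script. This is a minimal sketch, not the workflow used in the talk: the toy data frame, its column names (`Var1`, `Age`), and the temporary file are assumptions made for illustration, and `cut()` is swapped in as a base R alternative to assigning each age range by hand.

```r
library(foreign)  # a recommended package that ships with standard R installs

# Hypothetical toy data standing in for a real survey file
survey <- data.frame(
  Var1 = c("Station A", "Station B", "Did Not Watch"),
  Age  = c(17, 31, 52),
  stringsAsFactors = FALSE
)

# 1. Round-trip through Stata format using a temporary file
path <- tempfile(fileext = ".dta")
write.dta(survey, path)
data <- read.dta(path)

# 2. Rename a column by matching on its current name
colnames(data)[colnames(data) == "Var1"] <- "StationName"

# 3. Recode a numeric variable into labelled groups in one call.
#    Intervals are closed on the left: [15, 25), [25, 35), [35, Inf)
data$AgeGroup <- cut(
  data$Age,
  breaks = c(15, 25, 35, Inf),
  labels = c("15 to 24", "25 to 34", "35 and over"),
  right  = FALSE
)

# Flag a special answer with a sentinel code, as on the slide
data$StationId <- NA_real_
data$StationId[data$StationName == "Did Not Watch"] <- 999999

print(data)
```

Ages below the first break (15 here) come back as `NA` from `cut()`, which makes out-of-range respondents easy to spot before analysis.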


19 of 25

Weighting

Making adjustments based on demographic variations in sampling

Example weights: 0.72, 1.52, 0.88, 1.45

20 of 25

Credit: Evan-Amos - Wikipedia


21 of 25

  • Weight factor = population proportion / sample proportion
  • Multiply the factors together for each observation
  • Use a weighted cross-tab function
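These three bullets can be sketched in base R. The sample, the population shares, and the column names below are made up for illustration. `xtabs()` with the weight on the left-hand side of the formula stands in for the weighted cross-tab function, since the slide does not name one.

```r
# Hypothetical sample of 10 respondents
sample_df <- data.frame(
  Gender   = c("F", "F", "F", "M", "M", "M", "M", "M", "M", "M"),
  AgeGroup = c("15 to 24", "25+", "25+", "15 to 24", "15 to 24",
               "15 to 24", "25+", "25+", "25+", "25+"),
  stringsAsFactors = FALSE
)

# Assumed population shares (illustrative numbers, not real figures)
pop_gender <- c(F = 0.5, M = 0.5)
pop_age    <- c("15 to 24" = 0.5, "25+" = 0.5)

# Step 1: weight factor = population proportion / sample proportion
samp_gender <- prop.table(table(sample_df$Gender))
samp_age    <- prop.table(table(sample_df$AgeGroup))
w_gender <- pop_gender[names(samp_gender)] / as.numeric(samp_gender)
w_age    <- pop_age[names(samp_age)] / as.numeric(samp_age)

# Step 2: multiply the factors together for each observation
sample_df$weight <- as.numeric(
  w_gender[sample_df$Gender] * w_age[sample_df$AgeGroup]
)

# Step 3: weighted cross-tab (sums the weights in each cell)
weighted_tab <- xtabs(weight ~ Gender + AgeGroup, data = sample_df)
print(round(weighted_tab, 2))
```

Multiplying one-dimensional factors is a simplification: the weighted total will not necessarily equal the sample size, and iterative raking is a common refinement when the margins need to match exactly.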


22 of 25


23 of 25

Analysis & Reporting

  1. table() command, prop.table()

table(data$Gender,exclude=NULL)

table(data$Gender,data$AgeGroup)

  2. RStudio!

  3. R Markdown!

(knitr/pandoc!)
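A small sketch of the first item, assuming a toy data frame with `Gender` and `AgeGroup` columns (the values are invented). It shows what `exclude = NULL` changes and how `prop.table()` turns counts into proportions.

```r
# Hypothetical responses, including one missing value
data <- data.frame(
  Gender   = c("F", "M", "F", NA, "M", "F"),
  AgeGroup = c("15 to 24", "15 to 24", "25+", "25+", "25+", "25+"),
  stringsAsFactors = FALSE
)

# One-way counts; exclude = NULL keeps NA as its own category
counts <- table(data$Gender, exclude = NULL)
print(counts)

# Two-way counts (NA rows are dropped by default)
crosstab <- table(data$Gender, data$AgeGroup)

# Row proportions: margin = 1 makes each row sum to 1
row_props <- prop.table(crosstab, margin = 1)
print(round(row_props, 2))
```

Placed in a code chunk of an R Markdown document, the same calls are run by knitr and their output appears in the rendered report.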

24 of 25

Future areas to focus on

  • More events, guides, sites, tutorials
  • I/O with foreign needs beefing up
  • Greater control over factor label recoding
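For context on the last bullet, this is roughly what factor label recoding looks like in base R today. The coded values and labels are hypothetical; the sketch only illustrates the existing `levels()` mechanics and is not a proposal from the talk.

```r
# Hypothetical coded responses, as they might arrive from a labelled file
gender <- factor(c(1, 2, 2, 1, 9),
                 levels = c(1, 2, 9),
                 labels = c("Male", "Female", "Refused"))

# Rename one label by matching on its current name
levels(gender)[levels(gender) == "Refused"] <- "No answer"

# Collapse labels by assigning a named list: new label = old labels
collapsed <- gender
levels(collapsed) <- list(Answered = c("Male", "Female"),
                          Missing  = "No answer")

print(table(gender))
print(table(collapsed))
```

Both forms edit labels in place, so a typo in an old label name silently leaves that level unchanged or drops it. That is one reason finer control would be welcome.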

25 of 25

Max Richman max@geopoll.com @richmanmax

@GeoPoll

@DataKindDC