1 of 69

Virtual Lab

A platform for online sampling, surveying and interventions

Dante Donati, UPF and BGSE Nandan Rao, UAB and BGSE

2 of 69

Outline

  1. Introduction
  2. How can I use it?
  3. Case Studies and Numbers
  4. The Component Parts
  5. Study archetypes
  6. Future Development

3 of 69

Introduction

  • What is Virtual Lab?
  • Virtual Lab process

4 of 69

What is Virtual Lab?

  • A tool for gathering survey data from willing participants.
  • Participants are recruited and engaged with online (via chatbot).
  • Built for research, with statistical estimation in mind.
  • Helpful for evaluating causal effects of online interventions.
  • Useful for policy-making.

5 of 69

Why Virtual Lab?

Gathering survey data necessarily involves two distinct steps:

  1. Recruit respondents

  2. Ask questions (+ intervene and evaluate)

Virtual Lab is an integrated software platform for performing both steps via:

  1. Targeted digital advertising

  2. Online survey (chatbot!)

6 of 69

Virtual Lab Process

  1. Recruit via digital advertising
  2. Survey and intervene
  3. Monitor in real time

7 of 69

Recruit

8 of 69

Survey (chat)

Intervene (within chat!)

9 of 69

Monitor

10 of 69

How can I use it?

An example of how Virtual Lab could be used for the impact evaluation of a social media campaign.

11 of 69

A question

What is the medium-term impact of a social-media information campaign on vaccine confidence and adoption for those who saw it?

[Example ad creative: “COVID-19 Vaccine. A COVID-19 vaccine is available today. It is safe and can save millions of lives. Get it now!”]

12 of 69

Traditional method #1: A lab experiment

Method. Gather a group of willing participants (undergraduates, MTurk, etc.), split them into control and treatment groups, show the advertisement to the treatment group and give a questionnaire to everybody.

Problems.

  1. The ad is consumed entirely out of context, which is not realistic.
  2. Treatment and survey are bundled as one experience.
  3. Following up with participants after 3-6 months can be difficult/costly.
  4. Study sample differs from target population (external validity).

13 of 69

Traditional method #2: A geographic experiment

Method. Create treatment and control regions, run the ads in treatment regions and not in control regions, and administer a survey to a sample drawn from all regions.

Problems.

  • Expensive: large control regions imply a big loss of campaign reach.
  • Expensive: finding those who actually saw the ad means oversampling and hoping enough of them saw it.
  • Selection bias: those who actually saw the ad differ in unobservables.

14 of 69

The integrated Virtual Lab method

  1. Use the ad platform to recruit a study sample from the target population (more details later).
  2. Administer a baseline survey to sample respondents and split them into balanced treatment/control groups.
  3. You now have three groups:
     • two study samples: treatment and control
     • the target population of your campaign
  4. Run your ad campaign as normal on only the target population.
  5. Run an intensive ad campaign on only the treatment group.
  6. Administer a follow-up survey to the study sample.

15 of 69

The integrated Virtual Lab method

This design is already much better than traditional methods:

  • Participants see ads “in the wild”, disconnected from the questionnaire.
  • Overall scope of campaign only marginally reduced for impact evaluation.
  • Entire treatment group exposed to the ads (full take-up).

But the treatment effects are still not necessarily transportable to the target population, as the study sample can differ from it.

16 of 69

Going further: building representative samples

  1. Select a set of relevant covariates that define subpopulations of interest for your treatment/outcome (e.g., gender).
  2. Stratify and recruit the study sample to ensure representation from each stratum (targeted advertising!).
  3. Run experiment in the study sample and estimate the effect per stratum.
  4. Collect data on ad exposure from the target population for each stratum.
  5. Use covariate data from both the sample and the population to estimate the effect on the target population (see the sketch below).
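
A minimal sketch of step 5, assuming per-respondent outcomes from the study sample and stratum shares measured in the target population; the data layout and function name are illustrative, not part of Virtual Lab.

import pandas as pd

def poststratified_effect(sample: pd.DataFrame, pop_shares: dict) -> float:
    """Reweight per-stratum treatment effects by target-population stratum shares.

    sample: one row per respondent with columns 'stratum', 'treated' (0/1), 'outcome'.
    pop_shares: {stratum: share of the target population}, shares summing to 1.
    """
    effects = {}
    for stratum, grp in sample.groupby("stratum"):
        treated = grp.loc[grp.treated == 1, "outcome"]
        control = grp.loc[grp.treated == 0, "outcome"]
        effects[stratum] = treated.mean() - control.mean()  # per-stratum effect
    # Weight each stratum's effect by its share of the target population.
    return sum(pop_shares[s] * effects[s] for s in pop_shares)

# Illustrative usage with a two-stratum (gender) example:
df = pd.DataFrame({
    "stratum": ["men"] * 4 + ["women"] * 4,
    "treated": [1, 1, 0, 0] * 2,
    "outcome": [0.8, 0.6, 0.5, 0.4, 0.9, 0.7, 0.3, 0.2],
})
print(poststratified_effect(df, {"men": 0.55, "women": 0.45}))  # ≈ 0.385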

17 of 69

With Virtual Lab

Virtual Lab is an integrated platform to make that process easy:

  1. Automatically create (hundreds of) separate ad sets based on strata and optimize spending across all of them.
  2. Automatically create audiences (inclusion/exclusion) for the target population of the ad campaign.
  3. Provide a platform for sending baseline and follow-up endline surveys (chatbot).

18 of 69

Case Studies and Numbers

  1. Gender Attitudes in India
  2. Malaria Incidence in India

19 of 69

RCT on Gender attitudes in India

Q: Do short video-clips delivered through social media change gender attitudes and promote positive behaviors against violence against women (VAW)?

Quasi-behavioral outcomes: clicks and time spent on gender-related websites, profile picture updates with a frame saying “End VAW”

20 of 69

Recruiting ad campaign

18-to-24-year-old urban youth in Northern India

Stratification by gender

Ad banner with lottery prizes (5 smartphones or a selfie with a Bollywood celebrity) for those completing the full study

21 of 69

Timeline

  1. Facebook recruiting (1 to 2 weeks)
  2. 15-min baseline survey
  3. After 1 week, randomized intervention:
     • Treatment 1: Drama, 25 min (3 episodes)
     • Treatment 2: Documentary, 20 min (7 episodes)
     • Control: Placebo, 25 min (4 episodes)
  4. After 1 week: 15-min short-term survey
  5. After 4 months: 15-min medium-term survey

22 of 69

Cluster RCT on Malaria incidence in India

Q: What are the medium-term impacts of a 3-month Malaria-prevention social media marketing campaign on

  • Mosquito net usage
  • Timely treatment-seeking/testing behaviors
  • Overall Malaria incidence

Design:

  • Novel recruiting strategy: ad optimization across 80 districts in 3 states.
  • Incentive: mobile-phone credit.
  • Baseline survey administered through chatbot in July-August 2020.
  • Social media campaign excluded 40 randomly chosen control districts.
  • Bi-weekly longitudinal survey for 6 months.

23 of 69

Adaptive ad optimization

  • From early responses we found that people living in kutcha (mud, tin, and/or straw) dwellings were an underrepresented group that experiences much higher malaria incidence.
  • Recruitment was dynamically optimized to increase the representation of these individuals (the population at risk).

24 of 69

By the numbers

                                        India (Gender)       Italy (Covid-19)     India (Malaria)
Clicks on ad                            33,000               3,500                302,000
Cost per click (US$)                    0.66                 0.10                 0.20
Incentives                              Lottery ticket       Amazon voucher       Mobile credit
Stratification of recruiting campaign   Yes                  No                   Yes
Type of study                           Longitudinal (RCT)   Longitudinal         Longitudinal (RCT)
Individuals’ participation time         60 minutes           60 minutes           40 minutes
Length of study (base-to-endline)       4 months             3 months             6 months
Initial sample size                     5,200                1,220                18,800
Final sample size                       620 (12%)            600 (50%)            -

25 of 69

What drives effective sample and cost differences

  • Type of incentive scheme (e.g., raffle vs. vouchers, monetary vs. “experience” prizes).
  • Characteristics of the targeted population (e.g., stratification for groups less likely to engage on social media, propensity to click, saturation).
  • Duration of the ad campaign (e.g., uninterrupted vs. spread out).
  • Type of intervention (e.g., survey vs. RCT).
  • Study length (e.g., one-time vs. timing of follow-up survey).
  • Required individual participation time.

26 of 69

The Component Parts

  • Recruiting
  • Surveying

27 of 69

Recruiting

28 of 69

Ad platforms

Recruit participants via digital ad platforms (Facebook/Google).

Platforms allow ads to be targeted based on (many) demographic variables.

29 of 69

Stratified recruiting: quotas

Virtual Lab creates separate ad sets for the defined strata.

Quotas per stratum are defined explicitly by demographic variables available on the platform or implicitly by survey answers.

strata:
  - name: young-men
    targeting:
      age_min: 18
      age_max: 40
      gender: male
    size: 100
  - name: old-men
    targeting:
      age_min: 41
      gender: male
    size: 50

30 of 69

Cluster population

Geographic regions can also be defined as strata.

Define regions by the administrative regions available in the ad platform or latitude/longitude/radius.

31 of 69

Continuous optimization

As respondents are recruited, Virtual Lab’s servers continuously place new ads, remove old ads, and update ad spending to fill the quota for each stratum.

[Diagram: Virtual Lab’s servers and the ad platform’s servers exchange updates in real time.]
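
The quota-filling logic can be pictured as a simple reallocation rule: shift spend toward strata that are furthest from quota and pause ad sets whose quota is already filled. A rough sketch of that idea only (Virtual Lab’s actual optimizer is more involved; get_respondent_counts and set_adset_budget are hypothetical stand-ins for the platform calls):

def reallocate_budgets(strata, total_budget, get_respondent_counts, set_adset_budget):
    """Split the remaining budget across ad sets in proportion to each
    stratum's unfilled quota; strata already at quota get their ad set paused.

    strata: {name: {"size": quota}}; the two callables wrap the ad platform's API.
    """
    counts = get_respondent_counts()           # {stratum: respondents recruited so far}
    remaining = {s: max(cfg["size"] - counts.get(s, 0), 0) for s, cfg in strata.items()}
    total_remaining = sum(remaining.values())
    for stratum, gap in remaining.items():
        if total_remaining == 0 or gap == 0:
            set_adset_budget(stratum, 0.0)     # quota filled: pause this ad set
        else:
            set_adset_budget(stratum, total_budget * gap / total_remaining)

# e.g. run every few minutes in a loop:
# while True:
#     reallocate_budgets(strata, daily_budget, get_respondent_counts, set_adset_budget)
#     time.sleep(300)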

32 of 69

Auto-generated Facebook ad sets

33 of 69

Recruiting: why custom ad optimization?

The built-in ad optimization in digital ad platforms is designed to maximize value for your business, under the assumption that diversity of customers is not in and of itself valuable.

In research, this is not the case. The value (information gain) of an individual changes with how many similar individuals we already have.
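
To make that concrete: under a stratified mean estimator, the variance reduction from one more respondent in a stratum shrinks as the stratum fills up, so an underfilled stratum is worth more even if it is more expensive to reach. A sketch under that assumption (not the platform’s actual objective function):

def marginal_value(n_s: int, pop_share_s: float, sigma2_s: float = 1.0) -> float:
    """Reduction in the stratified estimator's variance from adding one
    respondent to stratum s, which already has n_s respondents.

    Stratum s contributes pop_share_s**2 * sigma2_s / n_s to the variance,
    so the gain from n_s -> n_s + 1 is the difference of the two terms.
    """
    if n_s == 0:
        return float("inf")  # a first observation in an empty stratum is invaluable
    return pop_share_s**2 * sigma2_s * (1.0 / n_s - 1.0 / (n_s + 1))

# The same respondent is worth far less once their stratum is well represented:
print(marginal_value(n_s=10, pop_share_s=0.5))   # ~0.0023
print(marginal_value(n_s=500, pop_share_s=0.5))  # ~0.000001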

34 of 69

Surveying

35 of 69

Survey modes

Virtual Lab can be extended to work with any online survey:

  1. Web surveys (Qualtrics, Typeform, etc.)
  2. Chatbot (Facebook Messenger, Whatsapp, Telegram, etc.)

We will specifically discuss the Messenger chatbot and make a case for chatbot surveys.

36 of 69

Why chat?

User Experience - Respondents already know how to chat, which implies lower friction.

Creativity - Chat allows for new kinds of studies, including higher-frequency interactions and seamless interventions.

Intimacy - Chat provides a personal, private space where respondents can safely discuss sensitive topics.

37 of 69

A one-question survey via chat

  1. User clicks notification on their phone.
  2. User answers question.

38 of 69

Chat lends itself naturally to repeated (longitudinal) interaction

39 of 69

Integrate External Processes

Chat makes it easy to integrate a custom, complex user flow for external processes, like payments.

Shown is the flow for sending incentive payments via Virtual Lab’s integration with Reloadly, a mobile top-up platform.

40 of 69

Measure information-seeking

Send users links to external websites.

Websites are viewable via “webview” within Messenger.

Track whether or not the user clicked on the link and how long they spent.
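
The mechanism is the standard tracked-redirect pattern: the chatbot sends each respondent a personalized link, the click is logged, and the respondent is forwarded to the real site. A minimal sketch with Python’s standard library; the URL scheme, port and destination are made up for illustration, and this is not Virtual Lab’s implementation:

import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

DESTINATION = "https://example.org/vaccine-info"  # hypothetical target page
clicks = {}  # respondent_id -> click timestamp

class RedirectTracker(BaseHTTPRequestHandler):
    def do_GET(self):
        # Links sent in chat look like /r?uid=<respondent_id> (illustrative scheme).
        params = parse_qs(urlparse(self.path).query)
        uid = params.get("uid", ["unknown"])[0]
        clicks[uid] = time.time()              # record that this respondent clicked
        self.send_response(302)                # then forward to the real website
        self.send_header("Location", DESTINATION)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), RedirectTracker).serve_forever()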

41 of 69

Study Archetypes

  1. Individual-level RCTs
  2. Community-level (clustered) RCTs
  3. Survey-response targeted ad campaign
  4. Quick population surveys with real-time visibility
  5. High-frequency panels with real-time visibility

42 of 69

Individual-level RCTs

  1. Social media intervention (e.g., vaccination example)
  2. “Lab-style” intervention within the chatbot (e.g., India gender study):
     • Video
     • Pictures
     • Information in chat messages
  3. Offline intervention:
     • Chat used as a survey tool for people recruited offline
     • Online recruitment used for an in-person intervention

43 of 69

Community-level (clustered) RCTs

  • Social media intervention with community spillover effects (e.g., malaria example)
  • Any offline intervention clustered by geography.

44 of 69

Survey-response targeted ad campaign

  1. Use Virtual Lab to ask survey questions and generate an “audience” of individuals who respond a certain way.
  2. Use that “audience” in an ad campaign to target the desired subpopulation.

This could be performed continuously so that the audience is updated in real time (e.g., vaccination campaign aimed at targeting those who haven’t yet been vaccinated).
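
A sketch of step 1’s output, assuming responses are stored long-format in a dataframe: filter respondents by a given answer and export their IDs as a custom-audience file for the ad platform (column names and the export format are hypothetical):

import pandas as pd

def build_audience(responses: pd.DataFrame, question: str, answer: str) -> pd.DataFrame:
    """Return the respondents who gave `answer` to `question`.

    responses: long-format survey data with columns
               'respondent_id', 'question', 'answer' (illustrative schema).
    """
    mask = (responses.question == question) & (responses.answer == answer)
    return responses.loc[mask, ["respondent_id"]].drop_duplicates()

# e.g. everyone who reports not yet being vaccinated, re-exported on a schedule
# so the audience stays current:
survey_df = pd.DataFrame({
    "respondent_id": [1, 2, 3],
    "question": ["vaccinated"] * 3,
    "answer": ["no", "yes", "no"],
})
audience = build_audience(survey_df, "vaccinated", "no")
audience.to_csv("unvaccinated_audience.csv", index=False)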

45 of 69

Quick population surveys with real-time visibility

Virtual Lab can be used to quickly generate a set of responses for a repeated cross-sectional survey that is representative across any set of defined variables.

Monitoring tools can be used by governments or organizations to track responses in real time (continuous recruitment, continuous surveying, continuous data monitoring).

46 of 69

High frequency panels with real-time visibility

Virtual Lab can be used to quickly generate a panel of (representative) respondents.

Respondents can be asked questions at high frequency (i.e. experience sampling: daily or weekly questions).

Monitoring tools can be used by governments or organizations to track responses over time.

47 of 69

Monitoring in real time

48 of 69

Future Developments

  1. Ad-specific selection bias robustness checks
  2. Stratified (adaptive) treatment assignment

49 of 69

Robustness: selection bias due to acquisition cost

One latent variable that drives selection bias in online advertising is the cost to place an ad in front of an individual.

This cost is different for every person and, by using automated bidding, we place ads in front of the “cheapest” individuals.

Does this latent cost variable interact with a specific treatment or outcome? Automated procedures to determine that would be helpful.
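
One automated check along these lines: record each respondent’s acquisition cost at recruitment and test whether it moderates the treatment effect, e.g. via an interaction term in the outcome regression. A sketch assuming a statsmodels-style workflow and hypothetical column names:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def acquisition_cost_check(df: pd.DataFrame):
    """Regress the outcome on treatment, (log) acquisition cost and their interaction.

    df columns (hypothetical): 'outcome', 'treated' (0/1), 'acq_cost' (US$ to recruit).
    """
    df = df.assign(log_cost=np.log(df["acq_cost"]))
    model = smf.ols("outcome ~ treated * log_cost", data=df).fit()
    # The interaction term is the quantity of interest: does the treatment
    # effect vary with how cheap the respondent was to reach?
    return model.params["treated:log_cost"], model.pvalues["treated:log_cost"]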

50 of 69

Stratified + Adaptive Randomization

Stratified randomization into treatment arms optimizes population balance based on answers to initial questions.

Adaptive randomization picks treatment/control status based on results calculated so far to maximize statistical power.
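
A hedged sketch of both ideas, with illustrative rules only: stratified block randomization keeps arms balanced within each stratum as respondents arrive, while an adaptive (Neyman-style) rule tilts assignment toward arms whose outcomes are noisier so far:

import random
from collections import defaultdict
from statistics import pstdev

ARMS = ["control", "treatment"]

# --- Stratified block randomization ---------------------------------------
assignments = defaultdict(list)  # stratum -> arms assigned so far

def assign_stratified(stratum: str) -> str:
    """Assign to the currently smallest arm within the stratum,
    breaking ties at random, so arms stay balanced as respondents arrive."""
    counts = {arm: assignments[stratum].count(arm) for arm in ARMS}
    smallest = min(counts.values())
    arm = random.choice([a for a, c in counts.items() if c == smallest])
    assignments[stratum].append(arm)
    return arm

# --- Adaptive (Neyman-style) assignment ------------------------------------
def assign_adaptive(outcomes: dict) -> str:
    """Assign proportionally to each arm's outcome standard deviation so far,
    which maximizes power for a difference in means (Neyman allocation).

    outcomes: {arm: list of observed outcomes to date}."""
    sds = {arm: pstdev(vals) if len(vals) > 1 else 1.0 for arm, vals in outcomes.items()}
    total = sum(sds.values())
    r = random.uniform(0, total)
    cum = 0.0
    for arm, sd in sds.items():
        cum += sd
        if r <= cum:
            return arm
    return ARMS[-1]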

51 of 69

52 of 69

Appendix

53 of 69

Software & Security

  1. Data encryption
  2. Scalable architecture
  3. Open source

54 of 69

Data Encryption

All data in-flight into and out of the cluster is encrypted via HTTPS (including connections to Facebook and Virtual Lab dashboard).

Encryption for data at rest inside the cluster is provided at the disk level by the block storage of the public cloud provider (e.g., Google Cloud).

Downloading of responses is encrypted via HTTPS. Users are responsible for storing data securely on their own systems for downstream analysis.

55 of 69

Scalable Architecture

Modern, open-source software stack: Docker, Node.js, Go, Kafka, CockroachDB, Kubernetes, Prometheus, Grafana.

Evented system with parallelism built around Kafka partitions (Kafka is an open source message bus used by companies such as LinkedIn, Netflix, Wikimedia).

System scales horizontally with servers in Kubernetes cluster.
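
To make the partition-parallelism point concrete: consumers sharing a group id split a topic’s partitions among themselves, so adding replicas adds throughput without code changes. An illustrative Python sketch with the confluent_kafka client (the actual Virtual Lab services are written in Node.js and Go; broker, topic and group names here are made up):

from confluent_kafka import Consumer

# Each replica of this process runs the same code; Kafka assigns each replica a
# disjoint subset of the topic's partitions, which is what lets the system
# scale horizontally inside the Kubernetes cluster.
consumer = Consumer({
    "bootstrap.servers": "kafka:9092",      # hypothetical broker address
    "group.id": "vlab-responses",           # shared group id => partitions are split
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["survey-responses"])    # hypothetical topic name

try:
    while True:
        msg = consumer.poll(1.0)            # wait up to 1s for the next event
        if msg is None or msg.error():
            continue
        print(msg.value())                  # process one survey event
finally:
    consumer.close()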

56 of 69

Open Source Code

All code is open source: https://github.com/vlab-research

Application packaged as Helm app for one-file config deployment into any Kubernetes cluster.

Interested in deploying your own? Get in touch! We’d love to help you.

57 of 69

Covid-19 and Stereotypes in Italy

Q: How does Covid-19 affect individuals’ perceptions of the infectiousness of various nationalities and political attitudes towards China?

  • Longitudinal survey (Feb-May 2020), respondents interviewed every 2 weeks.
  • Ad campaign geo-targeted 90 provinces.
  • Ad banner with an Amazon coupon for anyone completing the 3-month study.

58 of 69

Survey on professional aspirations in Nigeria

Q: What do urban young people in Nigeria want for their future and how do they think about opportunities in the public vs. private sector?

  • A 20-min survey collected information on beliefs about public and private sectors and career choices.
  • Ad campaign targeted 13-to-19-year-old urban youth.
  • Ad banner with a lottery prize (1 smartphone) for those completing the survey.

65% of respondents report not trusting the government and 52% not trusting bureaucrats. When asked for the two main personality traits they associate with a government worker (out of a list of 20 traits), 24% picked words like “corrupt”, “selfish”, “arbitrary” or related terms.

59 of 69

India: costs and attrition

33,000 clicks on ad → cost per click $0.66

18,250 (100%) started baseline → cost p.p. $1.19

7,000 (38%) finished (15-min) → cost p.p. $3.10

3,260 (15%) watched 3-10 videos over 1 week → cost p.p. $6.66

Final cost p.p. incl. incentive → $7.30

60 of 69

Nigeria: costs and attrition

3,000 clicks on ad → cost per click $0.11

1,100 (100%) started survey → cost p.p. $0.31

550 (50%) filled 20-min survey → cost p.p. $0.62

Final cost p.p. incl. incentive → $0.98

61 of 69

Italy: costs and attrition

3,500 clicks on ad → cost per click $0.10

1,220 (100%) started study → cost p.p. $0.28

600 (50%) completed 6 waves → cost p.p. $0.57

Final cost p.p. incl. incentive → $11.60

62 of 69

Project Pipeline (incl. Covid-19)

  1. Covid-19 and Literacy apps
  2. Edutainment PSAs
  3. Seeding information and Network dependence
  4. Digital recruitment and Mode effects

63 of 69

Covid-19 Partnerships with social media companies

  • Decrease recruitment costs for interventions and their evaluations (e.g., ad credits, cross-posting across media platforms)
  • Improve campaign targeting by adapting content and targeting campaigns in real-time (e.g., targeting disinformation hotspots)
  • Can help systematically introduce social media campaigns in development investments by testing cost-effective, scalable and high-visibility campaigns. A group of researchers at UPF, UAB and the World Bank aim to test:
    • Distribution strategies using network centrality measures for community-level impacts
    • Edutainment to promote attitudinal and behavior change (e.g., reading to young children) and take up of social programs and apps (e.g., literacy apps)

64 of 69

Covid-19 and Literacy apps (1)

65 of 69

Covid-19 and Literacy apps (2)

66 of 69

Covid-19 and Literacy apps (3)

Effective home-learning tools are needed with 1.6 billion students staying home due to Covid-19.

With support from Norad and World Bank projects, the World Bank’s DIME edu-tainment team is testing the effectiveness of literacy apps like “Feed The Monster” on parents’ self-efficacy beliefs and educational outcomes of children in SSA.

Uses two groups: direct (optimized) app marketing and an RCT built from a Virtual Lab survey.

Tests the effectiveness of the app, but also the effectiveness of the marketing channel for targeting populations with the most promising potential outcomes.

67 of 69

Edutainment PSAs

Expand current partnerships with edutainment producers (e.g., MTV, Discovery):

  1. In the short term, test the effectiveness of edutainment public service announcements (PSAs) on Covid-19 health behaviors (e.g., social isolation) and behaviors related to other epidemics triggered by the current pandemic (e.g., gender-based violence, learning poverty, social cohesion).
  2. In the long term, test edutainment PSAs on attitudes and behaviors related to climate change and other social causes, for example, ethical fashion.

68 of 69

Seeding information and Network Dependence

  • Expand the team’s research on “seeding” information based on network centrality measures using social media platforms.
    • Targeting potential influencers can increase the spread of information and its attitudinal and behavioral impacts on network members.
  • Shared network structure can cause spurious dependence which can affect statistical validity of any study.
    • Using the network structure of a social network, the study would interview respondents and test how network dependence interferes with the findings.

69 of 69

Digital recruitment and Mode effects

As development interventions increasingly move online during the Covid-19 crisis, there is an urgent need to study effective recruitment and long-term use strategies.

Similarly, study recruitment and survey-mode effects need to be investigated in this new online research. For example, testing:

  1. Chat vs. other survey modes (phone, face-to-face, Qualtrics, MTurk, etc.) in non-response and attrition
  2. Incentives (raffle vs. coupon)
  3. Recruitment strategy (Facebook vs. other digital advertising vs. non-digital advertising)