INTRODUCTION TO DATA SCIENCE
Experimental Design
Lecture: 02
– FARDINA FATHMIUL ALAM
(fardina@umd.edu)
CMSC 320: 2026
Today’s Objectives
Chapter 3: https://ffalam.github.io/CMSC320TextBook/chapter3/Chapter_3_0.html
Today, we’ll cover the basics of experimental design, including how to plan and conduct experiments.
The goal is to help you design and analyze experiments more effectively.
Experimental Design in Data Science?
The process of planning, conducting, and analyzing experiments to test hypotheses and gather meaningful data for data-driven decisions.
Data science fundamentally involves making decisions based on data.
What will the weather be like for the next 10 days?
How many visitors did the website ‘X’ receive last week?
Does offering free shipping increase the number of purchases?
Which courses should the department offer to maximize enrollment next semester?
Optimization criteria: the objective or goal function we want to achieve (maximize or minimize)
Asking the right questions before solving a DS problem is a great start! And be specific!
We use experimental design to collect good data. Causal experimental design (for causal and prescriptive problems) is needed only when we want to test the effect of an action in the real world (e.g., A/B testing).
Different questions lead to different models, data needs, and evaluation criteria.
Why did website traffic drop last weekend?
Topics
Example: Online Retail
Let's say you're a data scientist working for an online retailer, and you want to test whether changing the color of the "Buy Now" button on your website affects the click-through rate (CTR).
What is your problem definition?
Find which version of the "Buy Now" button, Option A (default) or Option B (red), is more likely to maximize the CTR.
What is your optimization criterion? What do we want to maximize?
CTR → we want to select the "Buy Now" button option that leads to a higher CTR.
Ques: How can we set up an experiment to collect data in this case?
Data size / sample → the number of website visitors
CONTROL GROUP: Views the original website with the existing button color. This group experiences no changes.
TREATMENT GROUP: Sees the same website but with a different color for the "Buy Now" button. This is the group that experiences the change you want to test.
Comparing the two groups lets us draw more reliable conclusions about the impact of the independent (manipulated) variable.
Ques: What are the variables here?
Independent Variable: the button color (what we change)
Dependent Variable: the CTR (what we measure)
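The control/treatment split described above can be sketched in a few lines of Python. This is only a minimal illustration, not the course's implementation: the function names `assign_groups` and `click_through_rate`, the 50/50 split, and the visitor IDs are all assumptions made for the example.

```python
import random


def assign_groups(visitor_ids, seed=0):
    """Randomly split visitors into a control and a treatment group (50/50)."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    shuffled = list(visitor_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "treatment": shuffled[half:]}


def click_through_rate(clicks, views):
    """CTR = clicks / views; returns 0.0 when there are no views."""
    return clicks / views if views else 0.0


# Hypothetical visitor IDs 0..999, used purely for illustration.
groups = assign_groups(range(1000))
print(len(groups["control"]), len(groups["treatment"]))
```

Random assignment is what makes the two groups comparable: on average, any visitor characteristic is spread evenly across both, so a difference in CTR can be attributed to the button color.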
Summary: Variables, Population, and Groups in a Study
Once the problem is defined, identify the variable(s) of interest that are relevant to your research question.
Also, specify the population or sample that your study will focus on.
Treatment vs. Control Groups
Comparing these groups helps identify the effect of the independent variable (IV) on the dependent variable (DV).
Come up with a Hypothesis
What is a hypothesis?
A hypothesis is a testable statement you want to evaluate.
It often takes the form: "If X is true, then Y should happen."
How do we test it?
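For the button example, one common way to test a hypothesis such as "the red button changes the CTR" is a two-proportion z-test. The sketch below is one possible approach under stated assumptions, not necessarily the method used in this course: the function name is hypothetical, the click and view counts are made up, and the normal approximation assumes reasonably large samples.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for a difference between two click-through rates."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Pooled CTR under the null hypothesis that both buttons perform the same.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts for illustration: control (A) vs. treatment (B).
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the observed difference in CTR would be unlikely if the button color had no effect. The threshold for "small" (often 0.05) should be chosen before running the experiment.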
Brainstorming Time
Good experimental design aims to minimize the influence of correlated variables (confounders) that could distort the relationship being tested.
Confounders (Before Data Collection)
Confounder: An external variable that affects the DV and distorts the IV → DV relationship if not controlled.
Why it matters
Examples
Confounder: metabolic rate
Confounders: age, socioeconomic status etc.
Another Example: Experimental Design Flow: Polling
Question: How do we know which candidate is ahead?
Eliminate confounding variables as much as possible (sample bias, geographic representation, population proportion bias, demographic mismatch, and many more) to make the data as accurate as feasible.
Some Ways to Deal with Confounders
Design Stage (Before Data Collection)
Analysis Stage (After Data Collection): Regression / multivariable models, Statistical adjustment
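As a rough illustration of statistical adjustment at the analysis stage, the sketch below compares a naive treated-vs-control difference with a stratified estimate that compares outcomes only within levels of the confounder. The data are made up, the function name `stratified_effect` is hypothetical, and stratification is just one simple form of adjustment (a regression model is another).

```python
from collections import defaultdict
from statistics import mean


def stratified_effect(records):
    """records: iterable of (stratum, treated, outcome).

    Returns the stratum-size-weighted average of the within-stratum
    differences in mean outcome (treated minus control).
    """
    by_stratum = defaultdict(lambda: {True: [], False: []})
    for stratum, treated, outcome in records:
        by_stratum[stratum][treated].append(outcome)

    weighted_sum, total = 0.0, 0
    for groups in by_stratum.values():
        if not groups[True] or not groups[False]:
            continue  # a stratum needs both groups to be comparable
        n = len(groups[True]) + len(groups[False])
        weighted_sum += n * (mean(groups[True]) - mean(groups[False]))
        total += n
    return weighted_sum / total if total else float("nan")


# Made-up exam scores; the confounder is prior knowledge ("high" / "low").
records = [
    ("high", True, 90), ("high", True, 92), ("high", True, 94), ("high", False, 85),
    ("low", True, 60), ("low", False, 55), ("low", False, 57), ("low", False, 53),
]
treated_scores = [o for _, t, o in records if t]
control_scores = [o for _, t, o in records if not t]
naive = mean(treated_scores) - mean(control_scores)
adjusted = stratified_effect(records)
print(naive, adjusted)
```

In this toy data, most treated students already had high prior knowledge, so the naive difference is much larger than the within-stratum difference. That gap is the distortion a confounder introduces.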
Example: Control Confounder Variable in an Experiment
Hypothesis: If the amount of study time (independent variable) is increased, then exam scores (dependent variable) will also increase.
Exp. Design Idea 1: Stratified Randomization: Stratify students by prior knowledge (high/medium/low), then randomly assign the treatment within each stratum.
Exp. Design Idea 2: Block Design (Matched Pair): Pair participants by prior knowledge, then randomly assign one member of each pair to the treatment.
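Stratified randomization for the study-time example can be sketched as follows. This is a minimal illustration under assumptions: the student names and prior-knowledge levels are invented, the function name `stratified_randomize` is hypothetical, and each stratum is split as evenly as possible between treatment and control.

```python
import random
from collections import defaultdict


def stratified_randomize(participants, seed=0):
    """participants: list of (name, stratum) pairs.

    Shuffles participants within each stratum and assigns the first half
    to treatment and the rest to control.
    """
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    strata = defaultdict(list)
    for name, stratum in participants:
        strata[stratum].append(name)

    assignment = {}
    for members in strata.values():
        rng.shuffle(members)
        half = len(members) // 2
        for i, name in enumerate(members):
            assignment[name] = "treatment" if i < half else "control"
    return assignment


# Invented students: 12 students, 4 per prior-knowledge level.
levels = ["high", "medium", "low"]
students = [(f"s{i}", levels[i % 3]) for i in range(12)]
assignment = stratified_randomize(students)
print(assignment)
```

Because each prior-knowledge level contributes equally to both groups, prior knowledge cannot explain a difference in exam scores between treatment and control.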
Try it yourself: control for the effect of “age”
“As books read increases, avg. literacy also increases.”
We can measure the age of each individual to see the effect of age on literacy.
Experimental Design: How to design
Methods for Collecting Data
Use these when pre-existing datasets are not available.
A. Observational studies → Observe and record data (variables) without intervening or manipulating variables (Observe; don’t change anything intentionally).
E.g. Observing animal behavior in a natural habitat without any external influence.
B. Surveys → Collect information through structured questionnaires or interviews.
E.g. Conducting a survey to gather opinions on a political issue.
C. Experiments → We actively change something to see what happens.
D. Simulations → Create artificial scenarios to model real-world situations for data collection.
E.g. Using a computer simulation to study traffic patterns in a city.
Cross-sectional studies: data collected at one time point. Example: survey people’s exercise habits today.
Retrospective (case-control) studies: look back at past exposure. Example: compare smoking history of lung cancer patients vs. non-patients.
Prospective (longitudinal/cohort) studies: follow a group (cohort) over time. Example: track smokers and non-smokers for 10 years.
B. Surveys (A specific type of observational study)
Collect data using questions or questionnaires.
Example: Survey students about study habits and exam stress.
D. Simulation
Use a computer or mathematical model to mimic real-world systems.
Example: Simulate traffic flow to study congestion without changing real roads.
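To make the traffic example concrete, here is a deliberately tiny simulation of a single traffic light. Everything about the model is an assumption chosen for illustration: time advances in one-second steps, at most one car arrives per step with probability `arrival_prob`, and one car can leave per step while the light is green. The function name `simulate_intersection` is hypothetical.

```python
import random


def simulate_intersection(steps, arrival_prob, green_len, red_len, seed=0):
    """Return the average queue length at a single traffic light."""
    if steps <= 0:
        raise ValueError("steps must be positive")
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    cycle = green_len + red_len
    queue = 0
    total_queue = 0
    for t in range(steps):
        if rng.random() < arrival_prob:
            queue += 1  # a car arrives
        is_green = (t % cycle) < green_len
        if is_green and queue > 0:
            queue -= 1  # one car passes through on green
        total_queue += queue
    return total_queue / steps


# Compare two hypothetical light timings without touching a real road.
always_green = simulate_intersection(1000, 0.3, green_len=30, red_len=0)
half_red = simulate_intersection(1000, 0.3, green_len=30, red_len=30)
print(always_green, half_red)
```

Changing `green_len`, `red_len`, or `arrival_prob` and rerunning shows how congestion responds, which is the kind of data a simulation provides when a real experiment is impractical.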
Placebo Effect
Improvement occurs due to belief in treatment, not the treatment itself
Example: Patients feel better after receiving a sugar pill they believe is real medicine
Minimizing Bias in Experimental Design
Blinding: participants (and/or researchers) do not know who receives the treatment or placebo
Using a placebo helps keep participants unaware of their group, reducing bias.
The Fundamental Rule of Data Collection
Your data must be representative of the population you want to study.
Keep in mind that
It is almost impossible to be certain that your experiment has completely removed all forms of bias. It is necessary to consider possible sources of bias and highlight them in your analysis. Ideally, future experiments would improve upon your method by iteratively eliminating those sources of bias.
Key Takeaways
Quick Class Task
Identify which method for collecting data (observational study, an experiment, a simulation, or a survey) is best in each of the following situations and explain your answer.