Community for Rigor
Confirmation bias: the original error
Activity: Deduce the number rule.
Let’s try some hypothesis testing.
But first, brief instructions.
Activity: Enter numbers to guess a secret rule.
Initial screen
Input a number sequence
Test the number sequence
Does it match? Keep testing.
Know the rule? Submit hypothesis.
Activity: Deduce the number rule.
Now it’s your turn to determine the secret rule.
So, what happened?
Discuss: Did you falsify your hypothesis?
How did this activity go for you? Were there moments when you realized you needed a new strategy?
Think
Pair
Share
What have researchers found about bias?
Here’s one experiment that shows how context inserts bias that interferes with expertise.
5 fingerprint experts agreed to a study and examined prints they had previously identified as matches.
The experts were instructed to ignore contextual information and given unlimited time to examine the prints.
The catch—this time the experts were told the prints were “erroneously matched.”
We are swayed by misleading context even when we know better.
“... experts are vulnerable to irrelevant and misleading contextual influences.”
4 out of 5 experts contradicted their own previous identification, and incorrectly assessed the prints were not a match.
Only 1 expert maintained the original "match" judgment.
We discount information that undermines our past judgments.
In a betting scenario, participants significantly increased their bets when their partners agreed with them.
They only slightly decreased their wagers, however, when their partners disagreed (Kappes et al. 2020).
As you can see, we tend to confirm our hypotheses.
This impairs our research.
This tendency is a cognitive bias called confirmation bias.
Our biased brains
Lesson 1
Confirmation bias:
People’s tendency to process information by looking for, or interpreting, information that is consistent with their existing beliefs.
By distorting how we think, confirmation bias skews our ability to conceive, test, and evaluate scientific hypotheses.
In this unit, confirmation bias also covers two related cognitive biases:
expectation bias & observer bias.
Expectations can produce changes
In the 1960s, Robert Rosenthal and Kermit Fode asked students to train 2 breeds of rats to solve mazes.
There were "maze-bright" & "maze-dull" rats.
The rats were actually genetically identical.
Student expectations affected training & created performance differences by the end of the study.
This is expectation bias
Also known as experimenter’s bias, it is:
The tendency for researcher expectations to influence subjects and outcomes.
Experimenters’ expectations can produce an actual, measurable difference in performance.
Expectations can affect data collection
In a more recent study, students were asked to rate the sociability of 2 breeds of pigs (Tuyttens et al. 2014).
Students watched video recordings of normal pigs vs. pigs with high social breeding value [SBV+].
The videos were identical, but SBV+ pigs were rated as more social.
Students did not interact with the pigs, so the difference resulted from varying human perceptions.
This is observer bias
The tendency for researcher expectations to influence their perceptions during a study, thus affecting outcomes.
This can look like:
Data collectors’ expectations leading them to perceive differences that are not actually there.
These biases impede rigorous science & impair our ability to:
Design informative, objective experiments and interpret experimental results impartially.
Unit overview
Since we know confirmation bias distorts how we:
Frame & ask questions.
Seek out information.
Collect observations.
Make sense of data.
We can manage this inclination toward confirmation bias by:
Building habits to formulate better hypotheses.
Designing experiments to rigorously test and compare hypotheses.
Using appropriate experimental methods to reduce errors.
Placing results correctly on the exploratory/confirmatory axis.
Next, let’s learn how to formulate better hypotheses to address how we frame and ask questions.
“Favored” vs. alternative hypotheses
Lesson 2
Many studies don't �rigorously test a hypothesis.
Instead, they show weak evidence in support of a favored hypothesis.
How does this happen?
Often studies hit stumbling blocks because they:
Start with a vague hypothesis.
Compare a favored hypothesis with a trivial null hypothesis.
Don't suitably compare or test multiple plausible hypotheses.
What happens as a result?
We interpret any outcome as supporting our hypothesis.
We get results that don't provide any new insights.
We design the experiment to disprove H0.
Tip!
H₀ = Null Hypothesis.
It’s the default assumption that there is no effect.
What’s the harm?
Vague hypothesis
Without a connection to a specific hypothesis or related models or theories, the result has limited value to others working in this area.
Disproving H0
Disproving a null hypothesis gives no information about what is occurring. A false H0 is consistent with many hypotheses, including the favored Ha.
Ha = Alternative Hypothesis.
e.g. The effect exists / the magnitude is NOT 0.
H1 & H2 are NOT mutually exclusive.
When H1 and H2 are not mutually exclusive, both could be right (or both wrong)—the study results don't give clear answers one way or another.
H1 = Explanatory Hypothesis 1.
e.g. The effect operates through mechanism 1.
H2 = Explanatory Hypothesis 2.
e.g. The effect operates through mechanism 2.
What solutions are there?
Vague hypothesis
To find a hypothesis worth testing, we can run exploratory studies, search the literature, consult an LLM, or ask an expert in the area.
H1 & H2 are NOT mutually exclusive.
Make the hypotheses exclusive so that any result is informative; or design a study to be comparative, e.g. about the magnitude of H1 vs H2.
Disproving H0
Find one of the other possible hypotheses. Other researchers in this area (that we disagree with) usually have one that they like.
Let’s focus on solving this problem.
Make the hypotheses exclusive so that any result is informative; or design a study to be comparative, e.g. about the magnitude of H1 vs H2.
Experiments with 2 mutually exclusive hypotheses are more likely to develop a clear differentiation between potential explanations.
When hypotheses are not mutually exclusive, however, they increase the likelihood of supporting a favored hypothesis because we don't consider other plausible outcomes in the experiment.
How does a “favored” hypothesis lead to confirmation bias?
When only one hypothesis is tested, we tend to design studies to look for results consistent with it being true (e.g. having a hypothesized number rule, and testing a matching sequence).
Ex: The Left-Brain/Right-Brain Myth
Misinterpretation of research conducted by neuropsychologist Roger Sperry on specialized functions of the two brain hemispheres in the ‘70s and ‘80s (Sperry 1967, Sperry 1968) gave rise to early theories that the left hemisphere is exclusively logical and the right exclusively creative (Gazzaniga 2005).
This dichotomy became a “favored” explanation for personality traits, relying on oversimplified evidence and ignoring conflicting findings.
Focusing on supporting data reinforced the myth and discouraged consideration of more complex models.
Confirmation bias in action
The persistence of this myth influenced teaching strategies, career counseling, and how individuals viewed their cognitive abilities—potentially limiting personal growth and misdirecting educational efforts.
Impact on education and self-perception
What does the science say?
Neuroimaging studies reveal extensive communication between the two hemispheres and show that complex behaviors result from coordinated activity across these brain regions (Toga & Thompson, 2003).
Cognitive functions emerge from an integrated network that spans both hemispheres—not from isolated “left” or “right” processes.
Toga and Thompson exemplified exploring mutually exclusive hypotheses by examining whether structural brain asymmetries are intrinsically determined or experientially shaped.
They methodically evaluated both through their comprehensive review of anatomical differences between hemispheres and their developmental origins.
So, we have a way out: develop competing hypotheses!
Strong(er) inference practices depend on us making better hypotheses (Platt 1964).
ACTIVITY: develop a competing hypothesis.
Consider other possible explanations to build out other plausible hypotheses.
Which prompts helped you to think of a competing hypothesis?
Think
Pair
Share
Discussion
Recap: What’s a good place to start?
There are 2 hypotheses that are mutually exclusive.
Both H1 and H2 are plausible.
Next up: countering bias that creeps into experimental design.
Researcher degrees of freedom
Lesson 3
Confirmation bias can creep into our experimental design.
Watch out for design choices, such as:
Which population/animal model to test.
What specific form of treatment is applied.
How the outcome is measured.
Which kind of statistical test to use.
Consider these 2 workable hypotheses:
H1 = Emotional images have a localized neural correlate.
H2 = Emotional images have a diffuse neural correlate.
What could go wrong?
Diffuse: Multiple or undefined regions of the brain undergo significant change in response to a stimulus.
Emotional images: Images that stimulate a dominant emotion such as happiness or fear through depictions of people and animals.
Localized: A specific region of the brain undergoes significant change in response to a stimulus.
Neural correlate: A measurable pattern of nervous-system activity that reliably accompanies a mental state or event, measured by an fMRI technique that infers brain activity from localized changes in blood oxygenation.
ACTIVITY: Pick the most biased choice
Compare choices and pick the option that you think will most bias results toward H1, then explain why.
Discussion: How was voting?
Did you agree or disagree with the results of the voting? What did you notice that helped you make your decision?
Think
Pair
Share
Let’s take a closer look at how some of the researcher choices insert bias:
Choice 1: Double dipping
The authors choose smoothing, region of interest, and voxels to analyze so that the effects are as large as possible, favoring H1.
We need to avoid double dipping: never use what you see to decide how to analyze the same data.
Choice 2: Personality screen
The authors screen participants to find personality types more likely to create strong results. How do we know those people are representative of a larger population? Do we know how personality type affects the brain?
We need to avoid selecting ourselves into populations that seem more likely to have the result we want.
Both choices closely favored H1 & took complex steps to erase real variability from data.
How does this happen?
Researchers make MANY choices that cut real variability from data.
Then, averaging, biased selection, & augmenting data help make H1 SEEM true.
The study favors H1.
Hazard!
These choices all make H1 likely to be supported.
Researcher degrees of freedom:
The myriad choices researchers have when designing experiments & the flexibility that arises from these choices, which allows bias to creep in via subjectivity & interpretation.
Researcher degrees of freedom introduce many instances of flexibility, including (Gelman and Loken 2016):
Vague hypotheses that could be validated by different results.
Choosing when to stop data collection.
Removing data anomalies.
Selecting subsets of data.
Selecting and designing models.
Measuring multiple outcomes to select from.
Choosing the results or analyses to emphasize in the paper.
“This is how we do it in our lab.”
The more degrees of freedom we have, the more we’re going to choose options that favor our initial hypothesis.
We minimize the confirmation bias posed by this flexibility through rigorous experimental practices:
Developing specific, falsifiable hypotheses.
Randomizing subject allocation.
Pre-registering hypotheses and analysis plans.
Masking study participants, researchers, and analysts.
Transparently reporting ALL data analysis.
Reporting null and unexpected results.
Confirmation bias can introduce error into research.
However, one key principle of rigorous experimental design that will help us combat it is: masking.
Let’s dive deeper to see how masking curbs researcher degrees of freedom & bias.
Mitigating bias through masking
Lesson 4
Experiment design & execution involve many decisions.
These decision points can be a place where confirmation bias affects our work.
Let’s look at some choices in one segment of an experiment.
Ex: Testing a new Parkinson’s drug
Hypothesis: Treated mice are more active.
Preparer (You!): assigns mice to Treatment and Control.
Administrator (You)
Data collector (Also You): Mouse 1, 2, 3 are all more active; only mouse 6 is more active.
Analyst (Also Also You): After outlier correction the treatment group responds well, relative to the control group (p < 0.05).
Choice bias!
You’re aware which mice look more active before the experiment. You may assign them to treatment (which you want to work).
Labeling bias!
You expect the treated mice to do better so you may view them as more active.
Data analysis bias!
You feel that the outlier (which is doing well) needs to be rejected. Thankfully your results were statistically significant.
Discuss: What went wrong?
What problems occurred in this procedure?
Think
Pair
Share
By not withholding information at important steps, all these choices introduced bias.
What should we do instead?
Mask our study!
“Masking is the process by which information that has the potential to influence study results is withheld from one or more parties involved in a research study.”
Terminology
Masking & blinding are terms that describe the same practice. We use masking to avoid the ableist connotations of associating blindness with being unaware of information.
But how can you know if biases are distorting results?
To answer that, we need to review p-values.
A p-value (p):
Is a measure, ranging from 0 to 1, that quantifies the probability of observing data at least as extreme as the data actually observed, if the null hypothesis, H0, is true.
If the null hypothesis H0 is true, all p-values from 0 to 1 are equally likely.
A small p-value suggests that the data are extreme relative to what would be expected if H0 were true.
A p-value lower than a pre-set significance threshold (usually 0.05) is used to reject H0.
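To make the uniformity claim concrete, here is a minimal simulation sketch (not part of the lesson’s interactive materials; Python with numpy and scipy): when two groups are drawn from the same distribution, roughly 5% of t-test p-values fall below 0.05 just by chance.

```python
# Minimal sketch: p-values are uniform when H0 ("no difference") is true.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
p_values = []
for _ in range(10_000):
    a = rng.normal(loc=0.0, scale=1.0, size=20)   # group A, no true effect
    b = rng.normal(loc=0.0, scale=1.0, size=20)   # group B, same distribution
    p_values.append(stats.ttest_ind(a, b).pvalue)

p_values = np.array(p_values)
print(f"Fraction of p < 0.05 under H0: {np.mean(p_values < 0.05):.3f}")  # ~0.05
```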
Can masking really have an impact on experiments?
Let's look at some examples.
Failing to mask significantly inflates effect sizes.
When treatment allocation is masked, the measured efficacy of the drug NXY-059 drops from an average of 54.0% to 25.1%.
In a meta-analysis of 290 animal research studies, the odds of reporting a positive result were 5.2 times greater when neither masking nor randomization was used (Bebarta et al. 2003).
In other words, failing to mask adds a bias that can increase effect size by ~0.28 to 0.91σ.
σ = standard deviation.
ACTIVITY: explore the impact of bias.
Use these power curves to explore how the probability of detecting an effect is affected by masking failures.
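The activity uses interactive power curves; as a rough stand-in, the sketch below (hypothetical effect sizes; statsmodels’ TTestIndPower) shows how an effect inflated by an unmasking bias appears far easier to “detect” than the true effect, especially at small sample sizes.

```python
# Minimal sketch: power of a two-sample t-test for a true vs. bias-inflated effect.
from statsmodels.stats.power import TTestIndPower

true_effect = 0.3        # hypothetical real standardized effect (Cohen's d)
unmasking_bias = 0.5     # hypothetical inflation from failing to mask
analysis = TTestIndPower()

for n_per_group in (10, 20, 40, 80):
    power_true = analysis.power(effect_size=true_effect,
                                nobs1=n_per_group, alpha=0.05)
    power_biased = analysis.power(effect_size=true_effect + unmasking_bias,
                                  nobs1=n_per_group, alpha=0.05)
    print(f"n={n_per_group:3d}  power(true)={power_true:.2f}  "
          f"power(biased)={power_biased:.2f}")
```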
Discuss: What is the impact of sample size?
What are the implications for the reliability of research findings when there is a bias on effect size?
How is this different for small or large samples?
Think
Pair
Share
How can we use masking to combat confirmation bias?
Just as confirmation bias distorts how we:
Frame & ask questions.
Seek out information.
Collect observations.
Make sense of data.
It also exposes research to risks, such as:
Raters make judgments when collecting data.
Data analysts make modeling choices.
We can use masking to mitigate bias by:
Withholding details that affect direct observation and judgment.
Providing de-identified, coded data to prevent biased analysis.
Think back to our example.
How do you actually do the masking?
Ex: Testing a new Parkinson’s drug
Hypothesis: Treated mice are more active.
Preparer (You!): Treatment and Control groups.
Administrator (Colleague!)
Data collector (You): Mouse 1 and 3 are more active. Mouse 6 is more active.
Analyst (New Colleague!): Your colleague carefully compares the two groups and finds there is no statistically significant difference.
Choice bias averted!
Your colleague ideally doesn’t know which mice are more or less active; they randomly assign mice to each group.
Another bias averted!
Since you don’t know the group assignments, your observations will be more objective.
What next?
Since you followed a thoughtful experimental design, you have results you can trust.
Discuss: What is different in this experiment?
How did taking steps to mask this study impact the results?
Think
Pair
Share
So, who & what must be masked?
Who gets masked
Mask anyone with a possible influence on the outcome of the experiment:
Single-masked: participants (for human studies) & animal care staff (for animal studies).
Double-masked: experimentalists & clinicians.
Triple-masked: data collectors & data analysts.
What gets masked
Mask samples.
People conducting the experiments (administering drug or placebo) & people assessing the outcome (making observations & recording data) are not aware of:
which treatment is being administered
which group a given sample belongs to
Also called allocation concealment: for example, “using sealed envelopes … until after a participant had been irrevocably entered into a trial” (Schulz et al., 2018).
Allocation concealment works to prevent selection bias by using “a mechanism to prevent foreknowledge of upcoming assignments.”
As you can see, masking is a complex process even when seemingly simple.
What should you do if you have limited resources or tools?
Randomization: Use a random number generator to set treatment-allocation order, reducing selection bias (a minimal sketch follows this list).
Strategic masking: Keep key support staff (e.g., coders, care staff) unaware of group assignments.
Robust measurements: Design studies with measurements that resist manipulation from personal insights.
Pre-registration safeguard: Preregister study protocols to validate and monitor potential biases from unmasked practices.
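A minimal sketch of the randomization and strategic-masking ideas above, assuming a hypothetical mouse study and hypothetical variable names: the preparer keeps the allocation key, while data collectors and analysts only ever see neutral codes.

```python
# Minimal sketch: seeded random allocation plus coded labels for masking.
import random

animal_ids = [f"mouse_{i}" for i in range(1, 13)]   # 12 hypothetical subjects
rng = random.Random(2024)        # record the seed so allocation is auditable

allocation_order = animal_ids[:]
rng.shuffle(allocation_order)
treatment = set(allocation_order[: len(allocation_order) // 2])

# Only the preparer keeps this key; collectors and analysts never see it.
allocation_key = {aid: ("treatment" if aid in treatment else "control")
                  for aid in animal_ids}

# Codes come from a second, independent shuffle, so a code number carries
# no information about group assignment.
coding_order = animal_ids[:]
rng.shuffle(coding_order)
coded_labels = {aid: f"subject_{idx:02d}" for idx, aid in enumerate(coding_order)}

print(coded_labels["mouse_1"])   # e.g. 'subject_07' -- reveals nothing
```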
Next, let’s examine some risks to unmasking.
How good is your mask?
Lesson 5
You’ve masked your study!
Or have you?!
Masking may not be fully effective depending on the circumstances.
What if …
The treatment causes inflammation at the site of injection, whereas the control doesn’t.
A psychoactive treatment causes an obvious good time and patients notice changes in perception and behavior.
Treatment
Control
Animal strains have differences that are unrelated to the behavior of interest, but are noticed by experimenters (like raven funk).
Even when you’ve taken steps to mask an experiment, there are unmasking risks.
Certain clues can accidentally unmask an experiment and come from sources such as:
Caretaker behavior.
Equipment or procedure discrepancies.
Differing side effects.
Healing rates.
Let’s explore where in an experiment unmasking might occur.
ACTIVITY: Spot potential risks for unmasking!
Review a study and suggest solutions to a research team.
Discuss: How can we manage these challenges?
How do the lessons learned from this case study apply to other areas of research where similar biases or rigor issues could occur?
Think
Pair
Share
But should we also formally assess whether participants, researchers, or administrators have been unmasked?
Assessing if masking has been done (or done correctly) can be complicated or misleading.
There is also no consensus on how the efficacy of masking should be assessed (Born 2024).
In fact, sometimes explicitly assessing the effectiveness of masking inserts bias when questioning study participants (Schulz et al. 2010).
Strategies to mitigate & test for accidental unmasking broadly (Muthukumaraswamy et al. 2021):
Use placebo controls to prevent treatment effects from revealing information.
Provide clear, neutral instructions to minimize expectation bias.
Report participant expectations and unmasking instances for accurate result interpretation.
Review past masking procedures to improve future designs.
Placebo controls aren’t perfect!
Differential side effects (e.g., noticeable nausea or drowsiness) or other treatment cues can lead to unmasking.
What if an experiment absolutely cannot be masked?
Sometimes masking is nearly impossible (as in psychedelic drug trials)
Even so, bias can be reduced by measuring & accounting for pre-trial treatment expectations.
Strategies for reporting studies that cannot be masked (Muthukumaraswamy et al. 2021):
Report it! State masking was incomplete & describe why (e.g., due to the unmistakable psychoactive effects of the treatment).
Summarize participant/assessor guesses (ideally using indexes like the Bang/James Blinding Index) & include confidence ratings for these guesses.
Describe key trial instructions, advertising, & consent details shaping perceptions.
Explain the potential bias in treatment effect estimates & note any analytic strategies (like conditioning on participants’ beliefs) to adjust for these biases.
Present baseline & post-intervention expectancy using standardized tools (like the Stanford Expectations of Treatment Scale) to quantify how participant beliefs may bias outcomes.
Next, learn about important analytical practices to prevent bias when exploring data.
Analytical practices to mitigate bias
Lesson 6
We’ve seen how confirmation bias can influence how data is collected.
But what happens after that?
Let’s explore the choices that occur in a data analysis.
The StudentLife dataset (Wang et al. 2014) measures:
student sleep habits, exercise, socialization, etc.
When exploring a dataset for relationships of interest, we often start by computing a correlation coefficient.
A correlation coefficient (r):
Is a specific measure, ranging from -1 to 1, that quantifies the strength & direction of a linear relationship between two variables.
The closer r is to zero, the weaker the relationship.
Positive r values indicate a positive correlation: both variables increase or decrease together.
Negative r values indicate a negative correlation: as one variable increases, the other decreases.
Specifically, the correlation coefficient, r, is useful for:
Quantifying trends.
Identifying relationships.
Making predictions.
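As an illustration of the definition above (made-up numbers, not the StudentLife data), computing r in Python with scipy:

```python
# Minimal sketch: Pearson's r between hypothetical conversation time and stress.
import numpy as np
from scipy import stats

conversation_hours = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
stress_score       = np.array([8.0, 7.5, 7.0, 6.0, 6.5, 5.0, 4.5, 4.0])

r, p = stats.pearsonr(conversation_hours, stress_score)
print(f"r = {r:.2f}, p = {p:.3f}")   # r near -1: strong negative correlation
```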
ACTIVITY: Now, let’s look at the Dartmouth data.
Test a given hypothesis and try to justify it with the data provided.
Discuss: What did you discover?
For the relationships that you found in the data, did you generate (causal) stories or explanations?
Think
Pair
Share
Although our explanations may be plausible, did we actually test them as formal hypotheses?
(No)
We did engage in exploratory data analysis!
Discovering trends & patterns can generate hypotheses for further testing.
Much of research is exploratory in nature.
(and that’s okay!!)
Note:
However, exploratory work is sometimes reported as confirmatory.
(and that’s NOT okay!!)
Recall the warning from lesson 2: When you have a vague hypothesis, you are inclined to find evidence for it.
The problem is that many different data patterns can be interpreted as support.
For example, both of these specific patterns in the data:
Students who spend more time conversing also report less stress.
Students who interact with more people also report feeling less lonely.
Could be interpreted as evidence for this vague hypothesis:
Undergrad students who socialize more have better mental health.
This is an example of:
First exploring a data set.
Making analysis choices that reveal an interesting pattern.
Then interpreting the pattern as a test of statistical inference.
Watch out!
Making data analysis choices to obtain a statistically significant result is a questionable research practice known as p-hacking.
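A minimal simulation sketch (synthetic noise data, hypothetical variable names) of why this is a problem: screening many exploratory relationships and reporting only the “significant” one almost guarantees a spurious hit.

```python
# Minimal sketch: with 20 unrelated outcomes, the smallest p-value is often
# below 0.05 by chance alone, so cherry-picking it is p-hacking.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_students, n_outcomes = 50, 20

socializing = rng.normal(size=n_students)              # predictor (pure noise)
outcomes = rng.normal(size=(n_students, n_outcomes))   # 20 unrelated outcomes

p_values = [stats.pearsonr(socializing, outcomes[:, j])[1]
            for j in range(n_outcomes)]
print(f"Smallest p-value across {n_outcomes} noise outcomes: {min(p_values):.3f}")
```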
What do we do instead?
See trends after exploratory research? Awesome! Now you can:
Use identified relationships as the basis to develop specific hypotheses for why the relationship exists.
Pre-register predictive hypotheses & test them by collecting new data.
Validate trends & patterns by conducting confirmatory tests on other datasets.
Ultimately, we must be careful to not mistake an exploration of data with a confirmatory test of a hypothesis.
Next, see how even machine learning models can have bias.
Data masking for machine learning
Lesson 7
We have seen how researcher degrees of freedom allow bias to occur in data analyses.
But what about when we deploy machine learning to make those choices for us?
Scenario: A research team is developing a machine learning model to detect Parkinson’s disease.
The plan:
The data: Medical data from 40 patients, half with PD (PD+) & half without (PD-).
Training: Train the model on data from all 40 patients.
Testing: Evaluate model performance on a subset of patients.
But wait!
We've already used all the data to train the model.
Evaluating the model on the same data used for training is “double-dipping” and it contaminates our evaluation of the model.
This problem is referred to as leakage!
Leakage occurs when model training includes inappropriate information, such as the same data used to evaluate performance.
“Data leakage is a flaw in machine learning that leads to over-optimistic results,” per a survey that shows leakage affects 294 papers across 17 scientific fields.
So, what should we do instead?
Prevent leakage through data holdout!
Standard data holdout partitions data for training & testing across non-overlapping splits.
Training: fit the model on one partition only.
Testing: hold the other partition out and use it for testing only!
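A minimal sketch of the holdout idea using scikit-learn (synthetic stand-in data, not the team’s actual pipeline): the test patients are set aside before training and used only once, for evaluation.

```python
# Minimal sketch: 80/20 holdout for a hypothetical 40-patient PD dataset.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
X = rng.normal(size=(40, 5))                 # 40 patients, 5 made-up features
y = np.array([0] * 20 + [1] * 20)            # half PD-, half PD+

# The test patients are never seen during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
print("Held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```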
Activity: Build a Parkinson’s Disease detector!
See these principles in action for yourself.
Discuss: How to evaluate model performance?
How did the performance of the model change? What other kinds of information in the training data could result in leakage?
Think
Pair
Share
Data leakage can occur in several ways:
Splitting temporal data into segments can still result in overlap due to trends (see the time-series sketch after this list).
Trying multiple models or feature sets and selecting the one that performs best on the test set.
Curated datasets may include the same data points in both training and testing, compromising generalization.
Augmenting small datasets with simulated data may inadvertently include training information in the test set.
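For the temporal case, one common remedy is forward-chaining splits; a minimal sketch with scikit-learn’s TimeSeriesSplit (synthetic series, assumed setup):

```python
# Minimal sketch: training folds always precede the test fold in time,
# instead of shuffling time points across the split.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(100).reshape(-1, 1)     # 100 time-ordered observations
tscv = TimeSeriesSplit(n_splits=4)

for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    print(f"fold {fold}: train ends at t={train_idx.max()}, "
          f"test covers t={test_idx.min()}..{test_idx.max()}")
```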
When doing model building & evaluation, make sure to:
Include more than 50 subjects, if possible.
Ensure training & test datasets don't share extra information.
Train on a subset & test on the remaining data (e.g. 80% and 20%).
Reminder:
To use data holdout to assess model generalizability & identify potential overfitting:
Use appropriate partitions to avoid leakage.
Recognize that inflated performance can mislead model evaluation & subsequent decision-making.
Always separate training & test data to avoid shared subject-specific information.
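When subjects contribute multiple samples, splitting by subject prevents the shared-information leakage described above; a minimal sketch with scikit-learn’s GroupShuffleSplit (synthetic data, assumed setup):

```python
# Minimal sketch: the same patient never appears in both training and test sets.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(7)
n_recordings = 120
patient_ids = np.repeat(np.arange(40), 3)        # 3 recordings per patient
X = rng.normal(size=(n_recordings, 5))
y = np.repeat(np.array([0] * 20 + [1] * 20), 3)

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=patient_ids))

# No patient ID appears on both sides of the split.
assert not set(patient_ids[train_idx]) & set(patient_ids[test_idx])
```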
Next up is our final lesson! Let’s learn about the wider world of cognitive biases (& errors).
Bonus biases that disrupt research
Lesson 8
Confirmation bias is not the only bias to watch out for.
The Catalogue of Bias reveals dozens of other biases that can distort research outcomes.
Further, confirmation bias is a cognitive phenomenon (& bias) that stacks with other biases & cognitive phenomena to create even more rigor problems.
And it’s not the only bias to avoid.
Confirmation bias can compound with other distortions, like:
The bandwagon effect.
Cognitive dissonance.
Both of which can:
Compound the effects of confirmation bias.
Create errors that impair research quality.
Result in systematically inaccurate decision-making.
Let’s look at examples for these new biases.
Example 1: the bandwagon effect
Recall the misinterpretation of neuropsychologist Roger Sperry’s work that gave rise to a “favored” explanation for personality traits aligned to the left vs right brain hemispheres.
The willingness of so many others to ignore conflicting findings is also indicative of a tendency to align with the thinking of others.
Example 2: the bandwagon effect
1951 Swarthmore experiment:
Solomon Asch ran several experiments with groups of 8 participants made up of 1 subject and 7 actors.
They were asked to identify which of various comparison lines matched a reference line.
They played 15 rounds, and the actors gave obviously wrong answers in 12 of the 15 rounds.
Results: Swarthmore experiment
In control trials, with no pressure to conform to actors, the error rate on identifying the right line was less than 0.7%.
In test trials, 74% of participants gave at least one incorrect answer out of the 12 critical trials.
35.7% conformed to the larger group’s incorrect responses for a majority of their answers.
12% of participants followed the group in nearly all of the tests.
What happened?
Of the participants who followed the group at some point, most stated afterward that they knew the group was wrong, but didn’t want to go against them.
Participants who conformed to the majority on at least 50% of trials reported experiencing what Asch termed a “distortion of perception”:
They started to really believe that they themselves were wrong!
This is conformity bias
Also known as the bandwagon effect, it is:
The tendency to change opinion or behavior to align with the majority.
It occurs even when we believe the group is clearly wrong.
Example 3: cognitive dissonance
Data from a survey on attitudes toward data sharing & pre-registration in the social sciences shows a significant belief-behavior gap.
While engagement in open science nearly doubled between 2010 and 2020, the strong favorable opinion greatly outstrips practice (Ferguson et al. 2023).
[Chart: Beliefs vs. behavior for posting data/code and preregistering (0–100%). Belief categories: very to moderately in favor, neutral, very to moderately not in favor; behavior: has done the task.]
Example 4: cognitive dissonance
The Festinger & Carlsmith experiments at Stanford
Leon Festinger & James Carlsmith ran several experiments with students tasked with completing boring, repetitive tasks.
After the tasks, students were told to lie to the next group of participants, describing the tasks as fun and exciting, and then rate their experience.
Subjects in group 1 were paid $20 to lie.
Subjects in group 2 were paid $1 to lie.
(The control group was not asked to speak with the actors.)
Festinger & Carlsmith’s prediction: Group 2 ($1) would rate their experience higher than Group 1 ($20).
Results: Stanford experiment
Subjects who were paid $20 rated the mundane tasks slightly more positively than the subjects in the control group.
Subjects who were paid $1 rated the mundane tasks more positively than the subjects who were paid $20 did.
Why would they do this?
Festinger & Carlsmith concluded that the subjects who were told to lie contorted their experience to offset the contradiction between their experience and their lie.
Subjects paid $20 did not change as much because the larger payment justified lying.
Subjects paid $1 couldn’t justify lying due to a reward, so they “revised” their experience to decrease their discomfort.
Like confirmation bias, cognitive dissonance is a cognitive phenomenon that:
Causes a mental disturbance when our thoughts and actions conflict.
When this occurs, we try to change either our actions or our values.
The great threat to rigor is when we deceive ourselves about reality to ease this discomfort.
Let’s explore more ways different biases and rigor issues connect!
Activity: map the biases!
Discuss: what connections did you make?
In what ways might rigor issues stack to create greater problems? How could research in your field be improved through an awareness of these rigor issues?
Think
Pair
Share
Bias infiltrates all levels of the research process & burdens decision-making with errors that stack.
Designing
Framing biases steer study design away from rigorous tests of well-specified hypotheses.
Selecting
Sampling & attrition biases lead to unrepresentative study groups, compounding initial biases.
Conducting
Observer & measurement biases skew data collection, further distorting study findings.
Reporting
Publication bias & spin amplify prior errors by highlighting positive results & omitting counter-evidence or limitations.
Confirmation bias can trigger compounding distortions across the entire research process & balloon into a hydra of other biases.
Bias amounts to systematically inaccurate decision-making.
These decisions insert error at every stage of how science is supposed to produce knowledge.
By seeping into every corner of our scientific work, bias undermines even the best intentions.
Rigor is our antidote.
It systematically reduces error at every stage & serves as our ultimate safeguard against bias.
Better Science Every Day
Confirmation Bias