1 of 11

An Introduction to Empirical Methods

Andreas Stefik, Ph.D.

2 of 11

Introduction to Empirical Methods

  • Trying to uncover what is true in the world can be difficult
  • Science, as it has progressed, has changed its standards over the centuries. Here are some examples:
    • In the late 1700s, scientists were concerned with fraud and were skeptical of supernatural explanations of phenomena (e.g., Franklin's sham treatments)
    • In the medical sciences in the 1800s, people were increasingly considering the use of experiments to determine whether a drug or approach was effective (e.g., homeopathy)
    • Formal experimental methodologies largely did not exist until the 20th century, and significant credit is due to Fisher's work, which was transformative in the sciences
  • One powerful argument for empiricism is that it increases the accuracy of our observations compared with anecdotes or unorganized observation

3 of 11

Many Kinds of Empirical Methods Exist

  • We will focus largely on these methods:
    • Survey Designs
    • Interview Designs
    • Experimental Designs
    • Naturalistic and Observational Designs
    • Archival Designs and Secondary Data Analysis
  • Besides these types of designs, we will explore topics like:
    • The history of empirical methods and why scientists use them
    • The evolution of empirical methods over the centuries
    • Modern issues (e.g., replication) like those discussed in the What Works Clearinghouse and CONSORT

4 of 11

Assignment: Run a Randomized Controlled Trial

  • The largest part of the course, besides an exam at the end, will be to plan and run a randomized controlled trial of your own design
  • While surveys, interviews, and the like are useful, we are focusing on RCTs because they have a variety of advantages that will be discussed throughout the course
  • This assignment will go approximately like this:
    • Derive research questions, hypotheses, and an experimental protocol
    • Conduct a set of pilot studies
    • Derive formal statistics and graphs
    • Design a replication packet
    • Write a scholarly paper
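
As a rough illustration of the randomization step in an experimental protocol, here is a minimal Python sketch of random assignment to groups. The function name `assign_groups`, the group labels, and the participant IDs are all hypothetical, and a real protocol may call for a different scheme (e.g., blocked or stratified randomization). Fixing the seed is one possible way to make the allocation reproducible for a replication packet.

```python
import random

def assign_groups(participant_ids, groups=("control", "treatment"), seed=42):
    """Randomly assign participants to groups of near-equal size.

    Hypothetical helper for illustration only; not a prescribed protocol.
    """
    rng = random.Random(seed)  # fixed seed so the allocation can be reproduced
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled participants round-robin into the groups
    return {pid: groups[i % len(groups)] for i, pid in enumerate(shuffled)}

# Ten made-up participant IDs: P01 ... P10
allocation = assign_groups([f"P{n:02d}" for n in range(1, 11)])
for pid in sorted(allocation):
    print(pid, allocation[pid])
```

Because the participants are shuffled before being dealt out, neither the experimenter nor the participant chooses the group, which is the core idea behind the "randomized" in RCT.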

5 of 11

An Introduction to Survey Designs

  • Surveys are a popular method for obtaining information about a phenomenon
  • Like any empirical method, they have pros and cons
  • Three considerations for surveys might be:
    • Can you get reasonable evidence directly from individuals?
    • Can you get reasonable evidence from brief, structured data?
    • Will your respondents provide reliable data?
  • There are many issues to consider in survey design, including:
    • Analysis techniques
    • Ethics considerations

6 of 11

An Introduction to Interview Designs

  • Interviews are a commonsense approach for gathering information
  • Unlike watching an interview on television or the Internet, research interviews may have specific protocols that must be followed for reliability
  • Initial considerations in interviews include issues like:
    • The questions and how they are designed
    • The amount of structure (e.g., free-form, semi-structured)
    • How to conduct the interview (e.g., Skype, email, snail mail, face-to-face)
    • Social setting (e.g., in a group, at work, alone at home, anonymous)
  • Like any empirical technique, when to use interviews depends upon what we are trying to find out

7 of 11

An Introduction to Experimental Designs

  • Often in experimental design, we talk about randomized controlled trials, or RCTs. RCTs have a very long history.
    • Your textbook says they come from Fisher's work in the 1920s, but the earliest one I've found (without statistics) was in 1834 on homeopathy
    • They were kicked into high gear in the 1940s, especially with the VA and in part due to World War II (think the health of veterans), but some consider Austin Bradford Hill's trial to be the first
    • They came about for many reasons: fraud, a desire for increased accuracy, and standardization
  • Randomized Controlled Trials are powerful, but not magic. Consider:
    • How reliably do I need to test my hypothesis?
    • Can I manipulate my independent variables or not?
    • Is it possible to test with an RCT (e.g., one cannot split a country in two and randomize)?
    • Are there ethical issues (e.g., drug designs, early stopping of experiments)?
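
Because group membership in an RCT is decided by chance, one simple way to test a hypothesis about two groups is a permutation test: shuffle the group labels many times and see how often a difference at least as large as the observed one appears. The sketch below is only an illustration under stated assumptions. The function name `permutation_test` and the scores are made up, and the course may use other statistical tests.

```python
import random
from statistics import mean

def permutation_test(control, treatment, n_permutations=5000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns (observed_difference, estimated_p_value).
    """
    rng = random.Random(seed)
    observed = mean(treatment) - mean(control)
    pooled = list(control) + list(treatment)
    n_control = len(control)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # re-deal the scores as if labels were arbitrary
        diff = mean(pooled[n_control:]) - mean(pooled[:n_control])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero
    return observed, (extreme + 1) / (n_permutations + 1)

# Hypothetical task scores, invented for this example
control_scores = [12, 15, 11, 14, 13]
treatment_scores = [18, 17, 16, 19, 15]
difference, p_value = permutation_test(control_scores, treatment_scores)
print(f"difference in means = {difference}, p ~ {p_value:.3f}")
```

A small p-value here only says the observed difference would be unusual if the labels did not matter. It does not say how large or important the effect is.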

8 of 11

An Introduction to Naturalistic Observation

  • This empirical method essentially involves watching behavior in the wild. Like the other methods, it can take many forms
  • Naturalistic observation does not mean looking at things willy-nilly. Generally we code our observations formally and analyze the data
  • We might want to consider this technique under conditions like:
    • We want to watch processes as they happen (e.g., political, social)
    • We want to make observations for the purpose of refining research variables
    • We want to document actions in context (your book calls this "thick")
    • We want to explore potential causal links, especially over time (e.g., certain times of day, seasons)
    • When we can observe, but not manipulate, a process (e.g., astronomical observations, politics)
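
When observations are coded formally, a common check is whether two independent coders agree more often than chance would predict. The sketch below computes Cohen's kappa for two coders. It is a minimal illustration: the function name `cohens_kappa`, the behavior codes, and the data are hypothetical, and other agreement measures may suit a given study better.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two coders, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same, non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two coders gave the same code
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Agreement expected if each coder assigned codes independently
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used a single identical code throughout
    return (observed - expected) / (1 - expected)

# Hypothetical codes assigned by two observers to the same six events
coder_1 = ["talk", "talk", "idle", "work", "work", "idle"]
coder_2 = ["talk", "idle", "idle", "work", "work", "idle"]
print(cohens_kappa(coder_1, coder_2))
```

A kappa of 1 means perfect agreement, while values near 0 mean the coders agree about as often as chance alone would produce.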

9 of 11

An Introduction to Archival Designs

  • Archival designs and secondary data analysis let us draw on data that has already been collected or published
  • This technique is actually quite popular in computer science, in part because Git repositories contain such a wealth of publicly available information
  • Otherwise, we can use sources like:
    • Published materials (e.g., books, papers)
    • Government data (e.g., the census)
    • Data repositories (e.g., GitHub, https://www.data.gov/)
    • Publicly available records (e.g., from hospitals, schools, police departments)
    • Internet sources (e.g., blogs)
  • As always, context and trust (or lack thereof) matter with such sources
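
As a small illustration of archival analysis on a Git repository, the sketch below counts commits per author from text in the shape produced by `git log --pretty=format:'%H|%ad|%an' --date=short`. The function name `commits_per_author` and the sample log lines are hypothetical, and the pipe-delimited format is just one convenient choice for this example.

```python
from collections import Counter

def commits_per_author(log_text):
    """Count commits per author from 'hash|date|author' lines."""
    counts = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split at most twice so an author name containing '|' stays intact
        commit_hash, date, author = line.split("|", 2)
        counts[author] += 1
    return counts

# Made-up log lines standing in for real `git log` output
sample_log = """\
a1b2c3|2021-03-01|Ada Example
d4e5f6|2021-03-02|Grace Example
0a1b2c|2021-03-05|Ada Example
"""
for author, n in commits_per_author(sample_log).most_common():
    print(author, n)
```

Counts like these describe only what was recorded in the repository, so questions of context and trust still apply (e.g., one person committing under several names).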

10 of 11

An Introduction to WWC and CONSORT

  • Some scientific fields have established standards for empirical work, to help with replication or other issues
  • In this course, we will discuss two:
    • The What Works Clearinghouse (WWC)
    • CONSORT
  • While these two resources are very different, they both set basic guidelines about the quality of data and publication standards (e.g., reporting guidelines, details of randomization, sections of a paper)
  • For your final research paper, you must follow a variation of CONSORT

11 of 11

Summary

  • Sometimes in computer science, given that empirical methods are not heavily used, we forget that such methods have been around for a long time
  • In other scientific disciplines, and sometimes in software engineering, scholars regularly use data and evidence to make decisions or obtain accurate descriptions of an observation or phenomenon
  • In this course, we will practice running a randomized controlled trial and will learn about a variety of other empirical techniques