Evidence Practices in Software Science
Andreas Stefik, Ph.D.
Introduction
Subjectivity: is it a Problem?
Perception
[1] Dennis R. Proffitt, Jeanine Stefanucci, Tom Banton, and William Epstein. The role of effort in perceiving distance. Psychological Science, 14(2):106–112, 2003.
Communication
Consider a study by Braun, Ellis, and Loftus [2] on the topic. I summarize it as follows:
[2] Kathryn A. Braun, Rhiannon Ellis, and Elizabeth F. Loftus. Make my memory: How advertising can change our memories of the past. Psychology and Marketing, 19(1):1–23, 2002.
Bias, Trickery, and Propaganda
Consider the Pepsi vs. Coke experiment by Woolfolk, Castellan, and Brooks [3].
[3] Mary E. Woolfolk, William Castellan, and Charles I. Brooks. Pepsi versus Coke: Labels, not tastes, prevail. Psychological Reports, 52(1):185–186, 1983.
Group Assignment 1:
Consider a claim you have heard in Software Engineering. Is the claim subjective, and are there potential problems with it?
Over time, Scientists began to Doubt Subjectivity
Example claims:
William Douglass claimed it was a "wicked and criminal practice"
Mather claimed back that "I have read that thousands of lives have been saved by inoculation, and not one of thousands has miscarried by it. This is related by wise & learned men who would not have imposed on the world a false narrative."
The Story of Homeopathy
Homeopathy is an atrocity to me, I consider its downfall a blessing for mankind
Friedrich Wilhelm von Hoven (1759-1838)
What about early Discussions of Programming Languages
Hence, without evidence, we got competing rhetoric:
David Gries
“Our first speaker talked about having variables which can contain procedure bodies as objects. The question is [...] whether or not it is a reasonable thing to have in the language.”
Ichbiah then claimed:
"It seems to me that [...] forms of abstract data types are now quite well-known. [...] I terribly disagree with Gries [...] when he say that encapsulated data types are not yet within the state of the art,"
[4] John H. Williams and D. A. Fisher, editors. Design and Implementation of Programming Languages: Proceedings of a DOD Sponsored Workshop, Ithaca, Oct., 1976. Springer-Verlag, Berlin, Heidelberg, 1977.
From Subjectivity to Falsification
Introducing the "Control Group"
Blake described the result:
“Altogether, since April, 5,889 people, of whom 844 died, had had the smallpox. This one disease caused more than three-fourths of all the deaths in Boston during the year of the epidemic. During the same period Boylston inoculated 242 persons, with 6 deaths[5].”
[5] John B. Blake. The inoculation controversy in Boston: 1721-1722. The New England Quarterly, 25(4):489–506, 1952.
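The comparison in Blake's figures can be made concrete with a little arithmetic. The sketch below computes the share of deaths in each group and their ratio. It assumes the two groups in the quotation can be treated as separate, and the variable and function names are illustrative, not from the source. Because people were not assigned to the groups at random, the result is a description of the record, not proof of cause.

```python
# Figures quoted from Blake [5]; all names below are illustrative assumptions.
natural_cases, natural_deaths = 5889, 844        # caught smallpox naturally
inoculated_cases, inoculated_deaths = 242, 6     # inoculated by Boylston


def fatality_rate(deaths: int, cases: int) -> float:
    """Return the fraction of cases that ended in death."""
    return deaths / cases


natural_rate = fatality_rate(natural_deaths, natural_cases)
inoculated_rate = fatality_rate(inoculated_deaths, inoculated_cases)

# A risk ratio below 1 means the inoculated group died at a lower rate.
risk_ratio = inoculated_rate / natural_rate

print(f"Natural smallpox fatality rate: {natural_rate:.1%}")
print(f"Inoculated fatality rate:       {inoculated_rate:.1%}")
print(f"Risk ratio (inoculated/natural): {risk_ratio:.2f}")
```

Roughly 14% of those who caught smallpox naturally died, against roughly 2.5% of those inoculated. This is the kind of group-against-group comparison that a control group makes possible.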
The Precision Problem
Seconds
81.54
76.52
77.04
84.79
83.23
81.01
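One way to see the precision problem is to summarize how much the six timings disagree with each other. The sketch below assumes the values are repeated measurements of the same quantity (the source does not say what was timed) and uses only Python's standard `statistics` module. The variable names are illustrative.

```python
import statistics

# The six timings listed above, in seconds.
seconds = [81.54, 76.52, 77.04, 84.79, 83.23, 81.01]

mean = statistics.mean(seconds)        # central estimate
stdev = statistics.stdev(seconds)      # sample standard deviation (n - 1)
spread = max(seconds) - min(seconds)   # gap between slowest and fastest

print(f"mean  = {mean:.2f} s")
print(f"stdev = {stdev:.2f} s")
print(f"range = {spread:.2f} s")
```

The measurements differ from one another by several seconds, so a single timing cannot by itself confirm or refute a belief about the underlying value.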
Did we Confirm or Refute our Belief?
The "Numerical Method"
Picture From https://en.wikipedia.org/wiki/Pierre_Charles_Alexandre_Louis
Modern Science Often Uses Statistics
Example Fisher-Style Experiment
Managing Trust (or lack Thereof)
Friedrich Wilhelm von Hoven (1759-1838)
Group Assignment 2:
What potential issues of "Trust" exist in Software Engineering, and how can we protect against them in experiments?
Experimentation in the 20th Century
Early "Reporting" Standards hoped to Distinguish Fact from Fraud
Lesson:
Sometimes reporting requirements have changed because of adjustments to the law.
Scholars at the Time did not Trust their Generation's Science
After the biologics control act of 1906, some therapeutic reformers were doubtful doctors would individually make, or perhaps even could make, sensible decisions:
“Unfortunately, however, the physicians training is likely to be such that he cannot distinguish the rank fraud from the efficacious remedy, honestly made and sold [7].”
Discoveries were welcome, and medical drugs may have had chemical theories or treatments, but this did not mean they had a positive impact on people or communities:
“We cannot blame manufacturing chemists for finding new things or advertising them as cleverly as possible. That they and the nostrum vendor are surprisingly successful in selling their wares is largely our fault [6].”
[6] N.S. Davis, “Effect of Proprietary Literature on Medical Men,” JAMA 46 (May 5, 1906).
[7] W.A. Puckner, “The Nostrum from the Point of View of the Pharmacist,” JAMA 46 (May 5, 1906).
By 1918, it was clear we needed Better Standardization and Reporting
Even well-funded Institutes had tremendous difficulty in standardizing almost anything.
“Even at the well-endowed Rockefeller Institute, it proved difficult to allow each investigator control over the resources desired for coordinated laboratory and clinical studies of new treatments. Clinical investigators at other institutions found it even more difficult to accomplish their scientific aims, because they lacked the means to compel cooperation of others. Researchers at the Russell Sage Institute of Pathology, for example, which aspired to do in studying metabolic disorders what the Rockefeller Institute had accomplished in researching infectious disease, were severely handicapped by shortages of funds and a lack of control over clinical material [8].”
[8] Harry Marks, The Progress of Experiment: Science and Therapeutic Reform in the United States, 1900-1990 (United Kingdom: Cambridge University Press, 1997).
Reformers like John Stokes started the "Cooperative Research Study"
John H. Stokes in 1937 discussing syphilis with his students. Picture from: https://www.youtube.com/watch?v=bXMJifagmbA
Note: This is not directly related to the “Tuskegee Syphilis Experiment,” the “Guatemala syphilis experiment,” or the “Terre Haute prison experiments.”
During World War 2, Standardization was an Active Goal
John F. Mahoney (1889 - 1957), studied penicillin and syphilis
With Bradford-Hill and Richard Doll, the Randomized Controlled Trial was Born
[9] Richard Doll and Austin Bradford Hill, Smoking and Carcinoma of the Lung, British Medical Journal, 1950.
Bradford-Hill style studies became Common (and the Law followed)
[10] Tsay, M., & Yang, Y. (2005). Bibliometric analysis of the literature of randomized controlled trials. Journal of the Medical Library Association, 93(4), 450–458.
Replication Issues Remained a Large Challenge
[11] Winegrad AI, Davidson JK, Ricketts HT, Sprague RG, Hurd JB, Fajans SS, Ellenberg M, Scoville AB, Grinshaw WH, Hardin RC. The University Group Diabetes Program Study Pertaining to Phenformin. JAMA. 1971;217(6):817. doi:10.1001/jama.1971.03190060055014
Eventually, the FDA Categorized Studies
[12] https://www.fda.gov/drugs/resourcesforyou/consumers/ucm143534.htm
By the early 1990s, Scientists developed the "CONSORT" standard
Studies on Evidence Standards Show they Improve Reporting
Key Finding:
Evidence Standards Improve Reporting of Empirical Studies in Academic Journals.
The WWC and CONSORT are very Different (and there are others)
WWC
CONSORT
Evidence Standards almost always Standardize Reporting
Sections are directly Mapped to the Standards
Group Project 3:
In Software Engineering, what changes to the law would you recommend, and what impact could they have on ethics, evidence, or other issues?
Summary