1
This spreadsheet is now outdated. Please visit this page for up-to-date data: https://osf.io/fgjvw/
2
Replication and coding statusOriginal article informationArticle eligible for replicationReplication FeasibilityPrimary ResultSubjective predictionsCoder notesSummary of eligible studyAdditional statistics (original study)Additional article location infoAdministrative
3
StatusReserved bySummary of eligible studySpecial Requirements (see later columns for details)Lead for ReplicationOptional secondary/web replication being done?AuthorsTitlesubject headingsabstract linkJournal, volume, issue, pageTimes CitedDate article codedPerson coding original articlesecond coder (vetting)Article title (repeated here to facilitate admin tasks)# of studieseligible study #Does study contain at least 1 inference test?SampleProcedureInstr/ MaterialsOtherOptional notes on replication feasibilityBrief summary of result for possible replicationOptional note on why article coder selected this as the primary resultSpecial sample, instrumentation, procedure considerations for primary resultSample size used for primary resultOptional notes on sample sizePrimary result with statisticOptional notes on primary result's statisticPredicted likelihood of replicationCounter-intuitivenessAny additional notes from the study coder1-3 sentence summary of eligible studyPerson coding statisticsResult effect size95% CISample needed for 80% powerSample needed for 90% powerSample needed for 95% powerLink to articleJournalVolumeIssuePagefile locationold file locationdoiBatch# and journalSort order
4
Open for replication; advertised to list on 10/26/12 and 2/22/13.Participants play a variety of prisoner's dilemma games individually or in groups of three. Participants were 162 subject pool undergraduates, though fewer may be required based on power analysis. No special requirements. ST Wolf, CA Insko, JL Kirchner, T WildschutInterindividual-intergroup discontinuity in the domain of correspondent outcomes: The roles of relativistic concern, perceived categorization, and the doctrine of mutual assured destructionChoice Behavior [physiology], Competitive Behavior [physiology], Conflict (Psychology), Cooperative Behavior, Female, Group Processes, Humans, Individuality, Interpersonal Relations, Male, Motivation, Social Perception, Students [psychology]http://openscienceframework.org/project/EZcUj/files/download/eligible.article.abstracts.html/version/1#10.1037/0022-3514.94.3.479JPSP-94-3-47971/1/2013 20:05:50michael.cohn@gmail.comNot yet vettedInterindividual-intergroup discontinuity in the domain of correspondent outcomes: The roles of relativistic concern, perceived categorization, and the doctrine of mutual …22Yes [Continue to the next question]5. Not at all difficult/common circumstances: The standard convenience samples available to most researchers.    5. Not at all difficult/common circumstances: Most labs would be able to carry it out.5. Not at all difficult/common circumstances; most labs would not have any trouble.0. There are no other special considerations for replication feasibility. The purpose was to look at the "discontinuity effect," which refers to a tendency for groups to make more polarizing / competitive choices in a prisoner's dilemma situation. Participants were run individually or in groups, and with different payoff matrices. The discontinuity effect was hypothesized to disappear when the payoffs for mutual competition were extremely low.


Means for the proportion of corrected competitive choices are given in the first row of Table 4. An Individuals Versus Groups X Matrix X Gender ANOVA resulted in significant main effects for individuals versus groups, F(1, 33) = 6.12, p = .05, d = 0.86, and matrix type, F(1, 33) = 11.57, p = .01, d = 1.18. The main effect for individuals versus groups provided further evidence for a discontinuity effect in the context of Chicken. The main effect for matrix type indicated that there was more competition in the context of the low than the high PC-JC matrix. These main effects were, however, qualified by a significant Individuals Versus Groups X Matrix interaction, F(1, 33) = 6.12, p = .05, d = 0.86. The interaction indicated that the discontinuity effect was larger (and descriptively only present) with the low PC-JC matrix. (p. 490)
The primary purpose of the study was to examine how different payoff matrices affected the discontinuity between individual and group choices. none162The authors included gender as a factor in their analysis, but it was not significant and they don't seem to have had clear hypotheses about it (it was relevant to their other DVs regarding how people expected to be perceived by the group, but not to the actual choices participants made). A replication targeting just the primary result could omit gender as a factor in order to increase power. These main effects were, however, qualified by a significant Individuals Versus Groups X Matrix interaction, F(1, 33) = 6.12, p = .05, d = 0.86. The interaction indicated that the discontinuity effect was larger (and descriptively only present) with the low PC-JC matrix.0.82. The result aligns somewhat with "popular" intuition about how it would have come outParticipants make choices in an iterated prisoner's dilemma type task, playing either individually or in groups of 3. Participants were 162 subject pool undergrads, though fewer may be needed based on power calculations, since gender was relevant to some analyses but not to the primary outcome. No special equipment or technology is required.JPSP943479Reproducibility Papers/JPSP/JPSP-94-3-479.pdfReproducibility Papers/2008.JPSP/psp 94-3/psp-94-3-479.pdf10.1037/0022-3514.94.3.47902 JPSP32
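The d values reported with the F tests in this row are consistent with a common conversion for a two-group contrast with one numerator degree of freedom, d = 2 * sqrt(F / df_error). The short Python check below is a sketch under that assumption. The spreadsheet does not state which formula the original authors used, and the function name is hypothetical.

```python
import math


def cohens_d_from_f(f_value: float, df_error: int) -> float:
    """Approximate Cohen's d from F(1, df_error) for a two-group contrast.

    Assumes roughly equal group sizes: d = 2 * sqrt(F / df_error).
    """
    return 2.0 * math.sqrt(f_value / df_error)


# Values copied from the row above: F(1, 33) = 6.12 and F(1, 33) = 11.57.
d_groups = cohens_d_from_f(6.12, 33)
d_matrix = cohens_d_from_f(11.57, 33)

print(round(d_groups, 2), round(d_matrix, 2))
```

The rounded results match the reported d = 0.86 and d = 1.18. That agreement supports the assumed conversion but does not confirm it.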
5
Open for replication; advertised to list on 10/26/12 and on 2/18/13Participants engage in a global or local processing induction (using standard experimental presentation software) and then answer questions about their behavior and personality as compared to those of certain celebrities. Adaptation of the materials may be necessary for language and culture. Should be easy with the original researchers' cooperation. Participants were undergrads who were not psychology majors. J Förster, N Liberman, S KuschelThe effect of global versus local processing styles on assimilation versus contrast in social judgment.Association Learning, Attention, Concept Formation, Dominance, Cerebral, Female, Field Dependence-Independence, Generalization (Psychology), Gestalt Theory, Humans, Judgment, Male, Pattern Recognition, Visual, Psychomotor Performance, Self Concept, Social Identification, Social Perception, Time Perceptionhttp://openscienceframework.org/project/EZcUj/files/download/eligible.article.abstracts.html/version/1#10.1037/0022-3514.94.4.579JPSP-94-4-57947frank.renkewitz@googlemail.comNot yet vetted52Yes554Experiment 5 is a replication of Experiment 2 that additionally includes an assessment of relative hemisphere activation (RHA). RHA is used to test a mediation hypothesis about the effect of processing style on assimilation and contrast. This mediation could also be replicated but this is not necessary to test the central hypothesis. However, what has to be replicated to confirm the central hypothesis is a specific pattern of results in a three way interaction. Original materials from the authors would probably be very helpful.(p 584) "we hypothesized … that subjective scales would enhance contrast because the target and the standard are used to define the range of the scale on which they obviously occupy opposite ends. 
We believed, however, that if one took enough of a global perspective, then a value range wider than that defined by the target and the standard might be considered, thus allowing for inclusion and assimilation. We also thought that it might be possible that a local perspective would narrow the range of values around the target and the standard, and would thus produce exclusion and contrast, even with objective scales. Thus, we thought that both types of scales might produce both assimilation, if processed globally, and contrast, if processed locally."German undergrads; processing style is manipulated with the "global–local processing task" (Navon, 1977); pretests might be necessary to establish suitable standards of comparison (Robbie Williams was used as a "moderately high standard of drug consumption", Steffi Graf as moderately low standard). Processing style and standard of comparison were manipulated between participants, scale (objective vs subjective) within participants.124We conducted a 3 (processing: global vs. local vs. control) x 2 (standard: moderately high vs. moderately low) x 2 (scale: objective vs. subjective) ANOVA for mixed designs, which yielded two-way interactions between scale and standard, F(1, 118) = 5.69, p = .02, and between processing and standard, F(2, 118) = 10.31, p = .0001. More important, the three-way interaction qualified these interactions, F(2, 118) = 5.51, p = .005.0.64JPSP944579Reproducibility Papers/JPSP/JPSP-94-4-579.pdfReproducibility Papers/2008.JPSP/psp 94-4/psp-94-4-579.pdf10.1037/0022-3514.94.4.57902 JPSP37
6
Open for replication; advertised to list on 10/26/12 and on 2/18/13A computer-based ultimatum game. Simple to run. Requires temporary deception. Participants were undergraduates (in the Netherlands; should check with authors to see if they think nationality is relevant). ~100 participants were paid 6 euros apiece (total cost ~800 USD).E van Dijk, GA van Kleef, W Steinel, I van BeestA social functional approach to emotions in bargaining: When communicating anger pays and when it backfires.Adult, Anger, Communication, Deception, Dominance-Subordination, Emotions, Fear, Female, Goals, Humans, Interpersonal Relations, Male, Motivation, Negotiating, Power (Psychology), Rejection (Psychology)http://openscienceframework.org/project/EZcUj/files/download/eligible.article.abstracts.html/version/1#10.1037/0022-3514.94.4.600JPSP-94-4-60045susann.fiedler@gmail.comgustav.nilsonne@ki.se33Yes554Communicated anger signals higher limits in bargaining. Emotion effects are contingent on bargainers’ expectation that low offers will be rejected. Our main interest was to see whether, if the consequences of rejection were low, participants would make lower offers to the angry recipient than to the happy recipient.questionnaire, between-participants design, bargaining game, manipulating expected emotion of the recipient (happy vs. angry), undergrads (Dutch)103A 2 (recipient’s emotion) x 2 (information) ANOVA yielded main effects for recipient’s emotion, F(1, 110) = 6.86, p = .01, eta2 = .06, and information, F(1, 110) = 18.43, p = .0001, eta2 = .14. The emotion main effect indicated that the participants offered more chips to the angry recipient (M = 57.65, SD = 8.13) than to the happy recipient (M = 53.67, SD = 9.76). The main effect of information indicated that the participants offered more chips to the recipient in the symmetric information conditions (M = 58.98, SD = 8.44) than in the asymmetric information conditions (M = 52.33, SD = 8.70). 
No interaction was observed, F(1, 110) = 0.88, ns. (UPDATE: These results are for Study 1. ERT) Results from Study 3:
A 2 (recipient’s emotion) x 2 (consequences of rejection) ANOVA yielded main effects for consequences of rejection, F(1, 99) = 18.91, p = .0001, eta2 = .16, and emotion, F(1, 99) = 10.21, p = .01, eta2 = .09. The main effect of consequences of rejection indicated that the participants offered more chips to the recipient in the high-consequences conditions (M = 47.59, SD = 6.04) than in the low-consequences conditions (M = 38.76, SD = 14.59). The emotion main effect indicated that the participants offered fewer chips to the angry recipient (M = 40.33, SD = 14.41) than to the happy recipient (M = 46.51, SD = 7.17). These two main effects were qualified, however, by a significant Recipient’s Emotion x Consequences of Rejection interaction, F(1, 99) = 9.74, p = .01, eta2 = .09.

2gustav.nilsonne@ki.se. Power calculation performed on interaction effect between emotion and consequences of rejection (following hypothesis statement) using G*Power. Analysis log available here: https://docs.google.com/file/d/0B79wnU8V_RoiRWxRN29sdG15WkE/edit?usp=sharing f = 0.31 (derived using G*Power from reported eta^2 of 0.09)462552628http://psycnet.apa.org/journals/psp/94/4/600.pdfJPSP944600Reproducibility Papers/JPSP/JPSP-94-4-600.pdfReproducibility Papers/2008.JPSP/psp 94-4/psp-94-4-600.pdf10.1037/0022-3514.94.4.60002 JPSP38
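The effect sizes in this row can be cross-checked with two standard conversions: partial eta squared from an F ratio, eta^2 = (F * df_effect) / (F * df_effect + df_error), and Cohen's f from eta squared, f = sqrt(eta^2 / (1 - eta^2)). The Python sketch below applies both to the Study 3 interaction. The coder used G*Power, so this only illustrates the arithmetic and is not a record of the coder's actual procedure. The function names are hypothetical.

```python
import math


def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def cohens_f(eta_squared: float) -> float:
    """Cohen's f from (partial) eta squared."""
    return math.sqrt(eta_squared / (1.0 - eta_squared))


# Study 3 interaction as reported in this row: F(1, 99) = 9.74, eta2 = .09.
eta_sq_interaction = partial_eta_squared(9.74, 1, 99)

# The coder started from the rounded, reported value of 0.09.
f_from_reported = cohens_f(0.09)

print(round(eta_sq_interaction, 2), round(f_from_reported, 2))
```

Both rounded values agree with the figures recorded in the row (eta2 = .09 and f = 0.31).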
7
Open for replication; advertised to list on 10/26/12 and 3/2/13.A self-report study of responses to a fictional corporate brochure. Requires recruiting ~80 working adults from the community, ~50% African American and 50% White. Participants were paid $10 each. V Purdie-Vaughns, CM Steele, PG Davies, R Ditlmann, JR CrosbySocial identity contingencies: How diversity cues signal threat or safety for African Americans in mainstream institutions.Adolescent, Adult, African Continental Ancestry Group [psychology], Cues, Cultural Diversity, Female, Hierarchy, Social, Humans, Judgment, Male, Organizational Culture, Prejudice, Rejection (Psychology), Social Identification, Social Values, Stereotyping, Trusthttp://openscienceframework.org/project/EZcUj/files/download/eligible.article.abstracts.html/version/1#10.1037/0022-3514.94.4.615JPSP-94-4-61581susann.fiedler@gmail.comNot yet vetted33Yes224Experimenters with different ethnicities, not run in a lab but in a "different location"In settings where African Americans expect threatening identity contingencies—in this paradigm, low minority representation coupled with colorblindness—an explicit cue conveying fair practices can forestall appraisals of threat. African American professionals in the high fairness condition (fair working environment) trusted the setting more than did African American professionals in the low fairness condition. Indeed, the trust gap found in the previous two experiments was eliminated. African American participants in the high fairness condition were also less likely to expect that their race would be relevant to how they were perceived by others. Replicating our findings from Experiment 2, these identity contingencies mediated the effect of the fairness cue on African Americans’ trust of the corporate setting. 
White professionals showed no effect of the fairness condition.work-life surveys, corporate brochures, diversity cues, photographs, African Americans and Whites (working professionals)90significant main effect of race, F(1, 73) = 51.03, p < .001, and a significant main effect of fairness, F(1, 73) = 6.49, p < .01. African American participants in the high fairness condition reported that their racial identity was less important to them (M = 5.89, SD = 1.61) compared with African American participants in the low fairness condition (M = 6.82, SD = .50), F(1, 73) = 5.09, p < .03. White participants’ scores did not differ as a function of whether fairness was high (M = 3.94, SD = 1.39) or low (M = 4.53, SD = 1.50; F < 2, η² = .03).
2http://www.yale.edu/intergroup/PurdieVaughns.Steele.Davies.Ditlmann.Crosby.pdfJPSP944615Reproducibility Papers/JPSP/JPSP-94-4-615.pdfReproducibility Papers/2008.JPSP/psp 94-4/psp-94-4-615.pdf10.1037/0022-3514.94.4.61502 JPSP39
8
Open for replication; advertised to list on 10/11/12 and 4/8/13.The study examines whether perception of glossiness and bumpiness of a surface interact. The primary result shows that bumpiness of a surface leads to higher perceived glossiness. The experiment requires specialized software (which may possibly be obtained from the authors) and a mirror stereoscope.No special requirementsYX Ho, MS Landy, LT MaloneyConjoint measurement of gloss and surface textureCognition, Cues, Humans, Judgment, Light, Models, Psychological, Visual Perceptionhttp://openscienceframework.org/project/EZcUj/files/download/eligible.article.abstracts.html/version/1#10.1111/j.1467-9280.2008.02067.xPS-19-2-1962312/13/2012 15:03:19bahniks@seznam.czmichael.cohn@gmail.comConjoint measurement of gloss and surface texture22Yes [Continue to the next question]5. Not at all difficult/common circumstances: The standard convenience samples available to most researchers.    5. Not at all difficult/common circumstances: Most labs would be able to carry it out.3. Moderately difficult/special circumstances; a subset of labs may be able to do this.3. Moderately difficult/special circumstances.The experiment requires software that would probably not be easy to write. If the authors provided the software, the replication would be much easier. Furthermore, the study used a mirror stereoscope to show stimuli on two monitors stereoscopically.The authors hypothesized that bumpiness of a surface may affect the perception of its glossiness. [vetter says: specifically, they "found that a simple additive model captured visual perception of texture and specularity and their interaction"]
none6χ2 ≥ 0.089, p < .008The primary result is that in 4 out of 6 observers a model including bumpiness of a surface described perception of glossiness better than a model including only glossiness of the surface. The test was significant for 4 out of 6 participants with a Bonferroni-corrected significance level (hence the p < .008).0.853. The result is not clearly intuitive or counterintuitive[Vetter notes: I would have rated the difficulty of replicating either the procedure or the apparatus more highly. In addition to the unusual setup, this study requires specialized expertise at calibrating monitors.]The study examines whether perception of glossiness and bumpiness of a surface interact. The primary result shows that bumpiness of a surface leads to higher perceived glossiness. The experiment requires specialized software (which may possibly be obtained from the authors) and a mirror stereoscope.PS192196Reproducibility Papers/PS/PS-19-2-196.pdfReproducibility Papers/2008.PsycScience/2 Feb/196.full.pdf10.1111/j.1467-9280.2008.02067.x02 PS312
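The p < .008 threshold mentioned in this row follows from a Bonferroni correction across the 6 observers. The Python sketch below shows that arithmetic. The family-wise alpha of .05 is an assumption, since the row gives only the corrected level, and the function name is hypothetical.

```python
def bonferroni_alpha(family_alpha: float, n_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction."""
    return family_alpha / n_tests


# Assumed family-wise alpha of .05, divided across one test per observer.
corrected_alpha = bonferroni_alpha(0.05, 6)

print(round(corrected_alpha, 3))
```

Rounded to three decimals this gives .008, matching the threshold quoted by the coder.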
9
Open for replication; advertised to list on 10/11/12 and 3/11/13.A straightforward self-report study in which participants watch a video of a woman behaving angrily and express their judgments. Participants were adults from the community. Study was run "in an off-campus location" but it's not clear whether the authors consider that important. NOTE: An earlier researcher intended to run this study with London and during discussions with the original author determined that a fair replication attempt requires a sample of working adults in North America.VL Brescoll, EL UhlmannCan an angry woman get ahead?Adult, Anger, Career Mobility, Communication, Culture, Female, Hierarchy, Social, Humans, Internal-External Control, Male, Prejudice, Professional Competence, Salaries and Fringe Benefits, Stereotyping, Women, Working [psychology]http://openscienceframework.org/project/EZcUj/files/download/eligible.article.abstracts.html/version/1#10.1111/j.1467-9280.2008.02079.xPS-19-3-26851stephanie.mueller.01@uni-erfurt.deh.kappes@lse.ac.ukCan an angry woman get ahead?33Yes4530Participants watched videotaped job interviews of either male or female professionals. If these videos are forthcoming from the authors, replication will be relatively straightforward; if the videos need to be re-created, it will be rather difficult. [Replicating the materials will not be difficult as long as it is possible to obtain the videotapes used. If this is not possible, re-creating them will be moderately difficult.](p. 272-273) "For status conferral, a 2 (target’s gender: male vs. female) X 3(emotion: anger without external attribution vs. anger with externalattribution vs. no emotion) ANOVA revealed a significant interaction between the target’s gender and emotion expression,F(2, 34)=9.72, prep=.999. 
Examining each gender separately, we found that the angry male without an external attribution received significantly higher status than the unemotional male, t(44)=2.55, prep=.95, and the angry male with an external attribution, t(45)=2.11, prep=.892. Results for the female targets supported our prediction; the angry female target who provided an external attribution for her anger received significantly higher status than the angry female target who did not provide an external attribution, t(44)=3.53, prep=.986, but did not receive higher status than the unemotional female target, t(45)=0.22, prep=.251. Notably, the status conferred on the angry female target with an external attribution was not significantly different from the status conferred on the angry male targets with or without a reason for their anger... The results for salary paralleled those for status conferral. A 2 (target's gender: male vs. female) X 3 (emotion: anger without external attribution vs. anger with external attribution vs. no emotion) ANOVA revealed a significant interaction between the target's gender and emotion expression, F(2, 112) = 6.90, prep = .986."[Status conferral is the dependent measure that followed from previous work, was assessed first in this study, and seems most central to the argument. However, there seems to be a typo in the interaction statistics for status conferral; therefore, I have also reported findings for salary, which should follow the same pattern. ]Participants were non-student adults from the community. If this is important for replication, obtaining the sample will be slightly more difficult. [none]133Df values reported indicate that not all participants completed all measures. [F(2, 34)=9.72] (p. 273): Status: A 2 (target’s gender: male vs. female) x 3 (emotion: anger without external attribution vs. anger with external attribution vs. 
no emotion) ANOVA revealed a significant interaction between the target’s gender and emotion expression, F(2, 34)=9.72, p_rep=.999. Results for the female targets supported our prediction; the angry female target who provided an external attribution for her anger received significantly higher status than the angry female target who did not provide an external attribution, t(44) = 3.53, p_rep = .986, but did not receive higher status than the unemotional female target, t(45) = 0.22, p_rep = .251.The degrees of freedom appear to be wrong (should be around F(1, 127)) [I believe there is a typographical error in this statistic - the degrees of freedom should be around (2, 127). It is inconsistent in the analyses of the other DVs but as reported, there were 133 participants in 6 conditions.]0.82When I (Heather Kappes) committed to replicate the study, I changed many of the codings (the key study, the difficulty of obtaining the sample) from what the original coder had. Adult non-student participants watched a video of an ostensible job interview. Female targets who described an experience that made them angry (but did not say why) were accorded low status, compared to both female targets who described no emotion, and to male targets who described being angry. However, female targets who attributed their anger to an external source were accorded status equal to angry men. H.Kappes@lse.ac.ukd = 1.51 for status conferral (but this is based on the df reported, which may be a typo); d=.70 for salaryapprox. 266http://socialjudgments.com/docs/Brescoll%20and%20Uhlmann%202008.pdfPS193268Reproducibility Papers/PS/PS-19-3-268.pdfReproducibility Papers/2008.PsycScience/3 March/268.full.pdf10.1111/j.1467-9280.2008.02079.x03 PS324
10
Open for replication. Advertised to list on 8/1/12 and 5/14/13.Y Jang, DE HuberContext retrieval and context change in free recall: Recalling from long-term memory drives list isolation.Association Learning, Attention, Generalization (Psychology), Humans, Mental Recall, Retention (Psychology), Semantics, Verbal Learninghttp://openscienceframework.org/project/EZcUj/files/download/eligible.article.abstracts.html/version/1#10.1037/0278-7393.34.1.112JEPLMC-34-1-112146/27/2012 9:02:50yanshan90@gmail.commichael.cohn@gmail.com, 11/11/12Context Retrieval and Context Change in Free Recall: Recalling From Long-Term Memory Drives List Isolation33Yes [Continue to the next question]5. Not at all difficult/common circumstances: The standard convenience samples available to most researchers.    5. Not at all difficult/common circumstances: Most labs would be able to carry it out.5. Not at all difficult/common circumstances; most labs would not have any trouble.0. There are no other special considerations for replication feasibility. Recalling from long-term memory between word lists (i.e. episodic list-before-the-last recall and semantic missing letter recall) produced patterns of data consistent with list isolation (correct recall greater for short target lists; incorrect intervening recall greater for short intervening lists), whereas the difficult 2-back short-term memory produced blending across the lists, similar to the results with no testing or a short period of recognition testing (main effects and interaction effect of target list length & intervening list length - in both cases correct recall was greater for shorter lists; incorrect intervening recall greater for short intervening lists). (p. 119-120)None155[vetter note: I have trouble understanding this, but I think the primary result is the 4-way interaction: F(2,308) = 12.53, MS = .02, partial etasq = .08]

For recall between the lists: Correct recall was greater for short target lists than for long target lists, F(1, 154) = 5.99, MSE = .02, np2 = .04; No intervening list-length effect, F(1, 154) = 2.11, MSE = .02, p = .15; no interaction effect F(1, 154) = 3.74, MSE = .02, p = .06, Incorrect intervening recall was greater for short intervening lists, F(1,154) = 17.03. MSE = .01, np2 = .10; No target list-length effect, F(1,154) < 1; No interaction between two list lengths, F(1, 154) < 1 Letter completion task between the lists: Correct recall was greater for short target lists than for long target lists, F(1, 154) = 19.01, MSE = .04, np2 = .11, No intervening list-length effect, F(1, 154) <1; No interaction effect F(1, 154) = 1.67, MSE = .03, p = .20; Incorrect intervening recall was greater for short intervening lists, F(1,154) = 13.61. MSE = .01, np2 = .08; No target list-length effect, F(1,154) < 1; no interaction between two list lengths Two-back task between the lists: Main effect of target list length, F(1, 154) = 20.30, MSE = .03, np2 = .12; Main effect of intervening list-length, F(1, 154) = 72.63, MSE = .03, np2 = .32; Interaction of list-length effects, F(1, 154) = 37.96, MSE = .03, np2 = .13; Incorrect intervening recall was greater for short intervening lists, F(1, 154) = 23.51, MSE = .01, np2 = .13; No target list-length effect, F(1, 154) = 2.36, MSE = .01, p = .13; No interaction between the list lengths, F(1, 154) = 1.62, MSE = .01, p = .21
[note: coder recorded all statistical results, not just primary one. replication researcher will need to select which they think is primary effect, vetter has made suggestion - see previous cell]Don't have background3. The result is not clearly intuitive or counterintuitiveJEPLMC341112Reproducibility Papers/JEPLMC/JEPLMC-34-1-112.pdfReproducibility Papers/2008.JEP LMC/34_1_112.pdf10.1037/0278-7393.34.1.11201 JEPLMC172
11
Open for replication; advertised to list on 12/5/12 and 3/21/13Participants (39 undergrads) are asked to play the role of a businessperson making a decision about technology providers. Participants use an ordinary computer program to view a series of fictional product reviews and then report inferences about the providers. Some work may need to be done to come up with accessible instructions regarding the specific types of inferences they are asked to make. Stimulus presentation software is not needed; the presentation can be done in an ordinary web browser (an OSF volunteer is available to design the page). Analyses are simple ANOVAs. K FiedlerThe ultimate sampling dilemma in experience-based decision making.Adult, Bias (Epidemiology), Choice Behavior, Cognition, Computer Simulation, Decision Making, Decision Theory, Feedback, Psychological, Female, Humans, Judgment, Malehttp://openscienceframework.org/project/EZcUj/files/download/eligible.article.abstracts.html/version/1#10.1037/0278-7393.34.1.186JEPLMC-34-1-18610Undatedjjoygaba@vcu.eduNot yet vetted22Yes554Hardest part will be programming: participants can end the information search whenever they choose, so the replication will need a program that allows them to opt out when they want.When allowed to freely sample, individuals tend to sample information predominantly from those cells that were relevant to the decision task. However, when participants were forced to make decisions based on natural sampling, resulting decisions had systematic errors. Programming could be a challenge56Exp 1: The corresponding task-focus main effect was significant, F(1, 52) = 13.75, p = .001.Exp 2: The most frequent provider, P1, was underestimated relative to the other domains, F(2, 76) = 53.76, p = .001, and the rate of positive (vs. 
negative) observations was underestimated as well, F(1, 38) = 18.35, p = .001http://psycnet.apa.org/journals/xlm/34/1/186.pdfJEPLMC341186Reproducibility Papers/JEPLMC/JEPLMC-34-1-186.pdfReproducibility Papers/2008.JEP LMC/34_1_186.pdf10.1037/0278-7393.34.1.18601 JEPLMC176
12
Open for replication; advertised to list on 1/1/13 and 5/14/13Participants (100 native english-speaking subject pool undergraduates) view lists of words on a computer screen, and attempt to recall the order of presentation. Empirical performance results were compared to those from a mathematical model. Both simulation and stimulus presentation are simple and can be done without any special software. Analytic methods are not clear and may require conferral with the authors. CP Beaman, I Neath, AM SurprenantModeling distributions of immediate memory effects: No strategies needed?Attention, Humans, Individuality, Memory, Short-Term, Models, Statistical, Phonetics, Psychomotor Performance, Semantics, Serial Learning, Verbal Learninghttp://openscienceframework.org/project/EZcUj/files/download/eligible.article.abstracts.html/version/1#10.1037/0278-7393.34.1.219JEPLMC-34-1-219512/21/2012 21:15:43michael.cohn@gmail.comNot yet vettedModeling distributions of immediate memory effects: No strategies needed?22Yes [Continue to the next question]5. Not at all difficult/common circumstances: The standard convenience samples available to most researchers.    5. Not at all difficult/common circumstances: Most labs would be able to carry it out.5. Not at all difficult/common circumstances; most labs would not have any trouble.0. There are no other special considerations for replication feasibility. The main purpose of the experiment was to determine whether a particular mathematical model was a good match for an empirical psycholinguistic finding (when presented with a list of words, people remember the order of presentation better if the words are short as opposed to long). The authors provide statistics showing that the finding was obtained, but I don't see any statistics to support their assertion that their model provides a) adequate fit, or b) better fit than the naive comparison model. 
Still, I think the proper course would be to replicate experiment 2 and find some way of testing the fit of the model to the empirical data.

"Figure 4 shows the results of separate simulation runs of sets of pseudo-subjects, using the same procedure as Experiment 1. Again, good fits were obtained for both the mean and variance models, with the latter better reproducing the amount of variability seen in the data." (p. 223)
The abstract and introduction point to the model testing as their main goal. The effect itself is clearly not intended, or presented, as a finding.A replication will require repeating the simulation used to produce data from the mathematical models. This could be done using any stats software, or even a spreadsheet.

The stimuli in the study were presented using stimulus presentation software, but precise timing is not important, and the required program could even be replicated in JavaScript (i.e., an ordinary web page) with little difficulty.

100unknownsee the summary of result field.0.73. The result is not clearly intuitive or counterintuitiveParticipants (100 native english-speaking subject pool undergraduates) view lists of words on a computer screen, and attempt to recall the order of presentation. Empirical performance results were compared to those from a mathematical model. Both simulation and stimulus presentation are simple and can be done without any special software. Analytic methods are not clear and may require conferral with the authors. JEPLMC341219Reproducibility Papers/JEPLMC/JEPLMC-34-1-219.pdfReproducibility Papers/2008.JEP LMC/34_1_219.pdf10.1037/0278-7393.34.1.21901 JEPLMC178
13
Open for replication; advertised to list on 12/17/12 and 4/1/13.Participants listen to phonologically similar words using headphones and recall the lists in various types of memory task. All stimuli and tasks are done on computer using standard stimulus presentation software (if you don't have software, contact michael.cohn@gmail.com for help implementing a free alternative). Participants were 20 subject pool students, native English speakers, normal hearing and normal or corrected vision. Analyses are simple ANOVAs. JE Marsh, F Vachon, DM JonesWhen does between-sequence phonological similarity promote irrelevant sound disruption?Attention, Humans, Mental Recall, Phonetics, Psychoacoustics, Semantics, Serial Learning, Speech Perception, Verbal Learninghttp://openscienceframework.org/project/EZcUj/files/download/eligible.article.abstracts.html/version/1#10.1037/0278-7393.34.1.243JEPLMC-34-1-24339/3/2012 21:37:54sara.sgc@gmail.commichael.cohn@gmail.comWhen does between-sequence phonological similarity promote irrelevant sound disruption? 11Yes [Continue to the next question]5. Not at all difficult/common circumstances: The standard convenience samples available to most researchers.    4. Slightly difficult/uncommon circumstances: Replicating the procedure would be difficult for many labs, but still feasible for most.4. Slightly difficult/uncommon circumstances; it would be an inconvenience for many labs but still feasible for most labs.0. There are no other special considerations for replication feasibility. 
"phonological similarity effect of irrelevant sound appears to be driven by the nature of the primary task: Generally, it is not found if the primary task is serial recall but is indeed found when category-cueing processes are likely to be engaged" p.247 [vetter adds: in specific, they looked at whether phonologically similar distractors interfered with correct recall more than phonologically dissimilar ones, and found that this was the case for free-recall (recalling words in any order) but not serial recall (recalling words in the specific order given).]
Normal-hearing native English speakers
48F(2, 92) = 3.13, p < 0.05, d= 0.52 [vetter adds: This is the omnibus F test. The specific result was the distractor type X task type interaction.]don't have background3. The result is not clearly intuitive or counterintuitiveJEPLMC341243Reproducibility Papers/JEPLMC/JEPLMC-34-1-243.pdfReproducibility Papers/2008.JEP LMC/34_1_243.pdf10.1037/0278-7393.34.1.24301 JEPLMC181
14
Open for replication; advertised to list on 12/17/12 and 4/1/13.Participants hear words presented in a male or female voice and in one ear or the other, then make various memory judgements. All stimuli and tasks are done on computer using standard stimulus presentation software (if you don't have software, contact michael.cohn@gmail.com for help implementing a free alternative). The original stimuli were in German, so the study would need to be replicated in a native German-speaking population, or the authors would need to be consulted about constructing an equivalent word list in another language. Participants were 205 university students who received subject pool credit or a small payment. The analysis is a "multinomial source-monitoring model." If you are able to carry out the replication, the RP can help you find a collaborator to work on the analysis. T Meiser, C Sattler, K WeisserBinding of multidimensional context information as a distinctive characteristic of< em> remember</em> judgments.Association Learning, Attention, Awareness, Humans, Judgment, Mental Recall, Orientation, Reading, Retention (Psychology), Sex Factors, Speech Perception, Stochastic Processes, Verbal Learning, Voicehttp://openscienceframework.org/project/EZcUj/files/download/eligible.article.abstracts.html/version/1#10.1037/0278-7393.34.1.32JEPLMC-34-1-3217UndatedD.Lakens@tue.nlNot yet vetted33Yes454Supporting recent dual-process models of remember–know judgments, the findings show that remember and know judgments differ with respect to binding processes that correspond to episodic recollection.205Multinomial source-monitoring model, DELTA G2(1) = 52.92, p < .001 (p. 43)0.852JEPLMC34132Reproducibility Papers/JEPLMC/JEPLMC-34-1-32.pdfReproducibility Papers/2008.JEP LMC/34_1_32.pdf10.1037/0278-7393.34.1.3201 JEPLMC167
15
Open for replication; advertised to list on 8/14/12 and 3/21/13Participants view words on a computer screen using standard stimulus presentation software, then make recognition judgements. Stimulus presentation is extremely brief (17ms), so some expertise may be required to ensure adequate precision. Participants were 24 individuals recruited from a "university participant database" and paid GBP4; subject pool participants could probably be used. Statistical analysis is straightforward.CJ Berry, DR Shanks, RN HensonA single-system account of the relationship between priming, recognition, and fluency.Adult, Attention, Cues, Decision Making, Female, Humans, Judgment, Male, Memory, Short-Term, Models, Statistical, Psychomotor Performance, Reaction Time, Reading, Retention (Psychology), Serial Learning, Signal Detection, Psychological, Speech, Verbal Learninghttp://openscienceframework.org/project/EZcUj/files/download/eligible.article.abstracts.html/version/1#10.1037/0278-7393.34.1.97JEPLMC-34-1-9723Undatedfrank.renkewitz@googlemail.comNot yet vetted11Yes544The experiment is itself a replication of a study by Stark & McClelland (2000).(p. 99) “Of key interest …is [the] observation that the identification RTs for misses (old items judged new) were faster than those for correct rejections (new items judged new). In other words, even though certain items were not remembered, a priming effect still occurred for these items.”Native english speakers; 120 stimulus words with 4 letters selected according to several criteria (frequency od occurrence, imagabilty and concreteness); continuous identification with recognition (CID-R) task (Feustel, Shiffrin, & Salasoo, 1983); participants were tested individually in sound-dampened cubicles.24(p. 
100) “Of primary interest for this experiment, RTs to misses were significantly faster than RTs to correct rejections, t(23) = 3.55, p = .002.”0.94JEPLMC34197Reproducibility Papers/JEPLMC/JEPLMC-34-1-97.pdfReproducibility Papers/2008.JEP LMC/34_1_97.pdf10.1037/0278-7393.34.1.9701 JEPLMC171
16
Open for replication; advertised to list on 1/1/13 and 5/22/13.Participants view prime words that vary in duration and in their relationship to a judgement to be made later. All stimuli and tasks are done on computer using standard stimulus presentation software (if you don't have software, contact michael.cohn@gmail.com for help implementing a free alternative). Participants were 118 subject pool undergraduates. The analysis involves a complicated Bayesian model; if you are capable of carrying out the experiment, we can help you find a collaborator to assist with the statistics. CT Weidemann, DE Huber, RM ShiffrinPrime diagnosticity in short-term repetition priming: Is primed evidence discounted, even when it reliably indicates the correct answer?Attention, Choice Behavior, Color Perception, Data Interpretation, Statistical, Humans, Memory, Short-Term, Paired-Associate Learning, Reaction Time, Semanticshttp://openscienceframework.org/project/EZcUj/files/download/eligible.article.abstracts.html/version/1#10.1037/0278-7393.34.2.257JEPLMC-34-2-2571512/26/2012 5:50:03yanshan90@gmail.comNot yet vettedPrime diagnosticity in short-term repetition priming: Is primed evidence discounted, even when it reliably indicates the correct answer?44Yes [Continue to the next question]4. Slightly difficult/uncommon circumstances: Such as easy access but large numbers needed, needing to preselect an otherwise easy-access sample, etc. 5. Not at all difficult/common circumstances: Most labs would be able to carry it out.5. Not at all difficult/common circumstances; most labs would not have any trouble.5. Not at all difficult/common circumstances.A 4 (diagnosticity condition: all positive, all negative, short-positive long-negative, short-negative long-positive) X 3 (prime type: target primed, foil primed, neither primed) X 2 (prime duration: 50ms or short, 1000ms or long) ANOVA showed that:

"Trials with a particular prime duration associated with a positive prime diagnosticity produced an increased preference for the primed alternative, whereas trials with a prime duration associated with a negative prime diagnosticity showed a decreased preference for the primed alternative. ... the association between prime duration and diagnosticity exists even for primes that were not salient." (p. 270)
It was mentioned explicitly in the journal article that these are the most important findings.None.118Interaction of prime type and prime duration: F(2, 228) = 242.25, MSE = 3.07, Hyunh-Feldt (.47) corrected p < .01 Interaction between prime type and prime duration and prime diagnosticity: F(6, 228) =23.42, MSE = 0.30, Huynh-Feldt (.47) corrected p < .01 Interaction between prime diagnosticity and prime type: F(6, 228) = 46.09, MSE = 0.86, Huynh-Feldt (.57) corrected p < .01 Don't have background3. The result is not clearly intuitive or counterintuitiveThe study looked at how priming works through investigating a visual forced-choice perceptual identification task. Preference for the primed alternative was strong even when the diagnostic prime can accurately infer the correct answer. This suggests that even when primes reliably suggests the correct responses, evidence discounting occurs.JEPLMC342257Reproducibility Papers/JEPLMC/JEPLMC-34-2-257.pdfReproducibility Papers/2008.JEP LMC/34_2_257.pdf10.1037/0278-7393.34.2.25701 JEPLMC183
17
Open for replication; advertised to list on 1/13/13 and 5/22/13.K Dent, RA Johnston, GW HumphreysAge of acquisition and word frequency effects in picture naming: A dual-task investigation.Adolescent, Adult, Attention, Child, Child, Preschool, Female, Humans, Infant, Language Development, Male, Mental Recall, Neural Networks (Computer), Pattern Recognition, Visual, Phonetics, Pitch Discrimination, Reaction Time, Refractory Period, Psychological, Semantics, Verbal Behavior, Verbal Learninghttp://openscienceframework.org/project/EZcUj/files/download/eligible.article.abstracts.html/version/1#10.1037/0278-7393.34.2.282JEPLMC-34-2-282248/1/2012 8:45:21yanshan90@gmail.comNot yet vettedAge of Acquisition and Word Frequency Effects in Picture Naming: A Dual-Task Investigation42A and 2BYes [Continue to the next question]5. Not at all difficult/common circumstances: The standard convenience samples available to most researchers.    5. Not at all difficult/common circumstances: Most labs would be able to carry it out.4. Slightly difficult/uncommon circumstances; it would be an inconvenience for many labs but still feasible for most labs.0. There are no other special considerations for replication feasibility. Experiment 2A: AoA: - Stimulus onset asynchrony (SOA) has a significant effect on picture naming accuracy, but did not interact with age of acquisition (AoA). - As SOA decreased, the effects of age of acquisition AoA on picture naming reaction time diminished, and there was no significant at the shortest SOA (50ms). - Analysis on tone classification accuracy and reaction times revealed no significant main effects or interactions. (p. 292, 293) Experiment 2B: Word Frequency - No significant main effects or interactions of word frequency on picture naming accuracy. - Pictures with low-frequency names were named more slowly than their higher-frequency counterparts and responses increasingly delayed as SOA decreased. 
- Analysis on tone classification accuracy revealed no significant main effects or interactions. - Significant main effect of SOA; there was a small increase in tone classification reaction time as SOA decreases. (p. 293, 294)
None.Experiment 2A: 24; Experiment 2B: 42Experiment 2A: - Picture naming accuracy, effect of SOA significant, F1(2, 46) = 2.76, MSE = 0.005, p < .05; F2(2, 116) = 4.25, MSE = 0.005, p < .05 - Picture naming reaction time, effect of AoA significant, F1(1,23) = 7.10, MSE = 9115.61, p < .05; F2(1,58) = 4.34, MSE = 23925.33, p < .04 - Picture naming reaction time, effect of SOA significant, F1(2,46) = 93.70, MSE = 16335.89, p < .00001; F2(2, 116) = 3.50, MSE = 10080.3. p < .00001, n2p = 0.24 - Interaction effect of SOA and AoA on picture naming reaction time, F1(2, 46) = 4.15, MSE = 6520.95, p < .05; F2(2,116) = 3.50, MSE = 10080.30, p < .05 (p. 292, 293) Experiment 2B: - Picture naming reaction time, effect of word frequency significant, F1(1,41) = 42.38, MSE = 5750.99, p < .00001; F2(1,58) = 9.06, MSE = 16130.61, p < .005, n2p = .5 - Picture naming reaction time, effect of SOA significant, F1(2,28) = 178.71, MSE = 19229.65, p < .00001; F2(2, 116) = 295.88, MSE = 7795.85. p < .00001 - Tone classification reaction time, effect of SOA significant, f1(2, 82) = 5.53, MSE = 6464.32, p < .01; F2(2,116) = 3.25, MSE = 6743.94, p < .05 (p.293, 294) "Analyses were conducted both by participants (F1) and by items (F2). SOA was a repeated measure in all analyses, whereas word frequency (WF) and AoA were repeated measures in the participants analyses but between-items variables in the items analyses." p.289Don't have background2. The result aligns somewhat with "popular" intuition about how it would have come outJEPLMC342282Reproducibility Papers/JEPLMC/JEPLMC-34-2-282.pdfReproducibility Papers/2008.JEP LMC/34_2_282.pdf10.1037/0278-7393.34.2.28201 JEPLMC184
18
19
Open for replication. Advertised to list on 8/1/12 and 2/8/13.P Tabossi, R Fanari, K WolfProcessing idiomatic expressions: Effects of semantic compositionality.Adult, Comprehension, Female, Humans, Judgment, Language, Male, Psycholinguistics, Reading, Semanticshttp://openscienceframework.org/project/EZcUj/files/download/eligible.article.abstracts.html/version/1#10.1037/0278-7393.34.2.313JEPLMC-34-2-313107/2/2012 20:15:04jkhartshorne@gmail.comNot yet vettedProcessing idiomatic expressions: Effects of semantic compositionality.54bYes [Continue to the next question]5554Would be nice to have their original materials, but they can be largely reconstructed if necessary.Participants were trained in standard VSL, in which there are 4 triplets. The objects are defined by shape. For 2 of the triplets, there is a reliable pairing between color and shape. For 2 of the triplets, the pairing is non-reliable. Folks still could discriminate the color triplets (that is, with shape information removed) from non-learned color triplets.none8tt(7)=2.892, p=.023, d=1.0230.53. The result is not clearly intuitive or counterintuitiveJEPLMC342313Reproducibility Papers/JEPLMC/JEPLMC-34-2-313.pdfReproducibility Papers/2008.JEP LMC/34_2_313.pdf10.1037/0278-7393.34.2.31301 JEPLMC186
20
Open for replication. Advertised to list on 8/1/12 and 2/8/13M Bassok, SF Pedigo, AT OskarssonPriming addition facts with semantic relations.Adult, Attention, Female, Humans, Male, Mathematics, Mental Recall, Paired-Associate Learning, Problem Solving, Reaction Time, Semanticshttp://openscienceframework.org/project/EZcUj/files/download/eligible.article.abstracts.html/version/1#10.1037/0278-7393.34.2.343JEPLMC-34-2-34325Undatedk.a.ratliff@uvt.nlNot yet vetted22Yes555Experiment 1 showed that a priming context of semantically aligned and misaligned object relations modulates automatic activation of addition facts. Categorically related word primes, which are aligned with addition, led to a significant sum effect. At the same time, semantically unrelated word primes, which are misaligned with addition, did not elicit the sum effect. This study replicated the study one except with different primes that unconfound the effects of relatedness and semantic alignment.The procedure and sitmuli are very well documented.119There was a highly significant Priming Words X Target Digit interaction on reaction time that is consistent with the semantic alignment hypothesis, F(1, 116) = 6.18, MSE = 6.829, p = .014.0.753not reportedJEPLMC342343Reproducibility Papers/JEPLMC/JEPLMC-34-2-343.pdfReproducibility Papers/2008.JEP LMC/34_2_343.pdf10.1037/0278-7393.34.2.34301 JEPLMC188
21
Open for replication; advertised to list on 1/24/13 and 5/29/13.Participants view words and pictures on a computer screen. Words and pictures may be related phonologically, conceptually, or neither, and participants make both vocal and manual responses. Standard stimulus presentation software is required, including the ability to record and time vocal responses. Participants were 14 subject pool undergraduates (native Dutch-speakers, but this is probably not required). Analyses are simple ANOVAs. A RoelofsTracing attention and the activation flow in spoken word planning using eye movements.Association Learning, Attention, Color Perception, Computer Simulation, Eye Movements, Field Dependence-Independence, Humans, Pattern Recognition, Visual, Phonetics, Reaction Time, Reading, Semantics, Verbal Behaviorhttp://openscienceframework.org/project/EZcUj/files/download/eligible.article.abstracts.html/version/1#10.1037/0278-7393.34.2.353JEPLMC-34-2-3532812/21/2012 21:47:28michael.cohn@gmail.comNot yet vettedModeling distributions of immediate memory effects: No strategies needed?22Yes [Continue to the next question]5. Not at all difficult/common circumstances: The standard convenience samples available to most researchers.    5. Not at all difficult/common circumstances: Most labs would be able to carry it out.5. Not at all difficult/common circumstances; most labs would not have any trouble.0. There are no other special considerations for replication feasibility. The main purpose of the experiment was to determine whether a particular mathematical model was a good match for an empirical psycholinguistic finding (when presented with a list of words, people remember the order of presentation better if the words are short as opposed to long). The authors provide statistics showing that the finding was obtained, but I don't see any statistics to support their assertion that their model provides a) adequate fit, or b) better fit than the naive comparison model. 
Still, I think the proper course would be to replicate experiment 2 and find some way of testing the fit of the model to the empirical data.

"Figure 4 shows the results of separate simulation runs of sets of pseudo-subjects, using the same procedure as Experiment 1. Again, good fits were obtained for both the mean and variance models, with the latter better reproducing the amount of variability seen in the data." (p. 223)
The abstract and introduction point to the model testing as their main goal. The effect itself is clearly not intended, or presented, as a finding.A replication will require repeating the simulation used to produce data from the mathematical models. This could be done using any stats software, or even a spreadsheet.

The stimuli in the study were presented using stimulus presentation software, but precise timing is not important, and the required program could even be replicated in JavaScript (i.e., an ordinary web page) with little difficulty.

100unknownsee the summary of result field.0.73. The result is not clearly intuitive or counterintuitiveParticipants (100 native english-speaking subject pool undergraduates) view lists of words on a computer screen, and attempt to recall the order of presentation. Empirical performance results were compared to those from a mathematical model. Both simulation and stimulus presentation are simple and can be done without any special software. Analytic methods are not clear and may require conferral with the authors. JEPLMC342353Reproducibility Papers/JEPLMC/JEPLMC-34-2-353.pdfReproducibility Papers/2008.JEP LMC/34_2_353.pdf10.1037/0278-7393.34.2.35301 JEPLMC189
22
Open for replication; advertised to list on 2/1/13 and 6/4/13.Participants (12 subject pool undergraduates) listen to "noisy" stimuli and try to repeat the word that was spoken. The dependent variables are latency and accuracy. The analysis is a regression model testing for linear and quadratic effects of the interaction between the usefulness and dangerousness of the object named. All stimuli and tasks are done on computer using standard stimulus presentation software (if you don't have software, contact michael.cohn@gmail.com for help implementing a free alternative). The study may require a noise-attenuating room, or a normal experiment room may be adequate.LH Wurm, SR SeamanSemantic effects in naming perceptual identification but not in delayed naming: Implications for models and tasks.Attention, Comprehension, Concept Formation, Decision Making, Humans, Judgment, Memory, Short-Term, Perceptual Distortion, Psychoacoustics, Psycholinguistics, Reaction Time, Semantics, Speech Perception, Verbal Behaviorhttp://openscienceframework.org/project/EZcUj/files/download/eligible.article.abstracts.html/version/1#10.1037/0278-7393.34.2.381JEPLMC-34-2-381312/22/2012 23:24:21michael.cohn@gmail.comNot yet vettedSemantic effects in naming perceptual identification but not in delayed naming: Implications for models and tasks44Yes [Continue to the next question]5. Not at all difficult/common circumstances: The standard convenience samples available to most researchers.    5. Not at all difficult/common circumstances: Most labs would be able to carry it out.5. Not at all difficult/common circumstances; most labs would not have any trouble.5. Not at all difficult/common circumstances."Results of the statistical analysis on errors are shown in Table 9. The significant interaction between danger and usefulness is shown in Figure 6, which looks very much like Figure 4. 
This indicates that the reversal of the typical interaction was not due to the absence of a speed requirement in Experiment 3. Instead, it seems to have been due to the perceptual task being made considerably more difficult." p.391

The dependent variable is time taken to repeat a word after hearing it. The inference test is a regression model estimating both linear and quadratic effects of the interaction between the named object's dangerousness and its usefulness.
According to the introduction to experiment 4, the primary result was actually a comparison between the directions of the interaction effects in experiment 4, which used noisy, difficult stimuli, to their directions in experiment 1, which used the same stimuli without noise. I judged that it would be adequate to replicate experiment 4 and look for significant interactions in the same direction.

The authors present results for both accuracy and latency. I judged that latency is more central, but a closer reading may favor accuracy.
The study used a "sound-attenuating booth" but an isolated experiment room may be sufficient. Need to ask the authors. The study also requires stimulus presentation software that can record verbal responses (this can be done with the free software PsychoPy). 12B(linear) = .2086, t=-1.33, ns. B(quadratic) = -.0643, t=-2.24, p<.05From table 10, p. 3930.94. The result is somewhat counter to "popular" intuitionParticipants (12 subject pool undergraduates) listen to "noisy" stimuli and try to repeat the word that was spoken. The dependent variables are latency and accuracy. The analysis is a regression model testing for linear and quadratic effects of the interaction between the usefulness and dangerousness of the object named.

All stimuli and tasks are done on computer using standard stimulus presentation software (if you don't have software, contact michael.cohn@gmail.com for help implementing a free alternative). The study may require a noise-attenuating room, or a normal experiment room may be adequate.
JEPLMC342381Reproducibility Papers/JEPLMC/JEPLMC-34-2-381.pdfReproducibility Papers/2008.JEP LMC/34_2_381.pdf10.1037/0278-7393.34.2.38101 JEPLMC191
23
Open for replication; advertised to list on 1/24/13 and 6/4/13.Participants stand blindfolded in the center of a ring-shaped turntable, and attempt to estimate the locations of various objects as the turntable rotates around them. Participants were 24 undergraduates who received unspecified monetary compensation (but subject pool participants are probably fine). Analyses were simple ANOVAs. The turntable apparatus could probably be built relatively cheaply by a researcher with appropriate handyperson skills. W Mou, X Li, TP McNamaraBody-and environmental-stabilized processing of spatial knowledge.Adult, Female, Humans, Imagination, Kinesthesis, Male, Mental Recall, Orientation, Pattern Recognition, Visual, Psychophysics, Social Environment, Space Perception, Walkinghttp://openscienceframework.org/project/EZcUj/files/download/eligible.article.abstracts.html/version/1#10.1037/0278-7393.34.2.415JEPLMC-34-2-415612/22/2012 23:46:10michael.cohn@gmail.comNot yet vettedBody-and environmental-stabilized processing of spatial knowledge55Yes [Continue to the next question]5. Not at all difficult/common circumstances: The standard convenience samples available to most researchers.    5. Not at all difficult/common circumstances: Most labs would be able to carry it out.3. Moderately difficult/special circumstances; a subset of labs may be able to do this.0. There are no other special considerations for replication feasibility. The study requires an unusual apparatus: a large, ring-shaped turntable, with a non-rotating spot in the center for participants to stand on. No lab is likely to have this, but one could be built easily and relatively cheaply."In angular error (see Table 1) [...] the main effect of A-I distance was not significant, F(1, 23) = 0.05, MSE = 54.34, p = .05. [...] This finding indicates that participants could not ignore the activation of body- stabilized spatial processing after they had perceived that the layout was stabilized with respect to their body. 
" p. 420This is the finding that the authors discuss when referring to experiment 5. Experiment 5 also replicated some findings from the previous experiments, so it might be appropriate to examine those as well, in order to reduce the ambiguity introduced by attempting to replicate a null finding. The turntable does not actually turn as part of the experiment itself. However, the authors' concept of the effect indicates that it is critical that participants be shown that it _can_ turn. 24F(1, 23) = 0.05, MSE = 54.34, p = .05. 0.74. The result is somewhat counter to "popular" intuitionParticipants stand blindfolded in the center of a ring-shaped turntable, and attempt to estimate the locations of various objects as the turntable rotates around them. Participants were 24 undergraduates who received unspecified monetary compensation (but subject pool participants are probably fine). Analyses were simple anovas.

The turntable apparatus could probably be built relatively cheaply by a researcher with appropriate handyperson skills.
JEPLMC342415Reproducibility Papers/JEPLMC/JEPLMC-34-2-415.pdfReproducibility Papers/2008.JEP LMC/34_2_415.pdf10.1037/0278-7393.34.2.41501 JEPLMC194
24
Open for replication; advertised to list on 10/11/12 and 4/23/13. Authors phrase the final study as having two equally important outcomes, one questionnaire-based and one EEG-based. EEG may be more important to the intended "take-away" message, however the questionnaire-based outcome was considered non-trivial and theoretically meaningful on its own. No special considerations other than the possible EEG.E Harmon-Jones, C Harmon-Jones, M Fearn, JD Sigelman, P JohnsonLeft frontal cortical activation and spreading of alternatives: Tests of the action-based model of dissonance.Analysis of Variance, Attitude, Biofeedback, Psychology, Brain Mapping, Cognitive Dissonance, Decision Making, Female, Humans, Male, Models, Psychological, Prefrontal Cortex [physiology]http://openscienceframework.org/project/EZcUj/files/download/eligible.article.abstracts.html/version/1#10.1037/0022-3514.94.1.1JPSP-94-1-148Undatednosek@virginia.eduNot yet vetted225552EEG component is not critical for testing a primary effect. However, the primary effect is a 3-way interaction and the real prediction is a specific pattern.; Other difficulty rating is for EEG; Study 2 is conceptual replication of study 1(p. 7) We predicted that greater spreading of alternatives would occur in the action-oriented condition as compared with the other conditions. Also, we predicted that the action-oriented condition would cause greater relative left frontal activity than the other conditions.2nd prediction is EEG dependent, first prediction is not; undergrads57(p. 9) To test the effects of mindset manipulation on attitudes, a 3 (mindset) between-participants X 2 (predecision vs. postdecision) X 2 (chosen vs. rejected alternative) within-participants ANOVA was performed. 
It produced the critical three-way interaction, F(2, 54) = 3.19, p < .05, partial eta-sq = .11; see Figure 2.http://www.socialemotiveneuroscience.org/pubs/hj_diss_neurofeedback_2008jpsp.pdfJPSP9411Reproducibility Papers/JPSP/JPSP-94-1-1.pdfReproducibility Papers/2008.JPSP/psp 94-1/psp-94-1-1.pdf10.1037/0022-3514.94.1.101 JPSP1
25
Open for replication. Advertised to list on 8/1/12 and 4/16/13ER Hirt, EE Devers, SM McCreaI want to be creative: Exploring the role of hedonic contingency theory in the positive mood-cognitive flexibility link.Adult, Affect, Cognition, Creativeness, Female, Humans, Male, Psychological Theoryhttp://openscienceframework.org/project/EZcUj/files/download/eligible.article.abstracts.html/version/1#10.1037/0022-3514.94.2.94.2.214JPSP-94-2-21439Undatedelisamaria.galliani@unipd.itgustav.nilsonne@ki.se33Yes554.Videos for manipulating moods should be requested from the first author.(p. 225) We predicted that participants in the nonmood-freezing condition would replicate past research, such that those in a happy mood should perform significantly more creatively than those in the neutral and sad mood conditions. For those in the mood-freezing condition, the predictions were different. We hypothesized that those in the happy mood-freezing condition would show no difference in the creativity of their performance relative to those in neutral or sad moods.undergrads; video manipulations of sad, neutral, and happy moods; aromatherapy mood-freezing manipulations adapted from Tice (2001); 157Three dependent variables! (p. 226) 1. "Fluency: there was a significant Mood by Instruction interaction observed on the number of responses generated, F(2, 151) = 3.59, p = .03. Consistent with past research, mood significantly affected the number of responses generated in the nonmood-freezing conditions. Happy participants generated a greater number of responses (M = 14.2) than either neutral (M = 10.6) or sad participants (M = 11.5; both ts > 2.0, ps < .05). However, no mood effects were observed in the mood-freezing conditions. No other effects were obtained on this measure." Means of the mood-freezing condition: Mh = 9.48; Mn = 10.83; Ms = 12.17.
2. "Flexibility: Mood by Instruction interaction, F(2, 151) = 8.67, p < .001. In the nonmood-freezing conditions, happy participants generated more different categories of responses (M = 4.29) than neutral (M = 3.41) or sad participants (M = 3.64; both ts > 2.5, ps < .001). As was the case with the fluency measure, these mood effects disappeared in the mood-freezing conditions." Means of the mood-freezing condition: Mh = 3.44; Mn = 3.65; Ms = 3.63.
3. "Originality: Mood by Instruction interaction, F(2, 151) = 3.11, p < .05. [...] The responses of participants in the happy nonmood-freezing condition (M = 1.69) showed significantly greater originality than participants in the neutral (M = 1.32) and sad nonmood-freezing conditions (M = 1.36; both ts > 3.2, ps < .001). Again, in the mood-freezing conditions, there were no differences as a function of mood, F(2, 71) = 0.11, ns." Means of the mood-freezing condition: Mh = 1.38; Mn = 1.34; Ms = 1.34.
http://www.indiana.edu/~hirtlab/docs/publications/Hirtetal2008.pdfJPSP942214Reproducibility Papers/JPSP/JPSP-94-2-214.pdfReproducibility Papers/2008.JPSP/psp 94-2/psp-94-2-214.pdf10.1037/0022-3514.94.2.94.2.21401 JPSP16
26
Open for replication; advertised to list on 10/11/12 and 4/16/13. RK Mallett, TD Wilson, DT GilbertExpect the unexpected: Failure to anticipate similarities leads to an intergroup forecasting error.Adolescent, Adult, Affect, Female, Forecasting, Humans, Interpersonal Relations, Male, Middle Aged, Social Perceptionhttp://openscienceframework.org/project/EZcUj/files/download/eligible.article.abstracts.html/version/1#10.1037/0022-3514.94.2.94.2.265JPSP-94-2-26527Undatedgsandstrom@psych.ubc.caNot yet vetted4+pilot4Yes555.Study calls for videotaped interactions so that participants' statements about how much they enjoyed their interactions could be verified by coders - this could be omitted. (p. 273) "The study thus was a Type of Report (forecast vs. experience) x Focus (similarity vs. difference) x Race of Partner (White vs. Black) between-participants design. We expected...when people interacted with a Black partner, in that they would make overly negative forecasts when asked to focus on differences. In contrast, we expected people to make relatively positive forecasts about their interactions with a White partner, regardless of whether they focused on differences or similarities."White women undergrads; need ingroup and outgroup confeds81(p. 273) 2 (type of report: forecast vs. experience) x 2 (focus: similarities vs. differences) x 2 (race of partner: Black vs. White) between-subjects ANOVA. There was a significant main effect of type of report, reflecting the fact that forecasted negativity was greater than the experienced negativity. As hypothesized, this main effect was qualified by a significant three-way interaction, F(1,73) = 7.13, p = .009. When Whites interacted with Blacks, those participants who focused on differences forecasted that the interaction would be more negative than it was. Those who focused on similarities forecasted that the interaction would be positive, and these forecasts did not differ significantly from experiences. 
A planned contrast showed a significant interaction between focus (similarities, differences) and type of report (forecast, experience) in the Black partner condition. The effect was not significant in the White partner condition. There was no evidence of forecasting errors in the White partner condition. People forecasted that the interaction would go relatively well, and it did go relatively well. There were no significant main effects of focus or type of report nor a significant interaction. There were no significant differences in people’s ratings of the actual interactions with a Black partner versus a White partner."1http://psycnet.apa.org/journals/psp/94/2/265/JPSP942265Reproducibility Papers/JPSP/JPSP-94-2-265.pdfReproducibility Papers/2008.JPSP/psp 94-2/psp-94-2-265.pdf10.1037/0022-3514.94.2.94.2.26501 JPSP19
27
Open for replication; advertised to list on 10/11/12 and 3/11/13.First, subjects listened to a sound of varying number of tones and semitones. Second, subjects listened to a tone, and were asked to choose whether the tone was higher or lower than a randomly selected component of the tone. Using an alternative to null-hypothesis tests based on Cohen's d statistic, the researchers found that the number of tones in a sound counter-intuitively mitigates the negative effect of inter-stimulus interval on subjects' ability to distinguish the type of sound change.["In the study reported here, listeners made same/different judgments on pairs of successive ‘‘chords’’ (sums of pure tones with random frequencies). (...) Performance worsened as the number of tones increased, but this effect was not larger for 2-s ISIs than for 0-ms ISIs. Similar results were obtained when a chord was followed by a single tone that had to be judged as higher or lower than the closest component of the chord" (p.85)] Requires computer-generated sounds (may be able to get from authors). Determining audiological normalcy and accessing an appropriately sound-isolated room may be a major challenge, or reaching the required standards may be fairly simple (or they may not really be necessary) - see original article or try asking the authors for more info.L Demany, W Trost, M Serman, C SemalAuditory change detection: simple sounds are not memorized better than complex soundsAttention, Humans, Judgment, Mental Recall, Music, Pitch Perception, Psychoacoustics, Reaction Time, Sound Spectrographyhttp://openscienceframework.org/project/EZcUj/files/download/eligible.article.abstracts.html/version/1#10.1111/j.1467-9280.2008.02050.xPS-19-1-85129/30/2012 18:59:38hanowell@uw.edualexander.a.aarts@gmail.comAuditory change detection: simple sounds are not memorized better than complex sounds55Yes [Continue to the next question]5. 
Not at all difficult/common circumstances: The standard convenience samples available to most researchers.    5. Not at all difficult/common circumstances: Most labs would be able to carry it out.3. Moderately difficult/special circumstances; a subset of labs may be able to do this.0. There are no other special considerations for replication feasibility. Labs will need access to: (1) instruments and expertise in determining whether participants are "audiometrically normal"; (2) an instrument/synthesizer capable of producing sounds with the level of specificity described in the text and of being linked to a computer-assisted survey questionnaire; (3) sound-attenuating listening booths.The primary result is that the simpler the sound (i.e., the fewer individual tones it comprises), the larger the negative effect of inter-stimulus interval on the ability to detect a frequency change between two sounds (see page 89, "Results", 2nd paragraph). The authors' citation (on page 87, "Results", 2nd paragraph) of the following publication as the only high level description of their statistical methods: PR Killeen. 2005. An alternative to null-hypothesis significance tests. Psychol Sci 16(5):345-353. The method derives a probability of replicating an effect given its size and the sample size.["In Experiment 5 (see Fig. 2d), ISI again had a significant main effect, F(3, 9) = 8.5, p = .005, prep = .97, Eta sq = .74. (...) The interaction of N and ISI was significant,F(9, 27) = 2.6, p =.03, prep =.92, Eta sq =.46. However, Figure 2 shows that this small interaction effect is due to the fact that d' was unexpectedly high when N was equal to 1 and the ISI was 0 ms. Paradoxically, therefore, the deleterious effect of longer ISIs on performance was somewhat stronger when N was equal to 1 than when N was larger" (p.89)]none480 [4]There were 4 listeners (see page 89, "Method" > "Participants"). 
The authors ran 480 trials for each condition and listener (see page 89, "Method" > "Stimuli and procedure", 2nd paragraph). To be honest, the way in which the authors describe the sample size is not very clear. Is it 480 trials per condition and listener? Do they subject listeners to stimuli in 40 trial blocks (as they described for experiments 1 and 2)?F(9, 27) = 2.6, p = .03, p-rep = .92, eta-squared = .46PR Killeen. 2005. An alternative to null-hypothesis significance tests. Psychol Sci 16(5):345-353.0.92[don't have background]5. The result is completely counter to "popular" intuition [3. The result is not clearly intuitive or counterintuitive]First, subjects listened to a sound of varying number of tones and semitones. Second, subjects listened to a tone, and were asked to choose whether the tone was higher or lower than a randomly selected component of the tone. Using an alternative to null-hypothesis tests based on Cohen's d statistic, the researchers found that the number of tones in a sound counter-intuitively mitigates the negative effect of inter-stimulus interval on subjects' ability to distinguish the type of sound change.["In the study reported here, listeners made same/different judgments on pairs of successive ‘‘chords’’ (sums of pure tones with random frequencies). (...) Performance worsened as the number of tones increased, but this effect was not larger for 2-s ISIs than for 0-ms ISIs. Similar results were obtained when a chord was followed by a single tone that had to be judged as higher or lower than the closest component of the chord" (p.85)] http://pss.sagepub.com/content/19/1/85.shortPS19185Reproducibility Papers/PS/PS-19-1-85.pdfReproducibility Papers/2008.PsycScience/1 Jan/85.full.pdf10.1111/j.1467-9280.2008.02050.x01 PS295
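Coder aid (illustrative, not part of the original coding): the eta-squared values quoted for Demany et al. Experiment 5 can be cross-checked from the reported F ratios and degrees of freedom. The sketch below assumes the reported "Eta sq" is partial eta-squared, computed as F x df_effect / (F x df_effect + df_error); the function name is hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta-squared from an F ratio and its degrees of freedom.

    Assumption: the article's "Eta sq" is partial eta-squared.
    """
    ss_ratio = f_value * df_effect  # proportional to the effect sum of squares
    return ss_ratio / (ss_ratio + df_error)


# Values quoted in the row above (Demany et al., Experiment 5, p. 89).
isi_main_effect = partial_eta_squared(8.5, 3, 9)     # F(3, 9) = 8.5
n_by_isi_effect = partial_eta_squared(2.6, 9, 27)    # F(9, 27) = 2.6

print(round(isi_main_effect, 2))   # 0.74
print(round(n_by_isi_effect, 2))   # 0.46
```

Both values agree with the figures quoted from the article (.74 and .46), which is consistent with, but does not prove, the partial eta-squared assumption.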
28
Open for replication; advertised to list on 2/1/13Participants (240 undergraduate students) BC Storm, EL Bjork, RA BjorkAccelerated relearning after retrieval-induced forgetting: The benefit of being forgotten.Attention, Cues, Humans, Memory, Short-Term, Mental Recall, Paired-Associate Learning, Practice (Psychology)http://openscienceframework.org/project/EZcUj/files/download/eligible.article.abstracts.html/version/1#10.1037/0278-7393.34.1.230JEPLMC-34-1-23097/17/2012 2:06:48milltyl@gmail.com, yanshan90@gmail.comAuthors will vetAccelerated relearning after retrieval-induced forgetting: The benefit of being forgotten.11Yes5550. There are no other special considerations for replication feasibility. (p. 230) "items that were relearned benefited more from that relearning if they had previously been forgotten." Non-practiced items from practiced categories (Rp- items) were better recalled after relearning than were non-practiced items from non-practiced categories (Nrp items) (p. 234)Result was directly hypothesised. None240"Effect of relearning was significantly larger for Rp- items than it was for Nrp items, F(1, 190) = 10.49, p < .001, MSE = 0.15. Whereas Nrp items were recalled at a rate of 0.36 (SE = 0.02) and 0.42 (SE = 0.02) before and after relearning, respectively; Rp- items were recalled at a rate of 0.31 (SE = 0.02) and 0.45 (SE = 0.02) before and after relearning, respectively." (p. 234)Don't have background2. The result aligns somewhat with "popular" intuition about how it would have come outmilltyl@gmail.comJEPLMC341230Reproducibility Papers/JEPLMC/JEPLMC-34-1-230.pdfReproducibility Papers/2008.JEP LMC/34_1_230.pdf10.1037/0278-7393.34.1.23001 JEPLMC179
29
30
Open for replication; advertised to list on 10/26/12 and 2/22/13.Study of participants' emotional inferences about photographs or drawings of people displaying pride. Self-report, no special requirements. Participants were 211 subject pool undergraduates. Primary result is a null effect.JL Tracy, RW RobinsThe nonverbal expression of pride: Evidence for cross-cultural recognition.Adult, Affect [physiology], Africa, Western, Aged, Anger [physiology], Cross-Cultural Comparison, Emotions [physiology], Ethnic Groups [psychology], Facial Expression, Fear [psychology], Female, Happiness, Humans, Italy, Male, Middle Aged, Nonverbal Communication [psychology], Recognition (Psychology) [physiology], Self Concept, Shame, Social Behavior, Students, United Stateshttp://openscienceframework.org/project/EZcUj/files/download/eligible.article.abstracts.html/version/1#10.1037/0022-3514.94.3.516JPSP-94-3-516557/14/2012 1:02:01jin.x.goh@gmail.comaevtv1@gmail.com, vetted 8/21/2012The nonverbal expression of pride: Evidence for cross-cultural recognition.44Yes [Continue to the next question]4. Slightly difficult/uncommon circumstances: Such as easy access but large numbers needed, needing to preselect an otherwise easy-access sample, etc. 5. Not at all difficult/common circumstances: Most labs would be able to carry it out.3. Moderately difficult/special circumstances; a subset of labs may be able to do this.4. Slightly difficult/uncommon circumstances.The authors recruited a professional artist to draw "human characters from the waist up showing anger, contempt, disgust, fear, happiness, pride, sadness, and surprise" (Tracy & Robins, 2008; p.524). These drawings were then modified to produce different genders and races (African, Asian, and Caucasian). If the drawings can be obtained from the authors, then the study would be very feasible for replication. 
[note that participants saw the drawings on a large screen (projected), so individual cubicles wouldn't work here; this complicates achieving the sample size a little] [further note: article doesn't specify that participants were run individually; they could have been run in groups]"On the basis of binomial tests, recognition rates for all six pride expressions (M = 78%, range 70%–87%) were significantly greater than chance (p < .05), with chance set at 33%" (p.526). Additionally, in a Target Gender (male vs female) x Target Ethnicity (African, Asian, Caucasian) ANOVA, there was a main effect of gender but neither the main effect of ethnicity nor their interaction was significant. This suggests that the expression of pride can be recognized above chance across targets of different ethnicities and genders (but is more easily recognized in female targets).[primary result is that there was no effect of targets' ethnicity on recognizing pride, and thus that "pride recognition generalizes across male and female targets of African, Asian, and Caucasian descent."] Using similarly-drawn characters across gender and race, the authors were able to control for confounding variables such as attractiveness and facial physiognomy. Thus, the instrumentation (in this case, the drawings of pride and other emotional expressions) is necessary to get the effect. [students participating (60% female): "Participants self-identified their race as Asian (47%), Caucasian (31%), Latino (9%), African American (2%), and other or mixed (11%)."]211FTarget Gender x Target Ethnicity ANOVA: "main effect of gender, F(1, 205) = 3.47, p < .05 (one-tailed; effect was predicted on the basis of the findings of Study 3), but no effect of ethnicity, F(2, 205) = 0.13, ns, and no Gender x Ethnicity interaction, F(2, 205) = 0.17, ns." (p.526). don't have background [.8]2. 
The result aligns somewhat with "popular" intuition about how it would have come outJPSP943516Reproducibility Papers/JPSP/JPSP-94-3-516.pdfReproducibility Papers/2008.JPSP/psp 94-3/psp-94-3-516.pdf10.1037/0022-3514.94.3.51602 JPSP34