1 of 24

The Good, the Bad, the Challenging: Interrater Reliability

Christine Thomas, PhD, RN, CHSE-A

Vivian Bowman, MSN, RN, PCCN, CNE

Jennifer Wendel, MEd, MSN, RN, CHSE

2 of 24

Objectives

  • Discuss the purpose and value of establishing interrater reliability
  • Examine biases that impact the consistency and objectivity of the rater’s assessment
  • Appraise the process used to establish interrater reliability

3 of 24

Why IRR is Essential

  • NLN Jeffries Simulation Theory
  • Facilitator actions guide learning
  • Strategies impact engagement and performance
  • Bias can affect how we judge students
  • IRR = consistent, fair scoring
  • IRR reduces bias and improves validity
  • Critical for grading, progression, and readiness

4 of 24

The Importance of IRR

  • Promotes consistency and fairness in evaluation
  • Reduces bias, ensures equitable assessment
  • Strengthens validity of rubrics and tools
  • Supports competency-based education (CBE)
  • Essential for high-stakes decisions
  • Ensures objective evaluation
  • Improves clarity and quality of student feedback
  • Fosters growth in clinical judgment and skill

5 of 24

IRR and Standards of Best Practice

  • Evaluators must be trained in simulation assessment
  • Use valid, reliable tools aligned with best practices
  • Consistent scoring across evaluators is essential
  • Multiple raters recommended
  • Programs must validate tools and demonstrate IRR
  • IRR protects program credibility and decision integrity
  • Ensures fair, bias-free, defensible evaluations
  • Reflects commitment to excellence and student fairness

6 of 24

The Impact of IRR on Students

  • Builds trust in the learning environment
  • Reduces perceived evaluator bias
  • Clarifies expectations for clinical competence
  • Promotes resilience through meaningful feedback
  • Inconsistent evaluation can erode confidence and learning

7 of 24

Factors that impact IRR

    • Interrater factors
      • Consistency between raters
      • Veering away from objectives
      • Interpretation of descriptions/items
      • Content knowledge
      • Systematic bias (dove vs. hawk)
      • Rater bias (irrelevant characteristics)
        • Gender, race, prior performance of the student

8 of 24

Factors that impact IRR

    • Consistency within the rater
      • Internal factors
        • Mood, fatigue, illness
      • External factors
        • Order of evaluation, time of day, noise, temperature
      • Rater drift
        • Leniency/severity drift
        • Interpretation drift
        • Fatigue drift


9 of 24

Interrater Reliability vs. Instrument Validity

  • Interrater reliability refers to consistency between raters
  • Instrument validity refers to whether the instrument measures what it is supposed to measure
  • The instrument must be used consistently between raters
    • Face validity
    • Delineate different rating options
    • Training is crucial

10 of 24

Establishing IRR

Training is best done with all evaluators as a group to reach consensus and enable calibration

Ideally, raters observe performances at all levels of proficiency and identify the behaviors characteristic of each level

Consensus approach is most common (percent agreement or Cohen’s kappa statistic)
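For reference, Cohen’s kappa corrects observed agreement for the agreement expected by chance alone (McHugh, 2012):

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}
```

where p_o is the observed proportion of agreement and p_e is the proportion expected by chance. For example, if two raters agree on 80% of items (p_o = 0.80) and chance alone would produce 50% agreement (p_e = 0.50), then kappa = (0.80 - 0.50)/(1 - 0.50) = 0.60.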

11 of 24

Simulation Rubrics

  • Simulation rubrics require interrater reliability training
  • Creighton Competency Evaluation Instrument (CCEI):
    • Measures competencies (communication, assessment, critical thinking, & skills)
    • Defining qualifiers for each case improves IRR
  • Lasater Clinical Judgment Rubric (LCJR):
    • Assesses clinical judgment development
    • Requires faculty calibration for consistent scoring

12 of 24

Lasater Clinical Judgment Rubric

13 of 24

Lasater Clinical Judgment Rubric for Today

14 of 24

Let’s Practice

  • Watch the video
  • Rate the student’s performance
  • Note your rationale for each rating

15 of 24

Video #1

16 of 24

Interrater Reliability Calculations

  • Cohen’s kappa (κ)
  • Intraclass correlation coefficient (ICC)
  • Percent agreement
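A minimal sketch (not from the presentation) of how percent agreement and Cohen’s kappa can be computed for a pair of raters; the scores below are hypothetical, and ICC is noted only in a comment because it usually requires a dedicated statistics package.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Proportion of items on which the two raters gave the same score."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)  # observed agreement
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters independently assign the
    # same score, given each rater's own base rates.
    p_e = sum(counts_a[s] * counts_b[s] for s in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical rubric scores (1-3) from two raters across seven items
rater_1 = [3, 3, 1, 2, 3, 3, 2]
rater_2 = [2, 1, 1, 2, 3, 2, 2]

print(percent_agreement(rater_1, rater_2))  # 0.571... (4 of 7 items match)
print(cohens_kappa(rater_1, rater_2))       # ≈ 0.40
# An ICC (suited to ordinal/continuous scores and more than 2 raters) is
# usually computed with a statistics package rather than by hand.
```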

17 of 24

18 of 24

 

 

Example: Percent Agreement Worksheet

Each cell shows score / agree flag (1 = agree, 0 = disagree).

NOTICING: FO = Focused Observation, RD = Recognizing Deviations from Expected Patterns, IS = Information Seeking
RESPONDING: CM = Calm, Confident Manner, CC = Clear Communication, WP = Well-Planned Intervention/Flexibility, BS = Being Skillful

Rater      FO    RD    IS    CM    CC    WP    BS
Chris      3/0   3/0   1/1   2/1   3/1   3/0   2/1
Vivian     1/0   1/1   1/1   1/0   3/1   1/0   1/0
Jen        2/1   2/0   1/1   2/1   3/1   1/0   2/1
Carrie     2/1   1/1   3/0   2/1   3/1   2/1   2/1
Emily      2/1   3/0   2/0   2/1   3/1   2/1   2/1
Item %     0.6   0.4   0.6   0.8   1.0   0.4   0.8

Overall percentage: 0.6
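The worksheet arithmetic can be reproduced directly from the agree flags. A minimal Python sketch, with the flags transcribed from the table above (the exact mean of all 35 flags is ≈0.657, which the worksheet displays as 0.6):

```python
# Agree flags (1/0) transcribed from the worksheet above, one list per item,
# in rater order: Chris, Vivian, Jen, Carrie, Emily.
agree_flags = {
    "Focused Observation":                   [0, 0, 1, 1, 1],
    "Recognizing Deviations":                [0, 1, 0, 1, 0],
    "Information Seeking":                   [1, 1, 1, 0, 0],
    "Calm, Confident Manner":                [1, 0, 1, 1, 1],
    "Clear Communication":                   [1, 1, 1, 1, 1],
    "Well-Planned Intervention/Flexibility": [0, 0, 0, 1, 1],
    "Being Skillful":                        [1, 0, 1, 1, 1],
}

# Item agreement percentage = share of raters whose flag is 1 for that item
for item, flags in agree_flags.items():
    print(f"{item}: {sum(flags) / len(flags):.1f}")

# Overall agreement across all 35 rater-item flags
all_flags = [f for flags in agree_flags.values() for f in flags]
print(f"overall: {sum(all_flags) / len(all_flags):.3f}")  # 0.657
```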

 

19 of 24

Calculate Percent Agreement

Discussion

What bias do you have?

Can you come to a consensus or agreement?

20 of 24

Video #2

21 of 24

Recalculate Percent Agreement

22 of 24

Maintaining IRR

  • Regular calibration and refresher training
  • Share feedback regarding use of tool and process
  • Clear and detailed rubrics
  • Monitor rater performance
  • Encourage open communication about uncertainties or challenges in scoring

One-time training is not enough; IRR must be maintained over time

23 of 24

Thank You

Christine Thomas, PhD, RN, CHSE-A christine.thomas@gwu.edu

Vivian Bowman, MSN, RN, PCCN, CNE vivian.bowman@gwu.edu

Jennifer Wendel, MEd, MSN, RN, CHSE jennifer.wendel@gwu.edu

24 of 24

References

Adamson, K. A., & Kardong-Edgren, S. (2012). A method and resources for assessing the reliability of simulation evaluation instruments. Nursing Education Perspectives, 33(5), 334–339.

Burns, M. K. (2014). How to establish interrater reliability. Nursing, 44(10), 56–57. https://doi.org/10.1097/01.NURSE.0000453705.41413.c6

Haerling, K. A. (2021). Simulation evaluation. In P. R. Jeffries (Ed.), Simulation in nursing education: From conceptualization to evaluation (pp. 83–99). Wolters Kluwer.

Lasater, K. (2007). Clinical judgment development: Using simulation to create a rubric. Journal of Nursing Education, 46(11), 496–503.

McHugh, M. L. (2012). Interrater reliability: The kappa statistic. Biochemia Medica, 22(3), 276–282. https://doi.org/10.11613/BM.2012.031

Oermann, M. H., Kardong-Edgren, S., & Rizzolo, M. A. (2016). Towards an evidence-based methodology for high-stakes evaluation of nursing students’ clinical performance using simulation. Teaching and Learning in Nursing, 11(4), 133–137. https://doi.org/10.1016/j.teln.2016.04.004

Polit, D. F., & Beck, C. T. (2021). Nursing research: Generating and assessing evidence for nursing practice (11th ed.). Wolters Kluwer.

Yudkowsky, R., Park, Y. S., & Downing, S. M. (Eds.). (2020). Assessment in health professions education (2nd ed.). Routledge.