The Good, the Bad, the Challenging: Interrater Reliability
Christine Thomas, PhD, RN, CHSE-A
Vivian Bowman, MSN, RN, PCCN, CNE
Jennifer Wendel, MEd, MSN, RN, CHSE
Objectives
Why IRR is Essential
The Importance of IRR
IRR and Standards of Best Practice
The Impact of IRR on Students
Factors that impact IRR
Image generated by Copilot
Interrater Reliability vs Instrument Validity
Establishing IRR
Training is best done with all evaluators together as a group, so they can reach consensus and calibrate their scoring
Ideally, raters observe performances at every level of proficiency and identify the behaviors characteristic of each level
A consensus approach is most common, quantified as percent agreement or Cohen's kappa statistic
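Both statistics can be computed in a few lines of Python. The sketch below is illustrative only; the two rater score lists are hypothetical, not taken from the exercise later in this session:

```python
from collections import Counter

def percent_agreement(r1, r2):
    """Fraction of items on which two raters give the identical score."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(r1)
    po = percent_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: probability both raters independently pick the
    # same category, estimated from each rater's category frequencies
    pe = sum(c1[cat] * c2[cat] for cat in c1.keys() & c2.keys()) / (n * n)
    return (po - pe) / (1 - pe)

# Hypothetical LCJR-style item scores (1-4) from two raters
rater_a = [3, 3, 1, 2, 3, 3, 2]
rater_b = [3, 2, 1, 2, 3, 1, 2]
print(round(percent_agreement(rater_a, rater_b), 2))  # 0.71
print(round(cohens_kappa(rater_a, rater_b), 2))       # 0.58
```

Note that percent agreement ignores agreement expected by chance, which is why kappa comes out lower; McHugh (2012) discusses how to interpret kappa values.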
Simulation Rubrics
Lasater Clinical Judgment Rubric
Lasater Clinical Judgment Rubric for Today
Let’s Practice
Video #1
Interrater reliability calculations
NOTICING

| Rater | Focused Observation | Agree (1/0) | Recognizing Deviations from Expected Patterns | Agree (1/0) | Information Seeking | Agree (1/0) |
| --- | --- | --- | --- | --- | --- | --- |
| Chris | 3 | 0 | 3 | 0 | 1 | 1 |
| Vivian | 1 | 0 | 1 | 1 | 1 | 1 |
| Jen | 2 | 1 | 2 | 0 | 1 | 1 |
| Carrie | 2 | 1 | 1 | 1 | 3 | 0 |
| Emily | 2 | 1 | 3 | 0 | 2 | 0 |
| Item agreement % | | 0.6 | | 0.4 | | 0.6 |

RESPONDING

| Rater | Calm, Confident Manner | Agree (1/0) | Clear Communication | Agree (1/0) | Well-Planned Intervention/Flexibility | Agree (1/0) | Being Skillful | Agree (1/0) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Chris | 2 | 1 | 3 | 1 | 3 | 0 | 2 | 1 |
| Vivian | 1 | 0 | 3 | 1 | 1 | 0 | 1 | 0 |
| Jen | 2 | 1 | 3 | 1 | 1 | 0 | 2 | 1 |
| Carrie | 2 | 1 | 3 | 1 | 2 | 1 | 2 | 1 |
| Emily | 2 | 1 | 3 | 1 | 2 | 1 | 2 | 1 |
| Item agreement % | | 0.8 | | 1.0 | | 0.4 | | 0.8 |

Overall percent agreement: 0.66 (23 of 35 item scores agreed)
Calculate Percent Agreement
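As a worked check of this calculation, the Agree (1/0) flags from the worksheet table can be tallied in a few lines of Python (data transcribed from the table: three Noticing items followed by four Responding items per rater):

```python
# Agree (1/0) flags per rater, transcribed from the worksheet table:
# three Noticing items, then four Responding items
agree = {
    "Chris":  [0, 0, 1, 1, 1, 0, 1],
    "Vivian": [0, 1, 1, 0, 1, 0, 0],
    "Jen":    [1, 0, 1, 1, 1, 0, 1],
    "Carrie": [1, 1, 0, 1, 1, 1, 1],
    "Emily":  [1, 0, 0, 1, 1, 1, 1],
}

n_raters = len(agree)  # 5
n_items = 7

# Per-item agreement: fraction of raters who matched on each item
item_pct = [sum(flags[i] for flags in agree.values()) / n_raters
            for i in range(n_items)]
print(item_pct)  # [0.6, 0.4, 0.6, 0.8, 1.0, 0.4, 0.8]

# Overall agreement: total matches over all rater-item scores
overall = sum(map(sum, agree.values())) / (n_raters * n_items)
print(round(overall, 2))  # 0.66
```

Computed exactly, the overall figure is 23/35 ≈ 0.66.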
Discussion
What bias do you have?
Can you come to a consensus or agreement?
Video #2
Recalculate Percent Agreement
Maintaining IRR
One-Time Training
Thank You
Christine Thomas PhD, RN, CHSE-A christine.thomas@gwu.edu
Vivian Bowman MSN, RN, PCCN, CNE vivian.bowman@gwu.edu
Jennifer Wendel MEd, MSN, RN, CHSE jennifer.wendel@gwu.edu
References
Adamson, K. A., & Kardong-Edgren, S. (2012). A method and resources for assessing the reliability of simulation evaluation instruments. Nursing Education Perspectives, 33(5), 334-339.
Burns, M. K. (2014). How to establish interrater reliability. Nursing, 44(10), 56–57. https://doi.org/10.1097/01.NURSE.0000453705.41413.c6
Haerling, K. A. (2021). Simulation evaluation. In P. R. Jeffries (Ed.), Simulation in nursing education: From conceptualization to evaluation (pp. 83–99). Wolters Kluwer.
Lasater, K. (2007). Clinical judgment development: Using simulation to create a rubric. Journal of Nursing Education, 46(11), 496–503.
McHugh, M. L. (2012). Interrater reliability: The kappa statistic. Biochemia Medica, 22(3), 276–282. https://doi.org/10.11613/BM.2012.031
Oermann, M. H., Kardong-Edgren, S., & Rizzolo, M. A. (2016). Towards an evidence-based methodology for high-stakes evaluation of nursing students’ clinical performance using simulation. Teaching and Learning in Nursing, 11(4), 133–137. https://doi.org/10.1016/j.teln.2016.04.004
Polit, D. F., & Beck, C. T. (2021). Nursing research: Generating and assessing evidence for nursing practice (11th ed.). Wolters Kluwer.
Yudkowsky, R., Park, Y., & Downing, S. (2020). Assessment in health professions education (2nd ed.). Routledge.