BACKGROUND

RESULTS

  • Our analysis was restricted to a relatively small data set of only 31 articles.

  • Only two graders evaluated the summaries, which limits the amount of grading data available for comparison.

  • Our investigation focused solely on Hepatitis B and did not examine Hepatitis C, which limits the generalizability of the findings.

Efficacy of ChatGPT vs. Cochrane Summaries on Hepatitis B: A Readability Study

Andre Ho¹, Angelo Cadiente¹, Jamie Chen¹, Amber W. Chan¹, Andrew S. Boxer²

¹Hackensack Meridian School of Medicine, 123 Metro Blvd, Nutley, New Jersey 07110, USA

²Hackensack University Medical Center, 799 Bloomfield Avenue, Suite 111, Verona, NJ 07044

LIMITATIONS

REFERENCES

Search | Cochrane Library. (n.d.). Gastroenterology & Hepatology in Cochrane Topic. Retrieved October 1, 2023, from https://www.cochranelibrary.com/

  • Artificial Intelligence has the potential to improve the accessibility of the ever-growing medical literature.

  • The purpose of this study is to compare the readability and quality of ChatGPT-generated summaries with those of Cochrane Review’s Plain Text Summaries in Hepatitis B research.

Each summary was also evaluated by two blinded, independent graders on a 5-point scale for accuracy and adherence to the abstract, with their combined grades compared between datasets.

METHODS

Metrics & Grades             | Cochrane Plain Text Summaries | ChatGPT-3.5 Generated Summaries | P-Value
Flesch-Kincaid Reading Ease  | 23.81 (11.71)                 | 23.16 (9.93)                    | 0.816
Flesch-Kincaid Grade Level   | 14.74 (1.70)                  | 14.79 (1.89)                    | 0.910
Gunning Fog Score            | 17.53 (1.97)                  | 18.00 (2.16)                    | 0.373
SMOG Index                   | 12.69 (1.40)                  | 13.13 (1.52)                    | 0.249
Coleman-Liau Index           | 17.02 (1.99)                  | 17.03 (2.00)                    | 0.985
Automated Readability Index  | 14.55 (1.72)                  | 14.41 (2.62)                    | 0.797
Summative Grade              | 3.79 (0.87)                   | 4.34 (0.61)                     | 0.00575

Table 1: Mean (standard deviation) of readability metrics and grades for Cochrane Plain Text Summaries and ChatGPT-3.5 summaries.

  • Readability scores showed marginal differences between ChatGPT and Cochrane Summaries.

  • T-tests revealed no statistically significant differences in readability metrics.

  • Grading of the summaries showed a statistically significant difference, with ChatGPT summaries graded higher than Cochrane Summaries (4.34 vs. 3.79, p = 0.00575).

Cochrane Library: 31 abstracts tagged “Hepatitis B”

ChatGPT-3.5 summaries compared with corresponding Cochrane Plain Text Summary

Readability

Summarized with ChatGPT-3.5 (September 25 Version)

Assessed with 6 metrics, each estimating the amount of formal education required to understand a given summary
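As an illustration of how such metrics work, the sketch below implements the standard published formulas for three of the six scores. It is not the tool used in this study, which the poster does not name. The function names and the word, sentence, syllable, and character counts are hypothetical and chosen only for demonstration.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def automated_readability_index(characters, words, sentences):
    """Automated Readability Index: approximate US school grade."""
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43


if __name__ == "__main__":
    # Hypothetical counts for a short passage (not taken from the study data).
    words, sentences, syllables, characters = 100, 5, 150, 500
    print(round(flesch_reading_ease(words, sentences, syllables), 2))
    print(round(flesch_kincaid_grade(words, sentences, syllables), 2))
    print(round(automated_readability_index(characters, words, sentences), 2))
```

Longer sentences and more syllables per word lower the Reading Ease score and raise the grade-level scores, which is why a low Reading Ease value and a high grade level both indicate difficult text.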

Statistical Analysis

Two-tailed t-test to compare ChatGPT-3.5 generated summaries & Cochrane Plain Text Summaries
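A two-tailed comparison of this kind can be sketched from the summary statistics in Table 1. The example below makes several assumptions that the poster does not state: an independent-samples (Welch) test, 31 observations per group, and a p-value obtained by numerically integrating the t density with the standard library rather than by calling a statistics package. The helper names are hypothetical, and results may differ slightly from the reported p-values because the inputs are rounded.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t_stat, df, steps=2000):
    """Two-tailed p-value using Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * central_area)


def welch_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t-test from means, standard deviations, and sample sizes."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t_stat = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df, two_tailed_p(t_stat, df)


if __name__ == "__main__":
    N = 31  # assumption: one observation per article in each group
    # Summative Grade row of Table 1: ChatGPT-3.5 vs. Cochrane.
    print(welch_t_test(4.34, 0.61, N, 3.79, 0.87, N))
    # Reading Ease row of Table 1: Cochrane vs. ChatGPT-3.5.
    print(welch_t_test(23.81, 11.71, N, 23.16, 9.93, N))
```

If the study instead paired each ChatGPT summary with its Cochrane counterpart, a paired t-test on the per-article differences would be the appropriate choice, and it cannot be reproduced from summary statistics alone.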

CONCLUSION

  • We conclude that ChatGPT tends to produce stronger article summaries than the Cochrane Plain Text Summaries at a comparable reading level.

  • This implies that, for a given audience, AI-generated summaries can offer a more in-depth summary of Hepatitis B research.

DISCUSSION

  • In terms of readability, both AI-authored and human-authored summaries yielded similar outcomes.

  • However, ChatGPT produced summaries that graders rated as more accurate and comprehensive.