Efficacy of ChatGPT vs. Cochrane Summaries on Hepatitis B: A Readability Study

Andre Ho¹, Angelo Cadiente¹, Jamie Chen¹, Amber W. Chan¹, Andrew S. Boxer²

¹Hackensack Meridian School of Medicine, 123 Metro Blvd, Nutley, New Jersey 07110, USA

²Hackensack University Medical Center, 799 Bloomfield Avenue, Suite 111, Verona, New Jersey 07044, USA

BACKGROUND

Artificial intelligence (AI) has the potential to make the ever-growing medical literature more accessible.

This study compares the readability and quality of ChatGPT-generated summaries with those of Cochrane Review Plain Text Summaries in Hepatitis B research.

METHODS

Cochrane Library: 31 abstracts tagged “Hepatitis B”

Summarized with ChatGPT-3.5 (September 25 Version)

ChatGPT-3.5 summaries compared with the corresponding Cochrane Plain Text Summary

Readability

Assessed with six metrics. Five of them estimate the years of formal education (U.S. grade level) needed to understand a text; the Flesch-Kincaid Reading Ease score instead uses a scale that typically runs from 0 to 100, on which higher scores indicate easier text.

Each summary was also rated by two blinded, independent graders on a 5-point scale for accuracy and adherence to the abstract, and their combined grades were compared between the two sets of summaries.

Statistical Analysis

Two-tailed t-tests comparing ChatGPT-3.5-generated summaries with Cochrane Plain Text Summaries on each readability metric and on the summative grade

RESULTS

Metric or Grade	Cochrane Plain Text Summaries, Mean (SD)	ChatGPT-3.5 Generated Summaries, Mean (SD)	p-value
Flesch-Kincaid Reading Ease	23.81 (11.71)	23.16 (9.93)	0.816
Flesch-Kincaid Grade Level	14.74 (1.70)	14.79 (1.89)	0.910
Gunning Fog Score	17.53 (1.97)	18.00 (2.16)	0.373
SMOG Index	12.69 (1.40)	13.13 (1.52)	0.249
Coleman-Liau Index	17.02 (1.99)	17.03 (2.00)	0.985
Automated Readability Index	14.55 (1.72)	14.41 (2.62)	0.797
Summative Grade	3.79 (0.87)	4.34 (0.61)	0.00575

Table 1: Mean (standard deviation) of readability metrics and summative grades for Cochrane and ChatGPT-3.5 summaries.

Readability scores differed only marginally between ChatGPT and Cochrane summaries.

T-tests found no statistically significant difference in any readability metric (all p > 0.24).

ChatGPT summaries received significantly higher summative grades than Cochrane summaries (4.34 vs. 3.79, p = 0.00575).

DISCUSSION

AI-authored and human-authored summaries had similar readability.

However, graders rated ChatGPT's summaries as more accurate and comprehensive.

CONCLUSION

We conclude that ChatGPT tends to produce stronger article summaries than the Cochrane Plain Text Summaries, at a comparable reading level.

This implies that, for a given audience, AI-generated summaries can provide more in-depth coverage of Hepatitis B research.

LIMITATIONS

Our analysis was restricted to a relatively small data set of only 31 articles.

Only two graders evaluated the summaries, which limits the grading data available for comparison.

Our investigation focused solely on Hepatitis B and did not examine Hepatitis C, which limits the generalizability of the findings.

REFERENCES

Cochrane Library. (n.d.). Gastroenterology & Hepatology [Cochrane Topic]. Retrieved October 1, 2023, from https://www.cochranelibrary.com/
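The six readability metrics reported in Table 1 are closed-form formulas over simple text counts. The sketch below implements their commonly published coefficients. It assumes the caller already has word, sentence, syllable, character, and polysyllable counts (syllable and "complex word" counting is heuristic and differs between tools), so it will not necessarily reproduce the exact scores from whatever calculator the authors used. `TextCounts` and the function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class TextCounts:
    """Raw counts for one summary (hypothetical; assumed computed elsewhere)."""
    words: int
    sentences: int
    syllables: int
    characters: int      # letters and digits only (Coleman-Liau, ARI)
    polysyllables: int   # words with 3+ syllables (SMOG; simplified stand-in for Fog's "complex words")


def flesch_reading_ease(c: TextCounts) -> float:
    # Not a grade level: higher scores mean easier text.
    return 206.835 - 1.015 * (c.words / c.sentences) - 84.6 * (c.syllables / c.words)


def flesch_kincaid_grade(c: TextCounts) -> float:
    return 0.39 * (c.words / c.sentences) + 11.8 * (c.syllables / c.words) - 15.59


def gunning_fog(c: TextCounts) -> float:
    # Simplification: treats every 3+ syllable word as "complex".
    return 0.4 * ((c.words / c.sentences) + 100 * (c.polysyllables / c.words))


def smog_index(c: TextCounts) -> float:
    return 1.0430 * math.sqrt(c.polysyllables * (30 / c.sentences)) + 3.1291


def coleman_liau(c: TextCounts) -> float:
    letters_per_100_words = c.characters / c.words * 100
    sentences_per_100_words = c.sentences / c.words * 100
    return 0.0588 * letters_per_100_words - 0.296 * sentences_per_100_words - 15.8


def automated_readability_index(c: TextCounts) -> float:
    return 4.71 * (c.characters / c.words) + 0.5 * (c.words / c.sentences) - 21.43


if __name__ == "__main__":
    # Illustrative counts only; not taken from the study data.
    counts = TextCounts(words=100, sentences=5, syllables=150, characters=500, polysyllables=20)
    for metric in (flesch_reading_ease, flesch_kincaid_grade, gunning_fog,
                   smog_index, coleman_liau, automated_readability_index):
        print(f"{metric.__name__}: {metric(counts):.2f}")
```

Because every metric grows with sentence length and word length, two summaries with similar sentence structure and vocabulary will score alike on all six, which is consistent with the uniformly non-significant readability differences in Table 1.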