Šuto Pavičić Jelena, Marušić Ana, Buljan Ivan
Department of Oncology and Radiotherapy, University Hospital of Split, Spinciceva 1, Split, 21000, Croatia.
Department of Research in Biomedicine and Health, Centre for Evidence-based Medicine, University of Split School of Medicine, Split, Croatia.
JMIR Cancer. 2025 Mar 19;11:e63347. doi: 10.2196/63347.
Plain language summaries (PLSs) of Cochrane systematic reviews are a simple format for presenting medical information to the lay public. This is particularly important in oncology, where patients have a more active role in decision-making. However, current PLS formats often exceed the readability requirements for the general population. There is still a lack of cost-effective and more automated solutions to this problem.
This study assessed whether a large language model (eg, ChatGPT) can improve the readability and linguistic characteristics of Cochrane PLSs about oncology interventions, without changing evidence synthesis conclusions.
The dataset included 275 scientific abstracts and corresponding PLSs of Cochrane systematic reviews about oncology interventions. ChatGPT-4 was tasked with rewriting each scientific abstract into a PLS using 3 prompts: (1) rewrite this scientific abstract into a PLS to achieve a Simple Measure of Gobbledygook (SMOG) index of 6, (2) rewrite the PLS from prompt 1 so it is more emotional, and (3) rewrite this scientific abstract so it is easier to read and more appropriate for a lay audience. ChatGPT-generated PLSs were analyzed for word count, level of readability (SMOG index), and linguistic characteristics using Linguistic Inquiry and Word Count (LIWC) software and compared with the original PLSs. Two independent assessors reviewed the conclusiveness categories of the ChatGPT-generated PLSs and compared them with the original abstracts to evaluate consistency. The conclusion of each abstract about the efficacy and safety of the intervention was categorized as conclusive (positive/negative/equal), inconclusive, or unclear. Group comparisons were conducted using the Friedman nonparametric test.
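To make the readability metric and the group comparison concrete, the following is a minimal Python sketch, not the authors' analysis pipeline. The syllable heuristic, variable names, and placeholder scores are illustrative assumptions; the study used dedicated readability and LIWC software, and the Friedman test here is shown via scipy.stats.friedmanchisquare.

```python
# Hypothetical sketch of a SMOG readability calculation and a Friedman test.
# The syllable counter below is a rough heuristic, not the tool used in the study.
import re
from math import sqrt
from scipy.stats import friedmanchisquare

def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def smog_index(text: str) -> float:
    """SMOG grade = 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * sqrt(polysyllables * 30 / max(1, len(sentences))) + 3.1291

# Compare SMOG scores across the four PLS versions (original + 3 prompts),
# one score per review; the values below are placeholders, not study data.
smog_original = [13.0, 12.8, 13.5]
smog_prompt1  = [8.1, 8.3, 8.2]
smog_prompt2  = [11.5, 11.2, 11.8]
smog_prompt3  = [8.6, 8.8, 8.7]
stat, p = friedmanchisquare(smog_original, smog_prompt1, smog_prompt2, smog_prompt3)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4f}")
```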
ChatGPT-generated PLSs using the first prompt (SMOG index 6) were the shortest and easiest to read, with a median SMOG score of 8.2 (95% CI 8-8.4), compared with the original PLSs (median SMOG score 13.1, 95% CI 12.9-13.4). These PLSs had a median word count of 240 (95% CI 232-248) compared with the original PLSs' median word count of 364 (95% CI 339-388). The second prompt (emotional tone) generated PLSs with a median SMOG score of 11.4 (95% CI 11.1-12), again lower than that of the original PLSs. PLSs produced with the third prompt (write simpler and easier) had a median SMOG score of 8.7 (95% CI 8.4-8.8). ChatGPT-generated PLSs across all prompts demonstrated reduced analytical tone and increased authenticity, clout, and emotional tone compared with the original PLSs. Importantly, the conclusiveness categorization of the original abstracts was unchanged in the ChatGPT-generated PLSs.
ChatGPT can be a valuable tool for simplifying medically related formats such as PLSs for lay audiences. More research is needed, including oversight mechanisms to ensure that the information is accurate, reliable, and culturally relevant for different audiences.