
Text classification models for assessing the completeness of randomized controlled trial publications based on CONSORT reporting guidelines.

Affiliations

School of Information Sciences, University of Illinois Urbana-Champaign, 501 E Daniel Street, Champaign, IL, 61820, USA.

School of Public Health, Indiana University, Bloomington, IN, USA.

Publication Information

Sci Rep. 2024 Sep 17;14(1):21721. doi: 10.1038/s41598-024-72130-7.

Abstract

Complete and transparent reporting of randomized controlled trial (RCT) publications is essential for assessing their credibility. We aimed to develop text classification models for determining whether RCT publications report CONSORT checklist items. Using a corpus annotated with 37 fine-grained CONSORT items, we trained sentence classification models (PubMedBERT fine-tuning, BioGPT fine-tuning, and in-context learning with GPT-4) and compared their performance. We assessed the impact of data augmentation methods (Easy Data Augmentation (EDA), UMLS-EDA, and text generation and rephrasing with GPT-4) on model performance. We also fine-tuned section-specific PubMedBERT models (e.g., Methods) to evaluate whether they could improve performance over a single model trained on all sections. We performed 5-fold cross-validation and report precision, recall, F-score, and area under the curve (AUC). The fine-tuned PubMedBERT model that uses each sentence along with its surrounding sentences and section header yielded the best overall performance (sentence level: 0.71 micro-F, 0.67 macro-F; article level: 0.90 micro-F, 0.84 macro-F). Data augmentation had a limited positive effect. BioGPT fine-tuning and GPT-4 in-context learning produced suboptimal results. The Methods-specific model improved recognition of methodology items, while the other section-specific models had no significant impact. Most CONSORT checklist items can be recognized reasonably well by the fine-tuned PubMedBERT model, but there is room for improvement. Improved models could underpin journal editorial workflows and CONSORT adherence checks.
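
To make the winning configuration concrete, below is a minimal sketch of context-aware multi-label sentence classification with PubMedBERT, assuming the Hugging Face transformers API. The checkpoint name, the [SEP]-based packing of the section header with the neighbouring sentences, the 0.5 decision threshold, and the example sentences are all illustrative assumptions; the abstract only states that the best model uses the sentence together with its surrounding sentences and section headers.

```python
# Minimal sketch (not the authors' released code): a PubMedBERT fine-tuning
# setup for multi-label sentence classification over 37 CONSORT items.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext"
NUM_LABELS = 37  # fine-grained CONSORT checklist items

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=NUM_LABELS,
    # Sigmoid outputs + BCE loss: a sentence may report several items at once.
    problem_type="multi_label_classification",
)

def encode(section_header, prev_sent, sentence, next_sent):
    # Pack the section header and the surrounding sentences around the
    # target sentence; the exact packing format is an assumption.
    text = f"{section_header} [SEP] {prev_sent} {sentence} {next_sent}"
    return tokenizer(text, truncation=True, max_length=512,
                     padding="max_length", return_tensors="pt")

# Toy forward pass on one Methods sentence with its neighbours. The
# classification head here is freshly initialized, so real use requires
# fine-tuning on the annotated corpus first (e.g., with transformers.Trainer).
enc = encode(
    "Methods",
    "Participants were recruited from three outpatient clinics.",
    "Randomization used computer-generated block sequences.",
    "Allocation was concealed in sealed opaque envelopes.",
)
with torch.no_grad():
    logits = model(**enc).logits           # shape: (1, 37)
probs = torch.sigmoid(logits)
predicted_items = (probs > 0.5).nonzero()  # CONSORT items flagged as reported
```

For the article-level scores, one plausible aggregation (again an assumption, not stated in the abstract) is to mark a checklist item as reported if any sentence in the article is predicted positive for it, then compute micro- and macro-averaged F-scores over items, e.g., with sklearn.metrics.f1_score.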


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2913/11408668/a910b8389c75/41598_2024_72130_Fig1_HTML.jpg
