Automatic categorization of self-acknowledged limitations in randomized controlled trial publications.

Affiliations

School of Information Sciences, University of Illinois Urbana-Champaign, 501 Daniel Street, Champaign, IL 61820, USA.

Department of Biological Sciences, Binghamton University, 4400 Vestal Parkway East, Binghamton, NY 13902, USA.

Publication Information

J Biomed Inform. 2024 Apr;152:104628. doi: 10.1016/j.jbi.2024.104628. Epub 2024 Mar 26.

DOI: 10.1016/j.jbi.2024.104628
PMID: 38548008
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11807350/
Abstract

OBJECTIVE

Acknowledging study limitations in a scientific publication is a crucial element in scientific transparency and progress. However, limitation reporting is often inadequate. Natural language processing (NLP) methods could support automated reporting checks, improving research transparency. In this study, our objective was to develop a dataset and NLP methods to detect and categorize self-acknowledged limitations (e.g., sample size, blinding) reported in randomized controlled trial (RCT) publications.

METHODS

We created a data model of limitation types in RCT studies and annotated a corpus of 200 full-text RCT publications using this data model. We fine-tuned BERT-based sentence classification models to recognize the limitation sentences and their types. To address the small size of the annotated corpus, we experimented with data augmentation approaches, including Easy Data Augmentation (EDA) and Prompt-Based Data Augmentation (PromDA). We applied the best-performing model to a set of about 12K RCT publications to characterize self-acknowledged limitations at larger scale.
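
To make the augmentation step concrete, here is a minimal sketch of the random-operation portion of Easy Data Augmentation (random swap and random deletion; full EDA also includes synonym replacement and random insertion, omitted here). This is an illustrative sketch only, not the authors' implementation, and the sample sentence and parameter values are hypothetical:

```python
import random

def random_swap(words, n_swaps):
    """EDA operation: randomly swap two word positions n_swaps times."""
    words = words[:]
    for _ in range(n_swaps):
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p):
    """EDA operation: delete each word with probability p, keeping at least one word."""
    kept = [w for w in words if random.random() > p]
    return kept if kept else [random.choice(words)]

def eda_augment(sentence, alpha=0.1, num_aug=4):
    """Generate num_aug augmented variants of a (limitation) sentence,
    alternating between the swap and deletion operations."""
    words = sentence.split()
    n = max(1, int(alpha * len(words)))
    out = []
    for k in range(num_aug):
        if k % 2 == 0:
            out.append(" ".join(random_swap(words, n)))
        else:
            out.append(" ".join(random_deletion(words, alpha)))
    return out

# Hypothetical limitation sentence for illustration
augmented = eda_augment("the small sample size limits the generalizability of our findings")
```

The augmented sentences are added to the training set alongside the originals; because swaps and deletions rarely change the limitation type being expressed, the label is carried over unchanged.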

RESULTS

Our data model consists of 15 categories and 24 sub-categories (e.g., Population and its sub-category DiagnosticCriteria). We annotated 1090 instances of limitation types in 952 sentences (4.8 limitation sentences and 5.5 limitation types per article). A fine-tuned PubMedBERT model for limitation sentence classification improved upon our earlier model by about 1.5 absolute percentage points in F score (0.821 vs. 0.8) with statistical significance (p<.001). Our best-performing limitation type classification model, PubMedBERT fine-tuning with PromDA (Output View), achieved an F score of 0.7, improving upon the vanilla PubMedBERT model by 2.7 percentage points, with statistical significance (p<.001).
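
The abstract does not name the significance test used. One standard choice for comparing two classifiers evaluated on the same set of sentences is McNemar's exact test, which depends only on the discordant counts; the sketch below is written under that assumption, with purely hypothetical counts:

```python
from math import comb

def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value from the discordant counts:
    b = items model A classified correctly and model B incorrectly,
    c = items model B classified correctly and model A incorrectly.
    Under the null, discordant outcomes are Binomial(b + c, 0.5)."""
    n = b + c
    k = min(b, c)
    # one-sided binomial tail at p = 0.5, doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical discordant counts for illustration only
p = mcnemar_exact_p(40, 10)
```

With a strong imbalance in the discordant counts, as in the hypothetical call above, the p-value falls well below 0.001; when the counts are equal, the test returns 1.0, reflecting no evidence of a difference between the models.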

CONCLUSION

The model could support automated screening tools which can be used by journals to draw the authors' attention to reporting issues. Automatic extraction of limitations from RCT publications could benefit peer review and evidence synthesis, and support advanced methods to search and aggregate the evidence from the clinical trial literature.

Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d2af/11807350/90676e72475c/nihms-2053914-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d2af/11807350/c0aa52817ddb/nihms-2053914-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d2af/11807350/c0cd62ec914e/nihms-2053914-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d2af/11807350/e036b9b0c363/nihms-2053914-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d2af/11807350/42b15cb9c687/nihms-2053914-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d2af/11807350/5c25c9dfa0f4/nihms-2053914-f0006.jpg

Similar Articles

1
Automatic categorization of self-acknowledged limitations in randomized controlled trial publications.
J Biomed Inform. 2024 Apr;152:104628. doi: 10.1016/j.jbi.2024.104628. Epub 2024 Mar 26.
2
Text classification models for assessing the completeness of randomized controlled trial publications based on CONSORT reporting guidelines.
Sci Rep. 2024 Sep 17;14(1):21721. doi: 10.1038/s41598-024-72130-7.
3
CONSORT-TM: Text classification models for assessing the completeness of randomized controlled trial publications.
medRxiv. 2024 Apr 1:2024.03.31.24305138. doi: 10.1101/2024.03.31.24305138.
4
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
5
Automatic recognition of self-acknowledged limitations in clinical research literature.
J Am Med Inform Assoc. 2018 Jul 1;25(7):855-861. doi: 10.1093/jamia/ocy038.
6
Toward assessing clinical trial publications for reporting transparency.
J Biomed Inform. 2021 Apr;116:103717. doi: 10.1016/j.jbi.2021.103717. Epub 2021 Feb 26.
7
Assessing citation integrity in biomedical publications: corpus annotation and NLP models.
Bioinformatics. 2024 Jul 1;40(7). doi: 10.1093/bioinformatics/btae420.
8
PICO entity extraction for preclinical animal literature.
Syst Rev. 2022 Sep 30;11(1):209. doi: 10.1186/s13643-022-02074-4.
9
Methodological information extraction from randomized controlled trial publications: a pilot study.
AMIA Annu Symp Proc. 2023 Apr 29;2022:542-551. eCollection 2022.
10
Enhancing suicidal behavior detection in EHRs: A multi-label NLP framework with transformer models and semantic retrieval-based annotation.
J Biomed Inform. 2025 Jan;161:104755. doi: 10.1016/j.jbi.2024.104755. Epub 2024 Dec 2.

Cited By

1
The emergence of large language models as tools in literature reviews: a large language model-assisted systematic review.
J Am Med Inform Assoc. 2025 Jun 1;32(6):1071-1086. doi: 10.1093/jamia/ocaf063.
2
SPIRIT-CONSORT-TM: a corpus for assessing transparency of clinical trial protocol and results publications.
Sci Data. 2025 Feb 28;12(1):355. doi: 10.1038/s41597-025-04629-1.
3
SPIRIT-CONSORT-TM: a corpus for assessing transparency of clinical trial protocol and results publications.
medRxiv. 2025 Jan 15:2025.01.14.25320543. doi: 10.1101/2025.01.14.25320543.

References

1
Towards precise PICO extraction from abstracts of randomized controlled trials using a section-specific learning approach.
Bioinformatics. 2023 Sep 5;39(9). doi: 10.1093/bioinformatics/btad542.
2
Methodology reporting improved over time in 176,469 randomized controlled trials.
J Clin Epidemiol. 2023 Oct;162:19-28. doi: 10.1016/j.jclinepi.2023.08.004. Epub 2023 Aug 9.
3
Methodological information extraction from randomized controlled trial publications: a pilot study.
AMIA Annu Symp Proc. 2023 Apr 29;2022:542-551. eCollection 2022.
4
Investigating the impact of weakly supervised data on text mining models of publication transparency: a case study on randomized controlled trials.
AMIA Jt Summits Transl Sci Proc. 2022 May 23;2022:254-263. eCollection 2022.
5
Is the future of peer review automated?
BMC Res Notes. 2022 Jun 11;15(1):203. doi: 10.1186/s13104-022-06080-6.
6
Rise of the preprint: how rapid data sharing during COVID-19 has changed science forever.
Nat Med. 2022 Jan;28(1):2-5. doi: 10.1038/s41591-021-01654-6.
7
Toward assessing clinical trial publications for reporting transparency.
J Biomed Inform. 2021 Apr;116:103717. doi: 10.1016/j.jbi.2021.103717. Epub 2021 Feb 26.
8
Following the science? Comparison of methodological and reporting quality of covid-19 and other research from the first wave of the pandemic.
BMC Med. 2021 Feb 23;19(1):46. doi: 10.1186/s12916-021-01920-x.
9
Methodological quality of COVID-19 clinical research.
Nat Commun. 2021 Feb 11;12(1):943. doi: 10.1038/s41467-021-21220-5.
10
Automated screening of COVID-19 preprints: can we help authors to improve transparency and reproducibility?
Nat Med. 2021 Jan;27(1):6-7. doi: 10.1038/s41591-020-01203-7.