Transformer-Based Language Models for Group Randomized Trial Classification in Biomedical Literature: Model Development and Validation.

Authors

Aghaarabi Elaheh, Murray David

Affiliation

Office of Disease Prevention, National Institutes of Health, 6705 Rockledge Dr, Bethesda, MD, 20892, United States, 1 3014964000.

Publication

JMIR Med Inform. 2025 May 9;13:e63267. doi: 10.2196/63267.

DOI:10.2196/63267

PMID:40344669

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12148241/

Abstract

BACKGROUND

For the public health community, monitoring recently published articles is crucial for staying informed about the latest research developments. However, identifying publications about studies with specific research designs from the extensive body of public health publications is a challenge with the currently available methods.

OBJECTIVE

Our objective is to develop a fine-tuned pretrained language model that can accurately identify publications from clinical trials that use a group- or cluster-randomized trial (GRT), individually randomized group-treatment trial (IRGT), or stepped wedge group- or cluster-randomized trial (SWGRT) design within the biomedical literature.

METHODS

We fine-tuned the BioMedBERT language model using a dataset of biomedical literature from the Office of Disease Prevention at the National Institutes of Health. The model was trained to classify publications into three categories of clinical trials that use nested designs. Model performance was evaluated on unseen data and showed high sensitivity and specificity for each class.

RESULTS

When our proposed model was tested for generalizability with unseen data, it delivered high sensitivity and specificity for each class as follows: negatives (0.95 and 0.93), GRTs (0.94 and 0.90), IRGTs (0.81 and 0.97), and SWGRTs (0.96 and 0.99), respectively.

CONCLUSIONS

Our work demonstrates the potential of fine-tuned, domain-specific language models to accurately identify publications reporting on complex and specialized study designs, addressing a critical need in the public health research community. This model offers a valuable tool for the public health community to directly identify publications from clinical trials that use one of the three classes of nested designs.
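The abstract reports sensitivity and specificity separately for each of four classes (negatives, GRTs, IRGTs, and SWGRTs). One common way to obtain such per-class figures from a multiclass classifier is a one-vs-rest tally, sketched below in plain Python. This is an illustration only, not the authors' evaluation code: the function name, the label strings, and the toy labels are all assumptions made for the example, and the toy data are not from the study.

```python
# Minimal sketch: per-class sensitivity and specificity, computed one-vs-rest.
# All names and the toy data below are hypothetical and for illustration only.

LABELS = ["negative", "GRT", "IRGT", "SWGRT"]


def per_class_sensitivity_specificity(y_true, y_pred, labels=LABELS):
    """Return {label: (sensitivity, specificity)} treating each label as
    the positive class and all other labels as the negative class."""
    results = {}
    for label in labels:
        tp = fn = fp = tn = 0
        for truth, pred in zip(y_true, y_pred):
            if truth == label and pred == label:
                tp += 1  # correctly assigned to this class
            elif truth == label:
                fn += 1  # belongs to this class but was missed
            elif pred == label:
                fp += 1  # wrongly assigned to this class
            else:
                tn += 1  # correctly kept out of this class
        # Guard against classes that never occur in the evaluation set.
        sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
        specificity = tn / (tn + fp) if (tn + fp) else float("nan")
        results[label] = (sensitivity, specificity)
    return results


# Toy gold labels and predictions (invented, not study data).
y_true = ["negative", "negative", "GRT", "GRT", "IRGT", "SWGRT", "IRGT", "negative"]
y_pred = ["negative", "GRT", "GRT", "GRT", "IRGT", "SWGRT", "GRT", "negative"]

metrics = per_class_sensitivity_specificity(y_true, y_pred)
for label, (sens, spec) in metrics.items():
    print(f"{label}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Read this way, a pair such as IRGTs (0.81 and 0.97) means that 81% of true IRGT publications were labeled IRGT, and 97% of non-IRGT publications were correctly labeled as something else.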
