Machine learning models for abstract screening task - A systematic literature review application for health economics and outcome research.

Affiliations

Intelligent Medical Objects, Houston, TX, USA.

McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA.

Publication information

BMC Med Res Methodol. 2024 May 9;24(1):108. doi: 10.1186/s12874-024-02224-3.

Abstract

OBJECTIVE

Systematic literature reviews (SLRs) are critical for life-science research. However, the manual selection and retrieval of relevant publications can be a time-consuming process. This study aims to (1) develop two disease-specific annotated corpora, one for human papillomavirus (HPV) associated diseases and the other for pneumococcal-associated pediatric diseases (PAPD), and (2) optimize machine- and deep-learning models to facilitate automation of the SLR abstract screening.

METHODS

This study constructed two disease-specific SLR screening corpora, one for HPV and one for PAPD, each containing citation metadata and the corresponding abstracts. Multiple combinations of machine- and deep-learning algorithms and input features, such as keywords and MeSH terms, were evaluated using precision, recall, accuracy, and F1-score.
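The four evaluation metrics named above can be illustrated with a minimal Python sketch. It assumes binary screening labels (1 = relevant, 0 = irrelevant); the function name `screening_metrics` and the toy labels are hypothetical and are not taken from the study.

```python
def screening_metrics(y_true, y_pred):
    """Compute precision, recall, accuracy, and F1 for binary labels.

    Assumption: 1 marks a relevant abstract, 0 an irrelevant one.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # relevant, kept
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # irrelevant, kept
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # relevant, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # irrelevant, dropped

    # Guard against empty denominators so the function never divides by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# Toy example (made-up labels, not study data).
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 1, 0, 0, 0, 0]
toy_metrics = screening_metrics(toy_true, toy_pred)
```

In abstract screening, recall is often watched most closely, since a missed relevant article cannot be recovered at later review stages, whereas a false positive only costs additional reading time.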

RESULTS AND CONCLUSIONS

The HPV corpus contained 1697 entries, with 538 relevant and 1159 irrelevant articles. The PAPD corpus included 2865 entries, with 711 relevant and 2154 irrelevant articles. Adding features beyond title and abstract improved the accuracy of machine learning models by 3% for the HPV corpus and 2% for the PAPD corpus. Transformer-based deep learning models consistently outperformed conventional machine learning algorithms, highlighting the strength of domain-specific pre-trained language models for SLR abstract screening. This study provides a foundation for the development of more intelligent SLR systems.
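The corpus counts reported above imply a class imbalance, which matters when reading accuracy figures. The sketch below is simple arithmetic on those counts; the "majority baseline" (labelling every article irrelevant) is an illustrative reference point introduced here and is not a model evaluated in the study. The names `CORPORA` and `class_balance` are hypothetical.

```python
# Counts as reported in the abstract.
CORPORA = {
    "HPV": {"relevant": 538, "irrelevant": 1159},
    "PAPD": {"relevant": 711, "irrelevant": 2154},
}


def class_balance(counts):
    """Return corpus size, share of relevant articles, and the accuracy
    of a trivial classifier that always predicts the majority class."""
    total = counts["relevant"] + counts["irrelevant"]
    return {
        "total": total,
        "relevant_share": counts["relevant"] / total,
        "majority_baseline_accuracy": max(counts.values()) / total,
    }


balance = {name: class_balance(counts) for name, counts in CORPORA.items()}

for name, stats in balance.items():
    print(f"{name}: n={stats['total']}, "
          f"relevant={stats['relevant_share']:.1%}, "
          f"majority baseline accuracy={stats['majority_baseline_accuracy']:.1%}")
```

Because roughly two thirds to three quarters of each corpus is irrelevant, accuracy alone can look high for a model that finds few relevant articles, which is one reason to read it alongside precision, recall, and F1-score.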


