Advancing pharmacogenomics research: automated extraction of insights from PubMed using SpaCy NLP framework.

Author information

Dos Reis Esther Camilo, Caneppa Santiago, Vasconcelos Pedro, de Lima Santos Paulo Caleb Júnior

Affiliations

INFAR - Instituto de Farmacologia e Biologia Molecular, Universidade Federal de São Paulo (UNIFESP), São Paulo, Brasil.

Research and Development Area, Gntech Exames, Florianópolis, Santa Catarina, Brazil.

Publication information

Pharmacogenomics. 2024;25(14-15):573-578. doi: 10.1080/14622416.2024.2429946. Epub 2024 Nov 20.

Abstract

This paper presents a methodology for automatically extracting insights from PubMed articles using a Natural Language Processing (NLP) framework. Our approach, leveraging advanced NLP techniques and Named Entity Recognition (NER), is crucial for advancing pharmacogenomics and other scientific fields that benefit from streamlined access to literature through automated services like RESTful APIs.

Building a new NLP model presents several challenges. First, it is essential to have a thorough understanding of the field in order to define relevant entities. Second, the construction of a diverse and consistent set of examples is crucial. Finally, the effective utilization of pre-established models is of paramount importance, as demonstrated in this work.

Our model, validated via ten-fold cross-validation, achieved over 70% recall and precision for all entities in the training set. We provide a reproducible pipeline for the scientific community and propose a structured approach for qualitative analysis and clustering of results. This methodology refines literature reviews, optimizes knowledge extraction, and supports broader application across diverse research domains. An online platform could further extend these benefits to researchers, educators, and practitioners.
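
The evaluation mentioned in the abstract (ten-fold cross-validation with per-entity precision and recall) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `k_fold_indices` and `per_label_precision_recall`, the entity labels `GENE` and `DRUG`, and the exact-match criterion for spans are all assumptions made for illustration, and the toy spans below are not data from the paper.

```python
from collections import defaultdict


def k_fold_indices(n_items, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous folds (hypothetical helper)."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_items) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


def per_label_precision_recall(gold, predicted):
    """Score exact-match entity spans per label.

    gold, predicted: lists (one entry per document) of sets of
    (start, end, label) tuples. Exact span matching is an assumption.
    """
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for gold_spans, pred_spans in zip(gold, predicted):
        for span in pred_spans:
            if span in gold_spans:
                tp[span[2]] += 1  # predicted and annotated
            else:
                fp[span[2]] += 1  # predicted but not annotated
        for span in gold_spans:
            if span not in pred_spans:
                fn[span[2]] += 1  # annotated but missed
    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        predicted_total = tp[label] + fp[label]
        gold_total = tp[label] + fn[label]
        scores[label] = {
            "precision": tp[label] / predicted_total if predicted_total else 0.0,
            "recall": tp[label] / gold_total if gold_total else 0.0,
        }
    return scores


# Toy annotations with hypothetical labels, for illustration only.
gold = [{(0, 7, "GENE"), (20, 30, "DRUG")}, {(5, 12, "GENE")}]
predicted = [{(0, 7, "GENE")}, {(5, 12, "GENE"), (15, 22, "DRUG")}]
scores = per_label_precision_recall(gold, predicted)
```

In a full run, each fold's training indices would be used to fit an NER model and the held-out indices to produce `predicted`; a threshold such as the 70% figure in the abstract would then be checked against the per-label scores for every fold.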
