Dos Reis Esther Camilo, Caneppa Santiago, Vasconcelos Pedro, de Lima Santos Paulo Caleb Júnior
INFAR - Instituto de Farmacologia e Biologia Molecular, Universidade Federal de São Paulo (UNIFESP), São Paulo, Brazil.
Research and Development Area, Gntech Exames, Florianópolis, Santa Catarina, Brazil.
Pharmacogenomics. 2024;25(14-15):573-578. doi: 10.1080/14622416.2024.2429946. Epub 2024 Nov 20.
This paper presents a methodology for automatically extracting insights from PubMed articles using a Natural Language Processing (NLP) framework. Our approach, leveraging advanced NLP techniques and Named Entity Recognition (NER), is crucial for advancing pharmacogenomics and other scientific fields that benefit from streamlined access to literature through automated services like RESTful APIs.

Building a new NLP model presents several challenges. First, it is essential to have a thorough understanding of the field in order to define relevant entities. Second, the construction of a diverse and consistent set of examples is crucial. Finally, the effective utilization of pre-established models is of paramount importance, as demonstrated in this work.

Our model, validated via ten-fold cross-validation, achieved over 70% recall and precision for all entities in the training set. We provide a reproducible pipeline for the scientific community and propose a structured approach for qualitative analysis and clustering of results. This methodology refines literature reviews, optimizes knowledge extraction, and supports broader application across diverse research domains. An online platform could further extend these benefits to researchers, educators, and practitioners.
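
As an illustration of the evaluation described above, the following is a minimal sketch of how per-entity precision and recall might be computed within a ten-fold split. It is not the authors' pipeline: the abstract does not name the entity types, the matching criterion, or the tooling, so the labels `GENE` and `DRUG`, the exact-span matching rule, and the helper names `k_fold_indices` and `per_entity_scores` are all assumptions made for this example.

```python
from collections import defaultdict


def k_fold_indices(n_items, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_items) if i < start or i >= stop]
        yield train, test
        start = stop


def per_entity_scores(gold, predicted):
    """Compute precision and recall per entity label.

    gold, predicted: one set per document of (start, end, label) spans.
    Assumption: a prediction counts only if span and label match exactly.
    """
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for gold_spans, pred_spans in zip(gold, predicted):
        for span in pred_spans:
            if span in gold_spans:
                tp[span[2]] += 1
            else:
                fp[span[2]] += 1
        for span in gold_spans:
            if span not in pred_spans:
                fn[span[2]] += 1

    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        scores[label] = {
            "precision": tp[label] / p_den if p_den else 0.0,
            "recall": tp[label] / r_den if r_den else 0.0,
        }
    return scores


if __name__ == "__main__":
    # Hypothetical annotations for two documents (labels are illustrative).
    gold = [{(0, 6, "GENE"), (10, 18, "DRUG")}, {(3, 9, "GENE")}]
    pred = [{(0, 6, "GENE"), (10, 18, "DRUG")}, {(12, 20, "DRUG")}]
    print(per_entity_scores(gold, pred))
    for train_idx, test_idx in k_fold_indices(25, k=10):
        print(len(train_idx), len(test_idx))
```

In a full run, a model would be trained on each fold's training indices and scored on its held-out indices, with the per-entity scores then averaged across the ten folds. A threshold such as the 70% reported above would be checked against those averages.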