Advancing pharmacogenomics research: automated extraction of insights from PubMed using SpaCy NLP framework.

Authors

Dos Reis Esther Camilo, Caneppa Santiago, Vasconcelos Pedro, de Lima Santos Paulo Caleb Júnior

Affiliations

INFAR - Instituto de Farmacologia e Biologia Molecular, Universidade Federal de São Paulo (UNIFESP), São Paulo, Brasil.

Research and Development Area, Gntech Exames, Florianópolis, Santa Catarina, Brazil.

Publication information

Pharmacogenomics. 2024;25(14-15):573-578. doi: 10.1080/14622416.2024.2429946. Epub 2024 Nov 20.

DOI: 10.1080/14622416.2024.2429946
PMID: 39563601
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11703043/
Abstract

This paper presents a methodology for automatically extracting insights from PubMed articles using a Natural Language Processing (NLP) framework. Our approach, leveraging advanced NLP techniques and Named Entity Recognition (NER), is crucial for advancing pharmacogenomics and other scientific fields that benefit from streamlined access to literature through automated services like RESTful APIs.

Building a new NLP model presents several challenges. First, it is essential to have a thorough understanding of the field in order to define relevant entities. Second, the construction of a diverse and consistent set of examples is crucial. Finally, the effective utilization of pre-established models is of paramount importance, as demonstrated in this work.

Our model, validated via ten-fold cross-validation, achieved over 70% recall and precision for all entities in the training set. We provide a reproducible pipeline for the scientific community and propose a structured approach for qualitative analysis and clustering of results. This methodology refines literature reviews, optimizes knowledge extraction, and supports broader application across diverse research domains. An online platform could further extend these benefits to researchers, educators, and practitioners.
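The abstract reports per-entity precision and recall under ten-fold cross-validation. As a rough illustration of what that evaluation involves, the sketch below splits a corpus into ten folds and scores predicted entity spans against gold annotations by exact match. It is not the authors' pipeline: the function names `k_fold_indices` and `per_label_scores`, the exact-match criterion, and the `GENE` and `DRUG` labels are assumptions made for this example, and the model training step is left out entirely.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# An entity is (start_char, end_char, label). The labels used below are hypothetical.
Entity = Tuple[int, int, str]


def k_fold_indices(n_items: int, k: int = 10) -> List[List[int]]:
    """Split range(n_items) into k contiguous test folds of near-equal size."""
    base, extra = divmod(n_items, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more item
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def per_label_scores(
    gold: List[Set[Entity]], pred: List[Set[Entity]]
) -> Dict[str, Dict[str, float]]:
    """Exact-match precision and recall for each entity label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for ent in p:
            # A predicted span counts as correct only if offsets and label both match.
            (tp if ent in g else fp)[ent[2]] += 1
        for ent in g - p:
            fn[ent[2]] += 1
    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        scores[label] = {
            "precision": tp[label] / predicted if predicted else 0.0,
            "recall": tp[label] / actual if actual else 0.0,
        }
    return scores


# Toy data: two documents with gold and predicted entity spans.
gold = [{(0, 8, "GENE"), (20, 28, "DRUG")}, {(5, 12, "DRUG")}]
pred = [{(0, 8, "GENE"), (30, 35, "DRUG")}, {(5, 12, "DRUG")}]

folds = k_fold_indices(25, k=10)
scores = per_label_scores(gold, pred)
```

In a full run, each fold in `folds` would serve once as the held-out set while a model is trained on the other nine, and `per_label_scores` would be applied to that fold's predictions. The abstract's threshold would then correspond to every label reaching more than 0.70 on both metrics.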
