A Novel Text-Mining Approach for Retrieving Pharmacogenomics Associations From the Literature.

Authors

Pandi Maria-Theodora, van der Spek Peter J, Koromina Maria, Patrinos George P

Affiliations

Laboratory of Pharmacogenomics and Individualized Therapy, Department of Pharmacy, School of Health Sciences, University of Patras, Patras, Greece.

Erasmus University Medical Center, Faculty of Medicine and Health Sciences, Department of Pathology, Bioinformatics Unit, Rotterdam, Netherlands.

Publication information

Front Pharmacol. 2020 Nov 10;11:602030. doi: 10.3389/fphar.2020.602030. eCollection 2020.

DOI: 10.3389/fphar.2020.602030
PMID: 33343371
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7748107/

Abstract

Text mining of the biomedical literature is an emerging field with demonstrated applications in many research areas, including genetics, personalized medicine, and pharmacogenomics. In this study, we describe a novel text-mining approach for extracting pharmacogenomics associations. The code was written in the R programming language, using custom scripts where needed and functions from existing libraries otherwise. Articles (abstracts or full texts) matching a specified query were retrieved from PubMed, and concept annotations were obtained from PubTator Central. Terms denoting a Mutation or a Gene, as well as Chemical terms corresponding to drug compounds, were normalized, and the sentences containing these terms were filtered and preprocessed to create training sets. Finally, after training and hyperparameter tuning, four text classifiers were built and evaluated for their performance in identifying pharmacogenomics associations: FastText, linear-kernel SVMs, XGBoost, and Lasso and Elastic-Net regularized generalized linear models. Although further improvements are needed before this text-mining approach can be applied in clinical practice, our study offers a comprehensive, simplified, and up-to-date approach for identifying and assessing research articles enriched in clinically relevant pharmacogenomics relationships. This work also highlights a series of challenges in applying text mining effectively to the biomedical literature, whose resolution could substantially advance the field.
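
The sentence-filtering step described in the abstract can be illustrated with a minimal sketch. The authors implemented their pipeline in R; the Python below is only an illustration, not their code. It assumes that annotations arrive as character offsets with an entity type (as an annotation service such as PubTator Central might provide), that "normalization" means replacing each mention with a type placeholder, and that a candidate sentence is one containing a Chemical mention together with a Gene or Mutation mention. The names `Mention`, `PLACEHOLDER`, `split_sentences`, and `candidate_sentences`, as well as the toy text, are hypothetical.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Mention:
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention
    kind: str   # "Gene", "Mutation", or "Chemical"

# Assumed placeholder tokens; the paper's actual normalization may differ.
PLACEHOLDER = {"Gene": "GENE", "Mutation": "MUTATION", "Chemical": "CHEMICAL"}

def split_sentences(text):
    """Naive splitter: (start, end) spans ending at . ! or ? before whitespace."""
    spans, start = [], 0
    for match in re.finditer(r"[.!?](?=\s|$)", text):
        end = match.end()
        if text[start:end].strip():
            spans.append((start, end))
        start = end
    if text[start:].strip():
        spans.append((start, len(text)))
    return spans

def candidate_sentences(text, mentions):
    """Keep sentences with a Chemical plus a Gene or Mutation, with mentions masked."""
    kept = []
    for s_start, s_end in split_sentences(text):
        inside = [m for m in mentions if s_start <= m.start and m.end <= s_end]
        kinds = {m.kind for m in inside}
        if "Chemical" not in kinds or not kinds & {"Gene", "Mutation"}:
            continue
        sentence = text[s_start:s_end]
        # Replace right to left so earlier offsets stay valid.
        for m in sorted(inside, key=lambda m: m.start, reverse=True):
            a, b = m.start - s_start, m.end - s_start
            sentence = sentence[:a] + PLACEHOLDER[m.kind] + sentence[b:]
        kept.append(sentence.strip())
    return kept

def mention(text, surface, kind):
    """Toy helper: locate the first case-sensitive occurrence of `surface`."""
    start = text.index(surface)
    return Mention(start, start + len(surface), kind)

# Toy input; in a real pipeline the offsets would come from the annotator.
TEXT = ("CYP2D6 genotype affects tamoxifen metabolism. "
        "Warfarin is an anticoagulant. "
        "The rs9923231 variant in VKORC1 is associated with warfarin dose.")
MENTIONS = [
    mention(TEXT, "CYP2D6", "Gene"),
    mention(TEXT, "tamoxifen", "Chemical"),
    mention(TEXT, "Warfarin", "Chemical"),
    mention(TEXT, "rs9923231", "Mutation"),
    mention(TEXT, "VKORC1", "Gene"),
    mention(TEXT, "warfarin", "Chemical"),
]

result = candidate_sentences(TEXT, MENTIONS)
for sentence in result:
    print(sentence)
```

On the toy text, the second sentence is dropped because it mentions only a chemical, and the two remaining sentences come out with their entity mentions masked. Sentences prepared this way would then need labels before any of the four classifiers could be trained on them; how the labels were assigned is not stated in the abstract.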

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/68db/7748107/204cbe3fddb5/fphar-11-602030-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/68db/7748107/28eab589d2f9/fphar-11-602030-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/68db/7748107/8883a0b171d8/fphar-11-602030-g002.jpg
