

A Novel Text-Mining Approach for Retrieving Pharmacogenomics Associations From the Literature.

Authors

Pandi Maria-Theodora, van der Spek Peter J, Koromina Maria, Patrinos George P

Affiliations

Laboratory of Pharmacogenomics and Individualized Therapy, Department of Pharmacy, School of Health Sciences, University of Patras, Patras, Greece.

Erasmus University Medical Center, Faculty of Medicine and Health Sciences, Department of Pathology, Bioinformatics Unit, Rotterdam, Netherlands.

Publication details

Front Pharmacol. 2020 Nov 10;11:602030. doi: 10.3389/fphar.2020.602030. eCollection 2020.

Abstract

Text mining of the biomedical literature is an emerging field that has already found a variety of applications in many research areas, including genetics, personalized medicine, and pharmacogenomics. In this study, we describe a novel text-mining approach for extracting pharmacogenomics associations. The pipeline was implemented in the R programming language, using custom scripts where needed and functions from existing libraries elsewhere. Articles (abstracts or full texts) matching a specified query were retrieved from PubMed, and concept annotations were obtained from PubTator Central. Terms denoting a Mutation or a Gene, as well as Chemical terms corresponding to drug compounds, were normalized, and the sentences containing these terms were filtered and preprocessed to create appropriate training sets. Finally, after training and hyperparameter tuning, four text classifiers (FastText, linear-kernel SVMs, XGBoost, and Lasso and Elastic-Net Regularized Generalized Linear Models) were built and evaluated for their performance in identifying pharmacogenomics associations. Although further improvements are needed before this text-mining approach can be properly implemented in clinical practice, our study offers a comprehensive, simplified, and up-to-date approach for identifying and assessing research articles enriched in clinically relevant pharmacogenomics relationships. Furthermore, this work highlights a series of challenges in the effective application of text mining to the biomedical literature, whose resolution could substantially advance the field.
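The sketch below illustrates the sentence-filtering and normalization step in Python. It is not the authors' R code. It assumes PubTator-style annotations given as character offsets with the concept types Gene, Mutation, and Chemical. The `Annotation` class, the placeholder tokens, the naive sentence splitter, and the function names are all hypothetical. The sketch keeps only sentences that mention both a Gene or Mutation and a Chemical. It reads "normalized" as replacing each mention with a concept placeholder, which is one plausible interpretation rather than the published procedure. The further restriction of Chemical mentions to drug compounds is omitted.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical PubTator-style annotation: character offsets plus concept type."""
    start: int    # offset where the mention begins
    end: int      # offset one past the mention's last character
    concept: str  # e.g. "Gene", "Mutation", "Chemical"


# Hypothetical placeholder tokens substituted for each concept type.
PLACEHOLDERS = {"Gene": "GENE", "Mutation": "MUTATION", "Chemical": "CHEMICAL"}


def split_sentences(text):
    """Naive splitter: (start, end) spans ending at '.', '!' or '?' before whitespace/end."""
    spans, start = [], 0
    for match in re.finditer(r"[.!?](?=\s|$)", text):
        spans.append((start, match.end()))
        start = match.end()
        while start < len(text) and text[start].isspace():
            start += 1  # skip whitespace between sentences
    if start < len(text):
        spans.append((start, len(text)))  # trailing text without final punctuation
    return spans


def candidate_sentences(text, annotations):
    """Keep sentences with a Gene/Mutation AND a Chemical; mask each mention."""
    results = []
    for s_start, s_end in split_sentences(text):
        inside = sorted(
            (a for a in annotations if a.start >= s_start and a.end <= s_end),
            key=lambda a: a.start,
        )
        types = {a.concept for a in inside}
        if not (types & {"Gene", "Mutation"} and "Chemical" in types):
            continue  # sentence lacks a genetic term or a chemical term
        pieces, cursor = [], s_start
        for a in inside:
            if a.start < cursor:
                continue  # ignore a mention overlapping one already replaced
            pieces.append(text[cursor:a.start])
            pieces.append(PLACEHOLDERS.get(a.concept, a.concept.upper()))
            cursor = a.end
        pieces.append(text[cursor:s_end])
        results.append("".join(pieces))
    return results


if __name__ == "__main__":
    text = ("CYP2C19*2 carriers showed reduced response to clopidogrel. "
            "The study enrolled 120 patients.")
    drug_at = text.index("clopidogrel")
    anns = [Annotation(0, 9, "Mutation"),
            Annotation(drug_at, drug_at + len("clopidogrel"), "Chemical")]
    print(candidate_sentences(text, anns))
    # prints ['MUTATION carriers showed reduced response to CHEMICAL.']
```

Replacing mentions with placeholders is a common way to help a classifier generalize beyond the specific gene and drug names seen in training. A realistic pipeline would also need a proper sentence splitter, because abbreviations such as "e.g." would break the naive regex used here.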


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/68db/7748107/28eab589d2f9/fphar-11-602030-g001.jpg
