A semantic similarity based methodology for predicting protein-protein interactions: Evaluation with P53-interacting kinases.

Affiliations

Renaissance Computing Institute (RENCI), University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.

The Laboratory for Molecular Informatics and Data Sciences, Department of Pharmaceutical Sciences and the BRITE Institute, College of Health and Sciences, North Carolina Central University, Durham, NC 27707, USA.

Publication information

J Biomed Inform. 2020 Nov;111:103579. doi: 10.1016/j.jbi.2020.103579. Epub 2020 Sep 30.

Abstract

Biomedical literature contains rich but unstructured information about proteins, ligands, and diseases, as well as the biological pathways in which they are involved. Systematically analyzing such a textual corpus could lead to the discovery of new protein-protein interactions and hidden drug indications. For this purpose, we have investigated a methodology that uses a well-established text mining tool, Word2Vec, to derive word embeddings from PubMed full-text articles, and then applies a simple semantic similarity comparison, either by itself or in conjunction with the k-Nearest Neighbor (kNN) technique, to predict new relationships. To test this methodology, we conducted three retrospective analyses of a dataset containing known P53-interacting proteins. First, we demonstrated that Word2Vec semantic similarity can infer functional relatedness among all kinases known to interact with P53. Second, in a series of time-split experiments, we demonstrated that both a simple similarity comparison and kNN models built with papers published up to a certain year were able to discover P53 interactors described in later publications. Third, in a different time-split scenario, we examined the P53-interacting proteins predicted by kNN models built on data prior to a given split year over different time ranges after that year, and found that the cumulative number of correct predictions indeed increased with time. We conclude that text mining of PubMed research papers with Word2Vec, followed by a simple similarity comparison or kNN modeling, affords excellent predictions of protein-protein interactions between P53 and kinases, and should have wide applications in translational biomedical studies such as the repurposing of existing drugs, drug-drug interactions, and the elucidation of mechanisms of action for drugs.
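The scoring steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes word vectors have already been trained with a Word2Vec implementation, and it substitutes tiny hand-made 3-dimensional toy vectors. It uses cosine similarity as the semantic similarity measure (the usual choice for Word2Vec embeddings, though the abstract does not name the measure). It also assumes a kNN formulation that votes among labeled kinases. All term names (`kinase_a`, `kinase_x`, ...), labels, years, the value of `k`, and the helper functions `rank_by_similarity`, `knn_score`, and `cumulative_hits` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Embeddings = Dict[str, Vector]

# Hypothetical 3-dimensional toy vectors standing in for trained Word2Vec
# embeddings (real Word2Vec vectors usually have hundreds of dimensions).
TOY_EMBEDDINGS: Embeddings = {
    "p53": [1.0, 0.2, 0.0],
    "kinase_a": [0.9, 0.1, 0.1],     # hypothetical known P53 interactor
    "kinase_b": [0.8, 0.3, 0.0],     # hypothetical known P53 interactor
    "kinase_c": [0.0, 1.0, 0.2],     # hypothetical non-interactor
    "kinase_d": [0.1, 0.9, 0.3],     # hypothetical non-interactor
    "kinase_x": [0.85, 0.2, 0.05],   # unlabeled candidate
    "kinase_y": [0.05, 0.95, 0.25],  # unlabeled candidate
}

# Hypothetical training labels: 1 = known P53 interactor, 0 = not known to interact.
TOY_LABELS: Dict[str, int] = {"kinase_a": 1, "kinase_b": 1, "kinase_c": 0, "kinase_d": 0}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank_by_similarity(emb: Embeddings, query: str, candidates: List[str]) -> List[Tuple[str, float]]:
    """Simple similarity comparison: rank candidates by cosine similarity to the query term."""
    q = emb[query]
    scored = [(c, cosine(emb[c], q)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def knn_score(emb: Embeddings, candidate: str, labels: Dict[str, int], k: int = 3) -> float:
    """Fraction of the candidate's k most similar labeled terms that are known interactors."""
    v = emb[candidate]
    nearest = sorted(labels, key=lambda term: cosine(emb[term], v), reverse=True)[:k]
    return sum(labels[t] for t in nearest) / len(nearest)


def cumulative_hits(predicted: List[str], first_reported: Dict[str, int],
                    split_year: int, horizons: List[int]) -> Dict[int, int]:
    """For each horizon h, count predictions first reported in (split_year, split_year + h]."""
    return {
        h: sum(
            1
            for t in predicted
            if t in first_reported and split_year < first_reported[t] <= split_year + h
        )
        for h in horizons
    }


if __name__ == "__main__":
    for term, sim in rank_by_similarity(TOY_EMBEDDINGS, "p53", ["kinase_x", "kinase_y"]):
        score = knn_score(TOY_EMBEDDINGS, term, TOY_LABELS, k=2)
        print(f"{term}: cosine to p53 = {sim:.3f}, kNN score = {score:.2f}")
```

Ranking by raw similarity mirrors the "simple similarity comparison". `knn_score` layers a neighbor vote over the same embeddings. In a time-split setting, the embeddings and labels would be built only from papers published up to the split year, and `cumulative_hits` would then tally how many predictions are confirmed by later publications within each horizon.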
