A semantic similarity based methodology for predicting protein-protein interactions: Evaluation with P53-interacting kinases.

Author affiliations

Renaissance Computing Institute (RENCI), University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.

The Laboratory for Molecular Informatics and Data Sciences, Department of Pharmaceutical Sciences and the BRITE Institute, College of Health and Sciences, North Carolina Central University, Durham, NC 27707, USA.

Publication information

J Biomed Inform. 2020 Nov;111:103579. doi: 10.1016/j.jbi.2020.103579. Epub 2020 Sep 30.

DOI: 10.1016/j.jbi.2020.103579
PMID: 33007449

Abstract

Biomedical literature contains unstructured, rich information regarding proteins, ligands, diseases as well as biological pathways in which they are involved. Systematically analyzing such textual corpus has the potential for biomedical discovery of new protein-protein interactions and hidden drug indications. For this purpose, we have investigated a methodology that is based on a well-established text mining tool, Word2Vec, for the analysis of PubMed full text articles to derive word embeddings, and the use of a simple semantic similarity comparison either by itself or in conjunction with k-Nearest Neighbor (kNN) technique for the prediction of new relationships. To test this methodology, three lines of retrospective analyses of a dataset with known P53-interacting proteins have been conducted. First, we demonstrated that Word2Vec semantic similarity can infer functional relatedness among all kinases known to interact with P53. Second, in a series of time-split experiments, we demonstrated that both a simple similarity comparison and kNN models built with papers published up to a certain year were able to discover P53 interactors described in later publications. Third, in a different scenario of time-split experiments, we examined the predictions of P53-interacting proteins based on the kNN models built on data prior to a certain split year for different time ranges past that year, and found that the cumulative number of correct predictions was indeed increasing with time. We conclude that text mining of research papers in the PubMed literature based on Word2Vec analysis followed by a simple similarity comparison or kNN modeling affords excellent predictions of protein-protein interactions between P53 and kinases, and should have wide applications in translational biomedical studies such as repurposing of existing drugs, drug-drug interaction, and elucidation of mechanisms of action for drugs.
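
The two prediction strategies named in the abstract, a plain similarity comparison and a kNN model over word embeddings, can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the three-dimensional vectors are made-up stand-ins for Word2Vec embeddings, the protein names (`kinase_a`, `kinase_x`, and so on) are hypothetical, and cosine similarity with majority voting is one common choice rather than the authors' confirmed configuration. The abstract does not state the corpus details, embedding dimensions, or the value of k.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Made-up 3-d vectors standing in for Word2Vec embeddings (hypothetical data).
EMBEDDINGS = {
    "p53":      [0.90, 0.10, 0.00],
    "kinase_a": [0.80, 0.20, 0.10],
    "kinase_b": [0.70, 0.30, 0.00],
    "kinase_c": [0.10, 0.90, 0.20],
    "kinase_d": [0.00, 0.80, 0.60],
    "kinase_x": [0.85, 0.15, 0.05],  # unlabeled candidate
    "kinase_y": [0.05, 0.85, 0.40],  # unlabeled candidate
}

# Hypothetical training labels: 1 = known P53 interactor, 0 = not known.
LABELS = {"kinase_a": 1, "kinase_b": 1, "kinase_c": 0, "kinase_d": 0}

def rank_by_similarity(target, candidates, emb):
    """Strategy 1: order candidates by cosine similarity to the target term."""
    return sorted(candidates, key=lambda n: cosine(emb[target], emb[n]), reverse=True)

def knn_predict(query, labels, emb, k=3):
    """Strategy 2: majority vote among the k labeled terms nearest to the query."""
    neighbors = sorted(labels, key=lambda n: cosine(emb[query], emb[n]), reverse=True)[:k]
    votes = sum(labels[n] for n in neighbors)
    return 1 if votes * 2 > k else 0

if __name__ == "__main__":
    candidates = [name for name in EMBEDDINGS if name != "p53"]
    print(rank_by_similarity("p53", candidates, EMBEDDINGS))
    print(knn_predict("kinase_x", LABELS, EMBEDDINGS))
    print(knn_predict("kinase_y", LABELS, EMBEDDINGS))
```

In a time-split setting like the one the abstract describes, the embeddings and labels would come only from papers published up to the split year, and the predictions would then be checked against interactions reported later.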

Similar articles

1
A semantic similarity based methodology for predicting protein-protein interactions: Evaluation with P53-interacting kinases.
J Biomed Inform. 2020 Nov;111:103579. doi: 10.1016/j.jbi.2020.103579. Epub 2020 Sep 30.
2
Semantic relatedness and similarity of biomedical terms: examining the effects of recency, size, and section of biomedical publications on the performance of word2vec.
BMC Med Inform Decis Mak. 2017 Jul 3;17(1):95. doi: 10.1186/s12911-017-0498-1.
3
Text mining-based word representations for biomedical data analysis and protein-protein interaction networks in machine learning tasks.
PLoS One. 2021 Oct 15;16(10):e0258623. doi: 10.1371/journal.pone.0258623. eCollection 2021.
4
Unsupervised low-dimensional vector representations for words, phrases and text that are transparent, scalable, and produce similarity metrics that are not redundant with neural embeddings.
J Biomed Inform. 2019 Feb;90:103096. doi: 10.1016/j.jbi.2019.103096. Epub 2019 Jan 14.
5
Corpus domain effects on distributional semantic modeling of medical terms.
Bioinformatics. 2016 Dec 1;32(23):3635-3644. doi: 10.1093/bioinformatics/btw529. Epub 2016 Aug 16.
6
A comparison of word embeddings for the biomedical natural language processing.
J Biomed Inform. 2018 Nov;87:12-20. doi: 10.1016/j.jbi.2018.09.008. Epub 2018 Sep 12.
7
BioConceptVec: Creating and evaluating literature-based biomedical concept embeddings on a large scale.
PLoS Comput Biol. 2020 Apr 23;16(4):e1007617. doi: 10.1371/journal.pcbi.1007617. eCollection 2020 Apr.
8
Large scale biomedical texts classification: a kNN and an ESA-based approaches.
J Biomed Semantics. 2016 Jun 16;7:40. doi: 10.1186/s13326-016-0073-1.
9
In the pursuit of a semantic similarity metric based on UMLS annotations for articles in PubMed Central Open Access.
J Biomed Inform. 2015 Oct;57:204-18. doi: 10.1016/j.jbi.2015.07.015. Epub 2015 Aug 1.
10
Literature-Wide Association Studies (LWAS) for a Rare Disease: Drug Repurposing for Inflammatory Breast Cancer.
Molecules. 2020 Aug 28;25(17):3933. doi: 10.3390/molecules25173933.

Cited by

1
Pre-trained protein language model sheds new light on the prediction of Arabidopsis protein-protein interactions.
Plant Methods. 2023 Dec 7;19(1):141. doi: 10.1186/s13007-023-01119-6.