PubMed-Scale Chemical Concept Embeddings Reconstruct Physical Protein Interaction Networks.

Authors

Škrlj Blaž, Kokalj Enja, Lavrač Nada

Affiliations

Jožef Stefan International Postgraduate School, Ljubljana, Slovenia.

Jožef Stefan Institute, Ljubljana, Slovenia.

Publication information

Front Res Metr Anal. 2021 Apr 13;6:644614. doi: 10.3389/frma.2021.644614. eCollection 2021.

Abstract

PubMed is the largest resource of curated biomedical knowledge to date, comprising more than 25 million documents. The volume of new literature prevents any single expert from keeping track of all potentially relevant papers, which results in knowledge gaps. In this article, we present CHEMMESHNET, a newly developed PubMed-based network comprising more than 10,000,000 associations, constructed from expert-curated MeSH annotations of chemicals across all currently available PubMed articles. By learning latent representations of the concepts in this network, we demonstrate in a proof-of-concept study that purely literature-based representations are sufficient to reconstruct a large part of the currently known network of physical, empirically determined protein-protein interactions. Simple linear embeddings of node pairs, when coupled with a neural network-based classifier, reliably reconstruct the existing collection of empirically confirmed protein-protein interactions. Furthermore, we demonstrate how pairs of learned representations can be used to prioritize potentially interesting novel interactions based on their common chemical context. Highly ranked interactions are qualitatively inspected for potential complex formation at the structural level and represent potentially interesting new knowledge. Finally, we show that two protein-protein interactions prioritized by structure-based approaches also emerge as probable according to the trained machine-learning model.
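The idea of combining two node embeddings into a single pair embedding and then ranking candidate pairs can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the protein names, the three-dimensional vectors, the choice of the element-wise mean as the linear operator, and the logistic scorer, which merely stands in for the trained neural network-based classifier. The abstract does not specify any of these details.

```python
import math
from itertools import combinations

# Hypothetical toy node embeddings (not taken from CHEMMESHNET).
node_embeddings = {
    "P1": [1.0, 0.0, 2.0],
    "P2": [3.0, 2.0, 0.0],
    "P3": [1.0, 0.5, 1.5],
}


def pair_embedding(u, v, mode="mean"):
    """Combine two node vectors into one pair vector with a linear operator."""
    a, b = node_embeddings[u], node_embeddings[v]
    if mode == "mean":
        return [(x + y) / 2.0 for x, y in zip(a, b)]
    if mode == "sum":
        return [x + y for x, y in zip(a, b)]
    raise ValueError(f"unknown mode: {mode}")


def score(pair_vec, weights, bias):
    """Logistic score in (0, 1); a stand-in for a trained classifier."""
    z = sum(w * x for w, x in zip(weights, pair_vec)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def rank_pairs(candidates, weights, bias):
    """Return (score, u, v) tuples sorted from most to least probable."""
    scored = [(score(pair_embedding(u, v), weights, bias), u, v)
              for u, v in candidates]
    return sorted(scored, reverse=True)


# Hypothetical parameters; in practice these would be learned from
# known interacting and non-interacting pairs.
weights, bias = [0.5, -1.0, 0.25], 0.0
candidates = list(combinations(sorted(node_embeddings), 2))
ranked = rank_pairs(candidates, weights, bias)

for s, u, v in ranked:
    print(f"{u}-{v}: {s:.3f}")
```

In a realistic setting the scorer would be fitted on pair embeddings of known interactions, and the ranked list of previously unseen pairs would then be the set of candidates passed on for qualitative inspection.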
