PubMed-Scale Chemical Concept Embeddings Reconstruct Physical Protein Interaction Networks.

Authors

Škrlj Blaž, Kokalj Enja, Lavrač Nada

Affiliations

Jožef Stefan International Postgraduate School, Ljubljana, Slovenia.

Jožef Stefan Institute, Ljubljana, Slovenia.

Publication information

Front Res Metr Anal. 2021 Apr 13;6:644614. doi: 10.3389/frma.2021.644614. eCollection 2021.

DOI: 10.3389/frma.2021.644614
PMID: 33928210
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8076635/

Abstract

PubMed is the largest resource of curated biomedical knowledge to date, entailing more than 25 million documents. Large quantities of novel literature prevent a single expert from keeping track of all potentially relevant papers, resulting in knowledge gaps. In this article, we present CHEMMESHNET, a newly developed PubMed-based network comprising more than 10,000,000 associations, constructed from expert-curated MeSH annotations of chemicals based on all currently available PubMed articles. By learning latent representations of concepts in the obtained network, we demonstrate in a proof of concept study that purely literature-based representations are sufficient for the reconstruction of a large part of the currently known network of physical, empirically determined protein-protein interactions. We demonstrate that simple linear embeddings of node pairs, when coupled with a neural network-based classifier, reliably reconstruct the existing collection of empirically confirmed protein-protein interactions. Furthermore, we demonstrate how pairs of learned representations can be used to prioritize potentially interesting novel interactions based on the common chemical context. Highly ranked interactions are qualitatively inspected in terms of potential complex formation at the structural level and represent potentially interesting new knowledge. We demonstrate that two protein-protein interactions, prioritized by structure-based approaches, also emerge as probable with regard to the trained machine-learning model.
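
As a rough illustration of the idea in the abstract (a network built from chemical annotations of articles, then used to prioritize unlinked protein pairs by their common chemical context), the sketch below may help. It is not the authors' pipeline: the abstract describes learned latent representations and a neural network-based classifier, whereas this toy swaps in plain co-annotation counts and cosine similarity so that it runs with the standard library only. The article IDs, concept names, and function names are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical toy input: article ID -> chemical concepts annotated on it.
# Real input would be MeSH chemical annotations of PubMed articles.
ARTICLE_CHEMICALS = {
    "a1": ["ProteinA", "ProteinB", "ATP"],
    "a2": ["ProteinA", "ATP", "Calcium"],
    "a3": ["ProteinC", "ATP", "Calcium"],
    "a4": ["ProteinB", "Glucose"],
    "a5": ["ProteinC", "ProteinD", "Glucose"],
}


def build_network(article_chemicals):
    """Co-annotation network: edge weight = number of articles sharing both concepts."""
    edges = Counter()
    for concepts in article_chemicals.values():
        for u, v in combinations(sorted(set(concepts)), 2):
            edges[(u, v)] += 1
    return edges


def context_vectors(edges):
    """Represent each concept by its weighted neighbours (a sparse context vector)."""
    ctx = defaultdict(Counter)
    for (u, v), weight in edges.items():
        ctx[u][v] += weight
        ctx[v][u] += weight
    return ctx


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_unlinked_pairs(edges, candidates):
    """Score candidate pairs that have no direct edge, highest similarity first."""
    ctx = context_vectors(edges)
    scored = [
        ((u, v), cosine(ctx[u], ctx[v]))
        for u, v in combinations(sorted(candidates), 2)
        if (u, v) not in edges
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


edges = build_network(ARTICLE_CHEMICALS)
proteins = ["ProteinA", "ProteinB", "ProteinC", "ProteinD"]
for pair, score in rank_unlinked_pairs(edges, proteins):
    print(pair, round(score, 3))
```

In this toy setting, pairs that are never annotated on the same article but share many chemical neighbours rise to the top of the ranking. A learned embedding plus a trained classifier, as in the paper, would replace the `cosine` scoring step.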

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1545/8076635/f911879061b8/frma-06-644614-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1545/8076635/61c9f2e1f6ca/frma-06-644614-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1545/8076635/24fcc1fb6147/frma-06-644614-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1545/8076635/c1efdabdce8a/frma-06-644614-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1545/8076635/a8a0a7275e62/frma-06-644614-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1545/8076635/7581645896c0/frma-06-644614-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1545/8076635/21570f83b477/frma-06-644614-g007.jpg
