
Super.Complex: A supervised machine learning pipeline for molecular complex detection in protein-interaction networks.

Affiliations

Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin, Austin, Texas, United States of America.

Department of Molecular Biosciences, Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, Texas, United States of America.

Publication information

PLoS One. 2021 Dec 31;16(12):e0262056. doi: 10.1371/journal.pone.0262056. eCollection 2021.

DOI: 10.1371/journal.pone.0262056
PMID: 34972161
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8719692/
Abstract

Characterization of protein complexes, i.e. sets of proteins assembling into a single larger physical entity, is important, as such assemblies play many essential roles in cells, such as gene regulation. From networks of protein-protein interactions, potential protein complexes can be identified computationally by applying community detection methods, which flag groups of entities interacting with each other in certain patterns. Most community detection algorithms are unsupervised and assume that communities are dense network subgraphs, which is not always true, as protein complexes can exhibit diverse network topologies. The few existing supervised machine learning methods are serial, and their accuracy and scalability can potentially be improved by using better-suited machine learning models and parallel algorithms. Here, we present Super.Complex, a distributed, supervised AutoML-based pipeline for overlapping community detection in weighted networks. We also propose three new evaluation measures to address the outstanding issue of satisfactorily comparing sets of learned and known communities. Super.Complex learns a community fitness function from known communities using an AutoML method and applies this fitness function to detect new communities. A heuristic local search algorithm finds maximally scoring communities, and a parallel implementation can be run on a computer cluster for scaling to large networks. On a yeast protein-interaction network, Super.Complex outperforms 6 other supervised and 4 unsupervised methods. Application of Super.Complex to a human protein-interaction network with ~8k nodes and ~60k edges yields 1,028 protein complexes, of which 234 are linked to SARS-CoV-2, the virus that causes COVID-19, and 103 contain a total of 111 uncharacterized proteins. Super.Complex is generalizable, and its results can be improved by incorporating domain-specific features. Learned community characteristics can also be transferred from existing applications to detect communities in a new application with no known communities. Code and interactive visualizations of learned human protein complexes are freely available at: https://sites.google.com/view/supercomplex/super-complex-v3-0.
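
The idea of scoring candidate communities with a fitness function and growing them by heuristic local search can be illustrated with a minimal Python sketch. This is not the Super.Complex implementation: the function names, the toy network, and especially `toy_fitness` are hypothetical. In the pipeline described above the fitness function is learned from known communities with AutoML, whereas here a hand-written score (weighted density times a mild size bonus) stands in for it.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected weighted graph as a dict of dicts."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def toy_fitness(graph, nodes):
    """Hypothetical stand-in for a learned community fitness function.

    Weighted density (mean edge weight over all node pairs) multiplied by
    a mild size bonus (1 - 1/n), so larger cohesive groups score higher.
    """
    n = len(nodes)
    if n < 2:
        return 0.0
    total = sum(graph[u].get(v, 0.0) for u, v in combinations(sorted(nodes), 2))
    density = total / (n * (n - 1) / 2)
    return density * (1 - 1 / n)


def grow_community(graph, seed, fitness, max_size=20):
    """Greedily grow a community from a seed node while the score improves."""
    if not graph[seed]:
        return frozenset([seed])
    # Start from the seed and its heaviest neighbor (ties broken by name).
    first = max(sorted(graph[seed]), key=lambda v: graph[seed][v])
    community = {seed, first}
    score = fitness(graph, community)
    while len(community) < max_size:
        # Candidates are all neighbors of the current community.
        candidates = {v for u in community for v in graph[u]} - community
        if not candidates:
            break
        best = max(sorted(candidates),
                   key=lambda v: fitness(graph, community | {v}))
        best_score = fitness(graph, community | {best})
        if best_score <= score:
            break  # local maximum: no single addition improves the score
        community.add(best)
        score = best_score
    return frozenset(community)


def detect_communities(graph, fitness=toy_fitness):
    """Grow one community per seed node and return the distinct results."""
    found = {grow_community(graph, seed, fitness) for seed in graph}
    return sorted(sorted(c) for c in found)


# Toy network (assumed data): two tight triangles joined by one weak edge.
edges = [
    ("A", "B", 0.9), ("A", "C", 0.8), ("B", "C", 0.9),
    ("C", "D", 0.2),
    ("D", "E", 0.9), ("D", "F", 0.8), ("E", "F", 0.9),
]
graph = build_graph(edges)
communities = detect_communities(graph)
print(communities)
```

On this toy network the sketch recovers the two triangles and leaves out the weak bridging edge. Each seed is grown independently of the others, so the loop in `detect_communities` is the kind of step that could be distributed across workers. Because every seed grows its own community, the results are also free to overlap.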



