Identifying protein complexes from heterogeneous biological data.

Affiliation

School of Computer Engineering, Nanyang Technological University, Singapore.

Publication information

Proteins. 2013 Nov;81(11):2023-33. doi: 10.1002/prot.24365. Epub 2013 Aug 23.

DOI:10.1002/prot.24365
PMID:23852772
Abstract

As more diverse biological information about proteins becomes available, integrating heterogeneous data is increasingly useful for many problems in proteomics, such as annotating protein functions and predicting novel protein-protein interactions. In this paper, we present an integrative approach called InteHC (Integrative Hierarchical Clustering) to identify protein complexes from multiple data sources. Although integrating multiple sources can effectively improve the coverage of the currently incomplete protein interactome (the false-negative issue), it can also introduce false-positive interactions that hurt the performance of protein complex prediction. InteHC is designed to address both issues to facilitate accurate protein complex prediction, and it can be summarized in three steps. First, for each individual source/feature, InteHC computes a matrix of affinity scores, where the score for a protein pair indicates its propensity to interact or to belong to the same complex. Second, InteHC computes a final score matrix as the weighted sum of the affinity scores from the individual sources. The weights, which indicate the reliability of the individual sources, are learned by a supervised model (a linear ranking SVM). Finally, a hierarchical clustering algorithm is run on the final score matrix to generate clusters, which are reported as predicted protein complexes. In our experiments, we compared the results obtained by our hierarchical clustering on each individual feature with those predicted by InteHC on the combined matrix. We observed that integration of heterogeneous data significantly benefits the identification of protein complexes. Moreover, a comprehensive comparison demonstrates that InteHC performs much better than 14 state-of-the-art approaches. All the experimental data and results can be downloaded from http://www.ntu.edu.sg/home/zhengjie/data/InteHC.
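
The second and third steps of the abstract (weighted sum of per-source affinity matrices, then hierarchical clustering on the combined matrix) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not state the linkage criterion or the stopping rule, so average linkage and a fixed score threshold are assumptions, as are the function names, the toy matrices, and the weights. In the paper the weights are learned by a linear ranking SVM; here they are simply given.

```python
from itertools import combinations


def combine_scores(matrices, weights):
    """Step 2: weighted sum of per-source affinity matrices."""
    n = len(matrices[0])
    return [[sum(w * m[i][j] for w, m in zip(weights, matrices))
             for j in range(n)]
            for i in range(n)]


def average_linkage(score, a, b):
    """Mean pairwise score between clusters a and b (assumed linkage)."""
    return sum(score[i][j] for i in a for j in b) / (len(a) * len(b))


def cluster(score, threshold):
    """Step 3: agglomerative clustering on the combined score matrix.

    Repeatedly merges the two most similar clusters and stops when the
    best average score falls below `threshold` (assumed stopping rule).
    """
    clusters = [[i] for i in range(len(score))]
    while len(clusters) > 1:
        (x, y), best = max(
            (((x, y), average_linkage(score, clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return [sorted(c) for c in clusters]


# Toy data for four proteins (indices 0..3); values are made up.
ppi = [[0, 1, 0, 0],
       [1, 0, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 1, 0]]
coexpression = [[0.0, 0.8, 0.1, 0.0],
                [0.8, 0.0, 0.2, 0.1],
                [0.1, 0.2, 0.0, 0.9],
                [0.0, 0.1, 0.9, 0.0]]
weights = [0.6, 0.4]  # hypothetical source reliabilities

combined = combine_scores([ppi, coexpression], weights)
complexes = cluster(combined, threshold=0.5)
print(complexes)
```

On this toy input the two sources agree, so proteins 0 and 1 end up in one cluster and proteins 2 and 3 in another. A source with a low learned weight would contribute little to the combined score, which is how the weighting can limit the effect of false-positive interactions from unreliable sources.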


Similar articles

1
Identifying protein complexes from heterogeneous biological data.
Proteins. 2013 Nov;81(11):2023-33. doi: 10.1002/prot.24365. Epub 2013 Aug 23.
2
Protein Complex Detection via Effective Integration of Base Clustering Solutions and Co-Complex Affinity Scores.
IEEE/ACM Trans Comput Biol Bioinform. 2017 May-Jun;14(3):733-739. doi: 10.1109/TCBB.2016.2552176. Epub 2016 Apr 8.
3
Evaluation of clustering algorithms for protein complex and protein interaction network assembly.
J Proteome Res. 2009 Jun;8(6):2944-52. doi: 10.1021/pr900073d.
4
A two-layer integration framework for protein complex detection.
BMC Bioinformatics. 2016 Feb 24;17:100. doi: 10.1186/s12859-016-0939-3.
5
Complex discovery from weighted PPI networks.
Bioinformatics. 2009 Aug 1;25(15):1891-7. doi: 10.1093/bioinformatics/btp311. Epub 2009 May 12.
6
Predicting co-complexed protein pairs from heterogeneous data.
PLoS Comput Biol. 2008 Apr 18;4(4):e1000054. doi: 10.1371/journal.pcbi.1000054.
7
Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure.
Bioinformatics. 2006 Nov 15;22(22):2753-60. doi: 10.1093/bioinformatics/btl475. Epub 2006 Sep 11.
8
A discriminative approach for identifying domain-domain interactions from protein-protein interactions.
Proteins. 2010 Apr;78(5):1243-53. doi: 10.1002/prot.22643.
9
Protein complex prediction based on simultaneous protein interaction network.
Bioinformatics. 2010 Feb 1;26(3):385-91. doi: 10.1093/bioinformatics/btp668. Epub 2009 Dec 4.
10
A degree-distribution based hierarchical agglomerative clustering algorithm for protein complexes identification.
Comput Biol Chem. 2011 Oct 12;35(5):298-307. doi: 10.1016/j.compbiolchem.2011.07.005. Epub 2011 Jul 20.

Cited by

1
DPCT: A Dynamic Method for Detecting Protein Complexes From TAP-Aware Weighted PPI Network.
Front Genet. 2020 Jun 26;11:567. doi: 10.3389/fgene.2020.00567. eCollection 2020.
2
Protein complex detection based on flower pollination mechanism in multi-relation reconstructed dynamic protein networks.
BMC Bioinformatics. 2019 Mar 29;20(Suppl 3):131. doi: 10.1186/s12859-019-2649-0.
3
A multi-network clustering method for detecting protein complexes from multiple heterogeneous networks.
BMC Bioinformatics. 2017 Dec 1;18(Suppl 13):463. doi: 10.1186/s12859-017-1877-4.
4
Protein complex detection based on partially shared multi-view clustering.
BMC Bioinformatics. 2016 Sep 13;17(1):371. doi: 10.1186/s12859-016-1164-9.
5
A two-layer integration framework for protein complex detection.
BMC Bioinformatics. 2016 Feb 24;17:100. doi: 10.1186/s12859-016-0939-3.
6
A density-based clustering approach for identifying overlapping protein complexes with functional preferences.
BMC Bioinformatics. 2015 May 27;16:174. doi: 10.1186/s12859-015-0583-3.
7
A least square method based model for identifying protein complexes in protein-protein interaction network.
Biomed Res Int. 2014;2014:720960. doi: 10.1155/2014/720960. Epub 2014 Oct 23.
8
A novel algorithm for detecting protein complexes with the breadth first search.
Biomed Res Int. 2014;2014:354539. doi: 10.1155/2014/354539. Epub 2014 Apr 10.