Nano Random Forests to mine protein complexes and their relationships in quantitative proteomics data.

Author information

Montaño-Gutierrez Luis F, Ohta Shinya, Kustatscher Georg, Earnshaw William C, Rappsilber Juri

Affiliations

Wellcome Trust Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3BF, United Kingdom.

Center for Innovative and Translational Medicine, Medical School, Kochi University, Kochi 783-8505, Japan.

Publication information

Mol Biol Cell. 2017 Mar 1;28(5):673-680. doi: 10.1091/mbc.E16-06-0370. Epub 2017 Jan 5.

Abstract

Ever-increasing numbers of quantitative proteomics data sets constitute an underexploited resource for investigating protein function. Multiprotein complexes often follow consistent trends in these experiments, which could provide insights about their biology. Yet, as more experiments are considered, a complex's signature may become conditional and less identifiable. Previously we successfully distinguished the general proteomic signature of genuine chromosomal proteins from hitchhikers using the Random Forests (RF) machine learning algorithm. Here we test whether small protein complexes can define distinguishable signatures of their own, despite the assumption that machine learning needs large training sets. We show, with simulated and real proteomics data, that RF can detect small protein complexes and relationships between them. We identify several complexes in quantitative proteomics results of wild-type and knockout mitotic chromosomes. Other proteins covary strongly with these complexes, suggesting novel functional links for later study. Integrating the RF analysis for several complexes reveals known interdependences among kinetochore subunits and a novel dependence between the inner kinetochore and condensin. Ribosomal proteins, although identified, remained independent of kinetochore subcomplexes. Together these results show that this complex-oriented RF (NanoRF) approach can integrate proteomics data to uncover subtle protein relationships. Our NanoRF pipeline is available online.
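
The abstract does not spell out how the complex-oriented classifier is trained. As a rough illustration only, the sketch below assumes the idea works like this. Each protein is described by a profile of abundance ratios across experiments. Known members of one small complex are labelled positive, and a random sample of other proteins is labelled negative. A random forest (bootstrap-sampled trees, a random subset of sqrt(n_features) candidate features per split, Gini impurity) is then trained on these labels, and any protein is scored by the mean leaf value across trees, i.e. the average fraction of complex members in the leaves it reaches. The function names (`fit_complex_forest`, `complex_score`, etc.), the 3:1 negative-to-positive ratio, tree depth and tree count are hypothetical choices, not taken from the published NanoRF pipeline. Standard-library Python is used so the sketch stays self-contained.

```python
import random
from statistics import mean


def gini(labels):
    """Gini impurity of a non-empty list of 0/1 labels (0 = pure)."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(X, y, features):
    """Return the (feature, threshold) among `features` with the lowest weighted Gini,
    or None if no split improves on the parent node."""
    best, best_score, n = None, gini(y), len(y)
    for f in features:
        vals = sorted({row[f] for row in X})
        for a, b in zip(vals, vals[1:]):
            t = (a + b) / 2  # midpoint between consecutive observed values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, depth, n_feats, rng):
    """Grow a tree; each leaf stores the fraction of complex members it holds."""
    if depth == 0 or len(set(y)) == 1:
        return mean(y)
    feats = rng.sample(range(len(X[0])), n_feats)  # random feature subset per split
    split = best_split(X, y, feats)
    if split is None:
        return mean(y)
    f, t = split
    go_left = [row[f] <= t for row in X]
    children = []
    for side in (True, False):
        Xs = [row for row, g in zip(X, go_left) if g == side]
        ys = [lab for lab, g in zip(y, go_left) if g == side]
        children.append(build_tree(Xs, ys, depth - 1, n_feats, rng))
    return (f, t, children[0], children[1])


def predict_tree(node, row):
    """Follow splits until a leaf (a number between 0 and 1) is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def train_forest(X, y, n_trees=50, depth=3, seed=0):
    """Bagged trees with sqrt(n_features) candidate features per split."""
    rng = random.Random(seed)
    n_feats = max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample (with replacement)
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 depth, n_feats, rng))
    return forest


def fit_complex_forest(profiles, members, neg_ratio=3, seed=0):
    """Hypothetical complex-oriented training: complex members (label 1) versus a
    random sample of other proteins (label 0)."""
    rng = random.Random(seed)
    others = [p for p in profiles if p not in members]
    negatives = rng.sample(others, min(len(others), neg_ratio * len(members)))
    X = [profiles[p] for p in list(members) + negatives]
    y = [1] * len(members) + [0] * len(negatives)
    return train_forest(X, y, seed=seed)


def complex_score(forest, row):
    """Mean leaf value across trees: near 1 resembles the complex, near 0 does not."""
    return mean(predict_tree(tree, row) for tree in forest)
```

In this sketch, `fit_complex_forest` would be called once per complex and `complex_score` applied to every other protein; high-scoring non-members would be the kind of covarying candidates the abstract mentions. Scores for the training members themselves are in-sample here, so a more careful analysis would rely on out-of-bag or cross-validated scores instead.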
