A semi-supervised Bayesian approach for simultaneous protein sub-cellular localisation assignment and novelty detection.

Affiliations

Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Cambridge, UK.

MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Cambridge, UK.

Publication information

PLoS Comput Biol. 2020 Nov 9;16(11):e1008288. doi: 10.1371/journal.pcbi.1008288. eCollection 2020 Nov.

Abstract

The cell is compartmentalised into complex micro-environments allowing an array of specialised biological processes to be carried out in synchrony. Determining a protein's sub-cellular localisation to one or more of these compartments can therefore be a first step in determining its function. High-throughput and high-accuracy mass spectrometry-based sub-cellular proteomic methods can now shed light on the localisation of thousands of proteins at once. Machine learning algorithms are then typically employed to make protein-organelle assignments. However, these algorithms are limited by insufficient and incomplete annotation. We propose a semi-supervised Bayesian approach to novelty detection, allowing the discovery of additional, previously unannotated sub-cellular niches. Inference in our model is performed in a Bayesian framework, allowing us to quantify uncertainty in the allocation of proteins to new sub-cellular niches, as well as in the number of newly discovered compartments. We apply our approach across 10 mass spectrometry based spatial proteomic datasets, representing a diverse range of experimental protocols. Application of our approach to hyperLOPIT datasets validates its utility by recovering enrichment with chromatin-associated proteins without annotation and uncovers sub-nuclear compartmentalisation which was not identified in the original analysis. Moreover, using sub-cellular proteomics data from Saccharomyces cerevisiae, we uncover a novel group of proteins trafficking from the ER to the early Golgi apparatus. Overall, we demonstrate the potential for novelty detection to yield biologically relevant niches that are missed by current approaches.
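The abstract states that Bayesian inference lets the authors quantify uncertainty both in each protein's allocation to a new niche and in the number of newly discovered compartments. The sketch below illustrates one way such summaries could be computed from posterior samples. It is not the authors' implementation: the function name `summarise_allocations`, the input layout (one list of component indices per sampler draw), and the convention that indices at or above `n_known` denote candidate novel components are all assumptions made for illustration, and the draws are toy values rather than real data.

```python
from collections import Counter


def summarise_allocations(samples, n_known):
    """Summarise hypothetical posterior draws of component allocations.

    samples: list of draws; each draw is a list with one component index per protein.
    n_known: number of annotated components (indices 0 .. n_known - 1).
             Any index >= n_known is treated as a candidate novel component
             (an assumption of this sketch).
    """
    n_draws = len(samples)
    n_proteins = len(samples[0])

    # Fraction of draws in which each protein is allocated to any novel component.
    novel_prob = [
        sum(draw[i] >= n_known for draw in samples) / n_draws
        for i in range(n_proteins)
    ]

    # Distribution, across draws, of the number of occupied novel components.
    counts = Counter(
        len({k for k in draw if k >= n_known}) for draw in samples
    )
    n_novel_dist = {k: c / n_draws for k, c in sorted(counts.items())}

    return novel_prob, n_novel_dist


# Toy draws for four proteins, with two annotated components (0 and 1).
draws = [
    [0, 1, 2, 2],
    [0, 1, 2, 3],
    [0, 1, 1, 2],
    [0, 0, 2, 2],
]
novel_prob, n_novel_dist = summarise_allocations(draws, n_known=2)
print(novel_prob)
print(n_novel_dist)
```

Both summaries depend only on whether an index is novel and on how many distinct novel indices are occupied, so they are unaffected if the sampler permutes the labels of the novel components between draws.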

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d24/7707549/f343021e79b7/pcbi.1008288.g001.jpg
