Protein Inference from the Integration of Tandem MS Data and Interactome Networks.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2017 Nov-Dec;14(6):1399-1409. doi: 10.1109/TCBB.2016.2601618. Epub 2016 Aug 24.

Abstract

Since proteins are digested into a mixture of peptides in the preprocessing step of tandem mass spectrometry (MS), it is difficult to determine which specific protein a shared peptide belongs to. Recent studies exploit additional information, beyond tandem MS data and peptide identification information, to infer proteins. Unlike methods that first infer proteins from tandem MS data alone and then refine the result with network information, this study proposes a protein inference method named TMSIN, which uses interactome networks directly. As two interacting proteins should co-exist, it is reasonable to assume that if one of the interacting proteins is confidently inferred in a sample, its interacting partners should also have a high probability of being present in the same sample. Therefore, the neighborhood information of a protein in an interactome network can be used to adjust the probability that a shared peptide belongs to that protein. In TMSIN, a multi-weighted graph is constructed by combining a bipartite graph with interactome network information, where the bipartite graph is built from the peptide identification information. Based on this multi-weighted graph, TMSIN adopts an iterative workflow to infer proteins. At each iteration, the probability that a shared peptide belongs to a specific protein is calculated using Bayes' law, based on the neighbor protein support scores of each protein to which the shared peptide maps. We carried out experiments on yeast data and human data to evaluate the performance of TMSIN in terms of ROC, q-value, and accuracy. The experimental results show that TMSIN yields AUC scores of 0.742 and 0.874 on the yeast and human datasets, respectively, and that TMSIN yields the maximum number of true positives at q-values less than or equal to 0.05. The overlap analysis shows that TMSIN is an effective complementary approach for protein inference.
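
The abstract gives no formulas, so the following Python sketch only illustrates the general idea of an iterative, network-aware reassignment of shared peptides. It is not the published TMSIN algorithm. The function name `infer_proteins`, the mixing weight `alpha`, the noisy-OR protein score, and the uniform-likelihood Bayes update are all assumptions made for illustration.

```python
from collections import defaultdict


def infer_proteins(peptide_probs, peptide_to_proteins, interactions,
                   n_iter=20, alpha=0.5):
    """Toy iterative inference (hypothetical; not the published TMSIN formulas).

    peptide_probs:       {peptide: identification probability}
    peptide_to_proteins: {peptide: list of candidate proteins} (bipartite graph)
    interactions:        iterable of (protein, protein) pairs (interactome)
    alpha:               assumed weight given to neighbor support
    """
    proteins = sorted({p for ps in peptide_to_proteins.values() for p in ps})
    neighbors = defaultdict(set)
    for a, b in interactions:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Start by splitting each peptide evenly among its candidate proteins.
    assign = {pep: {p: 1.0 / len(ps) for p in ps}
              for pep, ps in peptide_to_proteins.items()}
    scores = {}
    for _ in range(n_iter):
        # Protein score (assumed noisy-OR): at least one weighted peptide present.
        for p in proteins:
            absent = 1.0
            for pep, weights in assign.items():
                if p in weights:
                    absent *= 1.0 - peptide_probs[pep] * weights[p]
            scores[p] = 1.0 - absent

        # Support: blend a protein's own score with the mean score of its
        # interacting partners that also appear in the peptide data.
        support = {}
        for p in proteins:
            nbr_scores = [scores[q] for q in neighbors[p] if q in scores]
            nbr_mean = sum(nbr_scores) / len(nbr_scores) if nbr_scores else 0.0
            support[p] = (1.0 - alpha) * scores[p] + alpha * nbr_mean

        # Bayes update with a uniform likelihood: the posterior that a peptide
        # belongs to protein p is proportional to the prior, taken as support[p].
        for pep, ps in peptide_to_proteins.items():
            total = sum(support[p] for p in ps)
            if total > 0:
                assign[pep] = {p: support[p] / total for p in ps}
    return scores, assign


# Made-up example: "shared" maps to P2 and P4; only P2 interacts with P1.
example_scores, example_assign = infer_proteins(
    peptide_probs={"unique1": 0.9, "shared": 0.8},
    peptide_to_proteins={"unique1": ["P1"], "shared": ["P2", "P4"]},
    interactions=[("P1", "P2")],
)
print(example_assign["shared"])
```

In this made-up example, P2 interacts with the confidently identified P1 while P4 has no interaction partners, so the shared peptide's assignment shifts toward P2 over the iterations. That mirrors the co-existence assumption stated in the abstract, but only under the simplified scoring assumed here.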
