Proteomic parsimony through bipartite graph analysis improves accuracy and transparency.

Author information

Zhang Bing, Chambers Matthew C, Tabb David L

Affiliation

Department of Biomedical Informatics, Mass Spectrometry Research Center, Vanderbilt University Medical Center, Nashville, Tennessee 37232-8575, USA.

Publication information

J Proteome Res. 2007 Sep;6(9):3549-57. doi: 10.1021/pr070230d. Epub 2007 Aug 4.

Abstract

Assembling peptides identified from LC-MS/MS spectra into a list of proteins is a critical step in analyzing shotgun proteomics data. As one peptide sequence can be mapped to multiple proteins in a database, naïve protein assembly can substantially overstate the number of proteins found in samples. We model the peptide-protein relationships in a bipartite graph and use efficient graph algorithms to identify protein clusters with shared peptides and to derive the minimal list of proteins. We test the effects of this parsimony analysis approach using MS/MS data sets generated from a defined human protein mixture, a yeast whole cell extract, and a human serum proteome after MARS column depletion. The results demonstrate that the bipartite parsimony technique not only simplifies protein lists but also improves the accuracy of protein identification. We use bipartite graphs for the visualization of the protein assembly results to render the parsimony analysis process transparent to users. Our approach also groups functionally related proteins together and improves the comprehensibility of the results. We have implemented the tool in the IDPicker package. The source code and binaries for this protein assembly pipeline are available under Mozilla Public License at the following URL: http://www.mc.vanderbilt.edu/msrc/bioinformatics/.
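
The assembly steps named in the abstract (a bipartite peptide-protein graph, clusters of proteins that share peptides, and a minimal protein list) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not IDPicker's source code: the helper names (`group_indistinguishable`, `protein_clusters`, `parsimonious_proteins`, `assemble`) and the toy peptide and protein identifiers are hypothetical. The sketch also assumes a greedy set-cover heuristic for the parsimony step, which approximates the minimal list but does not guarantee the smallest possible one.

```python
from collections import defaultdict

def group_indistinguishable(prot_to_pep):
    """Merge proteins that share exactly the same peptide set into one group node."""
    groups = defaultdict(list)
    for prot, peps in prot_to_pep.items():
        groups[frozenset(peps)].append(prot)
    return {"/".join(sorted(members)): set(peps) for peps, members in groups.items()}

def protein_clusters(pep_to_prot, prot_to_pep):
    """Connected components of the bipartite graph, found by depth-first search."""
    seen, clusters = set(), []
    for start in sorted(prot_to_pep):
        if start in seen:
            continue
        seen.add(start)
        stack, cluster = [start], set()
        while stack:
            prot = stack.pop()
            cluster.add(prot)
            # Hop protein -> peptide -> protein to reach every protein sharing a peptide.
            for pep in prot_to_pep[prot]:
                for neighbor in pep_to_prot[pep]:
                    if neighbor not in seen:
                        seen.add(neighbor)
                        stack.append(neighbor)
        clusters.append(cluster)
    return clusters

def parsimonious_proteins(cluster, prot_to_pep):
    """Greedy set cover (assumed heuristic): pick the protein explaining the most unexplained peptides."""
    uncovered = set().union(*(prot_to_pep[p] for p in cluster))
    chosen = []
    while uncovered:
        # sorted() makes ties resolve deterministically (alphabetically first name wins).
        best = max(sorted(cluster), key=lambda p: len(prot_to_pep[p] & uncovered))
        chosen.append(best)
        uncovered -= prot_to_pep[best]
    return chosen

def assemble(edges):
    """edges: iterable of (peptide, protein) pairs from a database search (hypothetical input)."""
    prot_to_pep = defaultdict(set)
    for pep, prot in edges:
        prot_to_pep[prot].add(pep)
    grouped = group_indistinguishable(prot_to_pep)
    # Rebuild the peptide side of the bipartite graph against the merged protein groups.
    pep_to_group = defaultdict(set)
    for group, peps in grouped.items():
        for pep in peps:
            pep_to_group[pep].add(group)
    result = []
    for cluster in protein_clusters(pep_to_group, grouped):
        result.extend(parsimonious_proteins(cluster, grouped))
    return sorted(result)

if __name__ == "__main__":
    edges = [("PEP1", "A"), ("PEP2", "A"), ("PEP2", "B"),
             ("PEP3", "C"), ("PEP3", "D"), ("PEP4", "E")]
    print(assemble(edges))  # ['A', 'C/D', 'E']
```

In this toy input, protein B is dropped because its only peptide is already explained by A, and C and D are reported as a single group because no peptide distinguishes them. Splitting the graph into clusters first keeps each set-cover problem small, since proteins in different clusters never compete for the same peptides.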
