Feng Jian, Naiman Daniel Q, Cooper Bret
Department of Applied Mathematics and Statistics, The Johns Hopkins University, Baltimore, Maryland 21218, USA.
Anal Chem. 2007 May 15;79(10):3901-11. doi: 10.1021/ac070202e. Epub 2007 Apr 19.
In shotgun proteomics, tandem mass spectrometry is used to identify peptides derived from proteins. After the peptides are detected, proteins are reassembled via a reference database of protein or gene information. Redundancy and homology between protein records in databases make it challenging to assign peptides to proteins that may or may not be in an experimental sample. Here, a probability model is introduced for determining the likelihood that peptides are correctly assigned to proteins. This model derives consistent probability estimates for assembled proteins. The probability scores make it easier to confidently identify proteins in complex samples and to accurately estimate false-positive rates. The algorithm based on this model is robust in creating protein complements from peptides from bovine protein standards, yeast, Ustilago maydis cell lysates, and Arabidopsis thaliana leaves. It also eliminates the side effects of redundancy and homology from the reference databases by employing a new concept of peptide grouping and by coherently distinguishing distinct peptides from unique records and shared peptides from homologous proteins. The software that runs the algorithm, called PANORAMICS, provides a tool to help analyze the data based on a researcher's knowledge about the sample. The software operates efficiently and quickly compared to other software platforms.
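The abstract does not give the details of the peptide grouping or of the probability model, so the following is only a minimal sketch of the general idea under stated assumptions, not the PANORAMICS algorithm. It groups peptides by the exact set of database records they match, labels each group as distinct (one record) or shared (several records), and combines peptide probabilities with the textbook independence formula 1 - prod(1 - p_i), which stands in here for the paper's model. All function names, peptide sequences, and record identifiers are hypothetical.

```python
from collections import defaultdict
from math import prod


def group_peptides(peptide_to_proteins):
    """Group peptides that match exactly the same set of protein records."""
    groups = defaultdict(list)
    for peptide, proteins in peptide_to_proteins.items():
        groups[frozenset(proteins)].append(peptide)
    return {key: sorted(peps) for key, peps in groups.items()}


def classify_groups(groups):
    """Label a group 'distinct' if it maps to one record, else 'shared'."""
    return {key: ("distinct" if len(key) == 1 else "shared") for key in groups}


def naive_protein_probability(peptide_probs):
    """Illustrative stand-in, NOT the paper's model: probability that at
    least one peptide is correct, assuming independent identifications."""
    return 1.0 - prod(1.0 - p for p in peptide_probs)


# Hypothetical toy data: peptide sequence -> matching database records.
peptide_to_proteins = {
    "PEPTIDEA": {"P1"},
    "PEPTIDEB": {"P1", "P2"},
    "PEPTIDEC": {"P1", "P2"},
    "PEPTIDED": {"P3"},
}

groups = group_peptides(peptide_to_proteins)
labels = classify_groups(groups)

for key, peptides in groups.items():
    print(sorted(key), labels[key], peptides)
```

Keying each group on a `frozenset` of record identifiers means that peptides shared by the same homologous records fall into one group, which is one simple way to keep redundant database entries from being counted as independent evidence.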