Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, Texas 77555, United States.
J Proteome Res. 2011 Sep 2;10(9):4150-7. doi: 10.1021/pr2003177. Epub 2011 Aug 9.
This work describes the mass distribution of all theoretically possible tryptic peptides composed of the 20 amino acids, up to a mass of 3 kDa, at a resolution of 0.001 Da. We characterize the regions between the peaks of the distribution, including gaps (forbidden zones) and sparsely populated areas (quiet zones). We show how the gaps shrink as mass increases and at what point they disappear completely. We demonstrate that peptide compositions in quiet zones are less diverse than those in the peaks of the distribution, and that eliminating certain types of unrealistic compositions can widen the gaps in the distribution. The mass distribution is generated using a parallel implementation of a recursive procedure that enumerates all amino acid compositions. This implementation enumerates all compositions of tryptic peptides below 3 kDa in 48 min on a computer cluster with 12 Intel Xeon X5650 CPUs (72 cores). The results of this work can be used to facilitate protein identification and mass defect labeling in mass spectrometry-based proteomics experiments.
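The recursive enumeration and 0.001 Da binning described above can be illustrated with a minimal serial sketch. It is not the authors' parallel implementation, and it rests on several assumptions: a tryptic peptide is taken to contain exactly one K or R (the C-terminal residue), masses are standard monoisotopic residue masses rounded to 5 decimals and stored as integers in units of 10^-5 Da to avoid floating-point drift, and all function names are hypothetical.

```python
from collections import Counter

# Monoisotopic residue masses in units of 1e-5 Da (integers keep binning exact).
RESIDUE_UNITS = {
    "G": 5702146, "A": 7103711, "S": 8703203, "P": 9705276, "V": 9906841,
    "T": 10104768, "C": 10300919, "L": 11308406, "I": 11308406, "N": 11404293,
    "D": 11502694, "Q": 12805858, "K": 12809496, "E": 12904259, "M": 13104049,
    "H": 13705891, "F": 14706841, "R": 15610111, "Y": 16306333, "W": 18607931,
}
WATER_UNITS = 1801056       # H2O added once per peptide
UNITS_PER_DA = 100_000
UNITS_PER_BIN = 100         # 100 * 1e-5 Da = 0.001 Da bins


def enumerate_compositions(residues, max_units, start=0, mass=0, counts=None):
    """Recursively yield (counts, mass) for every multiset of `residues`
    whose total residue mass does not exceed `max_units`."""
    if counts is None:
        counts = [0] * len(residues)
    yield tuple(counts), mass
    # Only add residues at index >= start, so each multiset appears once.
    for i in range(start, len(residues)):
        new_mass = mass + RESIDUE_UNITS[residues[i]]
        if new_mass <= max_units:
            counts[i] += 1
            yield from enumerate_compositions(residues, max_units, i, new_mass, counts)
            counts[i] -= 1


def tryptic_mass_histogram(max_mass_da):
    """Count tryptic compositions per 0.001 Da bin, up to `max_mass_da`.
    Assumption: exactly one K or R per peptide (no missed cleavages)."""
    max_units = round(max_mass_da * UNITS_PER_DA)
    inner = [r for r in RESIDUE_UNITS if r not in "KR"]
    hist = Counter()
    for c_term in "KR":
        fixed = RESIDUE_UNITS[c_term] + WATER_UNITS
        budget = max_units - fixed
        if budget < 0:
            continue
        for _, inner_mass in enumerate_compositions(inner, budget):
            hist[(inner_mass + fixed) // UNITS_PER_BIN] += 1
    return hist


def empty_runs(hist):
    """Return (first_empty_bin, last_empty_bin) for each run of empty bins
    between consecutive occupied bins."""
    occupied = sorted(hist)
    return [(a + 1, b - 1) for a, b in zip(occupied, occupied[1:]) if b - a > 1]


if __name__ == "__main__":
    hist = tryptic_mass_histogram(500.0)
    print(len(hist), "occupied bins;", sum(hist.values()), "compositions")
    print("widest empty run (bins):", max(b - a + 1 for a, b in empty_runs(hist)))
```

The 500 Da limit in the usage lines is chosen only so that the sketch finishes quickly on a single core; the number of compositions grows steeply with the mass limit, which is why the full 3 kDa enumeration reported above required a cluster.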