Proteomics and Informatics Services Facility, University of Illinois at Chicago, IL 60612, USA.
BMC Bioinformatics. 2010 Aug 23;11:436. doi: 10.1186/1471-2105-11-436.
High-throughput shotgun proteomics data contain many spectra that either come from non-peptide ions or are of too poor quality to yield highly confident peptide identifications. Some database search programs return no peptide match for these spectra, while others assign them false positive matches. Removing these spectra before the search can improve the database search results and lower computational expense.
A new algorithm has been developed to filter tandem mass spectra of poor quality from shotgun proteomic experiments. The algorithm determines the noise level dynamically and independently for each spectrum in a tandem mass spectrometric data set. Spectra are filtered based on a minimum number of required signal peaks with a signal-to-noise ratio of 2. The algorithm was tested with 23 sample data sets containing 62,117 total spectra.
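The screening rule described above can be illustrated with a minimal sketch. The abstract does not say how the per-spectrum noise level is estimated or how many signal peaks are required, so the median-intensity noise estimate, the function names, and the `min_signal_peaks` default below are assumptions made for illustration only, not the published method. Only the signal-to-noise ratio of 2 comes from the text.

```python
from statistics import median


def noise_level(intensities):
    """Assumed noise estimate: the median peak intensity of this spectrum.

    The published algorithm determines noise dynamically per spectrum, but
    its actual estimator is not given in the abstract; the median is a stand-in.
    """
    return median(intensities)


def count_signal_peaks(intensities, snr=2.0):
    """Count peaks whose intensity is at least `snr` times the noise level."""
    noise = noise_level(intensities)
    return sum(1 for value in intensities if value >= snr * noise)


def passes_screen(intensities, min_signal_peaks=5, snr=2.0):
    """Keep a spectrum only if it has enough signal peaks.

    `min_signal_peaks=5` is a hypothetical threshold, not a value from the paper.
    Intensities are assumed to be positive numbers.
    """
    if not intensities:
        return False  # an empty spectrum cannot contain signal peaks
    return count_signal_peaks(intensities, snr) >= min_signal_peaks


# Toy spectra (peak intensities only; m/z values are not needed for the rule).
peptide_like = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 10, 12, 15, 20, 30]
noise_like = [5, 6, 5, 6, 5, 6, 5, 6]

kept = [s for s in (peptide_like, noise_like) if passes_screen(s)]
```

Because the noise level is recomputed for each spectrum, a uniformly intense but featureless spectrum such as `noise_like` is rejected even though its raw intensities are higher than the baseline of `peptide_like`.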
The spectral screening removed 89.0% of the tandem mass spectra that did not yield a peptide match when searched with the MassMatrix database search software. Only 6.0% of the tandem mass spectra that yielded peptide matches considered to be true positives were lost after spectral screening. The algorithm was also very effective at removing spectra left unidentified by other database search programs, including Mascot, OMSSA, and X!Tandem (75.93%-91.00% removed), with a small loss (3.59%-9.40%) of true positive matches.