Minnesota Supercomputing Institute, Minneapolis, MN, USA.
Proteomics. 2013 Apr;13(8):1352-7. doi: 10.1002/pmic.201200352. Epub 2013 Mar 15.
Large databases (>10^6 sequences) used in metaproteomic and proteogenomic studies present challenges in matching peptide sequences to MS/MS data using database-search programs. Most notably, strict filtering to avoid false-positive matches leads to more false negatives, thus constraining the number of peptide matches. To address this challenge, we developed a two-step method wherein matches derived from a primary search against a large database were used to create a smaller subset database. The second search was performed against a target-decoy version of this subset database merged with a host database. High-confidence peptide sequence matches were then used to infer protein identities. Applying our two-step method to both metaproteomic and proteogenomic analysis resulted in twice the number of high-confidence peptide sequence matches in each case, as compared to the conventional one-step method. The two-step method captured almost all of the same peptides matched by the one-step method, with a majority of the additional matches being false negatives from the one-step method. Furthermore, the two-step method improved results regardless of the database search program used. Our results show that our two-step method maximizes peptide matching sensitivity for applications requiring large databases, which is especially valuable for proteogenomic and metaproteomic studies.
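The database-construction step of the two-step method can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `REV_` decoy prefix, and the use of reversed sequences as decoys are assumptions made for illustration, and the sketch also assumes that decoys are generated for the merged subset-plus-host targets, which the abstract does not specify. The primary search itself is assumed to have been run by an external database-search program that reports the accessions of matched proteins.

```python
from typing import Dict, Set


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {accession: sequence}.

    The accession is taken to be the first whitespace-delimited token
    after '>' (an assumption; real headers vary by database).
    """
    records: Dict[str, str] = {}
    accession = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            accession = tokens[0] if tokens else ""
            records[accession] = ""
        elif accession is not None:
            # Sequence lines may be wrapped; concatenate them.
            records[accession] += line
    return records


def build_second_step_db(
    large_db: Dict[str, str],
    matched_accessions: Set[str],
    host_db: Dict[str, str],
    decoy_prefix: str = "REV_",  # hypothetical decoy label
) -> Dict[str, str]:
    """Build the database for the second search.

    1. Keep only large-database entries matched in the primary search.
    2. Merge that subset with the host database.
    3. Append one decoy per target (reversed sequence, an assumed
       decoy strategy) to form a target-decoy database.
    """
    subset = {acc: seq for acc, seq in large_db.items() if acc in matched_accessions}
    targets = {**host_db, **subset}
    decoys = {decoy_prefix + acc: seq[::-1] for acc, seq in targets.items()}
    return {**targets, **decoys}


def to_fasta(db: Dict[str, str]) -> str:
    """Serialize {accession: sequence} back to FASTA text."""
    return "".join(f">{acc}\n{seq}\n" for acc, seq in db.items())
```

In use, `matched_accessions` would be collected from the primary search output, and the text returned by `to_fasta(build_second_step_db(...))` would be written to a file and passed to the same search program for the second search.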