Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37996, USA.
Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA.
Bioinformatics. 2018 Mar 1;34(5):795-802. doi: 10.1093/bioinformatics/btx601.
Complex microbial communities can be characterized by metagenomics and metaproteomics. However, metagenome assemblies often generate enormous yet incomplete protein databases, which undermine the identification of peptides and proteins in metaproteomics. This challenge calls for database searching and filtering algorithms that better discriminate true identifications from false ones.
Sipros Ensemble was developed here for metaproteomics using an ensemble approach. Three diverse scoring functions from MyriMatch, Comet and the original Sipros were incorporated within a single database searching engine. Supervised classification with logistic regression was used to filter database searching results. Benchmarking with soil and marine microbial communities demonstrated a higher number of peptide and protein identifications by Sipros Ensemble than by MyriMatch/Percolator, Comet/Percolator, MS-GF+/Percolator, Comet & MyriMatch/iProphet and Comet & MyriMatch & MS-GF+/iProphet. Sipros Ensemble was computationally efficient and scalable on supercomputers.
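The ensemble-plus-filtering idea described above can be illustrated with a minimal sketch. It is not the Sipros Ensemble implementation: the score names, the synthetic data, the training labels (targets as positives, decoys as negatives) and the decoy/target FDR estimate are all assumptions made for illustration. Each peptide-spectrum match (PSM) carries three scores, a logistic regression combines them into one probability, and PSMs are then accepted down the ranked list while the estimated false discovery rate stays under a threshold.

```python
import math
import random

# Hypothetical feature names standing in for the three scoring functions.
SCORE_KEYS = ("myrimatch_score", "comet_score", "sipros_score")


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_psms(n=300, seed=0):
    """Synthetic PSMs (assumption): correct targets, incorrect targets, decoys."""
    rng = random.Random(seed)
    psms = []
    for is_decoy, mean in ((False, 4.0), (False, 0.0), (True, 0.0)):
        for _ in range(n):
            psm = {key: rng.gauss(mean, 1.0) for key in SCORE_KEYS}
            psm["is_decoy"] = is_decoy
            psms.append(psm)
    return psms


def fit_logistic(psms, lr=0.05, epochs=200):
    """Fit logistic regression by batch gradient descent (target=1, decoy=0)."""
    rows = [([p[k] for k in SCORE_KEYS], 0.0 if p["is_decoy"] else 1.0)
            for p in psms]
    weights = [0.0] * len(SCORE_KEYS)
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in rows:
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def match_probability(psm, weights, bias):
    """Combine the three scores of one PSM into a single probability."""
    return sigmoid(sum(w * psm[k] for w, k in zip(weights, SCORE_KEYS)) + bias)


def filter_by_fdr(psms, probs, max_fdr=0.01):
    """Keep targets in the longest ranked prefix with decoys/targets <= max_fdr."""
    ranked = sorted(zip(probs, psms), key=lambda pair: pair[0], reverse=True)
    targets = decoys = cutoff = 0
    for i, (_, psm) in enumerate(ranked, start=1):
        if psm["is_decoy"]:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = i
    return [psm for _, psm in ranked[:cutoff] if not psm["is_decoy"]]


if __name__ == "__main__":
    psms = make_synthetic_psms()
    weights, bias = fit_logistic(psms)
    probs = [match_probability(p, weights, bias) for p in psms]
    accepted = filter_by_fdr(psms, probs, max_fdr=0.01)
    print(f"accepted {len(accepted)} target PSMs at 1% estimated FDR")
```

The sketch uses only the standard library so that it stays self-contained; in practice a library classifier and real search-engine output would replace the hand-written gradient descent and the synthetic scores.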
Freely available under the GNU GPL license at http://sipros.omicsbio.org.
Supplementary data are available at Bioinformatics online.