Sczyrba Alexander, Hofmann Peter, Belmann Peter, Koslicki David, Janssen Stefan, Dröge Johannes, Gregor Ivan, Majda Stephan, Fiedler Jessika, Dahms Eik, Bremges Andreas, Fritz Adrian, Garrido-Oter Ruben, Jørgensen Tue Sparholt, Shapiro Nicole, Blood Philip D, Gurevich Alexey, Bai Yang, Turaev Dmitrij, DeMaere Matthew Z, Chikhi Rayan, Nagarajan Niranjan, Quince Christopher, Meyer Fernando, Balvočiūtė Monika, Hansen Lars Hestbjerg, Sørensen Søren J, Chia Burton K H, Denis Bertrand, Froula Jeff L, Wang Zhong, Egan Robert, Don Kang Dongwan, Cook Jeffrey J, Deltel Charles, Beckstette Michael, Lemaitre Claire, Peterlongo Pierre, Rizk Guillaume, Lavenier Dominique, Wu Yu-Wei, Singer Steven W, Jain Chirag, Strous Marc, Klingenberg Heiner, Meinicke Peter, Barton Michael D, Lingner Thomas, Lin Hsin-Hung, Liao Yu-Chieh, Silva Genivaldo Gueiros Z, Cuevas Daniel A, Edwards Robert A, Saha Surya, Piro Vitor C, Renard Bernhard Y, Pop Mihai, Klenk Hans-Peter, Göker Markus, Kyrpides Nikos C, Woyke Tanja, Vorholt Julia A, Schulze-Lefert Paul, Rubin Edward M, Darling Aaron E, Rattei Thomas, McHardy Alice C
Faculty of Technology, Bielefeld University, Bielefeld, Germany.
Center for Biotechnology, Bielefeld University, Bielefeld, Germany.
Nat Methods. 2017 Nov;14(11):1063-1071. doi: 10.1038/nmeth.4458. Epub 2017 Oct 2.
Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ∼700 newly sequenced microorganisms and ∼600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.