Department of Biology, University of Massachusetts Boston, Boston, MA, USA.
Mol Ecol Resour. 2020 Mar;20(2):357-359. doi: 10.1111/1755-0998.13140. Epub 2020 Feb 20.
Decreasing sequencing costs have driven a rapid expansion of novel genotyping methods. One of these methods is the exploitation of restriction enzyme cut sites to generate genome-wide but reduced representation sequencing libraries (RRLs), alternatively termed genotyping by sequencing or restriction-site associated DNA sequencing. Without a reference genome, the resulting short sequence reads must be assembled de novo. There are many possible assembly programs, most not explicitly developed for RRL data, and we know little of their effectiveness. In this issue of Molecular Ecology Resources, LaCava et al. (2020) systematically evaluate six commonly used programs and two commonly varied parameters for complete and accurate assembly of RRLs, using simulated double digests of Homo sapiens and Arabidopsis thaliana genomes with varied mutation rates and types. The authors find substantial variation in performance across assembly programs. The most consistently high-performing assembler (CD-HIT; Li and Godzik, 2006) is infrequently used in their literature survey, while several others fail to produce complete, accurate assemblies under many conditions. LaCava et al. additionally recommend best practices in parameter choice and evaluation of future assembly programs, advice that molecular ecologists working to assemble sequences of all kinds should take to heart.
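To make the idea of a simulated double digest concrete, the following is a minimal Python sketch of how a reduced representation library could be derived in silico from a genome sequence. It is not the pipeline used by LaCava et al.; the enzyme pair (EcoRI and MseI), the function names `cut_positions` and `double_digest`, and the size-selection parameters are all assumptions chosen for illustration. The sketch keeps only fragments flanked by one cut site from each enzyme, as is typical of double-digest protocols.

```python
import re

# Illustrative enzyme choice (an assumption; the abstract does not name the enzymes).
ECORI = "GAATTC"  # EcoRI cuts G^AATTC, i.e. 1 base into the site
MSEI = "TTAA"     # MseI cuts T^TAA, i.e. 1 base into the site


def cut_positions(seq, site, offset):
    """Return the cut coordinate for every (possibly overlapping) match of `site`."""
    pattern = "(?=" + re.escape(site) + ")"  # lookahead so overlapping sites are found
    return [m.start() + offset for m in re.finditer(pattern, seq)]


def double_digest(seq, site_a=ECORI, offset_a=1, site_b=MSEI, offset_b=1,
                  min_len=0, max_len=None):
    """Return fragments bounded by one cut from each enzyme, after size selection."""
    cuts = [(p, "A") for p in cut_positions(seq, site_a, offset_a)]
    cuts += [(p, "B") for p in cut_positions(seq, site_b, offset_b)]
    cuts.sort()

    fragments = []
    # Walk adjacent cut sites; keep a fragment only if its two ends differ in enzyme.
    for (start, enz_start), (end, enz_end) in zip(cuts, cuts[1:]):
        if enz_start == enz_end:
            continue
        frag = seq[start:end]
        if len(frag) < min_len:
            continue
        if max_len is not None and len(frag) > max_len:
            continue
        fragments.append(frag)
    return fragments


if __name__ == "__main__":
    toy_genome = "CCGAATTCAAAATTAAGGGGTTAACC"  # hypothetical toy sequence
    print(double_digest(toy_genome))
```

In a fuller simulation, mutations would be introduced into copies of the genome before digestion, and reads would be sampled from the ends of the retained fragments; those steps are omitted here.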