Molecular, Cellular, and Biomedical Sciences Department, University of New Hampshire, Durham, NH, USA.
Hubbard Center for Genome Studies, University of New Hampshire, Durham, NH, USA.
Mol Ecol Resour. 2020 Jul;20(4):856-870. doi: 10.1111/1755-0998.13155. Epub 2020 May 11.
High-throughput sequencing technologies are a proposed solution for accessing the molecular data in historical specimens. However, degraded DNA combined with the computational demands of short-read assemblies has posed significant laboratory and bioinformatics challenges for de novo genome assembly. Linked-read or "synthetic long-read" sequencing technologies, such as 10× Genomics, may provide a cost-effective alternative for assembling higher-quality de novo genomes from degraded tissue samples. Here, we compare assembly quality (e.g., genome contiguity and completeness, presence of orthogroups) between four new deer mouse (Peromyscus spp.) genomes assembled using linked-read technology and four published genomes assembled from a single shotgun library. At a similar price point, these approaches produce vastly different assemblies: linked-read assemblies have higher overall contiguity and completeness, as measured by larger N50 values and a greater number of genes assembled, respectively. As a proof of concept, we used annotated genes from the four Peromyscus linked-read assemblies and eight additional rodent taxa to generate a phylogeny, which reconstructed the expected relationships among species with 100% support. Although not without caveats, our results suggest that linked-read sequencing approaches are a viable option for building de novo genomes from degraded tissues, which may prove particularly valuable for taxa that are extinct, rare, or difficult to collect.
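The abstract uses N50 as its measure of contiguity. As a reminder of what that statistic captures, the following is a minimal sketch of the standard N50 definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name `n50` and the example contig lengths are hypothetical and are not taken from the study or its software.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig (or scaffold) lengths.

    Contigs are sorted from longest to shortest and accumulated until the
    running sum reaches at least half of the total assembly length; the
    length of the contig that crosses that threshold is the N50.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")

    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional half-totals.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_lengths))
```

A more fragmented assembly of the same total length yields a smaller N50, which is why larger values are read as higher contiguity.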