Li Heng, Durbin Richard
Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA.
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
ArXiv. 2023 Aug 15:arXiv:2308.07877v1.
De novo assembly is the process of reconstructing the genome sequence of an organism from sequencing reads. Genome sequences are essential to biology, and assembly has been a central problem in bioinformatics for four decades. Until recently, genomes were typically assembled into fragments of a few megabases at best, but technological advances in long-read sequencing now enable near-complete chromosome-level assembly, also known as telomere-to-telomere assembly, for many organisms. Here we review recent progress on assembly algorithms and protocols. We focus on how to derive near telomere-to-telomere assemblies and discuss potential future developments.