College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China.
School of Computer and Information Engineering, Henan University, Kaifeng, China.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab033.
In genome assembly, scaffolding methods make it possible to obtain a more complete and contiguous reference genome, which is the cornerstone of genomic research. These methods typically use alignments between contigs and sequencing data (reads) to determine the order and orientation of contigs and to join them into longer scaffolds, which facilitate downstream genomic analysis. With the rapid development of high-throughput sequencing technologies, diverse types of reads have emerged over the past decade, particularly from long-range sequencing, and these have greatly improved the quality of the assemblies that scaffolding methods produce. As the number of scaffolding methods grows, biology and bioinformatics researchers need in-depth analyses of the state of the art. In this article, we focus on the difficulties in scaffolding, the differing characteristics of the various kinds of reads, the ways in which current scaffolding methods address these difficulties, and future research opportunities. We hope this work will benefit both the design of new scaffolding methods and the selection of appropriate methods for specific biological studies.
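To make the ordering and orientation step concrete, the following is a minimal sketch of a greedy scaffolder. It is not any specific published tool. It assumes that read alignments have already been summarized as links between contig ends, and the names `Link`, `scaffold`, `min_support`, and the "head"/"tail" end labels are all hypothetical. Gap-size estimation and repeat handling, which real scaffolders must address, are omitted.

```python
from collections import namedtuple

# Hypothetical summary of read evidence: `support` reads connect
# end_a of contig_a to end_b of contig_b ("head" = 5' end, "tail" = 3' end).
Link = namedtuple("Link", "contig_a end_a contig_b end_b support")


def scaffold(contigs, links, min_support=2):
    """Greedily join contig ends, strongest links first.

    Assumes every contig named in `links` also appears in `contigs`.
    Returns a list of scaffolds, each a list of (contig, orientation).
    """
    parent = {c: c for c in contigs}  # union-find, used to reject cycles

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    joined = {}  # (contig, end) -> (contig, end) it is joined to
    for ln in sorted(links, key=lambda l: -l.support):
        a = (ln.contig_a, ln.end_a)
        b = (ln.contig_b, ln.end_b)
        # Skip weak links and links that conflict with an accepted join.
        if ln.support < min_support or a in joined or b in joined:
            continue
        root_a, root_b = find(ln.contig_a), find(ln.contig_b)
        if root_a == root_b:  # joining would close a cycle
            continue
        parent[root_a] = root_b
        joined[a] = b
        joined[b] = a

    other = {"head": "tail", "tail": "head"}
    scaffolds, seen = [], set()
    for start in contigs:
        if start in seen:
            continue
        free = [e for e in ("head", "tail") if (start, e) not in joined]
        if not free:  # interior contig; reached later from a chain end
            continue
        cur, entry, path = start, free[0], []
        while True:
            seen.add(cur)
            # Entering through the head means forward orientation.
            path.append((cur, "+" if entry == "head" else "-"))
            exit_end = (cur, other[entry])
            if exit_end not in joined:
                break
            cur, entry = joined[exit_end]
        scaffolds.append(path)
    return scaffolds


contigs = ["c1", "c2", "c3", "c4"]
links = [
    Link("c1", "tail", "c2", "head", 5),
    Link("c2", "tail", "c3", "tail", 4),  # implies c3 is reverse-complemented
    Link("c1", "tail", "c3", "head", 1),  # weak, conflicting evidence
]
result = scaffold(contigs, links)
```

In this toy input, the two well-supported links chain c1, c2, and c3 into one scaffold with c3 reversed, the weak conflicting link is discarded, and c4 remains a singleton. Taking the strongest links first is only one possible strategy and is used here for brevity.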