Piña Johan S, Orozco-Arias Simon, Tobón-Orozco Nicolas, Camargo-Forero Leonardo, Tabares-Soto Reinel, Guyot Romain
Department of Data Science, People Contact, Manizales, Caldas, Colombia.
Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia.
Evol Bioinform Online. 2023 Jan 20;19:11769343221150585. doi: 10.1177/11769343221150585. eCollection 2023.
A common task in bioinformatics is to compare DNA sequences to identify similarities between organisms at the sequence level. One approach to such comparison is the dot-plot, a two-dimensional graphical representation used to analyze DNA or protein alignments. Dot-plot alignment software predates the sequencing revolution and remains limited when dealing with large sequences, resulting in very long execution times. High-Performance Computing (HPC) techniques have been used successfully in many applications to reduce computing times, but so far very few HPC applications for graphical sequence alignment have been reported. Here, we present G-SAIP (Graphical Sequence Alignment in Parallel), software that spawns multiple distributed processes on CPUs over a supercomputing infrastructure, speeding up dot-plot generation by up to 1.68× compared with the fastest current tools. This improves the efficiency of comparative structural genomic analysis, phylogenetics (given the benefits of pairwise alignment for comparing genomes), repetitive structure identification, and assembly quality checking.
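To make the dot-plot idea concrete, the following is a minimal sketch in Python. It is not G-SAIP's implementation: the function names (`kmer_index`, `dotplot_strip`, `dotplot`), the exact k-mer matching rule, and the `strip_size` parameter are all assumptions chosen for illustration. A dot is placed at (i, j) whenever the k-mer starting at position i of the first sequence equals the k-mer starting at position j of the second. The rows are split into independent strips to show why the problem lends itself to distributed processing; here the strips are evaluated in a plain loop, whereas a parallel tool could hand each strip to a separate process.

```python
from collections import defaultdict


def kmer_index(seq, k):
    """Map each k-mer in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for j in range(len(seq) - k + 1):
        index[seq[j:j + k]].append(j)
    return index


def dotplot_strip(seq_a, index_b, k, start, stop):
    """Return the (i, j) dots for rows start..stop-1 of the dot-plot.

    Each strip depends only on its own rows of seq_a and on the shared
    index of seq_b, so strips can be computed independently.
    """
    dots = []
    last = min(stop, len(seq_a) - k + 1)
    for i in range(start, last):
        # .get avoids inserting missing k-mers into the shared index.
        for j in index_b.get(seq_a[i:i + k], ()):
            dots.append((i, j))
    return dots


def dotplot(seq_a, seq_b, k=3, strip_size=4):
    """Compute all dots by splitting the rows into strips (hypothetical scheme)."""
    index_b = kmer_index(seq_b, k)
    n_rows = max(len(seq_a) - k + 1, 0)
    strips = [(s, min(s + strip_size, n_rows))
              for s in range(0, n_rows, strip_size)]
    dots = []
    # Serial loop for clarity; a process pool or MPI ranks could
    # evaluate the strips concurrently instead.
    for start, stop in strips:
        dots.extend(dotplot_strip(seq_a, index_b, k, start, stop))
    return dots
```

For example, `dotplot("ACGTACGT", "ACGT", k=4)` returns `[(0, 0), (4, 0)]`, because the 4-mer `ACGT` occurs at positions 0 and 4 of the first sequence and at position 0 of the second. The result does not depend on `strip_size`, which is what makes the row-wise decomposition safe to distribute.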