Fast sequence alignment for centromeres with RaMA.

Author information

Zhang Pinglu, Wei Yanming, Tian Qinzhong, Zou Quan, Wang Yansu

Affiliations

Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 611731, China.

Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, Zhejiang, China.

Publication information

Genome Res. 2025 May 2;35(5):1209-1218. doi: 10.1101/gr.279763.124.

DOI:10.1101/gr.279763.124

PMID:39939176

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12047532/

Abstract

The release of the first draft of the human pangenome has revolutionized genomic research by enabling access to complex regions like centromeres, composed of extra-long tandem repeats (ETRs). However, a significant gap remains as current methodologies are inadequate for producing sequence alignments that effectively capture genetic events within ETRs, highlighting a pressing need for improved alignment tools. Inspired by UniAligner, we developed a rare match aligner (RaMA), using rare matches as anchors and two-piece affine gap cost to generate complete pairwise alignment that better captures genetic evolution. RaMA also employs parallel computing and the wavefront algorithm to accelerate anchor discovery and sequence alignment, achieving up to 13.66 times faster processing using only 11% of UniAligner's memory. Downstream analysis of simulated data and the CHM13 and CHM1 higher-order repeat (HOR) arrays demonstrates that RaMA achieves more accurate alignments, effectively capturing true HOR structures. RaMA also introduces two methods for defining reliable alignment regions, further refining and enhancing the accuracy of centromeric alignment statistics.
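The two-piece affine gap cost mentioned in the abstract can be illustrated with a minimal sketch. Under this general model, a gap of length l is charged the cheaper of two affine functions, so short gaps pay a low opening penalty while long gaps pay a low per-base extension penalty. The function name and the parameter values below are illustrative assumptions, not RaMA's actual defaults or API.

```python
def two_piece_affine_gap_cost(length, open1=4, ext1=2, open2=24, ext2=1):
    """Cost of a gap of `length` bases under a two-piece affine model.

    The penalty values are hypothetical and chosen only for illustration;
    they are not taken from RaMA.
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0  # no gap, no penalty
    # Piece 1: cheap to open, expensive to extend (favours short gaps).
    short_piece = open1 + ext1 * length
    # Piece 2: expensive to open, cheap to extend (favours long gaps).
    long_piece = open2 + ext2 * length
    return min(short_piece, long_piece)


if __name__ == "__main__":
    for gap_length in (1, 20, 100):
        print(gap_length, two_piece_affine_gap_cost(gap_length))
```

With these example values the two pieces cross at a gap length of 20 bases. Beyond that point, a long insertion or deletion, such as the gain or loss of whole repeat units in a tandem array, is penalised less steeply than under a single affine function.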
