Volarić Marin, Meštrović Nevenka, Despot-Slade Evelin
Ruđer Bošković Institute, Bijenička cesta 54, 10000 Zagreb, Croatia.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae660.
Satellite DNAs (satDNAs) are tandemly repeated sequences that make up a significant portion of almost all eukaryotic genomes. Although satDNAs have been shown to play an important role in genome organization and evolution, they remain relatively poorly analyzed, even in model organisms. One of the main reasons for the current lack of in-depth studies on satDNAs is their underrepresentation in genome assemblies. Due to the complexity, abundance, and highly repetitive nature of satDNAs, their analysis is challenging and requires efficient tools that ensure accurate annotation and comprehensive genome-wide analysis. We present a novel pipeline, named satellite DNA Exploration (SatXplor), designed to robustly characterize satDNA elements and analyze their arrays and flanking regions. SatXplor was benchmarked against other tools and on curated satDNA datasets from diverse species, including mice and humans, showcasing its versatility across genomes with varying complexities and satDNA profiles. Its component algorithms excel in the identification of tandemly repeated sequences and, for the first time, enable the evaluation of satDNA variation and array annotation enriched with information about the surrounding genomic landscape. SatXplor is an innovative pipeline for satDNA analysis that can be paired with any tool used for satDNA detection, offering insights into the structural characteristics, array determination, and genomic context of satDNA elements. By integrating various computational techniques, from sequence analysis and homology investigation to advanced clustering and graph-based methods, it provides a versatile and comprehensive approach to exploring the complexity of satDNA organization and understanding its underlying mechanisms and evolutionary aspects. It is open-source and freely accessible at https://github.com/mvolar/SatXplor.
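The abstract mentions array determination and flanking-region analysis but does not describe how SatXplor performs them. As a purely illustrative sketch (not SatXplor's actual implementation), one generic way to turn per-monomer hits from a satDNA detection tool into arrays is to sort the hits by position and merge neighbours on the same chromosome whose gap is at or below a threshold, then take fixed-width windows on either side as flanking regions. The `Hit` type, the `merge_into_arrays` and `flanks` functions, the 50 bp `max_gap`, and the 1,000 bp flank `width` are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hit:
    """A hypothetical monomer (or array) interval: 0-based start, exclusive end."""
    chrom: str
    start: int
    end: int


def merge_into_arrays(hits: List[Hit], max_gap: int = 50) -> List[Hit]:
    """Merge hits on the same chromosome separated by <= max_gap bp into arrays.

    max_gap is an illustrative assumption, not a SatXplor parameter.
    """
    arrays: List[Hit] = []
    for h in sorted(hits, key=lambda x: (x.chrom, x.start)):
        last = arrays[-1] if arrays else None
        if last is not None and last.chrom == h.chrom and h.start - last.end <= max_gap:
            # Extend the current array; copies are stored, so input hits stay unchanged.
            last.end = max(last.end, h.end)
        else:
            # Start a new array with a copy of this hit.
            arrays.append(Hit(h.chrom, h.start, h.end))
    return arrays


def flanks(array: Hit, chrom_len: int, width: int = 1000) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return (left, right) flanking windows, clipped to chromosome bounds."""
    left = (max(0, array.start - width), array.start)
    right = (array.end, min(chrom_len, array.end + width))
    return left, right


# Usage with made-up coordinates:
hits = [Hit("chr1", 100, 250), Hit("chr1", 260, 410),
        Hit("chr1", 5000, 5150), Hit("chr2", 0, 150)]
arrays = merge_into_arrays(hits, max_gap=50)
left, right = flanks(arrays[0], chrom_len=10000, width=1000)
```

Here the first two chr1 hits (10 bp apart) collapse into one array spanning 100 to 410, while the distant chr1 hit and the chr2 hit remain separate arrays. A fixed gap threshold is the simplest possible rule; real pipelines may instead weigh monomer orientation, sequence similarity, or interspersed repeats when delimiting arrays.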