Brown Noah, Danis Charles, Ahmedjanova Vazira, Guler Jennifer L
Department of Biology, University of Virginia, Charlottesville, VA 22903.
bioRxiv. 2025 May 19:2025.01.24.634734. doi: 10.1101/2025.01.24.634734.
Structural variants (SVs) are abundant across all life and have major impacts on the genome and transcriptome. However, it is difficult to appreciate the individual significance of SVs when they are heterogeneously distributed across a genomic neighborhood. Further, low-input sequencing technologies or sequencing of many individuals across a population introduces variance that complicates SV counting and association studies. Tools exist to simplify SV datasets, but these SV mergers begin to fail on large or highly variable datasets. To address this issue, we introduce a new SV merger called SVCROWS (Structural Variation Consensus with Reciprocal Overlap and Weighted Sizes). This option-rich R package merges and summarizes SV regions using a size-weighted reciprocal overlap framework, effectively accounting for the skewed impacts of variable-length SVs. User input directs the stringency of comparisons across a range of sizes, enabling different levels of resolution in complex genome regions that harbor both small and large SVs. When compared to other SV merging programs, SVCROWS accurately merges SVs while maintaining less frequent genotypes of the unmerged SV calls. SVCROWS proves especially useful for enabling SV discovery in large and highly variable single-cell datasets. Overall, the novel size-weighted comparisons of SVCROWS present a framework for improved interpretation of SV calls, and its ease of use allows it to be applied to virtually any upstream analysis.
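To make the idea of a size-weighted reciprocal overlap comparison concrete, the following is a minimal Python sketch. It is not the SVCROWS implementation (SVCROWS is an R package, and its actual functions and parameters are not described here). The `SV` class, the function names, the size bounds, the overlap thresholds, the linear interpolation between them, and the choice to demand more overlap from larger SVs are all assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """A structural variant as a half-open interval [start, end) on one chromosome."""
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Return the shared span as a fraction of each SV, taking the smaller fraction."""
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / a.length, shared / b.length)


def required_overlap(length: int,
                     small_size: int = 1_000,
                     large_size: int = 100_000,
                     small_ro: float = 0.5,
                     large_ro: float = 0.9) -> float:
    """Hypothetical size-weighted threshold.

    Assumption: the required overlap is interpolated linearly between a
    user-chosen value for small SVs and a user-chosen value for large SVs.
    The defaults are illustrative and are not taken from SVCROWS.
    """
    if length <= small_size:
        return small_ro
    if length >= large_size:
        return large_ro
    frac = (length - small_size) / (large_size - small_size)
    return small_ro + frac * (large_ro - small_ro)


def should_merge(a: SV, b: SV) -> bool:
    """Merge two SVs when their reciprocal overlap meets the threshold for the larger one."""
    threshold = required_overlap(max(a.length, b.length))
    return reciprocal_overlap(a, b) >= threshold


if __name__ == "__main__":
    small_a, small_b = SV("chr1", 0, 1_000), SV("chr1", 400, 1_400)
    large_a, large_b = SV("chr1", 0, 100_000), SV("chr1", 40_000, 140_000)
    # Both pairs share 60% of their length, but only the small pair passes its threshold.
    print(should_merge(small_a, small_b), should_merge(large_a, large_b))
```

With these illustrative settings, the same 60% reciprocal overlap merges the two 1 kb calls but keeps the two 100 kb calls separate. This is one way user-set stringency could vary with SV size.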