Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Paris, France.
Genome Center, UC Davis, Davis, CA, USA.
Nat Methods. 2023 Apr;20(4):550-558. doi: 10.1038/s41592-022-01674-1. Epub 2022 Dec 22.
Structural variants (SVs) account for a large amount of sequence variability across genomes and play an important role in human genomics and precision medicine. Despite intense efforts over the years, the discovery of SVs in individuals remains challenging because of the diploid and highly repetitive structure of the human genome and the presence of SVs that vastly exceed sequencing read lengths. However, the recent introduction of low-error long-read sequencing technologies such as PacBio HiFi may finally enable these barriers to be overcome. Here we present SV discovery with sample-specific strings (SVDSS), a method for discovery of SVs from long-read sequencing technologies (for example, PacBio HiFi) that combines mapping-free, mapping-based and assembly-based methodologies for overall superior SV discovery performance. Our experiments on several human samples show that SVDSS outperforms state-of-the-art mapping-based methods for discovery of insertion and deletion SVs in PacBio HiFi reads and achieves notable improvements in calling SVs in repetitive regions of the genome.