Gu Wenyan, Zhou Aizhong, Wang Lusheng, Sun Shiwei, Cui Xuefeng, Zhu Daming
School of Computer Science and Technology, Shandong University, Qingdao, China.
Department of Computer Science, City University of Hong Kong, Hong Kong, China.
J Comput Biol. 2021 Aug;28(8):774-788. doi: 10.1089/cmb.2021.0048. Epub 2021 May 10.
Genome structural variants (SVs) have a great impact on human phenotype and diversity and have been linked to numerous diseases. Long-read sequencing technologies have made it possible to find SVs as long as 10,000 nucleotides. Long-read-based SV detection has therefore drawn the attention of many recent research projects, and many tools have recently been developed to detect SVs from long reads. In this article, we present a new method, called SVLR, to detect SVs based on long-read sequencing data. Compared with existing methods, SVLR can detect three new kinds of SVs: block replacements, block interchanges, and translocations. Although these new SVs are structurally more complicated, SVLR detects them with accuracy comparable to that achieved for the classic SVs. Moreover, for the classic SVs that can be detected by state-of-the-art methods (e.g., SVIM and Sniffles), our experiments demonstrate recall improvements of up to 38% without harming precision (i.e., precision remains >78%). We also point out three directions to further improve SV detection in the future. Source code: https://github.com/GWYSDU/SVLR.