SVsearcher: A more accurate structural variation detection method in long read data.

Author information

Zheng Yan, Shang Xuequn, Sung Wing-Kin

Affiliations

School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, 710072 Xi'an, China.

School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, 710072 Xi'an, China.

Publication information

Comput Biol Med. 2023 May;158:106843. doi: 10.1016/j.compbiomed.2023.106843. Epub 2023 Mar 31.

Abstract

Structural variations (SVs) are genomic rearrangements (such as deletions, insertions, and inversions) larger than 50 bp. They play important roles in genetic diseases and in evolutionary mechanisms. Owing to advances in long-read sequencing (i.e., PacBio long-read sequencing and Oxford Nanopore (ONT) long-read sequencing), SVs can now be called accurately. However, for ONT long reads, we observe that existing long-read SV callers miss many true SVs and report many false SVs in repetitive regions and in regions with multi-allelic SVs. These errors arise from messy alignments of ONT reads, which in turn result from the reads' high error rate. Hence, we propose a novel method, SVsearcher, to address these issues. We run SVsearcher and other callers on three real datasets and find that SVsearcher improves the F1 score by approximately 10% for high-coverage (50×) datasets and by more than 25% for low-coverage (10×) datasets. More importantly, SVsearcher identifies 81.7%-91.8% of multi-allelic SVs, while existing methods identify only 13.2% (Sniffles) to 54.0% (nanoSV) of them. SVsearcher is available at https://github.com/kensung-lab/SVsearcher.
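The comparison above is reported as an F1 score, the harmonic mean of precision and recall. The snippet below shows how such a score can be computed against a truth set. It is a minimal sketch and not SVsearcher's code or the authors' benchmarking pipeline. It applies the 50 bp size threshold from the abstract, then matches calls to truth SVs by chromosome, SV type, and breakpoint distance, and derives F1 from the resulting counts. The names `SVCall`, `evaluate`, and `max_dist`, and the 500 bp position tolerance, are hypothetical. Real SV benchmarks typically also compare SV sizes and sequences.

```python
from dataclasses import dataclass
from typing import List, Tuple

# From the abstract: SVs are rearrangements larger than 50 bp.
MIN_SV_LEN = 50


@dataclass(frozen=True)
class SVCall:
    """Simplified SV record (hypothetical fields, not SVsearcher's output format)."""
    chrom: str
    pos: int      # breakpoint position
    svtype: str   # e.g. "DEL", "INS", "INV"
    length: int   # absolute SV size in bp


def is_sv(call: SVCall) -> bool:
    """Keep only events larger than the 50 bp SV threshold."""
    return call.length > MIN_SV_LEN


def evaluate(calls: List[SVCall], truth: List[SVCall],
             max_dist: int = 500) -> Tuple[int, int, int]:
    """Greedily match each call to at most one unmatched truth SV.

    max_dist is a hypothetical breakpoint tolerance. Because each truth SV
    is matched at most once, two distinct alleles at the same locus
    (a multi-allelic site) need two distinct calls to be fully recovered.
    Returns (TP, FP, FN).
    """
    calls = [c for c in calls if is_sv(c)]
    truth = [t for t in truth if is_sv(t)]
    matched = set()
    tp = 0
    for c in calls:
        for i, t in enumerate(truth):
            if i in matched:
                continue
            if (c.chrom == t.chrom and c.svtype == t.svtype
                    and abs(c.pos - t.pos) <= max_dist):
                matched.add(i)
                tp += 1
                break
    fp = len(calls) - tp   # calls with no truth partner
    fn = len(truth) - tp   # truth SVs that were missed
    return tp, fp, fn


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), with P = TP/(TP+FP) and R = TP/(TP+FN)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, with one correctly placed deletion, one misplaced insertion call, and a 30 bp event that falls below the size threshold, `evaluate` returns one true positive, one false positive, and two missed truth SVs. The resulting F1 is 0.4. Greedy matching keeps the sketch short. An optimal one-to-one assignment would be more robust when several calls cluster near the same breakpoint.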
