Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California, 92093, USA.
Department of Pediatrics, School of Medicine, University of California, San Diego, La Jolla, California, 92093, USA.
Nat Commun. 2019 Oct 11;10(1):4660. doi: 10.1038/s41467-019-12493-y.
Whole-genome sequencing with short-read technologies such as Illumina enables the accurate detection of small-scale variants but provides limited information about haplotypes and about variants in repetitive regions of the human genome. Single-molecule sequencing (SMS) technologies such as Pacific Biosciences and Oxford Nanopore generate long reads that can potentially address these limitations of short-read sequencing. However, the high error rate of SMS reads makes it challenging to detect small-scale variants in diploid genomes. We introduce a variant calling method, Longshot, which leverages the haplotype information present in SMS reads to accurately detect and phase single-nucleotide variants (SNVs) in diploid genomes. We demonstrate that Longshot achieves very high accuracy for SNV detection using whole-genome Pacific Biosciences data, outperforms existing variant calling methods, and enables variant detection in duplicated regions of the genome that cannot be mapped using short reads.