Clinical Genetics Group, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford OX3 9DS, UK.
Genomics England Limited, One Canada Square, London E14 5AB, UK.
Genes (Basel). 2024 Jul 16;15(7):925. doi: 10.3390/genes15070925.
The identification of structural variants (SVs) in genomic data remains an ongoing challenge because difficulties in reliable SV calling reduce sensitivity and specificity. We prepared high-quality DNA from nine parent-child trios who had previously undergone short-read whole-genome sequencing (Illumina platform) as part of the Genomics England 100,000 Genomes Project. We reanalysed the genomes using both Bionano optical genome mapping (OGM; eight probands and one trio) and Nanopore long-read sequencing (Oxford Nanopore Technologies [ONT] platform; all samples). To establish a "truth" dataset, we asked whether rare proband SV calls (n = 234) made by the Bionano Access (version 1.6.1)/Solve software (version 3.6.1_11162020) could be verified by individual visualisation in the Integrative Genomics Viewer using either or both of the Illumina and ONT raw sequence data. Of these, 222 calls were verified, indicating that Bionano OGM calls have high precision (positive predictive value 95%). We then asked what proportion of the 222 true Bionano SVs had been identified by SV callers in the other two datasets. In the Illumina dataset, sensitivity varied according to variant type, being high for deletions (115/134; 86%) but poor for insertions (13/58; 22%). In the ONT dataset, sensitivity was generally poor using the original Sniffles variant caller (48% overall) but improved substantially with the use of Sniffles2 (36/40; 90% and 17/23; 74% for deletions and insertions, respectively). In summary, we show that the precision of OGM is very high. In addition, when the Sniffles2 caller is applied, the sensitivity of SV calling using ONT long-read sequence data outperforms Illumina sequencing for most SV types.
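The precision and sensitivity figures quoted above follow directly from the reported counts. The short Python sketch below recomputes them as a check on the arithmetic. It is an illustration only, not the authors' analysis code: the helper name `proportion` and the dictionary layout are assumptions made for this example, and only counts stated in the abstract are used. The denominators differ between the Illumina and Sniffles2 comparisons exactly as reported.

```python
def proportion(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, rejecting impossible counts."""
    if denominator <= 0 or not 0 <= numerator <= denominator:
        raise ValueError("counts must satisfy 0 <= numerator <= denominator, denominator > 0")
    return numerator / denominator


# Positive predictive value (precision) of Bionano OGM calls:
# verified calls / all rare proband calls inspected in IGV.
ppv = proportion(222, 234)

# Sensitivity against the verified Bionano SVs, using the counts
# reported in the abstract (detected / verified SVs assessed).
sensitivity = {
    ("Illumina", "deletion"): proportion(115, 134),
    ("Illumina", "insertion"): proportion(13, 58),
    ("ONT Sniffles2", "deletion"): proportion(36, 40),
    ("ONT Sniffles2", "insertion"): proportion(17, 23),
}

print(f"Bionano OGM PPV: {ppv:.1%}")
for (dataset, sv_type), value in sensitivity.items():
    print(f"{dataset} {sv_type} sensitivity: {value:.1%}")
```

Rounded to whole percentages, these values match the 95%, 86%, 22%, 90% and 74% quoted in the text.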