Department of Human Genetics, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands.
Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, The Netherlands.
Genome Med. 2023 May 8;15(1):34. doi: 10.1186/s13073-023-01183-6.
Long-read sequencing (LRS) techniques have been very successful in identifying structural variants (SVs). However, the high error rate of LRS has made the detection of small variants (substitutions and short indels < 20 bp) more challenging. The introduction of PacBio HiFi sequencing makes LRS suitable for detecting small variants as well. Here we evaluate the ability of HiFi reads to detect de novo mutations (DNMs) of all types, which are technically challenging to call and a major cause of sporadic, severe, early-onset disease.
We sequenced the genomes of eight parent-child trios using high-coverage PacBio HiFi LRS (~30-fold coverage) and Illumina short-read sequencing (SRS) (~50-fold coverage). De novo substitutions, small indels, short tandem repeats (STRs) and SVs were called in both datasets, and the two call sets were compared to assess the accuracy of HiFi LRS. In addition, we determined the parent of origin of the small DNMs using phasing.
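The core idea behind calling a candidate DNM in a trio can be sketched as follows. This is a minimal illustration only, not the pipeline used in the study: the function name `is_candidate_dnm` and the genotype encoding (pairs of allele indices, with `None` for a missing call) are assumptions made for this example. Real callers typically add further filters, such as read depth and allele balance, on top of this genotype pattern.

```python
from typing import Optional, Tuple

# Hypothetical encoding: a diploid genotype as two allele indices,
# e.g. (0, 1) for a heterozygous call; None means no genotype call.
Genotype = Optional[Tuple[int, int]]


def is_candidate_dnm(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """Return True if the child carries an allele seen in neither parent."""
    if child is None or father is None or mother is None:
        # Without all three genotype calls the site cannot be judged.
        return False
    parental_alleles = set(father) | set(mother)
    return any(allele not in parental_alleles for allele in child)


# Child is heterozygous, both parents are homozygous reference: candidate DNM.
print(is_candidate_dnm((0, 1), (0, 0), (0, 0)))
# The alternate allele is present in the father: inherited, not de novo.
print(is_candidate_dnm((0, 1), (0, 1), (0, 0)))
```

Candidates that pass this pattern still need validation, since a parental allele missed by the caller produces the same genotype pattern as a true de novo event.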
We identified a total of 672 and 859 de novo substitutions/indels, 28 and 126 de novo STRs, and 24 and 1 de novo SVs in LRS and SRS, respectively. For the small variants, there was 92% and 85% concordance between the platforms (LRS and SRS calls, respectively). For the STRs, the concordance was 3.6% and 0.8%, and for the SVs it was 4% and 100%, respectively. Validation was successful for 27 of the 54 LRS-unique small variants, of which 11 (41%) were confirmed as true de novo events. For the SRS-unique small variants, validation was successful for 42 of the 133 DNMs, of which 8 (19%) were confirmed as true de novo events. Validation of 18 LRS-unique de novo STR calls confirmed none of the repeat expansions as true DNMs. Confirmation was possible for 19 of the 23 LRS-unique candidate SVs, of which 10 (52.6%) were true de novo events. Furthermore, we were able to assign 96% of DNMs to their parental allele with LRS data, as opposed to just 20% with SRS data.
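The reported percentages can be checked against the counts with a few lines of arithmetic. The sketch below uses only numbers from the text, with one labelled assumption: for STRs and SVs, concordance is taken to be a single shared call divided by each platform's total. This follows directly for SVs (24 LRS calls, of which 23 are LRS-unique) and is inferred for STRs. The helper `pct` and the dictionary names are introduced for this example.

```python
def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


# (confirmed de novo, successfully validated) per call set, from the text.
validation = {
    "LRS-unique small variants": (11, 27),
    "SRS-unique small variants": (8, 42),
    "LRS-unique SVs": (10, 19),
}
validation_rates = {name: pct(c, v) for name, (c, v) in validation.items()}

# Assumption: exactly one call is shared between platforms for STRs and SVs.
shared = 1
totals = {"STR LRS": 28, "STR SRS": 126, "SV LRS": 24, "SV SRS": 1}
concordance = {name: pct(shared, total) for name, total in totals.items()}

for name, rate in {**validation_rates, **concordance}.items():
    print(f"{name}: {rate}%")
```

Under that assumption the computed values agree with the reported 41%, 19% and 52.6% validation rates and with the 3.6%, 0.8%, 4% and 100% concordance figures after rounding.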
HiFi LRS can now produce the most comprehensive variant dataset obtainable by a single technology in a single laboratory, allowing accurate calling of substitutions, indels, STRs and SVs. This accuracy even allows sensitive calling of DNMs across all variant types, and it enables phasing, which helps to distinguish true-positive from false-positive DNMs.