HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA.
Genome Res. 2024 Nov 20;34(11):1747-1762. doi: 10.1101/gr.279227.124.
Variant detection from long-read genome sequencing (lrGS) has proven to be more accurate and comprehensive than variant detection from short-read genome sequencing (srGS). However, the rate at which lrGS can increase molecular diagnostic yield for rare disease is not yet precisely characterized. We performed lrGS using Pacific Biosciences "HiFi" technology on 96 short-read-negative probands with rare diseases that were suspected to be genetic. We generated hg38-aligned variants and de novo phased genome assemblies, and subsequently annotated, filtered, and curated variants using clinical standards. New disease-relevant or potentially relevant genetic findings were identified in 16/96 (16.7%) probands, nine of which (9/96, ∼9.4%) harbored pathogenic or likely pathogenic variants. Nine probands (∼9.4%) had variants that were accurately called in both srGS and lrGS and represent changes to clinical interpretation, mostly from recently published gene-disease associations. Seven cases included variants that were only correctly interpreted in lrGS, including copy-number variants (CNVs), an inversion, a mobile element insertion, two low-complexity repeat expansions, and a 1 bp deletion. While evidence for each of these variants is, in retrospect, visible in srGS, they were either not called within srGS data, were represented by calls with incorrect sizes or structures, or failed quality control and filtration. Thus, while reanalysis of older srGS data clearly increases diagnostic yield, we find that lrGS allows for substantial additional yield (7/96, 7.3%) beyond srGS. We anticipate that as lrGS analysis improves, and as lrGS data sets grow allowing for better variant-frequency annotation, the additional lrGS-only rare disease yield will grow over time.
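The yield figures quoted above can be checked with simple arithmetic. The sketch below is illustrative only: it uses the counts stated in the abstract (96 probands, nine reinterpreted findings called in both srGS and lrGS, seven lrGS-only findings), while the dictionary labels and the `yield_percent` helper are hypothetical names introduced here, not part of the study's pipeline.

```python
COHORT_SIZE = 96  # probands, as stated in the abstract

# Counts as stated in the abstract; the keys are illustrative labels only.
findings = {
    "reinterpreted_srgs_and_lrgs": 9,  # called accurately in both srGS and lrGS
    "lrgs_only": 7,                    # correctly interpreted only in lrGS
}


def yield_percent(count, total=COHORT_SIZE):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# The two categories together account for all new findings.
total_new_findings = sum(findings.values())

for label, count in {**findings, "all_new_findings": total_new_findings}.items():
    print(f"{label}: {count}/{COHORT_SIZE} = {yield_percent(count)}%")
```

Under these assumptions the two categories sum to the 16 probands with new findings, and the lrGS-only category corresponds to the additional yield beyond srGS reanalysis.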