Hiatt Susan M, Lawlor James M J, Handley Lori H, Latner Donald R, Bonnstetter Zachary T, Finnila Candice R, Thompson Michelle L, Boston Lori Beth, Williams Melissa, Nunez Ivan Rodriguez, Jenkins Jerry, Kelley Whitley V, Bebin E Martina, Lopez Michael A, Hurst Anna C E, Korf Bruce R, Schmutz Jeremy, Grimwood Jane, Cooper Gregory M
HudsonAlpha Institute for Biotechnology, Huntsville, AL, 35806, USA.
Department of Neurology, University of Alabama at Birmingham, Birmingham, AL, 35924, USA.
medRxiv. 2024 Mar 26:2024.03.22.24304633. doi: 10.1101/2024.03.22.24304633.
Variant detection from long-read genome sequencing (lrGS) has proven to be considerably more accurate and comprehensive than variant detection from short-read genome sequencing (srGS). However, the extent to which lrGS can increase molecular diagnostic yield for rare disease has not yet been precisely characterized. We performed lrGS using Pacific Biosciences "HiFi" technology on 96 short-read-negative probands with rare diseases suspected to be genetic. We generated hg38-aligned variants and phased genome assemblies, and subsequently annotated, filtered, and curated variants using clinical standards. New disease-relevant or potentially relevant genetic findings were identified in 16/96 (16.7%) probands, eight of whom (8/96, 8.3%) harbored pathogenic or likely pathogenic variants. In nine probands (9/96, 9.4%), the newly identified variants were visible in both srGS and lrGS, and the findings resulted from changes to interpretation, mostly driven by recent gene-disease association discoveries. The other seven cases included variants that were only interpretable in lrGS, including copy-number variants, an inversion, a mobile element insertion, two low-complexity repeat expansions, and a 1 bp deletion. While evidence for each of these variants is, in retrospect, visible in srGS, they were either not called within srGS data, represented by calls with incorrect sizes or structures, or failed quality control and filtration. Thus, while reanalysis of older data clearly increases diagnostic yield, we find that lrGS allows for substantial additional yield (7/96, 7.3%) beyond srGS. We anticipate that the additional lrGS-only rare disease yield will grow over time as lrGS analysis improves and as growing lrGS datasets allow for better variant frequency annotation.
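The yield figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the counts are taken from the abstract, while the variable names and the `yield_pct` helper are hypothetical and not part of the authors' analysis.

```python
# Counts as reported in the abstract; all names here are illustrative only.
TOTAL_PROBANDS = 96
reinterpretation_only = 9   # findings visible in both srGS and lrGS
lrgs_only = 7               # findings interpretable only in lrGS
pathogenic_or_likely = 8    # probands with pathogenic/likely pathogenic variants


def yield_pct(count, total=TOTAL_PROBANDS):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# The two categories of new findings together account for all new findings.
new_findings = reinterpretation_only + lrgs_only

for label, count in [
    ("all new findings", new_findings),
    ("pathogenic or likely pathogenic", pathogenic_or_likely),
    ("reinterpretation (srGS and lrGS)", reinterpretation_only),
    ("lrGS-only", lrgs_only),
]:
    print(f"{label}: {count}/{TOTAL_PROBANDS} = {yield_pct(count)}%")
```

Splitting the 16 new findings into the nine reinterpretation cases and the seven lrGS-only cases makes explicit that the 7.3% figure is the portion of yield attributable to long reads rather than to reanalysis.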