Mahmoud Medhat, Harting John, Corbitt Holly, Chen Xiao, Jhangiani Shalini N, Doddapaneni Harsha, Meng Qingchang, Han Tina, Lambert Christine, Zhang Siyuan, Baybayan Primo, Henno Geoff, Shen Hua, Hu Jianhong, Han Yi, Riegler Casey, Metcalf Ginger, Chinn Ivan K, Eberle Michael A, Kingan Sarah, Farinholt Tim, Carvalho Claudia M B, Gibbs Richard A, Kronenberg Zev, Muzny Donna, Sedlazeck Fritz J
Baylor College of Medicine, Human Genome Sequencing Center, Houston, Texas, USA.
Pacific Biosciences, Menlo Park, California, USA.
medRxiv. 2024 Mar 18:2024.03.14.24304179. doi: 10.1101/2024.03.14.24304179.
Understanding the mechanisms behind human diseases with an established heritable component is at the forefront of personalized medicine. Nevertheless, numerous medically important genes are inaccurately represented in short-read sequencing analyses because they are complex and repetitive or lie within the so-called 'dark regions' of the human genome. PacBio long-read sequencing has provided new insights, yet the cost of HiFi whole-genome sequencing (WGS) frequently remains prohibitive. We introduce a targeted sequencing and analysis framework, the Twist Alliance Dark Genes Panel (TADGP), designed to offer phased variants across 389 medically important yet complex autosomal genes. We evaluate TADGP accuracy across eleven control samples and compare it to WGS. This comparison demonstrates that TADGP achieves variant calling accuracy comparable to HiFi-WGS data at a fraction of the cost, enabling scalability and broad applicability for studying rare diseases or for complementing previously sequenced samples to gain insights into these complex genes. When tested on samples from rare disease and cardiovascular disease cohorts, TADGP revealed several candidate variants across all cases and provided insight into diversity. In both cohorts, we identified novel variants affecting individual disease-associated genes (e.g., ). Nevertheless, annotating variants across these 389 medically important genes remains challenging because the genes are underrepresented in ClinVar and gnomAD. Consequently, we also offer an annotation resource to enhance the evaluation and prioritization of these variants. Overall, we demonstrate that TADGP offers a cost-efficient and scalable approach to routinely assess clinically relevant dark regions of the human genome.