Chen Xiao, Baker Daniel, Dolzhenko Egor, Devaney Joseph M, Noya Jessica, Berlyoung April S, Brandon Rhonda, Hruska Kathleen S, Lochovsky Lucas, Kruszka Paul, Newman Scott, Farrow Emily, Thiffault Isabelle, Pastinen Tomi, Kasperaviciute Dalia, Gilissen Christian, Vissers Lisenka, Hoischen Alexander, Berger Seth, Vilain Eric, Délot Emmanuèle, Eberle Michael A
PacBio, Menlo Park, CA, USA.
GeneDx, Gaithersburg, MD, USA.
Nat Commun. 2025 Mar 8;16(1):2340. doi: 10.1038/s41467-025-57505-2.
Variant calling is hindered in segmental duplications by sequence homology. We developed Paraphase, a HiFi-based informatics method that resolves highly similar genes by phasing all haplotypes of paralogous genes together. We applied Paraphase to 160 long (>10 kb) segmental duplication regions across the human genome with high (>99%) sequence similarity, encoding 316 genes. Analysis across five ancestral populations revealed highly variable copy numbers of these regions. We identified 23 paralog groups with exceptionally low within-group diversity, where extensive gene conversion and unequal crossing over contribute to highly similar gene copies. Furthermore, our analysis of 36 trios identified 7 de novo SNVs and 4 de novo gene conversion events, 2 of which are non-allelic. Finally, we summarized extensive genetic diversity in 9 medically relevant genes previously considered challenging to genotype. Paraphase provides a framework for resolving gene paralogs, enabling accurate testing in medically relevant genes and population-wide studies of previously inaccessible genes.
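The central idea of phasing all haplotypes of a paralog group together can be illustrated with a toy sketch. It assumes that reads from every copy have already been realigned to a single reference, and that each read has been reduced to the bases it carries at sites that differ between copies. This is a hypothetical, greedy illustration of joint haplotype phasing, not Paraphase's actual algorithm. The read representation, the site positions, and the `phase_reads` helper are all invented for this example, and real data would also require handling sequencing errors and reads that cannot be placed.

```python
from typing import Dict, List

# Hypothetical representation: after reads from all copies of a paralog group
# are realigned to one reference, each read becomes {site position: base}.
Read = Dict[int, str]


def compatible(hap: Read, read: Read) -> bool:
    """True if the read shares at least one site with the haplotype and agrees at all of them."""
    shared = hap.keys() & read.keys()
    return bool(shared) and all(hap[p] == read[p] for p in shared)


def phase_reads(reads: List[Read]) -> List[Read]:
    """Toy greedy phasing: grow one haplotype per gene copy, left to right."""
    haplotypes: List[Read] = []
    for read in sorted((r for r in reads if r), key=min):  # order by first site
        matches = [h for h in haplotypes if compatible(h, read)]
        if not matches:
            # Conflicts with (or does not overlap) every haplotype: start a new one.
            haplotypes.append(dict(read))
        elif len(matches) == 1:
            matches[0].update(read)  # extend the unique compatible haplotype
        # else: the read fits several haplotypes and is left unassigned (ambiguous)
    return haplotypes


# Hypothetical reads from three highly similar gene copies.
reads = [
    {100: "C", 200: "G"}, {200: "G", 300: "A"},  # gene copy 1
    {100: "T", 200: "C"}, {200: "C", 300: "A"},  # gene copy 2
    {100: "C", 200: "A"}, {200: "A", 300: "G"},  # gene copy 3
]
for hap in phase_reads(reads):
    print(dict(sorted(hap.items())))
# {100: 'C', 200: 'G', 300: 'A'}
# {100: 'T', 200: 'C', 300: 'A'}
# {100: 'C', 200: 'A', 300: 'G'}
```

Because no copy is assigned in advance, the number of haplotypes recovered (three here) reflects the copy number in the sample. Alleles shared between copies, such as `A` at site 300 in copies 1 and 2, do not merge the copies as long as some other site distinguishes them.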