Bozkurt-Yozgatli Tugce, Lun Ming Yin, Bengtsson Jesse D, Sezerman Ugur, Chinn Ivan K, Coban-Akdemir Zeynep, Carvalho Claudia M B
medRxiv. 2024 Oct 29:2024.10.28.24315942. doi: 10.1101/2024.10.28.24315942.
Inversions are known contributors to the pathogenesis of genetic diseases. They are among the most demanding structural variants (SVs) to detect and interpret. Recent advances in sequencing technologies and the development of publicly available SV datasets have substantially enhanced our ability to explore inversions, but these datasets have not yet been cross-compared. In this study, we report a proband with familial hemophagocytic lymphohistiocytosis type 3 carrying c.1389+1G>A together with NC_000017.11:75576992_75829587inv, a gene-disrupting inversion present in 0.006345% of individuals in gnomAD (v4.0). Prompted by this finding, we investigated the features of potentially pathogenic inversions in public datasets. In gnomAD, 98.9% of inversions are rare, and they disrupt 5% of the protein-coding genes associated with a phenotype in OMIM. A comparative analysis of gnomAD, DGV, 1KGP, and two recent studies from the Human Genome Structural Variation Consortium revealed shared and dataset-specific inversion characteristics, suggesting methodological detection biases. Next, we investigated the genetic features of inversions that disrupt protein-coding genes by classifying the intersections between inversions and genes into three categories. We found that, regardless of category, most of the OMIM protein-coding genes disrupted by inversions are associated with autosomal recessive phenotypes, supporting the hypothesis that inversions in trans with other variants are hidden causes of monogenic diseases. This effort aims to fill the gap in our understanding of the molecular characteristics of inversions with low population frequency and to highlight the importance of identifying them in rare disease studies.
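The abstract does not define the three intersection categories, so the following is only a minimal sketch of how an inversion-gene intersection could be classified, assuming the categories are distinguished by how many inversion breakpoints fall inside the gene. The category names, the `Interval` type, the `classify_intersection` function, and all gene coordinates are hypothetical illustrations and are not taken from the study; only the inversion coordinates come from the text.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    """A genomic interval with 1-based, inclusive coordinates."""
    chrom: str
    start: int
    end: int


def classify_intersection(inv: Interval, gene: Interval) -> Optional[str]:
    """Classify how an inversion intersects a gene.

    Hypothetical categories (an assumption, not the paper's definitions):
      - "inversion_within_gene": both breakpoints lie inside the gene
      - "one_breakpoint_in_gene": exactly one breakpoint lies inside the gene
      - "gene_within_inversion": the gene lies entirely between the breakpoints
    Returns None when the two intervals do not overlap.
    """
    # No overlap: different chromosomes or disjoint coordinate ranges.
    if inv.chrom != gene.chrom or inv.end < gene.start or gene.end < inv.start:
        return None

    # Count inversion breakpoints that fall inside the gene body.
    inside = sum(gene.start <= bp <= gene.end for bp in (inv.start, inv.end))
    if inside == 2:
        return "inversion_within_gene"
    if inside == 1:
        return "one_breakpoint_in_gene"
    # Overlapping with no breakpoint inside the gene means the gene is spanned.
    return "gene_within_inversion"


# Inversion coordinates from the abstract (NC_000017.11 is chromosome 17).
inversion = Interval("chr17", 75576992, 75829587)

# Made-up gene intervals for illustration only; not real annotations.
example_genes = {
    "gene_a": Interval("chr17", 75829000, 75840000),
    "gene_b": Interval("chr17", 75600000, 75700000),
    "gene_c": Interval("chr17", 75500000, 75900000),
    "gene_d": Interval("chr17", 76000000, 76100000),
}

results = {
    name: classify_intersection(inversion, gene)
    for name, gene in example_genes.items()
}
```

Counting breakpoints inside the gene keeps the three cases mutually exclusive; a real analysis would take gene coordinates from an annotation source and might apply different category definitions.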