Beaumont Robin N, Hawkes Gareth, Gunning Adam C, Wright Caroline F
Department of Clinical and Biomedical Sciences, Faculty of Health and Life Sciences, University of Exeter, Exeter, EX1 2LU, UK.
Exeter Genomics Laboratory, Royal Devon University Healthcare NHS Foundation Trust, Exeter, EX2 5DW, UK.
Genome Med. 2024 Apr 26;16(1):64. doi: 10.1186/s13073-024-01333-4.
Genetic variants that severely alter protein products (e.g. nonsense, frameshift) are often associated with disease. For some genes, these predicted loss-of-function variants (pLoFs) are observed throughout the gene, whilst in others, they occur only at specific locations. We hypothesised that, for genes linked with monogenic diseases that display incomplete penetrance, pLoF variants present in apparently unaffected individuals may be limited to regions where pLoFs are tolerated. To test this, we investigated whether pLoF location could explain instances of incomplete penetrance of variants expected to be pathogenic for Mendelian conditions.
We used exome sequence data from 454,773 individuals in the UK Biobank (UKB) to investigate the locations of pLoFs in a population cohort. We counted the unique pLoF, missense, and synonymous variants in UKB in each quintile of the coding sequence (CDS) of all protein-coding genes and clustered the variants using Gaussian mixture models. We limited the analyses to genes with ≥ 5 variants of each type (16,473 genes). We compared the locations of pLoFs in UKB with all theoretically possible pLoFs in a transcript and with pathogenic pLoFs from ClinVar, and performed simulations to estimate the false-positive rate of non-uniformly distributed variants.
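As a rough illustration of the quintile-counting step and of a simulation against uniformly placed variants, the sketch below bins variant positions into five equal parts of a CDS and estimates how often uniform placement would give an equally uneven distribution. It is not the authors' pipeline: it replaces the Gaussian mixture model clustering with a simple per-gene chi-square statistic, and the function names, the 1-based positions, and the choice to sample simulated positions with replacement are all assumptions made for illustration.

```python
import random


def quintile_counts(positions, cds_length, n_bins=5):
    """Count variants in each equal-width bin of the CDS.

    Assumes 1-based positions within a CDS of length cds_length.
    """
    counts = [0] * n_bins
    for pos in positions:
        if not 1 <= pos <= cds_length:
            raise ValueError(f"position {pos} lies outside the CDS")
        # Integer arithmetic maps positions 1..cds_length onto bins 0..n_bins-1.
        idx = min((pos - 1) * n_bins // cds_length, n_bins - 1)
        counts[idx] += 1
    return counts


def chi_square_uniform(counts):
    """Chi-square statistic of the counts against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def simulated_p_value(counts, cds_length, n_sim=2000, seed=0):
    """Empirical p-value for non-uniformity.

    Simplification (an assumption of this sketch): simulated variants are
    drawn uniformly with replacement, so duplicate positions are possible.
    """
    rng = random.Random(seed)
    observed = chi_square_uniform(counts)
    n_variants = sum(counts)
    hits = 0
    for _ in range(n_sim):
        sim_positions = [rng.randint(1, cds_length) for _ in range(n_variants)]
        sim_counts = quintile_counts(sim_positions, cds_length, len(counts))
        if chi_square_uniform(sim_counts) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Hypothetical gene: 20 pLoF positions, all within the first fifth of a 1,000-bp CDS.
    example = quintile_counts(list(range(1, 200, 10)), cds_length=1000)
    print(example, simulated_p_value(example, cds_length=1000, n_sim=500))
```

A statistic of this kind flags a single gene as non-uniform, whereas clustering across all genes, as described above, groups genes by the shape of their quintile profile; the two approaches answer related but different questions.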
For most genes, all variant classes fell into clusters representing broadly uniform variant distributions, but genes in which haploinsufficiency causes developmental disorders were less likely to have uniform pLoF distribution than other genes (P < 2.2 × 10⁻¹⁶). We identified a number of genes, including ARID1B and GATA6, where pLoF variants in the first quarter of the CDS were rescued by the presence of an alternative translation start site and should not be reported as pathogenic. For other genes, such as ODC1, pLoFs were located approximately uniformly across the gene, but pathogenic pLoFs were clustered only at the end, consistent with a gain-of-function disease mechanism.
Our results suggest that localised constraint metrics could be beneficial and that the location of pLoF variants within a gene should be considered when interpreting them.