Bioinformatics Institute, St. Petersburg, Russia.
Department of Genetics and Biotechnology, St. Petersburg State University, St. Petersburg, Russia.
Mol Genet Genomic Med. 2019 Nov;7(11):e964. doi: 10.1002/mgg3.964. Epub 2019 Sep 3.
Allele frequency data from large exome and genome aggregation projects such as the Genome Aggregation Database (gnomAD) are of utmost importance for the interpretation of medical resequencing data. However, allele frequencies may differ significantly in poorly studied populations that are underrepresented in large-scale projects, such as the Russian population.
In this work, we used a large dataset of 694 exome samples to analyze genetic variation in Northwest Russia. We compared the spectrum of genetic variants with dbSNP build 151 and estimated the prevalence of ClinVar-based autosomal recessive (AR) disease alleles relative to gnomAD r2.1.
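The comparison of cohort allele counts against a reference database can be sketched as a one-sided Fisher's exact test on a 2x2 table of allele counts. This is only an illustration: the abstract does not state which test was used, the function name `fisher_enrichment_p` is hypothetical, and all counts below are made up (1388 alleles corresponds to 694 diploid samples).

```python
from math import comb


def fisher_enrichment_p(cohort_alt, cohort_total, ref_alt, ref_total):
    """One-sided Fisher's exact p-value for enrichment of an allele in a cohort.

    cohort_alt / cohort_total: alternate-allele count and total allele count
    in the cohort; ref_alt / ref_total: the same for the reference database.
    Assumption: the two sets of chromosomes are independent samples.
    """
    total_alt = cohort_alt + ref_alt      # alt alleles in both sets combined
    total = cohort_total + ref_total      # all alleles in both sets combined
    denom = comb(total, cohort_total)     # ways to pick the cohort's alleles
    upper = min(total_alt, cohort_total)
    numer = 0
    # Sum hypergeometric probabilities of seeing cohort_alt or more alt alleles.
    for k in range(cohort_alt, upper + 1):
        numer += comb(total_alt, k) * comb(total - total_alt, cohort_total - k)
    return numer / denom


# Hypothetical counts: 12 alt alleles among 1388 cohort alleles
# versus 100 alt alleles among 200000 reference alleles.
p_value = fisher_enrichment_p(12, 1388, 100, 200000)
print(f"one-sided p = {p_value:.3g}")
```

When many variants are tested this way, the resulting p-values would also need a multiple-testing correction before any variant is called significantly overrepresented.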
An estimated 9.3% of the discovered variants were not present in dbSNP. We report statistically significant overrepresentation of pathogenic variants for several Mendelian disorders, including phenylketonuria (PAH, rs5030858), Wilson's disease (ATP7B, rs76151636), factor VII deficiency (F7, rs36209567), the kyphoscoliotic type of Ehlers-Danlos syndrome (FKBP14, rs542489955), and several other recessive disorders. We also make preliminary estimates of monogenic disease incidence in the population, with retinal dystrophy, cystic fibrosis, and phenylketonuria being the most frequent AR disorders.
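One common way to turn allele frequencies into an incidence estimate for an AR disorder is the Hardy-Weinberg approximation: if q is the combined frequency of all pathogenic alleles in a gene, affected individuals (homozygotes and compound heterozygotes) are expected at a frequency of about q². The sketch below assumes random mating and full penetrance; the abstract does not say that this exact procedure was used, the function names are hypothetical, and the frequencies are illustrative only.

```python
def ar_incidence(pathogenic_afs):
    """Expected incidence of an AR disorder under Hardy-Weinberg equilibrium.

    pathogenic_afs: allele frequencies of the distinct pathogenic variants
    in one gene (assumed to sit on different haplotypes).
    """
    q = sum(pathogenic_afs)   # combined pathogenic allele frequency
    return q * q              # homozygotes plus compound heterozygotes


def carrier_frequency(pathogenic_afs):
    """Expected frequency of heterozygous carriers under the same assumptions."""
    q = sum(pathogenic_afs)
    return 2 * q * (1 - q)


# Illustrative frequencies for three pathogenic variants in one gene.
afs = [0.005, 0.003, 0.002]
print(f"incidence ~ 1 in {1 / ar_incidence(afs):.0f}")
print(f"carriers  ~ 1 in {1 / carrier_frequency(afs):.0f}")
```

Because the estimate scales with q², a variant that is only a few times more frequent in one population than in gnomAD can change the expected incidence by an order of magnitude.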
Our observations demonstrate the utility of population-specific allele frequency data for the diagnosis of monogenic disorders using high-throughput technologies.