Department of Human Genetics, University of Utah, Salt Lake City, UT, USA.
USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA.
Nat Genet. 2019 Jan;51(1):88-95. doi: 10.1038/s41588-018-0294-6. Epub 2018 Dec 10.
Deep catalogs of genetic variation from thousands of humans enable the detection of intraspecies constraint by identifying coding regions with a scarcity of variation. While existing techniques summarize constraint for entire genes, single gene-wide metrics conceal regional constraint variability within each gene. Therefore, we have created a detailed map of constrained coding regions (CCRs) by leveraging variation observed among 123,136 humans from the Genome Aggregation Database. The most constrained CCRs are enriched for pathogenic variants in ClinVar and mutations underlying developmental disorders. CCRs highlight protein domain families under high constraint and suggest unannotated or incomplete protein domains. The highest-percentile CCRs complement existing variant prioritization methods when evaluating de novo mutations in studies of autosomal dominant disease. Finally, we identify highly constrained CCRs within genes lacking known disease associations. This observation suggests that CCRs may identify regions under strong purifying selection that, when mutated, cause severe developmental phenotypes or embryonic lethality.
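The core idea of "coding regions with a scarcity of variation" can be illustrated with a toy sketch. This is not the authors' model: it assumes, purely for illustration, that a candidate region is any variant-free stretch between consecutive observed variants in a single coding interval, and that regions are ranked by length alone. The function names `variant_free_regions` and `length_percentiles` are hypothetical, and the abstract does not say how the published CCR percentiles are actually computed.

```python
from bisect import bisect_left


def variant_free_regions(start, end, variant_positions):
    """Split the half-open coding interval [start, end) at each observed
    variant and return the variant-free stretches as (start, end) tuples.

    Assumption (not from the source): every observed variant breaks a region.
    """
    regions = []
    cursor = start
    for pos in sorted(set(variant_positions)):
        if not (start <= pos < end):
            continue  # ignore variants that fall outside the interval
        if pos > cursor:
            regions.append((cursor, pos))
        cursor = pos + 1  # resume just past the variant
    if cursor < end:
        regions.append((cursor, end))
    return regions


def length_percentiles(regions):
    """Map each region to the percentage of regions that are strictly shorter.

    Assumption (not from the source): length is the only ranking criterion.
    """
    lengths = sorted(e - s for s, e in regions)
    n = len(lengths)
    return {r: 100.0 * bisect_left(lengths, r[1] - r[0]) / n for r in regions}


if __name__ == "__main__":
    # Hypothetical coordinates and variant positions, for illustration only.
    regions = variant_free_regions(100, 200, [110, 111, 150])
    for region, pct in length_percentiles(regions).items():
        print(region, round(pct, 1))
```

In this toy ranking, the longest variant-free stretch receives the highest percentile, mirroring the intuition that a long run of coding sequence with no observed variation across many individuals is a candidate for strong constraint.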