Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.
PLoS Genet. 2013 May;9(5):e1003484. doi: 10.1371/journal.pgen.1003484. Epub 2013 May 9.
Understanding the core set of genes that are necessary for basic developmental functions is one of the central goals in biology. Studies in model organisms have identified a significant fraction of essential genes through the analysis of null mutations that lead to lethality. Recent large-scale next-generation sequencing efforts have provided unprecedented data on genetic variation in humans. However, the evolutionary and genomic characteristics of human essential genes have never been directly studied on a genome-wide scale. Here we use detailed phenotypic resources available for the mouse and deep genomic sequencing data from human populations to characterize patterns of genetic variation and mutational burden in a set of 2,472 human orthologs of known essential genes in the mouse. Consistent with the action of strong purifying selection, these genes exhibit comparatively reduced levels of sequence variation, show a skew in allele frequency toward rarer alleles, and are more conserved across the primate and rodent lineages than the remainder of genes in the genome. In individual genomes we observed ~12 rare mutations within essential genes that are predicted to be damaging. Consistent with the hypothesis that mutations in essential genes are risk factors for neurodevelopmental disease, we show that de novo variants in patients with Autism Spectrum Disorder are more likely to occur in this collection of genes. While incomplete, our set of human orthologs shows characteristics fully consistent with essential function in humans and thus provides a resource to inform and facilitate the interpretation of sequence data in studies of human disease.