Department of Psychiatry, Mount Sinai School of Medicine, One Gustave L Levy Place, Box 1668, New York, NY 10029, USA.
Am J Med Genet C Semin Med Genet. 2012 May 15;160C(2):130-42. doi: 10.1002/ajmg.c.31330. Epub 2012 Apr 12.
Autism spectrum disorders (ASD) are a group of related neurodevelopmental disorders with significant combined prevalence (∼1%) and high heritability. Dozens of individually rare genes and loci associated with high risk for ASD have been identified, and these overlap extensively with genes for intellectual disability (ID). However, studies indicate that there may be hundreds of genes that remain to be identified. Inexpensive massively parallel nucleotide sequencing makes it possible to reveal the genetic underpinnings of heritable complex diseases, including ASD and ID. However, whole exome sequencing (WES) and whole genome sequencing (WGS) provide an embarrassment of riches, in which many candidate variants emerge. It has been argued that genetic variation for ASD and ID will cluster in genes involved in distinct pathways and protein complexes. For this reason, computational methods that prioritize candidate genes based on additional functional information, such as protein-protein interactions, association with specific canonical or empirical pathways, or other attributes, can be useful. In this study we applied several supervised learning approaches to prioritize ASD or ID disease gene candidates based on curated lists of known ASD and ID disease genes. We implemented two network-based classifiers and one attribute-based classifier to show that we can rank and classify known genes, and predict new genes, for these neurodevelopmental disorders. We also show that ID and ASD share common pathways that perturb an overlapping synaptic regulatory subnetwork. In addition, we show that features relating to neuronal phenotypes in mouse knockouts can help in classifying neurodevelopmental genes. Our methods can be applied broadly to other diseases, helping to prioritize newly identified genetic variation that emerges from disease gene discovery based on WES and WGS.
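The general idea behind network-based prioritization can be illustrated with a minimal sketch. The abstract does not specify the classifiers used, so the following is not the study's method: it is a simple guilt-by-association score, assuming an undirected protein-protein interaction network stored as an adjacency mapping and a set of known disease genes used as seeds. The function name `neighbor_scores`, the toy network, and the gene identifiers `G1` to `G6` are all hypothetical placeholders, not real data.

```python
from typing import Dict, List, Set, Tuple


def neighbor_scores(
    network: Dict[str, Set[str]], seeds: Set[str]
) -> List[Tuple[str, float]]:
    """Rank non-seed genes by the fraction of their interaction
    partners that are known disease genes (seeds)."""
    scores = []
    for gene, partners in network.items():
        # Skip the seeds themselves and genes with no interaction partners.
        if gene in seeds or not partners:
            continue
        scores.append((gene, len(partners & seeds) / len(partners)))
    # Highest score first; ties are broken alphabetically for reproducibility.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Toy undirected network with placeholder identifiers (hypothetical, not real data).
toy_network = {
    "G1": {"G2", "G3", "G4"},
    "G2": {"G1", "G3"},
    "G3": {"G1", "G2", "G5"},
    "G4": {"G1", "G5"},
    "G5": {"G3", "G4", "G6"},
    "G6": {"G5"},
}
known_disease_genes = {"G1", "G2"}  # hypothetical curated seed list

ranking = neighbor_scores(toy_network, known_disease_genes)
for gene, score in ranking:
    print(f"{gene}\t{score:.2f}")
```

In this toy example, the candidate with the most seed neighbors relative to its degree is ranked first. A supervised classifier of the kind the abstract describes would presumably be trained and evaluated against the curated gene lists, for instance by cross-validation, rather than relying on a single fixed score like this one.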