BCB Group, DML, Department of Computer Engineering, Sharif University of Technology, Tehran, Iran.
UNSW Biomedical Machine Learning Lab (BML), the Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, Australia.
PLoS Comput Biol. 2023 Jul 24;19(7):e1011249. doi: 10.1371/journal.pcbi.1011249. eCollection 2023 Jul.
The genetic etiology of brain disorders is highly heterogeneous. These disorders are characterized by abnormalities in the development of the central nervous system that lead to diminished physical or intellectual capabilities. The process of determining which genes drive a disease, known as "gene prioritization," is not yet fully understood. Genome-wide searches for gene-disease associations remain underdeveloped because they rely on previous discoveries and on evidence sources that contain false-positive or false-negative relations. This paper introduces DeepGenePrior, a deep neural network model that prioritizes candidate genes in genetic diseases. Using the well-studied variational autoencoder (VAE), we developed a score that measures the impact of genes on target diseases. Other methods select candidate genes from prior data, relying on the "guilt by association" principle and on auxiliary data sources such as protein networks. In contrast, our study uses only copy number variants (CNVs) for gene prioritization. By analyzing CNVs from 74,811 individuals with autism, schizophrenia, and developmental delay, we identified the genes that best distinguish cases from controls. Our findings show a 12% increase in fold enrichment for brain-expressed genes compared with previous studies and a 15% increase in genes associated with mouse nervous system phenotypes. Furthermore, we identified common deletions in ZDHHC8, DGCR5, and CATG00000022283 among the top genes related to all three disorders, suggesting a common etiology among these clinically distinct conditions. DeepGenePrior is publicly available at http://git.dml.ir/z_rahaie/DGP to help overcome the obstacles that existing gene prioritization studies face in identifying candidate genes.
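The abstract reports its evaluation as a change in fold enrichment. The sketch below illustrates the standard definition, under stated assumptions: fold enrichment is the fraction of prioritized genes that fall in a reference set (e.g. brain-expressed genes), divided by that set's fraction in a background gene universe. The abstract does not give the authors' gene sets, background, or statistical test. It also does not say whether "12% increase" is relative, so this sketch assumes a relative change. The function names and all numbers are hypothetical and for illustration only.

```python
def fold_enrichment(hits_in_top: int, top_size: int,
                    hits_in_background: int, background_size: int) -> float:
    """Observed hit rate among prioritized genes divided by the background hit rate.

    Hypothetical helper: e.g. hits = genes that are brain-expressed.
    """
    if top_size <= 0 or background_size <= 0 or hits_in_background <= 0:
        raise ValueError("sizes and background hits must be positive")
    observed = hits_in_top / top_size                    # rate among top-ranked genes
    expected = hits_in_background / background_size     # rate expected by chance
    return observed / expected


def relative_increase(new: float, old: float) -> float:
    """Relative change between two scores (assumption: 0.12 reads as a 12% increase)."""
    return (new - old) / old


if __name__ == "__main__":
    # Hypothetical numbers: 30 of 40 top genes are brain-expressed,
    # versus 5,000 of 20,000 genes in the background universe.
    fe = fold_enrichment(30, 40, 5000, 20000)
    print(fe)  # 3.0
    print(relative_increase(3.36, fe))  # approximately 0.12
```

Expressing enrichment as a ratio of rates, rather than as a raw count of hits, makes gene lists of different lengths comparable. This matters when comparing a new ranking against rankings from previous studies.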