DAWN: a framework to identify autism genes and subnetworks using gene expression and genetics.
Affiliation
Department of Statistics, Carnegie Mellon University, Pittsburgh, PA, USA.
Publication information
Mol Autism. 2014 Mar 6;5(1):22. doi: 10.1186/2040-2392-5-22.
BACKGROUND
De novo loss-of-function (dnLoF) mutations are found twofold more often in autism spectrum disorder (ASD) probands than in their unaffected siblings. Multiple independent dnLoF mutations in the same gene implicate that gene in risk and hence provide a systematic, albeit arduous, path forward for ASD genetics. Incorporating additional non-genetic data is likely to improve the ability to identify ASD genes.
METHODS
To accelerate the search for ASD genes, we developed a novel algorithm, DAWN, to model two kinds of data: rare variations from exome sequencing and gene co-expression in the mid-fetal prefrontal and motor-somatosensory neocortex, a critical nexus for risk. The algorithm casts the ensemble data as a hidden Markov random field in which the graph structure is determined by gene co-expression. It combines these interrelationships with node-specific observations, namely gene identity, expression, genetic data and the estimated effect on risk.
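The hidden Markov random field idea can be illustrated with a minimal sketch. This is not the DAWN implementation: it assumes each gene carries a single z-score summarizing its genetic evidence, that the hidden state is binary (risk gene or not) with an Ising-type prior over the co-expression graph, and that inference is done by iterated conditional modes, a simple stand-in technique. The function name `icm_hmrf` and the parameter values `b`, `c`, `mu` and `sd` are hypothetical and fixed by hand here rather than estimated from data.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def icm_hmrf(z, edges, b=-2.0, c=1.0, mu=2.0, sd=1.0, max_sweeps=50):
    """Label genes 0/1 by iterated conditional modes on a toy hidden MRF.

    z      : per-gene z-scores (assumed node observations)
    edges  : (i, j) index pairs for co-expressed genes (assumed graph)
    b, c   : Ising prior terms; b < 0 makes risk genes rare a priori,
             c > 0 rewards neighbouring genes that share the risk label
    mu, sd : assumed mean and spread of z-scores among risk genes
    """
    n = len(z)
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Node evidence: log-likelihood ratio of "risk" versus "null" (standard normal).
    llr = [normal_logpdf(zi, mu, sd) - normal_logpdf(zi, 0.0, 1.0) for zi in z]

    # Start from the labels each gene would get without any network information.
    state = [1 if llr[i] + b > 0 else 0 for i in range(n)]

    # Update one gene at a time, conditioning on its neighbours' current labels,
    # until a full sweep changes nothing.
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            score = llr[i] + b + c * sum(state[k] for k in neighbors[i])
            new_label = 1 if score > 0 else 0
            if new_label != state[i]:
                state[i] = new_label
                changed = True
        if not changed:
            break
    return state


if __name__ == "__main__":
    # Five hypothetical genes: 0, 1, 2 form a co-expressed triangle; 3 and 4 are a pair.
    z_scores = [3.5, 3.0, 1.5, 0.2, 1.5]
    coexpression_edges = [(0, 1), (0, 2), (1, 2), (3, 4)]
    print(icm_hmrf(z_scores, coexpression_edges))
```

In this toy input, genes 2 and 4 have the same modest z-score of 1.5, yet only gene 2 is labeled as a risk gene, because its two co-expressed neighbours carry strong genetic evidence. Without any edges, neither would be labeled. That borrowing of strength across the graph is the behaviour the model is designed to capture.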
RESULTS
Using currently available genetic data and a specific developmental time period for gene co-expression, DAWN identified 127 genes that plausibly affect risk, and a set of likely ASD subnetworks. Validation experiments using published targeted resequencing results demonstrate its efficacy in reliably predicting ASD genes. DAWN also successfully predicts known ASD genes that were not included in the genetic data used to create the model.
CONCLUSIONS
Validation studies demonstrate that DAWN is effective in predicting ASD genes and subnetworks by leveraging genetic and gene expression data. The findings reported here implicate neurite extension and neuronal arborization in ASD risk. Applying DAWN to emerging ASD sequence data and to gene expression data from other brain regions and tissues would likely identify novel ASD genes. DAWN can also be used to identify genes and subnetworks in other complex disorders.