Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, 37212, USA.
Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37212, USA.
BMC Bioinformatics. 2022 Apr 22;23(1):146. doi: 10.1186/s12859-022-04616-y.
Autism spectrum disorder (ASD) is a group of complex neurodevelopmental disorders with a strong genetic basis. Large-scale sequencing studies have identified over one hundred ASD risk genes. Nevertheless, the vast majority of ASD risk genes remain to be discovered, as more than 1000 genes are estimated to be involved in ASD risk. Prioritizing risk genes is an effective strategy for increasing the power to identify novel risk genes in genetic studies of ASD. Because ASD risk genes are likely to exhibit distinctive properties from multiple angles, we reason that integrating multiple levels of genomic data is a powerful approach to pinpointing genuine ASD risk genes.
We present BNScore, a Bayesian model selection framework that probabilistically prioritizes ASD risk genes by explicitly integrating evidence from sequencing-identified ASD genes, biological annotations, and a gene functional network. We demonstrate the validity of our approach and its improved performance over existing methods by examining the resulting top candidate ASD risk genes against sets of high-confidence benchmark genes and large-scale ASD genome-wide association studies. We assess the tissue-, cell type-, and developmental stage-specific expression properties of the top prioritized genes and find strong expression specificity in brain tissues, striatal medium spiny neurons, and fetal developmental stages.
In summary, we show that by integrating sequencing findings, functional annotation profiles, and a gene-gene functional network, BNScore performs competitively with current state-of-the-art methods in prioritizing ASD genes. Our method offers a general and flexible strategy for risk gene prioritization that can potentially be applied to other complex traits as well.