

Forecasting risk gene discovery in autism with machine learning and genome-scale data.

Affiliations

University of Iowa, Department of Psychiatry, Iowa City, IA, USA.

University of Iowa, Interdisciplinary Genetics Program, Iowa City, IA, USA.

Publication information

Sci Rep. 2020 Mar 12;10(1):4569. doi: 10.1038/s41598-020-61288-5.

Abstract

Genetics has been one of the most powerful windows into the biology of autism spectrum disorder (ASD). It is estimated that a thousand or more genes may confer risk for ASD when functionally perturbed; however, only around 100 genes currently have sufficient evidence to be considered true "autism risk genes". Massive genetic studies are underway to produce data that will implicate additional genes. This approach, although necessary, is costly and slow-moving, which makes identifying putative ASD risk genes from existing data vital. Here, we approach autism risk gene discovery as a machine learning problem rather than a genetic association problem, using genome-scale data as predictors to identify new genes with properties similar to those of established autism risk genes. Our method, forecASD, integrates brain gene expression, heterogeneous network data, and previous gene-level predictors of autism association into an ensemble classifier that yields a single score indexing the evidence for each gene's involvement in the etiology of autism. We demonstrate that forecASD performs substantially better than previous predictors of autism association in three independent trio-based sequencing studies. Studying forecASD-prioritized genes, we show that forecASD is a robust indicator of a gene's involvement in ASD etiology, with diverse applications to gene discovery, differential expression analysis, eQTL prioritization, and pathway enrichment analysis.
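The general idea of a stacked ensemble that turns several evidence sources into one per-gene score can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say which learners forecASD uses, so the nearest-centroid base learners, the logistic-regression meta-learner, the synthetic data, and every name below (`expression`, `network`, `prior_score`, `ensemble_score`) are assumptions made for illustration only.

```python
import math
import random

random.seed(0)

def centroid(rows):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid_score(train_X, train_y, x):
    """Base learner (assumed): higher when x lies closer to the known-risk-gene centroid."""
    pos = centroid([r for r, t in zip(train_X, train_y) if t == 1])
    neg = centroid([r for r, t in zip(train_X, train_y) if t == 0])
    return sq_dist(x, neg) - sq_dist(x, pos)

def out_of_fold_scores(X, y, k=5):
    """Score each gene with a base learner trained without that gene's fold."""
    scores = [0.0] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        tX, ty = [X[i] for i in train], [y[i] for i in train]
        for i in range(fold, len(X), k):
            scores[i] = centroid_score(tX, ty, X[i])
    return scores

def standardize(col):
    mean = sum(col) / len(col)
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
    return [(v - mean) / sd for v in col]

def sigmoid(z):
    # Numerically stable form: never exponentiates a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(rows, y, lr=0.5, epochs=300):
    """Meta-learner (assumed): logistic regression fit by batch gradient descent."""
    w, b, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for r, t in zip(rows, y):
            err = sigmoid(b + sum(wi * ri for wi, ri in zip(w, r))) - t
            gb += err
            for j, rj in enumerate(r):
                gw[j] += err * rj
        b -= lr * gb / n
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
    return w, b

# Synthetic stand-ins for the real data: 20 "known risk genes" among 200 genes.
n_genes, n_known = 200, 20
labels = [1] * n_known + [0] * (n_genes - n_known)

def fake_view(dim, shift):
    """Random features whose mean is shifted for labelled genes (purely synthetic)."""
    return [[random.gauss(shift * t, 1.0) for _ in range(dim)] for t in labels]

expression = fake_view(4, 1.0)   # stands in for brain gene expression features
network = fake_view(4, 1.0)      # stands in for network-derived features
prior_score = [random.gauss(0.5 * t, 1.0) for t in labels]  # stands in for an earlier predictor

# Stack: out-of-fold base scores plus the prior score feed the meta-learner.
meta_rows = list(zip(
    standardize(out_of_fold_scores(expression, labels)),
    standardize(out_of_fold_scores(network, labels)),
    standardize(prior_score),
))
weights, bias = fit_logistic(meta_rows, labels)
ensemble_score = [sigmoid(bias + sum(w * v for w, v in zip(weights, row)))
                  for row in meta_rows]
```

Each entry of `ensemble_score` is a single value between 0 and 1 per gene, so genes can be ranked by it. The base scores are computed out of fold so that the meta-learner does not simply reward base learners for memorizing their own training labels.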


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3ba7/7067874/e8c82270a383/41598_2020_61288_Fig1_HTML.jpg
