Forecasting risk gene discovery in autism with machine learning and genome-scale data.

Affiliations

University of Iowa, Department of Psychiatry, Iowa City, IA, USA.

University of Iowa, Interdisciplinary Genetics Program, Iowa City, IA, USA.

Publication information

Sci Rep. 2020 Mar 12;10(1):4569. doi: 10.1038/s41598-020-61288-5.

DOI:10.1038/s41598-020-61288-5

PMID:32165711

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7067874/

Abstract

Genetics has been one of the most powerful windows into the biology of autism spectrum disorder (ASD). It is estimated that a thousand or more genes may confer risk for ASD when functionally perturbed; however, only around 100 genes currently have sufficient evidence to be considered true "autism risk genes". Massive genetic studies are currently underway, producing data to implicate additional genes. This approach, although necessary, is costly and slow-moving, making identification of putative ASD risk genes with existing data vital. Here, we approach autism risk gene discovery as a machine learning problem, rather than a genetic association problem, by using genome-scale data as predictors to identify new genes with similar properties to established autism risk genes. Our method, forecASD, integrates brain gene expression, heterogeneous network data, and previous gene-level predictors of autism association into an ensemble classifier that yields a single score indexing the evidence for each gene's involvement in the etiology of autism. We demonstrate that forecASD has substantially better performance than previous predictors of autism association in three independent trio-based sequencing studies. Studying forecASD-prioritized genes, we show that forecASD is a robust indicator of a gene's involvement in ASD etiology, with diverse applications to gene discovery, differential expression analysis, eQTL prioritization, and pathway enrichment analysis.
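The general idea of combining several gene-level evidence sources into one score can be illustrated with a small stacking sketch. This is not the forecASD implementation: the abstract does not specify the base learners or the combiner, so the example below assumes three hypothetical base scores per gene (standing in for expression-based, network-based, and previously published predictors), uses entirely synthetic data, and swaps in a plain logistic-regression meta-learner fitted by gradient descent. All names (`noisy_score`, `fit_meta_learner`, `features`, `labels`) are illustrative assumptions.

```python
import math
import random

random.seed(0)  # fixed seed so the synthetic data are reproducible


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def noisy_score(center):
    """Hypothetical base-predictor score: Gaussian noise around `center`, clipped to [0, 1]."""
    return min(1.0, max(0.0, random.gauss(center, 0.15)))


# Synthetic toy data (not real genes): 40 "known risk" genes, 160 background genes.
labels = [1] * 40 + [0] * 160
# Three assumed base scores per gene, e.g. expression-, network-, and prior-predictor-based.
features = [[noisy_score(0.7 if y else 0.3) for _ in range(3)] for y in labels]


def fit_meta_learner(X, y, lr=0.5, epochs=500):
    """Fit a logistic-regression combiner by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


weights, bias = fit_meta_learner(features, labels)

# One combined score per gene, in [0, 1]; higher means more similar to the labeled risk genes.
scores = [sigmoid(sum(w * x for w, x in zip(weights, row)) + bias) for row in features]

mean_pos = sum(s for s, y in zip(scores, labels) if y) / labels.count(1)
mean_neg = sum(s for s, y in zip(scores, labels) if not y) / labels.count(0)
print(f"mean score, labeled risk genes: {mean_pos:.3f}")
print(f"mean score, background genes:   {mean_neg:.3f}")
```

In a real analysis the combined score would be evaluated on genes held out from training, for example against independent sequencing studies as the abstract describes, rather than on the training set as in this toy example.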
