Identification of biological mechanisms underlying a multidimensional ASD phenotype using machine learning.

Affiliations

Instituto Nacional de Saúde Doutor Ricardo Jorge, Avenida Padre Cruz, 1649-016, Lisboa, Portugal.

Faculdade de Ciências, BioISI - Biosystems & Integrative Sciences Institute, Universidade de Lisboa, Lisboa, Portugal.

Publication information

Transl Psychiatry. 2020 Jan 28;10(1):43. doi: 10.1038/s41398-020-0721-1.

Abstract

The complex genetic architecture of Autism Spectrum Disorder (ASD) and its heterogeneous phenotype make molecular diagnosis and patient prognosis challenging tasks. To establish more precise genotype-phenotype correlations in ASD, we developed a novel machine-learning integrative approach, which seeks to delineate associations between patients' clinical profiles and disrupted biological processes, inferred from their copy number variants (CNVs) that span brain genes. Clustering analysis of the relevant clinical measures from 2446 ASD cases in the Autism Genome Project identified two distinct phenotypic subgroups. Patients in these clusters differed significantly in ADOS-defined severity, adaptive behavior profiles, intellectual ability, and verbal status, the latter contributing the most to cluster stability and cohesion. Functional enrichment analysis of brain genes disrupted by CNVs in these ASD cases identified 15 statistically significant biological processes, including cell adhesion, neural development, cognition, and polyubiquitination, in line with previous ASD findings. A Naive Bayes classifier, generated to predict the ASD phenotypic clusters from disrupted biological processes, achieved predictions with high precision (0.82) but low recall (0.39) for a subset of patients with higher biological Information Content scores. This study shows that milder and more severe clinical presentations can have distinct underlying biological mechanisms. It further highlights how machine-learning approaches can reduce clinical heterogeneity by using multidimensional clinical measures, and establishes genotype-phenotype correlations in ASD. However, predictions are strongly dependent on patients' information content. The findings are therefore a first step toward the translation of genetic information into clinically useful applications, and emphasize the need for larger datasets with very complete clinical and biological information.
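To make the reported combination of high precision and low recall concrete, the following minimal Python sketch computes both metrics for a classifier that commits to a prediction for only some patients. The labels, the predictions, the `precision_recall` helper, and the choice to count an abstention as a miss are all illustrative assumptions; they are not the study's data, code, or evaluation protocol.

```python
from typing import Optional, Sequence, Tuple


def precision_recall(
    y_true: Sequence[int],
    y_pred: Sequence[Optional[int]],
    positive: int = 1,
) -> Tuple[float, float]:
    """Precision and recall for one class; None in y_pred means 'no prediction'."""
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    # False positives: predicted positive but actually in the other class.
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    # False negatives: actual positives that were not predicted positive.
    # Assumption: an abstention (None) counts as a miss here.
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy labels: 1 = one phenotypic cluster, 0 = the other.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical predictions: the classifier commits for only three patients.
y_pred = [1, 1, None, None, None, 1, None, None, None, None]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy setting most of the committed predictions are correct, so precision is fairly high, while most true members of the cluster receive no prediction, so recall stays low. This is one way predictions restricted to a well-characterised subset of patients could produce such a pattern.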
