Identifying Subgroups of Patients With Autism by Gene Expression Profiles Using Machine Learning Algorithms.

Author information

Lin Ping-I, Moni Mohammad Ali, Gau Susan Shur-Fen, Eapen Valsamma

Affiliations

School of Psychiatry, The University of New South Wales, Sydney, NSW, Australia.

South Western Sydney Local Health District, Liverpool, NSW, Australia.

Publication information

Front Psychiatry. 2021 May 12;12:637022. doi: 10.3389/fpsyt.2021.637022. eCollection 2021.

Abstract

The identification of subgroups of autism spectrum disorder (ASD) may partially remedy the problem of clinical heterogeneity and thereby improve clinical management. The current study aims to use machine learning algorithms to analyze microarray data to identify clusters with relatively homogeneous clinical features. Whole-genome gene expression microarray data were used to predict communication quotient (SCQ) scores against all probes to select differential expression regions (DERs). Gene set enrichment analysis was performed for DERs with a fold-change >2 to identify hub pathways that play a role in the severity of social communication deficits inherent to ASD. We then used two machine learning methods, random forest classification (RF) and support vector machine (SVM), to identify two clusters using DERs. Finally, we evaluated how accurately the clusters predicted language impairment. A total of 191 DERs were initially identified, and 54 of them with a fold-change >2 were selected for the pathway analysis. Cholesterol biosynthesis and metabolism pathways appear to act as hubs that connect other trait-associated pathways to influence the severity of social communication deficits inherent to ASD. Both RF and SVM algorithms yielded a classification accuracy >90% when all 191 DERs were analyzed. The ASD subtypes defined by the presence of language impairment, a strong indicator for prognosis, can be predicted by transcriptomic profiles associated with social communication deficits and cholesterol biosynthesis and metabolism. The results suggest that both RF and SVM are acceptable options for machine learning algorithms to identify ASD subgroups characterized by clinical homogeneity related to prognosis.
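The fold-change >2 filter mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how fold change was computed, so this example assumes it is the ratio of mean expression between two groups of samples on a linear scale, taken symmetrically so that both up- and down-regulation count. The function names, probe names, and toy values are all hypothetical and are not taken from the study.

```python
from statistics import mean


def fold_change(group_a, group_b):
    """Symmetric fold change between the mean expression of two groups.

    Assumption: values are on a linear (not log) scale and strictly positive.
    A ratio below 1 is inverted so down-regulation is treated like up-regulation.
    """
    mean_a, mean_b = mean(group_a), mean(group_b)
    if mean_a <= 0 or mean_b <= 0:
        raise ValueError("expression means must be positive")
    ratio = mean_a / mean_b
    return ratio if ratio >= 1 else 1 / ratio


def select_by_fold_change(expression, threshold=2.0):
    """Return the probes whose fold change strictly exceeds the threshold."""
    return [
        probe
        for probe, (group_a, group_b) in expression.items()
        if fold_change(group_a, group_b) > threshold
    ]


# Hypothetical toy data: probe -> (values in group A, values in group B).
toy_expression = {
    "probe_1": ([8.0, 10.0, 12.0], [4.0, 5.0, 3.0]),  # means 10 vs 4
    "probe_2": ([5.0, 6.0, 7.0], [5.5, 6.5, 6.0]),    # means 6 vs 6
    "probe_3": ([1.0, 2.0, 3.0], [6.0, 7.0, 8.0]),    # means 2 vs 7
    "probe_4": ([4.0, 4.0, 4.0], [2.0, 2.0, 2.0]),    # exactly 2-fold, excluded
}

selected = select_by_fold_change(toy_expression)
print(selected)
```

In this toy case only `probe_1` and `probe_3` pass the filter; `probe_4` sits exactly at the threshold and is excluded because the comparison is strict. In the study, the analogous step reduced 191 DERs to 54 for pathway analysis, while the RF and SVM models were run on all 191.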

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/371f/8149626/9e29a55f7aad/fpsyt-12-637022-g0001.jpg
