Department of Industrial Engineering, University of Rome Tor Vergata, Via del Politecnico 1, 00133 Rome, Italy.
IMME Research Centre, Via San Francesco d'Assisi 20, 81100 Caserta, Italy.
Genes (Basel). 2023 Jan 25;14(2):313. doi: 10.3390/genes14020313.
Autism spectrum disorder (ASD) is a heterogeneous condition, characterized by complex genetic architectures and intertwined genetic/environmental interactions. New analytical approaches that can process large amounts of data are needed to disentangle its pathophysiology. We present an advanced machine learning technique, based on clustering analysis of genotypic/phenotypic embedding spaces, to identify biological processes that might act as pathophysiological substrates for ASD. This technique was applied to the VariCarta database, which contained 187,794 variant events retrieved from 15,189 individuals with ASD. Nine clusters of ASD-related genes were identified. The three largest clusters together included 68.6% of all individuals, comprising 1455 (38.0%), 841 (21.9%), and 336 (8.7%) persons, respectively. Enrichment analysis was applied to isolate clinically relevant ASD-associated biological processes. Two of the identified clusters were characterized by individuals with an increased presence of variants linked to biological processes and cellular components such as axon growth and guidance, synaptic membrane components, and synaptic transmission. The study also pointed to other clusters with possible genotype-phenotype associations. Innovative methodologies, including machine learning, can improve our understanding of the biological processes and gene variant networks that underlie the etiology and pathogenic mechanisms of ASD. Future work to ascertain the reproducibility of the presented methodology is warranted.
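The abstract does not specify which clustering algorithm or which enrichment test was used. As a minimal sketch only, assuming k-means over precomputed per-individual embedding vectors and a one-sided hypergeometric over-representation test (a common choice for Gene Ontology enrichment), the two stages might look like the code below. The function names `kmeans` and `hypergeom_enrichment_p`, the toy embeddings, and the gene counts are hypothetical illustrations, not the study's method or data.

```python
from math import comb
from typing import List, Sequence


def kmeans(points: Sequence[Sequence[float]], k: int, iters: int = 100) -> List[int]:
    """Plain Lloyd's k-means (an assumed stand-in for the study's clustering).

    Centroids are initialised from the first k points, which keeps the
    example deterministic; real use would prefer k-means++ or random restarts.
    """
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each embedding goes to its nearest centroid
        # by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided over-representation p-value P(X >= k), X ~ Hypergeometric.

    N: genes in the background universe
    K: background genes annotated to a biological process (e.g. a GO term)
    n: genes carrying variants within one cluster
    k: cluster genes annotated to that process
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # Sum the probabilities of observing k, k+1, ..., min(n, K) annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    toy_embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]  # hypothetical
    print(kmeans(toy_embeddings, k=2))  # → [0, 0, 1, 1]
    # Hypothetical counts: 3 of a cluster's 4 genes hit a term covering 5 of 20 genes.
    print(hypergeom_enrichment_p(k=3, n=4, K=5, N=20))  # ≈ 0.032
```

When many biological processes are tested per cluster, the resulting p-values would normally also be corrected for multiple testing (for example with the Benjamini-Hochberg procedure) before any process is called enriched.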