Kaczyński Jacek, Pasenkiewicz-Gierula Marta
Department of Biochemistry, Biophysics and Biotechnology, Jagiellonian University in Kraków, Kraków, Poland.
PLoS One. 2025 Jun 11;20(6):e0326022. doi: 10.1371/journal.pone.0326022. eCollection 2025.
Whole-genome sequencing data of simplex families with autism spectrum disorder (ASD) were analyzed by searching for statistical interactions between loci. The resulting variant pairs mapped to 411 genes, of which 368 had not been associated with ASD before. The variants were used to build an ASD predictor based on an open-source machine learning library. The predictor correctly classifies over 78% of samples from a test set with an average significance level of 8.9 × 10⁻¹⁵⁸. Gene Ontology (GO) enrichment analysis of the identified risk genes points to functions related to the development of the central nervous system (CNS). Clustering cases on the basis of risk variants improves predictor accuracy and reveals additional overrepresented GO terms. Some of the detected statistical interactions can be linked to known biological interactions between genes involved in CNS development. Analysis of the statistical interactions also points to genes whose biological functions are not yet known.
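GO enrichment of a gene list is commonly assessed with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test): given a background of N annotated genes, K of which carry a GO term, how surprising is it that at least x of the n study genes carry it? The abstract does not state which test or tool the authors used, so the following is only a minimal sketch of the standard technique, not their method. The function name `go_term_enrichment_p` and the background and term sizes in the usage line are hypothetical; only the study-set size of 411 genes comes from the abstract.

```python
import math


def go_term_enrichment_p(overlap: int, study_size: int,
                         term_size: int, population_size: int) -> float:
    """One-sided hypergeometric p-value P(X >= overlap).

    X counts how many of `study_size` genes drawn without replacement
    from `population_size` background genes are annotated with a GO term
    that covers `term_size` background genes. (Hypothetical helper.)
    """
    if study_size > population_size or term_size > population_size:
        raise ValueError("study set and GO term must fit in the background")
    if not 0 <= overlap <= min(study_size, term_size):
        raise ValueError("overlap out of range")

    # Exact integer arithmetic: sum the upper tail of the hypergeometric
    # numerator, then divide once by C(N, n). math.comb returns 0 when
    # the lower argument exceeds the upper one, so impossible k add 0.
    denominator = math.comb(population_size, study_size)
    upper = min(study_size, term_size)
    numerator = sum(
        math.comb(term_size, k)
        * math.comb(population_size - term_size, study_size - k)
        for k in range(overlap, upper + 1)
    )
    return numerator / denominator


# Hypothetical numbers: 20,000 background genes, a GO term annotating
# 200 of them, and 15 of the 411 study genes carrying that term.
p = go_term_enrichment_p(overlap=15, study_size=411,
                         term_size=200, population_size=20_000)
print(f"P(X >= 15) = {p:.3g}")
```

In a real analysis this p-value is computed for every GO term tested and then corrected for multiple testing (for example with the Benjamini-Hochberg procedure), a step the sketch omits.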