Karim Ayesha, Alromema Nashwan, Malebary Sharaf J, Binzagr Faisal, Ahmed Amir, Khan Yaser Daanial
Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan.
Department of Computer Science, Faculty of Computing and Information Technology-Rabigh, King AbdulAziz University, Jeddah, Saudi Arabia.
Digit Health. 2025 Jan 27;11:20552076241313407. doi: 10.1177/20552076241313407. eCollection 2025 Jan-Dec.
Autism spectrum disorder (ASD) is a complex neurodevelopmental condition influenced by various genetic and environmental factors. Currently, there is no definitive clinical test, such as a blood analysis or brain scan, for early diagnosis. The objective of this study is to develop a computational model that predicts ASD driver genes in the early stages using genomic data, aiming to enhance early diagnosis and intervention.
This study used a benchmark genomic dataset, which was processed with feature extraction techniques to identify relevant genetic patterns. Several ensemble classification methods, including Extreme Gradient Boosting, Random Forest, Light Gradient Boosting Machine, ExtraTrees, and a stacked ensemble of classifiers, were applied to assess the predictive power of the genomic features. The Ensemble Model Predictor for Autism Spectrum Disorder (eNSMBL-PASD) model was rigorously validated using multiple performance metrics, including accuracy, sensitivity, specificity, and the Matthews correlation coefficient.
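The four metrics named above can all be derived from the counts of a binary confusion matrix. The following is a minimal sketch, assuming labels are encoded as 1 (driver gene) and 0 (non-driver). The function name `binary_metrics` and the toy labels are illustrative assumptions, not taken from the study.

```python
from math import sqrt


def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity, and MCC for 0/1 labels.

    Hypothetical helper for illustration; not the authors' code.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate

    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
    print(binary_metrics(y_true, y_pred))
```

Unlike accuracy, the Matthews correlation coefficient uses all four confusion-matrix cells, so it remains informative when the positive and negative classes are imbalanced.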
The proposed model demonstrated superior performance across various validation techniques. The self-consistency test achieved 100% accuracy, while the independent set and cross-validation tests yielded 91% and 87% accuracy, respectively. These results highlight the model's robustness and reliability in predicting ASD-related genes.
The eNSMBL-PASD model provides a promising tool for the early detection of ASD by identifying genetic markers associated with the disorder. In the future, this model has the potential to assist healthcare professionals, particularly doctors and psychologists, in diagnosing and formulating treatment plans for ASD at its earliest stages.