Information Technology Dept., Faculty of Computers and Information, Mansoura University, Mansoura, P.O.35516, Egypt.
Comput Biol Med. 2022 Jul;146:105622. doi: 10.1016/j.compbiomed.2022.105622. Epub 2022 May 24.
Alzheimer's disease (AD) is a degenerative disorder that attacks nerve cells in the brain. AD leads to memory loss and to cognitive and intellectual impairments that can affect social activities and decision-making. Single nucleotide polymorphisms (SNPs) are the most common type of human genetic variation and serve as useful markers for complex gene-disease associations. Many common and serious diseases, including AD, have associated SNPs, so detecting SNP biomarkers linked with AD could help in the early prediction and diagnosis of the disease. The main objective of this paper is to predict and diagnose AD in its early stages, with high classification accuracy, based on SNP biomarkers. One of the most pressing problems is the very large number of features. The paper therefore proposes a comprehensive framework for early AD detection and for identifying the most significant genes through SNP analysis. It also suggests using machine learning (ML) techniques to identify new AD biomarkers. In the proposed system, two feature selection techniques are evaluated separately for selecting the genes most significantly related to AD: the information gain filter and the Boruta wrapper. Filter methods measure the relevance of features by their correlation with the dependent variable, while wrapper methods measure the usefulness of a subset of features by training a model on it. A gradient boosting tree (GBT) classifier was applied to the genetic data of the Alzheimer's Disease Neuroimaging Initiative phase 1 (ADNI-1) and whole-genome sequencing (WGS) datasets, using each of the two feature selection techniques. In the whole-genome approach on ADNI-1, the GBT learning algorithm scored an overall accuracy of 99.06% with Boruta feature selection. With information gain feature selection, the proposed system achieved an average accuracy of 94.87%. The results indicate that the proposed system is well suited to the early detection of AD and that the Boruta wrapper feature selection is superior to the information gain filter technique.
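As a rough illustration of the filter idea mentioned in the abstract, the sketch below scores each SNP by its information gain with respect to the diagnosis label and keeps the top-ranked ones. It is not the authors' implementation and uses no ADNI data: the SNP names, the 0/1/2 genotype coding (count of minor alleles), the toy labels, and the helper names `entropy`, `information_gain`, and `rank_snps` are all assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature, labels):
    """Information gain of a discrete feature: H(labels) - H(labels | feature)."""
    total = len(labels)
    # Group the class labels by the value the feature takes.
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_snps(genotypes, labels, top_k):
    """Return the top_k (snp, score) pairs, highest information gain first."""
    scores = {snp: information_gain(values, labels)
              for snp, values in genotypes.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical toy data: 1 = AD case, 0 = control; genotypes coded 0/1/2.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
genotypes = {
    "snp_a": [2, 2, 2, 2, 0, 0, 0, 0],  # separates cases from controls
    "snp_b": [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated to the label
    "snp_c": [2, 2, 1, 1, 1, 1, 0, 0],  # partially informative
}

selected = rank_snps(genotypes, labels, top_k=2)
print(selected)
```

Because a filter of this kind scores each SNP independently of any classifier, it is cheap to run on a very large feature set. A wrapper such as Boruta instead trains a model repeatedly to judge features, which costs more computation but can capture how useful the features are to the model itself.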