School of Computer Information and Engineering, Changzhou Institute of Technology, Changzhou 213032, China.
School of Physics, Engineering and Computer Science, University of Hertfordshire, Hatfield AL10 9AB, UK.
Genes (Basel). 2022 Dec 12;13(12):2344. doi: 10.3390/genes13122344.
In studies of Alzheimer's disease (AD), jointly analyzing imaging data and genetic data provides an effective way to explore potential biomarkers of AD. Subjects can be divided into four groups along the course of the disease: healthy controls (HC), early mild cognitive impairment (EMCI), late mild cognitive impairment (LMCI) and AD. Identifying the important biomarkers of AD progression and analyzing them provides valuable insight into the mechanism of AD. In this paper, we present a novel data fusion method and a genetic weighted random forest method to mine important features. Specifically, we amplify the differences among AD, LMCI, EMCI and HC by introducing eigenvalues calculated from the gene -value matrix for feature fusion. We then construct the genetic weighted random forest from the resulting fused features: genetic evolution is used to increase the diversity among decision trees, and each generated decision tree is assigned a weight. After training, the genetic weighted random forest is analyzed further to detect the significant fused features. Validation experiments demonstrate the performance and generalization of the proposed model. We analyze the biological significance of the results and identify some significant genes (, , , and ). Furthermore, the calcium signaling pathway, arrhythmogenic right ventricular cardiomyopathy and the glutamatergic synapse pathway were identified. The findings demonstrate that the proposed model offers an accurate and efficient approach to identifying significant biomarkers in AD.
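The idea of weighting the decision trees in a forest can be illustrated with a minimal sketch. The abstract does not say how the weights are computed, so this example assumes each tree is weighted by its accuracy on a held-out validation set; the function names `accuracy_weights` and `weighted_vote` are hypothetical, and the genetic evolution step is not shown.

```python
from collections import defaultdict


def accuracy_weights(per_tree_val_predictions, val_labels):
    """Assumed weighting scheme: one weight per tree, equal to its validation accuracy."""
    weights = []
    for preds in per_tree_val_predictions:
        correct = sum(p == y for p, y in zip(preds, val_labels))
        weights.append(correct / len(val_labels))
    return weights


def weighted_vote(tree_predictions, tree_weights):
    """Combine the per-tree labels for one sample into a single label.

    Each tree adds its weight to the score of the class it predicts;
    the class with the highest total score wins.
    """
    if len(tree_predictions) != len(tree_weights):
        raise ValueError("need exactly one weight per tree")
    scores = defaultdict(float)
    for label, weight in zip(tree_predictions, tree_weights):
        scores[label] += weight
    return max(scores, key=scores.get)


# Toy validation data: three trees, four validation samples (illustrative only).
val_labels = ["AD", "HC", "EMCI", "LMCI"]
val_predictions = [
    ["AD", "HC", "EMCI", "LMCI"],  # tree 0: all four correct
    ["AD", "AD", "AD", "AD"],      # tree 1: one of four correct
    ["HC", "HC", "HC", "HC"],      # tree 2: one of four correct
]
weights = accuracy_weights(val_predictions, val_labels)

# For a new sample, tree 0 predicts HC while trees 1 and 2 predict AD.
label = weighted_vote(["HC", "AD", "AD"], weights)
```

In this toy case an unweighted majority vote would pick AD (two trees out of three), whereas the weighted vote follows the single tree that was most accurate on the validation data.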