Graduate Program in Clinical Care in Nursing and Health, State University of Ceará, Fortaleza, Brazil.
Graduate Program in Computer Science, State University of Ceará, Fortaleza, Brazil.
Comput Math Methods Med. 2021 Jul 9;2021:4602465. doi: 10.1155/2021/4602465. eCollection 2021.
Dementia interferes with an individual's motor, behavioural, and intellectual functions, leaving them unable to perform instrumental activities of daily living. This study aimed to use data mining to identify the best-performing algorithm and the most relevant characteristics for classifying individuals with HIV/AIDS at high risk of dementia. Principal component analysis (PCA) was applied, and the following machine learning algorithms were compared: logistic regression, decision tree, neural network, k-nearest neighbours (KNN), and random forest. The database was built from data collected from 270 individuals infected with HIV/AIDS and followed up at the outpatient clinic of a reference hospital for infectious and parasitic diseases in the State of Ceará, Brazil, from January to April 2019. The performance of the algorithms was first analysed using all 104 characteristics available in the database; after dimensionality reduction, the quality of the machine learning algorithms improved during testing, even though about 30% of the variation was lost. When only 23 characteristics were considered, the precision of the algorithms was 86% for random forest, 56% for logistic regression, 68% for decision tree, 60% for KNN, and 59% for neural network. The random forest algorithm proved more effective than the others, obtaining 84% precision and 86% accuracy.
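The abstract reports both precision and accuracy, which measure different things: precision is the share of individuals flagged as high risk who truly are high risk, while accuracy is the share of all individuals classified correctly. The following is a minimal sketch of how the two are computed for a binary classifier. The function name `precision_and_accuracy` and the example labels are hypothetical and are not taken from the study's data or code.

```python
def precision_and_accuracy(y_true, y_pred, positive=1):
    """Return (precision, accuracy) for two equal-length lists of binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    # False positives: predicted positive but actually negative.
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    # Correct predictions of either class.
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return precision, accuracy


# Hypothetical labels (not study data): 1 = high risk of dementia, 0 = not high risk.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

precision, accuracy = precision_and_accuracy(y_true, y_pred)
print(f"precision={precision:.2f} accuracy={accuracy:.2f}")
```

In this toy case the classifier is correct for 8 of 10 individuals (accuracy 0.80), but only 2 of its 3 high-risk predictions are right (precision about 0.67), which illustrates why the two figures for the same model can differ.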