Leibniz Institute for Prevention Research and Epidemiology - BIPS, Achterstraße 30, Bremen, Germany; University of Gondar, Institute of Public Health, Department of Health Informatics, Gondar, Ethiopia.
Health Policy and Planning Directorate, Ethiopian Federal Ministry of Health, Ethiopia.
Comput Methods Programs Biomed. 2017 Dec;152:149-157. doi: 10.1016/j.cmpb.2017.09.017. Epub 2017 Sep 21.
To monitor response to therapy and disease progression, periodic CD4 counts are required throughout the course of HIV/AIDS care and support. The demand for CD4 count measurement has increased as ART programs have expanded over the last decade. This study aimed to predict CD4 count changes and to identify the predictors of those changes among patients on ART.
A cross-sectional study was conducted at the University of Gondar Hospital among 3,104 adult patients on ART whose CD4 counts had been measured at least twice (baseline and most recent). Data were retrieved from the HIV care clinic electronic database and patients' charts. Descriptive data were analyzed with SPSS version 20. The study followed the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology. WEKA version 3.8 was used for predictive data mining. Before the predictive models were built, information gain and correlation-based feature selection methods were used for attribute selection, and variables were ranked by relevance according to their information gain values. J48, Neural Network, and Random Forest algorithms were tested to compare model accuracies.
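The information gain ranking used for attribute selection can be illustrated with a minimal sketch. This is not the study's WEKA workflow: the function names, the toy attributes (`regimen`, `sex`), and the class labels are hypothetical and chosen only to show how an attribute's information gain is computed and used to rank variables.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: four patients, two candidate attributes.
labels = ["increase", "increase", "decline", "decline"]
features = {
    "sex": ["F", "M", "F", "M"],      # carries no information about the label
    "regimen": ["A", "A", "B", "B"],  # separates the two classes perfectly
}

# Rank attributes from most to least informative.
ranking = sorted(
    features,
    key=lambda name: information_gain(features[name], labels),
    reverse=True,
)
print(ranking)
```

In this toy example the attribute that splits the classes cleanly receives the highest gain and is ranked first; an attribute unrelated to the outcome receives a gain of zero.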
The median duration of ART was 191.5 weeks. The mean CD4 count change was 243 (SD 191.14) cells per microliter. Overall, 2,427 (78.2%) patients had a CD4 count increase of at least 100 cells per microliter, while 4% had a decline from the baseline CD4 value. Baseline variables including age, educational status, CD8 count, ART regimen, and hemoglobin level predicted CD4 count changes, with predictive accuracies of 87.1%, 83.5%, and 99.8% for J48, Neural Network, and Random Forest, respectively. The Random Forest algorithm achieved higher accuracy than both J48 and the Artificial Neural Network. The precision, sensitivity, and recall values of Random Forest also exceeded 99%.
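For readers less familiar with the reported metrics, the sketch below shows how accuracy, precision, and recall are derived from a binary confusion matrix. The counts passed to the function are hypothetical and are not the study's results; the function name `classification_metrics` is likewise an assumption made for illustration. Note that sensitivity and recall are two names for the same quantity. The last lines recompute the reported proportion of patients with an increase of at least 100 cells per microliter from the two counts given in the abstract.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall (sensitivity) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),  # identical to sensitivity
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
print(metrics)

# Proportion recomputed from the counts stated in the abstract.
share_increased = round(100 * 2427 / 3104, 1)
print(share_increased)  # 78.2
```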
Highly accurate predictions were obtained using the Random Forest algorithm. This algorithm could be used in low-resource settings to build a web-based prediction model for CD4 count changes.