College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing, China.
Shanghai Public Health Clinical Center, Fudan University, Shanghai, China.
HIV Med. 2023 Jan;24(1):82-92. doi: 10.1111/hiv.13324. Epub 2022 Jun 27.
We constructed a recency-frequency (RF) model to predict loss to follow-up (LTFU) among HIV/AIDS patients in China.
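In RFM-style analysis, R (recency) reflects how long ago a patient last visited and F (frequency) reflects how often the patient visited. The abstract does not define either quantity exactly. The Python sketch below therefore assumes that R is the number of whole calendar months from the last recorded visit to a reference date (for example, 30 September 2020, the end of the observation window) and that F is the number of recorded visits. The helpers `months_between` and `rf_features` are hypothetical names used only for illustration.

```python
from datetime import date


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from `earlier` to `later` (day-of-month aware)."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the last month is not yet complete
    return months


def rf_features(visits: dict[str, list[date]], ref_date: date) -> dict[str, tuple[int, int]]:
    """Map each patient ID to an (R, F) pair.

    Assumed definitions (not stated in the abstract):
    R = months from the patient's most recent visit to ref_date,
    F = number of recorded visits in the observation window.
    """
    features = {}
    for patient_id, dates in visits.items():
        if not dates:
            continue  # no visits: R is undefined, so the patient is skipped
        features[patient_id] = (months_between(max(dates), ref_date), len(dates))
    return features


# Hypothetical visit records, not study data.
visits = {
    "p1": [date(2020, 1, 10), date(2020, 7, 5)],
    "p2": [date(2019, 3, 1)],
}
rf = rf_features(visits, ref_date=date(2020, 9, 30))
```

Under these assumptions, a patient with R ≤ 6 has visited within the last six months, which corresponds to the cut-off used in the second modelling round.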
Data on HIV/AIDS outpatients at the research unit from 1 August 2009 to 30 September 2020 and from 1 October to 31 December 2020 were exported as the observation and prediction datasets, respectively. The classic recency-frequency-monetary (RFM) model was extended into RFm, RF, RFL and RFmL models. In the observation dataset, the best predictive model was identified using k-means clustering with C5.0 decision-tree verification. Two rounds of k-means modelling were then performed on the best model: data with R ≤ 6 months were retained, randomly divided into a training set (70%) and a testing set (30%), and used in the second round of modelling to subdivide patients into churn and non-churn groups. Next, an artificial neural network (ANN) algorithm was used to predict LTFU, and a confusion matrix was constructed on the prediction dataset.
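The clustering step can be illustrated with plain Lloyd's k-means on standardized (R, F) pairs, followed by the R ≤ 6 months filter and a random 70%/30% split. This is a sketch under stated assumptions only. The abstract does not specify the software, initialization, distance measure or scaling that was used, and the C5.0 verification and ANN steps are not reproduced here. The helpers `standardize`, `kmeans` and `retain_and_split` are hypothetical.

```python
import random
from statistics import mean, pstdev

Point = tuple[float, ...]


def standardize(points: list[Point]) -> list[Point]:
    """Z-score each column so R (months) and F (visit count) get equal weight (assumed scaling)."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*points)]
    return [tuple((v - m) / s for v, (m, s) in zip(p, stats)) for p in points]


def _nearest(p: Point, centroids: list[Point]) -> int:
    """Index of the centroid closest to p by squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))


def kmeans(points: list[Point], k: int, iters: int = 100, seed: int = 0) -> list[int]:
    """Lloyd's k-means with randomly sampled initial centroids; returns one label per point."""
    centroids = random.Random(seed).sample(points, k)
    labels: list[int] = []
    for _ in range(iters):
        new_labels = [_nearest(p, centroids) for p in points]  # assignment step
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        for j in range(k):  # update step: move each centroid to the mean of its members
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(mean(col) for col in zip(*members))
    return labels


def retain_and_split(rf: dict[str, tuple[int, int]], max_r: int = 6,
                     train_frac: float = 0.7, seed: int = 0) -> tuple[list[str], list[str]]:
    """Keep patients with R <= max_r, then randomly split their IDs into train/test sets."""
    kept = sorted(pid for pid, (r, _f) in rf.items() if r <= max_r)
    random.Random(seed).shuffle(kept)
    cut = round(train_frac * len(kept))
    return kept[:cut], kept[cut:]


# Toy (R, F) values, not study data: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = kmeans(standardize(points), k=2)
train_ids, test_ids = retain_and_split({f"p{i}": (i, 1) for i in range(10)})
```

Calling `kmeans(..., k=3)` on real (R, F) data would parallel the three-cluster RF solution reported in the results. Cluster labels are arbitrary, however, so each cluster has to be interpreted from its mean R and F.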
The observation and prediction datasets included 16 949 and 10 748 samples, respectively. The RF model with three clusters and a cluster quality of 0.82 was the best predictive model. From the observation dataset, 13 799 samples were retained, and model accuracy was 100% on both the training and testing sets. These 13 799 samples were subdivided into 1563 churn-patient samples and 12 216 non-churn-patient samples. The accuracy of the ANN prediction was 99.89%. For the confusion matrix, accuracy and precision were 85.41% and 99.76%, respectively.
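The two confusion-matrix figures measure different things. Accuracy is (TP + TN) / N, the share of all predictions that are correct. Precision is TP / (TP + FP), the share of predicted positives that are correct. Because misclassifications are FP + FN, a precision of 99.76% alongside an accuracy of 85.41% implies that false positives are rare and that most errors are false negatives. The abstract does not state which class was treated as positive, so the counts below are hypothetical and serve only to illustrate the arithmetic. Recall is added for comparison.

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy, precision and recall from the four cells of a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,                              # correct / all predictions
        "precision": tp / (tp + fp) if tp + fp else float("nan"),   # correct among predicted positives
        "recall": tp / (tp + fn) if tp + fn else float("nan"),      # found among actual positives
    }


# Hypothetical counts, not taken from the study.
metrics = confusion_metrics(tp=90, fp=10, fn=20, tn=80)
```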
The RF model is effective in predicting LTFU among HIV/AIDS patients in China and in preventing its occurrence.