Division of Infectious Diseases and Hospital Epidemiology, University Hospital Basel, University of Basel, Basel, Switzerland.
Basel Institute for Clinical Epidemiology and Biostatistics, University Hospital Basel, University of Basel, Basel, Switzerland.
J Infect Dis. 2021 Oct 13;224(7):1198-1208. doi: 10.1093/infdis/jiaa236.
It is unclear whether data-driven machine learning models, which are trained on large epidemiological cohorts, may improve prediction of comorbidities in people living with human immunodeficiency virus (HIV).
In this proof-of-concept study, we included people living with HIV in the prospective Swiss HIV Cohort Study with a first estimated glomerular filtration rate (eGFR) >60 mL/minute/1.73 m² after 1 January 2002. Our primary outcome was chronic kidney disease (CKD), defined as a confirmed decrease in eGFR to ≤60 mL/minute/1.73 m² on measurements more than 3 months apart. We split the cohort data into a training set (80%), a validation set (10%), and a test set (10%), stratified by CKD status and follow-up length.
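The 80/10/10 stratified split described above can be sketched as follows. This is a minimal illustration rather than the study's actual pipeline: the function `stratified_split`, the toy `cohort` records, and the choice to bin follow-up at 8 years are all assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_split(records, key, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split records into train/validation/test sets so that each stratum
    (as returned by `key`) is divided in roughly the same proportions."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)

    train, val, test = [], [], []
    for members in strata.values():
        rng.shuffle(members)
        n_train = round(fractions[0] * len(members))
        n_val = round(fractions[1] * len(members))
        train.extend(members[:n_train])
        val.extend(members[n_train:n_train + n_val])
        test.extend(members[n_train + n_val:])  # remainder goes to the test set
    return train, val, test

# Hypothetical toy cohort: 10% of individuals develop CKD, follow-up 2-13 years.
cohort = [
    {"id": i, "ckd": i % 10 == 0, "followup_years": 2 + (i % 12)}
    for i in range(1000)
]

def stratum(rec):
    # Assumed stratification key: CKD status crossed with long vs short follow-up.
    return (rec["ckd"], rec["followup_years"] >= 8)

train, val, test = stratified_split(cohort, stratum)
print(len(train), len(val), len(test))
```

Splitting within each stratum keeps the rare outcome (about 9% in the cohort) represented in all three sets, which a purely random split of a small test set does not guarantee.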
Of 12 761 eligible individuals (median baseline eGFR, 103 mL/minute/1.73 m²), 1192 (9%) developed CKD after a median of 8 years. We used 64 static and 502 time-changing variables. Across prediction horizons and algorithms, and in contrast to expert-based standard models, most machine learning models achieved state-of-the-art predictive performance, with areas under the receiver operating characteristic curve and the precision-recall curve ranging from 0.926 to 0.996 and from 0.631 to 0.956, respectively.
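For readers unfamiliar with the two reported metrics, the sketch below computes both on a toy example. It assumes binary labels (1 = CKD, 0 = no CKD) and uses average precision as one common estimator of the area under the precision-recall curve; the abstract does not state which estimator the authors used, and the labels and scores here are invented for illustration.

```python
def auroc(labels, scores):
    """Area under the ROC curve: the probability that a randomly chosen
    positive is scored above a randomly chosen negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Average precision: precision at each distinct score threshold,
    weighted by the increase in recall at that threshold."""
    n_pos = sum(labels)
    ranked = sorted(zip(scores, labels), reverse=True)
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Process all predictions tied at this threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

# Invented toy data: two individuals with CKD, two without.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

roc_value = auroc(labels, scores)              # 0.75
pr_value = average_precision(labels, scores)   # 5/6, about 0.833
print(roc_value, pr_value)
```

With an outcome prevalence near 9%, the precision-recall area is the more demanding of the two metrics, which is consistent with its wider reported range.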
In people living with HIV, we observed state-of-the-art performance in forecasting individual CKD onset with different machine learning algorithms.