Department of Epidemiology, Biostatistics and Occupational Health, McGill University, 2001 McGill College Avenue, Montreal, QC, H3A 1G1, Canada.
Centre for Outcomes Research and Evaluation, McGill University Health Center-Research Institute, Montreal, QC, Canada.
BMC Med Res Methodol. 2022 Aug 12;22(1):223. doi: 10.1186/s12874-022-01700-y.
Depression is common in the human immunodeficiency virus (HIV)-hepatitis C virus (HCV) co-infected population. Demographic, behavioural, and clinical data collected in research settings may help identify those at risk for clinical depression. We aimed to predict the presence of depressive symptoms indicative of a risk of depression and to identify important classification predictors using supervised machine learning.
We used data from the Canadian Co-infection Cohort, a multicentre prospective cohort, and its associated sub-study on Food Security (FS). The Center for Epidemiologic Studies Depression Scale-10 (CES-D-10) was administered in the FS sub-study; participants were classified as being at risk for clinical depression if they scored ≥ 10. We developed two random forest algorithms using the training data (80%) and tenfold cross-validation to predict the CES-D-10 classes: (1) a full algorithm with all candidate predictors (137 predictors) and (2) a reduced algorithm using a subset of predictors based on expert opinion (46 predictors). We evaluated the performance of both algorithms in the testing data using the area under the receiver operating characteristic curve (AUC) and generated predictor importance plots.
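As a minimal illustration of two steps described above, the sketch below applies the CES-D-10 cutoff (≥ 10) to label participants and computes an AUC from predicted risks. It is not the authors' code: the function names `at_risk` and `auc`, the toy scores, and the predicted risks are all hypothetical, and the AUC is computed with the standard pairwise (Mann-Whitney) definition rather than any particular library routine.

```python
from typing import Sequence

CESD10_CUTOFF = 10  # cutoff stated in the text: a score >= 10 indicates risk


def at_risk(cesd10_score: int) -> int:
    """Return 1 if the CES-D-10 score meets the at-risk cutoff, else 0."""
    return int(cesd10_score >= CESD10_CUTOFF)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case is
    ranked above a randomly chosen negative case (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
cesd10_scores = [4, 12, 9, 10, 17, 6]
labels = [at_risk(s) for s in cesd10_scores]

# Hypothetical predicted probabilities, standing in for classifier output.
predicted_risk = [0.20, 0.80, 0.55, 0.50, 0.90, 0.30]
toy_auc = auc(labels, predicted_risk)
print(f"labels={labels}, AUC={toy_auc:.3f}")
```

In practice the predicted risks would come from a fitted classifier evaluated on held-out data; the pairwise definition is used here only because it makes the meaning of the AUC explicit.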
We included 1,934 FS sub-study visits from 717 participants who were predominantly male (73%), white (76%), unemployed (73%), and high school educated (52%). At the first visit, median age was 49 years (IQR: 43-54) and 53% reported depressive symptoms (CES-D-10 score ≥ 10). The full algorithm had an AUC of 0.82 (95% CI: 0.78-0.86) and the reduced algorithm had an AUC of 0.76 (95% CI: 0.71-0.81). Employment, HIV clinical stage, revenue source, body mass index, and education were the five most important predictors.
We developed a prediction algorithm that could be instrumental in identifying individuals at risk for depression in the HIV-HCV co-infected population in research settings. Development of such machine learning algorithms using research data with rich predictor information can be useful for retrospective analyses of unanswered questions regarding impact of depressive symptoms on clinical and patient-centred outcomes among vulnerable populations.