Algorithmic prediction of HIV status using nation-wide electronic registry data.

BACKGROUND: Late HIV diagnosis is detrimental both to the individual and to society. Strategies to improve early diagnosis of HIV must be a key health care priority. We examined whether nation-wide electronic registry data could be used to predict HIV status using machine learning algorithms.
METHODS: We extracted individual-level data from Danish registries, trained prediction models with several machine learning algorithms, and validated these models. We calibrated the models to mimic different clinical scenarios and created confusion matrices based on the calibrated models.
FINDINGS: A total of 4,384,178 individuals, including 4,350 with incident HIV, were included in the analyses. The full model, which included all variables, including demographic variables and information on past medical history, had the highest area under the receiver operating characteristic curve in the validation dataset: 88·4% (95% CI: 87·5% - 89·4%). Performance measures did not differ substantially with regard to which machine learning algorithm was used. When we calibrated the models to a specificity of 99·9% (pre-exposure prophylaxis (PrEP) scenario), we found a positive predictive value (PPV) of 8·3% in the full model. When we calibrated the models to a sensitivity of 90% (screening scenario), 384 individuals would have to be tested to find one undiagnosed person with HIV.
INTERPRETATION: Machine learning algorithms can learn from electronic registry data and help to predict HIV status with a fairly high level of accuracy. Integration of prediction models into clinical software systems may complement existing strategies such as indicator condition-guided HIV testing and prove useful for identifying individuals suitable for PrEP.
FUNDING: The study was supported by funds from the Preben and Anne Simonsens Foundation, the Novo Nordisk Foundation, Rigshospitalet, Copenhagen University, the Danish AIDS Foundation, the Augustinus Foundation and the Danish Health Foundation.
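
The calibration step described in the abstract (fixing a target specificity, then reading PPV and the number needed to test off the resulting confusion matrix) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the "flag if score is strictly above the threshold" rule, and the toy scores are all assumptions made for illustration, and the number needed to test is taken here to mean flagged individuals per true positive, (TP + FP) / TP.

```python
import math


def threshold_for_specificity(negative_scores, target):
    """Return a cut-off t so that at least `target` of negatives score <= t.

    Assumption: an individual is flagged when score > t, so specificity is
    the share of true negatives with score <= t.
    """
    ranked = sorted(negative_scores)
    # Number of negatives that must fall at or below the cut-off.
    # The small epsilon guards against floating-point noise in target * n.
    k = max(1, math.ceil(target * len(ranked) - 1e-9))
    return ranked[k - 1]


def confusion_matrix(scores, labels, threshold):
    """Count TP, FP, TN, FN for the rule `score > threshold`."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for score, has_hiv in zip(scores, labels):
        flagged = score > threshold
        if flagged and has_hiv:
            counts["tp"] += 1
        elif flagged:
            counts["fp"] += 1
        elif has_hiv:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def summarize(cm):
    """Derive sensitivity, specificity, PPV and number needed to test."""
    tp, fp, tn, fn = cm["tp"], cm["fp"], cm["tn"], cm["fn"]
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "ppv": tp / (tp + fp) if tp + fp else nan,
        # Flagged individuals tested per true case found (assumed definition).
        "number_needed_to_test": (tp + fp) / tp if tp else math.inf,
    }


# Toy data, invented for illustration only: 10 negatives and 3 positives.
toy_scores = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.9,
              0.5, 0.95, 0.3]
toy_labels = [False] * 10 + [True] * 3

negatives = [s for s, y in zip(toy_scores, toy_labels) if not y]
cutoff = threshold_for_specificity(negatives, target=0.9)
print(summarize(confusion_matrix(toy_scores, toy_labels, cutoff)))
```

Raising the target specificity moves the cut-off up, which trades sensitivity for fewer false positives; with a rare outcome such as incident HIV, even a very high specificity can leave PPV in the single digits, as the 8·3% figure above shows.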


