Center for Precision Health, McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, United States.
Dan L Duncan Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX 77030, United States.
J Am Med Inform Assoc. 2024 Feb 16;31(3):666-673. doi: 10.1093/jamia/ocad217.
The HIV epidemic remains a significant public health issue in the United States. HIV risk prediction models could help reduce HIV transmission by helping clinicians identify patients at high risk of infection and refer them for testing. This would facilitate initiating treatment for those unaware of their HIV status and starting pre-exposure prophylaxis for those who are uninfected but at high risk. Existing HIV risk prediction algorithms rely on manually constructed features and are limited in their applicability across diverse electronic health record systems. Furthermore, the accuracy of these models in predicting HIV in females has thus far been limited.
We devised a pipeline that automatically constructs HIV risk prediction models using automatic feature engineering, and tested it on a local electronic health record system and a national claims dataset. We also compared the performance of general models with that of female-specific models.
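The abstract does not describe how features are engineered automatically. One common generic approach, shown here only as an illustration and not as the authors' pipeline, is to turn every coded event recorded before a patient's index date into a count feature, so that no variables are hand-picked. The function name `build_code_count_features`, the 365-day lookback window, and the event codes in the usage example are all hypothetical assumptions.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta


def build_code_count_features(events, index_dates, lookback_days=365):
    """Hypothetical sketch: turn (patient_id, code, event_date) records into a
    patient-by-code count matrix. Every distinct code observed in the lookback
    window becomes a feature column, so no features are chosen by hand."""
    counts = defaultdict(Counter)
    for patient_id, code, event_date in events:
        index_date = index_dates.get(patient_id)
        if index_date is None:
            continue  # patient is not in the cohort being modeled
        # Use only events before the index (prediction) date and inside the
        # assumed lookback window, so features never leak future information.
        window_start = index_date - timedelta(days=lookback_days)
        if window_start <= event_date < index_date:
            counts[patient_id][code] += 1

    feature_names = sorted({code for c in counts.values() for code in c})
    patients = sorted(index_dates)
    # Patients with no qualifying events get an all-zero row.
    matrix = [[counts[p][name] for name in feature_names] for p in patients]
    return patients, feature_names, matrix


if __name__ == "__main__":
    # Toy records with made-up codes; not study data.
    events = [
        ("p1", "lab:hiv_screen", date(2020, 3, 1)),
        ("p1", "dx:sti", date(2020, 5, 2)),
        ("p1", "dx:sti", date(2020, 6, 9)),
        ("p2", "dx:checkup", date(2020, 8, 20)),
        ("p2", "dx:sti", date(2021, 2, 1)),  # after the index date: excluded
    ]
    index_dates = {"p1": date(2020, 12, 31), "p2": date(2020, 12, 31)}
    patients, names, X = build_code_count_features(events, index_dates)
    print(patients, names, X)
```

The resulting matrix could feed any standard classifier; because the columns come from whatever codes a given system records, the same construction step can in principle run on an EHR extract or on claims data without redefining features by hand.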
Our models achieve similarly good performance on both datasets despite differences in the populations represented and in data availability (AUC = 0.87). Furthermore, our general models perform well on females, but performance improves further with female-specific models (AUC between 0.81 and 0.86 across datasets).
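The results are reported as AUC (area under the ROC curve). As a reference for what those values mean, the sketch below computes AUC as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case, counting ties as one half (the Mann-Whitney formulation). It also scores a single subgroup, such as female patients, which is one way a general model could be compared on females; the abstract does not state the authors' exact evaluation procedure. The function names `roc_auc` and `subgroup_auc` and the toy data are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative),
    with ties counted as half a win (Mann-Whitney U formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:          # O(n_pos * n_neg): fine for a sketch, not for big data
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def subgroup_auc(labels, scores, in_group):
    """AUC restricted to records flagged True in in_group (e.g. female patients)."""
    kept = [(y, s) for y, s, g in zip(labels, scores, in_group) if g]
    return roc_auc([y for y, _ in kept], [s for _, s in kept])


if __name__ == "__main__":
    # Made-up outcomes, model scores, and sex flags; not study data.
    labels = [1, 0, 1, 0, 0, 1]
    scores = [0.9, 0.3, 0.4, 0.5, 0.2, 0.8]
    female = [True, True, False, False, True, True]
    print(roc_auc(labels, scores))               # 8 of 9 pairs ranked correctly
    print(subgroup_auc(labels, scores, female))  # all female pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation; production code would typically use a sort-based implementation such as scikit-learn's `roc_auc_score` instead of the pairwise loop.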
We demonstrated that flexibly constructed prediction models perform well for HIV risk prediction across diverse health record systems and also perform well in predicting HIV risk in females, making deployment of such models in existing health care systems feasible.