Constrained binary classification using ensemble learning: an application to cost-efficient targeted PrEP strategies.

Author information

Zheng Wenjing, Balzer Laura, van der Laan Mark, Petersen Maya

Affiliations

Division of Biostatistics, School of Public Health, University of California, Berkeley, CA, U.S.A.

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, U.S.A.

Publication information

Stat Med. 2018 Jan 30;37(2):261-279. doi: 10.1002/sim.7296. Epub 2017 Apr 6.

DOI:10.1002/sim.7296

PMID:28384841

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5701877/

Abstract

Binary classification problems are ubiquitous in health and social sciences. In many cases, one wishes to balance two competing optimality considerations for a binary classifier. For instance, in resource-limited settings, a human immunodeficiency virus (HIV) prevention program based on offering pre-exposure prophylaxis (PrEP) to select high-risk individuals must balance the sensitivity of the binary classifier in detecting future seroconverters (and hence offering them PrEP regimens) with the total number of PrEP regimens that is financially and logistically feasible for the program. In this article, we consider a general class of constrained binary classification problems wherein the objective function and the constraint are both monotonic with respect to a threshold. These include the minimization of the rate of positive predictions subject to a minimum sensitivity, the maximization of sensitivity subject to a maximum rate of positive predictions, and the Neyman-Pearson paradigm, which minimizes the type II error subject to an upper bound on the type I error. We propose an ensemble approach to these binary classification problems based on the Super Learner methodology. This approach linearly combines a user-supplied library of scoring algorithms, with combination weights and a discriminating threshold chosen to minimize the constrained optimality criterion. We then illustrate the application of the proposed classifier to develop an individualized PrEP targeting strategy in a resource-limited setting, with the goal of minimizing the number of PrEP offerings while achieving a minimum required sensitivity. This proof-of-concept data analysis uses baseline data from the ongoing Sustainable East Africa Research in Community Health study. Copyright © 2017 John Wiley & Sons, Ltd.
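As a rough illustration of the first problem in this class (this is not the authors' implementation): with a combined score s(X) and the rule "predict positive when s(X) >= c", both the rate of positive predictions and the sensitivity are nonincreasing in c. The smallest positive-prediction rate that still meets a sensitivity floor alpha is therefore attained at the largest feasible c, which is the ceil(alpha * n1)-th largest score among the n1 observed cases. The sketch below applies that rule to a convex combination of learner scores searched over a coarse weight grid. The function names (`min_ppr_threshold`, `simplex_grid`, `fit_constrained_ensemble`), the restriction to nonnegative weights summing to 1, the grid search, and the toy data are all assumptions made for illustration. In particular, the sketch fits weights and threshold on a single sample, whereas a Super Learner approach would choose them using cross-validated scores.

```python
import itertools
import math


def min_ppr_threshold(scores, labels, min_sensitivity):
    """Return (threshold, positive-prediction rate) for the rule
    "predict positive if score >= threshold" that minimizes the rate of
    positive predictions subject to sensitivity >= min_sensitivity.

    Both quantities are nonincreasing in the threshold, so the optimum is
    the largest feasible threshold: the k-th largest score among observed
    cases (label 1), with k = ceil(min_sensitivity * number_of_cases).
    """
    if not 0 < min_sensitivity <= 1:
        raise ValueError("min_sensitivity must lie in (0, 1]")
    case_scores = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not case_scores:
        raise ValueError("at least one positive case (label 1) is required")
    # Small tolerance keeps floating-point round-off from pushing k up by one.
    k = max(1, math.ceil(min_sensitivity * len(case_scores) - 1e-9))
    threshold = case_scores[k - 1]
    ppr = sum(s >= threshold for s in scores) / len(scores)
    return threshold, ppr


def simplex_grid(n_learners, steps):
    """Yield convex weight vectors (nonnegative, summing to 1) at resolution 1/steps."""
    for counts in itertools.product(range(steps + 1), repeat=n_learners):
        if sum(counts) == steps:
            yield tuple(c / steps for c in counts)


def fit_constrained_ensemble(score_matrix, labels, min_sensitivity, steps=10):
    """Grid-search convex weights over the learners' scores (hypothetical helper).

    score_matrix[i][j] is learner j's score for observation i. Returns
    (weights, threshold, ppr) with the smallest positive-prediction rate
    among grid points; ties keep the first weight vector encountered.
    """
    n_learners = len(score_matrix[0])
    best = None
    for weights in simplex_grid(n_learners, steps):
        combined = [sum(w * s for w, s in zip(weights, row)) for row in score_matrix]
        threshold, ppr = min_ppr_threshold(combined, labels, min_sensitivity)
        if best is None or ppr < best[2]:
            best = (weights, threshold, ppr)
    return best


if __name__ == "__main__":
    # Hypothetical toy data: learner 0 ranks the cases well, learner 1 does not.
    scores = [[0.9, 0.1], [0.2, 0.9], [0.8, 0.2], [0.1, 0.8]]
    labels = [1, 0, 1, 0]
    weights, threshold, ppr = fit_constrained_ensemble(scores, labels, min_sensitivity=1.0)
    print(f"weights={weights}, threshold={threshold:.2f}, positive rate={ppr:.2f}")
```

The exhaustive grid enumerates (steps + 1)^L candidate vectors before filtering, so it only suits a handful of learners. The other problems named above (maximum sensitivity under a cap on the positive rate, Neyman-Pearson) follow the same pattern: because objective and constraint are monotonic in c, the optimal threshold sits at an order statistic of the relevant group's scores.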

