Zheng Wenjing, Balzer Laura, van der Laan Mark, Petersen Maya
Division of Biostatistics, School of Public Health, University of California, Berkeley, CA, U.S.A.
Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, U.S.A.
Stat Med. 2018 Jan 30;37(2):261-279. doi: 10.1002/sim.7296. Epub 2017 Apr 6.
Binary classification problems are ubiquitous in health and social sciences. In many cases, one wishes to balance two competing optimality considerations for a binary classifier. For instance, in resource-limited settings, a human immunodeficiency virus (HIV) prevention program based on offering pre-exposure prophylaxis (PrEP) to select high-risk individuals must balance the sensitivity of the binary classifier in detecting future seroconverters (and hence offering them PrEP regimens) with the total number of PrEP regimens that is financially and logistically feasible for the program. In this article, we consider a general class of constrained binary classification problems wherein the objective function and the constraint are both monotonic with respect to a threshold. These include the minimization of the rate of positive predictions subject to a minimum sensitivity, the maximization of sensitivity subject to a maximum rate of positive predictions, and the Neyman-Pearson paradigm, which minimizes the type II error subject to an upper bound on the type I error. We propose an ensemble approach to these binary classification problems based on the Super Learner methodology. This approach linearly combines a user-supplied library of scoring algorithms, with combination weights and a discriminating threshold chosen to minimize the constrained optimality criterion. We then illustrate the application of the proposed classifier to develop an individualized PrEP targeting strategy in a resource-limited setting, with the goal of minimizing the number of PrEP offerings while achieving a minimum required sensitivity. This proof-of-concept data analysis uses baseline data from the ongoing Sustainable East Africa Research in Community Health study. Copyright © 2017 John Wiley & Sons, Ltd.
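As a rough illustration of one problem in this class (minimizing the rate of positive predictions subject to a minimum sensitivity), the sketch below picks a threshold for a single score and then searches a coarse grid of weights for a convex combination of two scores. It is a minimal sketch under simplifying assumptions, not the article's estimator: the function names (`choose_threshold`, `positive_rate`, `fit_two_score_ensemble`), the two-score library, and the grid search are hypothetical, and the cross-validation that the Super Learner methodology relies on is omitted, so the selected weight and threshold are fit and evaluated on the same data.

```python
import numpy as np


def choose_threshold(scores, labels, min_sensitivity):
    """Largest threshold c such that the rule `score >= c` reaches the
    required sensitivity on the given sample.

    Sensitivity and the positive-prediction rate both fall (weakly) as c
    rises, so the largest feasible c also gives the smallest positive rate.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos_scores = np.sort(scores[labels])[::-1]  # true positives, descending
    if pos_scores.size == 0:
        raise ValueError("need at least one positive label")
    # Number of true positives the rule must capture (at least one).
    k = max(int(np.ceil(min_sensitivity * pos_scores.size)), 1)
    return float(pos_scores[k - 1])


def positive_rate(scores, threshold):
    """Fraction of all individuals classified as positive."""
    return float(np.mean(np.asarray(scores, dtype=float) >= threshold))


def fit_two_score_ensemble(s1, s2, labels, min_sensitivity, n_grid=11):
    """Hypothetical grid search over the weight alpha in
    alpha * s1 + (1 - alpha) * s2 (no cross-validation; illustration only).

    Returns (alpha, threshold, positive_rate) with the smallest positive rate.
    """
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    best = None
    for alpha in np.linspace(0.0, 1.0, n_grid):
        combined = alpha * s1 + (1.0 - alpha) * s2
        c = choose_threshold(combined, labels, min_sensitivity)
        rate = positive_rate(combined, c)
        if best is None or rate < best[2]:
            best = (float(alpha), c, rate)
    return best


if __name__ == "__main__":
    # Toy data: 4 positives and 4 negatives, two made-up risk scores.
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    a = [0.9, 0.8, 0.7, 0.1, 0.6, 0.2, 0.3, 0.05]
    b = [0.2, 0.9, 0.8, 0.7, 0.1, 0.3, 0.75, 0.05]
    print(fit_two_score_ensemble(a, b, y, min_sensitivity=0.75))
```

Because the grid includes the weights 0 and 1, the combined rule in this sketch never has a higher in-sample positive rate than either score used alone. A real analysis would choose the weights and threshold on cross-validated predictions, as the Super Learner approach prescribes.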