Yang Wenlong, Liu Danping, Bao Le, Li Runze
Department of Statistics, The Pennsylvania State University, University Park, PA 16802, United States.
Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20892, United States.
Biometrics. 2024 Oct 3;80(4). doi: 10.1093/biomtc/ujae147.
Estimating new HIV infections is important yet challenging because recent infections are difficult to distinguish from long-term infections. We demonstrate that HIV recency status (recent versus long-term) could be determined from self-reported testing history and biomarkers, which are increasingly available in bio-behavioral surveys. Given the self-reported testing history, HIV recency status is partially observed: for example, people who tested positive for HIV more than 1 year ago should have a long-term infection. Based on the nationally representative samples collected by the Population-based HIV Impact Assessment (PHIA) Project, we propose a likelihood-based probabilistic model for HIV recency classification. The model incorporates both individuals whose recency status is known from their testing histories and individuals whose recency status could not be determined. It integrates two mechanisms: how HIV recency status depends on biomarkers, and how HIV recency status, together with the self-reported time of the most recent HIV test, affects the test results. We compare our method with logistic regression and the binary classification tree (current practice) on Malawi PHIA data and on simulated data. Our model obtains more efficient and less biased parameter estimates and is relatively robust to potential reporting errors and model misspecification.
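To make the general shape of such a likelihood concrete, here is a minimal Python sketch of a generic partially-labeled mixture likelihood. It is not the authors' model. The logistic form of P(recent | biomarkers), the logistic dependence of the reported test result on recency status and time since the last test, the binary coding of the test result, and all names (`Person`, `p_recent`, `p_test`, `joint`, `log_likelihood`, `posterior_recent`, `beta`, `gamma`) are hypothetical assumptions made only for illustration. Individuals whose recency status is known contribute the joint probability of that status and their test result. Individuals whose status is undetermined contribute the sum over both possible states.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class Person:
    x: Sequence[float]       # biomarker values (hypothetical covariates)
    t: float                 # self-reported years since the most recent HIV test
    y: int                   # hypothetical binary coding of that test's result
    r: Optional[int] = None  # recency if known from testing history: 1 = recent, 0 = long-term


def p_recent(x: Sequence[float], beta: Sequence[float]) -> float:
    """Hypothetical recency model: P(R = 1 | x) = sigmoid(beta[0] + beta[1:] . x)."""
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return sigmoid(z)


def p_test(y: int, r: int, t: float, gamma: Sequence[Sequence[float]]) -> float:
    """Hypothetical test model: P(Y = 1 | R = r, t) = sigmoid(gamma[r][0] + gamma[r][1] * t)."""
    p1 = sigmoid(gamma[r][0] + gamma[r][1] * t)
    return p1 if y == 1 else 1.0 - p1


def joint(person: Person, r: int, beta, gamma) -> float:
    """P(R = r, Y = y | x, t) = P(R = r | x) * P(Y = y | R = r, t)."""
    pi = p_recent(person.x, beta)
    pr = pi if r == 1 else 1.0 - pi
    return pr * p_test(person.y, r, person.t, gamma)


def log_likelihood(people: List[Person], beta, gamma) -> float:
    total = 0.0
    for person in people:
        if person.r is not None:
            # Known recency status: use the observed state directly.
            total += math.log(joint(person, person.r, beta, gamma))
        else:
            # Undetermined status: marginalize over recent (1) and long-term (0).
            total += math.log(joint(person, 1, beta, gamma) + joint(person, 0, beta, gamma))
    return total


def posterior_recent(person: Person, beta, gamma) -> float:
    """P(R = 1 | x, y, t) by Bayes' rule; returns the known status when available."""
    if person.r is not None:
        return float(person.r)
    a = joint(person, 1, beta, gamma)
    b = joint(person, 0, beta, gamma)
    return a / (a + b)


# Usage with made-up parameter values (not estimates from any data).
people = [
    Person(x=[0.3], t=0.5, y=1, r=1),  # status known from testing history
    Person(x=[1.2], t=3.0, y=0),       # status undetermined
]
beta = [0.0, 0.0]
gamma = [[-2.0, 0.0], [2.0, 0.0]]
print(log_likelihood(people, beta, gamma))
print(posterior_recent(people[1], beta, gamma))
```

In this sketch the parameters would be estimated by maximizing `log_likelihood`, for example by wrapping it to take a flat parameter vector and passing its negative to a general-purpose optimizer such as `scipy.optimize.minimize`. The known-status individuals anchor the estimates, while the undetermined individuals still contribute information through the marginal term. `posterior_recent` then gives a classification probability for each undetermined individual.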