Qu Tianyi, Li Bo, Chan Man-Pui Sally, Albarracin Dolores
Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, Illinois, 61820, USA.
Annenberg School for Communication, University of Pennsylvania, Philadelphia, Pennsylvania, 19104, USA.
Stat. 2023 Jan-Dec;12(1). doi: 10.1002/sta4.555. Epub 2023 Mar 1.
Public health data, such as counts of new HIV diagnoses, are often left-censored because of confidentiality concerns. Standard analysis approaches that treat censored values as missing at random often lead to biased estimates and inferior predictions. Motivated by areal counts of new HIV diagnoses in Philadelphia, in which all values less than or equal to 5 are suppressed, we propose two methods to reduce the adverse influence of missingness on the prediction and imputation of areal new HIV diagnoses. One is a likelihood-based method that integrates the missingness mechanism into the likelihood function; the other is a nonparametric matrix factorization imputation algorithm. Numerical studies and the Philadelphia data analysis demonstrate that both proposed methods can significantly improve prediction and imputation based on left-censored HIV data. We also compare the two methods' robustness to model misspecification and find that both appear robust for prediction, whereas their imputation performance depends on model specification.
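To make "integrating the missingness mechanism into the likelihood" concrete, here is a minimal sketch under a deliberately simplified assumption: all areal counts are i.i.d. Poisson with one common rate λ. This is not the authors' spatial model. With suppression threshold c = 5, an observed count y contributes log P(Y = y), while a suppressed count contributes log P(Y ≤ 5) instead of being dropped. The helper names (`censored_loglik`, `fit_rate`) are hypothetical, the grid search stands in for a proper optimizer, and the example counts are made-up toy values.

```python
import math

THRESHOLD = 5  # counts <= 5 are suppressed, as in the Philadelphia data


def poisson_logpmf(k: int, lam: float) -> float:
    """log P(Y = k) for Y ~ Poisson(lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def poisson_logcdf(c: int, lam: float) -> float:
    """log P(Y <= c), obtained by summing the pmf over 0..c."""
    return math.log(sum(math.exp(poisson_logpmf(k, lam)) for k in range(c + 1)))


def censored_loglik(observed, n_censored, lam, c=THRESHOLD):
    """Log-likelihood in which suppressed cells contribute P(Y <= c)
    rather than being discarded as missing at random (simplified i.i.d. model)."""
    ll = sum(poisson_logpmf(y, lam) for y in observed)
    ll += n_censored * poisson_logcdf(c, lam)
    return ll


def fit_rate(observed, n_censored, c=THRESHOLD):
    """Crude MLE of lam by grid search over (0, 50]; hypothetical helper."""
    grid = [0.01 * i for i in range(1, 5001)]
    return max(grid, key=lambda lam: censored_loglik(observed, n_censored, lam, c))


if __name__ == "__main__":
    observed = [7, 9, 12, 6, 8]  # toy unsuppressed counts, all > 5
    # Ignoring the suppressed cells (MAR-style) just averages the visible counts.
    print("MAR-style estimate:", sum(observed) / len(observed))
    # Accounting for 5 suppressed cells pulls the rate estimate downward.
    print("censored MLE:", fit_rate(observed, n_censored=5))
```

Because each suppressed cell adds the term log P(Y ≤ 5), which decreases as λ grows, the censored estimate falls below the mean of the visible counts. This shows, in miniature, the upward bias that results from treating left-censored values as missing at random.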