Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
Stat Med. 2024 Jan 30;43(2):379-394. doi: 10.1002/sim.9967. Epub 2023 Nov 21.
Validation studies are often used to obtain more reliable information in settings with error-prone data. Validated data on a subsample of subjects can be combined with error-prone data on all subjects to improve estimation. In practice, more than one round of data validation may be required, and directly applying standard approaches for incorporating validation data into analyses may lead to inefficient estimators, because the information available from intermediate validation steps is only partially used or ignored entirely. In this paper, we present two novel extensions of multiple imputation and generalized raking estimators that make full use of all available data. Simulations show that incorporating information from intermediate steps can lead to substantial gains in efficiency. This work is motivated by, and illustrated with, a study of contraceptive effectiveness among 83,671 women living with HIV whose data were originally extracted from electronic medical records; of these, 4732 had their charts reviewed, and a subsequent 1210 also had a telephone interview to validate key study variables.
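Generalized raking, one of the two estimator families named above, adjusts the sampling weights of the validated subsample so that weighted totals of auxiliary variables reproduce totals known from the error-prone data on all subjects. The sketch below is a minimal, standard two-phase version in plain Python, not the multi-wave extension proposed in the paper. It assumes a simple random validation subsample, uses the error-prone variable itself (hypothetically named `x_star`) as the calibration variable, and solves for exponential (raking) calibration weights by Newton's method. The function names `solve` and `rake` and all simulated data are illustrative assumptions.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def rake(d, X, totals, tol=1e-9, max_iter=50):
    """Raking calibration: find w_i = d_i * exp(x_i' lam) with sum_i w_i x_i = totals."""
    p = len(totals)
    lam = [0.0] * p
    for _ in range(max_iter):
        w = [di * math.exp(sum(l * v for l, v in zip(lam, xi))) for di, xi in zip(d, X)]
        # Residual between the known totals and the current weighted totals.
        resid = [totals[k] - sum(wi * xi[k] for wi, xi in zip(w, X)) for k in range(p)]
        if max(abs(r) for r in resid) < tol:
            return w
        # Jacobian of the weighted totals with respect to lam: sum_i w_i x_i x_i'.
        J = [[sum(wi * xi[j] * xi[k] for wi, xi in zip(w, X)) for k in range(p)]
             for j in range(p)]
        step = solve(J, resid)  # Newton step
        lam = [l + s for l, s in zip(lam, step)]
    raise RuntimeError("raking did not converge")


# Hypothetical simulated two-phase data: x_star is error-prone and observed for
# all N subjects; the true y is observed only in a random validation subsample.
rng = random.Random(2024)
N = 1000
y = [rng.gauss(0.0, 1.0) for _ in range(N)]
x_star = [yi + rng.gauss(0.0, 0.5) for yi in y]
sub = rng.sample(range(N), 100)

d = [N / len(sub)] * len(sub)           # design weights for the subsample
X = [[1.0, x_star[i]] for i in sub]     # calibration variables: intercept, x_star
totals = [float(N), sum(x_star)]        # totals known from the full data
w = rake(d, X, totals)

raked_mean_y = sum(wi * y[i] for wi, i in zip(w, sub)) / sum(w)
print(round(raked_mean_y, 3))
```

In practice the calibration variables are often chosen more carefully (for example, influence functions from a model fitted to the error-prone data), and a dedicated survey package would also handle variance estimation; both are outside the scope of this sketch.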