Department of Epidemiology and Biostatistics, University of California, San Francisco, 550 16th Street, San Francisco, CA USA.
Biostatistics. 2020 Jul 1;21(3):483-498. doi: 10.1093/biostatistics/kxy068.
With the advent of electronic health records, information collected in the course of regular health care is increasingly being used for clinical research. The hope is that the wealth of clinical data and the realistic setting (compared with information derived from highly controlled experiments like randomized trials) will aid in the investigation of determinants of disease and in understanding which treatments are effective in regular practice and for which patients. The availability of information in such databases is often driven by how a patient feels and may therefore be associated with the health outcomes being considered. We call this an outcome-dependent visit process, and recent work has shown that ignoring the outcome dependence can produce significant bias in the regression coefficients when fitting longitudinal data models. It is therefore important to have tools to recognize datasets exhibiting outcome dependence. We develop a score statistic to motivate the form of diagnostic test statistics, suggest a variety of approaches for diagnosing such situations, and evaluate their performance. Simple diagnostic tests achieve high power for diagnosing outcome-dependent visit processes. This high power is reached at the point where generalized estimating equations methods begin to exhibit bias in estimating regression coefficients, and before likelihood-based methods are substantially biased.
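To make the idea of an outcome-dependent visit process concrete, the following is a minimal simulation sketch. It is not the score statistic or any diagnostic test from the paper. The data-generating model, the parameter `gamma`, and the helper names (`simulate`, `naive_mean`, `visit_count_correlation`) are all assumptions introduced here for illustration. Each subject has a random intercept, and a visit is recorded with a probability that rises with the current outcome when `gamma > 0`. The sketch then reports the naive mean of the recorded outcomes (the true mean is 0) and a crude check: the correlation between each subject's number of visits and that subject's mean recorded outcome.

```python
import math
import random


def simulate(gamma, n_subjects=2000, n_times=5, seed=0):
    """Simulate outcomes and keep only those recorded at a visit.

    Assumed model (illustrative only):
      outcome  y_it = b_i + e_it, with b_i and e_it ~ N(0, 1), so the true mean is 0
      visit    P(visit at time t) = 1 / (1 + exp(-gamma * y_it))
    gamma = 0 gives a visit process that does not depend on the outcome.
    Returns one list of recorded outcomes per subject.
    """
    rng = random.Random(seed)
    observed = []
    for _ in range(n_subjects):
        b = rng.gauss(0.0, 1.0)  # subject-level random intercept
        ys = []
        for _ in range(n_times):
            y = b + rng.gauss(0.0, 1.0)
            p_visit = 1.0 / (1.0 + math.exp(-gamma * y))
            if rng.random() < p_visit:
                ys.append(y)  # the outcome is only recorded if a visit occurs
        observed.append(ys)
    return observed


def naive_mean(observed):
    """Mean of all recorded outcomes, ignoring how the visits arose."""
    values = [y for ys in observed for y in ys]
    return sum(values) / len(values)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def visit_count_correlation(observed):
    """Correlation between number of visits and mean recorded outcome,
    computed over subjects with at least one visit."""
    seen = [ys for ys in observed if ys]
    counts = [len(ys) for ys in seen]
    means = [sum(ys) / len(ys) for ys in seen]
    return pearson(counts, means)


for gamma in (0.0, 1.0):
    data = simulate(gamma)
    print(gamma, round(naive_mean(data), 3), round(visit_count_correlation(data), 3))
```

Under these assumptions, `gamma = 0` should leave both quantities near zero, while `gamma = 1` should push the naive mean and the correlation well above zero. A check of this kind only hints at outcome dependence and does not replace a formal test.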