Qian Jing, Payabvash Seyedmehdi, Kemmling André, Lev Michael H, Schwamm Lee H, Betensky Rebecca A
Division of Biostatistics and Epidemiology, University of Massachusetts, Amherst, Massachusetts 01003, U.S.A.; Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts 02115, U.S.A.
Biometrics. 2014 Mar;70(1):153-63. doi: 10.1111/biom.12113. Epub 2013 Dec 9.
Matched case-control designs are commonly used in epidemiologic studies for increased efficiency. These designs have recently been introduced to the setting of modern imaging and genomic studies, which are characterized by high-dimensional covariates. However, appropriate statistical analyses that adjust for the matching have not been widely adopted. A matched case-control study of 430 acute ischemic stroke patients was conducted at Massachusetts General Hospital (MGH) in order to identify specific brain regions of acute infarction that are associated with hospital acquired pneumonia (HAP) in these patients. There are 138 brain regions in which infarction was measured, which introduce nearly 10,000 two-way interactions, and challenge the statistical analysis. We investigate penalized conditional and unconditional logistic regression approaches to this variable selection problem that properly differentiate between selection of main effects and of interactions, and that acknowledge the matching. This neuroimaging study was nested within a larger prospective study of HAP in 1915 stroke patients at MGH, which recorded clinical variables, but did not include neuroimaging. We demonstrate how the larger study, in conjunction with the nested, matched study, affords us the capability to derive a score for prediction of HAP in future stroke patients based on imaging and clinical features. We evaluate the proposed methods in simulation studies and we apply them to the MGH HAP study.
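As a rough illustration of the scale and of how a conditional likelihood acknowledges matching, the sketch below counts the two-way interactions among 138 regions and evaluates an L1-penalized conditional logistic loss. It is a minimal sketch under stated assumptions, not the authors' method: it assumes 1:1 matched pairs (the abstract does not give the matching ratio) and a single lasso penalty on all coefficients, whereas the paper treats main effects and interactions differently. The function names `n_pairwise_interactions` and `penalized_pair_loss` are hypothetical.

```python
import math


def n_pairwise_interactions(p: int) -> int:
    """Number of distinct two-way interactions among p main effects."""
    return math.comb(p, 2)


def penalized_pair_loss(beta, case_rows, control_rows, lam):
    """L1-penalized negative conditional log-likelihood, 1:1 matched pairs.

    Assumption: each case is matched to exactly one control. Each pair
    contributes sigmoid(beta . (x_case - x_control)) to the conditional
    likelihood; the pair-specific intercept cancels in the difference,
    which is how the matching is accounted for.
    """
    nll = 0.0
    for x_case, x_ctrl in zip(case_rows, control_rows):
        eta = sum(b * (xc - xk) for b, xc, xk in zip(beta, x_case, x_ctrl))
        # -log(sigmoid(eta)) = log(1 + exp(-eta)), in a numerically stable form
        nll += max(-eta, 0.0) + math.log1p(math.exp(-abs(eta)))
    # Simplification: one lasso penalty shared by all coefficients
    return nll + lam * sum(abs(b) for b in beta)


if __name__ == "__main__":
    print(n_pairwise_interactions(138))  # 9453, i.e. "nearly 10,000"
    # Toy data: two binary infarction indicators, one matched pair
    print(penalized_pair_loss([1.0, 0.0], [[1, 0]], [[0, 0]], lam=0.5))
```

With all coefficients at zero, each pair contributes log 2 to the loss, which gives a quick sanity check on any implementation.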