Zabor Emily C, Begg Colin B
Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, 485 Lexington Ave, 2nd floor, New York, NY 10017, USA.
Department of Biostatistics, Mailman School of Public Health, Columbia University, 722 W 168th St, New York, NY 10032, USA.
Stat Med. 2017 Nov 10;36(25):4050-4060. doi: 10.1002/sim.7405. Epub 2017 Jul 26.
Cancer epidemiologic research has traditionally been guided by the premise that certain diseases share an underlying etiology, or cause. However, with the rise of molecular and genomic profiling, attention has increasingly focused on identifying subtypes of disease. As subtypes are identified, it is natural to ask the question of whether they share a common etiology or in fact arise from distinct sets of risk factors. In this context, epidemiologic questions of interest include (1) whether a risk factor of interest has the same effect across all subtypes of disease and (2) whether risk factor effects differ across levels of each individual tumor marker of which the subtypes are comprised. A number of statistical models have been proposed to address these questions. In an effort to determine the similarities and differences among the proposed methods, and to identify any advantages or disadvantages, we use a simplified data example to elucidate the interpretation of model parameters and available hypothesis tests, and we perform a simulation study to assess bias in effect size, type I error, and power. The results show that when the number of tumor markers is small enough that the cross-classification of markers can be evaluated in the traditional polytomous logistic regression framework, then the statistical properties are at least as good as the more complex modeling approaches that have been proposed. The potential advantage of more complex methods is in the ability to accommodate multiple tumor markers in a model of reduced parametric dimension.
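As a rough illustration of question (1), the sketch below uses the simplest possible setting: one binary risk factor, two disease subtypes, and a shared control group. In this saturated case, the polytomous logistic regression estimates of the subtype-specific odds ratios coincide with the cross-product odds ratios from the table. The ratio of the two subtype odds ratios then depends only on the case counts, which gives a simple Wald test of etiologic heterogeneity. The counts and the function names are hypothetical, chosen for illustration only. They are not taken from the paper, and this is not the authors' simulation design.

```python
import math

# Hypothetical counts (NOT from the paper). Each entry is
# (unexposed, exposed) for a single binary risk factor.
counts = {
    "control":   (400, 100),
    "subtype_A": (150, 100),
    "subtype_B": (200, 50),
}

def odds_ratio(case, control):
    """Cross-product odds ratio of exposure, cases versus controls.

    With one binary covariate the polytomous logistic model is saturated,
    so this equals the maximum likelihood estimate of exp(beta) for the subtype.
    """
    (case_unexp, case_exp), (ctrl_unexp, ctrl_exp) = case, control
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)

def heterogeneity_test(case_a, case_b):
    """Wald test of H0: OR_A == OR_B (no etiologic heterogeneity).

    The control counts cancel in OR_A / OR_B, leaving a case-only odds
    ratio whose log has the usual Woolf variance (sum of reciprocal cells).
    """
    (a_unexp, a_exp), (b_unexp, b_exp) = case_a, case_b
    log_ratio = math.log((a_exp * b_unexp) / (a_unexp * b_exp))
    se = math.sqrt(1 / a_unexp + 1 / a_exp + 1 / b_unexp + 1 / b_exp)
    z = log_ratio / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return log_ratio, se, z, p_value

or_a = odds_ratio(counts["subtype_A"], counts["control"])
or_b = odds_ratio(counts["subtype_B"], counts["control"])
log_ratio, se, z, p_value = heterogeneity_test(counts["subtype_A"], counts["subtype_B"])

print(f"OR subtype A: {or_a:.3f}")
print(f"OR subtype B: {or_b:.3f}")
print(f"log(OR_A / OR_B): {log_ratio:.3f} (SE {se:.3f}), z = {z:.2f}, p = {p_value:.2g}")
```

With several tumor markers, the cross-classification produces many subtypes and correspondingly many parameters, which is the setting where the abstract suggests reduced-dimension models may offer an advantage.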