Department of Biostatistics, Johns Hopkins Bloomberg SPH, 615 N Wolfe St, Baltimore, MD 21205, USA and Division of Cancer Epidemiology and Genetics, National Cancer Institute, Shady Grove, 9609 Medical Center Drive, Rockville, MD 20850, USA.
Department of Biostatistics, Johns Hopkins Bloomberg SPH, 615 N Wolfe St, Baltimore, MD 21205, USA.
Biostatistics. 2021 Oct 13;22(4):772-788. doi: 10.1093/biostatistics/kxz065.
Cancers are routinely classified into subtypes according to various features, including histopathological characteristics and molecular markers. Previous genome-wide association studies have reported heterogeneous associations between loci and cancer subtypes. However, it is unclear which modeling strategy is optimal for handling correlated tumor features, missing data, and the increased degrees of freedom in the underlying tests of association. We propose testing for genetic associations using a mixed-effect two-stage polytomous model score test (MTOP). In the first stage, a standard polytomous model specifies all possible subtypes defined by the cross-classification of the tumor characteristics. In the second stage, the subtype-specific case-control odds ratios are specified through a more parsimonious model based on the case-control odds ratio for a baseline subtype and the case-case parameters associated with the tumor markers. To further reduce the degrees of freedom, we specify the case-case parameters for additional exploratory markers using a random-effect model. We use the Expectation-Maximization algorithm to account for missing data on tumor markers. Through simulations across a range of realistic scenarios and an analysis of data from the Polish Breast Cancer Study (PBCS), we show that MTOP outperforms alternative methods for identifying heterogeneous associations between risk loci and tumor subtypes. The proposed methods have been implemented in a user-friendly, high-speed R package called TOP (https://github.com/andrewhaoyu/TOP).
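To make the two-stage idea concrete, the following is a minimal sketch in plain Python of how a second-stage model can express every subtype-specific log odds ratio through a handful of parameters. It is not the TOP package's code or API. It assumes three binary markers with the illustrative names ER, PR, and HER2, an additive (main-effects-only) second stage, and made-up parameter values; the function names `second_stage_design` and `subtype_log_odds_ratios` are hypothetical. Score testing, random effects, and the EM step for missing markers are not shown.

```python
from itertools import product
from math import exp


def second_stage_design(levels_per_marker):
    """Return (subtypes, Z) for the cross-classification of the markers.

    Each row of Z is [1, z_1, ..., z_K]: an intercept for the baseline
    subtype followed by the marker levels. Using main effects only is an
    assumption of this sketch.
    """
    subtypes = list(product(*[range(n) for n in levels_per_marker]))
    Z = [[1, *s] for s in subtypes]
    return subtypes, Z


def subtype_log_odds_ratios(Z, theta):
    """Map second-stage parameters theta to first-stage log odds ratios."""
    return [sum(z * t for z, t in zip(row, theta)) for row in Z]


# Hypothetical example: three binary markers give 2**3 = 8 subtypes.
markers = ["ER", "PR", "HER2"]  # illustrative names, not taken from the abstract
subtypes, Z = second_stage_design([2, 2, 2])

# Made-up values: the baseline-subtype log odds ratio, then one
# case-case parameter per marker.
theta = [0.10, 0.20, 0.00, -0.05]
beta = subtype_log_odds_ratios(Z, theta)

for s, b in zip(subtypes, beta):
    label = ", ".join(f"{m}={'+' if v else '-'}" for m, v in zip(markers, s))
    print(f"{label}: log OR = {b:.2f}, OR = {exp(b):.3f}")
```

In this sketch the first stage would carry eight free log odds ratios, one per subtype, whereas the second stage describes them with four parameters. That reduction in degrees of freedom is what the second-stage model is meant to provide.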