School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China; Department of Biostatistics, School of Public Health, Yale University, New Haven, Connecticut, United States of America.
Genet Epidemiol. 2014 Apr;38(3):220-30. doi: 10.1002/gepi.21795. Epub 2014 Feb 24.
In high-throughput studies, an important objective is to identify gene-environment interactions associated with disease outcomes and phenotypes. Many commonly adopted methods assume specific parametric or semiparametric models, which may be subject to model misspecification. In addition, they usually use the significance level as the criterion for selecting important interactions. In this study, we adopt rank-based estimation, which is much less sensitive to model specification than some of the existing methods and includes several commonly encountered data types and models as special cases. Penalization is adopted for the identification of gene-environment interactions. It achieves simultaneous estimation and identification and does not rely on the significance level. For computational feasibility, a smoothed rank estimation is further proposed. Simulation shows that under certain scenarios, for example, with contaminated or heavy-tailed data, the proposed method can significantly outperform the existing alternatives, achieving more accurate identification. We analyze a lung cancer prognosis study with gene expression measurements under the accelerated failure time (AFT) model. The proposed method identifies interactions different from those identified by the alternatives. Some of the identified genes have important implications.
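As a rough illustration of what a smoothed, penalized rank objective can look like, the sketch below builds a design matrix with environment, gene, and gene-environment product columns and evaluates a pairwise rank correlation in which the indicator on the linear predictor is replaced by a sigmoid. The abstract does not state the smoothing function, the penalty, or how censoring is handled, so the sigmoid, the L1 penalty, the uncensored setting, and all function and parameter names (`build_design`, `smoothed_rank_objective`, `penalized_loss`, `sigma`, `lam`, `n_unpenalized`) are assumptions made for illustration only.

```python
import numpy as np


def build_design(E, G):
    """Stack E, G, and all pairwise E_k * G_j products (hypothetical layout)."""
    inter = (E[:, :, None] * G[:, None, :]).reshape(E.shape[0], -1)
    return np.hstack([E, G, inter])


def smoothed_rank_objective(beta, X, y, sigma=0.1):
    """Pairwise rank correlation between y and X @ beta.

    The indicator 1(eta_i > eta_j) is replaced by a sigmoid with
    bandwidth sigma (an assumed smoothing choice). Censoring is ignored.
    """
    eta = X @ beta
    d_eta = eta[:, None] - eta[None, :]           # eta_i - eta_j
    y_gt = y[:, None] > y[None, :]                # 1(y_i > y_j)
    smooth = 0.5 * (1.0 + np.tanh(d_eta / (2.0 * sigma)))  # stable sigmoid
    n = len(y)
    return np.sum(y_gt * smooth) / (n * (n - 1))


def penalized_loss(beta, X, y, lam, sigma=0.1, n_unpenalized=0):
    """Negative smoothed rank objective plus an L1 penalty (assumed penalty).

    The first n_unpenalized coefficients are left unpenalized.
    """
    penalty = lam * np.sum(np.abs(beta[n_unpenalized:]))
    return -smoothed_rank_objective(beta, X, y, sigma) + penalty


# Usage on simulated heavy-tailed data (all values are illustrative).
rng = np.random.default_rng(0)
E = rng.normal(size=(200, 1))                     # one environment factor
G = rng.normal(size=(200, 3))                     # three gene measurements
X = build_design(E, G)                            # 1 + 3 + 3 = 7 columns
beta_true = np.array([1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0])
y = X @ beta_true + rng.standard_t(df=2, size=200)
print(penalized_loss(beta_true, X, y, lam=0.1, n_unpenalized=1))
```

Because the objective depends on `y` only through pairwise orderings, it is unchanged by any strictly increasing transformation of the response, which is one way to see why rank-based estimation is less sensitive to model specification. A rank objective of this kind identifies the coefficients only up to scale, so a practical fit would need a normalization (for example, fixing one coefficient) that this sketch omits.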