Xu Yaqing, Wu Mengyun, Ma Shuangge, Ahmed Syed Ejaz
Department of Biostatistics, Yale University, New Haven, CT, USA.
School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China.
J Stat Comput Simul. 2018;88(18):3502-3528. doi: 10.1080/00949655.2018.1523411. Epub 2018 Sep 19.
In biomedical and epidemiological studies, gene-environment (G-E) interactions have been shown to contribute substantially to the etiology and progression of many complex diseases. Most existing approaches for identifying G-E interactions lack robustness to outliers and contamination in the response and predictor spaces. In this study, we develop a novel robust G-E identification approach that applies trimmed regression under joint modeling. A robust data-driven criterion and stability selection are used to determine the trimmed subset, which is free of both vertical outliers and leverage points. An effective penalization approach is developed to identify important G-E interactions while respecting the "main effects, interactions" hierarchical structure. Extensive simulations show that the proposed approach outperforms multiple alternatives. In the analysis of TCGA data on cutaneous melanoma and breast invasive carcinoma, the proposed approach yields interesting findings with superior prediction accuracy and stability.
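To illustrate the general idea behind trimmed regression, the following is a minimal sketch of ordinary least trimmed squares fitted with random starts and concentration steps. It is not the approach developed in the paper: it omits the data-driven choice of the trimmed subset, stability selection, and the hierarchical penalty, and the function name `lts_fit` and its parameters (`h`, `n_starts`, `n_csteps`, `seed`) are assumptions made only for this example.

```python
import numpy as np


def lts_fit(X, y, h, n_starts=20, n_csteps=50, seed=0):
    """Least trimmed squares sketch: minimize the sum of the h smallest
    squared residuals. Hypothetical helper, not the paper's estimator."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    if not (p + 1 <= h <= n):
        raise ValueError("h must satisfy p + 1 <= h <= n")
    if n_csteps < 1:
        raise ValueError("n_csteps must be at least 1")

    rng = np.random.default_rng(seed)
    Xd = np.column_stack([np.ones(n), X])  # prepend an intercept column

    best_obj, best_beta, best_subset = np.inf, None, None
    for _ in range(n_starts):
        # Start from a small random subset of p + 1 observations.
        subset = rng.choice(n, size=p + 1, replace=False)
        for _ in range(n_csteps):
            # Least squares on the current subset only.
            beta, *_ = np.linalg.lstsq(Xd[subset], y[subset], rcond=None)
            resid2 = (y - Xd @ beta) ** 2
            # Concentration step: keep the h best-fitting observations.
            new_subset = np.argsort(resid2)[:h]
            if np.array_equal(np.sort(new_subset), np.sort(subset)):
                break  # subset is stable, so this start has converged
            subset = new_subset
        obj = np.sort(resid2)[:h].sum()  # trimmed sum of squares
        if obj < best_obj:
            best_obj = obj
            best_beta = beta
            best_subset = np.sort(np.argsort(resid2)[:h])
    return best_beta, best_subset
```

The returned subset holds the indices of the `h` observations retained by the best fit. In this sketch `h` is fixed by the caller, whereas the abstract describes selecting the trimmed subset with a robust data-driven criterion and stability selection.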