Center for Applied Statistics, Renmin University of China, Beijing, China.
School of Statistics, Renmin University of China, Beijing, China.
Stat Med. 2019 Jul 30;38(17):3221-3242. doi: 10.1002/sim.8172. Epub 2019 Apr 16.
In this article, we consider a semiparametric additive partially linear interaction model for the integrative analysis of multiple genetic datasets. The goals are to identify important genetic predictors and gene-gene interactions and, simultaneously, to estimate the nonparametric functions that describe the environmental effects. To find the similarities and differences in genetic effects across datasets, we impose a group structure on the regression coefficient matrix under the homogeneity assumption, i.e., the models for different datasets share the same sparsity structure, but the coefficient values may differ across datasets. We develop an iterative approach to estimate the parameters of the main effects, the interactions, and the nonparametric functions, in which the interaction parameters are reparametrized to satisfy the strong hierarchy assumption. We demonstrate the advantages of the proposed method in identification, estimation, and prediction in a series of numerical studies. We also apply the proposed method to the Skin Cutaneous Melanoma data and the lung cancer data from The Cancer Genome Atlas (TCGA).
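To make the strong hierarchy and homogeneity ideas concrete, the following is a minimal sketch in Python with NumPy. The abstract does not state the form of the reparametrization, so the product form used here (interaction coefficient = scaling factor times the two corresponding main effects) is an assumption, chosen because it is one common way to enforce strong hierarchy. The names `interaction_coefficients`, `group_norms`, `beta`, and `gamma` are hypothetical and do not come from the article.

```python
import numpy as np


def interaction_coefficients(beta, gamma):
    """Build interaction coefficients from main effects (assumed product form).

    beta  : array of shape (M, p), main effects of p genes in M datasets.
    gamma : array of shape (M, p, p), free scaling parameters.

    Returns an array of shape (M, p, p) whose (m, j, k) entry for j < k is
    gamma[m, j, k] * beta[m, j] * beta[m, k]; all other entries are zero.
    An interaction can therefore be nonzero only if both main effects are
    nonzero, which is the strong hierarchy condition.
    """
    p = beta.shape[1]
    outer = beta[:, :, None] * beta[:, None, :]      # beta_j * beta_k per dataset
    upper = np.triu(np.ones((p, p), dtype=bool), 1)  # keep each pair j < k once
    return np.where(upper, gamma * outer, 0.0)


def group_norms(beta):
    """Euclidean norm of each gene's main effects taken across datasets.

    A zero norm means the gene is excluded from every dataset, which is how
    a group penalty would encode a shared sparsity structure.
    """
    return np.linalg.norm(beta, axis=0)


# Two datasets, three genes. Gene 1 is inactive in both datasets (shared
# sparsity), while the active genes have different values in each dataset.
beta = np.array([[1.0, 0.0, 2.0],
                 [0.5, 0.0, -1.0]])
gamma = np.ones((2, 3, 3))

inter = interaction_coefficients(beta, gamma)
norms = group_norms(beta)
```

In this toy example, every interaction involving gene 1 is zero in both datasets because its main effect is zero, while the interaction between genes 0 and 2 is nonzero and differs between the two datasets. This illustrates the constraint only; it is not the iterative estimation procedure of the article.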