Lee Wonyul, Du Ying, Sun Wei, Hayes D Neil, Liu Yufeng
University of North Carolina at Chapel Hill.
Stat Anal Data Min. 2012 Dec 1;5(6). doi: 10.1002/sam.11158.
Multiple response regression is a useful regression technique to model multiple response variables using the same set of predictor variables. Most existing methods for multiple response regression are designed for modeling homogeneous data. In many applications, however, one may have heterogeneous data where the samples are divided into multiple groups. Our motivating example is a cancer dataset where the samples belong to multiple cancer subtypes. In this paper, we consider modeling the data coming from a mixture of several Gaussian distributions with known group labels. A naive approach is to split the data into several groups according to the labels and model each group separately. Although it is simple, this approach ignores potential common structures across different groups. We propose new penalized methods to model all groups jointly in which the common and unique structures can be identified. The proposed methods estimate the regression coefficient matrix, as well as the conditional inverse covariance matrix of response variables. Asymptotic properties of the proposed methods are explored. Through numerical examples, we demonstrate that both estimation and prediction can be improved by modeling all groups jointly using the proposed methods. An application to a glioblastoma cancer dataset reveals some interesting common and unique gene relationships across different cancer subtypes.
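As a rough illustration of the naive baseline described above (split the samples by their known labels and model each group separately), the following sketch fits an unpenalized multi-response least-squares model per group and inverts the residual covariance. It is not the proposed penalized joint estimator. The function name `fit_groups_separately`, the small `ridge` term, and the synthetic data are assumptions made only for this example.

```python
import numpy as np

def fit_groups_separately(X, Y, labels, ridge=1e-6):
    """Naive baseline: one multi-response least-squares fit per group.

    Hypothetical helper for illustration; no structure is shared across groups.
    """
    fits = {}
    for g in np.unique(labels):
        mask = labels == g
        Xg, Yg = X[mask], Y[mask]
        # Least-squares coefficient matrix (p x q) using this group's samples only.
        B, *_ = np.linalg.lstsq(Xg, Yg, rcond=None)
        resid = Yg - Xg @ B
        # Residual covariance of the responses; the small ridge term
        # (an assumption of this sketch) keeps the matrix invertible.
        Sigma = resid.T @ resid / len(Yg) + ridge * np.eye(Y.shape[1])
        fits[g.item()] = {"B": B, "Omega": np.linalg.inv(Sigma)}
    return fits

# Synthetic two-group data (assumed for the example, not taken from the paper).
rng = np.random.default_rng(0)
n, p, q = 200, 3, 2
X = rng.normal(size=(n, p))
labels = np.repeat([0, 1], n // 2)
B_true = {0: np.ones((p, q)), 1: -np.ones((p, q))}
Y = np.vstack([X[labels == g] @ B_true[g] for g in (0, 1)])
Y = Y + 0.1 * rng.normal(size=(n, q))

fits = fit_groups_separately(X, Y, labels)
```

Each group's coefficient matrix and conditional inverse covariance are estimated from that group's samples alone, which is why any structure common to the groups goes unused in this baseline.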