Peng Jie, Zhu Ji, Bergamaschi Anna, Han Wonshik, Noh Dong-Young, Pollack Jonathan R, Wang Pei
Department of Statistics, University of California, Davis, CA, USA.
Department of Statistics, University of Michigan, Ann Arbor, MI, USA.
Ann Appl Stat. 2010 Mar;4(1):53-77. doi: 10.1214/09-AOAS271SUPP.
In this paper, we propose a new method, remMap (REgularized Multivariate regression for identifying MAster Predictors), for fitting multivariate response regression models in the high-dimension, low-sample-size setting. remMap is motivated by the study of regulatory relationships among different biological molecules using multiple types of high-dimensional genomic data. In particular, we are interested in the influence of DNA copy number alterations on RNA transcript levels. To this end, we model the dependence of RNA expression levels on DNA copy numbers through multivariate linear regression and use regularization both to handle the high dimensionality and to incorporate the desired network structure. Criteria for selecting the tuning parameters are also discussed. The performance of the proposed method is illustrated through extensive simulation studies. Finally, remMap is applied to a breast cancer study in which genome-wide RNA transcript levels and DNA copy numbers were measured for 172 tumor samples. We identify a trans-hub region in cytoband 17q12-q21 whose amplification influences the RNA expression levels of more than 30 unlinked genes. These findings may lead to a better understanding of breast cancer pathology.
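To make the idea of regularized multivariate regression with "master predictors" concrete, here is a minimal sketch. The abstract does not spell out the penalty, so the form used here is an assumption: an elementwise l1 penalty (sparsity of individual coefficients) combined with a row-wise l2 penalty (whole predictors dropping out, so that the surviving rows act as master predictors). The function name `fit_sparse_row_regression`, the proximal-gradient solver, and the tuning values are illustrative choices, not the authors' implementation.

```python
import numpy as np


def fit_sparse_row_regression(X, Y, lam1, lam2, n_iter=500):
    """Proximal-gradient sketch (assumed penalty form, not the paper's code) for

        min_B 0.5 * ||Y - X B||_F^2 + lam1 * sum_{p,q} |B_pq| + lam2 * sum_p ||B_p.||_2

    X is n x p (e.g. DNA copy numbers), Y is n x q (e.g. RNA expression levels),
    and row p of B holds the effects of predictor p on all q responses.
    """
    p = X.shape[1]
    q = Y.shape[1]
    B = np.zeros((p, q))
    # Step size 1/L, where L is the largest eigenvalue of X^T X.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        # Gradient step on the squared-error loss.
        Z = B - step * (X.T @ (X @ B - Y))
        # Elementwise soft-thresholding (l1 part of the penalty).
        Z = np.sign(Z) * np.maximum(np.abs(Z) - step * lam1, 0.0)
        # Row-wise shrinkage (l2 part): a row with small norm is set to zero.
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        scale = np.maximum(1.0 - step * lam2 / np.maximum(norms, 1e-12), 0.0)
        B = scale * Z
    return B


if __name__ == "__main__":
    # Synthetic illustration only: predictors 0 and 3 affect every response.
    rng = np.random.default_rng(0)
    n, p, q = 100, 20, 10
    X = rng.standard_normal((n, p))
    B_true = np.zeros((p, q))
    B_true[[0, 3]] = 2.0
    Y = X @ B_true + 0.1 * rng.standard_normal((n, q))
    B_hat = fit_sparse_row_regression(X, Y, lam1=5.0, lam2=5.0)
    print("predictors with nonzero rows:", np.flatnonzero(np.abs(B_hat).sum(axis=1)))
```

In this sketch, a predictor whose entire row of coefficients is shrunk to zero is excluded from the model, while a predictor with many nonzero entries in its row plays the role of a hub influencing many responses. The tuning parameters `lam1` and `lam2` would in practice be chosen by a criterion such as cross-validation.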