School of Public Health, Yale University, New Haven, CT, USA.
Stat Med. 2011 Dec 10;30(28):3361-71. doi: 10.1002/sim.4337. Epub 2011 Aug 25.
Although microarray gene profiling studies in cancer research have been successful in identifying genetic variants predisposing to the development and progression of cancer, markers identified from the analysis of single datasets often suffer from low reproducibility. Among multiple possible causes, the most important is the small sample size, and hence the lack of power, of single studies. Integrative analysis jointly considers multiple heterogeneous studies, has a substantially larger sample size, and can improve reproducibility. In this article, we focus on cancer prognosis studies, where the response variables are progression-free, overall, or other types of survival. We propose a group minimax concave penalty (GMCP) penalized integrative analysis approach for analyzing multiple heterogeneous cancer prognosis studies with microarray gene expression measurements, and we develop an efficient group coordinate descent algorithm to compute it. The GMCP can automatically accommodate the heterogeneity across multiple datasets, and the identified markers have consistent effects across multiple studies. Simulation studies show that the GMCP provides significantly improved selection results compared with existing meta-analysis approaches, intensity approaches, and group Lasso penalized integrative analysis. We apply the GMCP to four microarray studies and identify genes associated with the prognosis of breast cancer.
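To make the penalty and the group-wise update mentioned in the abstract more concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a least-squares-type loss and an orthonormalized design within each group, in which case a single group coordinate descent step under a group MCP penalty reduces to a closed-form "firm thresholding" of the group's unpenalized solution. Here a group would collect one gene's coefficients across the studies. The function names `mcp_penalty` and `group_mcp_update`, the parameters `lam` and `gamma`, and the absence of any group-size scaling are illustrative assumptions; the abstract does not specify the loss function or tuning details.

```python
import numpy as np


def mcp_penalty(t, lam, gamma):
    """Minimax concave penalty evaluated at |t| (assumes lam >= 0, gamma > 1)."""
    t = np.abs(t)
    # Quadratic taper up to gamma*lam, then constant: large effects are not shrunk further.
    return np.where(t <= gamma * lam,
                    lam * t - t ** 2 / (2.0 * gamma),
                    0.5 * gamma * lam ** 2)


def group_mcp_update(z, lam, gamma):
    """One group-wise update under the group MCP.

    z is the unpenalized least-squares solution for one group (hypothetically,
    one gene's coefficients across all studies), assuming the group's design
    has been orthonormalized. Returns the penalized coefficient vector.
    """
    if gamma <= 1:
        raise ValueError("gamma must exceed 1 for the update to be well defined")
    z = np.asarray(z, dtype=float)
    norm = np.linalg.norm(z)
    if norm <= lam:
        # Weak group: the gene is dropped from every study at once.
        return np.zeros_like(z)
    if norm <= gamma * lam:
        # Intermediate group: shrink the whole vector toward zero.
        return (gamma / (gamma - 1.0)) * (1.0 - lam / norm) * z
    # Strong group: left unshrunk.
    return z.copy()
```

Because the thresholding acts on the norm of the whole group, a gene is either selected in all studies or in none, while the individual coefficients within a selected group remain free to differ across studies. This is one way to read the abstract's statement that heterogeneity is accommodated while identified markers have consistent effects.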