Sun Yifan, Sun Zhengyang, Jiang Yu, Li Yang, Ma Shuangge
Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, China.
School of Public Health, University of Memphis, Tennessee, USA.
Stat Methods Med Res. 2020 May;29(5):1325-1337. doi: 10.1177/0962280219859026. Epub 2019 Jul 7.
In cancer research, high-throughput profiling has been extensively conducted. Recent studies have conducted integrative analysis of data on multiple cancer patient groups/subgroups. Such analysis has the potential to reveal genomic commonality as well as differences across groups/subgroups. However, in the existing literature, methods that pay special attention to genomic commonality and difference are very limited. In this study, a novel estimation and marker selection method based on the sparse boosting technique is developed to address the commonality/difference problem. In terms of technical innovation, a new penalty and a new computation of increments are introduced. The proposed method can also effectively accommodate the grouping structure of covariates. Simulation shows that it can outperform direct competitors under a wide spectrum of settings. Two TCGA (The Cancer Genome Atlas) datasets are analyzed, showing that the proposed analysis can identify markers with important biological implications and has satisfactory prediction performance and stability.