Zhu Ruoqing, Zhao Qing, Zhao Hongyu, Ma Shuangge
Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL, USA.
Department of Biostatistics, Yale University, New Haven, CT, USA.
Biostatistics. 2016 Oct;17(4):605-18. doi: 10.1093/biostatistics/kxw010. Epub 2016 Mar 14.
In multidimensional cancer omics studies, each subject is profiled on multiple layers of omics activities. The goal of this article is to integrate multiple types of omics measurements, identify markers, and build a model for cancer outcome. The proposed analysis proceeds in two steps. In the first step, we analyze the regulation among different types of omics measurements through the construction of linear regulatory modules (LRMs). The LRMs have a sound biological basis, and their construction differs from existing analyses by modeling the regulation of sets of gene expressions (GEs) by sets of regulators. The construction is realized with the assistance of regularized singular value decomposition. In the second step, the proposed cancer outcome model includes the regulated GEs, "residuals" of GEs, and "residuals" of regulators, and we use regularized estimation to select relevant markers. Simulation shows that the proposed method outperforms the alternatives, identifying markers more accurately. We analyze The Cancer Genome Atlas (TCGA) data on cutaneous melanoma and lung adenocarcinoma and obtain meaningful results.
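The two-step idea can be illustrated with a small simulation. The abstract does not give the estimation details, so the following is only a minimal sketch under stated assumptions, not the authors' implementation: a ridge estimate of the regulator-to-GE coefficient matrix, a rank-1 soft-thresholded SVD standing in for the regularized SVD, one simple way of forming the "residuals", and a Lasso fitted by ISTA on a continuous outcome standing in for the regularized outcome model. All function names, tuning values, and the simulated data are hypothetical.

```python
import numpy as np

def soft(x, lam):
    """Elementwise soft-thresholding operator."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_rank1_svd(B, lam_u, lam_v, n_iter=50):
    """Rank-1 SVD with soft-thresholded singular vectors (illustrative)."""
    v = np.linalg.svd(B, full_matrices=False)[2][0]  # start from the top right singular vector
    u = np.zeros(B.shape[0])
    for _ in range(n_iter):
        u = soft(B @ v, lam_u)
        u /= np.linalg.norm(u) or 1.0  # guard against an all-zero vector
        v = soft(B.T @ u, lam_v)
        v /= np.linalg.norm(v) or 1.0
    return u, v

def lasso_ista(Z, y, lam, n_iter=2000):
    """Lasso for squared-error loss, solved by proximal gradient descent (ISTA)."""
    n = Z.shape[0]
    step = n / np.linalg.norm(Z, 2) ** 2  # 1 / Lipschitz constant of the gradient
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        grad = Z.T @ (Z @ beta - y) / n
        beta = soft(beta - step * grad, step * lam)
    return beta

# Simulated data (assumption): 5 regulators jointly regulate 8 gene expressions.
rng = np.random.default_rng(0)
n, p, q = 200, 30, 40
X = rng.standard_normal((n, p))                  # regulators
B_true = np.zeros((p, q))
B_true[:5, :8] = 1.0
Y = X @ B_true + 0.5 * rng.standard_normal((n, q))  # gene expressions

# Step 1: estimate the regulation matrix, then extract one sparse module.
B_hat = np.linalg.solve(X.T @ X + np.eye(p), X.T @ Y)  # ridge estimate
u, v = sparse_rank1_svd(B_hat, lam_u=0.5, lam_v=0.5)
module_regulators = np.flatnonzero(u)            # regulator set of the module
module_genes = np.flatnonzero(v)                 # GE set of the module

# Decompose into regulated signal and residuals (one possible construction).
m = X @ u                                        # module-level regulated signal
d = u @ B_hat @ v                                # strength of the module
E_Y = Y - np.outer(m, d * v)                     # GE "residuals"
E_X = X - np.outer(m, u)                         # regulator "residuals"

# Step 2: regularized outcome model on the three groups of covariates.
outcome = X[:, :5].sum(axis=1) + 0.5 * rng.standard_normal(n)
Z = np.column_stack([m, E_Y, E_X])
beta = lasso_ista(Z, outcome, lam=0.2)
selected = np.flatnonzero(np.abs(beta) > 1e-8)   # column 0 is the regulated signal

print("module regulators:", module_regulators)
print("module genes:", module_genes)
print("selected covariates:", selected)
```

In this toy setting the outcome depends only on the regulated signal, so the first column of `Z` is the covariate the Lasso is expected to favor. Extracting several modules would require deflating `B_hat` and repeating the rank-1 step, and a survival outcome would need a different loss; both are omitted here.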