IEEE/ACM Trans Comput Biol Bioinform. 2017 Jan-Feb;14(1):145-154. doi: 10.1109/TCBB.2015.2511758. Epub 2015 Dec 23.
Different types of genomic aberration may simultaneously contribute to tumorigenesis. To obtain a more accurate prognostic assessment that can guide the choice of therapeutic regimen for cancer patients, heterogeneous multi-omics data should be integrated coherently, which is often difficult. For this purpose, we propose a Gene Interaction Regularized Elastic Net (GIREN) model that predicts clinical outcome by integrating multiple data types. GIREN incorporates both gene measurements and gene-gene interaction information under an elastic net formulation, enforcing structured sparsity and the "grouping effect" in the solution so as to select discriminative features with prognostic value. An iterative gradient descent algorithm is also developed to solve the regularized optimization problem. GIREN was applied to human ovarian cancer and breast cancer datasets obtained from The Cancer Genome Atlas. The results show that GIREN predicted cancer progression more accurately and more robustly than competing algorithms (LASSO, Elastic Net, and Semi-supervised PCA, each with or without average pathway expression features) on both datasets, as measured by median area under the curve (AUC) and interquartile range (IQR), suggesting a promising direction for more effective integration of gene measurement and gene interaction information.
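The abstract does not state the GIREN objective, so the following is only a minimal sketch of one common way to combine an elastic-net-style penalty with gene-gene interaction information: a logistic loss with an L1 term (sparsity) plus a graph-Laplacian quadratic term (grouping of interacting genes), solved by proximal gradient descent. The penalty form, the function names (`normalized_laplacian`, `soft_threshold`, `fit_network_elastic_net`), and the default values of `lam1`, `lam2`, and `n_iter` are assumptions made for illustration, not the authors' formulation or algorithm.

```python
import numpy as np


def normalized_laplacian(adj):
    """Symmetric normalized Laplacian of an undirected gene-gene graph.

    Genes with no interactions get an all-zero row, so the network
    penalty leaves them unaffected.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    connected = deg > 0
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[connected] = 1.0 / np.sqrt(deg[connected])
    return np.diag(connected.astype(float)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def fit_network_elastic_net(X, y, lap, lam1=0.05, lam2=0.1, n_iter=500):
    """Minimize mean logistic loss + lam1 * ||w||_1 + lam2 * w' L w.

    Assumed objective (hypothetical, not taken from the paper).
    X: (n_samples, n_genes) measurements, y: 0/1 outcome labels,
    lap: (n_genes, n_genes) graph Laplacian.
    """
    n, p = X.shape
    w, b = np.zeros(p), 0.0

    # Step size 1/L, where L bounds the curvature of the smooth part
    # (logistic loss with intercept, plus the Laplacian quadratic term).
    X_aug = np.hstack([X, np.ones((n, 1))])
    lipschitz = np.linalg.norm(X_aug, 2) ** 2 / (4 * n) + 2 * lam2 * np.linalg.norm(lap, 2)
    step = 1.0 / lipschitz

    for _ in range(n_iter):
        z = np.clip(X @ w + b, -30, 30)       # clip to avoid overflow in exp
        resid = 1.0 / (1.0 + np.exp(-z)) - y  # predicted probability minus label
        grad_w = X.T @ resid / n + 2 * lam2 * (lap @ w)
        grad_b = resid.mean()
        # Gradient step on the smooth part, then soft-thresholding for the L1 term.
        w = soft_threshold(w - step * grad_w, step * lam1)
        b -= step * grad_b
    return w, b


if __name__ == "__main__":
    # Synthetic toy data: three interacting genes drive the outcome.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 6))
    y = (X[:, :3].sum(axis=1) > 0).astype(float)
    adj = np.zeros((6, 6))
    adj[0, 1] = adj[1, 0] = adj[1, 2] = adj[2, 1] = 1.0
    w, b = fit_network_elastic_net(X, y, normalized_laplacian(adj))
    print("coefficients:", np.round(w, 3))
```

In this sketch the L1 term drives coefficients of uninformative genes toward zero, while the Laplacian term penalizes large differences between (degree-scaled) coefficients of interacting genes, which is one way to obtain a network-guided grouping effect.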