Department of Statistics, Ohio State University, Columbus, OH, USA.
Department of Biostatistics, Harvard University, Boston, MA, USA.
Stat Med. 2018 Jul 20;37(16):2501-2515. doi: 10.1002/sim.7681. Epub 2018 Apr 17.
Attempts to predict prognosis in cancer patients using high-dimensional genomic data, such as gene expression in tumor tissue, can be made difficult by the large number of features and the potential complexity of the relationship between features and the outcome. Integrating prior biological knowledge into risk prediction with such data by grouping genomic features into pathways and networks reduces the dimensionality of the problem and could improve prediction accuracy. Additionally, such knowledge-based models may be more biologically grounded and interpretable. Prediction could potentially be further improved by allowing for complex nonlinear pathway effects. The kernel machine framework has been proposed as an effective approach for modeling the nonlinear and interactive effects of genes in pathways for both censored and noncensored outcomes. When multiple pathways are under consideration, one may efficiently select informative pathways and aggregate their signals via multiple kernel learning (MKL), which has previously been proposed for prediction of noncensored outcomes. In this paper, we propose MKL methods for censored survival outcomes. We derive our approach for a general survival modeling framework with a convex objective function and illustrate its application under the Cox proportional hazards and semiparametric accelerated failure time models. Numerical studies demonstrate that the proposed MKL-based prediction methods perform well in finite samples and can potentially outperform models that assume linear effects or ignore the pathway grouping. The methods are illustrated with an application to 2 cancer data sets.
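The idea of aggregating pathway signals through MKL can be illustrated with a minimal sketch of the kernel-combination step only. It assumes a Gaussian kernel per pathway and a fixed vector of nonnegative pathway weights; the abstract specifies neither, and in the proposed methods the weights would be estimated from the censored data rather than supplied by hand. The function names `gaussian_kernel` and `combine_pathway_kernels`, the bandwidth, and the toy expression values are all hypothetical.

```python
import math


def gaussian_kernel(rows, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix for a list of feature vectors.

    The Gaussian kernel is an assumption made for illustration only.
    """
    n = len(rows)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))
            K[i][j] = K[j][i] = math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return K


def combine_pathway_kernels(X, pathways, weights, bandwidth=1.0):
    """Weighted sum of per-pathway kernels: K = sum_m weights[m] * K_m.

    X        : list of patients, each a list of gene-level features
    pathways : list of column-index lists, one per pathway
    weights  : nonnegative pathway weights (fixed here, not learned);
               a weight of zero drops that pathway from the model
    """
    if len(pathways) != len(weights):
        raise ValueError("need exactly one weight per pathway")
    if any(w < 0 for w in weights):
        raise ValueError("pathway weights must be nonnegative")
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for cols, w in zip(pathways, weights):
        if w == 0.0:
            continue  # pathway not selected
        sub = [[row[c] for c in cols] for row in X]
        K_m = gaussian_kernel(sub, bandwidth)
        for i in range(n):
            for j in range(n):
                K[i][j] += w * K_m[i][j]
    return K


# Toy data: 3 patients, 4 genes, 2 pathways (values are made up).
X = [
    [0.1, 1.2, -0.3, 0.8],
    [0.1, 1.2, 2.0, -1.0],
    [1.5, -0.4, 0.0, 0.3],
]
pathways = [[0, 1], [2, 3]]
weights = [0.7, 0.0]  # second pathway switched off
K = combine_pathway_kernels(X, pathways, weights)
```

The combined matrix `K` could then be passed to any kernel-based survival fit. Setting a weight to zero removes that pathway entirely, which is how a sparse weight vector would correspond to pathway selection in this sketch.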