Shen Dixin, Lewinger Juan Pablo, Kawaguchi Eric
Clinical Data Science, Gilead Sciences, Foster City, USA.
Division of Biostatistics, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, USA.
BioData Min. 2024 Oct 24;17(1):44. doi: 10.1186/s13040-024-00398-6.
High-dimensional omics data are often accompanied by "meta-features", such as biological pathways, functional annotations, and summary statistics from similar studies, that can be informative for predicting an outcome of interest. We introduce a regularized hierarchical framework for integrating meta-features, with the goal of improving prediction and feature selection performance with time-to-event outcomes.
A hierarchical framework is used to incorporate meta-features. Regularization is applied to the omic features as well as the meta-features so that high-dimensional data can be handled at both levels. The proposed hierarchical Cox model can be fitted efficiently by a combination of iteratively reweighted least squares and cyclic coordinate descent.
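The two-level regularization described above can be illustrated with a minimal sketch. It is not the authors' implementation, and the specific penalty form is an assumption: a ridge-type penalty shrinks the omic coefficients `beta` toward a mean `Z @ alpha` determined by the meta-features, and a lasso penalty is placed on the meta-feature coefficients `alpha`. The function names, the tuning parameters `lam1` and `lam2`, and the Breslow-style partial likelihood without special tie handling are all hypothetical choices made for illustration. The sketch only evaluates the penalized objective and shows the soft-thresholding operator that a coordinate descent update for a lasso-penalized coefficient typically uses; it does not carry out the full fitting procedure.

```python
import math

def soft_threshold(x, t):
    """Proximal operator of t*|x|: shrinks x toward zero by t (lasso-type update)."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def matvec(A, v):
    """Matrix-vector product for a list-of-rows matrix A and a list v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def neg_log_partial_likelihood(eta, time, event):
    """Cox negative log partial likelihood (Breslow form, ties not treated specially).

    eta   : linear predictors, one per subject
    time  : observed follow-up times
    event : 1 if the event was observed, 0 if censored
    """
    nll = 0.0
    n = len(time)
    for i in range(n):
        if event[i]:
            # Risk set: subjects still under observation at time[i].
            risk = sum(math.exp(eta[j]) for j in range(n) if time[j] >= time[i])
            nll -= eta[i] - math.log(risk)
    return nll

def hierarchical_objective(X, Z, time, event, beta, alpha, lam1, lam2):
    """Assumed objective: Cox loss / n + (lam1/2)*||beta - Z alpha||^2 + lam2*||alpha||_1.

    X : n x p omic feature matrix
    Z : p x q meta-feature matrix (one row per omic feature)
    """
    prior_mean = matvec(Z, alpha)  # meta-feature-driven target for beta
    ridge = sum((b - m) ** 2 for b, m in zip(beta, prior_mean))
    lasso = sum(abs(a) for a in alpha)
    loss = neg_log_partial_likelihood(matvec(X, beta), time, event) / len(time)
    return loss + 0.5 * lam1 * ridge + lam2 * lasso
```

Under this assumed form, setting `alpha` to zero reduces the ridge term to an ordinary ridge penalty on `beta`, which is one way to see why uninformative meta-features need not hurt prediction relative to a standard regularized Cox model.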
In a simulation study, we show that when the external meta-features are informative, the regularized hierarchical model can substantially improve prediction performance over standard regularized Cox regression. We illustrate the proposed model with applications to breast cancer and melanoma survival based on gene expression profiles. These applications show improved prediction performance when meta-features are incorporated, as well as the discovery of important omic feature sets through sparse regularization at the meta-feature level.
The proposed hierarchical regularized regression model enables integration of external meta-feature information directly into the modeling process for time-to-event outcomes and improves prediction performance when the external meta-feature data are informative. Importantly, when the external meta-features are uninformative, the prediction performance of the regularized hierarchical model is on par with standard regularized Cox regression, indicating robustness of the framework. In addition to developing predictive signatures, the model can also be deployed in discovery applications where the main goal is to identify important features associated with the outcome rather than to develop a predictive model.