Department of Mathematics and Statistics, University of Turku, Turku, Finland.
Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland.
PLoS Comput Biol. 2023 Mar 10;19(3):e1010333. doi: 10.1371/journal.pcbi.1010333. eCollection 2023 Mar.
In many real-world applications, such as those based on electronic health records, prognostic prediction of patient survival relies on heterogeneous sets of clinical laboratory measurements. To address the trade-off between the predictive accuracy of a prognostic model and the costs of its clinical implementation, we propose an optimized L0-pseudonorm approach for learning sparse solutions in multivariable regression. Model sparsity is maintained by a cardinality constraint that restricts the number of nonzero coefficients in the model, which makes the optimization problem NP-hard. In addition, we generalize the cardinality constraint to grouped feature selection, which makes it possible to identify key sets of predictors that may be measured together in a kit in clinical practice. We demonstrate our cardinality-constraint-based feature subset selection method, named OSCAR, in the context of prognostic prediction for prostate cancer patients, where it enables one to determine the key explanatory predictors at different levels of model sparsity. We further explore how model sparsity affects model accuracy and implementation cost. Lastly, we demonstrate that the presented methodology generalizes to high-dimensional transcriptomics data.
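To make the grouped cardinality constraint concrete, here is a minimal sketch that chooses at most k groups ("kits") of predictors and fits a model on their union. It makes several assumptions that are not part of OSCAR: it uses ordinary least squares without an intercept instead of a survival model, it searches exhaustively instead of using OSCAR's own optimization procedure, and the helper names (`best_group_subset`, `ols_sse`), the kit names, the toy data and the per-kit costs are all hypothetical.

```python
from itertools import combinations


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_sse(X, y, cols):
    """Least squares (no intercept) on columns `cols`; return (coefficients, SSE)."""
    if not cols:
        return [], sum(t * t for t in y)
    xtx = [[sum(row[i] * row[j] for row in X) for j in cols] for i in cols]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in cols]
    beta = solve(xtx, xty)
    sse = sum((t - sum(b * row[c] for b, c in zip(beta, cols))) ** 2
              for row, t in zip(X, y))
    return beta, sse


def best_group_subset(X, y, groups, k, cost):
    """Enumerate every subset of at most k groups (the grouped cardinality
    constraint) and keep the one with the lowest SSE; ties go to the cheaper kits."""
    best = None
    for size in range(k + 1):
        for combo in combinations(sorted(groups), size):
            cols = sorted(c for g in combo for c in groups[g])
            _, sse = ols_sse(X, y, cols)
            total_cost = sum(cost[g] for g in combo)
            key = (round(sse, 9), total_cost)
            if best is None or key < best[0]:
                best = (key, combo, sse, total_cost)
    return best[1], best[2], best[3]


# Hypothetical toy data: y = 2*x0 + 3*x2, so only columns 0 and 2 matter.
X = [[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 0, 0],
     [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
y = [2 * r[0] + 3 * r[2] for r in X]
groups = {"kitA": [0, 1], "kitB": [2], "kitC": [3]}  # predictors measured together
cost = {"kitA": 5.0, "kitB": 2.0, "kitC": 1.0}       # hypothetical kit prices

if __name__ == "__main__":
    for k in range(3):
        combo, sse, kit_cost = best_group_subset(X, y, groups, k, cost)
        print(k, combo, round(sse, 3), kit_cost)
```

Raising k from 1 to 2 moves the selection from `kitB` alone to `kitA` plus `kitB`, which drives the toy SSE to zero at a higher kit cost; this mirrors the accuracy-versus-cost trade-off the abstract describes. The enumeration visits every subset of groups, so its cost grows combinatorially with the number of groups, consistent with the NP-hardness of the constrained problem; it is only practical for a handful of groups.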