Department of Health Statistics, School of Public Health and Management, Binzhou Medical University, Yantai City, Shandong 264003, China.
Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan City, Shanxi 030001, China.
Math Biosci Eng. 2023 Jan 11;20(3):5352-5378. doi: 10.3934/mbe.2023248.
Penalized Cox regression can efficiently identify biomarkers related to disease prognosis in high-dimensional genomic data. However, the results of penalized Cox regression are influenced by sample heterogeneity: some individuals have a dependence structure between survival time and covariates that differs from that of most individuals. These observations are called influential observations or outliers. A robust penalized Cox model (Reweighted Elastic Net-type maximum trimmed partial likelihood estimator, Rwt MTPL-EN) is proposed to improve prediction accuracy and identify influential observations. A new algorithm, AR-Cstep, is also proposed to solve the Rwt MTPL-EN model. The method was validated by a simulation study and an application to glioma microarray expression data. When there were no outliers, the results of Rwt MTPL-EN were close to those of the Elastic Net (EN). When outliers existed, the results of EN were affected by them. Whether the censoring rate was high or low, the robust Rwt MTPL-EN performed better than EN and could resist outliers in both the predictors and the response. The outlier detection accuracy of Rwt MTPL-EN was much higher than that of EN. Outliers who "lived too long" made EN perform worse but were accurately detected by Rwt MTPL-EN. In the analysis of glioma gene expression data, most of the outliers identified by EN were those who "failed too early", but most of them were not obvious outliers according to the risk estimated from omics data or clinical variables. Most of the outliers identified by Rwt MTPL-EN were those who "lived too long", and most of them were obvious outliers according to the risk estimated from omics data or clinical variables. Rwt MTPL-EN can be adopted to detect influential observations in high-dimensional survival data.
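To make the idea of a trimmed, Elastic Net-penalized partial likelihood more concrete, the following is a minimal sketch of how such an objective could be evaluated on a retained subset of observations. It is not the authors' implementation: the function names, the scaling by subset size, the `lam`/`alpha` parameterization of the penalty, and the assumption of distinct event times are all illustrative choices that the abstract does not specify. The reweighting step and the AR-Cstep search over subsets are not reproduced here.

```python
import numpy as np


def neg_partial_loglik(beta, X, time, event):
    """Negative Cox partial log-likelihood.

    Assumes distinct observation times (no tie handling); this is a
    simplification made for the sketch.
    """
    eta = X @ beta
    order = np.argsort(-time)          # sort subjects by decreasing time
    eta_sorted = eta[order]
    event_sorted = event[order]
    # After sorting, the risk set of subject i is everyone up to and
    # including i, so a running log-sum-exp gives
    # log(sum over the risk set of exp(eta_j)).
    log_risk = np.logaddexp.accumulate(eta_sorted)
    return -np.sum(event_sorted * (eta_sorted - log_risk))


def trimmed_en_objective(beta, X, time, event, subset, lam, alpha):
    """Elastic Net-penalized negative partial log-likelihood on a subset.

    `subset` holds the indices of the observations that are kept; the
    remaining (trimmed) observations do not enter the likelihood.
    The scaling and penalty form are assumptions, not taken from the paper.
    """
    subset = np.asarray(subset)
    loss = neg_partial_loglik(beta, X[subset], time[subset], event[subset])
    loss = loss / len(subset)
    l1 = np.sum(np.abs(beta))
    l2 = np.sum(beta ** 2)
    penalty = lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)
    return loss + penalty


# Hypothetical toy data: three subjects, one covariate, all events observed.
X = np.array([[0.5], [-1.0], [2.0]])
time = np.array([1.0, 2.0, 3.0])
event = np.array([1, 1, 1])

# Objective when the third subject is trimmed from the likelihood.
value = trimmed_en_objective(np.zeros(1), X, time, event, [0, 1], lam=0.1, alpha=0.5)
```

In a trimmed estimator of this kind, the objective would be minimized jointly over the coefficients and the choice of retained subset, so that observations poorly explained by the model are left out of the fit and can afterwards be flagged as candidate outliers.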