Cai Kaida, Fu Wenzhi, Liu Hanwen, Yang Xiaofang, Wang Zhengyan, Zhao Xin
Department of Epidemiology and Biostatistics, School of Public Health, Southeast University, Nanjing 210009, China.
Department of Statistics and Actuarial Science, School of Mathematics, Southeast University, Nanjing 211189, China.
Genes (Basel). 2024 Nov 21;15(12):1497. doi: 10.3390/genes15121497.
Lung adenocarcinoma (LUAD) poses significant challenges because of its poor prognosis and limited treatment options, particularly at advanced stages. Identifying genetic biomarkers is crucial for improving outcome prediction and guiding personalized therapy. In this study, we use a multi-step approach that combines principled sure independence screening, penalized regression methods, and information gain to identify key genetic features in ultra-high-dimensional RNA-sequencing data from LUAD patients. We then compare the predictive performance of three survival analysis methods: the Cox model, survival trees, and random survival forests (RSFs). Additionally, a protein-protein interaction network is used to explore the biological significance of the identified genes. and are consistently selected as significant predictors across all feature selection methods. Kaplan-Meier analysis shows that high expression levels of these genes are strongly associated with poorer survival outcomes, suggesting their potential as prognostic biomarkers. RSF outperforms the Cox and survival tree methods, achieving higher AUC and C-index values. The protein-protein interaction network highlights key nodes such as and , which play central roles in LUAD progression. Our findings provide valuable insights into the genetic mechanisms of LUAD and contribute to the development of more accurate prognostic tools and personalized treatment strategies for LUAD.
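The first screening step can be illustrated with a simplified, stdlib-only sketch. It is not the authors' implementation. It is a stand-in that ranks genes one at a time by the standardized marginal Cox score statistic evaluated at β = 0, rather than fitting each marginal Cox model. It keeps genes whose |z| exceeds Φ⁻¹(1 − f/(2p)), a cutoff assumed here to roughly control f expected false positives among p genes. The function names, the cutoff rule, and the gene labels are hypothetical.

```python
import math
from statistics import NormalDist


def marginal_cox_score_z(times, events, x):
    """Standardized marginal Cox score statistic for one gene, evaluated at beta = 0.

    The risk set of an event at time t_i is {j : t_j >= t_i} (Breslow handling of ties).
    A positive z means higher expression goes with higher hazard (shorter survival).
    """
    n = len(times)
    score, info = 0.0, 0.0
    for i in range(n):
        if not events[i]:
            continue  # censored patients only enter through the risk sets
        risk = [x[j] for j in range(n) if times[j] >= times[i]]
        mean = sum(risk) / len(risk)
        score += x[i] - mean  # observed minus risk-set-average expression
        info += sum((v - mean) ** 2 for v in risk) / len(risk)  # risk-set variance
    return score / math.sqrt(info) if info > 0 else 0.0


def screen_genes(times, events, genes, expected_false_positives=1.0):
    """Rank genes by |z| and keep those above a Bonferroni-style cutoff.

    Hypothetical rule: cutoff = Phi^{-1}(1 - f / (2p)) for f expected false
    positives among p genes; not taken from the paper.
    """
    p = len(genes)
    cutoff = NormalDist().inv_cdf(1 - expected_false_positives / (2 * p))
    z = {name: marginal_cox_score_z(times, events, x) for name, x in genes.items()}
    ranked = sorted(z, key=lambda g: abs(z[g]), reverse=True)
    kept = [g for g in ranked if abs(z[g]) > cutoff]
    return kept, z
```

In a real pipeline the surviving genes would then be passed to the penalized-regression and information-gain steps. Those steps are not sketched here.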
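The Kaplan-Meier comparison of high- and low-expression groups can be sketched as follows. The product-limit estimator is standard. The median split used to define "high" and "low" expression is an assumption, because the abstract does not say how the groups were formed. `km_by_expression` is a hypothetical helper name.

```python
from collections import Counter
from statistics import median


def kaplan_meier(times, events):
    """Product-limit estimate: list of (event time, S(t)) at each distinct event time."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    surv, curve = 1.0, []
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)  # patients censored at t still count as at risk
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


def km_by_expression(times, events, expr):
    """Split patients at the median expression (assumed rule) and fit a KM curve per group."""
    cut = median(expr)
    groups = {"high": [], "low": []}
    for t, e, v in zip(times, events, expr):
        groups["high" if v > cut else "low"].append((t, e))
    return {
        name: kaplan_meier([t for t, _ in pts], [e for _, e in pts])
        for name, pts in groups.items()
    }
```

A "high" curve that drops faster than the "low" curve is the pattern the abstract describes. A formal comparison would add a log-rank test, which is omitted here.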
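The C-index used to compare the Cox model, survival tree, and RSF measures how often a model assigns higher risk to the patient who dies first. The sketch below is a minimal version of Harrell's C-index. It assumes that a larger risk score means shorter expected survival, and it skips pairs with tied times, which is a common simplification. The paper's exact estimator and its time-dependent AUC are not reproduced here.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index: share of usable pairs ordered correctly by risk."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is usable only if the shorter time is an observed event
        for j in range(n):
            if times[j] > times[i]:  # pairs with tied times are skipped
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied risk scores count as half-concordant
    return concordant / comparable if comparable else float("nan")
```

The same function can score the risk predictions of any of the three models, which makes their values directly comparable. A value of 1.0 is perfect ordering, and 0.5 is no better than chance.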