Wu Siwei, Yin Chaoyi, Wang Yuezhu, Sun Huiyan
School of Artificial Intelligence, Jilin University, 3003 Qianjin Street, 130012 Changchun, China.
International Center of Future Science, Jilin University, 3003 Qianjin Street, 130012 Changchun, China.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae721.
Accurate identification of causal genes for cancer prognosis is critical for estimating disease progression and guiding treatment interventions. In this study, we propose CPCG (Cancer Prognosis's Causal Gene), a two-stage framework that uses transcriptomic data to identify gene sets causally associated with patient prognosis across diverse cancer types. In the first stage, an ensemble of parametric and semiparametric hazard models captures the effect of gene expression on survival. In the second stage, iterative conditional independence testing combined with graph pruning infers the causal skeleton, thereby pinpointing prognosis-related genes. Experiments on transcriptomic data from 18 cancer types in The Cancer Genome Atlas Project demonstrate CPCG's effectiveness in predicting prognosis under four evaluation metrics. Validation on 24 additional datasets covering 12 cancer types from the Gene Expression Omnibus and the Chinese Glioma Genome Atlas Project further demonstrates CPCG's robustness and generalizability. CPCG identifies a concise but reliable set of genes, removing the need to enumerate gene combinations for survival time estimation. These genes are also shown to be closely linked to key biological processes in cancer. Moreover, CPCG constructs a stable causal skeleton and is insensitive to the order in which the data are shuffled. Overall, CPCG is an interpretable, generalizable, and robust tool for extracting cancer prognostic biomarkers, and it holds promise for facilitating targeted interventions in clinical treatment strategies.
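The second stage (conditional independence testing with graph pruning to recover a skeleton) can be illustrated with a generic constraint-based search. The sketch below is not CPCG's implementation. It swaps in the well-known PC-stable skeleton algorithm with Fisher's z partial-correlation test, which assumes roughly linear-Gaussian relationships, and it treats survival as a hypothetical uncensored continuous node `T` instead of using the hazard models of the first stage. All function names, variables, and the toy data are illustrative assumptions. Genes still adjacent to `T` after pruning play the role of prognosis-related candidates, and freezing adjacencies for each level (the "stable" part) makes the skeleton independent of the order in which pairs are visited, which loosely mirrors the order-insensitivity reported in the abstract.

```python
import math
from itertools import combinations

# Sketch only: PC-stable skeleton + Fisher z test, NOT the CPCG method itself.


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(C, i, j, S):
    """Partial correlation of i and j given the tuple S (recursive formula)."""
    if not S:
        return C[i][j]
    k, rest = S[0], S[1:]
    rij = partial_corr(C, i, j, rest)
    rik = partial_corr(C, i, k, rest)
    rjk = partial_corr(C, j, k, rest)
    denom = math.sqrt(max(0.0, (1 - rik ** 2) * (1 - rjk ** 2)))
    if denom < 1e-12:
        return 0.0  # i or j fully determined by k: conditionally independent
    return (rij - rik * rjk) / denom


def fisher_z_pvalue(r, n, size_s):
    """Two-sided p-value for H0: partial correlation == 0."""
    dof = n - size_s - 3
    if dof <= 0:
        raise ValueError("too few samples for this conditioning set")
    r = max(-1 + 1e-12, min(1 - 1e-12, r))  # keep atanh finite
    z = math.sqrt(dof) * math.atanh(r)
    return math.erfc(abs(z) / math.sqrt(2))


def pc_stable_skeleton(columns, names, alpha=0.05):
    """Prune a complete graph by CI tests of growing conditioning-set size."""
    p, n = len(columns), len(columns[0])
    C = [[pearson(columns[a], columns[b]) for b in range(p)] for a in range(p)]
    adj = {i: set(range(p)) - {i} for i in range(p)}
    sepset = {}
    level = 0
    while any(len(adj[i]) - 1 >= level for i in range(p)):
        # Freeze adjacencies for the whole level so the skeleton does not
        # depend on the order in which pairs are visited.
        frozen = {i: set(adj[i]) for i in range(p)}
        for i in range(p):
            for j in sorted(frozen[i]):
                if j not in adj[i]:
                    continue  # edge already removed at this level
                for S in combinations(sorted(frozen[i] - {j}), level):
                    r = partial_corr(C, i, j, S)
                    if fisher_z_pvalue(r, n, len(S)) > alpha:
                        adj[i].discard(j)
                        adj[j].discard(i)
                        sepset[frozenset((names[i], names[j]))] = [names[k] for k in S]
                        break
        level += 1
    edges = {frozenset((names[i], names[j])) for i in range(p) for j in adj[i]}
    return edges, sepset


# Hypothetical toy data: orthogonal +/-1 patterns repeated 4 times (n = 32),
# built so that g1 and T are exactly independent given g2 and g3 is pure noise.
h1 = [1, 1, 1, 1, -1, -1, -1, -1] * 4
h2 = [1, 1, -1, -1, 1, 1, -1, -1] * 4
h3 = [1, -1, 1, -1, 1, -1, 1, -1] * 4
h4 = [1, -1, -1, 1, 1, -1, -1, 1] * 4
g2 = [2 * a for a in h1]
g1 = [a + b for a, b in zip(g2, h2)]
T = [a + b for a, b in zip(g2, h3)]  # stand-in for an uncensored outcome
g3 = h4

names = ["g1", "g2", "g3", "T"]
edges, sepset = pc_stable_skeleton([g1, g2, g3, T], names, alpha=0.05)
prognostic = sorted(v for e in edges if "T" in e for v in e if v != "T")
print(prognostic)  # ['g2']
```

On this toy data, the marginal association between g1 and T disappears once g2 is conditioned on, so the g1–T edge is pruned and only g2 remains adjacent to the outcome; g3 is dropped at the first level because it is uncorrelated with everything.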