Heilongjiang Institute of Technology, Harbin 150050, China.
School of Science at Heilongjiang Institute of Technology, Harbin 150050, China.
Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab212.
For high-dimensional expression data, most prognostic models perform feature selection on individual genes, which usually leads to unstable prognostic predictions, and the risk genes identified this way are inherently insufficient for revealing complex molecular mechanisms. Since most genes carry out cellular functions by forming protein complexes, the basic representatives of functional modules, identifying risk protein complexes may greatly improve our understanding of disease biology. Protein complexes have also been shown to be innately resistant to batch effects and to be effective predictors of disease phenotypes, so constructing prognostic models and selecting features with protein complexes as the basic unit should improve the robustness and biological interpretability of the model. Here, we propose a protein complex-based group lasso-Cox model (PCLasso) to predict patient prognosis and identify risk protein complexes. Experiments on three cancer types showed that PCLasso has better prognostic performance than prognostic models based on individual genes. The resulting risk protein complexes contain not only individual risk genes but also close partners that synergize with them, which may help reveal molecular mechanisms related to cancer progression from a more comprehensive perspective. Furthermore, a pan-cancer prognostic analysis was performed to identify risk protein complexes in 19 cancer types, which may provide novel potential targets for cancer research.
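To make the idea of a group lasso-Cox model concrete, the following is a minimal sketch of the kind of objective such a model minimizes: a Cox negative log partial likelihood plus a penalty that shrinks all coefficients of a group (here, the genes of one protein complex) toward zero together. It is not the PCLasso implementation. The function names, the sqrt(group size) weights, the Breslow-style handling of ties, and the assumption that groups do not overlap are all illustrative choices made for this sketch; real protein complexes share genes, which requires an overlapping-group extension that is not shown here.

```python
import math

def cox_neg_log_partial_likelihood(beta, X, time, event):
    """Negative log partial likelihood of a Cox model.

    beta  : list of gene coefficients
    X     : list of per-patient expression rows (same length as beta)
    time  : follow-up time per patient
    event : 1 if the event was observed, 0 if censored
    Ties are handled in the simple Breslow style (an assumption of this sketch).
    """
    # Linear predictor (risk score) for every patient.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    nll = 0.0
    for i, (t_i, d_i) in enumerate(zip(time, event)):
        if not d_i:
            continue  # censored patients contribute only through risk sets
        # Risk set: patients still under observation at time t_i.
        risk_sum = sum(math.exp(eta[j]) for j, t_j in enumerate(time) if t_j >= t_i)
        nll -= eta[i] - math.log(risk_sum)
    return nll

def group_lasso_penalty(beta, groups, lam):
    """lam * sum over groups of sqrt(group size) * Euclidean norm of the group's coefficients.

    groups : list of index lists, one per (hypothetical) protein complex.
    Assumes non-overlapping groups.
    """
    total = 0.0
    for idx in groups:
        norm = math.sqrt(sum(beta[k] ** 2 for k in idx))
        total += math.sqrt(len(idx)) * norm
    return lam * total

def group_lasso_cox_objective(beta, X, time, event, groups, lam):
    """Average Cox loss plus group lasso penalty; a solver would minimize this over beta."""
    n = len(time)
    return cox_neg_log_partial_likelihood(beta, X, time, event) / n + group_lasso_penalty(beta, groups, lam)

# Toy data: 3 patients, 2 genes forming one hypothetical complex.
X = [[0.5, 1.0], [1.5, -0.2], [0.0, 0.3]]
time = [1.0, 2.0, 3.0]
event = [1, 1, 0]
print(group_lasso_cox_objective([0.0, 0.0], X, time, event, [[0, 1]], lam=0.1))
```

Because the penalty uses the unsquared norm of each group, a group whose coefficients are all driven to zero drops out of the model as a unit, which is what lets selection operate on whole complexes rather than single genes.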