Cai Qidong, He Boxue, Zhang Pengfei, Zhao Zhenyu, Peng Xiong, Zhang Yuqian, Xie Hui, Wang Xiang
Department of Thoracic Surgery, The Second Xiangya Hospital, Central South University, Changsha, 410011, Hunan, China.
Hunan Key Laboratory of Early Diagnosis and Precision Therapy, Department of Thoracic Surgery, The Second Xiangya Hospital, Central South University, Changsha, 410011, China.
J Transl Med. 2020 Dec 7;18(1):463. doi: 10.1186/s12967-020-02635-y.
Alternative splicing (AS) plays critical roles in generating protein diversity and complexity, and dysregulation of AS underlies the initiation and progression of tumors. Machine learning approaches have emerged as efficient tools for identifying promising biomarkers. Exploring pivotal AS events (ASEs) with machine learning algorithms could therefore deepen understanding of lung adenocarcinoma (LUAD) and improve its prognostic assessment.
RNA sequencing data and AS data were extracted from The Cancer Genome Atlas (TCGA) database and the TCGA SpliceSeq database. Using several machine learning methods, we identified 24 pairs of LUAD-related ASEs implicated in splicing switches and built a random forest-based classifier, consisting of 12 ASEs, for identifying lymph node metastasis (LNM). Furthermore, we identified key prognosis-related ASEs and established a 16-ASE-based prognostic model to predict overall survival of LUAD patients using Cox regression, random survival forest analysis, and forward selection. Bioinformatics analyses were also applied to identify underlying mechanisms and associated upstream splicing factors (SFs).
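As an illustration of how a Cox-based multi-ASE model is commonly turned into a per-patient score, the sketch below computes the Cox linear predictor (the sum of coefficient times expression value over the model's events) and splits patients at the cohort median. Everything specific here is an assumption for illustration: the event names, coefficients, patient values, the use of percent-spliced-in (PSI) fractions as inputs, and the median cutoff are hypothetical and are not taken from the study.

```python
from statistics import median


def risk_score(psi, coefficients):
    """Cox linear predictor: sum of coefficient * PSI over the model's ASEs."""
    return sum(coefficients[ase] * psi[ase] for ase in coefficients)


def stratify(scores):
    """Label each patient 'high' or 'low' risk relative to the cohort median."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical coefficients for three made-up ASEs (the published model uses 16).
COEFFICIENTS = {"ASE_A": 1.2, "ASE_B": -0.8, "ASE_C": 0.5}

# Hypothetical PSI values (fractions between 0 and 1) for four made-up patients.
PATIENTS = {
    "P1": {"ASE_A": 0.9, "ASE_B": 0.1, "ASE_C": 0.6},
    "P2": {"ASE_A": 0.2, "ASE_B": 0.7, "ASE_C": 0.3},
    "P3": {"ASE_A": 0.5, "ASE_B": 0.5, "ASE_C": 0.5},
    "P4": {"ASE_A": 0.1, "ASE_B": 0.9, "ASE_C": 0.2},
}

SCORES = {pid: risk_score(psi, COEFFICIENTS) for pid, psi in PATIENTS.items()}
GROUPS = stratify(SCORES)

if __name__ == "__main__":
    for pid in PATIENTS:
        print(pid, round(SCORES[pid], 2), GROUPS[pid])
```

A positive coefficient means higher inclusion of that event raises the score, and a negative coefficient lowers it; in practice the coefficients would come from a fitted Cox model rather than being set by hand.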
Each pair of ASEs was spliced from the same parent gene and exhibited perfect inverse intrapair correlation (correlation coefficient = -1). The 12-ASE-based classifier showed a robust ability to evaluate the LNM status of LUAD patients, with an area under the receiver operating characteristic (ROC) curve (AUC) above 0.7 in fivefold cross-validation. The prognostic model performed well at 1, 3, 5, and 10 years in both the training cohort and the internal test cohort. Univariate and multivariate Cox regression indicated that the prognostic model could serve as an independent prognostic factor for patients with LUAD. Further analysis revealed correlations between the prognostic model and American Joint Committee on Cancer stage, T stage, N stage, and living status. A splicing network constructed from survival-related SFs and ASEs depicted the regulatory relationships between them.
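One way a correlation coefficient of exactly -1 can arise within a pair is when the two events are complementary outcomes of the same splicing choice, so that their inclusion levels sum to a constant. The sketch below checks this with a hand-written Pearson correlation; it assumes each event is quantified as a percent-spliced-in (PSI) fraction, and the sample values are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical PSI values of one event across five samples.
PSI_A = [0.10, 0.35, 0.50, 0.80, 0.95]
# Its complementary partner: whatever is not spliced one way goes the other.
PSI_B = [1.0 - value for value in PSI_A]

R = pearson(PSI_A, PSI_B)

if __name__ == "__main__":
    print(f"intrapair correlation: {R:.6f}")
```

Because `PSI_B` is an exact decreasing linear function of `PSI_A`, the coefficient equals -1 up to floating-point rounding regardless of the particular sample values.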
In summary, our study provides insight into LUAD research and management based on these AS biomarkers.