He Boxue, Wei Cong, Cai Qidong, Zhang Pengfei, Shi Shuai, Peng Xiong, Zhao Zhenyu, Yin Wei, Tu Guangxu, Peng Weilin, Tao Yongguang, Wang Xiang
Department of Thoracic Surgery, Second Xiangya Hospital, Central South University, Changsha, 410011, China.
Hunan Key Laboratory of Early Diagnosis and Precise Treatment of Lung Cancer, Second Xiangya Hospital, Central South University, Changsha, 410011, China.
Cancer Cell Int. 2022 Jan 5;22(1):5. doi: 10.1186/s12935-021-02429-2.
Alternative splicing (AS) plays important roles in transcriptome and proteome diversity, and its dysregulation is closely associated with oncogenic processes. This study aimed to evaluate AS-based biomarkers for lung squamous cell carcinoma (LUSC) patients using machine learning algorithms.
Data were obtained from The Cancer Genome Atlas (TCGA) database and the TCGA SpliceSeq database. After balancing the data composition, Boruta feature selection and Spearman correlation analysis were used to identify differentially expressed AS events. Random forests with nested fivefold cross-validation were used to build a lymph node metastasis (LNM) classifier. A random survival forest combined with a Cox regression model was used to build a prognostic model, from which a nomogram was developed. Functional enrichment analysis and Spearman correlation analysis were also conducted to explore the underlying mechanisms. The expression of selected switch-involved AS events and their parent genes was verified by qRT-PCR in 20 pairs of normal and LUSC tissues.
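The nested fivefold cross-validation mentioned above can be illustrated with a minimal sketch of how samples are partitioned. This is not the authors' pipeline: the function names `k_folds` and `nested_cv_splits`, the sample count, the seed, and the plain (unstratified) shuffle are all assumptions made for illustration. The idea is that the inner folds, used for tuning or feature selection, are drawn only from the outer training set, so the outer test fold never influences model selection.

```python
import random

def k_folds(indices, k, rng):
    """Shuffle the indices and split them into k nearly equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def nested_cv_splits(n_samples, k_outer=5, k_inner=5, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for nested CV.

    Hypothetical helper: inner_splits is a list of (inner_train, inner_val)
    pairs built only from outer_train.
    """
    rng = random.Random(seed)
    outer = k_folds(range(n_samples), k_outer, rng)
    for o, outer_test in enumerate(outer):
        # Everything outside the current outer test fold is outer training data.
        outer_train = [i for f, fold in enumerate(outer) if f != o for i in fold]
        inner = k_folds(outer_train, k_inner, rng)
        inner_splits = []
        for v, inner_val in enumerate(inner):
            inner_train = [i for f, fold in enumerate(inner) if f != v for i in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits

# Illustrative sample count only; not taken from the study.
splits = list(nested_cv_splits(n_samples=50))
```

In a full workflow, a classifier would be tuned on each set of `inner_splits` and then scored once on the matching `outer_test`, giving five performance estimates that are not biased by the tuning step.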
We found 16 pairs of splicing events from the same parent genes that were strongly related to the splicing switch (intrapair correlation coefficient = -1). We then built a reliable LNM classifier based on 13 AS events, as well as a well-performing prognostic model, in which switched AS events featured prominently. The qRT-PCR results were consistent with the bioinformatics analysis, and some AS events, such as ITIH5-10715-AT and QKI-78404-AT, showed remarkable detection efficiency for LUSC.
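An intrapair Spearman coefficient of -1 means the two events of a pair are ranked in exactly opposite order across samples. The sketch below shows this with made-up percent-spliced-in (PSI) values, under the assumption that the two events are complementary so that their PSI values sum to 1 in every sample. The values and the helper names `ranks` and `spearman` are hypothetical and do not come from the study.

```python
from statistics import mean

def ranks(values):
    """Return 1-based average ranks; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical PSI values for two events of one parent gene across 5 samples.
psi_a = [0.10, 0.35, 0.50, 0.80, 0.95]
psi_b = [1 - p for p in psi_a]  # assumed complementary event
rho = spearman(psi_a, psi_b)
```

Because Spearman correlation depends only on rank order, any pair whose usage moves in strictly opposite directions across samples gives -1, whether or not the PSI values sum exactly to 1.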
AS events, especially switched ones from the same parent genes, could provide new insights into the molecular diagnosis and therapeutic drug design of LUSC.