Exploration of predictive and prognostic alternative splicing signatures in lung adenocarcinoma using machine learning methods.

Authors

Cai Qidong, He Boxue, Zhang Pengfei, Zhao Zhenyu, Peng Xiong, Zhang Yuqian, Xie Hui, Wang Xiang

Affiliations

Department of Thoracic Surgery, The Second Xiangya Hospital, Central South University, Changsha, 410011, Hunan, China.

Hunan Key Laboratory of Early Diagnosis and Precision Therapy, Department of Thoracic Surgery, The Second Xiangya Hospital, Central South University, Changsha, 410011, China.

Publication information

J Transl Med. 2020 Dec 7;18(1):463. doi: 10.1186/s12967-020-02635-y.

Abstract

BACKGROUND

Alternative splicing (AS) plays a critical role in generating protein diversity and complexity, and its dysregulation underlies the initiation and progression of tumors. Machine learning approaches have emerged as efficient tools for identifying promising biomarkers. Exploring pivotal AS events (ASEs) with machine learning algorithms could deepen understanding of lung adenocarcinoma (LUAD) and improve its prognostic assessment.

METHOD

RNA sequencing data and AS data were extracted from The Cancer Genome Atlas (TCGA) database and the TCGA SpliceSeq database. Using several machine learning methods, we identified 24 pairs of LUAD-related ASEs implicated in splicing switches and built a random forest-based classifier, consisting of 12 ASEs, for identifying lymph node metastasis (LNM). Furthermore, we identified key prognosis-related ASEs and established a 16-ASE-based prognostic model to predict overall survival of LUAD patients using Cox regression, random survival forest analysis, and forward selection. Bioinformatics analyses were also applied to identify the underlying mechanisms and the associated upstream splicing factors (SFs).
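As a rough illustration of the forward selection step named above, the following is a minimal, generic sketch of greedy forward selection. The abstract does not state the selection criterion, so the scoring function here is a toy stand-in; the feature names (`ASE_A` and so on), the per-feature gains, and the size penalty are all hypothetical. In a survival setting the score might instead be derived from a fitted Cox model, but that is an assumption, not something the source specifies.

```python
from typing import Callable, List, Sequence, Tuple


def forward_select(
    candidates: Sequence[str],
    score: Callable[[List[str]], float],
    min_gain: float = 1e-9,
) -> Tuple[List[str], float]:
    """Greedy forward selection.

    Start from an empty set and repeatedly add the single candidate that
    raises `score` the most; stop when no candidate improves the score
    by more than `min_gain`.
    """
    selected: List[str] = []
    remaining = list(candidates)
    best = score(selected)
    while remaining:
        # Score every one-feature extension of the current set.
        trials = [(score(selected + [c]), c) for c in remaining]
        top_score, top_feature = max(trials, key=lambda t: t[0])
        if top_score - best <= min_gain:
            break  # no extension helps enough
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


# Hypothetical per-feature contributions (illustration only, not real data).
toy_gain = {"ASE_A": 0.30, "ASE_B": 0.20, "ASE_C": 0.01, "ASE_D": 0.00}
PENALTY_PER_FEATURE = 0.05  # hypothetical complexity penalty


def toy_score(subset: List[str]) -> float:
    """Toy criterion: summed gain minus a penalty for model size."""
    return sum(toy_gain[f] for f in subset) - PENALTY_PER_FEATURE * len(subset)


if __name__ == "__main__":
    chosen, final_score = forward_select(list(toy_gain), toy_score)
    print(chosen, round(final_score, 2))
```

With this toy criterion the procedure keeps the two informative features and stops once adding a weak feature no longer offsets the size penalty. A penalized criterion of this kind is one common way to keep a forward-selected model small.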

RESULTS

Each pair of ASEs was spliced from the same parent gene and exhibited a perfect inverse intrapair correlation (correlation coefficient = -1). The 12-ASE-based classifier robustly evaluated the LNM status of LUAD patients, with an area under the receiver operating characteristic (ROC) curve (AUC) greater than 0.7 in fivefold cross-validation. The prognostic model performed well at 1, 3, 5, and 10 years in both the training cohort and the internal test cohort. Univariate and multivariate Cox regression indicated that the prognostic model could serve as an independent prognostic factor for patients with LUAD. Further analysis revealed correlations between the prognostic model and American Joint Committee on Cancer stage, T stage, N stage, and living status. The splicing network constructed from survival-related SFs and ASEs depicts the regulatory relationships between them.
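Two quantities quoted in these results, the intrapair correlation coefficient and the AUC, can be illustrated with a short sketch. It assumes, purely for illustration, that the two events in a pair are complementary, so their inclusion levels sum to a constant; under that assumption the Pearson correlation is -1 by construction. The abstract does not state that this is why the reported coefficient equals -1. All values below are made up, and the AUC is computed with the standard pairwise (rank-based) definition rather than any particular library.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count 0.5).
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical inclusion levels for one event across four samples.
psi_a = [0.10, 0.35, 0.60, 0.85]
# Assumed complementary partner event: levels sum to 1 in every sample.
psi_b = [1.0 - v for v in psi_a]

# Hypothetical classifier scores for three positive and three negative cases.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

if __name__ == "__main__":
    print(round(pearson(psi_a, psi_b), 6))
    print(round(auc(toy_labels, toy_scores), 3))
```

In the toy AUC example, eight of the nine positive-negative pairs are ranked correctly, giving 8/9. In a fivefold cross-validation, the same calculation would be repeated on each held-out fold.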

CONCLUSION

In summary, our study provides insight into LUAD research and management based on these AS biomarkers.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2ecd/7720605/02acc6867daf/12967_2020_2635_Fig1_HTML.jpg
