Analysis of RNA-Seq data using self-supervised learning for vital status prediction of colorectal cancer patients.

Author information

PES Center for Pattern Recognition, Department of Computer Science and Engineering, PES University, Bengaluru, 560085, India.

Department of Computer Science and Engineering, PES University Electronic City Campus, Bengaluru, 560100, India.

Publication information

BMC Bioinformatics. 2023 Jun 7;24(1):241. doi: 10.1186/s12859-023-05347-4.

DOI:10.1186/s12859-023-05347-4

PMID:37286944

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10249191/

Abstract

BACKGROUND

RNA sequencing (RNA-Seq) is a technique that uses next-generation sequencing to study a cellular transcriptome, i.e., to determine the amount of RNA present in a given biological sample at a given time. Advances in RNA-Seq technology have produced a large volume of gene expression data for analysis.

RESULTS

Our computational model, built on top of TabNet, is first pretrained on an unlabelled dataset covering multiple types of adenomas and adenocarcinomas and then fine-tuned on the labelled dataset. It shows promising results in estimating the vital status of colorectal cancer patients. Using multiple modalities of data, we achieve a final cross-validated ROC-AUC score of 0.88.
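To make the reported metric concrete, the following is a minimal sketch of how a cross-validated ROC-AUC score can be computed. It is not the authors' evaluation code. The function names `roc_auc` and `cross_validated_auc`, the toy labels and scores, and the choice to average per-fold AUCs are all assumptions made for illustration; the abstract does not say how the folds were combined.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def cross_validated_auc(
    fold_labels: List[Sequence[int]], fold_scores: List[Sequence[float]]
) -> float:
    """Mean of per-fold ROC-AUC values (assumed aggregation, see lead-in)."""
    aucs = [roc_auc(y, s) for y, s in zip(fold_labels, fold_scores)]
    return sum(aucs) / len(aucs)


# Hypothetical held-out predictions from two folds (1 = one vital status, 0 = the other).
fold_labels = [[1, 0, 1, 0], [1, 1, 0]]
fold_scores = [[0.9, 0.2, 0.4, 0.6], [0.8, 0.7, 0.1]]
mean_auc = cross_validated_auc(fold_labels, fold_scores)
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes, which gives the reported 0.88 its scale.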

CONCLUSION

The results of this study demonstrate that self-supervised learning methods pretrained on a vast corpus of unlabelled data outperform traditional supervised learning methods such as XGBoost, neural networks, and decision trees that have been prevalent in the tabular domain. Including multiple modalities of patient data improves the results further. Model interpretability shows that genes such as RBM3, GSPT1, and MAD2L1, among others, are important to the computational model's predictions, and this finding is corroborated by pathological evidence in the current literature.
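The kind of self-supervised pretraining referred to here can be illustrated with a masked-feature objective, which is the general idea behind TabNet-style pretraining on unlabelled tabular data: hide some feature values and score the model on how well it reconstructs them. The sketch below is a simplification under stated assumptions, not the authors' pipeline. The names `mask_features` and `masked_reconstruction_loss`, the masking probability, and the plain squared-error loss are hypothetical, and no network is trained.

```python
import random
from typing import List, Sequence, Tuple


def mask_features(
    row: Sequence[float], mask_prob: float, rng: random.Random
) -> Tuple[List[float], List[int]]:
    """Hide each feature independently with probability mask_prob.

    Returns the visible row (masked entries set to 0.0) and the binary mask
    (1 = hidden from the model, 0 = visible).
    """
    mask = [1 if rng.random() < mask_prob else 0 for _ in row]
    visible = [0.0 if m else x for x, m in zip(row, mask)]
    return visible, mask


def masked_reconstruction_loss(
    row: Sequence[float], reconstruction: Sequence[float], mask: Sequence[int]
) -> float:
    """Mean squared error computed over the hidden entries only."""
    errors = [(x - r) ** 2 for x, r, m in zip(row, reconstruction, mask) if m]
    return sum(errors) / len(errors) if errors else 0.0


# Hypothetical expression values for four genes in one unlabelled sample.
rng = random.Random(0)
expression = [1.0, 2.0, 3.0, 4.0]
visible, mask = mask_features(expression, mask_prob=0.5, rng=rng)

# A model would predict the hidden values from `visible`; here a fixed guess stands in.
example_loss = masked_reconstruction_loss([1.0, 2.0, 3.0], [0.0, 2.0, 5.0], [1, 0, 1])
```

Because the objective needs no vital-status labels, it can be applied to the larger unlabelled collection first, with the labelled colorectal cancer data reserved for fine-tuning.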

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc77/10249191/d7b005da02e4/12859_2023_5347_Fig1_HTML.jpg
