Analysis of RNA-Seq data using self-supervised learning for vital status prediction of colorectal cancer patients.

Affiliations

PES Center for Pattern Recognition, Department of Computer Science and Engineering, PES University, Bengaluru, 560085, India.

Department of Computer Science and Engineering, PES University Electronic City Campus, Bengaluru, 560100, India.

Publication information

BMC Bioinformatics. 2023 Jun 7;24(1):241. doi: 10.1186/s12859-023-05347-4.

Abstract

BACKGROUND

RNA sequencing (RNA-Seq) is a technique that uses next-generation sequencing to study a cellular transcriptome, i.e., to determine the amount of RNA in a given biological sample at a given time. Advances in RNA-Seq technology have produced a large volume of gene expression data for analysis.

RESULTS

Our computational model (built on top of TabNet) is first pretrained on an unlabelled dataset of multiple types of adenomas and adenocarcinomas and then fine-tuned on the labelled dataset. It shows promising results in estimating the vital status of colorectal cancer patients: using multiple modalities of data, we achieve a final cross-validated ROC-AUC score of 0.88.
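To make the reported metric concrete, the sketch below computes ROC-AUC as the probability that a randomly chosen positive case is scored above a randomly chosen negative one, then averages it over held-out folds. The abstract does not say how the folds were formed or aggregated, so the simple per-fold mean, the function names, and the toy labels and scores are all assumptions for illustration, not the authors' pipeline.

```python
from statistics import mean


def roc_auc(labels, scores):
    """ROC-AUC via pairwise comparison of positives and negatives.

    A positive scored above a negative counts 1, a tie counts 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def cross_validated_auc(fold_results):
    """Mean of per-fold ROC-AUC (assumed aggregation; see lead-in).

    fold_results is a list of (labels, scores) pairs, one per held-out fold.
    """
    return mean(roc_auc(y, s) for y, s in fold_results)


# Hypothetical held-out predictions for two folds (made-up numbers).
folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]),
    ([1, 1, 0, 0], [0.8, 0.3, 0.5, 0.1]),
]
print(cross_validated_auc(folds))
```

The pairwise form is quadratic in the number of samples, which is fine for a small illustration; a rank-based implementation would be preferable on real cohorts.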

CONCLUSION

The results of this study demonstrate that self-supervised learning methods pretrained on a vast corpus of unlabelled data outperform traditional supervised learning methods such as XGBoost, neural networks, and decision trees, which have been prevalent in the tabular domain. Including multiple modalities of patient data improves the results further. Genes such as RBM3, GSPT1, and MAD2L1, among others that model interpretability identified as important to the computational model's prediction task, are corroborated by pathological evidence in the current literature.
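As a rough illustration of how a model can learn from unlabelled tabular data, the sketch below shows a masked-feature reconstruction pretext task, the kind of objective commonly associated with TabNet-style pretraining. The abstract does not state the exact objective used, so this is an assumption; the loss is a simplified mean squared error, and the function names, masking probability, and toy expression values are hypothetical.

```python
import random


def mask_features(row, mask_prob, rng):
    """Hide a random subset of features; masked entries are set to 0.0.

    Returns (corrupted_row, mask) where mask[i] == 1 marks a hidden feature.
    """
    mask = [1 if rng.random() < mask_prob else 0 for _ in row]
    corrupted = [0.0 if m else x for x, m in zip(row, mask)]
    return corrupted, mask


def masked_reconstruction_loss(original, reconstructed, mask):
    """Mean squared error computed over the masked positions only."""
    errors = [(o - r) ** 2 for o, r, m in zip(original, reconstructed, mask) if m]
    return sum(errors) / len(errors) if errors else 0.0


# Hypothetical expression values for one unlabelled sample.
rng = random.Random(0)
expression = [2.5, 0.0, 7.1, 3.3, 1.2]
corrupted, mask = mask_features(expression, 0.4, rng)

# Stand-in "model": fill every hidden feature with the mean of the visible ones.
visible = [x for x, m in zip(expression, mask) if not m]
guess = sum(visible) / len(visible) if visible else 0.0
reconstructed = [guess if m else x for x, m in zip(corrupted, mask)]

loss = masked_reconstruction_loss(expression, reconstructed, mask)
print(mask, loss)
```

In an actual pretraining run, a neural encoder-decoder would replace the stand-in model and be trained to minimise this loss; the pretrained encoder would then be fine-tuned on the labelled vital-status data.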

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc77/10249191/d7b005da02e4/12859_2023_5347_Fig1_HTML.jpg
