Transfer learning with convolutional neural networks for cancer survival prediction using gene-expression data.

Affiliation

Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, ETSI Informática, Málaga, Spain.

Publication information

PLoS One. 2020 Mar 26;15(3):e0230536. doi: 10.1371/journal.pone.0230536. eCollection 2020.

DOI:10.1371/journal.pone.0230536

PMID:32214348

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7098575/

Abstract

Precision medicine in oncology aims to gather data from heterogeneous sources in order to estimate a given patient's state and prognosis precisely. Within a personalized medicine framework, accurate diagnoses allow the prescription of more effective treatments adapted to the specifics of each individual case. In recent years, next-generation sequencing has propelled cancer research by providing physicians with an overwhelming amount of gene-expression data from high-throughput RNA-seq platforms. In this scenario, data mining and machine learning techniques have contributed widely to gene-expression data analysis by supplying computational models that support decision-making on real-world data. Nevertheless, existing public gene-expression databases are characterized by an unfavorable imbalance between the huge number of genes (on the order of tens of thousands) and the small number of samples (on the order of a few hundred) available. Although diverse feature selection and extraction strategies have traditionally been applied to overcome the resulting over-fitting issues, the efficacy of standard machine learning pipelines is far from satisfactory for predicting relevant clinical outcomes such as follow-up end-points or patient survival. In this study, using the public Pan-Cancer dataset, we pre-train convolutional neural network architectures for survival prediction on a subset composed of thousands of gene-expression samples from thirty-one tumor types. The resulting architectures are subsequently fine-tuned to predict the lung cancer progression-free interval. Applying convolutional networks to gene-expression data has many limitations, which stem from the unstructured nature of these data. In this work we propose a methodology that rearranges RNA-seq data by transforming RNA-seq samples into gene-expression images, from which convolutional networks can extract high-level features. As an additional objective, we investigate whether leveraging the information extracted from samples of other tumor types contributes to the extraction of high-level features that improve lung cancer progression prediction, compared to other machine learning approaches.
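
The idea of turning an RNA-seq sample into a gene-expression image can be illustrated with a minimal sketch. The abstract does not say how genes are arranged on the image, so everything below is an assumption made for illustration only: a fixed (optionally user-supplied) gene order, a log transform followed by per-sample min-max scaling, a square layout, and zero padding. The function name `expression_to_image` and its parameters are hypothetical and are not taken from the paper.

```python
import math
import numpy as np


def expression_to_image(expr, gene_order=None, pad_value=0.0):
    """Reshape one sample's 1-D expression vector into a square 2-D array.

    Illustrative assumptions (not the paper's method): genes keep a fixed
    order, values are log1p-transformed and min-max scaled per sample, and
    unused pixels are filled with `pad_value`.
    """
    expr = np.asarray(expr, dtype=np.float32)
    if expr.ndim != 1 or expr.size == 0:
        raise ValueError("expr must be a non-empty 1-D vector")
    if gene_order is not None:
        # Reorder genes, e.g. so that related genes end up on nearby pixels.
        expr = expr[np.asarray(gene_order)]

    n_genes = expr.size
    # Smallest square side whose area can hold every gene.
    side = math.isqrt(n_genes - 1) + 1

    # Compress the dynamic range of counts, then scale to [0, 1].
    values = np.log1p(expr)
    span = values.max() - values.min()
    if span > 0:
        values = (values - values.min()) / span
    else:
        values = np.zeros_like(values)

    image = np.full(side * side, pad_value, dtype=np.float32)
    image[:n_genes] = values
    return image.reshape(side, side)


# Toy data: 3 samples x 10 genes of synthetic counts (not real RNA-seq data).
rng = np.random.default_rng(0)
counts = rng.poisson(5.0, size=(3, 10))
images = np.stack([expression_to_image(sample) for sample in counts])
print(images.shape)  # (3, 4, 4)
```

Arrays of this shape, with an added channel axis, are what a convolutional network would consume. The gene order determines which genes share a local neighborhood and therefore what a convolution filter can pick up, which is why the arrangement step matters more than the reshape itself.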

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eea2/7098575/1243dd105625/pone.0230536.g001.jpg
