A novel deep autoencoder based survival analysis approach for microarray dataset.

Authors

Torkey Hanaa, Atlam Mostafa, El-Fishawy Nawal, Salem Hanaa

Affiliations

Computer Science & Engineering Department, Faculty of Electronic Engineering, Menoufia University, Menouf, Egypt.

Faculty of Engineering, Delta University for Science and Technology, Gamasa, Egypt.

Publication information

PeerJ Comput Sci. 2021 Apr 21;7:e492. doi: 10.7717/peerj-cs.492. eCollection 2021.

DOI:10.7717/peerj-cs.492

PMID:33981841

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8080419/

Abstract

BACKGROUND

Breast cancer is one of the major causes of mortality globally. Different machine learning (ML) techniques have therefore been deployed for survival estimation and diagnosis. Survival analysis methods compute the survival probability and identify the most important factors affecting it. Most of these methods are designed for clinical features (up to hundreds), so applying methods such as Cox regression to RNA-seq microarray data with many features (up to thousands) is considered a major challenge.

METHODS

This paper proposes a novel approach that applies an autoencoder to reduce the number of features. The approach reconstructs the features and removes both noise in the data and features with zero variance across samples, which facilitates extraction of the highest-variance features (across samples) that most influence survival probabilities. It then estimates the survival probability of each patient by applying random survival forests and Cox regression. Because applying the autoencoder to thousands of features takes a long time, the model is run on a Graphics Processing Unit (GPU) to speed up the process. The model is evaluated and compared with existing models on three different datasets in terms of run time, concordance index, and calibration curve, and the genes most related to survival are identified. Finally, the biological pathways and GO molecular functions of these significant genes are analyzed.
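The feature reduction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the architecture (one tanh hidden layer with a linear decoder), the hyperparameters, the function names `drop_zero_variance`, `standardize`, and `train_autoencoder`, and the synthetic data are all assumptions chosen for illustration, and it runs on the CPU with NumPy rather than on a GPU. The low-dimensional codes it produces are the kind of input that would then be passed to Cox regression or a random survival forest.

```python
import numpy as np


def drop_zero_variance(X):
    """Remove features that are constant across samples.

    A column has zero variance exactly when its max equals its min,
    which avoids floating-point issues in computing the variance itself.
    Returns the filtered matrix and the boolean mask of kept columns.
    """
    keep = X.max(axis=0) > X.min(axis=0)
    return X[:, keep], keep


def standardize(X):
    """Z-score each feature (assumes no constant columns remain)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def train_autoencoder(X, n_latent=3, epochs=300, lr=0.1, seed=0):
    """Train a one-hidden-layer autoencoder by full-batch gradient descent.

    Loss is the mean squared reconstruction error over all entries.
    Returns an `encode` function and the per-epoch loss history.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W1 = rng.normal(0.0, 0.1, (n_features, n_latent))
    b1 = np.zeros(n_latent)
    W2 = rng.normal(0.0, 0.1, (n_latent, n_features))
    b2 = np.zeros(n_features)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)        # encoder
        X_hat = H @ W2 + b2             # linear decoder
        err = X_hat - X
        losses.append(float((err ** 2).mean()))
        # Backpropagate the mean squared error.
        g_out = 2.0 * err / err.size
        g_W2 = H.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_hidden = (g_out @ W2.T) * (1.0 - H ** 2)
        g_W1 = X.T @ g_hidden
        g_b1 = g_hidden.sum(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2

    def encode(Z):
        """Map standardized features to the learned latent codes."""
        return np.tanh(Z @ W1 + b1)

    return encode, losses


# Synthetic stand-in for an expression matrix (samples x genes):
# 20 informative features driven by 3 hidden factors, plus 2 constant columns.
rng = np.random.default_rng(1)
factors = rng.normal(size=(60, 3))
informative = factors @ rng.normal(size=(3, 20)) + 0.1 * rng.normal(size=(60, 20))
X_raw = np.hstack([informative, np.full((60, 2), 5.0)])

X_kept, kept_mask = drop_zero_variance(X_raw)
X_std = standardize(X_kept)
encode, losses = train_autoencoder(X_std)
codes = encode(X_std)   # reduced features for a downstream survival model
```

Real expression data would need many more epochs, mini-batches, and a tuned architecture; the sketch only shows the order of the steps.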

RESULTS

We fine-tuned our autoencoder model on RNA-seq data from three datasets to train the weights of our survival prediction model, then tested the model on different samples from each dataset. The results show that the proposed AutoCox and AutoRandom algorithms, which are based on our feature selection autoencoder approach, achieve a better concordance index than the most recent deep learning approaches on each dataset. A weight is computed for each gene resulting from our autoencoder model; the weights show the degree to which each gene affects the survival probability. For instance, for the breast cancer gene expression dataset, four of the most survival-related experimentally validated genes (PTPRG, MYST1, BG683264, and AK094562) are at the top of our list of discovered gene weights. Our approach improves survival analysis by speeding up the process, enhancing prediction accuracy, and reducing the error rate of the estimated survival probability.
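The concordance index used as the main metric here can be made concrete with a short sketch. The abstract does not say which estimator the authors used, so this assumes Harrell's C: among comparable pairs, the fraction in which the patient with the shorter observed survival time was assigned the higher predicted risk. The function name `concordance_index`, the half credit for tied risk scores, and the decision to skip pairs with tied times are assumptions made for illustration.

```python
def concordance_index(time, event, risk):
    """Harrell's C for right-censored data (simplified sketch).

    time  : observed follow-up time per patient
    event : 1 if the event (death) was observed, 0 if censored
    risk  : predicted risk score, higher means shorter expected survival

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time. Pairs with tied times
    are skipped, and tied risk scores count as half concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example: four patients, the third one censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.5, 0.1]   # risk decreases as survival time grows
print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is why a higher concordance index is reported as better.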

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e812/8080419/8cca2c501a7f/peerj-cs-07-492-g001.jpg
