Tumor gene expression data classification via sample expansion-based deep learning.

Authors

Liu Jian, Wang Xuesong, Cheng Yuhu, Zhang Lin

Affiliation

School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China.

Publication

Oncotarget. 2017 Nov 30;8(65):109646-109660. doi: 10.18632/oncotarget.22762. eCollection 2017 Dec 12.

Abstract

Because tumors seriously harm human health, effective diagnostic measures are urgently needed to guide tumor therapy. Early detection is particularly important for better treatment of patients. A key problem is how to discriminate tumor samples from normal ones effectively. Many classification methods, such as Support Vector Machines (SVMs), have been proposed for tumor classification. Recently, deep learning has achieved satisfactory performance on classification tasks in many areas. However, deep learning is rarely applied to tumor classification because gene expression data provide too few training samples. In this paper, a Sample Expansion method is proposed to address this problem. Inspired by the Denoising Autoencoder (DAE), a large number of samples are obtained by randomly cleaning partially corrupted input many times. The expanded samples not only retain the merits of corrupted data in DAE but also alleviate, to a certain extent, the shortage of training samples of gene expression data. Because Stacked Autoencoder (SAE) and Convolutional Neural Network (CNN) models perform well in classification tasks, the applicability of SAE and 1-dimensional CNN (1DCNN) to gene expression data is analyzed. Finally, two deep learning models, Sample Expansion-Based SAE (SESAE) and Sample Expansion-Based 1DCNN (SE1DCNN), are designed to classify tumor gene expression data using the expanded samples. Experimental studies indicate that SESAE and SE1DCNN are very effective in tumor classification.
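
The abstract does not spell out the corruption procedure, so the following is only a minimal sketch of the general sample-expansion idea, not the authors' implementation. It assumes DAE-style masking noise (a random subset of gene features set to zero) and draws several independently masked copies of each labeled sample. The function name `expand_samples` and the parameters `copies`, `corruption`, and `seed` are hypothetical, chosen for illustration.

```python
import random


def expand_samples(samples, labels, copies=10, corruption=0.2, seed=0):
    """Return `copies` randomly masked variants of every sample, with labels.

    Assumption: corruption is masking noise, i.e. a fixed fraction of the
    features in each copy is set to zero. The paper may use a different scheme.
    """
    rng = random.Random(seed)
    expanded, expanded_labels = [], []
    for x, y in zip(samples, labels):
        n_masked = int(round(corruption * len(x)))
        for _ in range(copies):
            # Pick a fresh random subset of feature positions for each copy.
            masked = set(rng.sample(range(len(x)), n_masked))
            expanded.append([0.0 if i in masked else v for i, v in enumerate(x)])
            expanded_labels.append(y)  # each copy keeps its source label
    return expanded, expanded_labels


# Toy data (made up): 2 samples x 5 "genes", labels 1 = tumor, 0 = normal.
X = [[1.2, 0.4, 3.1, 2.2, 0.9], [0.3, 2.8, 1.1, 0.7, 1.5]]
y = [1, 0]
X_exp, y_exp = expand_samples(X, y, copies=4, corruption=0.4)
print(len(X_exp), "expanded samples from", len(X), "originals")
```

With these toy settings each original sample yields four masked copies, so the training set grows from 2 to 8 rows. In practice the expanded set would feed a classifier such as the SAE or 1DCNN models named in the abstract.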

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f62/5752549/a82fd1df604d/oncotarget-08-109646-g001.jpg
