Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review.

Authors

Alharbi Fadi, Vakanski Aleksandar

Affiliation

Department of Computer Science, University of Idaho, Moscow, ID 83844, USA.

Publication

Bioengineering (Basel). 2023 Jan 28;10(2):173. doi: 10.3390/bioengineering10020173.

Abstract

Cancer denotes a group of diseases caused by the abnormal growth of cells that can spread to different parts of the body. According to the World Health Organization (WHO), cancer is the second leading cause of death after cardiovascular diseases. Gene expression can play a fundamental role in the early detection of cancer, as it is indicative of the biochemical processes in tissues and cells, as well as the genetic characteristics of an organism. Deoxyribonucleic acid (DNA) microarrays and ribonucleic acid (RNA)-sequencing methods quantify the expression levels of genes and produce valuable data for computational analysis. This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods. Both conventional and deep learning-based approaches are reviewed, with an emphasis on deep learning models because of their comparative advantages in identifying gene patterns that are distinctive for various types of cancer. Relevant works that employ the most commonly used deep neural network architectures are covered, including multi-layer perceptrons, as well as convolutional, recurrent, graph, and transformer networks. This survey also presents an overview of the data collection methods for gene expression analysis and lists important datasets that are commonly used for supervised machine learning on this task. Furthermore, we review pertinent techniques for feature engineering and data preprocessing that are typically used to handle the high dimensionality of gene expression data, which is caused by the large number of genes present in each data sample. The paper concludes with a discussion of future research directions for machine learning-based gene expression analysis for cancer classification.
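To make the high-dimensionality point concrete, the following is a minimal sketch of one generic pipeline: filter genes by variance, then classify with a nearest-centroid rule. It is an illustration only and is not taken from the review. The data are synthetic, and every name here (`make_synthetic_expression`, `top_variance_genes`, `fit_centroids`, `predict`, `run_demo`) and every parameter value is a hypothetical choice made for this example.

```python
import random
import statistics


def make_synthetic_expression(n_per_class=20, n_genes=200, n_informative=5, seed=0):
    """Synthetic stand-in for an expression matrix (assumption: not real data).

    Only the first n_informative genes differ between the two classes.
    """
    rng = random.Random(seed)
    samples, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for g in range(n_informative):
                row[g] += 4.0 * label  # shift informative genes in class 1
            samples.append(row)
            labels.append(label)
    return samples, labels


def top_variance_genes(samples, k):
    """Return the indices of the k genes with the highest variance."""
    n_genes = len(samples[0])
    variances = [statistics.pvariance([s[g] for s in samples]) for g in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda g: variances[g], reverse=True)
    return ranked[:k]


def fit_centroids(samples, labels, genes):
    """Compute the per-class mean expression over the selected genes."""
    centroids = {}
    for label in sorted(set(labels)):
        rows = [s for s, y in zip(samples, labels) if y == label]
        centroids[label] = [statistics.fmean(r[g] for r in rows) for g in genes]
    return centroids


def predict(sample, centroids, genes):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((sample[g] - c) ** 2 for g, c in zip(genes, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def run_demo(k=10, n_test=10, seed=0):
    """Select genes on the training split only, then score held-out samples."""
    samples, labels = make_synthetic_expression(seed=seed)
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    test_idx, train_idx = order[:n_test], order[n_test:]
    train_x = [samples[i] for i in train_idx]
    train_y = [labels[i] for i in train_idx]
    genes = top_variance_genes(train_x, k)
    centroids = fit_centroids(train_x, train_y, genes)
    correct = sum(predict(samples[i], centroids, genes) == labels[i] for i in test_idx)
    return genes, correct / n_test


if __name__ == "__main__":
    selected, accuracy = run_demo()
    print("selected genes:", selected)
    print("held-out accuracy:", accuracy)
```

Gene selection is done on the training split alone so that the held-out samples do not influence which features are kept. Real expression data would also need normalization and a stronger classifier, which this sketch leaves out.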

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/663f/9952758/83d57bc8923f/bioengineering-10-00173-g001.jpg
