Machine Learning Approaches for Biomarker Discovery Using Gene Expression Data

Authors

Zhang Xiaokang, Jonassen Inge, Goksøyr Anders

Affiliations

Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway

Center for Cancer Biomarkers, Department of Informatics, University of Bergen, Bergen, Norway

DOI:10.36255/exonpublications.bioinformatics.2021.ch4

PMID:33877765

Abstract

Biomarkers are of great importance in many fields, such as cancer research, toxicology, and the diagnosis and treatment of diseases, and they help to better understand biological response mechanisms to internal or external interventions. High-throughput gene expression profiling technologies, such as DNA microarrays and RNA sequencing, provide large gene expression datasets that enable data-driven biomarker discovery. Traditional statistical tests have been the mainstream approach for identifying differentially expressed genes as biomarkers. In recent years, machine learning techniques such as feature selection have gained popularity. Given the many options, picking the most appropriate method for a particular dataset becomes essential, and different evaluation metrics have therefore been proposed. Because methods are evaluated on different aspects and a method's performance varies across datasets, the idea of integrating multiple methods has emerged. Many integration strategies have been proposed and have shown great potential. This chapter gives an overview of current research advances and existing issues in biomarker discovery using machine learning approaches on gene expression data.
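As a rough illustration of integrating multiple feature selection methods, the sketch below ranks genes with two simple scores and averages the ranks. It is not the chapter's method: the two scores (a Welch t-statistic and an absolute mean difference), the mean-rank aggregation, the simulated data, and all function names are assumptions chosen for brevity.

```python
import numpy as np


def rank_descending(scores):
    """Return ranks (0 = best) for scores where larger means more relevant."""
    order = np.argsort(-scores, kind="stable")
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(scores))
    return ranks


def welch_t_scores(X, y):
    """Absolute Welch t-statistic per gene for a two-class label vector y."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / se


def mean_difference_scores(X, y):
    """Absolute difference of class means per gene (fold-change-like on log data)."""
    a, b = X[y == 0], X[y == 1]
    return np.abs(a.mean(axis=0) - b.mean(axis=0))


def aggregate_top_k(X, y, k, scorers=(welch_t_scores, mean_difference_scores)):
    """Average the per-method ranks and return the indices of the k best genes."""
    ranks = np.vstack([rank_descending(score(X, y)) for score in scorers])
    mean_rank = ranks.mean(axis=0)
    return np.argsort(mean_rank, kind="stable")[:k]


# Simulated expression matrix (samples x genes); purely illustrative data.
rng = np.random.default_rng(0)
n_samples, n_genes = 40, 200
y = np.repeat([0, 1], n_samples // 2)
X = rng.normal(size=(n_samples, n_genes))
X[y == 1, :5] += 4.0  # genes 0-4 are simulated as differentially expressed

selected = aggregate_top_k(X, y, k=5)
print(sorted(selected.tolist()))
```

Mean-rank aggregation is only one of many possible integration strategies. It is used here because it needs no tuning and makes the contribution of each method easy to inspect.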
