Robust biomarker discovery for hepatocellular carcinoma from high-throughput data by multiple feature selection methods.

Affiliations

Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, 250061, Shandong, China.

Center for Intelligent Medicine, Shandong University, Jinan, 250061, Shandong, China.

Publication information

BMC Med Genomics. 2021 Aug 25;14(Suppl 1):112. doi: 10.1186/s12920-021-00957-4.

DOI:10.1186/s12920-021-00957-4

PMID:34433487

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8386074/

Abstract

BACKGROUND

Hepatocellular carcinoma (HCC) is one of the most common cancers. The discovery of specific genes serving as biomarkers is of paramount significance for cancer diagnosis and prognosis. The high-throughput omics data generated by The Cancer Genome Atlas (TCGA) consortium provide a valuable resource for the discovery of HCC biomarker genes. Numerous methods have been proposed to select cancer biomarkers. However, these methods have not investigated how robust the identified biomarkers are across different feature selection techniques.

METHODS

We use six different recursive feature elimination methods to select gene signatures of HCC from TCGA liver cancer data. The genes shared by all six selected subsets are proposed as robust biomarkers. The Akaike information criterion (AIC) is employed to explain the optimization process of feature selection, which provides a statistical interpretation of feature selection in machine learning methods. We also use several methods to validate the screened biomarkers.
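For reference, the AIC mentioned above has a standard textbook definition, reproduced here as background rather than quoted from the paper. The symbols $k$ (number of estimated parameters) and $\hat{L}$ (maximized likelihood of the fitted model) are the usual notation, not the authors' own.

```latex
\[
  \mathrm{AIC} = 2k - 2\ln \hat{L}
\]
```

Under this criterion a model with fewer features is preferred whenever dropping a feature reduces $2k$ by more than it increases $-2\ln \hat{L}$, which is the trade-off a backward stepwise procedure evaluates at each elimination step.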

RESULTS

In this paper, we propose a robust method for discovering biomarker genes for HCC from gene expression data. Specifically, we implement recursive feature elimination with cross-validation (RFE-CV) based on six different classification algorithms. The overlaps among the gene sets discovered by the different methods are taken as the identified biomarkers. We interpret the machine-learning-based feature selection process using the AIC from statistics. Furthermore, the features selected by backward stepwise logistic regression under the minimum-AIC criterion are completely contained in the identified biomarkers. The classification results verify the superiority of the interpretable robust biomarker discovery method.
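The RFE-CV-then-intersect idea described above can be sketched with scikit-learn's `RFECV`. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not name the six classifiers, so the three estimators below are stand-ins, the data are synthetic rather than TCGA expression profiles, and the helper name `rfecv_subsets` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC


def rfecv_subsets(X, y, estimators, cv=3):
    """Run RFE-CV once per estimator; return {name: set of selected column indices}."""
    subsets = {}
    for name, est in estimators.items():
        # RFECV drops one feature per step and keeps the feature count
        # with the best cross-validated accuracy.
        selector = RFECV(est, step=1, cv=cv, scoring="accuracy")
        selector.fit(X, y)
        subsets[name] = {i for i, keep in enumerate(selector.support_) if keep}
    return subsets


# Synthetic stand-in for an expression matrix (samples x genes); not TCGA data.
X, y = make_classification(n_samples=120, n_features=20, n_informative=5,
                           n_redundant=0, shuffle=False, random_state=0)

# Assumed classifiers for illustration; each exposes coef_ or
# feature_importances_, which RFECV needs to rank features.
estimators = {
    "logreg": LogisticRegression(max_iter=1000),
    "linear_svc": LinearSVC(max_iter=10000),
    "random_forest": RandomForestClassifier(n_estimators=25, random_state=0),
}

subsets = rfecv_subsets(X, y, estimators)
# Features selected by every classifier play the role of the "robust" set.
robust = set.intersection(*subsets.values())

print({name: len(s) for name, s in subsets.items()})
print("shared features:", sorted(robust))
```

The subset sizes generally differ between classifiers, and the intersection can be small or even empty on noisy data, so in practice one would inspect the per-classifier subsets as well as their overlap.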

CONCLUSIONS

The gene subsets selected by RFE-CV with the six classifiers contain different numbers of features, yet they overlap. The AIC values in model selection provide a theoretical foundation for the feature selection process of biomarker discovery via machine learning. Moreover, genes contained in more of the optimally selected subsets carry greater biological significance. Intersecting the biomarkers selected by different classifiers improves the quality of feature selection. This is a general method suitable for screening biomarkers of complex diseases from high-throughput data.
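One simple way to act on the observation that genes appearing in more subsets are more meaningful is to rank genes by how many classifier-specific subsets contain them. The sketch below uses only the standard library. The subset contents are hypothetical placeholders (`GENE_A` and so on are not real genes or results from the paper), and the function names are illustrative.

```python
from collections import Counter


def selection_counts(subsets):
    """Count, for each gene, how many classifier-specific subsets contain it."""
    counts = Counter()
    for genes in subsets.values():
        counts.update(set(genes))  # set() guards against duplicates within a subset
    return counts


def genes_in_at_least(subsets, k):
    """Return the sorted genes that appear in at least k of the subsets."""
    counts = selection_counts(subsets)
    return sorted(g for g, c in counts.items() if c >= k)


# Hypothetical placeholder subsets; names are not real genes or results.
subsets = {
    "clf_1": ["GENE_A", "GENE_B", "GENE_C"],
    "clf_2": ["GENE_A", "GENE_B", "GENE_D"],
    "clf_3": ["GENE_A", "GENE_C", "GENE_D", "GENE_E"],
}

shared_by_all = genes_in_at_least(subsets, k=len(subsets))
shared_by_two = genes_in_at_least(subsets, k=2)
print(shared_by_all)
print(shared_by_two)
```

Setting `k` to the number of classifiers reproduces the strict intersection, while a lower `k` gives a majority-vote style relaxation.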

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00d4/8386074/203ae08d488a/12920_2021_957_Fig1_HTML.jpg
