Feature set optimization in biomarker discovery from genome-scale data.

Affiliations

Institute of Biomedicine, University of Eastern Finland, Kuopio 70210, Finland.

Faculty of Medicine and Health Technology, Tampere University, Tampere 33100, Finland.

Publication information

Bioinformatics. 2020 Jun 1;36(11):3393-3400. doi: 10.1093/bioinformatics/btaa144.

DOI: 10.1093/bioinformatics/btaa144
PMID: 32119073

Abstract

MOTIVATION

Omics technologies have the potential to facilitate the discovery of new biomarkers. However, only a few omics-derived biomarkers have been successfully translated into clinical applications to date. Feature selection is a crucial step in this process: it identifies small sets of features with high predictive power. Models consisting of a limited number of features are not only more robust in analytical terms, but also ensure cost effectiveness and clinical translatability of new biomarker panels. Here we introduce GARBO, a novel multi-island adaptive genetic algorithm to simultaneously optimize accuracy and set size in omics-driven biomarker discovery problems.
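To make the idea of a multi-island genetic algorithm for feature selection concrete, here is a minimal sketch on a toy problem. It is not GARBO's implementation: the fitness function, the `INFORMATIVE` set, the ring migration scheme, and all parameter values are assumptions chosen for illustration, and the adaptive behaviour mentioned in the abstract is left out.

```python
import random
from typing import List

N_FEATURES = 30
INFORMATIVE = {2, 7, 11}  # hypothetical "true" biomarkers of the toy problem

def fitness(mask: List[int], w: float = 0.5) -> float:
    """Toy objective: reward covering INFORMATIVE, penalize large sets."""
    chosen = {i for i, bit in enumerate(mask) if bit}
    if not chosen:
        return 0.0
    accuracy_proxy = len(chosen & INFORMATIVE) / len(INFORMATIVE)
    compactness = 1.0 - len(chosen) / N_FEATURES
    return w * accuracy_proxy + (1.0 - w) * compactness

def evolve_island(pop, rng, mutation_rate):
    """One generation: elitism, tournament selection, uniform crossover, bit flips."""
    children = [max(pop, key=fitness)[:]]  # keep the island's best unchanged
    while len(children) < len(pop):
        p1 = max(rng.sample(pop, 3), key=fitness)
        p2 = max(rng.sample(pop, 3), key=fitness)
        child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
        child = [1 - g if rng.random() < mutation_rate else g for g in child]
        children.append(child)
    return children

def migrate(islands):
    """Ring migration: the previous island's best replaces this island's worst."""
    bests = [max(pop, key=fitness)[:] for pop in islands]
    for i, pop in enumerate(islands):
        worst = min(range(len(pop)), key=lambda j: fitness(pop[j]))
        pop[worst] = bests[i - 1]

def run(n_islands=3, pop_size=20, generations=60, migrate_every=10, seed=0):
    rng = random.Random(seed)
    # Each individual is a 0/1 mask over the features; start sparse.
    islands = [
        [[int(rng.random() < 0.2) for _ in range(N_FEATURES)] for _ in range(pop_size)]
        for _ in range(n_islands)
    ]
    for gen in range(1, generations + 1):
        islands = [evolve_island(pop, rng, 1.0 / N_FEATURES) for pop in islands]
        if gen % migrate_every == 0:
            migrate(islands)
    return max((ind for pop in islands for ind in pop), key=fitness)

best = run()
selected = [i for i, bit in enumerate(best) if bit]
print("selected features:", selected, "fitness:", round(fitness(best), 3))
```

In a real setting the accuracy proxy would be replaced by a cross-validated classifier score (the abstract mentions random forest-based classifiers), which is far more expensive to evaluate than this toy function.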

RESULTS

Compared to existing methods, GARBO enables the identification of biomarker sets that best optimize the trade-off between classification accuracy and number of biomarkers. We tested GARBO and six alternative selection methods on two highly relevant topics in precision medicine: cancer patient stratification and drug sensitivity prediction. We found multivariate biomarker models from different omics data types such as mRNA, miRNA, copy number variation, mutation and DNA methylation. The top performing models were evaluated by using two different strategies: Pareto-based selection, and the weighted sum of accuracy and set size (w = 0.5). Pareto-based preferences show the ability of the proposed algorithm to search for minimal subsets of relevant features that can be used to build accurate random forest-based classification systems. Moreover, on larger omics data types, such as gene expression and DNA methylation, GARBO systematically identified biomarker panels exhibiting higher classification accuracy or employing far fewer features than those discovered with other methods. These results were confirmed on independent datasets.
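The two evaluation strategies named above can be illustrated with a short sketch. The abstract does not give the exact form of the weighted sum, so the formula below (accuracy weighted by w, normalized compactness weighted by 1 - w) is an assumption, and the candidate panels are made-up numbers rather than results from the paper.

```python
from typing import List, Tuple

Candidate = Tuple[str, float, int]  # (label, accuracy, number of features)

def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep candidates not dominated on (higher accuracy, fewer features)."""
    front = []
    for name, acc, size in cands:
        dominated = any(
            (a >= acc and s <= size) and (a > acc or s < size)
            for _, a, s in cands
        )
        if not dominated:
            front.append((name, acc, size))
    return front

def weighted_score(acc: float, size: int, max_size: int, w: float = 0.5) -> float:
    """Assumed form: w * accuracy + (1 - w) * (1 - size / max_size)."""
    return w * acc + (1.0 - w) * (1.0 - size / max_size)

# Hypothetical candidate panels, for illustration only.
panels = [
    ("A", 0.90, 40),
    ("B", 0.88, 10),
    ("C", 0.85, 25),
    ("D", 0.80, 5),
]

front = pareto_front(panels)
max_size = max(size for _, _, size in panels)
best = max(panels, key=lambda p: weighted_score(p[1], p[2], max_size))

print("Pareto front:", [name for name, _, _ in front])
print("Best by weighted sum:", best[0])
```

Panel C drops out of the Pareto front because B is both more accurate and smaller. The weighted sum collapses the trade-off into one number, so its winner depends on w and on how set size is normalized, whereas the Pareto front keeps every non-dominated trade-off.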

AVAILABILITY AND IMPLEMENTATION

github.com/Greco-Lab/GARBO.

CONTACT

dario.greco@tuni.fi.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

1
Feature set optimization in biomarker discovery from genome-scale data.
Bioinformatics. 2020 Jun 1;36(11):3393-3400. doi: 10.1093/bioinformatics/btaa144.
2
Improved NSGA-II algorithms for multi-objective biomarker discovery.
Bioinformatics. 2022 Sep 16;38(Suppl_2):ii20-ii26. doi: 10.1093/bioinformatics/btac463.
3
Comparison of five supervised feature selection algorithms leading to top features and gene signatures from multi-omics data in cancer.
BMC Bioinformatics. 2022 Apr 28;23(Suppl 3):153. doi: 10.1186/s12859-022-04678-y.
4
Min-redundancy and max-relevance multi-view feature selection for predicting ovarian cancer survival using multi-omics data.
BMC Med Genomics. 2018 Sep 14;11(Suppl 3):71. doi: 10.1186/s12920-018-0388-0.
5
Global proteomics profiling improves drug sensitivity prediction: results from a multi-omics, pan-cancer modeling approach.
Bioinformatics. 2018 Apr 15;34(8):1353-1362. doi: 10.1093/bioinformatics/btx766.
6
MEvA-X: a hybrid multiobjective evolutionary tool using an XGBoost classifier for biomarkers discovery on biomedical datasets.
Bioinformatics. 2023 Jul 1;39(7). doi: 10.1093/bioinformatics/btad384.
7
-Omics biomarker identification pipeline for translational medicine.
J Transl Med. 2019 May 14;17(1):155. doi: 10.1186/s12967-019-1912-5.
8
Identifying interactions in omics data for clinical biomarker discovery using symbolic regression.
Bioinformatics. 2022 Aug 2;38(15):3749-3758. doi: 10.1093/bioinformatics/btac405.
9
A Comprehensive Evaluation Framework for Benchmarking Multi-Objective Feature Selection in Omics-Based Biomarker Discovery.
IEEE/ACM Trans Comput Biol Bioinform. 2024 Nov-Dec;21(6):2432-2446. doi: 10.1109/TCBB.2024.3480150. Epub 2024 Dec 10.
10
NCC-AUC: an AUC optimization method to identify multi-biomarker panel for cancer prognosis from genomic and clinical data.
Bioinformatics. 2015 Oct 15;31(20):3330-8. doi: 10.1093/bioinformatics/btv374. Epub 2015 Jun 18.

Cited by

1
Bridging Genomics to Cardiology Clinical Practice: Artificial Intelligence in Optimizing Polygenic Risk Scores: A Systematic Review.
JACC Adv. 2025 Jun;4(6 Pt 2):101803. doi: 10.1016/j.jacadv.2025.101803.
2
Dual-stage optimizer for systematic overestimation adjustment applied to multi-objective genetic algorithms for biomarker selection.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae674.
3
Transforming Clinical Research: The Power of High-Throughput Omics Integration.
Proteomes. 2024 Sep 6;12(3):25. doi: 10.3390/proteomes12030025.
4
Enhancing prediction accuracy of coronary artery disease through machine learning-driven genomic variant selection.
J Transl Med. 2024 Apr 16;22(1):356. doi: 10.1186/s12967-024-05090-1.
5
miRDM-rfGA: Genetic algorithm-based identification of a miRNA set for detecting type 2 diabetes.
BMC Med Genomics. 2023 Aug 22;16(1):195. doi: 10.1186/s12920-023-01636-2.
6
Machine Learning Models for the Identification of Prognostic and Predictive Cancer Biomarkers: A Systematic Review.
Int J Mol Sci. 2023 Apr 24;24(9):7781. doi: 10.3390/ijms24097781.
7
Identifying gene expression-based biomarkers in online learning environments.
Bioinform Adv. 2022 Oct 13;2(1):vbac074. doi: 10.1093/bioadv/vbac074. eCollection 2022.
8
Biomarkers of nanomaterials hazard from multi-layer data.
Nat Commun. 2022 Jul 1;13(1):3798. doi: 10.1038/s41467-022-31609-5.
9
Nextcast: A software suite to analyse and model toxicogenomics data.
Comput Struct Biotechnol J. 2022 Mar 18;20:1413-1426. doi: 10.1016/j.csbj.2022.03.014. eCollection 2022.
10
Supervised Methods for Biomarker Detection from Microarray Experiments.
Methods Mol Biol. 2022;2401:101-120. doi: 10.1007/978-1-0716-1839-4_8.