Feature set optimization in biomarker discovery from genome-scale data.

Affiliations

Institute of Biomedicine, University of Eastern Finland, Kuopio 70210, Finland.

Faculty of Medicine and Health Technology, Tampere University, Tampere 33100, Finland.

Publication information

Bioinformatics. 2020 Jun 1;36(11):3393-3400. doi: 10.1093/bioinformatics/btaa144.

DOI:10.1093/bioinformatics/btaa144

PMID:32119073

Abstract

MOTIVATION

Omics technologies have the potential to facilitate the discovery of new biomarkers. However, only a few omics-derived biomarkers have been successfully translated into clinical applications to date. Feature selection is a crucial step in this process, identifying small sets of features with high predictive power. Models consisting of a limited number of features are not only more analytically robust but also support the cost-effectiveness and clinical translatability of new biomarker panels. Here we introduce GARBO, a novel multi-island adaptive genetic algorithm that simultaneously optimizes accuracy and set size in omics-driven biomarker discovery problems.
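The multi-island idea can be illustrated with a minimal island-model genetic algorithm over binary feature masks. This is a sketch under stated assumptions, not GARBO's implementation: the function name `evolve_islands`, the ring migration topology, binary tournament selection, one-point crossover and every parameter value are illustrative choices; the adaptive component of GARBO is not modeled; and the hypothetical `toy_fitness` objective stands in for a real classifier-based score.

```python
import random
from typing import Callable, List

Chromosome = List[int]  # bit mask over features: 1 = selected, 0 = not selected


def evolve_islands(
    n_features: int,
    fitness: Callable[[Chromosome], float],
    n_islands: int = 3,
    pop_size: int = 20,
    generations: int = 40,
    migrate_every: int = 10,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Chromosome:
    """Illustrative island-model GA over feature masks (needs n_features >= 2).

    Each island evolves independently; every `migrate_every` generations the
    best individual of each island is copied into the next island (ring).
    """
    rng = random.Random(seed)
    islands = [
        [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
        for _ in range(n_islands)
    ]

    def tournament(pop: List[Chromosome]) -> Chromosome:
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for gen in range(1, generations + 1):
        for i, pop in enumerate(islands):
            children = [max(pop, key=fitness)[:]]  # elitism: carry over the best
            while len(children) < pop_size:
                p1, p2 = tournament(pop), tournament(pop)
                cut = rng.randrange(1, n_features)
                child = p1[:cut] + p2[cut:]  # one-point crossover
                # Bit-flip mutation, applied independently to each gene.
                child = [1 - g if rng.random() < mutation_rate else g for g in child]
                children.append(child)
            islands[i] = children
        if gen % migrate_every == 0:
            bests = [max(pop, key=fitness)[:] for pop in islands]
            for k, pop in enumerate(islands):
                # Island k receives the best of island k-1, replacing its worst member.
                worst = min(range(pop_size), key=lambda j: fitness(pop[j]))
                pop[worst] = bests[k - 1][:]
    return max((c for pop in islands for c in pop), key=fitness)


# Hypothetical toy objective: features 0, 1, 2 are "informative"; the score
# weights an accuracy proxy and a set-size term equally (w = 0.5).
INFORMATIVE = {0, 1, 2}
N_FEATURES = 10


def toy_fitness(c: Chromosome) -> float:
    accuracy_proxy = sum(c[i] for i in INFORMATIVE) / len(INFORMATIVE)
    size_term = 1 - sum(c) / len(c)
    return 0.5 * accuracy_proxy + 0.5 * size_term


best = evolve_islands(N_FEATURES, toy_fitness)
print(best, round(toy_fitness(best), 3))
```

With a fixed seed the run is reproducible. On this separable toy objective the optimum is the mask selecting exactly features 0, 1 and 2 (score 0.85); the search is expected to approach it, although a genetic algorithm gives no guarantee of reaching the global optimum. Keeping islands separate between migrations is what lets different subpopulations explore different feature subsets before sharing their best solutions.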

RESULTS

Compared to existing methods, GARBO identifies biomarker sets that best optimize the trade-off between classification accuracy and the number of biomarkers. We tested GARBO and six alternative selection methods on two highly relevant topics in precision medicine: cancer patient stratification and drug sensitivity prediction. We identified multivariate biomarker models from different omics data types, such as mRNA, miRNA, copy number variation, mutation and DNA methylation. The top-performing models were evaluated using two different strategies: Pareto-based selection, and the weighted sum of accuracy and set size (w = 0.5). Pareto-based preferences show that the proposed algorithm can search for minimal subsets of relevant features that support accurate random forest-based classification systems. Moreover, on larger omics data types, such as gene expression and DNA methylation, GARBO systematically identified biomarker panels that either achieved higher classification accuracy or used far fewer features than those discovered with other methods. These results were confirmed on independent datasets.
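The two evaluation strategies can be sketched by treating each candidate model as an (accuracy, set size) pair. This is an illustrative sketch, not GARBO's code: the function names are hypothetical, the candidate values are invented for the example, and normalizing the size term by a maximum set size in `weighted_score` is an assumption, since only a weighted sum with w = 0.5 is stated.

```python
from typing import List, Tuple

# A candidate model: (classification accuracy in [0, 1], number of features).
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives (higher accuracy,
    fewer features) and strictly better on at least one."""
    acc_a, size_a = a
    acc_b, size_b = b
    no_worse = acc_a >= acc_b and size_a <= size_b
    strictly_better = acc_a > acc_b or size_a < size_b
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


def weighted_score(c: Candidate, max_size: int, w: float = 0.5) -> float:
    """Assumed form: w * accuracy + (1 - w) * (1 - size / max_size),
    so that both terms lie in [0, 1]."""
    acc, size = c
    return w * acc + (1 - w) * (1 - size / max_size)


def best_weighted(cands: List[Candidate], max_size: int, w: float = 0.5) -> Candidate:
    return max(cands, key=lambda c: weighted_score(c, max_size, w))


# Hypothetical candidates: (accuracy, number of biomarkers).
cands = [(0.90, 20), (0.88, 5), (0.85, 5), (0.70, 1), (0.92, 50)]
print(pareto_front(cands))
print(best_weighted(cands, max_size=50))
```

Here (0.85, 5) drops out of the Pareto front because (0.88, 5) matches its size with higher accuracy, while the weighted sum collapses the front to a single choice, (0.88, 5), by trading a little accuracy for a much smaller panel.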

AVAILABILITY AND IMPLEMENTATION

github.com/Greco-Lab/GARBO.

CONTACT

dario.greco@tuni.fi.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
