Algorithm-agnostic significance testing in supervised learning with multimodal data.

Affiliations

Institute for Statistics and Mathematics, Vienna University of Economics and Business, Welthandelsplatz 1, AT-1020 Vienna, Austria.

Department of Mathematical Sciences, University of Copenhagen, Universitetsparken 5, DK-2100 Copenhagen, Denmark.

Publication information

Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae475.

Abstract

MOTIVATION

Valid statistical inference is crucial for decision-making but difficult to obtain in supervised learning with multimodal data, e.g. combinations of clinical features, genomic data, and medical images. Multimodal data often warrants the use of black-box algorithms, for instance, random forests or neural networks, which impede the use of traditional variable significance tests.

RESULTS

We address this problem by proposing the use of COvariance MEasure Tests (COMETs), which are calibrated and powerful tests that can be combined with any sufficiently predictive supervised learning algorithm. We apply COMETs to several high-dimensional, multimodal data sets to illustrate (i) variable significance testing for finding relevant mutations modulating drug activity, (ii) modality selection for predicting survival in liver cancer patients with multiomics data, and (iii) modality selection with clinical features and medical imaging data. In all applications, COMETs yield results consistent with domain knowledge without requiring data-driven pre-processing, which could otherwise invalidate type I error control. These novel applications with high-dimensional multimodal data corroborate prior results on the power and robustness of COMETs for significance testing.
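To illustrate how a covariance measure test can wrap an arbitrary regression method, here is a minimal sketch of one such test, the generalised covariance measure (GCM) of Shah and Peters (2020), for a single covariate X, response Y, and remaining covariates Z. It is not the API of the comets or pycomets packages: the names `gcm_test` and `ols_fit_predict` are hypothetical, and ordinary least squares stands in for the random forest or neural network one would plug in for multimodal data.

```python
import math

import numpy as np


def ols_fit_predict(Z, target):
    """Least-squares regression of `target` on Z (plus intercept); returns fitted values.

    Hypothetical stand-in learner for this sketch. Any regression method with the
    same (Z, target) -> predictions signature could be swapped in.
    """
    design = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return design @ coef


def gcm_test(Y, X, Z, fit_predict=ols_fit_predict):
    """Sketch of the GCM test of H0: Y independent of X given Z (univariate X and Y).

    Returns the test statistic and a two-sided p-value from the N(0, 1) approximation.
    """
    eps = Y - fit_predict(Z, Y)  # residuals of the regression of Y on Z
    xi = X - fit_predict(Z, X)   # residuals of the regression of X on Z
    R = eps * xi                 # products of residuals
    n = len(R)
    # Normalised mean of the products; R.std() (ddof=0) equals sqrt(mean(R**2) - mean(R)**2)
    stat = math.sqrt(n) * R.mean() / R.std()
    # Two-sided p-value: 2 * (1 - Phi(|stat|)) = erfc(|stat| / sqrt(2))
    pval = math.erfc(abs(stat) / math.sqrt(2))
    return stat, pval


# Simulated example: X depends on Z; Y_null depends on Z only, Y_alt also on X.
rng = np.random.default_rng(0)
n = 2000
Z = rng.normal(size=(n, 3))
X = Z @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=n)
Y_null = Z @ np.array([0.3, 0.3, 0.3]) + rng.normal(size=n)
Y_alt = Y_null + 0.5 * X
print("null:", gcm_test(Y_null, X, Z))
print("alternative:", gcm_test(Y_alt, X, Z))
```

The normal approximation relies on both regressions being accurate enough, which is the sense in which the learner must be "sufficiently predictive". Testing a whole modality (a block of columns of X) would need a multivariate version of the statistic, which this sketch does not cover.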

AVAILABILITY AND IMPLEMENTATION

COMETs are implemented in the comets R package available on CRAN and the pycomets Python library available on GitHub. Source code for reproducing all results is available at https://github.com/LucasKook/comets. All data sets used in this work are openly available.

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc1d/11424510/e93d721bca6f/bbae475f1.jpg