Cost-based feature selection for network model choice.

Authors

Raynal Louis, Hoffmann Till, Onnela Jukka-Pekka

Affiliation

Department of Biostatistics, T.H. Chan School of Public Health, Harvard University.

Publication information

J Comput Graph Stat. 2023;32(3):1109-1118. doi: 10.1080/10618600.2022.2151453. Epub 2023 Jan 20.

Abstract

Selecting a small set of informative features from a large number of possibly noisy candidates is a challenging problem with many applications in machine learning and approximate Bayesian computation. In practice, the cost of computing informative features also needs to be considered. This is particularly important for networks because the computational costs of individual features can span several orders of magnitude. We addressed this issue for the network model selection problem using two approaches. First, we adapted nine feature selection methods to account for the cost of features. We show for two classes of network models that the cost can be reduced by two orders of magnitude without considerably affecting classification accuracy (proportion of correctly identified models). Second, we selected features using pilot simulations with smaller networks. This approach reduced the computational cost by a factor of 50 without affecting classification accuracy. To demonstrate the utility of our approach, we applied it to three different yeast protein interaction networks and identified the best-fitting duplication divergence model. Supplemental materials, including computer code to reproduce our results, are available online.
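One simple way to make a feature ranking cost-aware is to subtract a weighted, normalized cost from each feature's relevance score. The sketch below illustrates that general idea only; it is not taken from the paper, and the function name `penalized_ranking`, the penalty weight `lam`, and the toy relevance and cost values are all assumptions made for illustration.

```python
from typing import List, Sequence


def penalized_ranking(relevance: Sequence[float],
                      cost: Sequence[float],
                      lam: float) -> List[int]:
    """Rank features by relevance minus a cost penalty (hypothetical helper)."""
    if len(relevance) != len(cost):
        raise ValueError("relevance and cost must have the same length")
    total_cost = sum(cost)
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    # Normalize costs by their sum so that lam weighs relative, not absolute, cost.
    scores = [r - lam * c / total_cost for r, c in zip(relevance, cost)]
    # Return feature indices ordered from highest to lowest penalized score.
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Toy, made-up values: feature 0 is the most relevant but 100 times more costly.
relevance = [0.50, 0.45, 0.10]
cost = [100.0, 1.0, 1.0]

print(penalized_ranking(relevance, cost, lam=0.0))  # relevance only: [0, 1, 2]
print(penalized_ranking(relevance, cost, lam=1.0))  # cost-aware:     [1, 2, 0]
```

With `lam=0.0` the ranking ignores cost; raising `lam` pushes the expensive feature down the list, so a nearly as informative but much cheaper feature is chosen first.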
