Raynal Louis, Hoffmann Till, Onnela Jukka-Pekka
Department of Biostatistics, T.H. Chan School of Public Health, Harvard University.
J Comput Graph Stat. 2023;32(3):1109-1118. doi: 10.1080/10618600.2022.2151453. Epub 2023 Jan 20.
Selecting a small set of informative features from a large number of possibly noisy candidates is a challenging problem with many applications in machine learning and approximate Bayesian computation. In practice, the cost of computing informative features also needs to be considered. This is particularly important for networks because the computational costs of individual features can span several orders of magnitude. We addressed this issue for the network model selection problem using two approaches. First, we adapted nine feature selection methods to account for the cost of features. We show for two classes of network models that the cost can be reduced by two orders of magnitude without considerably affecting classification accuracy (proportion of correctly identified models). Second, we selected features using pilot simulations with smaller networks. This approach reduced the computational cost by a factor of 50 without affecting classification accuracy. To demonstrate the utility of our approach, we applied it to three different yeast protein interaction networks and identified the best-fitting duplication divergence model. Supplemental materials, including computer code to reproduce our results, are available online.
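The cost-aware selection idea in the abstract can be illustrated with a minimal sketch. The paper adapts nine established feature selection methods to account for feature cost; the greedy knapsack-style heuristic below is not one of those methods, only a hypothetical illustration of the trade-off, with made-up scores, costs, and budget.

```python
# Hypothetical sketch of cost-aware feature selection: rank candidate
# features by an informativeness-to-cost ratio and greedily add them
# until a compute budget is exhausted. All numbers are illustrative.

def select_features(scores, costs, budget):
    """Greedily pick feature indices with the best score/cost ratio,
    skipping any feature whose cost would exceed the remaining budget."""
    order = sorted(range(len(scores)),
                   key=lambda i: scores[i] / costs[i],
                   reverse=True)
    selected, spent = [], 0.0
    for i in order:
        if spent + costs[i] <= budget:
            selected.append(i)
            spent += costs[i]
    return selected, spent

# Toy example: a cheap, informative feature beats an expensive one of
# similar informativeness, mirroring the abstract's point that network
# feature costs span orders of magnitude.
scores = [0.9, 0.8, 0.85, 0.4]   # e.g. univariate relevance scores
costs  = [1.0, 100.0, 5.0, 0.5]  # e.g. seconds to compute per network
chosen, spent = select_features(scores, costs, budget=10.0)
print(chosen, spent)  # the costly feature 1 is excluded
```

Under this heuristic the expensive second feature is dropped despite its high score, keeping total cost far below what computing all features would require.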