Active Learning of Bayesian Linear Models with High-Dimensional Binary Features by Parameter Confidence-Region Estimation.

Author information

RIKEN Center for Advanced Intelligence Project, Chuo-ku, Tokyo, 103-0027, Japan

Nagoya Institute of Technology, Gokiso-cho, Showa-ku, Nagoya, Aichi, 466-8555, Japan; JST, PRESTO, Kawaguchi, Saitama, 332-0012, Japan; and Center for Materials Research by Information Integration, National Institute for Materials Science, Sengen, Tsukuba, Ibaraki, 305-0047, Japan

Publication information

Neural Comput. 2020 Oct;32(10):1998-2031. doi: 10.1162/neco_a_01310. Epub 2020 Aug 14.

DOI:10.1162/neco_a_01310

PMID:32795233

Abstract

In this letter, we study an active learning problem for maximizing an unknown linear function with high-dimensional binary features. This problem is notoriously complex but arises in many important contexts. When the sampling budget, that is, the number of possible function evaluations, is smaller than the number of dimensions, it tends to be impossible to identify all of the optimal binary features. Therefore, in practice, only a small number of such features are considered, with the majority kept fixed at certain default values, which we call the working set heuristic. The main contribution of this letter is to formally study the working set heuristic and present a suite of theoretically robust algorithms for more efficient use of the sampling budget. Technically, we introduce a novel method for estimating the confidence regions of model parameters that is tailored to active learning with high-dimensional binary features. We provide a rigorous theoretical analysis of these algorithms and prove that a commonly used working set heuristic can identify optimal binary features with favorable sample complexity. We explore the performance of the proposed approach through numerical simulations and an application to a functional protein design problem.
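
As a rough illustration of the setting the abstract describes, the sketch below fits a Bayesian linear model on a small working set of binary features while the remaining features stay at a default value of 0, then uses per-coordinate credible intervals to decide which working-set features to switch on. It is not the algorithm from the letter: the Gaussian prior, the known noise variance, the random (non-adaptive) sampling, the interval-based decision rule, and all names such as `posterior`, `w_true`, and `x_best` are assumptions made only for this example.

```python
import numpy as np

def posterior(X, y, noise_var=0.25, prior_var=1.0):
    """Gaussian posterior N(mean, cov) over weights w in y = X @ w + noise.

    Assumes a zero-mean isotropic Gaussian prior and a known noise variance.
    """
    d = X.shape[1]
    precision = X.T @ X / noise_var + np.eye(d) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ X.T @ y / noise_var
    return mean, cov

rng = np.random.default_rng(0)
d, k, budget = 50, 5, 40       # dimensions, working-set size, sampling budget
w_true = rng.normal(size=d)    # hypothetical ground-truth weights

# Working set heuristic: only the first k features vary; the rest stay at 0.
X = np.zeros((budget, d))
X[:, :k] = rng.integers(0, 2, size=(budget, k))
y = X @ w_true + rng.normal(scale=0.5, size=budget)

# Fit only the working-set coordinates; the others never change in the data.
mean, cov = posterior(X[:, :k], y)
half_width = 1.96 * np.sqrt(np.diag(cov))  # per-coordinate 95% credible interval
lower, upper = mean - half_width, mean + half_width

# To maximize the linear function, switch a feature on only if its weight is
# confidently positive; features whose interval straddles 0 remain undecided.
x_best = np.zeros(d, dtype=int)
x_best[:k] = (lower > 0).astype(int)
undecided = np.flatnonzero((lower <= 0) & (upper >= 0))
```

Per-coordinate intervals are a simplification of a joint confidence region, and an active learner would pick each new query adaptively rather than sampling the working set at random as done here.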

Similar articles

Applications of Monte Carlo Simulation in Modelling of Biochemical Processes.

Effects of additional data on Bayesian clustering.

Neural Netw. 2017 Oct;94:86-95. doi: 10.1016/j.neunet.2017.06.015. Epub 2017 Jul 12.

Efficient computation of confidence intervals for Bayesian model predictions based on multidimensional parameter space.

Methods Enzymol. 2009;454:213-31. doi: 10.1016/S0076-6879(08)03808-1.

Application of supervised machine learning as a method for identifying DEM contact law parameters.

Math Biosci Eng. 2021 Sep 1;18(6):7490-7505. doi: 10.3934/mbe.2021370.

Accuracy of latent-variable estimation in Bayesian semi-supervised learning.

Neural Netw. 2015 Sep;69:1-10. doi: 10.1016/j.neunet.2015.04.012. Epub 2015 May 9.

Voxel-based supervised machine learning of peripheral zone prostate cancer using noncontrast multiparametric MRI.

J Appl Clin Med Phys. 2020 Oct;21(10):179-191. doi: 10.1002/acm2.12992. Epub 2020 Aug 8.

Generalized Fiducial Inference for Binary Logistic Item Response Models.

Psychometrika. 2016 Jun;81(2):290-324. doi: 10.1007/s11336-015-9492-7. Epub 2016 Jan 14.

Extending approximate Bayesian computation with supervised machine learning to infer demographic history from genetic polymorphisms using DIYABC Random Forest.

Mol Ecol Resour. 2021 Nov;21(8):2598-2613. doi: 10.1111/1755-0998.13413. Epub 2021 May 21.

An assessment of estimation methods for generalized linear mixed models with binary outcomes.

Stat Med. 2013 Nov 20;32(26):4550-66. doi: 10.1002/sim.5866. Epub 2013 Jul 9.
