Markov Subsampling Based on Huber Criterion.

Authors

Gong Tieliang, Dong Yuxin, Chen Hong, Dong Bo, Li Chen

Publication information

IEEE Trans Neural Netw Learn Syst. 2024 Feb;35(2):2250-2262. doi: 10.1109/TNNLS.2022.3189069. Epub 2024 Feb 5.

DOI: 10.1109/TNNLS.2022.3189069
PMID: 35834451

Abstract

Subsampling is an important technique for tackling the computational challenges posed by big data. Many subsampling procedures fall within the framework of importance sampling, which assigns high sampling probabilities to the samples that appear to have a large impact. When the noise level is high, those sampling procedures tend to pick many outliers and thus often do not perform satisfactorily in practice. To tackle this issue, we design a new Markov subsampling strategy based on the Huber criterion (HMS) to construct an informative subset from the noisy full data; the constructed subset then serves as refined working data for efficient processing. HMS is built upon a Metropolis-Hastings procedure, where the inclusion probability of each sampling unit is determined using the Huber criterion to prevent overscoring the outliers. Under mild conditions, we show that the estimator based on the subsamples selected by HMS is statistically consistent with a sub-Gaussian deviation bound. The promising performance of HMS is demonstrated by extensive studies on large-scale simulations and real data examples.
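
The abstract does not spell out the acceptance rule, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a simple linear model, a pilot least-squares fit on a small uniform sample, a uniform proposal, and an acceptance probability of min(1, exp(-(loss of candidate - loss of current))) with the Huber loss of the pilot residual. The function names, the `pilot_size` parameter, and the default `delta=1.345` are all illustrative choices.

```python
import math
import random


def huber_loss(residual, delta=1.345):
    """Huber loss: quadratic for |residual| <= delta, linear beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x on (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def huber_markov_subsample(points, size, pilot_size=200, delta=1.345, seed=0):
    """Collect `size` points via a Metropolis-Hastings-style accept/reject walk.

    Assumed rule (not taken from the paper): a candidate with a smaller Huber
    loss than the current point is always accepted; otherwise it is accepted
    with probability exp(-(loss_candidate - loss_current)).
    """
    rng = random.Random(seed)
    # Pilot fit on a small uniform sample supplies residuals for scoring.
    pilot = rng.sample(points, min(pilot_size, len(points)))
    intercept, slope = fit_line(pilot)

    def score(point):
        x, y = point
        return huber_loss(y - (intercept + slope * x), delta)

    current = rng.choice(points)
    chosen = [current]
    while len(chosen) < size:
        candidate = rng.choice(points)  # symmetric uniform proposal
        gap = score(candidate) - score(current)
        # Branch on the sign first so math.exp never sees a large positive value.
        accept_prob = 1.0 if gap <= 0 else math.exp(-gap)
        if rng.random() < accept_prob:
            current = candidate
            chosen.append(current)
    return chosen
```

Because the Huber loss grows only linearly for large residuals, gross outliers are still penalized and rarely accepted, while moderately noisy points are not scored as harshly as they would be under a squared loss. A downstream estimator would then be fitted on the returned subset instead of the full data.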
