Scale-invariant Optimal Sampling for Rare-events Data and Sparse Models.

Authors

Wang Jing, Wang HaiYing, Zhang Hao Helen

Affiliations

Department of Statistics, University of Connecticut, Storrs, CT 06269.

Department of Mathematics, University of Arizona.

Publication information

Adv Neural Inf Process Syst. 2024;37:98384-98418.

PMID: 40641564
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12245189/
Abstract

Subsampling is effective in tackling computational challenges for massive data with rare events. Overly aggressive subsampling may adversely affect estimation efficiency, and optimal subsampling is essential to mitigate the information loss. However, existing optimal subsampling probabilities depend on the scale of the data, and some scaling transformations may result in inefficient subsamples. This problem is more significant when there are inactive features, because their influence on the subsampling probabilities can be arbitrarily magnified by inappropriate scaling transformations. We tackle this challenge and introduce a scale-invariant optimal subsampling function in the context of sparse models, where inactive features are commonly assumed. Instead of focusing on estimating model parameters, we define an optimal subsampling function to minimize the prediction error, using adaptive lasso as an example to outline the estimation procedure and study its theoretical guarantees. We first introduce the adaptive lasso estimator for rare-events data and establish its oracle properties, thereby validating the use of subsampling. Then we derive a scale-invariant optimal subsampling function that minimizes the prediction error of the inverse probability weighted (IPW) adaptive lasso. Finally, we present an estimator based on the maximum sampled conditional likelihood (MSCL) to further improve the estimation efficiency. We conduct numerical experiments using both simulated and real-world data sets to demonstrate the performance of the proposed methods.
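
The scale-dependence issue described in the abstract can be illustrated with a small sketch. It is not the authors' subsampling function: both rules below (`norm_based_probs` and `predictor_based_probs`), the coefficient vector `beta`, the rescaling factors, and the target rate `rho` are hypothetical choices made only for this illustration. The sketch assumes a logistic model, rescales the features (including an inactive one), and compares how the two rules react. It then draws one subsample and forms inverse probability weights.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 3
X = rng.normal(size=(n, d))
beta = np.array([1.0, -0.5, 0.0])  # assumed model; third feature is inactive

def norm_based_probs(X, rho):
    """Hypothetical scale-dependent rule: probability proportional to ||x||."""
    norms = np.linalg.norm(X, axis=1)
    return np.minimum(1.0, rho * norms / norms.mean())

def predictor_based_probs(X, beta, rho):
    """Hypothetical scale-invariant rule: depends on x only through x @ beta."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    score = np.sqrt(p * (1.0 - p))
    return np.minimum(1.0, rho * score / score.mean())

# Change of units: each feature is multiplied by a constant, and the
# coefficients are divided by the same constant, so the model is unchanged.
scale = np.array([2.0, 0.1, 1000.0])
X_scaled = X * scale
beta_scaled = beta / scale

rho = 0.1
pi_norm = norm_based_probs(X, rho)
pi_norm_scaled = norm_based_probs(X_scaled, rho)
pi_pred = predictor_based_probs(X, beta, rho)
pi_pred_scaled = predictor_based_probs(X_scaled, beta_scaled, rho)

# Largest change in any sampling probability caused by the rescaling.
norm_rule_shift = np.max(np.abs(pi_norm - pi_norm_scaled))
pred_rule_shift = np.max(np.abs(pi_pred - pi_pred_scaled))

# After rescaling, the norm-based rule is driven almost entirely by the
# inactive feature, which carries no information about the response.
corr_inactive = np.corrcoef(pi_norm_scaled, np.abs(X[:, 2]))[0, 1]

# Inverse probability weighting: keep row i with probability pi_i and
# weight it by 1 / pi_i, so weighted sums estimate full-data sums.
keep = rng.random(n) < pi_pred
ipw_weights = 1.0 / pi_pred[keep]

print(f"norm-based rule, max change in probability:      {norm_rule_shift:.3f}")
print(f"predictor-based rule, max change in probability: {pred_rule_shift:.2e}")
print(f"correlation with |inactive feature| after rescaling: {corr_inactive:.3f}")
print(f"sum of IPW weights / n: {ipw_weights.sum() / n:.3f}")
```

In this toy setting the norm-based probabilities change under a mere change of units and end up tracking the inactive feature, while the rule that uses only the linear predictor is unaffected. In practice `beta` is unknown and would have to be replaced by a pilot estimate, which this sketch does not cover.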
