A computationally efficient approach to false discovery rate control and power maximisation via randomisation and mirror statistic.

Authors

Molinari Marco, Thoresen Magne

Affiliation

Department of Biostatistics, University of Oslo, Oslo, Norway.

Publication details

Stat Methods Med Res. 2025 Jun;34(6):1233-1253. doi: 10.1177/09622802251329768. Epub 2025 Mar 31.

DOI: 10.1177/09622802251329768
PMID: 40165448
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12209545/
Abstract

Simultaneously performing variable selection and inference in high-dimensional regression models is an open challenge in statistics and machine learning. The increasing availability of very large numbers of variables calls for statistical procedures that accurately select the most important predictors in a high-dimensional space while controlling the false discovery rate (FDR) of the variable selection procedure. In this paper, we propose combining the Mirror Statistic approach to FDR control with outcome randomisation, in order to maximise the statistical power of the variable selection procedure, measured through the true positive rate. Through extensive simulations, we show how the proposed strategy combines the benefits of the two techniques. The Mirror Statistic is a flexible method for controlling the FDR that requires only mild model assumptions, but it needs two sets of independent regression coefficient estimates, usually obtained by splitting the original dataset. Outcome randomisation is an alternative to data splitting that makes it possible to generate two independent outcomes, which can then be used to estimate the coefficients that go into the construction of the Mirror Statistic. The combination of these two approaches provides increased testing power in a number of scenarios, such as highly correlated covariates and high percentages of active variables. Moreover, it scales to very high-dimensional problems, since the algorithm has a low memory footprint and requires only a single run on the full dataset, as opposed to iterative alternatives such as multiple data splitting.
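
The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: the outcome has Gaussian noise with a known standard deviation `sigma`; randomisation takes the form U = y + W and V = y - W with W drawn from N(0, sigma^2); the mirror statistic is the commonly used form sign(b1 * b2) * (|b1| + |b2|); and coefficients are estimated by ordinary least squares, which only makes sense when there are more observations than variables (a high-dimensional analysis would need a penalised estimator instead). All function names are hypothetical.

```python
import numpy as np


def randomise_outcome(y, sigma, rng):
    """Return U = y + W and V = y - W with W ~ N(0, sigma^2 I).

    Assumption: the noise in y is Gaussian with standard deviation sigma,
    in which case U and V are independent.
    """
    w = rng.normal(0.0, sigma, size=y.shape)
    return y + w, y - w


def mirror_statistic(beta1, beta2):
    """One common choice: M_j = sign(b1_j * b2_j) * (|b1_j| + |b2_j|).

    Null variables give M_j roughly symmetric around zero, while active
    variables tend to give large positive values.
    """
    return np.sign(beta1 * beta2) * (np.abs(beta1) + np.abs(beta2))


def fdr_threshold(m, q):
    """Smallest t >= 0 with #{M_j < -t} / max(#{M_j > t}, 1) <= q."""
    candidates = np.concatenate(([0.0], np.sort(np.abs(m))))
    for t in candidates:
        fdp_hat = np.sum(m < -t) / max(np.sum(m > t), 1)
        if fdp_hat <= q:
            return float(t)
    return float("inf")  # only reached if q is negative


def select_variables(X, y, sigma, q, rng):
    """Single pass on the full data: randomise, fit twice, threshold."""
    u, v = randomise_outcome(y, sigma, rng)
    # OLS is a simplification that assumes more rows than columns in X.
    beta1 = np.linalg.lstsq(X, u, rcond=None)[0]
    beta2 = np.linalg.lstsq(X, v, rcond=None)[0]
    m = mirror_statistic(beta1, beta2)
    tau = fdr_threshold(m, q)
    return np.flatnonzero(m > tau)
```

Calling `select_variables(X, y, sigma, q, np.random.default_rng(0))` returns the indices of the selected columns of `X`. Unlike data splitting, both coefficient estimates here use every observation, which is the source of the power gain the abstract refers to.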

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ccb4/12209545/0e52c91ea7f5/10.1177_09622802251329768-fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ccb4/12209545/360edebbdd44/10.1177_09622802251329768-fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ccb4/12209545/4c8918c074df/10.1177_09622802251329768-fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ccb4/12209545/7588d0c8bc29/10.1177_09622802251329768-fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ccb4/12209545/dbc6ba9e605f/10.1177_09622802251329768-fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ccb4/12209545/2d5d7e6e1e74/10.1177_09622802251329768-fig6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ccb4/12209545/a11723c9bbc5/10.1177_09622802251329768-fig7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ccb4/12209545/2c88c3cc5297/10.1177_09622802251329768-fig8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ccb4/12209545/a732e91a97fe/10.1177_09622802251329768-fig9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ccb4/12209545/609a9f801151/10.1177_09622802251329768-fig10.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ccb4/12209545/dc85f282266f/10.1177_09622802251329768-fig11.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ccb4/12209545/ffbd9724e002/10.1177_09622802251329768-fig12.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ccb4/12209545/c68265b23f5f/10.1177_09622802251329768-fig13.jpg
