Controlling false discoveries in high-dimensional situations: boosting with stability selection.

Authors

Hofner Benjamin, Boccuto Luigi, Göker Markus

Affiliations

Department of Medical Informatics, Biometry and Epidemiology, Friedrich-Alexander-University Erlangen-Nuremberg, Waldstraße 6, Erlangen, 91054, Germany.

Greenwood Genetic Center, 113 Gregor Mendel Circle, Greenwood, 29646, SC, USA.

Publication details

BMC Bioinformatics. 2015 May 6;16:144. doi: 10.1186/s12859-015-0575-3.

DOI:10.1186/s12859-015-0575-3
PMID:25943565
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4464883/
Abstract

BACKGROUND

Modern biotechnologies often result in high-dimensional data sets with many more variables than observations (n≪p). These data sets pose new challenges to statistical analysis: Variable selection becomes one of the most important tasks in this setting. Similar challenges arise in modern data sets from observational studies, e.g., in ecology, where flexible, non-linear models are fitted to high-dimensional data. We assess the recently proposed flexible framework for variable selection called stability selection. By the use of resampling procedures, stability selection adds a finite sample error control to high-dimensional variable selection procedures such as Lasso or boosting. We consider the combination of boosting and stability selection and present results from a detailed simulation study that provide insights into the usefulness of this combination. The interpretation of the error bounds used is elaborated and insights for practical data analysis are given.
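The resampling idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use the stabs package: the base selector here is a simple marginal-correlation screen (`top_q_selector`, a hypothetical stand-in for boosting or the Lasso), and the function names, the simulated data, and the parameter values (`q`, `cutoff`, `B`) are assumptions chosen for illustration only. The sketch repeatedly draws subsamples of size n/2, records which variables the base selector picks, and keeps the variables whose selection frequency reaches the cutoff.

```python
import random
from statistics import mean


def abs_corr(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def top_q_selector(X, y, q):
    """Hypothetical stand-in base selector (NOT boosting): keep the q
    columns of X with the largest absolute correlation with y."""
    p = len(X[0])
    scores = [abs_corr([row[j] for row in X], y) for j in range(p)]
    return sorted(range(p), key=scores.__getitem__, reverse=True)[:q]


def stability_selection(X, y, q, cutoff=0.75, B=100, seed=0):
    """Run the base selector on B random subsamples of size n // 2 and
    keep the variables whose selection frequency reaches the cutoff."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(B):
        idx = rng.sample(range(n), n // 2)  # subsample without replacement
        selected = top_q_selector([X[i] for i in idx], [y[i] for i in idx], q)
        for j in selected:
            counts[j] += 1
    freq = [c / B for c in counts]
    stable = [j for j in range(p) if freq[j] >= cutoff]
    return stable, freq


# Simulated n < p data (an assumption for illustration):
# only columns 0 and 1 truly influence y.
data_rng = random.Random(1)
n, p = 60, 100
X = [[data_rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * row[0] - 3 * row[1] + data_rng.gauss(0, 1) for row in X]

stable, freq = stability_selection(X, y, q=5)
print("stable set:", stable)
print("selection frequencies of columns 0 and 1:", freq[0], freq[1])
```

Swapping `top_q_selector` for any other selector that returns q variable indices leaves the resampling loop unchanged, which is the sense in which stability selection acts as a wrapper around an existing selection procedure.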

RESULTS

Stability selection with boosting was able to detect influential predictors in high-dimensional settings while controlling the given error bound in various simulation scenarios. The dependence on various parameters such as the sample size, the number of truly influential variables or tuning parameters of the algorithm was investigated. The results were applied to investigate phenotype measurements in patients with autism spectrum disorders using a log-linear interaction model which was fitted by boosting. Stability selection identified five differentially expressed amino acid pathways.

CONCLUSION

Stability selection is implemented in the freely available R package stabs (http://CRAN.R-project.org/package=stabs). It proved to work well in high-dimensional settings with more predictors than observations for both linear and additive models. The original version of stability selection, which controls the per-family error rate, is quite conservative, though this is much less the case for its improvement, complementary pairs stability selection. Nevertheless, care should be taken to appropriately specify the error bound.
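To make the notion of specifying the error bound concrete, the following Python sketch evaluates the per-family error rate (PFER) bound of the original stability selection. The abstract does not state the formula; the sketch assumes the bound from the original stability selection proposal by Meinshausen and Bühlmann, E(V) ≤ q² / ((2π − 1) p), where q is the number of variables selected per subsample, p the number of candidate variables, and π the selection-frequency cutoff in (0.5, 1]. That bound holds only under the assumptions of that proposal. The function names and example numbers are hypothetical, and the tighter complementary pairs bounds are not implemented here.

```python
import math


def pfer_bound(q, p, cutoff):
    """Upper bound on the expected number of false selections, E(V),
    assuming the original bound q^2 / ((2 * cutoff - 1) * p)."""
    if not 0.5 < cutoff <= 1:
        raise ValueError("cutoff must lie in (0.5, 1]")
    return q ** 2 / ((2 * cutoff - 1) * p)


def max_q_for_pfer(pfer, p, cutoff):
    """Largest q per subsample that keeps the bound at or below pfer."""
    if not 0.5 < cutoff <= 1:
        raise ValueError("cutoff must lie in (0.5, 1]")
    q = math.floor(math.sqrt(pfer * (2 * cutoff - 1) * p))
    # Guard against floating-point rounding at the boundary.
    while q > 0 and pfer_bound(q, p, cutoff) > pfer:
        q -= 1
    return q


# Example values (assumptions for illustration only).
bound = pfer_bound(q=10, p=1000, cutoff=0.75)
q_max = max_q_for_pfer(pfer=1, p=1000, cutoff=0.75)
print("PFER bound for q=10, p=1000, cutoff=0.75:", bound)
print("largest q with PFER <= 1:", q_max)
```

Two of the three quantities (q, cutoff, PFER) determine the third, so in practice an analyst fixes two and derives the remaining one.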

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8852/4464883/4e18aa310939/12859_2015_575_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8852/4464883/747217d8507f/12859_2015_575_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8852/4464883/34ec30341bdf/12859_2015_575_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8852/4464883/ac2fc26382f8/12859_2015_575_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8852/4464883/25cafe422761/12859_2015_575_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8852/4464883/6d6ca7b94b72/12859_2015_575_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8852/4464883/38775b82662c/12859_2015_575_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8852/4464883/c2df664a2dfb/12859_2015_575_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8852/4464883/18552e04adc7/12859_2015_575_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8852/4464883/dc8e667bb6c3/12859_2015_575_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8852/4464883/f6e68b9569ab/12859_2015_575_Fig11_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8852/4464883/e81b30ae99ea/12859_2015_575_Fig12_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8852/4464883/c5ba961296b5/12859_2015_575_Fig13_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8852/4464883/23535c129acd/12859_2015_575_Fig14_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8852/4464883/3ac47e5f062f/12859_2015_575_Fig15_HTML.jpg

Similar articles

1
Controlling false discoveries in high-dimensional situations: boosting with stability selection.
BMC Bioinformatics. 2015 May 6;16:144. doi: 10.1186/s12859-015-0575-3.
2
Boosting the discriminatory power of sparse survival models via optimization of the concordance index and stability selection.
BMC Bioinformatics. 2016 Jul 22;17:288. doi: 10.1186/s12859-016-1149-8.
3
Comparison of Variable Selection Methods for Time-to-Event Data in High-Dimensional Settings.
Comput Math Methods Med. 2020 Jul 1;2020:6795392. doi: 10.1155/2020/6795392. eCollection 2020.
4
Component-wise gradient boosting and false discovery control in survival analysis with high-dimensional covariates.
Bioinformatics. 2016 Jan 1;32(1):50-7. doi: 10.1093/bioinformatics/btv517. Epub 2015 Sep 17.
5
False discovery control for penalized variable selections with high-dimensional covariates.
Stat Appl Genet Mol Biol. 2018 Dec 15;17(6):/j/sagmb.2018.17.issue-6/sagmb-2018-0038/sagmb-2018-0038.xml. doi: 10.1515/sagmb-2018-0038.
6
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
7
Boosting distributional copula regression.
Biometrics. 2023 Sep;79(3):2298-2310. doi: 10.1111/biom.13765. Epub 2022 Oct 11.
8
Ensembling Variable Selectors by Stability Selection for the Cox Model.
Comput Intell Neurosci. 2017;2017:2747431. doi: 10.1155/2017/2747431. Epub 2017 Nov 15.
9
Comparison of variable selection methods for high-dimensional survival data with competing events.
Comput Biol Med. 2017 Dec 1;91:159-167. doi: 10.1016/j.compbiomed.2017.10.021. Epub 2017 Oct 20.
10
Estimating causal effects of time-dependent exposures on a binary endpoint in a high-dimensional setting.
BMC Med Res Methodol. 2018 Jul 3;18(1):67. doi: 10.1186/s12874-018-0527-5.

Cited by

1
Comparative Ungulate Diversity and Biomass Change With Human Use and Drought: Implications for Community Stability and Protected Area Prioritization in African Savannas.
Ecol Evol. 2025 Aug 28;15(9):e71946. doi: 10.1002/ece3.71946. eCollection 2025 Sep.
2
A novel and robust feature selection method with FDR control for omics-wide association analysis.
PLoS One. 2025 Aug 22;20(8):e0300490. doi: 10.1371/journal.pone.0300490. eCollection 2025.
3
A longitudinal cohort study uncovers plasma protein biomarkers predating clinical onset and treatment response of rheumatoid arthritis.
Nat Commun. 2025 Jul 21;16(1):6692. doi: 10.1038/s41467-025-62032-1.
4
The language of paranoia: linguistic analysis of SMI speech with considerations of race and sex.
J Ment Health. 2025 Jun 12:1-8. doi: 10.1080/09638237.2025.2512313.
5
Nonparametric IPSS: fast, flexible feature selection with false discovery control.
Bioinformatics. 2025 May 6;41(5). doi: 10.1093/bioinformatics/btaf299.
6
Association of tumor microbiome with survival in resected early-stage PDAC.
mSystems. 2025 Mar 18;10(3):e0122924. doi: 10.1128/msystems.01229-24. Epub 2025 Feb 27.
7
Modulatory Neurotransmitter Genotypes Shape Dynamic Functional Connectome Reconfigurations.
J Neurosci. 2025 Mar 5;45(10):e1939242025. doi: 10.1523/JNEUROSCI.1939-24.2025.
8
Stable multivariate lesion symptom mapping.
Apert Neuro. 2024;4. doi: 10.52294/001c.117311. Epub 2024 Jun 7.
9
Cluster effect for SNP-SNP interaction pairs for predicting complex traits.
Sci Rep. 2024 Aug 12;14(1):18677. doi: 10.1038/s41598-024-66311-7.
10
Genetic diversity, gene flow, and landscape resistance in a pond-breeding amphibian in agricultural and natural forested landscapes in Norway.
Evol Appl. 2023 Dec 20;17(1):e13633. doi: 10.1111/eva.13633. eCollection 2024 Jan.

References

1
A test for comparing two groups of samples when analyzing multiple omics profiles.
BMC Bioinformatics. 2014 Jul 8;15:236. doi: 10.1186/1471-2105-15-236.
2
Boosting the concordance index for survival data--a unified framework to derive and evaluate biomarker combinations.
PLoS One. 2014 Jan 6;9(1):e84483. doi: 10.1371/journal.pone.0084483. eCollection 2014.
3
opm: an R package for analysing OmniLog(R) phenotype microarray data.
Bioinformatics. 2013 Jul 15;29(14):1823-4. doi: 10.1093/bioinformatics/btt291. Epub 2013 Jun 5.
4
Decreased tryptophan metabolism in patients with autism spectrum disorders.
Mol Autism. 2013 Jun 3;4(1):16. doi: 10.1186/2040-2392-4-16.
5
Autism spectrum disorders.
Curr Probl Pediatr Adolesc Health Care. 2013 Jan;43(1):2-11. doi: 10.1016/j.cppeds.2012.08.001.
6
TIGRESS: Trustful Inference of Gene REgulation using Stability Selection.
BMC Syst Biol. 2012 Nov 22;6:145. doi: 10.1186/1752-0509-6-145.
7
A PAUC-based estimation technique for disease classification and biomarker selection.
Stat Appl Genet Mol Biol. 2012 Oct 1;11(5):/j/sagmb.2012.11.issue-5/1544-6115.1792/1544-6115.1792.xml. doi: 10.1515/1544-6115.1792.
8
Wisdom of crowds for robust gene network inference.
Nat Methods. 2012 Jul 15;9(8):796-804. doi: 10.1038/nmeth.2016.
9
Visualization and curve-parameter estimation strategies for efficient exploration of phenotype microarray kinetics.
PLoS One. 2012;7(4):e34846. doi: 10.1371/journal.pone.0034846. Epub 2012 Apr 20.
10
The importance of knowing when to stop. A sequential stopping rule for component-wise gradient boosting.
Methods Inf Med. 2012;51(2):178-86. doi: 10.3414/ME11-02-0030. Epub 2012 Feb 20.