Regularization in finite mixture of regression models with diverging number of parameters.

Authors

Khalili Abbas, Lin Shili

Affiliation

Department of Mathematics and Statistics, McGill University, Montreal, Quebec, Canada H3A 2K6.

Publication

Biometrics. 2013 Jun;69(2):436-46. doi: 10.1111/biom.12020. Epub 2013 Apr 4.

DOI:10.1111/biom.12020
PMID:23556535
Abstract

Feature (variable) selection has become a fundamentally important problem in recent statistical literature. In applications, many variables are sometimes introduced to reduce possible modeling bias, but the number of variables a model can accommodate is often limited by the amount of data available. In other words, the number of variables considered depends on the sample size, which reflects the estimability of the parametric model. In this article, we consider the problem of feature selection in finite mixture of regression models when the number of parameters in the model can increase with the sample size. We propose a penalized likelihood approach for feature selection in these models. Under certain regularity conditions, our approach leads to consistent variable selection. We carry out extensive simulation studies to evaluate the performance of the proposed approach under controlled settings. We also apply the proposed method to two real datasets. The first concerns telemonitoring of Parkinson's disease (PD), where the question is whether dysphonic features extracted from patients' speech signals recorded at home can be used as surrogates to study PD severity and progression. The second concerns breast cancer prognosis, where the aim is to assess whether cell nuclear features may offer prognostic value for the long-term survival of breast cancer patients. In each application, our analysis revealed a mixture structure in the study population and uncovered a unique relationship between the features and the response variable in each mixture component.
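The abstract describes a penalized likelihood approach but does not spell out the penalty or the fitting algorithm. As a rough illustration only, the sketch below fits a K-component mixture of linear regressions by EM, where each M-step solves a lasso-penalized weighted least-squares problem, (1/2) Σ_i w_ik (y_i − x_iᵀβ_k)² + nλ Σ_{j≥1} |β_kj|, by cyclic coordinate descent. The plain L1 penalty, the threshold scaling nλ, the hypothetical helpers `penalized_fmr_em` and `soft_threshold`, and the user-supplied starting responsibilities are all assumptions made for this example, not the authors' method; the paper's penalty and algorithm may differ, and a toy fit cannot show the diverging-dimension behaviour the paper studies.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def soft_threshold(z: float, t: float) -> float:
    """Lasso soft-thresholding S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def penalized_fmr_em(X: Matrix, y: List[float], resp: Matrix, lam: float,
                     n_iter: int = 50, n_sweeps: int = 10
                     ) -> Tuple[List[float], Matrix, List[float]]:
    """Hypothetical sketch: EM for a K-component mixture of linear regressions
    with an L1 (lasso) penalty on every slope (assumed penalty, not the paper's).

    X    : n rows; column 0 must be 1.0 (intercept, left unpenalized).
    y    : n responses.
    resp : initial n x K responsibilities (rows sum to 1). EM is a local
           method, so a sensible starting partition matters.
    lam  : penalty level; the M-step soft-threshold is n * lam.
    Returns (mixing proportions, coefficient vectors, error variances).
    """
    n, p, K = len(y), len(X[0]), len(resp[0])
    resp = [list(row) for row in resp]          # don't mutate the caller's list
    betas = [[0.0] * p for _ in range(K)]
    sigma2 = [1.0] * K
    pi = [1.0 / K] * K
    for _ in range(n_iter):
        # M-step: per component, a lasso-penalized weighted least-squares fit.
        for k in range(K):
            w = [resp[i][k] for i in range(n)]
            wsum = max(sum(w), 1e-12)
            pi[k] = wsum / n
            beta = betas[k]                      # warm start from last iteration
            r = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
            a = [sum(w[i] * X[i][j] ** 2 for i in range(n)) for j in range(p)]
            for _ in range(n_sweeps):            # cyclic coordinate descent
                for j in range(p):
                    if a[j] == 0.0:
                        continue
                    z = sum(w[i] * X[i][j] * r[i] for i in range(n)) + a[j] * beta[j]
                    new = z / a[j] if j == 0 else soft_threshold(z, n * lam) / a[j]
                    delta = new - beta[j]
                    if delta != 0.0:
                        for i in range(n):
                            r[i] -= X[i][j] * delta
                        beta[j] = new
            # Weighted residual variance, floored to avoid a degenerate component.
            sigma2[k] = max(sum(w[i] * r[i] ** 2 for i in range(n)) / wsum, 1e-8)
        # E-step: posterior membership probabilities, via log-sum-exp.
        for i in range(n):
            logs = []
            for k in range(K):
                mu = sum(X[i][j] * betas[k][j] for j in range(p))
                logs.append(math.log(pi[k]) - 0.5 * math.log(2 * math.pi * sigma2[k])
                            - (y[i] - mu) ** 2 / (2 * sigma2[k]))
            m = max(logs)
            e = [math.exp(v - m) for v in logs]
            s = sum(e)
            resp[i] = [v / s for v in e]
    return pi, betas, sigma2
```

For instance, with two components whose slopes on the first covariate have opposite signs and a few pure-noise covariates, a moderate `lam` tends to set the noise coefficients exactly to zero in both components while keeping the true slopes (slightly shrunk toward zero). The soft-thresholding step is what produces exact zeros, which is why L1-type penalties perform selection rather than mere shrinkage.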


Similar articles

1
Regularization in finite mixture of regression models with diverging number of parameters.
Biometrics. 2013 Jun;69(2):436-46. doi: 10.1111/biom.12020. Epub 2013 Apr 4.
2
Feature selection in finite mixture of sparse normal linear models in high-dimensional feature space.
Biostatistics. 2011 Jan;12(1):156-72. doi: 10.1093/biostatistics/kxq048. Epub 2010 Aug 16.
3
Variable selection for clustering with Gaussian mixture models.
Biometrics. 2009 Sep;65(3):701-9. doi: 10.1111/j.1541-0420.2008.01160.x. Epub 2009 Feb 4.
4
Hypothesis testing in a mixture case-control model.
Biometrics. 2011 Mar;67(1):182-93. doi: 10.1111/j.1541-0420.2010.01409.x.
5
Likelihood methods for regression models with expensive variables missing by design.
Biom J. 2009 Feb;51(1):123-36. doi: 10.1002/bimj.200810487.
6
Sliced inverse regression with regularizations.
Biometrics. 2008 Mar;64(1):124-31. doi: 10.1111/j.1541-0420.2007.00836.x. Epub 2007 Jul 25.
7
Analysis of matched case-control data in presence of nonignorable missing exposure.
Biometrics. 2008 Mar;64(1):106-14. doi: 10.1111/j.1541-0420.2007.00828.x. Epub 2007 Jun 15.
8
Regression analysis of panel count data with dependent observation times.
Biometrics. 2007 Dec;63(4):1053-9. doi: 10.1111/j.1541-0420.2007.00808.x.
9
Penalized generalized estimating equations for high-dimensional longitudinal data analysis.
Biometrics. 2012 Jun;68(2):353-60. doi: 10.1111/j.1541-0420.2011.01678.x. Epub 2011 Sep 28.
10
Variable selection for marginal longitudinal generalized linear models.
Biometrics. 2005 Jun;61(2):507-14. doi: 10.1111/j.1541-0420.2005.00331.x.

Cited by

1
Heterogeneity-aware integrative regression for ancestry-specific association studies.
Biometrics. 2024 Oct 3;80(4). doi: 10.1093/biomtc/ujae109.
2
Mixture of regressions with multivariate responses for discovering subtypes in Alzheimer's biomarkers with detection limits.
Data Sci Sci. 2024;3(1). doi: 10.1080/26941899.2024.2309403. Epub 2024 Mar 6.
3
Estimation of multiple networks with common structures in heterogeneous subgroups.
J Multivar Anal. 2024 Jul;202. doi: 10.1016/j.jmva.2024.105298. Epub 2024 Feb 13.
4
Heterogeneity analysis via integrating multi-sources high-dimensional data with applications to cancer studies.
Stat Sin. 2023 Apr;33(2):729-758. doi: 10.5705/ss.202021.0002.
5
Structured analysis of the high-dimensional FMR model.
Comput Stat Data Anal. 2020 Apr;144. doi: 10.1016/j.csda.2019.106883. Epub 2019 Nov 13.