On the Use of Minimum Penalties in Statistical Learning.

Author information

Sherwood Ben, Price Bradley S

Affiliations

School of Business, University of Kansas.

Management Information Systems Department, West Virginia University.

Publication information

J Comput Graph Stat. 2024;33(1):138-151. doi: 10.1080/10618600.2023.2210174. Epub 2023 Jun 20.

DOI:10.1080/10618600.2023.2210174

PMID:38706715

Original article link: https://pmc.ncbi.nlm.nih.gov/articles/PMC11065433/

Abstract

Modern multivariate machine learning and statistical methodologies estimate parameters of interest while leveraging prior knowledge of the association between outcome variables. The methods that do allow these relationships to be estimated typically do so through an error covariance matrix in multivariate regression, an approach that does not generalize to other types of models. In this article we propose the MinPen framework to simultaneously estimate the regression coefficients of the multivariate regression model and the relationships between outcome variables using common assumptions. The MinPen framework uses a novel penalty based on the minimum function to simultaneously detect and exploit relationships between responses. An iterative algorithm is proposed to solve the resulting non-convex optimization problem. Theoretical results, including high-dimensional convergence rates, model selection consistency, and a framework for post-selection inference, are provided. We extend the proposed MinPen framework to other exponential family loss functions, with a specific focus on multiple binomial responses. Tuning parameter selection is also addressed. Finally, simulations and two data examples are presented to show the finite-sample properties of this framework. Supplemental material providing proofs, additional simulations, code, and data sets is available online.
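The abstract does not give the form of the penalty, so the following is only a minimal sketch of how a penalty "based on the minimum function" could detect related responses. It assumes, purely for illustration, that each pair of response coefficient vectors is charged the smaller of two squared distances: the distance between the vectors, and the distance between one vector and the negation of the other. The function name `min_penalty` and this exact form are assumptions and are not taken from the article.

```python
from itertools import combinations


def min_penalty(coefs):
    """Hypothetical minimum-based penalty (illustrative assumption only).

    coefs: list of coefficient vectors, one per response, all of equal length.
    Each pair contributes the smaller of two squared Euclidean distances.
    """
    total = 0.0
    for b_j, b_k in combinations(coefs, 2):
        # Cost if the two responses share coefficients of the same sign.
        same = sum((x - y) ** 2 for x, y in zip(b_j, b_k))
        # Cost if the two responses share coefficients of opposite sign.
        opposite = sum((x + y) ** 2 for x, y in zip(b_j, b_k))
        # The minimum picks whichever relationship fits the pair better.
        total += min(same, opposite)
    return total
```

Under this assumed form, two responses whose coefficient vectors are equal, or exact negatives of each other, contribute nothing to the penalty, which is one way a minimum can both detect and exploit a relationship. A minimum of two convex terms is generally not convex, which is consistent with the abstract's mention of a non-convex optimization problem.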
