O'Neill Meadhbh, Burke Kevin
Department of Mathematics and Statistics, University of Limerick, Limerick, Republic of Ireland.
Stat Comput. 2023;33(3):71. doi: 10.1007/s11222-023-10204-8. Epub 2023 Apr 21.
Modern variable selection procedures use penalization methods to perform model selection and estimation simultaneously. A popular method is the least absolute shrinkage and selection operator (LASSO), which requires choosing the value of a tuning parameter. This parameter is typically tuned by minimizing the cross-validation error or the Bayesian information criterion, but this can be computationally intensive because it involves fitting an array of different models and selecting the best one. In contrast with this standard approach, we have developed a procedure based on the so-called "smooth information criterion" (SIC), in which the tuning parameter is selected automatically in one step. We also extend this model selection procedure to the distributional regression framework, which is more flexible than classical regression modelling. Distributional regression, also known as multiparameter regression, introduces flexibility by accounting for the effect of covariates through multiple distributional parameters simultaneously, e.g., the mean and the variance. These models are useful in the context of normal linear regression when the process under study exhibits heteroscedastic behaviour. Reformulating the distributional regression estimation problem in terms of penalized likelihood enables us to take advantage of the close relationship between model selection criteria and penalization. Using the SIC is computationally advantageous, as it removes the need to choose multiple tuning parameters.
The online version contains supplementary material available at 10.1007/s11222-023-10204-8.
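The abstract gives no formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that the smooth criterion replaces the count of nonzero coefficients in the Bayesian information criterion with the smooth surrogate x^2 / (x^2 + eps^2), that the penalty weight is fixed at log(n) / 2, and that eps is reduced over a short sequence with warm starts. The function names, the eps sequence, and the simulated data are all hypothetical choices made for illustration. The model is a heteroscedastic normal regression in which both the mean and the log-variance depend linearly on the covariates.

```python
import numpy as np
from scipy.optimize import minimize


def smooth_l0(x, eps):
    """Smooth surrogate for the L0 indicator: 0 at x = 0, close to 1 when |x| >> eps."""
    return x**2 / (x**2 + eps**2)


def smooth_l0_grad(x, eps):
    """Derivative of smooth_l0 with respect to x."""
    return 2.0 * x * eps**2 / (x**2 + eps**2) ** 2


def objective(theta, X, y, eps):
    """Penalized negative log-likelihood (BIC / 2 with a smoothed count) and its gradient."""
    n, p = X.shape
    beta, alpha = theta[:p], theta[p:]      # mean and log-variance coefficients
    resid = y - X @ beta
    log_var = X @ alpha
    w = np.exp(-log_var)                    # 1 / variance
    nll = 0.5 * np.sum(np.log(2.0 * np.pi) + log_var + resid**2 * w)
    lam = 0.5 * np.log(n)                   # BIC weight, fixed in advance (assumption)
    mask = np.ones(p)
    mask[0] = 0.0                           # intercepts are counted but not shrunk
    penalty = lam * (mask @ smooth_l0(beta, eps) + mask @ smooth_l0(alpha, eps) + 2.0)
    grad_beta = -X.T @ (resid * w) + lam * mask * smooth_l0_grad(beta, eps)
    grad_alpha = 0.5 * (X.T @ (1.0 - resid**2 * w)) + lam * mask * smooth_l0_grad(alpha, eps)
    return nll + penalty, np.concatenate([grad_beta, grad_alpha])


def fit_sic(X, y, eps_path=(1.0, 0.3, 0.1, 0.03, 0.01)):
    """Minimize the smoothed criterion along a decreasing eps sequence (illustrative values)."""
    n, p = X.shape
    theta = np.zeros(2 * p)
    theta[0] = y.mean()                     # start from an intercept-only fit
    theta[p] = np.log(y.var())
    for eps in eps_path:                    # warm start each fit from the previous one
        theta = minimize(objective, theta, args=(X, y, eps), jac=True, method="BFGS").x
    return theta[:p], theta[p:]


def simulate(n=1000, seed=0):
    """Toy data: the mean depends on covariate 1 only, the variance on covariate 3 only."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(n), rng.standard_normal((n, 3))])
    mu = 1.0 + 2.0 * X[:, 1]
    log_var = -0.5 + 1.0 * X[:, 3]
    y = mu + np.exp(0.5 * log_var) * rng.standard_normal(n)
    return X, y


X, y = simulate()
beta_hat, alpha_hat = fit_sic(X, y)
print("mean coefficients:        ", np.round(beta_hat, 3))
print("log-variance coefficients:", np.round(alpha_hat, 3))
```

In this sketch no grid of tuning parameter values is fitted and compared: the penalty weight is set by the information criterion, and one warm-started sequence of optimizations handles both the mean and the variance coefficients. Because the surrogate is smooth, coefficients of irrelevant covariates are shrunk close to zero rather than set exactly to zero.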