Rossi Riccardo, Murari Andrea, Gaudio Pasquale, Gelfusa Michela
Department of Industrial Engineering, University of Rome "Tor Vergata", via del Politecnico 1, 00133 Roma, Italy.
Consorzio RFX (CNR, ENEA, INFN, Università di Padova, Acciaierie Venete SpA), Corso Stati Uniti 4, 35127 Padova, Italy.
Entropy (Basel). 2020 Apr 15;22(4):447. doi: 10.3390/e22040447.
The Bayesian information criterion (BIC), the Akaike information criterion (AIC), and some other indicators derived from them are widely used for model selection. In their original form, they contain the likelihood of the data given the models. Unfortunately, in many applications it is practically impossible to calculate the likelihood, and the criteria have therefore been reformulated in terms of descriptive statistics of the residual distribution: the variance and the mean-squared error of the residuals. These alternative versions are strictly valid only in the presence of additive Gaussian noise, an assumption that is not completely satisfactory in many applications in science and engineering. Moreover, the variance and the mean-squared error are quite crude statistics of the residual distribution. More sophisticated statistical indicators, capable of better quantifying how close the residual distribution is to the noise distribution, can be used profitably. In particular, specific goodness-of-fit tests have been included in the expressions of the traditional criteria and have proved very effective in improving their discriminating capability. This improved performance has been demonstrated with a systematic series of simulations using synthetic data for various classes of functions and different noise statistics.
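For readers unfamiliar with the residual-based reformulation, a common textbook form (valid up to a constant shared by all models fitted to the same n points, and assuming additive i.i.d. Gaussian noise) is AIC = n ln(MSE) + 2k and BIC = n ln(MSE) + k ln(n), where k is the number of fitted parameters. The minimal sketch below computes these, and separately a Kolmogorov-Smirnov distance between the residuals and an assumed zero-mean Gaussian noise model, as one example of a goodness-of-fit statistic that looks at the whole residual distribution rather than only its second moment. It does not reproduce how the paper combines such tests with the criteria; the helper names `residual_ic` and `ks_distance_gaussian`, the noise level `sigma`, and the toy data are hypothetical illustrations.

```python
import math


def residual_ic(y, y_hat, k):
    """Residual-based AIC and BIC, assuming additive i.i.d. Gaussian noise.

    Model-independent constants are dropped:
        AIC = n*ln(MSE) + 2k
        BIC = n*ln(MSE) + k*ln(n)
    """
    n = len(y)
    if n == 0 or n != len(y_hat):
        raise ValueError("y and y_hat must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(y, y_hat)) / n
    aic = n * math.log(mse) + 2 * k
    bic = n * math.log(mse) + k * math.log(n)
    return aic, bic


def ks_distance_gaussian(residuals, sigma):
    """Kolmogorov-Smirnov statistic between the empirical CDF of the residuals
    and the CDF of N(0, sigma^2); smaller means closer to the assumed noise."""
    r = sorted(residuals)
    n = len(r)
    d = 0.0
    for i, x in enumerate(r):
        cdf = 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))
        # Compare the model CDF with the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


# Hypothetical toy data: a straight line plus a fixed small "noise" sequence.
xs = list(range(10))
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, -0.05, 0.2, -0.15, 0.0]
ys = [2 * x + 1 + e for x, e in zip(xs, noise)]

line_fit = [2 * x + 1 for x in xs]   # candidate 1: linear model, k = 2
mean_y = sum(ys) / len(ys)
const_fit = [mean_y] * len(ys)       # candidate 2: constant model, k = 1

aic_line, bic_line = residual_ic(ys, line_fit, k=2)
aic_const, bic_const = residual_ic(ys, const_fit, k=1)

# sigma = 0.15 is an assumed noise level for illustration only.
ks_line = ks_distance_gaussian([a - b for a, b in zip(ys, line_fit)], sigma=0.15)
```

Lower AIC/BIC values favour a model; here both criteria prefer the linear candidate. Note that the two criteria see the residuals only through the MSE, whereas the KS distance depends on the shape of the residual distribution.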