Jäntschi Lorentz, Bálint Donatella, Bolboacă Sorana D
Department of Physics and Chemistry, Faculty of Materials and Environmental Engineering, Technical University of Cluj-Napoca, Muncii Boulevard No. 103-105, 400641 Cluj-Napoca, Romania; Doctoral School of Chemistry, Institute for Doctoral Studies, Babeş-Bolyai University, Kogălniceanu Street No. 1, 400084 Cluj-Napoca, Romania; Department of Chemistry, Faculty of Science, University of Oradea, Universităţii Street No. 1, 410087 Oradea, Romania.
Doctoral School of Chemistry, Institute for Doctoral Studies, Babeş-Bolyai University, Kogălniceanu Street No. 1, 400084 Cluj-Napoca, Romania.
Comput Math Methods Med. 2016;2016:8578156. doi: 10.1155/2016/8578156. Epub 2016 Dec 7.
Multiple linear regression analysis is widely used to link an outcome with predictors in order to better understand the behaviour of the outcome of interest. Usually, under the assumption that the errors follow a normal distribution, the coefficients of the model are estimated by minimizing the sum of squared deviations. A new approach based on maximum likelihood estimation is proposed for finding the coefficients of linear models with two predictors without any restrictive assumptions on the distribution of the errors. The algorithm was developed, implemented, and tested as a proof of concept on fourteen sets of compounds by investigating the link between activity/property (as outcome) and the structural information encoded in molecular descriptors (as predictors). The results on real data demonstrated that, when the Gauss-Laplace distribution was used to relax the restrictive assumption of normally distributed errors, the power of the error was in all investigated cases significantly different from the conventional value of two (the value that corresponds to the normal distribution). Therefore, the Gauss-Laplace distribution of the error could not be rejected, while the hypothesis that the power of the error from the Gauss-Laplace distribution is normally distributed also failed to be rejected.
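The abstract does not spell out the estimation algorithm, so the following is only a minimal sketch under stated assumptions. It assumes the errors of y = b0 + b1*x1 + b2*x2 + e follow the common scale parameterization of the generalized normal (Gauss-Laplace) density f(e) = p / (2*sigma*Gamma(1/p)) * exp(-(|e|/sigma)^p), where p = 2 gives the normal distribution; the paper's own parameterization may differ. For a fixed power p, the maximum-likelihood coefficients minimize sum |e_i|^p and the scale has the closed form sigma^p = (p/n) * sum |e_i|^p, so only p needs to be searched. The function names, the IRLS solver with step halving, and the grid p = 1.1, 1.2, ..., 4.0 are hypothetical illustrations, not the authors' implementation.

```python
import math
import random


def solve3(a, b):
    """Solve the 3x3 system a @ x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in range(2, -1, -1):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def weighted_fit(x1, x2, y, w):
    """Weighted least squares for y = b0 + b1*x1 + b2*x2 via the normal equations."""
    rows = [(1.0, u, v) for u, v in zip(x1, x2)]
    a = [[sum(wi * r[i] * r[j] for wi, r in zip(w, rows)) for j in range(3)] for i in range(3)]
    b = [sum(wi * r[i] * yi for wi, r, yi in zip(w, rows, y)) for i in range(3)]
    return solve3(a, b)


def residuals(beta, x1, x2, y):
    return [yi - (beta[0] + beta[1] * u + beta[2] * v) for u, v, yi in zip(x1, x2, y)]


def lp_fit(x1, x2, y, p, iters=100, eps=1e-8):
    """Hypothetical helper: minimise S(b) = sum |e_i|^p (p >= 1) by IRLS with step halving."""
    beta = weighted_fit(x1, x2, y, [1.0] * len(y))  # ordinary least squares start
    cost = sum(abs(e) ** p for e in residuals(beta, x1, x2, y))
    for _ in range(iters):
        # IRLS weights |e_i|^(p-2); eps avoids division by zero when p < 2.
        w = [max(abs(e), eps) ** (p - 2) for e in residuals(beta, x1, x2, y)]
        target = weighted_fit(x1, x2, y, w)
        step = 1.0
        while step > 1e-10:
            cand = [b + step * (t - b) for b, t in zip(beta, target)]
            new_cost = sum(abs(e) ** p for e in residuals(cand, x1, x2, y))
            if new_cost <= cost:
                break
            step /= 2.0
        else:
            break  # no decrease found: keep the current estimate
        done = cost - new_cost <= 1e-10 * (1.0 + cost)
        beta, cost = cand, new_cost
        if done:
            break
    return beta, cost


def profile_loglik(x1, x2, y, p):
    """Log-likelihood maximised over (b0, b1, b2, sigma) for a fixed power p."""
    n = len(y)
    beta, s = lp_fit(x1, x2, y, p)
    sigma = (p * s / n) ** (1.0 / p)  # closed-form MLE of the scale for fixed p
    ll = n * (math.log(p / 2.0) - math.lgamma(1.0 / p) - math.log(sigma) - 1.0 / p)
    return ll, beta, sigma


def fit_gauss_laplace(x1, x2, y, powers=None):
    """Assumed grid search over the power p; returns the maximum-likelihood fit."""
    if powers is None:
        powers = [1.1 + 0.1 * k for k in range(30)]  # p = 1.1, 1.2, ..., 4.0
    fits = [(profile_loglik(x1, x2, y, p), p) for p in powers]
    (ll, beta, sigma), p = max(fits, key=lambda f: f[0][0])
    return {"p": p, "beta": beta, "sigma": sigma, "loglik": ll}


if __name__ == "__main__":
    rng = random.Random(1)
    x1 = [rng.uniform(0, 10) for _ in range(200)]
    x2 = [rng.uniform(0, 10) for _ in range(200)]
    # Synthetic illustration only: Laplace errors as a difference of two exponentials.
    y = [1.0 + 2.0 * u - 0.5 * v + rng.expovariate(2.0) - rng.expovariate(2.0)
         for u, v in zip(x1, x2)]
    fit = fit_gauss_laplace(x1, x2, y)
    print(fit["p"], fit["beta"], fit["sigma"])
```

With p fixed at 2 the procedure reduces to ordinary least squares. Comparing the maximized log-likelihood with profile_loglik at p = 2 (a likelihood-ratio statistic) is one possible way to ask whether the power differs from two; the abstract does not state which test the authors used.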