Naderi Mehrdad, Mirfarah Elham, Bernhardt Matthew, Chen Ding-Geng
Department of Statistics, Faculty of Natural & Agricultural Sciences, University of Pretoria, Pretoria, South Africa.
Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA.
J Appl Stat. 2021 May 25;49(12):3022-3043. doi: 10.1080/02664763.2021.1931821. eCollection 2022.
In censored data analysis, the classical linear regression model, which assumes normally distributed random errors, is perhaps one of the most commonly used frameworks. However, practical studies have often criticized the classical linear regression model for its sensitivity to departures from normality and to partial nonlinearity. This paper proposes to address both issues simultaneously within a partial linear regression model by assuming that the random errors follow a distribution from the scale-mixture of normal (SMN) family. The proposed method allows data to be modeled with great flexibility, accommodating heavy tails and outliers. Using a B-spline approximation and the convenient hierarchical representation of the SMN distributions, an analytically tractable EM-type algorithm is developed for obtaining maximum likelihood (ML) parameter estimates. Various simulation studies are conducted to investigate the finite-sample properties of the estimators, as well as the robustness of the model when dealing with heavy-tailed datasets. Finally, real-world data examples are analyzed to illustrate the usefulness of the proposed methodology.
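The two ingredients named above, a B-spline approximation of the nonparametric component and the hierarchical SMN representation that turns the E-step into a weight computation, can be illustrated with a minimal sketch. It is deliberately simplified and is not the authors' algorithm: it assumes Student-t errors (one SMN member) with a fixed, known degrees of freedom `nu`, and it ignores censoring, which the paper's method handles. The function names `bspline_basis` and `em_t_partial_linear`, the knot placement, the OLS starting values, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def bspline_basis(t, interior_knots, lo, hi, degree=3):
    """Clamped B-spline basis evaluated at t via the Cox-de Boor recursion."""
    t = np.asarray(t, dtype=float)
    knots = np.concatenate([[lo] * (degree + 1), interior_knots, [hi] * (degree + 1)])
    # Degree-0 basis: indicator of each half-open knot interval [k_j, k_{j+1}).
    B = np.array([(knots[j] <= t) & (t < knots[j + 1])
                  for j in range(len(knots) - 1)], dtype=float).T
    # Assign t == hi to the last non-degenerate interval so the right endpoint is covered.
    B[t == hi, len(knots) - degree - 2] = 1.0
    for k in range(1, degree + 1):
        cols = []
        for j in range(len(knots) - 1 - k):
            left = knots[j + k] - knots[j]
            right = knots[j + k + 1] - knots[j + 1]
            term = np.zeros_like(t)
            if left > 0:
                term += (t - knots[j]) / left * B[:, j]
            if right > 0:
                term += (knots[j + k + 1] - t) / right * B[:, j + 1]
            cols.append(term)
        B = np.column_stack(cols)
    return B  # shape (n, len(interior_knots) + degree + 1)


def em_t_partial_linear(y, X, B, nu, max_iter=500, tol=1e-8):
    """EM for y = X beta + B gamma + eps, eps ~ t_nu(0, sigma2); nu fixed, NO censoring."""
    D = np.hstack([X, B])
    theta = np.linalg.lstsq(D, y, rcond=None)[0]  # OLS starting values
    r = y - D @ theta
    sigma2 = np.mean(r ** 2)
    for _ in range(max_iter):
        # E-step: E[u_i | y_i] for the Gamma(nu/2, rate nu/2) mixing variable.
        w = (nu + 1.0) / (nu + r ** 2 / sigma2)
        # M-step: weighted least squares for (beta, gamma), then update sigma2.
        sw = np.sqrt(w)
        theta_new = np.linalg.lstsq(D * sw[:, None], y * sw, rcond=None)[0]
        r = y - D @ theta_new
        sigma2 = np.mean(w * r ** 2)
        converged = np.max(np.abs(theta_new - theta)) < tol
        theta = theta_new
        if converged:
            break
    p = X.shape[1]
    return theta[:p], theta[p:], sigma2


# Hypothetical simulated data with heavy-tailed errors drawn through the SMN hierarchy:
# u ~ Gamma(nu/2, rate nu/2) and eps | u ~ N(0, sigma^2 / u), so eps ~ t_nu(0, sigma^2).
rng = np.random.default_rng(2021)
n, nu, sigma = 500, 4.0, 0.5
t = rng.uniform(0.0, 1.0, n)
X = rng.normal(size=(n, 2))  # no intercept column: the B-spline basis already spans constants
beta_true = np.array([1.0, -2.0])
u = rng.gamma(shape=nu / 2, scale=2 / nu, size=n)
eps = rng.normal(0.0, sigma / np.sqrt(u))
y = X @ beta_true + np.sin(2 * np.pi * t) + eps

B = bspline_basis(t, np.linspace(0.0, 1.0, 10)[1:-1], lo=0.0, hi=1.0)
beta_hat, gamma_hat, sigma2_hat = em_t_partial_linear(y, X, B, nu)
print("beta_hat:", beta_hat, "sigma2_hat:", sigma2_hat)
```

The latent weights `w` make the heavy-tail robustness concrete: observations with large standardized residuals receive smaller weights in the M-step. Extending this sketch to censored responses would require replacing the E-step quantities with conditional expectations under truncation, which is where the paper's analytical EM-type derivation comes in.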