Davillas Apostolos, Jones Andrew M
Institute for Social and Economic Research, University of Essex, Colchester, UK.
Department of Economics and Related Studies, University of York, York, UK.
Health Econ. 2018 Oct;27(10):1617-1624. doi: 10.1002/hec.3787. Epub 2018 Jun 14.
Recent advances in social science surveys include the collection of biological samples. Although biomarkers offer considerable potential for social science and economic research, they pose a number of statistical challenges, as they are often distributed asymmetrically with heavy tails. Using data from the UK Household Panel Survey, we illustrate the comparative performance of a set of flexible parametric distributions that allow for a wide range of skewness and kurtosis: the four-parameter generalized beta of the second kind (GB2), the three-parameter generalized gamma, and their three-, two-, or one-parameter nested and limiting cases. Commonly used blood-based biomarkers for inflammation, diabetes, cholesterol, and stress-related hormones are modelled. Although some of the three-parameter distributions nested within the GB2 outperform the GB2 itself for most of the biomarkers considered, the GB2 can be used as a guide for choosing among competing parametric distributions for biomarkers. Going "beyond the mean" to estimate tail probabilities, we find that the GB2 performs fairly well, with some disparities at very high levels of glycated hemoglobin and fibrinogen. Commonly used linear models are shown to perform worse than almost all of the flexible distributions.
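To make the GB2 and its tail probabilities concrete, the following is a minimal sketch of the standard GB2 density, f(y) = a y^(ap-1) / (b^(ap) B(p, q) [1 + (y/b)^a]^(p+q)), and of the exceedance probability P(Y > t). It is not the authors' code. The function names and the parameter values in the usage lines are illustrative assumptions, not estimates from the paper, and the example assumes NumPy and SciPy are available.

```python
import numpy as np
from scipy.special import betaln, betainc


def gb2_logpdf(y, a, b, p, q):
    """Log-density of GB2(a, b, p, q) for y > 0, with a, b, p, q > 0.

    a, p, q are shape parameters and b is the scale parameter.
    """
    y = np.asarray(y, dtype=float)
    log_ratio = a * (np.log(y) - np.log(b))  # log of (y / b) ** a
    return (
        np.log(a) - np.log(y) + p * log_ratio
        - betaln(p, q)
        # logaddexp(0, x) = log(1 + exp(x)), evaluated stably
        - (p + q) * np.logaddexp(0.0, log_ratio)
    )


def gb2_tail_prob(threshold, a, b, p, q):
    """P(Y > threshold) for Y ~ GB2(a, b, p, q).

    Uses the fact that z = t / (1 + t), with t = (Y / b) ** a, follows a
    Beta(p, q) distribution, so the tail is 1 - I_z(p, q) = I_{1-z}(q, p).
    """
    t = (np.asarray(threshold, dtype=float) / b) ** a
    return betainc(q, p, 1.0 / (1.0 + t))


def gb2_negloglik(params, y):
    """Negative log-likelihood, suitable for a numerical optimiser."""
    a, b, p, q = params
    return -np.sum(gb2_logpdf(y, a, b, p, q))


# Illustrative (hypothetical) parameter values, not taken from the paper.
a, b, p, q = 2.0, 5.0, 1.5, 3.0
density_at_4 = float(np.exp(gb2_logpdf(4.0, a, b, p, q)))
tail_above_10 = float(gb2_tail_prob(10.0, a, b, p, q))
```

Setting a = 1 reduces the GB2 to the beta distribution of the second kind, one of its three-parameter nested cases, which gives a simple check on the implementation. In practice the four parameters would be estimated by maximum likelihood, for example by passing `gb2_negloglik` to a general-purpose optimiser with positivity constraints on the parameters.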