Oka Masayoshi
Department of Management, Faculty of Management, Josai University, 1-1 Keyakidai, Sakado City, Saitama Prefecture, 350-0295, Japan.
Arch Public Health. 2021 Dec 15;79(1):226. doi: 10.1186/s13690-021-00750-w.
Standardization and normalization of continuous covariates are used to ease the interpretation of regression coefficients. Although these scaling techniques serve different purposes, they are sometimes used interchangeably or confused for one another. Therefore, the objective of this study is to demonstrate how these scaling techniques lead to different interpretations of the regression coefficient in multilevel logistic regression analyses.
Area-based socioeconomic data at the census tract level were obtained from the 2015-2019 American Community Survey to create two measures of neighborhood socioeconomic status (SES), and a hypothetical dataset on health condition (favorable versus unfavorable) was constructed to represent 3000 individuals living across 300 census tracts (i.e., neighborhoods). The two measures of neighborhood SES were standardized by subtracting their mean and dividing by their standard deviation (SD) or by dividing by their interquartile range (IQR), and were normalized into a range between 0 and 1. Then, four separate multilevel logistic regression analyses were conducted to assess the association between neighborhood SES and health condition.
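The three scaling steps described above can be sketched as follows. This is a minimal illustration rather than the author's code: the function names and the sample values are hypothetical, the IQR version is assumed to divide by the IQR without centering, and the quartile method ("inclusive") is an arbitrary choice, since the abstract does not specify one.

```python
import statistics


def standardize_sd(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def scale_iqr(values):
    """Divide by the interquartile range (assumption: no centering)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [v / iqr for v in values]


def normalize_minmax(values):
    """Rescale so the minimum maps to 0 and the maximum maps to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical neighborhood SES scores, for illustration only.
ses = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 11.0]

ses_sd = standardize_sd(ses)      # a one-unit change equals one SD
ses_iqr = scale_iqr(ses)          # a one-unit change equals one IQR
ses_norm = normalize_minmax(ses)  # a one-unit change spans min to max
```

After scaling, a one-unit change in the covariate means something different in each case, which is why the regression coefficient attached to it must be read differently.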
Based on the standardized measures, the odds of having an unfavorable health condition were roughly 1.34 times higher for a one-SD change or a one-IQR change in neighborhood SES; these reflect a health difference between individuals living in relatively high SES (relatively affluent) neighborhoods and those living in relatively low SES (relatively deprived) neighborhoods. On the other hand, when these standardized measures were replaced by their respective normalized measures, the odds of having an unfavorable health condition were roughly 3.48 times higher for a full unit change in neighborhood SES; these reflect a health difference between individuals living in the highest SES (most affluent) neighborhoods and those living in the lowest SES (most deprived) neighborhoods.
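The two odds ratios differ only because the unit of change differs. Because a linear rescaling of a covariate rescales its log-odds coefficient by the same factor, an odds ratio per one SD can be converted to an odds ratio per full range by raising it to the power (range / SD). The sketch below applies this to the rounded values reported above; the function names are hypothetical, and the implied ratio is back-calculated for illustration only and is not a figure reported in the study.

```python
import math


def rescale_odds_ratio(odds_ratio, scale_ratio):
    """Convert an odds ratio per unit A into one per unit B, where
    scale_ratio = (size of unit B) / (size of unit A)."""
    return math.exp(math.log(odds_ratio) * scale_ratio)


def implied_scale_ratio(or_unit_a, or_unit_b):
    """Back out (size of unit B) / (size of unit A) from two odds ratios
    that describe the same underlying association."""
    return math.log(or_unit_b) / math.log(or_unit_a)


# Rounded odds ratios quoted in the abstract.
or_per_sd = 1.34     # per one-SD (or one-IQR) change
or_per_range = 3.48  # per full-range (0 to 1) change

# Roughly how many SD-sized steps fit into the full range, assuming both
# odds ratios come from the same covariate and an otherwise identical model.
ratio = implied_scale_ratio(or_per_sd, or_per_range)

# Round trip: rescaling the per-SD odds ratio recovers the per-range one.
recovered = rescale_odds_ratio(or_per_sd, ratio)
```

With these rounded inputs the implied ratio is a little over 4, meaning the full range of the SES measure spans about four SD-sized steps under the stated assumptions.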
Multilevel logistic regression analyses using standardized and normalized measures of neighborhood SES lead to different interpretations of the effect of neighborhood SES on health. Since both measures are valuable in their own right, interpreting both a standardized and a normalized measure of neighborhood SES gives a more rounded view of the health differences among individuals along the gradient of neighborhood SES, both within a given geographic location and across different geographic locations.