McHorney CA, Ware JE, Lu JF, Sherbourne CD
Health Institute, New England Medical Center, Boston, MA 02111.
Med Care. 1994 Jan;32(1):40-66. doi: 10.1097/00005650-199401000-00004.
The widespread use of standardized health surveys is predicated on the largely untested assumption that scales constructed from those surveys will satisfy minimum psychometric requirements across diverse population groups. Data from the Medical Outcomes Study (MOS) were used to evaluate data completeness and quality, test scaling assumptions, and estimate internal-consistency reliability for the eight scales constructed from the MOS SF-36 Health Survey. Analyses were conducted among 3,445 patients and were replicated across 24 subgroups differing in sociodemographic characteristics, diagnosis, and disease severity. For each scale, item-completion rates were high across all groups (88% to 95%) but tended to be somewhat lower among the elderly, those with less than a high school education, and those in poverty. On average, surveys were complete enough to compute scale scores for more than 96% of the sample. Across patient groups, all scales passed tests of item-internal consistency (97% of tests passed) and item-discriminant validity (92% of tests passed). Reliability coefficients ranged from 0.65 to 0.94 across scales (median = 0.85) and varied somewhat across patient subgroups. Floor effects were negligible except for the two role disability scales. Noteworthy ceiling effects were observed for both role disability scales and the social functioning scale. These findings support the use of the SF-36 across the diverse populations studied and identify population groups in which the use of standardized health status measures may or may not be problematic.
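The abstract names several scale-level checks without giving their formulas. The sketch below shows one common way to compute three of them on made-up toy data; it is not the MOS analysis code. It assumes that internal-consistency reliability is estimated with Cronbach's alpha, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score), that a scale score is computable when at least half of its items are answered (a hypothetical rule for this illustration), and that floor and ceiling effects are the share of complete respondents at the lowest and highest possible raw score. The function names and the toy data are illustrative only.

```python
"""Toy sketch of scale-level checks: completion, scorability, alpha, floor/ceiling.

Assumptions (not taken from the paper): reliability = Cronbach's alpha on
complete cases; a score is computable if >= half the items are answered
(hypothetical rule); floor/ceiling = share at the min/max possible raw sum.
"""
from statistics import variance
from typing import List, Optional, Sequence, Tuple

Response = Sequence[Optional[int]]  # one respondent's item answers; None = missing


def completion_rate(responses: Sequence[Response]) -> float:
    """Fraction of all item slots that were answered."""
    total = sum(len(r) for r in responses)
    answered = sum(1 for r in responses for x in r if x is not None)
    return answered / total


def scorable_fraction(responses: Sequence[Response]) -> float:
    """Fraction of respondents who answered at least half of the items (assumed rule)."""
    ok = sum(1 for r in responses if 2 * sum(x is not None for x in r) >= len(r))
    return ok / len(responses)


def _complete_cases(responses: Sequence[Response]) -> List[List[int]]:
    """Keep only respondents with no missing items."""
    return [list(r) for r in responses if all(x is not None for x in r)]


def cronbach_alpha(responses: Sequence[Response]) -> float:
    """Cronbach's alpha on complete cases (needs >= 2 cases and >= 2 items)."""
    cases = _complete_cases(responses)
    k = len(cases[0])
    item_var_sum = sum(variance([c[i] for c in cases]) for i in range(k))
    total_var = variance([sum(c) for c in cases])
    return k / (k - 1) * (1 - item_var_sum / total_var)


def floor_ceiling(responses: Sequence[Response], lo: int, hi: int) -> Tuple[float, float]:
    """Share of complete respondents at the lowest and highest possible raw sum."""
    cases = _complete_cases(responses)
    k = len(cases[0])
    totals = [sum(c) for c in cases]
    floor = sum(t == k * lo for t in totals) / len(totals)
    ceiling = sum(t == k * hi for t in totals) / len(totals)
    return floor, ceiling


# Hypothetical 3-item scale, each item scored 1 (worst) to 3 (best).
data = [
    [3, 3, 3],
    [2, 2, 3],
    [1, 1, 1],
    [2, 3, 2],
    [3, 3, 3],
    [1, 2, None],     # one item missing: still scorable under the assumed rule
    [None, None, 2],  # two items missing: not scorable under the assumed rule
]

if __name__ == "__main__":
    print(f"item completion: {completion_rate(data):.3f}")
    print(f"scorable respondents: {scorable_fraction(data):.3f}")
    print(f"Cronbach's alpha: {cronbach_alpha(data):.3f}")
    print("floor, ceiling:", floor_ceiling(data, lo=1, hi=3))
```

Alpha is computed on complete cases only, to keep the variance terms comparable; real analyses may handle missing items differently. A scale with many respondents at the maximum raw score, as reported above for the role disability and social functioning scales, would show a large ceiling share in this kind of check.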