Gavett Brandon E, Ilango Sindana D, Koscik Rebecca, Ma Yue, Helfand Benjamin, Eng Chloe W, Gross Alden, Trittschuh Emily H, Jones Richard N, Mungas Dan
School of Psychological Science University of Western Australia Perth Western Australia Australia.
Department of Epidemiology University of Washington School of Public Health Seattle Washington USA.
Alzheimers Dement (Amst). 2023 Jun 18;15(2):e12438. doi: 10.1002/dad2.12438. eCollection 2023 Apr-Jun.
Research focusing on cognitive aging and dementia is a global endeavor. However, cross-national differences in cognition are embedded in other sociocultural differences, precluding direct comparisons of test scores. Such comparisons can be facilitated by co-calibration using item response theory (IRT). The goal of this study was to explore, using simulation, the necessary conditions for accurate harmonization of cognitive data.
Neuropsychological test scores from the US Health and Retirement Study (HRS) and the Mexican Health and Aging Study (MHAS) were subjected to IRT analysis to estimate item parameters and sample means and standard deviations. These estimates were used to generate simulated item response patterns under 10 scenarios that adjusted the quality and quantity of linking items used in harmonization. IRT-derived factor scores were compared to the known population values to assess bias, efficiency, accuracy, and reliability of the harmonized data.
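The generation step described above can be sketched as follows. This is a minimal illustration only. It assumes a two-parameter logistic (2PL) model with dichotomous items, which the abstract does not specify. The function name `simulate_2pl`, the sample sizes, the group means, and all item parameters are hypothetical placeholders, not estimates from HRS or MHAS.

```python
import numpy as np

def simulate_2pl(theta, a, b, rng):
    """Draw binary item responses under a 2PL IRT model.

    theta: latent abilities, shape (n_persons,)
    a:     item discriminations, shape (n_items,)
    b:     item difficulties, shape (n_items,)
    Returns an (n_persons, n_items) array of 0/1 responses.
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))[:, None]
    a = np.asarray(a, dtype=float)[None, :]
    b = np.asarray(b, dtype=float)[None, :]
    # Probability of a correct response for every person-item pair.
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return (rng.random(p.shape) < p).astype(int)

rng = np.random.default_rng(0)

# Hypothetical "known population values" for two cohorts.
theta_cohort_1 = rng.normal(loc=0.0, scale=1.0, size=2000)
theta_cohort_2 = rng.normal(loc=-0.5, scale=1.0, size=2000)

# Hypothetical linking items shared by both cohorts:
# weakly discriminating (small a) and easy (low b).
link_a = np.array([0.6, 0.7, 0.5])
link_b = np.array([-2.0, -1.8, -2.2])

resp_1 = simulate_2pl(theta_cohort_1, link_a, link_b, rng)
resp_2 = simulate_2pl(theta_cohort_2, link_a, link_b, rng)
```

Because the generating abilities are known, factor scores estimated from such simulated responses can be compared against them. Changing the values in `link_a` and `link_b`, or the number of linking items, would correspond to different scenarios.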
The current configuration of HRS and MHAS data was not suitable for harmonization, as poor linking item quality led to large bias in both cohorts. Scenarios with more numerous and higher quality linking items led to less biased and more accurate harmonization.
Linking items must possess low measurement error across the range of latent ability for co-calibration to be successful.
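One way to make "low measurement error across the range of latent ability" concrete is through item information. The sketch below again assumes a 2PL model, under which an item's information at ability θ is a²·P(θ)·(1 − P(θ)), and the standard error of measurement is approximately the inverse square root of the summed information. The two item sets are hypothetical and chosen only to contrast weak, easy linking items with strong items of varied difficulty. They are not parameters from the study.

```python
import numpy as np

def total_information(theta, a, b):
    """Summed 2PL item information at each ability value in theta."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))[:, None]
    a = np.asarray(a, dtype=float)[None, :]
    b = np.asarray(b, dtype=float)[None, :]
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    # Item information under 2PL: a^2 * P * (1 - P), summed over items.
    return (a ** 2 * p * (1.0 - p)).sum(axis=1)

theta_grid = np.linspace(-2.0, 2.0, 9)

# Hypothetical linking sets (illustrative values only).
info_weak_easy = total_information(
    theta_grid, a=[0.6, 0.7, 0.5], b=[-2.0, -1.8, -2.2]
)
info_strong_spread = total_information(
    theta_grid, a=[2.0, 2.0, 2.0], b=[-1.5, 0.0, 1.5]
)

# Approximate standard error of measurement at each ability level.
sem_weak_easy = 1.0 / np.sqrt(info_weak_easy)
sem_strong_spread = 1.0 / np.sqrt(info_strong_spread)

for t, s_weak, s_strong in zip(theta_grid, sem_weak_easy, sem_strong_spread):
    print(f"theta={t:+.1f}  SEM weak/easy={s_weak:.2f}  SEM strong/spread={s_strong:.2f}")
```

With these illustrative values, the strongly discriminating items with varied difficulty carry more information, and therefore less measurement error, at every grid point between -2 and 2.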
We developed a statistical simulation platform to evaluate the degree to which cross-sample harmonization accuracy varies as a function of the quality and quantity of linking items. Two large studies of aging, one in Mexico and one in the United States, use three common items to measure cognition. These three common items have weak correspondence with the ability being measured and are all low in difficulty. Harmonized scores derived from the three common linking items will provide biased and inaccurate estimates of cognitive ability. Harmonization accuracy is greatest when linking items vary in difficulty and are strongly related to the ability being measured.