Rubio-Codina Marta, Araujo M Caridad, Attanasio Orazio, Muñoz Pablo, Grantham-McGregor Sally
Social Protection and Health Division, Inter-American Development Bank, Washington, D.C., United States of America.
Centre for the Evaluation of Development Policies, Institute for Fiscal Studies, London, United Kingdom.
PLoS One. 2016 Aug 22;11(8):e0160962. doi: 10.1371/journal.pone.0160962. eCollection 2016.
In low- and middle-income countries (LMICs), measuring early childhood development (ECD) with standard tests in large-scale surveys and evaluations of interventions is difficult and expensive. Multi-dimensional screeners and single-domain tests ('short tests') are frequently used as alternatives. However, their validity in these circumstances is unknown. We examined the feasibility, reliability, and concurrent validity of three multi-dimensional screeners (Ages and Stages Questionnaires (ASQ-3), Denver Developmental Screening Test (Denver-II), Battelle Developmental Inventory screener (BDI-2)) and two single-domain tests (MacArthur-Bates Short-Forms (SFI and SFII), WHO Motor Milestones (WHO-Motor)) in 1,311 children aged 6-42 months in Bogota, Colombia. The scores were compared with those on the Bayley Scales of Infant and Toddler Development (Bayley-III), taken as the 'gold standard'. The Bayley-III was given at a center by psychologists, whereas the short tests were administered in the home by interviewers, as in a survey setting. Findings indicated good internal validity of all short tests except the ASQ-3. The BDI-2 took a long time to administer and was expensive, the single-domain tests were quickest and cheapest, and the Denver-II and ASQ-3 were intermediate. Concurrent validity of the multi-dimensional tests' cognitive, language, and fine motor scales with the corresponding Bayley-III scale was low below 19 months. However, it increased with age, becoming moderate-to-high over 30 months. In contrast, gross motor scales' concurrence was high under 19 months and then decreased. Of the single-domain tests, the WHO-Motor had high validity with gross motor under 16 months, and the SFI and SFII expressive scales showed moderate correlations with language under 30 months. Overall, the Denver-II was the most feasible and valid multi-dimensional test and the ASQ-3 performed poorly under 31 months.
By domain, gross motor development had the highest concurrence below 19 months, and language had the highest concurrence above that age. Investigation of predictive validity is needed to further guide the choice of instruments for large-scale studies.
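As an illustration only, the age-banded comparison described in the abstract can be sketched as a Pearson correlation between a short-test score and the corresponding reference (Bayley-III) score, computed separately within each age band. This is a minimal sketch under stated assumptions, not the authors' analysis: the band boundaries, the function names, and the record layout are hypothetical, and the abstract does not specify the exact statistic or any score standardization that was used.

```python
from math import sqrt

# Hypothetical age bands in months, loosely following the cut-offs
# mentioned in the abstract (below 19, 19-30, over 30). Assumption only.
AGE_BANDS = {"6-18": (6, 18), "19-30": (19, 30), "31-42": (31, 42)}


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences; None if undefined."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a constant score has no defined correlation
    return sxy / sqrt(sxx * syy)


def concurrent_validity_by_age(records):
    """Correlate short-test and reference scores within each age band.

    records: iterable of (age_months, short_score, reference_score) tuples.
    Returns {band_label: r}, with None for bands holding fewer than 3 children.
    """
    records = list(records)
    result = {}
    for label, (low, high) in AGE_BANDS.items():
        pairs = [(s, ref) for age, s, ref in records if low <= age <= high]
        if len(pairs) < 3:
            result[label] = None
            continue
        short_scores, reference_scores = zip(*pairs)
        result[label] = pearson_r(short_scores, reference_scores)
    return result
```

Calling `concurrent_validity_by_age` on a list of per-child tuples returns one coefficient per band, which makes a pattern such as "low below 19 months, moderate-to-high over 30 months" directly visible. A real analysis would likely also need age-standardized scores and confidence intervals, which this sketch omits.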