Memorial University Faculty of Education, 323 Prince Philip Drive, St. John's, NL A1B 3X8, Canada.
Department of Leadership, Higher and Adult Education, Ontario Institute for Studies in Education, 252 Bloor Street West, Toronto, ON M5S 1V6, Canada.
Int J Popul Data Sci. 2023 Feb 2;8(1):1843. doi: 10.23889/ijpds.v8i1.1843. eCollection 2023.
Longitudinal data that track student achievement over many years are crucial for understanding children's learning and for guiding effective policies and interventions. Despite being Canada's most populous province, Ontario lacks such large-scale, longitudinal data on student learning. Linking datasets across cohorts requires rigorous linkage protocols, flexible handling of complex cohort structures, methods for validating the linked datasets, and viable organizational partnerships. We linked administrative data on early child development and educational achievement, then merged in two datasets describing the characteristics of students' neighborhoods and schools. We developed a linkage protocol and assessed how well the resulting database generalizes to Ontario's student population.
Two main individual-level data sources were linked: 1) the Early Development Instrument (EDI), a school readiness assessment of all Ontario public school kindergartners that is administered in three-year cycles, and 2) the math and reading assessments administered by Ontario's Education Quality and Accountability Office (EQAO) in grades 3, 6, 9, and 10. Because the two sources share no common personal identification number, we developed a deterministic linkage process based on several administrative variables. A school-level dataset and a neighborhood-level dataset were subsequently linked as well. We then examined differences between linked and unlinked cases across several variables.
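The abstract does not say which administrative variables were used or how ambiguous matches were handled. As a minimal sketch of deterministic linkage in general, assuming four hypothetical fields (`birth_date`, `sex`, `postal_code`, `school_board_id`) and a rule that accepts only one-to-one exact matches, the Python below builds a normalized composite key for each record and links records whose keys occur exactly once in both sources. The function names, fields, and records are illustrative only and do not reproduce the authors' protocol.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical linkage fields: the abstract says only "several administrative variables".
KEY_FIELDS = ("birth_date", "sex", "postal_code", "school_board_id")


def make_key(record: Dict[str, str]) -> Tuple[str, ...]:
    """Normalize each linkage field (trim, upper-case, drop spaces) into a composite key."""
    return tuple(record[f].strip().upper().replace(" ", "") for f in KEY_FIELDS)


def deterministic_link(edi: List[Dict[str, str]], eqao: List[Dict[str, str]]) -> Dict[str, str]:
    """Map EDI ids to EQAO ids when a composite key occurs exactly once in each source."""
    edi_index: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    eqao_index: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for rec in edi:
        edi_index[make_key(rec)].append(rec["id"])
    for rec in eqao:
        eqao_index[make_key(rec)].append(rec["id"])

    links: Dict[str, str] = {}
    for key, edi_ids in edi_index.items():
        eqao_ids = eqao_index.get(key, [])
        # Accept only unambiguous one-to-one matches; duplicated keys stay unlinked.
        if len(edi_ids) == 1 and len(eqao_ids) == 1:
            links[edi_ids[0]] = eqao_ids[0]
    return links


# Illustrative records only (not real data).
edi = [
    {"id": "E1", "birth_date": "2004-03-15", "sex": "F", "postal_code": "a1a 1a1", "school_board_id": "b01"},
    {"id": "E2", "birth_date": "2004-07-01", "sex": "M", "postal_code": "B2B 2B2", "school_board_id": "B02"},
    {"id": "E3", "birth_date": "2004-07-01", "sex": "M", "postal_code": "B2B 2B2", "school_board_id": "B02"},
]
eqao = [
    {"id": "Q9", "birth_date": "2004-03-15", "sex": "F", "postal_code": "A1A1A1", "school_board_id": "B01"},
    {"id": "Q8", "birth_date": "2004-07-01", "sex": "M", "postal_code": "B2B2B2", "school_board_id": "B02"},
]

links = deterministic_link(edi, eqao)
linkage_rate = len(links) / len(edi)
print(links, f"{linkage_rate:.0%}")  # {'E1': 'Q9'} 33%
```

Requiring uniqueness on both sides trades some linkage rate for fewer false matches: in this toy example the identical E2 and E3 records are left unlinked rather than assigned by guesswork.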
We linked 50% of the EDI's 374,239 cases; 86,778 of the linked cases contained all five data points (the EDI plus the four EQAO assessments). The result is a database that tracks achievement for multiple cohorts from kindergarten through grade 10, with covariates describing students' development, demographics, affect, neighborhoods, and schools. Analyses revealed only negligible differences between linked and unlinked cases across several demographic measures, and small differences on a neighborhood socioeconomic index and on some measures of child development. In conclusion, we recommend filling key gaps in sustainable research capacity by creating representative data through linkage protocols and data verification.
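The abstract labels differences between linked and unlinked cases as "negligible" or "small" without naming the statistic behind those labels. One common way to make such comparisons is a standardized mean difference (Cohen's d with a pooled standard deviation) read against Cohen's conventional cut-offs of 0.2, 0.5, and 0.8. The sketch below assumes that metric and those thresholds; the function names and sample values are hypothetical and are not the study's data.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def cohens_d(linked: Sequence[float], unlinked: Sequence[float]) -> float:
    """Standardized mean difference (linked minus unlinked) using the pooled sample SD."""
    n1, n2 = len(linked), len(unlinked)
    pooled_var = ((n1 - 1) * variance(linked) + (n2 - 1) * variance(unlinked)) / (n1 + n2 - 2)
    return (mean(linked) - mean(unlinked)) / sqrt(pooled_var)


def effect_label(d: float) -> str:
    """Classify |d| with Cohen's conventional cut-offs (an assumption, not the study's rule)."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Made-up neighborhood SES index values for linked and unlinked cases.
linked_ses = [1.0, 2.0, 3.0, 4.0, 5.0]
unlinked_ses = [1.5, 2.5, 3.5, 4.5, 5.5]
d = cohens_d(linked_ses, unlinked_ses)
print(round(d, 3), effect_label(d))  # -0.316 small
```

A value of d near zero suggests the linked subsample resembles the unlinked cases on that measure; for binary demographic variables, a proportion-based standardized difference would be the analogous check.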