Examining the quality of record linkage process using nationwide Brazilian administrative databases to build a large birth cohort.

Affiliations

Centre for Data and Knowledge Integration for Health (CIDACS), Fiocruz Bahia, Salvador, Brazil.

University of Arizona, Computer Science Department, Tucson, Arizona, USA.

Publication information

BMC Med Inform Decis Mak. 2020 Jul 25;20(1):173. doi: 10.1186/s12911-020-01192-0.

Abstract

BACKGROUND

Research using linked routine population-based data collected for non-research purposes has increased in recent years because such data are a rich and detailed source of information. The objective of this study is to present an approach for preparing and linking data from administrative sources in a middle-income country, to estimate the quality of the linkage, and to identify potential sources of bias by comparing linked and non-linked individuals.

METHODS

We linked two Brazilian administrative datasets covering the period 2001 to 2015: the live birth information system and the 100 Million Brazilian Cohort (created using administrative records from over 114 million individuals whose families applied for social assistance via the Unified Register for Social Programmes). The linkage used maternal attributes (name, age, date of birth, and municipality of residence) and was performed with CIDACS-RL, a linkage tool developed in-house. We then estimated the proportion of highly probable links and examined the characteristics of missed matches to identify any potential sources of bias.
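As a rough illustration of what comparing two records on maternal attributes can look like, the sketch below scores a candidate pair with a weighted average of per-attribute similarities. It is not the CIDACS-RL algorithm: the field names (`mother_name`, `mother_dob`, `municipality`), the weights, and the use of `difflib` for name similarity are all assumptions made for this example.

```python
from difflib import SequenceMatcher

# Illustrative weights only; these are assumptions, not values from the study.
WEIGHTS = {"mother_name": 0.5, "mother_dob": 0.3, "municipality": 0.2}


def name_similarity(a: str, b: str) -> float:
    """Approximate string similarity in [0, 1], ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def pair_score(birth: dict, cohort: dict) -> float:
    """Weighted average similarity over the attributes present in both records.

    Attributes that are missing (None or absent) in either record are skipped
    and the remaining weights are renormalised, so a record without a maternal
    date of birth can still be scored on name and municipality.
    """
    total = 0.0
    used = 0.0
    for field, weight in WEIGHTS.items():
        va, vb = birth.get(field), cohort.get(field)
        if va is None or vb is None:
            continue
        if field == "mother_name":
            sim = name_similarity(va, vb)
        else:
            sim = 1.0 if va == vb else 0.0
        total += weight * sim
        used += weight
    return total / used if used else 0.0


if __name__ == "__main__":
    birth = {"mother_name": "Maria da Silva", "mother_dob": "1985-03-12",
             "municipality": "Salvador"}
    cohort = {"mother_name": "Maria da Sliva", "mother_dob": "1985-03-12",
              "municipality": "Salvador"}
    print(pair_score(birth, cohort))
```

In a scheme like this, pairs scoring above a chosen cut-off would be treated as highly probable links. The cut-off is a design decision that has to be validated against the data.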

RESULTS

A total of 27,699,891 live births were submitted to linkage with maternal information recorded in the baseline of the 100 Million Brazilian Cohort dataset. Of those, 16,447,414 (59.4%) children were found registered in the 100 Million Brazilian Cohort dataset. The proportion of highly probable links ranged from 39.3% in 2001 to 82.1% in 2014. A substantial improvement in linkage was observed after the maternal date of birth attribute was introduced in 2011. Our analyses indicated a slightly higher proportion of missing data among missed matches, and a higher proportion of people living in urban areas and self-declared as Caucasian among linked pairs than among non-linked sets.
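The overall linked proportion can be checked directly from the two counts reported above. The variable names are chosen for this example only.

```python
# Counts as reported in the RESULTS paragraph.
live_births = 27_699_891
linked = 16_447_414

non_linked = live_births - linked
proportion_linked = linked / live_births

print(f"linked: {100 * proportion_linked:.1f}%")  # linked: 59.4%
print(f"non-linked records: {non_linked:,}")      # non-linked records: 11,252,477
```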

DISCUSSION

We demonstrated that CIDACS-RL is capable of performing high-quality linkage even with a limited number of common attributes, using indexation as a blocking strategy in large routine databases from a middle-income country. However, residual (non-linked) records were more common among people living in worse conditions. The results presented in this study reinforce the need to evaluate linkage quality and, when necessary, to take linkage error into account in the analysis of any generated dataset.
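To give a sense of why blocking matters at this scale, the following is a minimal sketch of index-based blocking in general. It is not a description of CIDACS-RL internals. The blocking key (municipality plus the first token of the mother's name) and the field names are hypothetical choices for this example. Each birth record is compared only with the cohort records that share its key, rather than with every cohort record.

```python
from collections import defaultdict
from typing import Iterator


def blocking_key(record: dict) -> tuple:
    """Hypothetical blocking key: municipality plus first token of the name."""
    tokens = record["mother_name"].split()
    first_token = tokens[0].lower() if tokens else ""
    return (record["municipality"], first_token)


def build_index(records: list) -> dict:
    """Map each blocking key to the positions of the records that carry it."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[blocking_key(record)].append(position)
    return index


def candidate_pairs(births: list, cohort: list) -> Iterator[tuple]:
    """Yield (birth position, cohort position) pairs that share a blocking key."""
    index = build_index(cohort)
    for i, birth in enumerate(births):
        for j in index.get(blocking_key(birth), []):
            yield i, j
```

Only the pairs yielded here would go on to a detailed similarity comparison. The trade-off is that a true match whose blocking key differs, for example because of a misspelt first name or a change of municipality, is never compared. That is one way missed matches can arise.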

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f02e/7382864/431deb512088/12911_2020_1192_Fig1_HTML.jpg
