Data cleaning and management protocols for linked perinatal research data: a good practice example from the Smoking MUMS (Maternal Use of Medications and Safety) Study.

Author information

Tran Duong Thuy, Havard Alys, Jorm Louisa R

Affiliation

Centre for Big Data Research in Health, Faculty of Medicine, UNSW Sydney (The University of New South Wales), Sydney, NSW, 2052, Australia.

Publication information

BMC Med Res Methodol. 2017 Jul 11;17(1):97. doi: 10.1186/s12874-017-0385-6.

Abstract

BACKGROUND

Data cleaning is an important quality assurance step in data linkage research. This paper presents the data cleaning and preparation process for a large-scale, cross-jurisdictional Australian study (the Smoking MUMS Study) designed to evaluate the utilisation and safety of smoking cessation pharmacotherapies during pregnancy.

METHODS

Perinatal records for all deliveries (2003-2012) in the States of New South Wales (NSW) and Western Australia were linked by State-based data linkage units to State-based data collections, including hospital separation, emergency department and death data (mothers and babies) and congenital defect notifications (babies in NSW). A national data linkage unit linked pharmaceutical dispensing data for the mothers. All linkages were probabilistic. Twenty-two steps assessed the uniqueness of records and the consistency of items within and across data sources, resolved discrepancies in the linkages between units, and identified women who had records in both States.
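The uniqueness and consistency checks described above can be illustrated with a minimal sketch. It assumes a hypothetical, simplified perinatal record layout (`record_id`, `mother_id`, `baby_dob`, `birth_order`, `parity`); the field names, the duplicate key and the parity rule are illustrative assumptions, not the study's actual variables or its 22 steps.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PerinatalRecord:
    # Hypothetical fields for illustration; real perinatal collections differ.
    record_id: str    # unique row identifier
    mother_id: str    # project-specific person ID assigned by the linkage
    baby_dob: date    # baby's date of birth
    birth_order: int  # 1 for singletons; 1, 2, ... for multiple births
    parity: int       # number of previous births reported at this delivery


def flag_duplicates(records):
    """Return record_ids that repeat an earlier (mother, baby DOB, birth order) key."""
    seen = set()
    duplicates = []
    for rec in records:
        key = (rec.mother_id, rec.baby_dob, rec.birth_order)
        if key in seen:
            duplicates.append(rec.record_id)
        else:
            seen.add(key)
    return duplicates


def flag_parity_inconsistencies(records):
    """Return record_ids whose parity is lower than at the mother's earlier delivery.

    Assumed rule: parity should never decrease across a woman's successive
    deliveries; records sharing a birth date (multiples) are not compared.
    """
    by_mother = defaultdict(list)
    for rec in records:
        by_mother[rec.mother_id].append(rec)
    flagged = []
    for recs in by_mother.values():
        recs.sort(key=lambda r: (r.baby_dob, r.birth_order))
        for prev, curr in zip(recs, recs[1:]):
            if curr.baby_dob > prev.baby_dob and curr.parity < prev.parity:
                flagged.append(curr.record_id)
    return flagged
```

Flagged records would then be reviewed or corrected rather than silently dropped; a real protocol would also need rules for which copy of a duplicate to keep.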

RESULTS

State-based linkages yielded a cohort of 783,471 mothers and 1,232,440 babies. Likely false-positive links relating to 3703 mothers were identified. Corrections to the baby's date of birth and age, and to parity, were made for 43,578 records, while 1996 records were flagged as duplicates. Checks of the uniqueness of matches between the State and national linkages detected 3404 ID clusters, suggestive of missed links in the State linkages, and identified 1986 women who had records in both States.
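One way to picture the uniqueness check between the State and national linkages is to treat each link as an edge between a State person ID and a national person ID and then find connected groups of IDs. The sketch below assumes a hypothetical crosswalk of `(state, state_id, national_id)` tuples and uses union-find; defining an "ID cluster" as a connected group that is not one-to-one is an assumption made for illustration, not necessarily the study's exact definition.

```python
from collections import Counter, defaultdict


def _find(parent, node):
    """Union-find root lookup with path halving."""
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node


def classify_id_clusters(links):
    """Group State and national IDs connected by any link.

    links: iterable of (state, state_id, national_id) tuples (hypothetical format).
    Returns (non_unique, both_states); each entry is (state_ids, national_ids).
    - non_unique: groups with several national IDs, or several IDs from the same
      State (for example, a possible missed link in a State linkage).
    - both_states: one-to-one groups spanning more than one State, i.e. a woman
      with records in both States.
    """
    parent = {}
    for state, state_id, national_id in links:
        s_node, n_node = ("S", state, state_id), ("N", national_id)
        parent.setdefault(s_node, s_node)
        parent.setdefault(n_node, n_node)
        root_s, root_n = _find(parent, s_node), _find(parent, n_node)
        if root_s != root_n:
            parent[root_s] = root_n

    components = defaultdict(list)
    for node in list(parent):
        components[_find(parent, node)].append(node)

    non_unique, both_states = [], []
    for members in components.values():
        state_ids = sorted(m[1:] for m in members if m[0] == "S")
        national_ids = sorted(m[1] for m in members if m[0] == "N")
        per_state = Counter(state for state, _ in state_ids)
        if len(national_ids) > 1 or any(n > 1 for n in per_state.values()):
            non_unique.append((state_ids, national_ids))
        elif len(per_state) > 1:
            both_states.append((state_ids, national_ids))
    return non_unique, both_states
```

Using connected components rather than a simple group-by on national ID also catches chains (for example, two State IDs and two national IDs linked in a loop) that a single-key grouping would split apart.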

CONCLUSIONS

Analysis of content data can identify inaccurate links that cannot be detected by data linkage units that have access to personal identifiers only. Perinatal researchers are encouraged to adopt the methods presented to ensure quality and consistency among studies using linked administrative data.
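As an example of a content-based check that a linkage unit working only from personal identifiers would not apply, a researcher can test whether two deliveries linked to the same woman are biologically plausible. The sketch below flags women whose distinct delivery dates fall implausibly close together. The 140-day threshold and the `(mother_id, delivery_date)` input format are hypothetical assumptions chosen for illustration, not values taken from the study.

```python
from collections import defaultdict
from datetime import date

# Hypothetical plausibility threshold (about 20 weeks), for illustration only.
MIN_INTERVAL_DAYS = 140


def implausible_intervals(deliveries, min_interval_days=MIN_INTERVAL_DAYS):
    """Return sorted mother_ids with two distinct delivery dates that are closer
    together than min_interval_days, which may indicate a false-positive link.

    deliveries: iterable of (mother_id, delivery_date) pairs.
    Same-day records are collapsed into one delivery (e.g. twins).
    """
    dates = defaultdict(set)
    for mother_id, delivery_date in deliveries:
        dates[mother_id].add(delivery_date)
    flagged = []
    for mother_id, ds in dates.items():
        ordered = sorted(ds)
        gaps = ((b - a).days for a, b in zip(ordered, ordered[1:]))
        if any(gap < min_interval_days for gap in gaps):
            flagged.append(mother_id)
    return sorted(flagged)
```

A flag here only marks a link for review; multiple births delivered on different days, or recording errors in dates, can also produce short intervals.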

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1477/5504784/b0f2ddb2453f/12874_2017_385_Fig1_HTML.jpg