Health Informatics Centre, University of Dundee, Mail Box 15, Ninewells Hospital & Medical School, Dundee, DD1 9SY, UK.
Edinburgh Parallel Computing Centre, James Clerk Maxwell Building, Peter Guthrie Tait Road, Edinburgh, EH9 3FD, UK.
Gigascience. 2018 Jul 1;7(7). doi: 10.1093/gigascience/giy060.
The Health Informatics Centre at the University of Dundee provides a service to securely host clinical datasets and to give researchers relevant data extracts for anonymized cohorts, so that they can answer key research questions. As is common in research using routine healthcare data, the service was historically delivered through ad hoc processes. This made data provision slow, and the provenance of the data was often hidden from the researchers using it. This paper describes the development and evaluation of the Research Data Management Platform (RDMP), an open source tool that loads, manages, cleans, and curates longitudinal healthcare data for research. RDMP also gives researchers reproducible and updateable datasets for defined cohorts.
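The abstract does not say how RDMP builds an extract, so the following is only an illustration of what a "reproducible and updateable" extract for an anonymized cohort can mean. It is a minimal sketch that rests on stated assumptions. Each dataset is assumed to be a list of rows keyed by a hypothetical `patient_id` column. Identifiers are replaced with keyed HMAC-SHA256 pseudonyms, a technique chosen for this example and not claimed to be RDMP's own. The functions `pseudonymise` and `extract_cohort`, the `release_id` column, and the example dataset names are all hypothetical.

```python
import hashlib
import hmac

def pseudonymise(patient_id: str, project_key: bytes) -> str:
    """Hypothetical helper: map a patient ID to a stable, project-specific pseudonym.

    Uses HMAC-SHA256 keyed per project (an assumption, not RDMP's method), so the
    same patient gets the same pseudonym within a project but a different one elsewhere.
    """
    digest = hmac.new(project_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # truncated for readability in this sketch

def extract_cohort(cohort: set[str], datasets: dict[str, list[dict]],
                   project_key: bytes) -> dict[str, list[dict]]:
    """Hypothetical extract: link every dataset to the cohort and drop raw identifiers.

    The output depends only on (cohort, datasets, project_key), so re-running it on an
    updated snapshot regenerates the release with the same linkage keys.
    """
    release: dict[str, list[dict]] = {}
    for name in sorted(datasets):  # fixed dataset order
        rows = []
        for row in datasets[name]:
            if row["patient_id"] in cohort:
                # Copy every column except the raw identifier.
                out = {k: v for k, v in row.items() if k != "patient_id"}
                out["release_id"] = pseudonymise(row["patient_id"], project_key)
                rows.append(out)
        rows.sort(key=lambda r: r["release_id"])  # stable sort keeps output deterministic
        release[name] = rows
    return release

# Hypothetical example data: two datasets, a one-patient cohort.
datasets = {
    "prescribing": [{"patient_id": "P1", "drug": "metformin"},
                    {"patient_id": "P9", "drug": "insulin"}],
    "biochemistry": [{"patient_id": "P1", "hba1c": 53}],
}
release = extract_cohort({"P1"}, datasets, b"project-secret")
print(release)
```

Because the same pseudonym appears in every dataset, rows for one patient can still be linked across datasets after identifiers are removed. This is the kind of linkage behind an extract that joins many datasets.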
Between 2013 and 2017, implementing the RDMP tool more than tripled the productivity of data analysts producing data releases for researchers, from 7.1 to 25.3 releases per month. Over the same period, the error rate fell from 12.7% to 3.1%. The effort spent on data management fell from a mean of 24.6 to 3.0 hours per data release. After agreeing a specification, researchers previously waited approximately 6 months to receive their data, and this fell to less than 1 week. The software is scalable and currently manages 163 datasets. A total of 1,321 data extracts have been produced for research, and the largest links data from 70 different datasets.
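The before/after figures above can be checked with simple arithmetic. The short sketch below uses only the numbers quoted in the abstract. It computes the fold change and percentage change for each metric. The `metrics` dictionary and the `change` function are illustrative names introduced here.

```python
# Before/after figures quoted in the abstract (2013 vs 2017).
metrics = {
    "releases_per_month": (7.1, 25.3),
    "error_rate_pct": (12.7, 3.1),
    "hours_per_release": (24.6, 3.0),
}

def change(before: float, after: float) -> tuple[float, float]:
    """Return (fold change, percentage change) going from `before` to `after`."""
    fold = after / before
    pct = (after - before) / before * 100
    return fold, pct

for name, (before, after) in metrics.items():
    fold, pct = change(before, after)
    print(f"{name}: {before} -> {after} ({fold:.2f}x, {pct:+.1f}%)")
```

The results are a roughly 3.6-fold increase in releases per month, which is consistent with "more than tripled". The error rate drops by about 76%, and analyst hours per release drop by about 88%.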
The tools and processes that make up RDMP do more than fulfil researchers' data management requirements. They also support seamless collaboration between research groups on data cleaning, data transformation, data summarization, and data quality assessment.