Gao Chuang, Mumtaz Shahzad, McCall Sophie, O'Sullivan Katherine, McGilchrist Mark, Morales Daniel R, Hall Christopher, Wilde Katie, Mayor Charlie, Linksted Pamela, Harrison Kathy, Cole Christian, Jefferson Emily
Health Informatics Centre, School of Medicine, University of Dundee, UK; Population Health and Genomics, School of Medicine, University of Dundee, UK.
School of Natural & Computing Sciences, University of Aberdeen, UK.
J Biomed Inform. 2025 Feb;162:104771. doi: 10.1016/j.jbi.2024.104771. Epub 2025 Jan 2.
Medical laboratory data, together with prescribing and hospitalisation records, are three of the most commonly used types of electronic health records (EHRs) in data-driven health research. In Scotland, hospitalisation, prescribing and death register data are available nationally, whereas laboratory data are captured, stored and reported by local health board systems, with significant heterogeneity between them. For researchers and other users of this regionally curated data, working with laboratory datasets across regional cohorts takes substantial effort and time. As part of this study, the Scottish Safe Haven Network has developed an open-source software pipeline to generate a harmonised laboratory dataset.
We obtained sample laboratory data from the four regional Safe Havens in Scotland covering people within the SHARE consented cohort. We compared the variables collected by each regional Safe Haven and mapped these to 11 FHIR and 2 Scottish-specific standardised terms (i.e., one to indicate the regional health board and a second to describe the source clinical code description).
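The variable mapping described above can be illustrated with a minimal sketch. It assumes each region exports records under its own column names and that a per-region lookup table renames them to a shared schema. All column names, region labels and target terms below (including `healthBoard`) are hypothetical placeholders; the abstract does not list the 11 FHIR terms or the 2 Scottish-specific terms actually used.

```python
# Hypothetical per-region column maps: source column -> harmonised term.
# The target names are FHIR-style placeholders, not the pipeline's real terms.
COLUMN_MAPS = {
    "region_a": {
        "PatientID": "subject",
        "TestCode": "code",
        "Result": "valueQuantity",
        "Units": "valueUnit",
        "SampleDate": "effectiveDateTime",
    },
    "region_b": {
        "patient": "subject",
        "read_code": "code",
        "value": "valueQuantity",
        "unit": "valueUnit",
        "sampled_on": "effectiveDateTime",
    },
}


def harmonise_record(record, region):
    """Rename a regional record's columns to the shared schema.

    Columns without a mapping are dropped; a placeholder 'healthBoard'
    field records which region the row came from.
    """
    mapping = COLUMN_MAPS[region]
    out = {target: record[source]
           for source, target in mapping.items()
           if source in record}
    out["healthBoard"] = region  # hypothetical Scottish-specific term
    return out


if __name__ == "__main__":
    row = {"patient": "p42", "read_code": "TEST_A", "value": "5.1",
           "unit": "mmol/L", "sampled_on": "2020-01-01", "lab_site": "X"}
    print(harmonise_record(row, "region_b"))
```

Keeping the mapping as data rather than code means a new region could in principle be added by supplying one more lookup table, though whether the published pipeline is organised this way is not stated in the abstract.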
We compared the laboratory data and found that 180 test codes covered 98.7% of the test records performed across Scotland. Focusing on these 180 test codes, we developed a set of transformations to convert test results captured in different units to a common unit. We included both Read Codes and SNOMED CT to encode the tests within the pipeline.
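As a rough illustration of the unit transformations mentioned above, the sketch below converts a result to one target unit per test using a table of multiplicative factors. The test code `TEST_HB`, the choice of target unit and the table layout are assumptions made for the example and are not taken from the pipeline; the factors themselves are plain arithmetic (1 g/dL = 10 g/L, 1 mg/L = 0.001 g/L).

```python
# Multiplicative factors keyed by (source unit, target unit).
CONVERSIONS = {
    ("g/dL", "g/L"): 10.0,
    ("mg/L", "g/L"): 0.001,
}

# Hypothetical test code mapped to the unit all its results should share.
TARGET_UNIT = {"TEST_HB": "g/L"}


def to_target_unit(test_code, value, unit):
    """Return (converted value, target unit) for one test result.

    Raises ValueError when no conversion is known, so unexpected units
    are surfaced rather than passed through silently.
    """
    target = TARGET_UNIT[test_code]
    if unit == target:
        return value, target
    factor = CONVERSIONS.get((unit, target))
    if factor is None:
        raise ValueError(f"no conversion from {unit!r} to {target!r}")
    return value * factor, target


if __name__ == "__main__":
    print(to_target_unit("TEST_HB", 13.5, "g/dL"))
```

Failing loudly on an unknown unit is a design choice for this sketch; a production pipeline might instead log and quarantine such rows.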
We validated our harmonisation pipeline by comparing the results across the different regional datasets. The pipeline can be reused by researchers and/or Safe Havens to generate clean, harmonised laboratory data at a national level with minimal effort.