Health Informatics Centre (HIC), School of Medicine, University of Dundee, (Main level 5 corridor), Second Floor, Level 7, Mailbox 15, Ninewells Hospital & Medical School, Dundee DD1 9SY, UK.
Edinburgh Parallel Computing Centre (EPCC), Edinburgh University, Bayes Centre, 47 Potterrow, Edinburgh EH8 9BT, UK.
Gigascience. 2020 Sep 29;9(10). doi: 10.1093/gigascience/giaa095.
The aim of this work is to establish a world-leading research dataset of routinely collected clinical images linked to other routinely collected data for the whole Scottish national population. The dataset includes more than 30 million radiological examinations from a population of 5.4 million, amounting to >2 PB of data collected since 2010.
Scotland has a central archive of radiological data used to directly provide clinical care to patients. We have developed an architecture and platform to securely extract a copy of those data, link them to other clinical or social datasets, remove personal data to protect privacy, and make the resulting data available to researchers in a controlled Safe Haven environment.
An extensive software platform has been developed to host, extract, and link data from cohorts to answer research questions. The platform has been tested on 5 different test cases and is currently being further enhanced to support 3 exemplar research projects.
The data available are from a range of radiological modalities and scanner types and were collected under different environmental conditions. These real-world, heterogeneous data are valuable for training algorithms to support clinical decision making, especially for deep learning, where large data volumes are required. The resource is now available for international research access. The platform and data can support new health research using artificial intelligence and machine learning technologies, as well as enabling discovery science.