Schabdach Jenna M, Williams Remo M S, Logan Joseph, Padmanabhan Viveknarayanan, D'Aiello III Russell, Mclaughlin Johnny, Gonzalez Alexander, Krause Edward M, Tasian Gregory E, Sotardi Susan, Alexander-Bloch Aaron F
Department of Child and Adolescent Psychiatry and Behavioral Science, Children's Hospital of Philadelphia, Philadelphia, PA.
Lifespan Brain Institute of the Children's Hospital of Philadelphia and the University of Pennsylvania, Philadelphia, PA.
AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:471-480. eCollection 2025.
Growth in the field of medical imaging research has revealed a need for greater volume and variety in available data. This need could be met with curated, clinically acquired data, but the process of getting these data from the scanners to the scientists is complex and lengthy. We present a manifest-driven, modular Extract, Transform, and Load (ETL) process named Locutus, designed to handle the difficulties that arise when reusing clinically acquired medical imaging data. The design of Locutus was based on four foundational assumptions about medical data, research data, and communication. All parts of a workflow must communicate with each other and be adaptable to unique data delivery requests. In addition, the workflow must be robust to possible errors and uncertainties in clinically acquired data, which may require human intervention to resolve. With these assumptions in mind, Locutus presents a five-phase workflow for downloading, deidentifying, and delivering imaging data in response to unique requests. The phases are initialization, data preparation, extraction of data from the research server to a pre-deidentification data warehouse, transformation into deidentified space, and loading into a post-deidentification data warehouse. To date, this workflow has been used to process 32,962 imaging accessions for research use. This number is expected to grow as technical challenges are addressed, and the role of humans is expected to shift from frequent intervention to regular monitoring.
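The abstract does not describe Locutus's manifest schema or source code. As a minimal hypothetical sketch of how a manifest-driven, five-phase workflow with human intervention might be organized, the Python below tracks each accession's current phase and status, halts an accession at the first failing phase, and lets it resume from that phase once a person has resolved the problem. The names `Phase`, `Status`, `ManifestEntry`, `advance`, and the `NEEDS_REVIEW` state are illustrative assumptions, not part of the published system.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Dict


class Phase(Enum):
    """The five phases named in the abstract, in execution order."""
    INITIALIZATION = auto()
    PREPARATION = auto()
    EXTRACTION = auto()      # research server -> pre-deidentification warehouse
    TRANSFORMATION = auto()  # into deidentified space
    LOADING = auto()         # into post-deidentification warehouse


class Status(Enum):
    PENDING = auto()
    DONE = auto()
    NEEDS_REVIEW = auto()    # hypothetical: paused until a human intervenes


@dataclass
class ManifestEntry:
    """Hypothetical per-accession row of a delivery manifest."""
    accession: str
    phase: Phase = Phase.INITIALIZATION
    status: Status = Status.PENDING
    note: str = ""


def advance(entry: ManifestEntry,
            handlers: Dict[Phase, Callable[[str], None]]) -> ManifestEntry:
    """Run the remaining phases for one entry, stopping at the first failure.

    A previously flagged entry resumes at the phase where it stopped.
    """
    if entry.status is Status.DONE:
        return entry
    entry.status = Status.PENDING
    phases = list(Phase)  # definition order
    for phase in phases[phases.index(entry.phase):]:
        entry.phase = phase
        try:
            handlers[phase](entry.accession)
        except Exception as exc:  # any error pauses the entry for review
            entry.status = Status.NEEDS_REVIEW
            entry.note = f"{phase.name}: {exc}"
            return entry
    entry.status = Status.DONE
    entry.note = ""
    return entry


if __name__ == "__main__":
    def ok(accession: str) -> None:
        pass

    def suspect_phi(accession: str) -> None:
        raise ValueError("possible burned-in PHI")

    handlers = {p: ok for p in Phase}
    handlers[Phase.TRANSFORMATION] = suspect_phi
    entry = advance(ManifestEntry("ACC001"), handlers)
    print(entry.status.name, entry.note)

    # After a human resolves the issue, the entry resumes at TRANSFORMATION.
    handlers[Phase.TRANSFORMATION] = ok
    print(advance(entry, handlers).status.name)
```

Keeping state in the manifest row rather than in the running process is one way to make the workflow robust to interruptions: a monitoring person can inspect flagged rows, fix the cause, and rerun `advance` without repeating completed phases.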