Curation of myeloma observational study MALIMAR using XNAT: solving the challenges posed by real-world data.

Author information

Doran Simon J, Barfoot Theo, Wedlake Linda, Winfield Jessica M, Petts James, Glocker Ben, Li Xingfeng, Leach Martin, Kaiser Martin, Barwick Tara D, Chaidos Aristeidis, Satchwell Laura, Soneji Neil, Elgendy Khalil, Sheeka Alexander, Wallitt Kathryn, Koh Dow-Mu, Messiou Christina, Rockall Andrea

Affiliations

Division of Radiotherapy and Imaging, The Institute of Cancer Research, London, UK.

National Cancer Imaging Translational Accelerator, London, UK.

Publication information

Insights Imaging. 2024 Feb 16;15(1):47. doi: 10.1186/s13244-023-01591-7.

Abstract

OBJECTIVES

MAchine Learning In MyelomA Response (MALIMAR) is an observational clinical study combining "real-world" and clinical trial data, both retrospective and prospective. Images were acquired on three MRI scanners over a 10-year window at two institutions, leading to a need for extensive curation.

METHODS

Curation involved image aggregation, pseudonymisation, allocation between project phases, data cleaning, upload to an XNAT repository visible from multiple sites, annotation, incorporation of machine learning research outputs and quality assurance using programmatic methods.
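The programmatic quality assurance step can be illustrated with a minimal sketch. It assumes that each imaging session has already been summarised as a list of series descriptions (for example, from a repository's scan listing) and that the study defines a set of required series. The names `REQUIRED_SERIES` and `find_incomplete_sessions`, and the series labels themselves, are hypothetical and are not taken from the MALIMAR software.

```python
from typing import Dict, List, Set

# Hypothetical required series for a whole-body MR session (illustrative labels only).
REQUIRED_SERIES: Set[str] = {"T1W", "DWI_b50", "DWI_b900"}


def find_incomplete_sessions(
    sessions: Dict[str, List[str]],
    required: Set[str] = REQUIRED_SERIES,
) -> Dict[str, List[str]]:
    """Map each incomplete session ID to the sorted list of series it is missing."""
    report: Dict[str, List[str]] = {}
    for session_id, series in sessions.items():
        missing = required - set(series)
        if missing:
            report[session_id] = sorted(missing)
    return report


if __name__ == "__main__":
    # Made-up session listings, used only to show the shape of the check.
    example = {
        "SESSION_001": ["T1W", "DWI_b50", "DWI_b900"],
        "SESSION_002": ["T1W", "DWI_b50"],
    }
    print(find_incomplete_sessions(example))
```

A check of this kind flags incomplete datasets early, so that they can be recovered or excluded before any downstream analysis.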

RESULTS

A total of 796 whole-body MR imaging sessions from 462 subjects were curated. A major change in scan protocol part way through the retrospective window meant that approximately 30% of available imaging sessions had properties that differed significantly from the remainder of the data. Issues were found with a vendor-supplied clinical algorithm for "composing" whole-body images from multiple imaging stations. Historic weaknesses in a digital video disk (DVD) research archive (already addressed by the mid-2010s) were highlighted by incomplete datasets, some of which could not be completely recovered. The final dataset contained 736 imaging sessions for 432 subjects. Software was written to clean and harmonise data. Implications for the subsequent machine learning activity are considered.
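The session and subject counts above can be turned into differences and proportions with a short calculation. This is only arithmetic on the four figures reported in the abstract; the function name `summarise` is illustrative, and the reasons why individual sessions are absent from the final dataset are not modelled.

```python
def summarise(initial: int, final: int) -> dict:
    """Return the absolute difference and the percentage of the initial count in the final dataset."""
    return {
        "difference": initial - final,
        "percent_in_final": round(100 * final / initial, 1),
    }


if __name__ == "__main__":
    # Figures as reported: 796 sessions / 462 subjects curated, 736 / 432 in the final dataset.
    print("sessions:", summarise(796, 736))
    print("subjects:", summarise(462, 432))
```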

CONCLUSIONS

MALIMAR exemplifies the vital role that curation plays in machine learning studies that use real-world data. A research repository such as XNAT facilitates day-to-day management, ensures robustness and consistency and enhances the value of the final dataset. The types of process described here will be vital for future large-scale multi-institutional and multi-national imaging projects.

CRITICAL RELEVANCE STATEMENT

This article showcases innovative data curation methods using a state-of-the-art image repository platform; such tools will be vital for managing the large multi-institutional datasets required to train and validate generalisable ML algorithms and future foundation models in medical imaging.

KEY POINTS

• Heterogeneous data in the MALIMAR study required the development of novel curation strategies.
• Correction of multiple problems affecting the real-world data was successful, but implications for machine learning are still being evaluated.
• Modern image repositories have rich application programming interfaces enabling data enrichment and programmatic QA, making them much more than simple "image marts".

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e73/10869673/e107df7fff5b/13244_2023_1591_Fig1_HTML.jpg
