Curation of myeloma observational study MALIMAR using XNAT: solving the challenges posed by real-world data.

Authors

Doran Simon J, Barfoot Theo, Wedlake Linda, Winfield Jessica M, Petts James, Glocker Ben, Li Xingfeng, Leach Martin, Kaiser Martin, Barwick Tara D, Chaidos Aristeidis, Satchwell Laura, Soneji Neil, Elgendy Khalil, Sheeka Alexander, Wallitt Kathryn, Koh Dow-Mu, Messiou Christina, Rockall Andrea

Affiliations

Division of Radiotherapy and Imaging, The Institute of Cancer Research, London, UK.

National Cancer Imaging Translational Accelerator, London, UK.

Publication information

Insights Imaging. 2024 Feb 16;15(1):47. doi: 10.1186/s13244-023-01591-7.

DOI:10.1186/s13244-023-01591-7

PMID:38361108

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10869673/

Abstract

OBJECTIVES

MAchine Learning In MyelomA Response (MALIMAR) is an observational clinical study combining "real-world" and clinical trial data, both retrospective and prospective. Images were acquired on three MRI scanners over a 10-year window at two institutions, leading to a need for extensive curation.

METHODS

Curation involved image aggregation, pseudonymisation, allocation between project phases, data cleaning, upload to an XNAT repository visible from multiple sites, annotation, incorporation of machine learning research outputs and quality assurance using programmatic methods.
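As a rough illustration of what "quality assurance using programmatic methods" could look like, the sketch below flags imaging sessions whose metadata deviates from an expected protocol. It is not the MALIMAR software. The record fields (`session_id`, `scanner`, `series`), the expected series names, the scanner labels and the function `find_qa_issues` are all hypothetical, and a real pipeline would presumably read such records from the repository rather than from an in-memory list.

```python
from dataclasses import dataclass, field

# Hypothetical protocol: series that every whole-body session is assumed to contain.
EXPECTED_SERIES = {"T1w_Dixon", "DWI_b50", "DWI_b900"}


@dataclass
class Session:
    """Minimal stand-in for the metadata of one imaging session (assumed fields)."""
    session_id: str
    scanner: str
    series: set = field(default_factory=set)


def find_qa_issues(sessions, known_scanners):
    """Return a dict mapping session_id to a list of human-readable problems."""
    issues = {}
    for s in sessions:
        problems = []
        missing = EXPECTED_SERIES - s.series
        if missing:
            problems.append("missing series: " + ", ".join(sorted(missing)))
        if s.scanner not in known_scanners:
            problems.append("unknown scanner: " + s.scanner)
        if problems:
            issues[s.session_id] = problems
    return issues


# Illustrative records only; none of these values come from the study.
sessions = [
    Session("S001", "scanner_A", {"T1w_Dixon", "DWI_b50", "DWI_b900"}),
    Session("S002", "scanner_B", {"T1w_Dixon", "DWI_b50"}),  # incomplete session
    Session("S003", "scanner_X", {"T1w_Dixon", "DWI_b50", "DWI_b900"}),
]
report = find_qa_issues(
    sessions, known_scanners={"scanner_A", "scanner_B", "scanner_C"}
)
for session_id, problems in report.items():
    print(session_id, "->", "; ".join(problems))
```

Returning a dictionary of problems per session, rather than stopping at the first failure, lets a curator review every flagged session in one pass.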

RESULTS

A total of 796 whole-body MR imaging sessions from 462 subjects were curated. A major change in scan protocol part way through the retrospective window meant that approximately 30% of available imaging sessions had properties that differed significantly from the remainder of the data. Issues were found with a vendor-supplied clinical algorithm for "composing" whole-body images from multiple imaging stations. Historic weaknesses in a digital video disk (DVD) research archive (already addressed by the mid-2010s) were highlighted by incomplete datasets, some of which could not be completely recovered. The final dataset contained 736 imaging sessions for 432 subjects. Software was written to clean and harmonise data. Implications for the subsequent machine learning activity are considered.
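The session and subject counts above imply how much data was lost between curation and the final dataset. The short calculation below works this out from the four reported figures. The helper name `retention` is illustrative, and the abstract does not break down the reasons for the difference, so the code only quantifies it.

```python
# Counts reported in the abstract.
curated_sessions, curated_subjects = 796, 462
final_sessions, final_subjects = 736, 432


def retention(initial, final):
    """Return (number dropped, percentage retained) for a pair of counts."""
    dropped = initial - final
    retained_pct = 100.0 * final / initial
    return dropped, retained_pct


sessions_dropped, sessions_pct = retention(curated_sessions, final_sessions)
subjects_dropped, subjects_pct = retention(curated_subjects, final_subjects)

print(f"sessions: {sessions_dropped} dropped, {sessions_pct:.1f}% retained")
print(f"subjects: {subjects_dropped} dropped, {subjects_pct:.1f}% retained")
```

On these figures, 60 sessions and 30 subjects are absent from the final dataset, so roughly 92.5% of sessions and 93.5% of subjects were retained.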

CONCLUSIONS

MALIMAR exemplifies the vital role that curation plays in machine learning studies that use real-world data. A research repository such as XNAT facilitates day-to-day management, ensures robustness and consistency and enhances the value of the final dataset. The types of process described here will be vital for future large-scale multi-institutional and multi-national imaging projects.

CRITICAL RELEVANCE STATEMENT

This article showcases innovative data curation methods using a state-of-the-art image repository platform; such tools will be vital for managing the large multi-institutional datasets required to train and validate generalisable ML algorithms and future foundation models in medical imaging.

KEY POINTS

• Heterogeneous data in the MALIMAR study required the development of novel curation strategies.
• Correction of multiple problems affecting the real-world data was successful, but implications for machine learning are still being evaluated.
• Modern image repositories have rich application programming interfaces enabling data enrichment and programmatic QA, making them much more than simple "image marts".

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e73/10869673/e107df7fff5b/13244_2023_1591_Fig1_HTML.jpg

Similar articles

Development of machine learning support for reading whole body diffusion-weighted MRI (WB-MRI) in myeloma for the detection and quantification of the extent of disease before and after treatment (MALIMAR): protocol for a cross-sectional diagnostic test accuracy study.

BMJ Open. 2022 Oct 5;12(10):e067140. doi: 10.1136/bmjopen-2022-067140.

Image annotation and curation in radiology: an overview for machine learning practitioners.

Eur Radiol Exp. 2024 Feb 6;8(1):11. doi: 10.1186/s41747-023-00408-y.

The future of Cochrane Neonatal.

Early Hum Dev. 2020 Nov;150:105191. doi: 10.1016/j.earlhumdev.2020.105191. Epub 2020 Sep 12.

Integrating the OHIF Viewer into XNAT: Achievements, Challenges and Prospects for Quantitative Imaging Studies.

Tomography. 2022 Feb 11;8(1):497-512. doi: 10.3390/tomography8010040.

The Stroke Neuro-Imaging Phenotype Repository: An Open Data Science Platform for Stroke Research.

Front Neuroinform. 2021 Jun 24;15:597708. doi: 10.3389/fninf.2021.597708. eCollection 2021.

Preliminary Planning for Mars Sample Return (MSR) Curation Activities in a Sample Receiving Facility (SRF).

Astrobiology. 2022 Jun;22(S1):S57-S80. doi: 10.1089/AST.2021.0105. Epub 2022 May 19.

Deep Learning in Large and Multi-Site Structural Brain MR Imaging Datasets.

Front Neuroinform. 2022 Jan 20;15:805669. doi: 10.3389/fninf.2021.805669. eCollection 2021.

Understanding the value of curation: A survey of US data repository curation practices and perceptions.

PLoS One. 2024 Jun 14;19(6):e0301171. doi: 10.1371/journal.pone.0301171. eCollection 2024.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
