Kim Sejin, Kazmierski Michal, Qu Kevin, Peoples Jacob, Nakano Minoru, Ramanathan Vishwesh, Marsilla Joseph, Welch Mattea, Simpson Amber, Haibe-Kains Benjamin
Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada.
Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada.
F1000Res. 2025 Feb 7;12:118. doi: 10.12688/f1000research.127142.3. eCollection 2023.
Machine learning and AI promise to revolutionize the way we leverage medical imaging data for improving care but require large datasets to train computational models that can be implemented in clinical practice. However, processing large and complex medical imaging datasets remains an open challenge.
To address this issue, we developed Med-ImageTools, a new open-source Python package that automates data curation and processing and lets researchers share their data processing configurations more easily, lowering the barrier for others to reproduce published work.
We demonstrated the efficiency of Med-ImageTools on three different datasets, where it significantly reduced processing times.
The AutoPipeline feature will improve the accessibility of raw clinical datasets on public archives such as The Cancer Imaging Archive (TCIA), the largest public repository of cancer imaging, allowing machine learning researchers to convert them into analysis-ready formats without requiring deep domain knowledge.