Osuala Richard, Skorupko Grzegorz, Lazrak Noussair, Garrucho Lidia, García Eloy, Joshi Smriti, Jouide Socayna, Rutherford Michael, Prior Fred, Kushibar Kaisar, Díaz Oliver, Lekadir Karim
Universitat de Barcelona, Barcelona Artificial Intelligence in Medicine Lab (BCN-AIM), Facultat de Matemàtiques i Informàtica, Barcelona, Spain.
Universitat de Barcelona, Facultat de Matemàtiques i Informàtica, Barcelona, Spain.
J Med Imaging (Bellingham). 2023 Nov;10(6):061403. doi: 10.1117/1.JMI.10.6.061403. Epub 2023 Feb 20.
Deep learning has shown great promise as the backbone of clinical decision support systems. Synthetic data generated by generative models can enhance the performance and capabilities of data-hungry deep learning models. However, their adoption in research and clinical applications is hindered by (1) the limited availability of (synthetic) datasets and (2) the complexity of training generative models. To reduce this entry barrier, we explore generative model sharing to allow more researchers to access, generate, and benefit from synthetic data.
We propose a one-stop shop for pretrained generative models, implemented as an open-source, framework-agnostic Python library. After gathering end-user requirements, we formulate design decisions based on usability, technical feasibility, and scalability. Subsequently, we implement the library from modular components for generative model (i) execution, (ii) visualization, (iii) search & ranking, and (iv) contribution. We integrate pretrained models with applications across modalities such as mammography, endoscopy, x-ray, and MRI.
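The search & ranking component described above can be illustrated with a minimal registry sketch. All class, method, and model-ID names below are hypothetical assumptions for illustration, not the library's actual API: pretrained model entries carry metadata (modality, architecture, tags), and a keyword query ranks entries by how many metadata fields match.

```python
# Hypothetical sketch of a pretrained-model registry with search & ranking.
# Names (ModelEntry, ModelRegistry, the model IDs) are illustrative only.
from dataclasses import dataclass, field

@dataclass
class ModelEntry:
    model_id: str
    modality: str
    architecture: str
    tags: list = field(default_factory=list)

class ModelRegistry:
    def __init__(self):
        self._models = {}

    def add(self, entry):
        self._models[entry.model_id] = entry

    def find(self, keyword):
        """Rank model entries by number of metadata fields matching the keyword."""
        kw = keyword.lower()
        scored = []
        for e in self._models.values():
            score = sum(kw in v.lower() for v in [e.modality, e.architecture, *e.tags])
            if score:
                scored.append((score, e.model_id))
        return [model_id for _, model_id in sorted(scored, reverse=True)]

registry = ModelRegistry()
registry.add(ModelEntry("00001_DCGAN_MMG", "mammography", "DCGAN", ["mass", "patch"]))
registry.add(ModelEntry("00004_PIX2PIX_MRI", "MRI", "pix2pix", ["brain"]))
print(registry.find("mammography"))  # → ['00001_DCGAN_MMG']
```

In this sketch, execution of a selected model would then be a single call on the returned ID, which is what enables synthetic data generation "in just a few lines of code".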
The scalability and design of the library are demonstrated by its growing number of integrated and readily usable pretrained generative models, which include 21 models utilizing nine different generative adversarial network architectures trained on 11 different datasets. We further analyze three applications: (a) enabling community-wide sharing of restricted data, (b) investigating generative model evaluation metrics, and (c) improving clinical downstream tasks. In (b), we extract Fréchet inception distances (FID) and demonstrate how FID varies with image normalization and with radiology-specific feature extractors.
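To make the FID-variability point concrete, here is a minimal, self-contained sketch of the standard FID formula, FID = ||μ_a − μ_b||² + Tr(Σ_a + Σ_b − 2(Σ_a Σ_b)^{1/2}), applied to synthetic Gaussian feature vectors. It is not the paper's experimental setup; it only illustrates, in a simplified feature-space setting, that rescaling the inputs (a stand-in for different image normalizations upstream of a feature extractor) changes the reported score, which is why normalization choices must be stated alongside FID values.

```python
# Minimal FID sketch on synthetic feature vectors (illustrative data only).
import numpy as np
from scipy import linalg

def fid(feats_a, feats_b):
    """FID between two feature matrices of shape (n_samples, n_features)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = linalg.sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):  # numerical noise can yield tiny imaginary parts
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 16))   # "real" features
synth = rng.normal(0.5, 1.2, size=(500, 16))  # slightly shifted "synthetic" features

fid_raw = fid(real, synth)
# Feeding the same data on a different scale (e.g., [0, 255] vs. [0, 1])
# changes the score, so the normalization used must always be reported.
fid_scaled = fid(real * 255.0, synth * 255.0)
print(f"FID raw: {fid_raw:.3f}, FID after x255 rescaling: {fid_scaled:.3f}")
```

In this purely linear feature-space toy, uniform scaling by c multiplies FID by exactly c²; with a real nonlinear feature extractor the effect of normalization is less predictable, which strengthens the case for standardized reporting.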
The library allows researchers and developers to create, increase, and domain-adapt their training data in just a few lines of code. We show its viability as a platform for generative model sharing, capable of enriching and accelerating the development of clinical machine learning models. Our multi-model synthetic data experiments uncover standards for assessing and reporting metrics, such as FID, in image synthesis studies.