Osorio Pedro, Jimenez-Perez Guillermo, Montalt-Tordera Javier, Hooge Jens, Duran-Ballester Guillem, Singh Shivam, Radbruch Moritz, Bach Ute, Schroeder Sabrina, Siudak Krystyna, Vienenkoetter Julia, Lawrenz Bettina, Mohammadi Sadegh
Decision Science & Advanced Analytics, Bayer AG, 13353 Berlin, Germany.
Pathology and Clinical Pathology, Bayer AG, 13353 Berlin, Germany.
Diagnostics (Basel). 2024 Jul 5;14(13):1442. doi: 10.3390/diagnostics14131442.
Artificial Intelligence (AI)-based image analysis has immense potential to support diagnostic histopathology, including cancer diagnostics. However, developing supervised AI methods requires large-scale annotated datasets. A potentially powerful solution is to augment training data with synthetic data. Latent diffusion models, which can generate high-quality, diverse synthetic images, are promising. However, the most common implementations rely on detailed textual descriptions, which are not generally available in this domain. This work proposes a method that constructs structured textual prompts from automatically extracted image features. We experiment with the PCam dataset, composed of tissue patches only loosely annotated as healthy or cancerous. We show that including image-derived features in the prompt, as opposed to only the healthy or cancerous label, reduces the Fréchet Inception Distance (FID) by 88.6 (lower FID is better). We also show that pathologists find it challenging to detect synthetic images, with a median sensitivity/specificity of 0.55/0.55. Finally, we show that synthetic data can be used effectively to train AI models.
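The idea of building a structured prompt from image-derived features can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say which features are extracted or how the prompt is formatted, so the feature names (`nuclei_count`, `mean_nucleus_area`), the bucket thresholds, and the prompt layout below are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PatchFeatures:
    """Features for one tissue patch (all fields except `label` are hypothetical)."""
    label: str                # coarse label, e.g. "healthy" or "cancerous"
    nuclei_count: int         # hypothetical: number of nuclei detected in the patch
    mean_nucleus_area: float  # hypothetical: mean nucleus area in pixels


def bucket(value: float, low: float, high: float) -> str:
    """Map a numeric feature to a coarse word so it can appear in a text prompt."""
    if value < low:
        return "low"
    if value < high:
        return "medium"
    return "high"


def build_prompt(features: PatchFeatures) -> str:
    """Combine the coarse label with bucketed features into one structured prompt.

    The thresholds below are arbitrary placeholders, not values from the paper.
    """
    parts = [
        f"{features.label} tissue patch",
        f"nuclei density: {bucket(features.nuclei_count, 20, 60)}",
        f"nucleus size: {bucket(features.mean_nucleus_area, 40.0, 90.0)}",
    ]
    return ", ".join(parts)


example = PatchFeatures(label="cancerous", nuclei_count=75, mean_nucleus_area=55.0)
prompt = build_prompt(example)
print(prompt)
```

A prompt built this way carries more information than the bare label, which is the contrast the abstract draws between label-only prompts and prompts that include image-derived features.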