Generating and evaluating synthetic data in digital pathology through diffusion models.

Affiliations

Data Science for Health Unit, Fondazione Bruno Kessler, Via Sommarive 18, Povo, Trento, 38123, Italy.

Department for Computational and Integrative Biology, Università degli Studi di Trento, Via Sommarive, 9, Povo, Trento, 38123, Italy.

Publication information

Sci Rep. 2024 Nov 18;14(1):28435. doi: 10.1038/s41598-024-79602-w.

Abstract

Synthetic data is becoming a valuable tool for computational pathologists, aiding in tasks like data augmentation and addressing data scarcity and privacy. However, its use necessitates careful planning and evaluation to prevent the creation of clinically irrelevant artifacts.

This manuscript introduces a comprehensive pipeline for generating and evaluating synthetic pathology data using a diffusion model. The pipeline features a multifaceted evaluation strategy with an integrated explainability procedure, addressing two key aspects of synthetic data use in the medical domain.

The evaluation of the generated data employs an ensemble-like approach. The first step assesses the similarity between real and synthetic data using established metrics. The second step evaluates the usability of the generated images in deep learning models, accompanied by explainable AI methods. The final step verifies their histopathological realism through questionnaires answered by professional pathologists. We show that each of these evaluation steps is necessary, as they provide complementary information on the generated data's quality.

The pipeline is demonstrated on the public GTEx dataset of 650 Whole Slide Images (WSIs), including five different tissues. An equal number of tiles from each tissue is generated, and their reliability is assessed using the proposed evaluation pipeline, yielding promising results.

In summary, the proposed workflow offers a comprehensive solution for generative AI in digital pathology, potentially aiding the community in their transition towards digitalization and data-driven modeling.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ab3b/11574254/485214b12d42/41598_2024_79602_Fig1_HTML.jpg
