

Synthetic Medical Images for Robust, Privacy-Preserving Training of Artificial Intelligence: Application to Retinopathy of Prematurity Diagnosis.

Author information

Coyner Aaron S, Chen Jimmy S, Chang Ken, Singh Praveer, Ostmo Susan, Chan R V Paul, Chiang Michael F, Kalpathy-Cramer Jayashree, Campbell J Peter

Affiliations

Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon.

Department of Ophthalmology, Shiley Eye Institute, University of California, San Diego, San Diego, California.

Publication information

Ophthalmol Sci. 2022 Feb 11;2(2):100126. doi: 10.1016/j.xops.2022.100126. eCollection 2022 Jun.

Abstract

PURPOSE

Developing robust artificial intelligence (AI) models for medical image analysis requires large quantities of diverse, well-chosen data that can prove challenging to collect because of privacy concerns, disease rarity, or diagnostic label quality. Collecting image-based datasets for retinopathy of prematurity (ROP), a potentially blinding disease, suffers from these challenges. Progressively growing generative adversarial networks (PGANs) may help, because they can synthesize highly realistic images that may increase both the size and diversity of medical datasets.

DESIGN

Diagnostic validation study of convolutional neural networks (CNNs) for plus disease detection, a component of severe ROP, using synthetic data.

PARTICIPANTS

Five thousand eight hundred forty-two retinal fundus images (RFIs) collected from 963 preterm infants.

METHODS

Retinal vessel maps (RVMs) were segmented from RFIs. PGANs were trained to synthesize RVMs with normal, pre-plus, or plus disease vasculature. Convolutional neural networks were trained, using real or synthetic RVMs, to detect plus disease from 2 real RVM test datasets.

MAIN OUTCOME MEASURES

Features of real and synthetic RVMs were evaluated using uniform manifold approximation and projection (UMAP). Similarities were evaluated at the dataset and feature levels using Fréchet inception distance and Euclidean distance, respectively. CNN performance was assessed via the area under the receiver operating characteristic curve (AUC); AUCs were compared via bootstrapping and DeLong's test for correlated receiver operating characteristic curves. Confusion matrices were compared using McNemar's chi-square test and Cohen's κ.
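As a rough illustration of the two similarity measures named above, the Python sketch below computes the Fréchet distance between Gaussians fitted to two feature sets (the quantity behind Fréchet inception distance) and per-image nearest-neighbour Euclidean distances. It assumes the image features have already been extracted (for FID, typically from an Inception network) into NumPy arrays of shape (n_images, n_features). The function names are hypothetical, and reading the feature-level comparison as nearest-neighbour distances is an assumption, not necessarily the authors' exact procedure.

```python
import numpy as np
from scipy import linalg


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two (n_images, n_features) arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    # rowvar=False: rows are images, columns are features
    sigma_a = np.cov(feats_a, rowvar=False)
    sigma_b = np.cov(feats_b, rowvar=False)
    # Matrix square root of the covariance product; numerical error can
    # introduce tiny imaginary parts, which are discarded here
    covmean = np.real(linalg.sqrtm(sigma_a @ sigma_b))
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(sigma_a) + np.trace(sigma_b)
                 - 2.0 * np.trace(covmean))


def nearest_neighbor_distances(query: np.ndarray, reference: np.ndarray,
                               same_set: bool = False) -> np.ndarray:
    """Euclidean distance from each query row to its closest reference row."""
    # Pairwise distances, shape (n_query, n_reference)
    dists = np.linalg.norm(query[:, None, :] - reference[None, :, :], axis=-1)
    if same_set:
        # query and reference are the same array: ignore each image's
        # zero distance to itself
        np.fill_diagonal(dists, np.inf)
    return dists.min(axis=1)
```

Under the reasoning in the abstract, synthetic images that are not near-copies of the training data would tend to show `nearest_neighbor_distances(synthetic, real)` values larger than `nearest_neighbor_distances(real, real, same_set=True)`; `synthetic` and `real` here are placeholder feature arrays.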

RESULTS

The CNN trained on synthetic RVMs showed a significantly higher AUC (0.971; P = 0.006 and P = 0.004) and classified plus disease more similarly to a set of 8 international experts (κ = 0.922) than the CNN trained on real RVMs (AUC = 0.934; κ = 0.701). Real and synthetic RVMs overlapped, by plus disease diagnosis, on the UMAP manifold, showing that synthetic images spanned the disease severity spectrum. Fréchet inception distance and Euclidean distances suggested that real and synthetic RVMs were more dissimilar to one another than real RVMs were to one another, further suggesting that synthetic RVMs were distinct from the training data with respect to privacy considerations.
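The agreement and paired-comparison statistics reported above can be sketched in Python as follows. This is a minimal, hypothetical illustration that assumes unweighted Cohen's κ and the two-classifier form of McNemar's test on per-image correct/incorrect outcomes; the exact variants used in the study (for example, a weighted κ over normal, pre-plus, and plus categories) are not specified in this abstract.

```python
import numpy as np
from scipy.stats import chi2


def cohens_kappa(labels_a, labels_b) -> float:
    """Unweighted Cohen's kappa between two raters' categorical labels."""
    a, b = np.asarray(labels_a), np.asarray(labels_b)
    observed = np.mean(a == b)
    # Chance agreement: sum over categories of the product of marginal rates
    expected = sum(np.mean(a == c) * np.mean(b == c) for c in np.union1d(a, b))
    return float((observed - expected) / (1.0 - expected))


def mcnemar_test(correct_a, correct_b, continuity: bool = True) -> tuple[float, float]:
    """McNemar chi-square test on paired correct/incorrect outcomes; returns (stat, p)."""
    a = np.asarray(correct_a, dtype=bool)
    b = np.asarray(correct_b, dtype=bool)
    n01 = int(np.sum(a & ~b))  # model A right, model B wrong
    n10 = int(np.sum(~a & b))  # model A wrong, model B right
    if n01 + n10 == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    # Optional Edwards continuity correction, clipped at zero
    num = max(abs(n01 - n10) - (1 if continuity else 0), 0) ** 2
    stat = num / (n01 + n10)
    return float(stat), float(chi2.sf(stat, df=1))
```

For example, `cohens_kappa(cnn_labels, expert_consensus)` would give one κ per model, and `mcnemar_test(real_cnn_correct, synthetic_cnn_correct)` would test whether the two CNNs err on different images; these variable names are placeholders.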

CONCLUSIONS

Synthetic datasets may be useful for training robust medical AI models. Furthermore, PGANs may be able to synthesize realistic data for use without protected health information concerns.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b781/9560638/daf3e76524c8/gr1.jpg
