IEEE J Biomed Health Inform. 2021 Jul;25(7):2686-2697. doi: 10.1109/JBHI.2020.3041848. Epub 2021 Jul 27.
In the scenario of limited labeled data, this paper introduces a deep learning-based approach that improves Diabetic Retinopathy (DR) severity recognition performance using fundus images combined with wide-field swept-source optical coherence tomography angiography (SS-OCTA).
The proposed architecture, TFA-Net, comprises a backbone convolutional network coupled with a Twofold Feature Augmentation mechanism. The former consists of multiple convolution blocks that extract representational features at various scales. The latter is constructed in two stages: weight-sharing convolution kernels and a Reverse Cross-Attention (RCA) stream.
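The two-stage mechanism can be illustrated with a minimal NumPy sketch. Note that this is a speculative reading, not the authors' implementation: the abstract does not define the RCA stream, so the shared projection matrix, the complement-based attention weighting, and the function names (`shared_projection`, `reverse_cross_attention`) are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def shared_projection(feat, W):
    # Weight sharing: the SAME matrix W transforms both modalities,
    # acting as a regularizer in feature space.
    return feat @ W

def reverse_cross_attention(query_feat, key_feat, value_feat):
    # Speculative reading of "reverse" cross-attention: attend with
    # complemented weights, so regions de-emphasized by one modality
    # are highlighted in the other. The paper's formulation may differ.
    scores = query_feat @ key_feat.T / np.sqrt(key_feat.shape[-1])
    attn = softmax(scores, axis=-1)
    rev = softmax(1.0 - attn, axis=-1)  # invert emphasis, renormalize
    return rev @ value_feat

rng = np.random.default_rng(0)
d = 8                                # hypothetical feature dimension
fundus = rng.normal(size=(16, d))    # 16 spatial tokens per modality
octa = rng.normal(size=(16, d))
W = rng.normal(size=(d, d))          # shared across both modalities

f_shared = shared_projection(fundus, W)
o_shared = shared_projection(octa, W)
fused = reverse_cross_attention(f_shared, o_shared, o_shared)
print(fused.shape)  # (16, 8)
```

The key design point conveyed by the abstract is that both modalities pass through identical kernels before cross-modal attention, constraining the two feature spaces to stay aligned despite the small training set.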
The proposed model achieves a Quadratic Weighted Kappa of 90.2% on the small internal KHUMC dataset. The robustness of the RCA stream is also evaluated on the single-modal Messidor dataset, where the mean Accuracy (94.8%) and Area Under the Receiver Operating Characteristic curve (99.4%) significantly outperform state-of-the-art methods.
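Quadratic Weighted Kappa, the headline metric here, penalizes disagreements between predicted and true severity grades in proportion to their squared distance, which suits ordinal DR grading. A minimal NumPy implementation (the labels below are made up for illustration and are unrelated to the paper's data):

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    # Observed confusion matrix.
    O = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1
    # Quadratic disagreement weights: w[i][j] = (i-j)^2 / (N-1)^2.
    w = np.array([[(i - j) ** 2 for j in range(n_classes)]
                  for i in range(n_classes)]) / (n_classes - 1) ** 2
    # Expected confusion under chance agreement (outer product of marginals).
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    return 1.0 - (w * O).sum() / (w * E).sum()

y_true = [0, 1, 2, 2, 3]  # hypothetical severity grades
y_pred = [0, 1, 1, 2, 3]
print(round(quadratic_weighted_kappa(y_true, y_pred, 4), 3))  # → 0.906
```

Unlike plain accuracy, QWK rewards a model whose errors land on adjacent grades, which is why it is the standard headline metric for DR severity tasks.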
Using a network strongly regularized in feature space to learn the amalgamation of different modalities proves effective. Thanks to the widespread availability of multi-modal retinal imaging for diabetes patients nowadays, such an approach can reduce the heavy reliance on large quantities of labeled visual data.
Our TFA-Net coordinates hybrid information from fundus photos and wide-field SS-OCTA to exhaustively exploit DR-oriented biomarkers. Moreover, the embedded feature-wise augmentation scheme efficiently improves generalization despite learning from small-scale labeled data.