School of Mathematics and Statistics, Central South University, Changsha 410083.
Department of Otorhinolaryngology, Xiangya Hospital, Central South University, Changsha 410008.
Zhong Nan Da Xue Xue Bao Yi Xue Ban. 2022 Aug 28;47(8):1037-1048. doi: 10.11817/j.issn.1672-7347.2022.210704.
OBJECTIVES: Chronic suppurative otitis media (CSOM) and middle ear cholesteatoma (MEC) are the 2 most common chronic middle ear diseases. Because their clinical manifestations are similar, the 2 diseases are prone to misdiagnosis and missed diagnosis. High-resolution computed tomography (HRCT) clearly displays the fine anatomical structure of the temporal bone and accurately reflects middle ear lesions and their extent, which gives it advantages in the differential diagnosis of chronic middle ear diseases. This study aims to develop a deep learning model that automatically extracts information from temporal bone HRCT image data and classifies chronic middle ear diseases, in order to improve diagnostic efficiency in clinical practice and reduce missed diagnosis and misdiagnosis.
METHODS: Clinical records and temporal bone HRCT imaging data were retrospectively collected for patients with chronic middle ear diseases hospitalized in the Department of Otorhinolaryngology, Xiangya Hospital, from January 2018 to October 2020. The medical records were independently reviewed by 2 experienced otorhinolaryngologists, and the final diagnosis was reached by consensus. A total of 499 patients (998 ears) were enrolled. The 998 ears were divided into 3 groups: an MEC group (108 ears), a CSOM group (622 ears), and a normal group (268 ears). To offset the imbalance in sample numbers between groups, the dataset was augmented by adding Gaussian noise with different variances. The augmented experimental dataset contained 1 806 ears. Of these, 75% (1 355 samples) were randomly selected for training, 10% (180 samples) for validation, and the remaining 15% (271 samples) for testing and evaluating model performance.
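The data preparation described above (noise-based augmentation followed by a random 75/10/15 split) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the toy 2x2 "slice", the noise levels, and the random seed are all hypothetical, and the abstract does not say how many noisy copies were generated per class. The rounding rule (training size rounded up, validation size rounded down) is an assumption chosen because it reproduces the reported 1 355 / 180 / 271 counts.

```python
import random


def add_gaussian_noise(image, sigma, rng):
    """Return a copy of a 2D image (list of rows) with zero-mean Gaussian noise added."""
    return [[pixel + rng.gauss(0.0, sigma) for pixel in row] for row in image]


def augment(image, sigmas, rng):
    """Create one noisy copy per standard deviation (variance is sigma ** 2)."""
    return [add_gaussian_noise(image, sigma, rng) for sigma in sigmas]


def split_indices(n_samples, rng, train_pct=75, val_pct=10):
    """Shuffle sample indices and cut them into train / validation / test parts.

    Integer arithmetic is used to avoid floating-point rounding surprises.
    Assumption: the training size is rounded up and the validation size down;
    the test set receives whatever remains.
    """
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = -(-n_samples * train_pct // 100)  # ceiling division
    n_val = n_samples * val_pct // 100          # floor division
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


rng = random.Random(0)  # fixed seed (hypothetical) so the sketch is repeatable

# Hypothetical toy "slice"; real HRCT slices are far larger.
toy_slice = [[0.0, 0.5], [1.0, 0.25]]
noisy_copies = augment(toy_slice, sigmas=[0.01, 0.02, 0.05], rng=rng)

train_idx, val_idx, test_idx = split_indices(1806, rng)
print(len(noisy_copies), len(train_idx), len(val_idx), len(test_idx))
# prints: 3 1355 180 271
```

In practice the noisy copies would be generated mainly for the under-represented groups, so that the augmented class sizes become more comparable before the split is drawn.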
The overall model was designed as a serial structure of 3 deep learning models with different functions. The first model, based on a region proposal network algorithm, located the middle ear region in the whole HRCT image and then cropped and saved it. The second model, an image-comparison convolutional neural network (CNN) based on a Siamese (twin) network structure, searched the cropped images for those matching the key layers of the HRCT images and constructed 3D data blocks. The third model, based on a 3D-CNN, performed the final classification and diagnosis on the constructed 3D data blocks and output the final prediction probability.
RESULTS: The special-level (key-layer) search network based on the Siamese network structure achieved an average AUC of 0.939 across the 10 special levels. The 3D-CNN classification network had an overall accuracy of 96.5%, an overall recall of 96.4%, and an average AUC of 0.983 across the 3 classes. The recall for CSOM cases and MEC cases was 93.7% and 97.4%, respectively. In the subsequent comparison experiments, several classical CNNs had an average accuracy of 79.3% and an average recall of 87.6%. The accuracy and recall of the deep learning network constructed in this study were therefore about 17.2 and 8.8 percentage points higher, respectively, than those of the common CNNs.
CONCLUSIONS: The deep learning network model proposed in this study can automatically extract 3D data blocks containing middle ear features from patients' temporal bone HRCT image data, which reduces the overall data size while preserving the relationships between corresponding images, and then uses a 3D-CNN to classify and diagnose CSOM and MEC. The design fits the continuous nature of HRCT data well, and the experimental results show high precision and adaptability, outperforming the current common CNN methods.
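For readers less familiar with the metrics reported above, the following sketch shows how overall accuracy and per-class recall are computed from a 3-class confusion matrix, and checks the arithmetic behind the reported gaps. The confusion matrix is entirely made up for illustration and is not the study's data. Treating "overall recall" as the unweighted mean of the per-class recalls is also an assumption, since the abstract does not state the averaging scheme.

```python
def accuracy(cm):
    """Fraction of all samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def per_class_recall(cm):
    """Recall for each true class: correct predictions divided by the row total."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


def macro_recall(cm):
    """Unweighted mean of per-class recalls (assumed meaning of 'overall recall')."""
    recalls = per_class_recall(cm)
    return sum(recalls) / len(recalls)


# Made-up confusion matrix: rows are true classes, columns are predicted classes.
toy_cm = [
    [18, 1, 1],   # true MEC
    [2, 45, 3],   # true CSOM
    [0, 2, 28],   # true normal
]

toy_accuracy = accuracy(toy_cm)        # 91 correct out of 100 samples
toy_recalls = per_class_recall(toy_cm)
toy_macro_recall = macro_recall(toy_cm)

# Gaps between the reported figures, in percentage points.
accuracy_gap = round(96.5 - 79.3, 1)
recall_gap = round(96.4 - 87.6, 1)
print(accuracy_gap, recall_gap)
# prints: 17.2 8.8
```

The gaps are differences between two percentages, so they are percentage points rather than relative improvements.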